Problem
When requests is used to hit the same IP over and over, especially at high frequency, the error "Max retries exceeded with url" shows up very easily.
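To make the symptom concrete, here is a minimal sketch of the pattern described above (the URL is a placeholder, not a real target):

import requests

# Many rapid requests to the same host, with no close() and no throttling.
# Once the server starts refusing or connections pile up, requests raises
# a ConnectionError whose message contains "Max retries exceeded with url ...".
for i in range(1000):
    res = requests.get('https://www.example.com/page/{}'.format(i))
    print(res.status_code)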
Analysis
First, close the connection promptly with close(); second, wrap the request in try/except so failed fetches can be retried. Keep looping over the request and use sleep to hold the request frequency down.
import logging
import requests

logger = logging.getLogger(__name__)

def get(url):
    try:
        res = requests.get(url)
        # If the response status code is not 200, raise an exception explicitly
        res.raise_for_status()
        # Close the connection !!! -- very important
        res.close()
    except Exception as e:
        logger.error(e)
    else:
        return res.json()
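The helper above only covers the close-and-catch part; a minimal sketch of the loop-plus-sleep retry mentioned in the analysis might look like this (the get_with_retry name, retry count, and delay are my own choices, not from the original):

import time

def get_with_retry(url, retries=5, delay=5):
    # Call the get() helper above a bounded number of times,
    # sleeping between attempts to keep the request frequency low.
    for attempt in range(retries):
        data = get(url)
        if data is not None:
            return data
        time.sleep(delay)
    return None  # every attempt failed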
Someone else's code for scraping Zhihu articles uses the same approach:
html=""
while html == "":#因为请求可能被知乎拒绝,采用循环+sleep的方式重复发送,但保持频率不太高
try:
proxies = get_random_ip(ipList)
print("这次试用ip:{}".format(proxies))
r = requests.request("GET", url, headers=headers, params=querystring, proxies=proxies)
r.encoding = 'utf-8'
html = r.text
return html
except:
print("Connection refused by the server..")
print("Let me sleep for 5 seconds")
print("ZZzzzz...")
sleep(5)
print("Was a nice sleep, now let me continue...")
continue
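One caveat about the loop above: if the server keeps refusing, while html == "" never exits. A sketch with a bounded number of attempts, reusing get_random_ip, headers, querystring and ipList from the snippet (the fetch_html name, the max_tries value, and narrowing the bare except to requests.exceptions.RequestException are my additions):

from time import sleep
import requests

def fetch_html(url, headers, querystring, ipList, max_tries=10):
    # Same retry-with-sleep idea, but give up after max_tries attempts
    # instead of looping forever.
    for attempt in range(max_tries):
        try:
            proxies = get_random_ip(ipList)  # proxy helper from the original snippet
            r = requests.request("GET", url, headers=headers,
                                 params=querystring, proxies=proxies)
            r.encoding = 'utf-8'
            return r.text
        except requests.exceptions.RequestException:
            print("Connection refused by the server, sleeping 5 seconds...")
            sleep(5)
    return ""  # all attempts failed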
Putting it together
How do we keep the whole crawl running without interruption, so that when Python 3 raises an error it only skips the current iteration and the loop continues?
import time

if __name__ == '__main__':
    for i in range(80):
        url = 'https://www.xxxx.com/qiye/{}.htm'.format(1000 - i)
        try:
            getData(url)  # put your scraping function here; an error no longer matters
        except:
            print(url + " Let me sleep for 5 seconds")
            print("ZZzzzz...")
            time.sleep(5)
            print("Was a nice sleep, now let me continue...")
            continue
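To tie the two parts together, here is a sketch where the outer loop never dies and each URL gets its own bounded retries via the get_with_retry() helper sketched earlier (the helper, the failed list, and the final report are my additions; the URL pattern and range come from the original):

import time

if __name__ == '__main__':
    failed = []  # remember URLs that never succeeded
    for i in range(80):
        url = 'https://www.xxxx.com/qiye/{}.htm'.format(1000 - i)
        data = get_with_retry(url)  # bounded retries per URL
        if data is None:
            failed.append(url)  # record it and move on; the loop keeps running
            time.sleep(5)       # back off a little before the next URL
    print("Finished; {} URLs could not be fetched.".format(len(failed)))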