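I want to write a small program that automatically downloads the file behind the download link http://query.sse.com.cn/secur... found on the page http://www.sse.com.cn/assortm...
With urllib I first got a 403, so I added a User-Agent header and the request returned 200. But when I then call urlretrieve, it fails with a regex-matching error. I couldn't find an answer online. How can I fix this?
The code is as follows: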
from urllib import request
from datetime import datetime
url = 'http://query.sse.com.cn/secur...'
user_agent = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Mobile Safari/537.36'
myheaders = {'User-Agent': user_agent}
req = request.Request(url, headers=myheaders)
local = "/Users/Mty/Downloads/s_data/" + str(datetime.now().date()) + " .xls"
request.urlretrieve(req, local)
Error:
Traceback (most recent call last):
  File "/Users/Mty/PycharmProjects/get_data/date.py", line 20, in <module>
    request.urlretrieve(req, local)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 186, in urlretrieve
    url_type, path = splittype(url)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/parse.py", line 861, in splittype
    match = _typeprog.match(url)
TypeError: expected string or bytes-like object
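Solved by using request.build_opener to add the header. urlretrieve expects the URL as a string, so passing it a Request object raises the TypeError above.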
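
For anyone hitting the same error, here is a minimal sketch of the build_opener approach. The function name download and the shortened User-Agent string are my own placeholders, not from the original code; substitute your real URL and save path.

```python
from urllib import request

# Placeholder User-Agent (an assumption); use any browser-like string the server accepts.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def download(url, dest, user_agent=USER_AGENT):
    """Save `url` to the local path `dest`, sending a custom User-Agent."""
    opener = request.build_opener()
    # Overwrite the default header list, which only holds "Python-urllib/x.y".
    opener.addheaders = [("User-Agent", user_agent)]
    # urlretrieve goes through urlopen, which uses the globally installed opener.
    request.install_opener(opener)
    # Pass the URL as a plain string, not a Request object.
    return request.urlretrieve(url, dest)
```

Call it as download(url, local) with the url and local variables from the question. Note that install_opener changes the opener for every later urlopen call in the process. If you would rather keep the Request object, another option is to open it with request.urlopen(req) and write the response bytes to the file yourself.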