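I'm building a Taobao crawler that runs on a Hong Kong server, and I'm puzzled: every time I crawl the Taobao home page, I get redirected automatically to the Hong Kong version of Taobao, so both the page source and the content are different. How should I handle this?
Put simply: say I'm in Quanzhou and scraping 58.com (58同城), and I want to collect the Beijing listings. How do I do that?
When I open the site from my IP it always redirects to Beijing, but I want to collect the 58 home page directly.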
ringa_lee
Disable redirects. With requests, for example:

    import requests

    r = requests.get('http://github.com/', allow_redirects=False)
    r.status_code          # a 3xx redirect status such as 301 or 302
    r.url                  # http://github.com/, not https
    r.headers['Location']  # https://github.com/ -- the redirect destination
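To tell whether a response is sending you to a different site (for example a regional version), you can inspect the status code and the Location header yourself. Here is a minimal sketch, not a definitive implementation: the helper names `redirect_target`, `leaves_host` and `fetch_without_redirect` are made up for illustration, and the example.com hosts are placeholders.

```python
from urllib.parse import urljoin, urlparse

# HTTP status codes that carry a redirect destination in the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status_code, headers):
    """Return the absolute redirect destination, or None if this is not a redirect."""
    if status_code not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if not location:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(request_url, location)


def leaves_host(request_url, target_url):
    """True when the redirect points at a different hostname (e.g. a regional site)."""
    return urlparse(request_url).hostname != urlparse(target_url).hostname


def fetch_without_redirect(url):
    """Hypothetical wrapper: fetch once with requests and report where the server wanted to send us."""
    import requests  # third-party; imported here so the helpers above work without it

    r = requests.get(url, allow_redirects=False, timeout=10)
    return r, redirect_target(url, r.status_code, r.headers)
```

If `leaves_host` returns True for the home page, the server is steering you to another site, and disabling redirects only stops the client from following it; the body of the 3xx response itself usually will not contain the page you wanted.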
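If you want to collect the Beijing listings, request the city's own address directly (bj for Beijing). Note that the URL carries a PGTID parameter, which the site appears to use as a protection measure: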
http://bj.58.com/?PGTID=0d000...
I suggest using selenium.
Sometimes the server redirects based on the geographic location associated with your IP. In that case there is probably no way around it other than using a proxy.
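For the proxy route, requests accepts a `proxies` mapping on each call. A rough sketch, assuming you already have a working proxy located in the region you want: the address 203.0.113.10:8080 is a placeholder from a documentation-only range, and `build_proxies` and `fetch_via_proxy` are hypothetical helper names.

```python
def build_proxies(host, port, scheme="http"):
    """Build the mapping requests expects: URL scheme -> proxy URL."""
    proxy_url = f"{scheme}://{host}:{port}"
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxies):
    """Fetch a page through the proxy without following redirects."""
    import requests  # third-party; imported here so build_proxies works without it

    return requests.get(url, proxies=proxies, allow_redirects=False, timeout=10)


# Placeholder address; substitute a proxy you actually control or rent.
proxies = build_proxies("203.0.113.10", 8080)
# response = fetch_via_proxy("http://bj.58.com/", proxies)
```

Whether the site then serves the version you want depends on where the proxy's IP is geolocated, so it is worth checking the status code and Location header of the first response before crawling further.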