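My crawler targets http://jobs.monster.com/search/software_5, and I want to save the title, link, company, and posting date of every job listed there.
When I checked it myself, using sites = hxs.select('//p') to grab every p element, I found that I could only get the information for a single job.
For example, every job should have a p class=jobTitle, but only one such p shows up in the data.
The site was just redesigned; before that I could fetch the data without any trouble. Could someone with experience point me to a solution?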
Resolved
The job data is all embedded in JavaScript on the page, so I pulled it straight out of response.body with regular expressions. It is not a great approach. Anyone who runs into the same problem may want to look into Python-webkit instead.
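For anyone who wants to see roughly what the regular-expression approach looks like, here is a minimal sketch. It makes several assumptions: the sample page body, the variable name jobResults, and the field names title, url, company, and posted are all made up for illustration, and the real page may embed its data in a different shape. Check the actual page source before adapting the pattern.

```python
import json
import re

# Hypothetical page body: the job list lives in a <script> block rather
# than in <p class="jobTitle"> elements. All names and values are made up.
SAMPLE_BODY = """
<html><body>
<p class="jobTitle">Only one job is rendered in the static HTML</p>
<script type="text/javascript">
var jobResults = [
  {"title": "Software Engineer", "url": "http://example.com/job/1",
   "company": "Acme", "posted": "2 days ago"},
  {"title": "QA Analyst", "url": "http://example.com/job/2",
   "company": "Globex", "posted": "5 days ago"}
];
</script>
</body></html>
"""

# Assumed variable name; inspect the real page to find the actual one.
# The pattern captures the array literal up to the first "]" followed by ";".
JOBS_PATTERN = re.compile(r"var\s+jobResults\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_jobs(body):
    """Return the list of job dicts embedded in the page's JavaScript.

    Returns an empty list when the expected variable is not found.
    """
    match = JOBS_PATTERN.search(body)
    if match is None:
        return []
    # This only works if the embedded array happens to be valid JSON.
    return json.loads(match.group(1))


jobs = extract_jobs(SAMPLE_BODY)
for job in jobs:
    print(job["title"], job["url"], job["company"], job["posted"])
```

In a spider callback, the same function would be called on the decoded response body instead of SAMPLE_BODY. The pattern breaks as soon as the site renames the variable or a value contains a closing bracket followed by a semicolon, which is part of why this method is fragile.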