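I want to scrape movie box office data from http://www.cbooo.cn/movieweek. The data I need is at the very bottom of the page: [Box office dates: 2016-11-14 to 2016-11-20; weekly box office: 57271万; weekly screenings: 1463995; weekly admissions: 1781万] (万 = 10,000). My code is as follows: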
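from bs4 import BeautifulSoup
import urllib.request

# Ask for the page URL and download the raw HTML
z = input("Enter the URL: ")
a = urllib.request.urlopen(z).read()
# Parse the HTML and select the summary paragraph at the bottom of the page
b = BeautifulSoup(a, "html.parser")
c = b.select("#content > p.alldate")
for i in c:
    print(i.get_text())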
The output is:
Box office dates:
Monthly box office: 万
Monthly screenings: 万
Monthly admissions: 万
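The key numbers are missing. What is going on? Those values are exactly what I want, and nothing I try brings them back. Any help would be greatly appreciated.
Thanks.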
The data you need is generated dynamically by ajax, so it is not in the HTML source that urllib downloads; BeautifulSoup only sees the labels with their values left empty. To read the values from the rendered page you need a tool that can execute the page's JavaScript. You can use
selenium+PhantomJS
to execute the js content, but this is relatively slow. For the site you want to crawl, though, capturing the network traffic in the browser reveals the path of the ajax request that loads these numbers.
So you can request that path directly,
with no need for PhantomJS. The response is a JSON string that contains the data you need, in data2 at the end.
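A minimal sketch of the direct-request approach, assuming the ajax response is a JSON object with a top-level "data2" key as described above. The helper names fetch_text and extract_data2 are made up for illustration, the ajax path is whatever you see in the browser's network panel (it is not filled in here), and the sample payload is invented, so the real shape of data2 may differ.

```python
import json
import urllib.request


def fetch_text(url, timeout=10):
    """Download a URL and return the response body as text."""
    # Some sites reject requests without a browser-like User-Agent (assumption).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def extract_data2(json_text):
    """Parse the ajax response text and return its "data2" field."""
    payload = json.loads(json_text)
    return payload["data2"]


if __name__ == "__main__":
    # Invented sample standing in for the real response; only the key
    # name "data2" comes from the answer above.
    sample = '{"data1": [], "data2": [{"example_key": "example_value"}]}'
    print(extract_data2(sample))
```

For the real site you would call extract_data2(fetch_text(ajax_url)), where ajax_url is the request path you captured in the browser, then print the result once to see which fields hold the dates, box office, screenings, and admissions.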