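As shown in the screenshot, I want to scrape the time portion of the page. The URL is: http://sh.huodongxing.com/event/6313289154400?utm_source=%E5%8F%91%E7%8E%B0%E6%B4%BB%E5%8A%A8%E5%88%97%E8%A1%A8%E9%A1%B5&utm_medium=&utm_campaign=eventspage
I am using Scrapy's Selector (which is based on lxml),
and my XPath expression is: (//p[1]/p/p[1]/text())[7]
In Firefox's XPath Checker this locates the time portion correctly, but when crawling I only get whitespace characters such as \r\n. I then found this approach suggested by another user:
sel = Selector(response, response.body_as_unicode().replace('\r','').replace('\n',''), 'html')
I tried it, but the problem remains (the \r\n is merely replaced with spaces). Where exactly is this going wrong?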
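Chrome DevTools shows the final rendered page, which is not necessarily what your spider downloads.
Open the page source (View Source) and look for the time text there, because this HTML may be generated or modified by JavaScript after the page loads.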
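The same check can be done from inside the spider. The following is a minimal sketch, not a definitive test: the helper name `appears_in_source` and the two sample HTML strings are made up for illustration. In a real spider you would pass in `response.body_as_unicode()` (or `response.text` in newer Scrapy versions) together with a piece of the time text you can see in the browser.

```python
def appears_in_source(raw_html, needle):
    """Return True if `needle` occurs in the HTML exactly as it was downloaded."""
    return needle in raw_html


# Hypothetical stand-ins for the downloaded body of two kinds of pages.
static_page = "<div class='time'>2016-05-20 14:00</div>"
js_page = "<div class='time'></div><script>renderTime()</script>"

print(appears_in_source(static_page, "2016-05-20"))  # True
print(appears_in_source(js_page, "2016-05-20"))      # False
```

If the check returns False, the text is probably filled in by JavaScript, and no XPath against the raw response will find it.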
If you confirm that it is in the source code, you can get an XPath for it from Chrome DevTools: right-click the element --> Copy --> Copy XPath
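One possible reason for getting only \r\n is that a positional index such as [7] counts text nodes differently in the raw HTML than in the browser's DOM, so it lands on a whitespace-only node. A sketch of one workaround, under that assumption: select all text nodes under the container, then discard the empty ones in Python. The helper name `clean_text_nodes` and the sample list (standing in for what `.extract()` might return) are hypothetical.

```python
def clean_text_nodes(nodes):
    """Strip each extracted text node and drop those that are only whitespace."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]


# Hypothetical result of sel.xpath('...//text()').extract() on the raw page.
raw_nodes = ["\r\n    ", "2016-05-20 14:00", "\r\n", "  Shanghai \r\n"]

print(clean_text_nodes(raw_nodes))  # ['2016-05-20 14:00', 'Shanghai']
```

With the whitespace-only nodes gone, you can pick the time by its position in the cleaned list or by matching its format, rather than by a fixed index into the raw text nodes.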