python - Why doesn't my Scrapy spider scrape any data?
天蓬老师 2017-04-17 14:29:17

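I'd like to ask for your help. I'm building a crawler, and the first step is to scrape the codes and names of all stocks. The URL is http://app.finance.ifeng.com/list/stock.php?t=ha&f=symbol&o=asc&p=1

My items.py looks like this: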

    import scrapy


    class NameItem(scrapy.Item):
        code = scrapy.Field()
        name = scrapy.Field()

My spider script looks like this:

    from scrapy.spider import BaseSpider
    from Stock.items import NameItem
    from scrapy.selector import Selector
    from scrapy.http import Request


    class StockNameSpider(BaseSpider):
        name = "stock_name"
        allowed_domains = ["http://app.finance.ifeng.com"]
        start_urls = ["http://app.finance.ifeng.com/list/stock.php?t=ha"]

        def parse(self, response):
            sel = Selector(response)
            links = sel.xpath('//*[@class= "tab01"]/table/tbody/tr')
            for link in links:
                code = link.xpath('td[1]/a/text()').extract()
                name = link.xpath('td[2]/a/text()').extract()
                nameitem = NameItem()
                nameitem['code'] = code[0] if code else None
                nameitem['name'] = name[0] if name else None
                yield nameitem

The XPath itself is not wrong; I have already tested it in the shell.

No errors were reported while the spider ran.

Here is the run log:

2015-02-19 20:22:49+0800 [scrapy] INFO: Scrapy 0.24.4 started (bot: Stock)
2015-02-19 20:22:49+0800 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-02-19 20:22:49+0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'Stock.spiders', 'SPIDER_MODULES': ['Stock.spiders'], 'LOG_FILE': 'test.log', 'BOT_NAME': 'Stock'}
2015-02-19 20:22:50+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled item pipelines:
2015-02-19 20:22:51+0800 [stock_name] INFO: Spider opened
2015-02-19 20:22:51+0800 [stock_name] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-02-19 20:22:51+0800 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-02-19 20:22:51+0800 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-02-19 20:22:51+0800 [stock_name] DEBUG: Crawled (200) <GET http://app.finance.ifeng.com/list/stock.php?t=ha> (referer: None)
2015-02-19 20:22:51+0800 [stock_name] INFO: Closing spider (finished)
2015-02-19 20:22:51+0800 [stock_name] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 239,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 11784,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 2, 19, 12, 22, 51, 897000),
'log_count/DEBUG': 3,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2015, 2, 19, 12, 22, 51, 352000)}
2015-02-19 20:22:51+0800 [stock_name] INFO: Spider closed (finished)


But in the end no data was scraped at all.
Could anyone tell me why? I'm a beginner and am waiting online for replies. Many thanks.


All replies (3)
PHPzhong

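Sigh, okay. It turns out that the HTML shown by FireBug differs somewhat in structure from the page source you see via right-click "View source" in the browser.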
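A minimal sketch of that kind of mismatch, assuming (the reply does not say so explicitly) that the extra structure was a `<tbody>` element, which browsers add to a table's DOM even when the served HTML has none. The markup and stock names below are made up for illustration, and the standard library's ElementTree stands in for Scrapy's selectors so that the snippet runs without Scrapy installed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the raw page source.
# Note that there is no <tbody> between <table> and <tr>.
RAW_SOURCE = """
<html><body>
<div class="tab01">
  <table>
    <tr><td><a href="#">600000</a></td><td><a href="#">Example Bank</a></td></tr>
    <tr><td><a href="#">600004</a></td><td><a href="#">Example Airport</a></td></tr>
  </table>
</div>
</body></html>
""".strip()

root = ET.fromstring(RAW_SOURCE)

# Path copied from a browser inspector that shows <tbody>: matches nothing here.
with_tbody = root.findall(".//*[@class='tab01']/table/tbody/tr")

# Same path without the tbody step: matches the rows in the raw source.
without_tbody = root.findall(".//*[@class='tab01']/table/tr")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 2

# Pull the code and name out of each matched row.
rows = [(tr.findtext("td[1]/a"), tr.findtext("td[2]/a")) for tr in without_tbody]
print(rows)
```

An empty match list raises no error, which would be consistent with a log that shows a 200 response and no scraped items. Printing `response.body` in the Scrapy shell shows the HTML the spider actually received, as opposed to the DOM that a browser inspector displays.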

    大家讲道理

    Hi, I've run into the same situation, but I checked the HTML and found nothing wrong with it.
    Could you share the details?

      大家讲道理

      How did you end up handling it?
