python - Problem scraping the Renda Economics Forum (人大经济论坛)
PHP中文网 2017-04-18 09:45:44

This is the search page:
http://s.pinggu.org/search.ph...

This is what I see when the request is made.

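There are two problems here:
the form data contains two fields that I don't know how to obtain.
One is srchtxt, which should be the search keyword field. How do I build it into the request?
The other is formhash, which can be seen in the page's Elements panel.

But this value also only appears after the search has been POSTed, so I don't know how to handle these two fields.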


All replies (3)
Ty80

To get formhash, visit http://s.pinggu.org/search.php once before searching; the formhash field is generated at that point. The other field shows garbled characters. My guess is that the page uses GBK encoding, so you can encode the value as GBK when you send it.
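A minimal sketch of the encoding point above, assuming the site really does expect GBK (the reply only guesses this). The helper names `encode_keyword` and `encode_form` are hypothetical; `quote` and `urlencode` are from the standard library and accept an `encoding` argument.

```python
from urllib.parse import quote, urlencode


def encode_keyword(keyword, charset="gbk"):
    """Percent-encode a keyword using the bytes of the given charset."""
    return quote(keyword, encoding=charset)


def encode_form(fields, charset="gbk"):
    """Build an application/x-www-form-urlencoded string in the given charset."""
    return urlencode(fields, encoding=charset)


if __name__ == "__main__":
    kw = "经济"
    # The same text produces different percent-encoded bytes per charset,
    # which is why a GBK site may not understand a UTF-8 encoded keyword.
    print(encode_keyword(kw))           # GBK bytes
    print(encode_keyword(kw, "utf-8"))  # UTF-8 bytes
```

If the search returns no results for a Chinese keyword, comparing these two encodings against what the browser sends in the network panel is a quick way to check the guess.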

Peter_Zhu

For formhash, just request the page twice, as mentioned above.
As for srchtxt: when you write the crawler yourself, you have to supply the keyword yourself. Otherwise, what would you be crawling for?

Peter_Zhu

srchtxt: as the name suggests, this is search_text, the search keyword. Chrome shows "unable to decode" probably because the Chinese text has been URL-encoded and cannot be displayed. This does not affect anything. When you submit the form, just pass your own text; it will be URL-encoded automatically.

formhash: when you fetch this page, the input already carries its value. Read that value first, then build the form submission.

To summarize:

1. srchtxt = the content to search for
2. formhash = a field on the page (before submitting, GET the page and read this field's value, then build the complete form)
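The two steps in the summary can be sketched as follows. This is an illustration under assumptions, not a verified client for the site: it assumes formhash sits in an `<input name="formhash" value="...">` tag with double-quoted attributes, that the page is GBK-encoded (a guess from the replies), and that the form needs only these two fields (the real form may require more). The function names are hypothetical; only the standard library is used.

```python
import re
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener

SEARCH_URL = "http://s.pinggu.org/search.php"  # URL given in the thread

# Assumed markup: an <input> tag carrying name="formhash" and a value attribute.
INPUT_RE = re.compile(r'<input[^>]*name="formhash"[^>]*>', re.IGNORECASE)
VALUE_RE = re.compile(r'value="([^"]*)"', re.IGNORECASE)


def extract_formhash(html):
    """Return the formhash value found in the page, or None if absent."""
    tag = INPUT_RE.search(html)
    if tag is None:
        return None
    value = VALUE_RE.search(tag.group(0))
    return value.group(1) if value else None


def build_form(keyword, formhash, charset="gbk"):
    """Encode the two fields from the summary as a POST body (bytes)."""
    fields = {"formhash": formhash, "srchtxt": keyword}
    return urlencode(fields, encoding=charset).encode("ascii")


def search(keyword):
    """GET the page for formhash, then POST the search with the same cookies."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    page = opener.open(SEARCH_URL, timeout=10).read().decode("gbk", errors="replace")
    formhash = extract_formhash(page)
    if formhash is None:
        raise RuntimeError("formhash input not found; the page layout may differ")
    response = opener.open(SEARCH_URL, data=build_form(keyword, formhash), timeout=10)
    return response.read().decode("gbk", errors="replace")
```

Sharing one opener with a cookie jar across both requests keeps the session consistent, on the assumption that the site ties formhash to the session. Calling `search("经济")` would perform the two requests.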

   