Web crawler - crawling a website with Python and parsing non-JSON content
学习ing
学习ing 2017-06-28 09:26:28

I have only just learned how to fetch JSON content, but the website I am crawling today does not return JSON, and a random number is appended to the request link on every request.

I don't know whether that affects the content I want to crawl.

The content I need is the content in the middle of the picture below.


Website link: http://www.szse.cn/main/discl...

Code I tried myself:

import requests

dir = '/Users/S1Lence/Desktop/new_html/szse/许可类重组问询函'
headers = {
    'Host': 'www.szse.cn',
    'Referer': 'http://www.szse.cn/main/disclosure/jgxxgk/wxhj/',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.109 Safari/537.36'
}
payload = {
    'ACTIONID': '7',
    'AJAX': 'AJAX-TRUE',
    'CATALOGID': 'main_wxhj',
    'TABKEY': 'tab1',
    'selecthjlb': '许可类重组问询函',
    'tab1PAGENO': '1',
    'tab1PAGECOUNT': '7',
    'tab1RECORDCOUNT': '63',
    'REPORT_ACTION': 'navigate'
}
res = requests.post('http://www.szse.cn/szseWeb/FrontControllere', data=payload)
print(res.text)

The output is not what I want. How should I crawl this page?


All replies (2)
黄舟

Copy the header information from the original request and send it with yours.
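
For what it's worth, here is a minimal sketch of what "using the headers" could look like. It only relies on the standard library, and the header and form values are copied from the question. The endpoint is the one given in the other reply in this thread; the helper names `build_request` and `fetch` are made up for illustration, and UTF-8 encoding of the form data is an assumption (the site may expect a different charset).

```python
import urllib.parse
import urllib.request

# Endpoint as given in the other reply in this thread.
URL = "http://www.szse.cn/szseWeb/FrontController.szse"

# Headers copied from the question.
HEADERS = {
    "Host": "www.szse.cn",
    "Referer": "http://www.szse.cn/main/disclosure/jgxxgk/wxhj/",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/59.0.3071.109 Safari/537.36"
    ),
}

# Form fields copied from the question.
PAYLOAD = {
    "ACTIONID": "7",
    "AJAX": "AJAX-TRUE",
    "CATALOGID": "main_wxhj",
    "TABKEY": "tab1",
    "selecthjlb": "许可类重组问询函",
    "tab1PAGENO": "1",
    "tab1PAGECOUNT": "7",
    "tab1RECORDCOUNT": "63",
    "REPORT_ACTION": "navigate",
}


def build_request(url, payload, headers):
    """Return a POST request carrying both the form data and the headers."""
    # Assumption: the form is sent as UTF-8; change `encoding` if the site
    # expects another charset.
    body = urllib.parse.urlencode(payload, encoding="utf-8").encode("ascii")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch(url, payload, headers, timeout=10):
    """Send the request and return the response body as text."""
    req = build_request(url, payload, headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (performs a real network request, so it is left commented out):
# print(fetch(URL, PAYLOAD, HEADERS))
```

Note that the code in the question defines `headers` but never passes it on. With `requests`, the equivalent would be `requests.post(url, data=payload, headers=headers)`.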

漂亮男人

The URL of your POST request is wrong; it should be:

http://www.szse.cn/szseWeb/FrontController.szse
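
Once the request goes to the right URL, the response still has to be parsed, since it is not JSON. A rough sketch follows, assuming the response is an HTML fragment containing a table (I have not verified what this endpoint actually returns). The class `TableTextParser`, the function `parse_rows`, and the `SAMPLE` markup are all made up for illustration; the sample is not real output from the site.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the current row, None outside <tr>
        self._cell = None  # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html_text):
    """Return the table rows found in html_text as lists of cell strings."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Made-up sample markup, only to show the shape of the result.
SAMPLE = """
<table>
  <tr><th>Code</th><th>Title</th></tr>
  <tr><td>000001</td><td><a href="/doc/1.pdf">Inquiry letter</a></td></tr>
</table>
"""

print(parse_rows(SAMPLE))
```

If the real response turns out to be structured differently, the same approach applies: print `res.text` first, look at which tags wrap the data you need, and adjust the tag names accordingly.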