python - Can these two pieces of information on this page be scraped without a headless browser?

Question

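While scraping the page "http://www.haodf.com/doctor/DE4r08xQdKSLBVM8i9sHYQ8uQGIO.htm", I found that the "擅长" (specialties) and "执业经历" (practice experience) fields cannot be retrieved with BeautifulSoup. The code I use to select these two fields is as follows: {code...} Inspecting the page, I found that this...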

PHP中文网 · Answer

The data you want to capture on this page may be rendered by JS after the page has loaded. In other words, the content of #full_DoctorSpecialize is loaded via AJAX from the server. To get at data like this, you can search for PhantomJS (on Baidu, for example), download it, and use it to render the page; that should get you what you need.

PHP中文网 · Answer

These two pieces of information can be obtained directly, but they are embedded in a JS block, BigPipe.onPageletArrive({this part}), so you can extract them with a regular expression. The part inside the parentheses is a JSON-formatted string, so once it is matched it is easy to parse as JSON. Getting the data through the query interface should also be possible, but you would have to analyze the JS code, which is too much trouble; you could use a packet capture tool to capture the HTTP request and look at the data it returns. By comparison, writing a regex match is faster.
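A minimal sketch of the regex-plus-JSON approach described in this answer, using only the standard library. The sample markup, the pagelet id `bp_doctor_about`, and the `id`/`content` field names are assumptions made for illustration; the real page may use different keys, and the argument may not always be strict JSON.

```python
import json
import re

# Locates the start of each call; the JSON argument itself is parsed by json,
# which copes with nested braces better than a regex would.
PAGELET_CALL = re.compile(r"BigPipe\.onPageletArrive\(\s*")


def extract_pagelets(html):
    """Return every JSON value passed to BigPipe.onPageletArrive(...) in html."""
    decoder = json.JSONDecoder()
    pagelets = []
    for match in PAGELET_CALL.finditer(html):
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it (here, the closing ");").
            obj, _end = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # argument was not strict JSON; skip this call
        pagelets.append(obj)
    return pagelets


# Hypothetical sample, shaped like the block the answer describes.
sample = (
    '<script>BigPipe.onPageletArrive({"id":"bp_doctor_about",'
    '"content":"<div id=\\"full_DoctorSpecialize\\">example text<\\/div>"});'
    "</script>"
)

pagelets = extract_pagelets(sample)
print(pagelets[0]["content"])
```

In practice `html` would be the response body fetched for the doctor page. Letting `json` find the end of the object avoids writing a regex that has to balance braces inside embedded HTML.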

怪我咯 · Answer

As mentioned above, this content is rendered by JS, and the content itself sits in the JS code. You can use a regular expression to match the relevant parts of the JS code and get the information you want.
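Once an HTML fragment has been pulled out of the JS code, the text of one element still has to be picked out of it. The following is a rough sketch using the standard library's `html.parser`; the sample fragment is made up, and only the `full_DoctorSpecialize` id comes from the thread. The `TextById` class and `text_by_id` helper are hypothetical names.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(fragment, target_id):
    parser = TextById(target_id)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.chunks).strip()


# Made-up fragment for illustration only.
fragment = (
    '<div><div id="full_DoctorSpecialize">heart <b>surgery</b></div>'
    "<p>other</p></div>"
)
print(text_by_id(fragment, "full_DoctorSpecialize"))
```

If BeautifulSoup is already in use, as in the question, the extracted fragment can be passed to it instead of a hand-written parser.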