I want to crawl Jianshu posts published within a specific period, for example April 13, 2013 to May 13, 2013.
Here are the approaches I want to try:
1. Baidu: use Baidu's site: syntax and restrict the date range. I observed about 70 posts.
2. Google: use Google's site: syntax and restrict the date range. I observed about 120 posts.
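A minimal sketch of building the site-restricted, date-limited search URLs described above. The helper names are hypothetical, and the date parameters are assumptions based on what the results pages put in the browser's address bar: Google's custom range as tbs=cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY, and Baidu's as gpc=stf=<start>,<end>|stftype=2 with Unix timestamps. Check them against the URLs your own browser produces before relying on them.

```python
from datetime import date, datetime, timedelta, timezone
from urllib.parse import urlencode

# Assumption: Baidu interprets its time filter in China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def google_search_url(site: str, start: date, end: date, page: int = 0) -> str:
    """Hypothetical helper: Google results page for site:<site> limited to [start, end]."""
    # Assumed custom-date-range parameter: cdr:1 enables it, dates are M/D/YYYY.
    tbs = (f"cdr:1,cd_min:{start.month}/{start.day}/{start.year},"
           f"cd_max:{end.month}/{end.day}/{end.year}")
    # Google pages through results 10 at a time via `start`.
    params = {"q": f"site:{site}", "tbs": tbs, "start": page * 10}
    return "https://www.google.com/search?" + urlencode(params)


def baidu_search_url(site: str, start: date, end: date, page: int = 0) -> str:
    """Hypothetical helper: Baidu results page for site:<site> limited to [start, end]."""
    # Start of the first day and end of the last day, as Unix timestamps.
    lo = int(datetime(start.year, start.month, start.day, tzinfo=CST).timestamp())
    hi = int(datetime(end.year, end.month, end.day, 23, 59, 59, tzinfo=CST).timestamp())
    # Assumed time-filter parameter; Baidu pages through results via `pn`.
    params = {"wd": f"site:{site}", "gpc": f"stf={lo},{hi}|stftype=2", "pn": page * 10}
    return "https://www.baidu.com/s?" + urlencode(params)


if __name__ == "__main__":
    start, end = date(2013, 4, 13), date(2013, 5, 13)
    for page in range(3):  # first three result pages from each engine
        print(google_search_url("jianshu.com", start, end, page))
        print(baidu_search_url("jianshu.com", start, end, page))
```

Keeping the URL construction separate from the fetching makes it easy to compare the two engines page by page for the same date window.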
Implementation: use Python to request the search result pages directly, follow the redirect behind each result link to obtain the real Jianshu URL, and then request that real URL.
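A minimal sketch of the redirect step, using only the standard library: urllib.request.urlopen follows HTTP 3xx redirects on its own, and geturl() on the response reports the final URL. Assumptions: the engines' result links redirect with a plain HTTP 3xx (a JavaScript or meta-refresh redirect would not be resolved this way), and Jianshu posts live at https://www.jianshu.com/p/<id>. Extracting the result links from the search page HTML is not shown, and the helper names are hypothetical.

```python
import re
import urllib.request
from typing import Iterable, Iterator
from urllib.parse import urlparse

# Assumption: Jianshu post pages look like https://www.jianshu.com/p/<alphanumeric id>.
POST_PATH = re.compile(r"^/p/[0-9A-Za-z]+/?$")


def resolve_redirect(link: str, timeout: float = 10.0) -> str:
    """Follow HTTP 3xx redirects from a search-result link and return the final URL."""
    # Browser-like User-Agent; some engines reject urllib's default one.
    req = urllib.request.Request(link, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # urlopen already followed the redirects; geturl() says where it ended up.
        return resp.geturl()


def is_jianshu_post(url: str) -> bool:
    """True if `url` points at a Jianshu post page (under the assumed URL shape)."""
    parts = urlparse(url)
    return parts.hostname in ("www.jianshu.com", "jianshu.com") and bool(
        POST_PATH.match(parts.path))


def real_post_urls(result_links: Iterable[str]) -> Iterator[str]:
    """Resolve each result link and yield each distinct Jianshu post URL once."""
    seen = set()
    for link in result_links:
        try:
            url = resolve_redirect(link)
        except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
            continue
        if is_jianshu_post(url) and url not in seen:
            seen.add(url)
            yield url
```

Filtering on the resolved URL, rather than on the result link, also drops results that redirect to Jianshu user or collection pages instead of posts.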
Questions:
Are the results obtained this way reliable? Is there a more reliable method?
Should I use Google or Baidu?
To be as comprehensive as possible, you can query all the mainstream search engines rather than limiting yourself to one. Some of our teammates search for certain topics this way, because some websites do not provide a search feature that meets their needs, and in that case search engines are the only option. However, the results you get through search engines may still be incomplete: under the robots exclusion protocol (robots.txt), pages a site tells crawlers not to fetch are not crawled, so search engines generally will not index them and you cannot find them this way.
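A minimal sketch of the two ideas in the answer: merging results from several engines keyed on a normalized URL, so a post found by both Baidu and Google counts once and you can see which engine found what, and checking a robots.txt with the standard library's urllib.robotparser to see which paths crawlers are told to skip. The normalization rules (treating jianshu.com and www.jianshu.com as the same host, dropping query strings and trailing slashes) are assumptions, the helper names are hypothetical, and the robots.txt text in the usage is made up, not Jianshu's actual file.

```python
from typing import Dict, Iterable, Set
from urllib.parse import urlparse, urlunparse
from urllib.robotparser import RobotFileParser


def normalize(url: str) -> str:
    """Canonical form used for de-duplication (assumed rules; adjust as needed)."""
    p = urlparse(url)
    host = (p.hostname or "").lower()
    if host == "jianshu.com":  # treat the bare and www hosts as the same site
        host = "www.jianshu.com"
    path = p.path.rstrip("/") or "/"
    # Force https and drop params, query (e.g. tracking tags) and fragment.
    return urlunparse(("https", host, path, "", "", ""))


def merge_results(results: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalized URL to the set of engines that returned it."""
    merged: Dict[str, Set[str]] = {}
    for engine, urls in results.items():
        for url in urls:
            merged.setdefault(normalize(url), set()).add(engine)
    return merged


def crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Whether robots.txt lets `agent` fetch `url`; disallowed pages usually go unindexed."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)


if __name__ == "__main__":
    merged = merge_results({
        "baidu": ["http://jianshu.com/p/abc/"],
        "google": ["https://www.jianshu.com/p/abc?utm_source=x",
                   "https://www.jianshu.com/p/def"],
    })
    for url, engines in sorted(merged.items()):
        print(url, sorted(engines))
    robots = "User-agent: *\nDisallow: /search\n"  # made-up example file
    print(crawlable(robots, "https://www.jianshu.com/search?q=python"))
```

Counting how many posts only one engine returned gives a rough sense of how much each engine misses, which bears on the "Google or Baidu" question; it still cannot reveal posts that no engine indexed.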