I am learning about web crawlers and using Selenium to crawl some complex websites.
I have run into a problem. The work order system I need to crawl (I don't know its password) can only be reached by logging in to an authentication system first and then clicking the work order system link on the authentication system's page, which jumps to the work order system without a separate login. How can I crawl the data in that system?
The following is the HTML of the work order system link (工单系统 means "work order system"), obtained with Selenium from the authentication system page:
<a href="/link-test001" target="_blank" title="工单系统" rel="link-test001" data="1" datasrc="工单系统|||/files/link/test001.gif|||new|||/link-test001">
<img src="/files/link/test001.gif" width="25" height="25" alt="工单系统" align="absmiddle"><span>工单系统</span>
</a>
Use Selenium IDE, a Firefox extension, to record the operation.
Then export the recording to a Python file.
Adjust the exported script as needed and run it.
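A recorded script for this case mostly comes down to clicking the link and following it into the new tab, since the link has target="_blank". Below is a minimal sketch of that step written by hand, not the actual Selenium IDE export. The login URL is a placeholder, the login is done manually in the browser because the form's field names are unknown, and the helper name switch_to_new_window is made up for illustration.

```python
import time


def switch_to_new_window(driver, old_handles, timeout=10.0):
    """Wait for a window handle that is not in old_handles and switch to it."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_handles = [h for h in driver.window_handles if h not in old_handles]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            return new_handles[0]
        time.sleep(0.2)  # poll until the new tab shows up
    raise TimeoutError("no new window was opened")


def main():
    # Selenium is imported here so the helper above can be reused without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        # Assumption: placeholder URL for the authentication system.
        driver.get("https://auth.example.com/login")
        input("Log in in the browser window, then press Enter here...")

        old_handles = set(driver.window_handles)
        # The selector matches the <a rel="link-test001"> element from the question.
        driver.find_element(By.CSS_SELECTOR, 'a[rel="link-test001"]').click()
        switch_to_new_window(driver, old_handles)

        # The driver now points at the work order system tab.
        print(driver.current_url)
        html = driver.page_source
        print(len(html))
    finally:
        driver.quit()
```

Call main() to try it against a real browser. Because the same browser session is used for both pages, the work order system sees the login that was made on the authentication system.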
I suggest you read the book written by Chongshi (虫师, literally "insect master").
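For example, if you write the crawler with the requests library, create a session with session() first, use it to log in at A, and then request B, the page the link jumps to.
The created session object T stores the cookies set at login and sends them with every later request made through it, for as long as the object exists.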
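The session approach could look roughly like the sketch below. Everything specific in it is an assumption: the host name, the login URL, and the form field names are placeholders, and the real login form may need extra fields such as a CSRF token. The function name login_then_fetch is also made up for illustration.

```python
def login_then_fetch(session, login_url, form_data, target_url):
    """Log in at A with the session, then fetch B with the same session."""
    # A: the session stores whatever cookies the login response sets.
    login_response = session.post(login_url, data=form_data, timeout=30)
    login_response.raise_for_status()

    # B: the stored cookies are sent automatically with this request.
    page = session.get(target_url, timeout=30)
    page.raise_for_status()
    return page.text


def main():
    # requests is imported here so the helper above works with any session-like object.
    import requests

    with requests.Session() as session:
        html = login_then_fetch(
            session,
            "https://auth.example.com/login",          # assumption: placeholder for A
            {"username": "me", "password": "secret"},  # assumption: field names are guesses
            "https://auth.example.com/link-test001",   # B: href from the question, placeholder host
        )
        print(len(html))
```

Call main() to run it. requests follows HTTP redirects by default, so if /link-test001 redirects to the work order system, the returned text is that system's page. If the jump is done by JavaScript rather than an HTTP redirect, requests will not execute it, and Selenium is the safer choice.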