How can I start and pause a Python crawler script via the web?
阿神 2017-04-17 11:21:40

I wrote a crawler script in Python. How can I start and pause it through a web interface?


Replies (5)
洪涛

The poster’s question splits into two parts: first, find a suitable IPC (inter-process communication) mechanism to control the script; second, find a suitable web framework to build a usable web UI.

For the first part, there are many IPC options; the signal/env/subprocess approaches mentioned in the other replies are all good. I would also recommend sockets, since they are a very common mechanism. In terms of libraries, I recommend Twisted or ZeroMQ: the former offers a variety of IPC mechanisms but takes a while to learn, while the latter offers flexible socket-based messaging. A sketch of the socket approach follows.
For the second part, Django/bottle/web.py and the like — just grab any one and use it.
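
A minimal sketch of the socket approach, assuming pyzmq and bottle are installed; the port, the pause/resume command names, and crawl_one_page() are invented for the example:

```python
# --- crawler.py: the crawler polls a ZeroMQ REP socket for commands ---
import time
import zmq

def crawl_one_page():
    pass  # placeholder for one unit of real crawling work

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")

paused = False
while True:
    if sock.poll(timeout=0):        # non-blocking check for a command
        cmd = sock.recv_string()
        if cmd == "pause":
            paused = True
        elif cmd == "resume":
            paused = False
        sock.send_string("ok")
    if paused:
        time.sleep(0.5)             # idle while paused
    else:
        crawl_one_page()

# --- web.py: a bottle UI that forwards commands over a REQ socket ---
import zmq
from bottle import route, run

ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.connect("tcp://127.0.0.1:5555")

@route("/<cmd>")
def control(cmd):
    req.send_string(cmd)            # e.g. GET /pause or GET /resume
    return req.recv_string()

run(host="127.0.0.1", port=8080)
```

The REQ/REP pair keeps the exchange lockstep: every command gets an acknowledgement from the crawler before the next one can be sent.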

左手右手慢动作

1. You can send a signal (see the sketch after this list):
http://docs.python.org/2/library/sign...
2. Set an environment variable from Python, then check that flag inside the crawler.
3. Launch the script via subprocess, and kill the process to stop it.
4. Rewrite it with Scrapy, which seems to support control through the web:
http://doc.scrapy.org/en/latest/topic...
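
A minimal sketch of option 1, where SIGUSR1 toggles pause/resume; crawl_one_page() is a stand-in for the real crawling work:

```python
import signal
import time

paused = False

def toggle_pause(signum, frame):
    global paused
    paused = not paused

signal.signal(signal.SIGUSR1, toggle_pause)  # Unix-only signal

def crawl_one_page():
    pass  # placeholder for real crawling work

while True:
    if paused:
        time.sleep(1)    # idle until the next SIGUSR1 arrives
    else:
        crawl_one_page()
```

The web handler then only needs the crawler's PID to drive it: os.kill(pid, signal.SIGUSR1).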

Ty80

I once tried to write a framework for this; it essentially executes shell commands through Python's os.popen.
Run a small tornado API service on every machine whose scripts the web needs to control, fronted by a single web page. The API looks up a given script (essentially: take the script name as a parameter, have Python run the shell command ps aux | grep <script name>, read the result, and return it to the front page), extracts the corresponding process ID, and then takes that PID as a parameter to perform the kill. You can also pass a command parameter and let tornado execute it.

This is lazy, but not safe.
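
A minimal sketch of that tornado service; the /find and /kill routes and the port are invented here, and pgrep -f stands in for ps aux | grep:

```python
import subprocess

import tornado.ioloop
import tornado.web

class FindHandler(tornado.web.RequestHandler):
    def get(self):
        name = self.get_argument("name")
        # Equivalent of `ps aux | grep <name>`; pgrep prints matching PIDs.
        out = subprocess.run(["pgrep", "-f", name],
                             capture_output=True, text=True)
        self.write(out.stdout)

class KillHandler(tornado.web.RequestHandler):
    def get(self):
        pid = self.get_argument("pid")
        subprocess.run(["kill", pid])
        self.write("killed %s" % pid)

app = tornado.web.Application([
    (r"/find", FindHandler),
    (r"/kill", KillHandler),
])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()
```

Exposing kill over HTTP is exactly why the author calls this unsafe; at minimum, bind it to localhost or put authentication in front.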

Peter_Zhu

The laziest way is to write a CGI script whose only job is to execute a bash script. The CGI itself can be written in Perl, Python, or PHP.
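
A minimal sketch of the CGI approach in Python, using the standard-library cgi module; the /var/www/ctl.sh script path and the action parameter are invented for the example:

```python
#!/usr/bin/env python
import cgi
import subprocess

form = cgi.FieldStorage()
action = form.getfirst("action", "")  # e.g. ?action=start or ?action=stop

print("Content-Type: text/plain")
print()

if action in ("start", "stop"):
    # Delegate the real work to a shell script.
    out = subprocess.run(["/var/www/ctl.sh", action],
                         capture_output=True, text=True)
    print(out.stdout)
else:
    print("unknown action")
```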

左手右手慢动作

I recommend controlling it through a variable. The variable can live inside the crawler, or be a flag in a file or a database. The crawler runs round by round, and simply reads the variable at the start of each round.
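
A minimal sketch of the file-flag variant; the flag path and crawl_one_round() are invented for the example:

```python
import os
import time

PAUSE_FLAG = "/tmp/crawler.pause"  # the web UI creates/removes this file

def crawl_one_round():
    pass  # placeholder for one round of real crawling work

while True:
    if os.path.exists(PAUSE_FLAG):
        time.sleep(1)   # paused: wait and re-check the flag
        continue
    crawl_one_round()
```

The web side then pauses the crawler by creating the file and resumes it by deleting it, with no process plumbing at all.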
