The poster's question splits into two parts: first, finding a suitable IPC (inter-process communication) mechanism to control the script; second, finding a suitable web framework for a usable web UI.
For the first part, there are many IPC options, and the signal/environment-variable/subprocess approaches above are all good. I would also recommend sockets, since they are a very common approach. As for libraries, I recommend Twisted or ZeroMQ: the former offers a variety of IPC mechanisms but takes a while to learn; the latter provides socket-based messaging and is very flexible.
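To make the socket idea concrete, here is a minimal stdlib-only sketch (no Twisted or ZeroMQ): the crawler runs a tiny control channel in a background thread and flips a shared flag when the web backend sends it a stop command. The class and function names are illustrative, not from any library.

```python
import socket
import threading

class ControlServer(threading.Thread):
    """Tiny socket-based control channel for the crawler process.

    Flips self.running to False when it receives b"stop". A real setup
    might use ZeroMQ REQ/REP sockets instead; this is a stdlib sketch.
    """

    def __init__(self, host="127.0.0.1", port=0):  # port 0: let the OS pick
        super().__init__(daemon=True)
        self.running = True
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind((host, port))
        self.sock.listen(1)
        self.port = self.sock.getsockname()[1]

    def run(self):
        conn, _addr = self.sock.accept()
        try:
            if conn.recv(16).strip() == b"stop":
                self.running = False
        finally:
            conn.close()

def send_stop(port):
    """What the web UI's backend would do to stop the crawler."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(b"stop")
```

The crawler's main loop would then check `server.running` between rounds and exit cleanly when it goes False.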
For the second part: Django, Bottle, web.py, etc. Just grab any one and use it.
1. You can send a signal: http://docs.python.org/2/library/sign...
2. Set an environment variable from Python, then check that flag inside the crawler
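One caveat with the environment-variable approach: environment changes only propagate to child processes, so the flag has to be set when the controller launches the crawler, not afterwards. A sketch (the variable name `CRAWLER_ENABLED` is made up for illustration):

```python
import os
import subprocess
import sys

# Set the flag in the environment handed to the child process.
env = dict(os.environ, CRAWLER_ENABLED="1")  # CRAWLER_ENABLED: hypothetical name

# The child (standing in for the crawler) reads the flag from its environment.
child_code = "import os; print(os.environ.get('CRAWLER_ENABLED', '0'))"
result = subprocess.run([sys.executable, "-c", child_code],
                        env=env, capture_output=True, text=True)
print(result.stdout.strip())  # -> 1
```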
3. Launch the script through subprocess and kill the process when shutting down
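The subprocess approach in miniature; a sleep loop stands in for the real crawler script:

```python
import subprocess
import sys

# Start the "crawler" as a child process we keep a handle to.
crawler = subprocess.Popen([sys.executable, "-c",
                            "import time; time.sleep(600)"])
print("started pid", crawler.pid)

# Later, when the web UI requests shutdown:
crawler.terminate()       # sends SIGTERM; crawler.kill() sends SIGKILL
crawler.wait(timeout=10)  # reap the process so it doesn't linger as a zombie
```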
4. Rewrite it with Scrapy, which seems to support control through the web: http://doc.scrapy.org/en/latest/topic...
I once tried to write a framework for this; under the hood it just executed shell commands through Python's os.popen.
On each machine whose scripts need to be controlled from the web, run a Tornado API service, plus a front-end page. The API can query a given script (essentially: take the script name as a parameter, have Python run the shell command `ps aux | grep <script name>`, read the result, and return it to the front end), pick out the corresponding process ID, and then accept that process ID as a parameter to perform the kill. You can also pass a command parameter and let Tornado process it.
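The query-then-kill steps described above can be sketched without Tornado; these two helper functions (names are my own) are what the API handlers would call. Doing the `ps aux | grep` in Python avoids matching the grep process itself, but substring matching on the command line can still over-match, which is part of why this approach is not safe:

```python
import os
import signal
import subprocess

def find_pids(script_name):
    """The `ps aux | grep <script name>` step, done in Python.

    Returns PIDs of processes whose command line contains script_name.
    """
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    pids = []
    for line in out.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 10)      # 11th field is the full command
        if len(fields) == 11 and script_name in fields[10]:
            pids.append(int(fields[1]))    # 2nd field is the PID
    return pids

def kill_pids(pids):
    """The kill operation, given the PIDs found above."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
```

A Tornado handler would call `find_pids()` for the query endpoint and `kill_pids()` for the kill endpoint.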
I recommend controlling it through a variable. The variable can live inside the crawler, or it can be a flag in a file or a database. The crawler runs round by round and simply reads this variable each round.
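The file-flag variant of this is especially simple; the web UI creates or deletes the file, and the crawler checks it between rounds. The path here is hypothetical:

```python
import os
import time

FLAG_FILE = "/tmp/crawler.run"   # hypothetical path; could equally be a DB row

def should_continue(flag_file=FLAG_FILE):
    """Checked once per round; the web UI creates or deletes the file."""
    return os.path.exists(flag_file)

def main_loop():
    while should_continue():
        # crawl_one_round()  # the actual crawling work goes here
        time.sleep(1)        # pause between rounds
```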
The laziest way, though not a safe one, is to write a CGI script whose only job is to execute a bash script. The CGI can be written in Perl, Python, or PHP.
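A sketch of that lazy CGI approach in Python; the filename and the control-script path are made up, and as noted, exposing shell execution like this is unsafe without authentication and input validation:

```python
#!/usr/bin/env python3
# Hypothetical CGI script (e.g. cgi-bin/crawler_ctl.py). The web server
# executes it per request; it shells out and returns the command's output.
import subprocess

def run_control(command=("echo", "crawler started")):
    # In real use this would invoke your bash control script, e.g.
    # ("/opt/crawler/control.sh", "start") -- a made-up path.
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.stdout

if __name__ == "__main__":
    # A CGI response is just headers, a blank line, then the body on stdout.
    print("Content-Type: text/plain")
    print()
    print(run_control(), end="")
```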