What is a thread pool?
In object-oriented programming, creating and destroying objects can be expensive, because creating an object requires acquiring memory and possibly other resources. This is especially true in Java, where the virtual machine tracks every object so that it can be garbage-collected once it is no longer in use. One way to make a service program more efficient is therefore to create and destroy as few objects as possible, especially resource-intensive ones. The key question is how to serve requests with objects that already exist, and this is exactly why the various "pooled resource" techniques emerged.
As I understand it, a thread pool is a unit that holds a number of threads together with a corresponding task queue. Execution simply means using the limited set of threads in the pool to work through the tasks in the queue. The advantage is that you do not need to create a thread for every task: by the time you would create the 100th thread for the 100th task, perhaps 50 of the earlier threads have already finished their work. Threads are therefore reused to run tasks, which reduces the overhead on system resources.
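The reuse described above is easy to observe with the pool that ships in Python's standard library, `concurrent.futures.ThreadPoolExecutor`. The following is only a minimal sketch: the names `task` and `run_tasks`, the task count, and the pool size are all made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def task(n):
    """Pretend to work, then report which thread ran this task."""
    time.sleep(0.01)
    return n, threading.current_thread().name


def run_tasks(task_count=100, pool_size=10):
    """Run task_count tasks on a pool of at most pool_size threads."""
    # The pool starts at most pool_size threads and reuses them for every task.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(task, range(task_count)))
    thread_names = {name for _, name in results}
    return results, thread_names


if __name__ == "__main__":
    results, thread_names = run_tasks()
    print("tasks completed:", len(results))
    print("distinct threads used:", len(thread_names))
```

The number of distinct threads never exceeds the pool size, no matter how many tasks are submitted, which is the saving the paragraph above describes.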
A perhaps imperfect analogy: suppose 100 computer cases need to be moved from the first floor to the second. You do not need to call 100 people to help; ten or twenty are enough. Each person can be assigned five or ten machines, or whoever moves faster can simply move more, until everything has been moved. (This metaphor seems...)
Anyway, I now have a general understanding of what a thread pool is. So how do you implement one in Python?
The code is as follows:
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # ref_blog: http://www.open-open.com/home/space-5679-do-blog-id-3247.html
    import queue
    import threading
    import time


    class WorkManager(object):
        def __init__(self, work_num=1000, thread_num=2):
            self.work_queue = queue.Queue()
            self.threads = []
            self.__init_work_queue(work_num)
            self.__init_thread_pool(thread_num)

        def __init_thread_pool(self, thread_num):
            """Initialize the threads."""
            for i in range(thread_num):
                self.threads.append(Work(self.work_queue))

        def __init_work_queue(self, jobs_num):
            """Initialize the work queue."""
            for i in range(jobs_num):
                self.add_job(do_job, i)

        def add_job(self, func, *args):
            """Add one job to the queue."""
            # Enqueue the task; Queue implements its own synchronization.
            self.work_queue.put((func, list(args)))

        def check_queue(self):
            """Check how many tasks remain in the queue."""
            return self.work_queue.qsize()

        def wait_allcomplete(self):
            """Wait until all threads have finished."""
            for item in self.threads:
                if item.is_alive():
                    item.join()


    class Work(threading.Thread):
        def __init__(self, work_queue):
            threading.Thread.__init__(self)
            self.work_queue = work_queue
            self.start()

        def run(self):
            # Loop until the queue is empty, then let the thread exit.
            while True:
                try:
                    # Non-blocking dequeue; Queue implements its own synchronization.
                    do, args = self.work_queue.get(block=False)
                    do(args)
                    self.work_queue.task_done()  # mark the task as finished
                except queue.Empty:
                    break  # no tasks left
                except Exception as e:
                    print(str(e))
                    break


    # The actual job to run
    def do_job(args):
        print(args)
        time.sleep(0.1)  # simulate processing time
        print(threading.current_thread(), list(args))


    if __name__ == '__main__':
        start = time.time()
        work_manager = WorkManager(10, 2)  # or work_manager = WorkManager(10000, 20)
        work_manager.wait_allcomplete()
        end = time.time()
        print("cost all time: %s" % (end - start))
This code is clear and easy to understand.
There are only two classes in the entire code: WorkManager and Work. As its name suggests, the former is a manager that owns the thread pool and the task queue, while the latter is an individual worker thread.
The overall logic is this: WorkManager is given a number of tasks and a number of threads, and each thread then keeps taking tasks from the task queue and executing them until the queue is empty. The code also relies on Queue's internal synchronization mechanism (which I have not yet studied).
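As far as the synchronization goes, the standard library documents Queue as doing all of the locking itself, so several threads can call get() on the same queue without an explicit lock. Here is a small sketch that checks the practical consequence: every item is handed to exactly one thread. It is written for Python 3, where the module is named `queue`, and the names `drain` and `run` are invented for this example.

```python
import queue
import threading
from collections import Counter


def drain(work_queue, done, lock):
    """Take items until the queue is empty, recording who handled each one."""
    name = threading.current_thread().name
    while True:
        try:
            item = work_queue.get(block=False)  # thread-safe: Queue locks internally
        except queue.Empty:
            return
        with lock:  # the shared list is our own data, so we guard it ourselves
            done.append((item, name))
        work_queue.task_done()


def run(item_count=1000, thread_count=4):
    """Fill a queue, drain it with several threads, and return the record."""
    work_queue = queue.Queue()
    for i in range(item_count):
        work_queue.put(i)
    done, lock = [], threading.Lock()
    threads = [threading.Thread(target=drain, args=(work_queue, done, lock))
               for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    done = run()
    # No item is lost and none is processed twice.
    print("each item handled once:", sorted(i for i, _ in done) == list(range(1000)))
    # How the work happened to be split between threads varies from run to run.
    print(Counter(name for _, name in done))
```

The split between threads is not fixed in advance: a thread that finishes early just takes the next item, much like "whoever moves faster moves more".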
To sum up: for my original purpose this kind of thread pool is not really usable, because I need to start and stop threads from a web page, and this pool seems suited only to completing a batch of tasks concurrently. Still, although it does not help with controlling threads, it does a good job of running tasks concurrently, and it could be useful for crawling web pages.
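For the start/stop requirement, one common pattern (not part of the pool above) is to have the thread check a `threading.Event` flag between units of work. This is only a rough sketch under my own assumptions: the class name `StoppableWorker`, the `ticks` counter, and the idea that a page button calls `start()` or `stop()` are all hypothetical.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    """Hypothetical worker that keeps running until stop() is called."""

    def __init__(self, interval=0.01):
        super().__init__(daemon=True)
        self.interval = interval
        self.stop_requested = threading.Event()
        self.ticks = 0

    def run(self):
        # Check the flag between units of work so a stop request takes effect quickly.
        while not self.stop_requested.is_set():
            self.ticks += 1  # stand-in for one unit of real work
            # Sleep for the interval, but wake up early if stop is requested.
            self.stop_requested.wait(self.interval)

    def stop(self):
        """Ask the thread to finish and wait until it has exited."""
        self.stop_requested.set()
        self.join()


if __name__ == "__main__":
    worker = StoppableWorker()
    worker.start()   # e.g. triggered by a "start" button on the page
    time.sleep(0.1)
    worker.stop()    # e.g. triggered by a "stop" button on the page
    print("alive after stop:", worker.is_alive())
    print("units of work done:", worker.ticks)
```

A Thread object can only be started once, so "starting again" after a stop means creating a new worker instance.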