Problems with Python multi-threaded task distribution

WBOY
2016-12-01 00:56:56

I want to crawl a website's content using multiple threads. Assume the site has 105 pages, but because of machine limitations only ten threads can be started. How do I make the first thread responsible for crawling pages 1-10, the second thread pages 11-20, and so on, until the tenth thread is responsible for pages 91-105? How should this idea be written in Python code?

Reply content:

Python 3:


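import queue
import threading
import urllib.request


def download(q, lck):
    """
    Worker: keeps taking page numbers from the queue and exits once the queue is empty.
    """
    while True:
        try:
            # get_nowait() avoids blocking forever if another thread takes
            # the last item between an empty() check and a blocking get()
            pg = q.get_nowait()
        except queue.Empty:
            break

        # Write the code that fetches the page here,
        # then write the fetched content to a file

        with lck:  # keep output from different threads from interleaving
            print('Page %d done' % pg)
        q.task_done()


def main():
    """
    Main thread: fills the queue and starts the workers.
    """
    print('Starting download...')
    lck = threading.Lock()
    q = queue.Queue()
    for pg in range(1, 106):  # the site has 105 pages
        q.put(pg)

    for i in range(10):  # ten threads
        t = threading.Thread(target=download, args=(q, lck))
        t.start()
    q.join()  # wait for all tasks to finish
    print('Finished')

if __name__ == '__main__':
    main()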
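
The reply above hands out pages through a shared queue rather than giving each thread a fixed range. If the fixed ranges described in the question are what is wanted, one possible sketch is below. The names `split_pages`, `crawl_range`, and `run` are made up for illustration, and the actual fetching is left as a comment, as in the reply.

```python
import threading


def split_pages(total_pages, num_threads):
    """Return one inclusive (first, last) page range per thread.

    Each thread gets total_pages // num_threads pages; the last thread
    also takes the remainder, as in the 91-105 split from the question.
    """
    size = total_pages // num_threads
    ranges = []
    for i in range(num_threads):
        first = i * size + 1
        last = total_pages if i == num_threads - 1 else (i + 1) * size
        ranges.append((first, last))
    return ranges


def crawl_range(first, last, done, lock):
    """Hypothetical worker: handles pages first..last inclusive."""
    for pg in range(first, last + 1):
        # Fetching and saving page `pg` would go here.
        with lock:
            done.append(pg)  # record the page as finished


def run(total_pages=105, num_threads=10):
    """Start one thread per range, wait for all, return the finished pages."""
    done = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=crawl_range, args=(first, last, done, lock))
        for first, last in split_pages(total_pages, num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(done)


if __name__ == '__main__':
    print(split_pages(105, 10))
    print(len(run()))
```

With this split the tenth thread handles 15 pages while the others handle 10, so it is likely to finish last. The queue-based version avoids that imbalance because idle threads simply pick up whatever pages remain.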