Problems with Python multi-threaded task distribution
I want to crawl a website's content with multiple threads. Suppose the site has 105 pages, but due to machine limitations only ten threads can run at once. How do I make the first thread responsible for crawling pages 1-10, the second thread pages 11-20, and so on, until the tenth thread handles pages 91-105? How should this idea be written in Python?
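The partitioning described in the question can be sketched directly: divide the pages into contiguous blocks, give each thread one block, and let the last thread absorb the remainder. This is a minimal sketch; the helper names (`page_ranges`, `crawl_range`) are illustrative, and the actual fetching code is left as a placeholder.

```python
import threading

def page_ranges(total_pages, num_threads):
    """Split 1..total_pages into num_threads contiguous ranges;
    the last thread absorbs any remainder."""
    per = total_pages // num_threads
    ranges = []
    for i in range(num_threads):
        start = i * per + 1
        end = total_pages if i == num_threads - 1 else (i + 1) * per
        ranges.append((start, end))
    return ranges

def crawl_range(start, end):
    for pg in range(start, end + 1):
        # Fetch page pg and write its content to a file here (placeholder).
        pass

def main():
    threads = []
    for start, end in page_ranges(105, 10):
        t = threading.Thread(target=crawl_range, args=(start, end))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()
```

With 105 pages and 10 threads, `page_ranges` yields (1, 10), (11, 20), ..., (91, 105), matching the split in the question. The drawback of fixed ranges is poor load balancing: if one thread's pages are slow, the other threads finish and sit idle, which is why the queue-based answer below is usually preferred.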
python3
import queue
import threading

def download(q, lck):
    """
    Worker: exits when the queue has no tasks left.
    """
    while True:
        try:
            pg = q.get_nowait()  # non-blocking get avoids the empty()/get() race
        except queue.Empty:
            break
        # Fetch the page here (e.g. with urllib.request),
        # then write the content to a file.
        lck.acquire()
        print('Page %d done' % pg)
        lck.release()
        q.task_done()

def main():
    """
    Main thread.
    """
    print('Starting download...')
    lck = threading.Lock()
    q = queue.Queue()
    for pg in range(1, 106):  # the site has 105 pages
        q.put(pg)
    for i in range(10):  # ten threads
        t = threading.Thread(target=download, args=(q, lck))
        t.start()
    q.join()  # wait for all tasks to finish
    print('Finished')

if __name__ == '__main__':
    main()
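On Python 3.2+, the manual thread and queue bookkeeping above can also be replaced by the standard-library `concurrent.futures` pool, which distributes the 105 pages across 10 workers automatically. This is a sketch; `fetch_page` is a placeholder for the real download code.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(pg):
    # Fetch page pg and save its content here (placeholder);
    # returning the page number lets the caller see what completed.
    return pg

def main():
    # The pool hands out page numbers to whichever worker is free,
    # so no thread sits idle while others still have work.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch_page, range(1, 106)))
    print('Done:', len(results), 'pages')

if __name__ == '__main__':
    main()
```

Unlike fixed page ranges, the pool balances the load dynamically, and `pool.map` preserves the input order of the results.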