If the worker.daemon = True line in mutiple.py (line 54 of my file) is changed to worker.daemon = False, then after all the images have finished downloading the program just hangs and never exits.
$ python mutiple.py
一共下载了 253 张图片
Took 57.710124015808105s
...at this point it is completely stuck; the only way to stop it is kill -9.
Next I ran $ pstree -h | grep python, and clearly the main thread and its child threads have not exited. Why is that? queue.join() has already returned, and the print statements came out fine, so the child threads should have finished their work.
python(6591)-+-{python}(6596)
|-{python}(6597)
|-{python}(6598)
|-{python}(6599)
|-{python}(6600)
|-{python}(6601)
|-{python}(6602)
'-{python}(6603)
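A minimal sketch I put together to reproduce the same hang, independent of the download code (not part of the original scripts): a single non-daemon worker blocked in queue.get() keeps the interpreter alive even after the main thread has run to completion.

from Queue import Queue
from threading import Thread

def worker(q):
    while True:
        item = q.get()          # blocks forever once the queue is empty
        if item is None:
            break
        q.task_done()

q = Queue()
t = Thread(target=worker, args=(q,))
t.daemon = False                # with True instead, the process exits normally
t.start()

q.put('job')
q.join()                        # returns as soon as task_done() has been called
print 'main thread reached the end'   # this prints, but the process never exits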
The code of mutiple.py:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from Queue import Queue
from threading import Thread
from time import time
from itertools import chain

from download import setup_download_dir, get_links, download_link


class DownloadWorker(Thread):

    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            item = self.queue.get()
            if item is None:
                break
            directory, link = item
            download_link(directory, link)
            self.queue.task_done()


def main():
    ts = time()
    url1 = 'http://www.toutiao.com/a6333981316853907714'
    url2 = 'http://www.toutiao.com/a6334459308533350658'
    url3 = 'http://www.toutiao.com/a6313664289211924737'
    url4 = 'http://www.toutiao.com/a6334337170774458625'
    url5 = 'http://www.toutiao.com/a6334486705982996738'
    download_dir = setup_download_dir('thread_imgs')
    # Create a queue to communicate with the worker threads
    queue = Queue()
    links = list(chain(
        get_links(url1),
        get_links(url2),
        get_links(url3),
        get_links(url4),
        get_links(url5),
    ))
    # Create 8 worker threads
    for x in range(8):
        worker = DownloadWorker(queue)
        # Setting daemon to True will let the main thread exit even though the
        # workers are blocking
        worker.daemon = True
        worker.start()
    # Put the tasks into the queue as a tuple
    for link in links:
        queue.put((download_dir, link))
    # Causes the main thread to wait for the queue to finish processing all
    # the tasks
    queue.join()
    print u'一共下载了 {} 张图片'.format(len(links))
    print u'Took {}s'.format(time() - ts)


if __name__ == '__main__':
    main()

"""
一共下载了 253 张图片
Took 57.710124015808105s
"""
The code of download.py:
#!/usr/bin/env python
import os

import requests
from pathlib import Path
from bs4 import BeautifulSoup


def get_links(url):
    '''
    return the links in a list
    '''
    req = requests.get(url)
    soup = BeautifulSoup(req.text, "html.parser")
    return [img.attrs.get('src') for img in
            soup.find_all('p', class_='img-wrap')
            if img.attrs.get('src') is not None]


def download_link(directory, link):
    '''
    download the img by the link and save it
    '''
    img_name = '{}.jpg'.format(os.path.basename(link))
    download_path = directory / img_name
    r = requests.get(link)
    with download_path.open('wb') as fd:
        fd.write(r.content)


def setup_download_dir(directory):
    '''
    set the dir and create a new dir if not exists
    '''
    download_dir = Path(directory)
    if not download_dir.exists():
        download_dir.mkdir()
    return download_dir
While the program runs there is one main thread; if the main thread creates a child thread, the two split up and run side by side. When the main thread finishes and wants to exit, it checks whether its child threads are done. If a child thread is not done, the main thread waits for it to finish before exiting. Sometimes, though, what we want is for the process to exit as soon as the main thread is done, whether or not the child threads have finished, and that is what setDaemon(True) is for.
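A toy example of that flag (my own sketch, not part of the original scripts): with daemon = True the interpreter exits as soon as the main thread ends; with the default False it waits for child() to return.

from threading import Thread
from time import sleep

def child():
    sleep(60)                   # stands in for a long-running child thread

t = Thread(target=child)
t.daemon = True                 # True: process exits right after the print below
                                # False (default): process lingers ~60s for child()
t.start()
print 'main thread finished'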
My understanding is:
setDaemon(True) marks a thread as a daemon thread, i.e. once it is set to True, the child thread is forced to exit as soon as the main thread finishes.
queue.join() makes the main thread block until every task put on the queue has been marked done before it continues.
The worker threads have no way to exit: run() only breaks when it gets a None item, and nothing ever puts a None on the queue.
Putting these three points together: with setDaemon(False) the main thread waits for the child threads to exit, but the child threads stay blocked forever in queue.get(), so the whole thing hangs.
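So if I keep daemon = False, I suppose the workers need an explicit shutdown signal. Since run() already breaks on a None item, a sketch along these lines (the threads list, the dummy jobs and the worker count are made up for illustration) should let every thread return so the process can exit:

from Queue import Queue
from threading import Thread

def worker(queue):
    while True:
        item = queue.get()
        if item is None:        # sentinel: leave the loop so the thread can end
            break
        # ... the real work (e.g. download_link) would go here ...
        queue.task_done()

queue = Queue()
threads = [Thread(target=worker, args=(queue,)) for _ in range(8)]
for t in threads:
    t.start()                   # no daemon flag needed with a clean shutdown

for job in range(100):
    queue.put(job)
queue.join()                    # blocks until task_done() was called 100 times

for _ in threads:
    queue.put(None)             # one sentinel per worker thread
for t in threads:
    t.join()                    # every run() has returned; the process exits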