java - Python's httplib module starts raising out-of-memory errors once the process reaches 1.9 GB of memory
大家讲道理
大家讲道理 2017-04-18 10:03:15

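As the title says: I'm writing a crawler that needs to scrape a large amount of data from a website. First, ten threads use the httplib module to collect a little over a million ids from a list page and put them into a queue. Then ten more threads use httplib to take ids off that queue, fetch the body text for each id from a view page, and write it to a MongoDB database. However, I've found that as soon as the Python process's memory usage reaches 1.9 GB, httplib starts raising out-of-memory errors. What causes this? (My initial guess is that the queue holds several million ids, which uses too much memory, so httplib can no longer allocate any.) Is there a way to fix it? (Requests to the list page must carry a cookie, which is why I'm using httplib; if another module can send requests with cookies, please let me know.) Any guidance would be much appreciated.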


reply all(1)
小葫芦

Your Python is a 32-bit process. A 32-bit process has a 4 GB address space, of which only 2 GB is available to user code (the default split on Windows); the other 2 GB is reserved for the kernel. That is why the errors start just below 2 GB. Switching to 64-bit Python removes that ceiling. The better fix, though, is to limit the number of processes you start and keep the number of threads in each process modest. Performance is usually best when the total number of threads is about twice the number of CPU cores; more is not always better.
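To confirm whether the interpreter really is 32-bit, a quick check along these lines should work. It is only a sketch: the helper names `pointer_bits` and `is_64bit` are made up for illustration, while `struct.calcsize("P")` and `sys.maxsize` are standard library features.

```python
import struct
import sys


def pointer_bits():
    """Return the pointer size of the running interpreter in bits."""
    # "P" is the struct format code for a C void pointer.
    return struct.calcsize("P") * 8


def is_64bit():
    """Return True if the interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("Pointer size: %d-bit" % pointer_bits())
    print("64-bit interpreter: %s" % is_64bit())
```

If this reports 32-bit, the process will run out of address space near 2 GB no matter how much physical RAM the machine has.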
In addition, keep memory usage down: do not read everything into memory at once, reuse large objects where you can, and avoid making unnecessary copies.
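One way to apply this advice to the crawler in the question is to bound the queue, so the producer blocks instead of piling up millions of ids. The following is a minimal sketch under assumptions, not the asker's actual code: it targets Python 3 (where the `Queue` module is named `queue`), and `run_pipeline`, `id_source`, and `handle` are hypothetical names standing in for the list-page reader and the view-page fetcher.

```python
import logging
import os
import queue
import threading

# Roughly twice the CPU core count, as suggested in the answer.
NUM_WORKERS = 2 * (os.cpu_count() or 1)
_SENTINEL = None  # marker that tells a worker to stop


def consume_ids(q, handle):
    """Take ids off the queue and process them until a sentinel arrives."""
    while True:
        item_id = q.get()
        if item_id is _SENTINEL:
            break
        try:
            handle(item_id)
        except Exception:
            # Keep the worker alive so the producer never blocks forever.
            logging.exception("failed to handle id %r", item_id)


def run_pipeline(id_source, handle, num_workers=NUM_WORKERS, max_pending=1000):
    """Feed ids from id_source to worker threads through a bounded queue."""
    # maxsize caps how many ids are held in memory at any moment.
    q = queue.Queue(maxsize=max_pending)
    workers = [
        threading.Thread(target=consume_ids, args=(q, handle))
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()
    for item_id in id_source:
        q.put(item_id)  # blocks while the queue is full
    for _ in workers:
        q.put(_SENTINEL)  # one stop marker per worker
    for worker in workers:
        worker.join()
```

In a real crawler, `id_source` would presumably be a generator that pages through the list page and yields ids one at a time, and `handle` would fetch the view page and write to MongoDB. With `max_pending=1000`, at most about a thousand ids are queued at once, regardless of how many the site has in total.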
