A novice asking for advice: how do I loop over a dict and write it to a CSV file in Python 3? (Problem encountered while crawling)
我想大声告诉你 2017-05-18 10:49:20

After my crawler generates the dict, I want to write it to a CSV file, but an error occurs.
I'm using Jupyter Notebook on Windows.

The specific code is as follows

import requests
from multiprocessing.dummy import Pool as ThreadPool
from lxml import etree
import sys
import time
import random
import csv

def spider(url):
    header = {
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    timeout = random.choice(range(31, 50))
    html = requests.get(url, header, timeout=timeout)
    time.sleep(random.choice(range(8, 16)))
    selector = etree.HTML(html.text)
    content_field = selector.xpath('//*[@class="inner"]/p[3]/p[2]/ul/li')
    item = {}
    for each in content_field:
        g = each.xpath('a/p[1]/p[1]/h3/span/text()')
        go = each.xpath('a/p[1]/p[2]/p/h3/text()')
        h = each.xpath('a/p[1]/p[2]/p/p/text()[1]')
        j = each.xpath('a/p[1]/p[1]/p/text()[2]')
        ge = each.xpath('a/p[1]/p[2]/p/p/text()[3]')
        x = each.xpath('a/p[1]/p[1]/p/text()[3]')
        city = each.xpath('a/p[1]/p[1]/p/text()[1]')
        gg = each.xpath('a/p[2]/span/text()')
        item['city'] = "".join(city)
        item['hangye'] = "".join(hangye)
        item['guimo'] = "".join(guimo)
        item['gongsi'] = "".join(gongsi)
        item['gongzi'] = "".join(gongzi)
        item['jingyan'] = "".join(jingyan)
        item['xueli'] = "".join(xueli)
        item['gongzuoneirong'] = "".join(gongzuoneirong)
        fieldnames = ['city', 'hangye', 'guimo', 'gongsi', 'gongzi', 'jingyan', 'xueli', 'gongzuoneirong']
        with open('bj.csv', 'a', newline='', errors='ignore') as f:
            f_csv = csv.DictWriter(f, fieldnames=fieldnames)
            f_csv.writeheader()
            f_csv.writerow(item)

if __name__ == '__main__':
    pool = ThreadPool(4)
    f = open('bj.csv', 'w')
    page = []
    for i in range(1, 100):
        newpage = 'https://www.zhipin.com/c101010100/h_101010100/?query=%E6%95%B0%E6%8D%AE%E8%BF%90%E8%90%A5&page=' + str(i) + '&ka=page-' + str(i)
        page.append(newpage)
    results = pool.map(spider, page)
    pool.close()
    pool.join()
    f.close()

Running the above code produces this error message:

ValueError: too many values to unpack (expected 2)
From what I found by searching, the cause has to do with traversing the dict, which requires the dict.items() form. But I haven't figured out how to apply that in the code above. Please give me some advice.
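For reference, a minimal illustration of traversing a dict with items(), which yields (key, value) pairs; the sample data here is made up:

item = {'city': 'Beijing', 'gongzi': '15k'}  # made-up sample data
for key, value in item.items():              # items() yields (key, value) tuples
    print(key, value)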


All replies (3)
習慣沉默

Sorry, I only now have time to answer your question. I saw that you changed the code according to my earlier suggestion. I'm posting the revised code below; I have run it and it works without problems.

import requests
from multiprocessing.dummy import Pool
from lxml import etree
import time
import random
import csv

def spider(url):
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
    }
    timeout = random.choice(range(31, 50))
    html = requests.get(url, headers=header, timeout=timeout)
    time.sleep(random.choice(range(8, 16)))
    selector = etree.HTML(html.text)
    content_field = selector.xpath('//*[@class="inner"]/p[3]/p[2]/ul/li')
    item = {}
    for each in content_field:
        g = each.xpath('a/p[1]/p[1]/h3/span/text()')
        go = each.xpath('a/p[1]/p[2]/p/h3/text()')
        h = each.xpath('a/p[1]/p[2]/p/p/text()[1]')
        j = each.xpath('a/p[1]/p[1]/p/text()[2]')
        ge = each.xpath('a/p[1]/p[2]/p/p/text()[3]')
        x = each.xpath('a/p[1]/p[1]/p/text()[3]')
        city = each.xpath('a/p[1]/p[1]/p/text()[1]')
        gg = each.xpath('a/p[2]/span/text()')
        item['city'] = "".join(city)
        item['hangye'] = "".join(g)
        item['guimo'] = "".join(go)
        item['gongsi'] = "".join(h)
        item['gongzi'] = "".join(j)
        item['jingyan'] = "".join(ge)
        item['xueli'] = "".join(x)
        item['gongzuoneirong'] = "".join(gg)
        fieldnames = ['city', 'hangye', 'guimo', 'gongsi', 'gongzi', 'jingyan', 'xueli', 'gongzuoneirong']
        with open('bj.csv', 'a', newline='', errors='ignore') as f:
            f_csv = csv.DictWriter(f, fieldnames=fieldnames)
            f_csv.writeheader()
            f_csv.writerow(item)

if __name__ == '__main__':
    f = open('bj.csv', 'w')
    page = []
    for i in range(1, 100):
        newpage = 'https://www.zhipin.com/c101010100/h_101010100/?query=%E6%95%B0%E6%8D%AE%E8%BF%90%E8%90%A5&page=' + str(i) + '&ka=page-' + str(i)
        page.append(newpage)
    print(page)
    pool = Pool(4)
    results = pool.map(spider, page)
    pool.close()
    pool.join()
    f.close()

The main issue here is header: yours was a set type, and after my change it is a dict type.
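For context, a rough sketch of why a set there leads to that exact error: the set's lone string ends up being unpacked as if it were a (key, value) pair. This is a simplified reproduction, not requests' actual internals:

header = {'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}  # a set with one string, no 'User-Agent' key
for pair in header:
    key, value = pair  # tries to unpack one long string into two names
# ValueError: too many values to unpack (expected 2)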

I also have some advice for you:

  1. Do you run your code in an IDE or a plain text editor? An IDE would flag many problems like this right away.

  2. I recommend that beginners follow the PEP 8 style guide from the very start, so you don't pick up bad habits. Take a look at your variable naming.
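One more note on the revised code above: writeheader() runs on every spider() call, so the header line gets appended to bj.csv repeatedly. A minimal sketch of writing it once up front (assuming the same bj.csv file and fieldnames, with a hypothetical write_row() helper for spider() to call):

import csv

fieldnames = ['city', 'hangye', 'guimo', 'gongsi',
              'gongzi', 'jingyan', 'xueli', 'gongzuoneirong']

# write the header a single time, before any threads start
with open('bj.csv', 'w', newline='', errors='ignore') as f:
    csv.DictWriter(f, fieldnames=fieldnames).writeheader()

def write_row(item):
    # hypothetical helper: appends one data row, no header
    with open('bj.csv', 'a', newline='', errors='ignore') as f:
        csv.DictWriter(f, fieldnames=fieldnames).writerow(item)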

    过去多啦不再A梦
    from csv import DictWriter

    item = {'a': 1, 'b': 2}
    fieldnames = ['a', 'b']
    with open('test.csv', 'a') as f:
        f_csv = DictWriter(f, fieldnames=fieldnames)
        f_csv.writeheader()
        f_csv.writerow(item)

    I didn't get an error when I wrote it like this.

    writerow() just takes the dict directly. I think your problem is that the keys of item don't match the header fields (fieldnames) of your table.

      漂亮男人

      Because some of the column names specified in fieldnames don't exist as keys in item.
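      For reference, a quick self-contained sketch of how csv.DictWriter reacts to key/fieldname mismatches, writing to an in-memory buffer:

      import csv
      import io

      buf = io.StringIO()
      w = csv.DictWriter(buf, fieldnames=['a', 'b'])
      w.writeheader()
      w.writerow({'a': 1})                  # a missing key is filled with restval ('' by default)
      w.writerow({'a': 1, 'b': 2, 'c': 3})  # an extra key raises ValueError
                                            # unless extrasaction='ignore' is used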
