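First of all, there is absolutely no need to use the csv module for this requirement. By default, csv separates columns with a half-width (ASCII) comma, but if the content of a single column contains such a comma, reading the file in Excel gets awkward. I suggest using TAB as the separator (delimiter) and writing the file directly with with open(...) as fh.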
In addition, there are two small problems with your code:
1. The get_data function only needs to be called once; there is no need to call it twice.
2. There is an extra slash (/) in the url.
# -*- coding:utf-8 -*-
import requests
from bs4 import BeautifulSoup

user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
URL = 'http://finance.qq.com'


def get_data(url):
    # Fetch the page and return every <a> tag inside the listZone element.
    response = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(response.text, 'lxml')
    return soup.find('p', {'id': 'listZone'}).findAll('a')


def main():
    # Write one tab-separated line per link: absolute URL, then link text.
    with open("hello.tsv", "w") as fh:
        fh.write("url\ttitle\n")
        # get_data is called once; URL has no trailing slash, so no double slash.
        for item in get_data(URL + "/gdyw.htm"):
            fh.write("{}\t{}\n".format(URL + item.get("href"), item.get_text()))


if __name__ == "__main__":
    main()
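
To illustrate the comma problem mentioned above, here is a minimal sketch. The row below is made-up sample data (the URL and title are hypothetical, not taken from the real page); it only shows that a naively comma-joined line gains an extra column when the title contains a comma, while a tab-joined line does not.

```python
# Hypothetical row: the title itself contains a half-width comma.
row = ["http://finance.qq.com/a.htm", "Stocks rise, bonds fall"]

comma_line = ",".join(row)  # naive comma-separated line, no quoting
tab_line = "\t".join(row)   # tab-separated line

# Splitting the comma-joined line yields three fields instead of two.
comma_fields = comma_line.split(",")
tab_fields = tab_line.split("\t")

print(len(comma_fields))
print(len(tab_fields))
```

This only holds as long as the titles themselves contain no tab characters, which is usually a safe assumption for link text.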
This result happens because you wrote all of csvrow1 first and then all of csvrow2. You should traverse csvrow1 and csvrow2 at the same time, so that each output row pairs one item from each list.
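
The code from the question is not shown here, so the following is only a sketch of simultaneous traversal, assuming csvrow1 and csvrow2 are two lists of equal length. The sample values and the write_pairs helper are hypothetical, and an in-memory buffer stands in for the real output file.

```python
import io

# Hypothetical sample data; in the question these would be the two scraped lists.
csvrow1 = ["title A", "title B", "title C"]
csvrow2 = ["/a.htm", "/b.htm", "/c.htm"]


def write_pairs(fh, first, second):
    """Write one tab-separated line per (first, second) pair."""
    for left, right in zip(first, second):
        fh.write("{}\t{}\n".format(left, right))


buffer = io.StringIO()  # in-memory stand-in for an open file
write_pairs(buffer, csvrow1, csvrow2)
output = buffer.getvalue()
print(output)
```

Note that zip stops at the shorter list, so if the two lists can differ in length the unmatched trailing items are silently dropped.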