
How to get web content in python

(*-*)浩 (original)
2019-06-28 11:36:25

Python is well suited to data processing, and it is a good choice for writing a web crawler. Many ready-made packages exist, so complex tasks can often be completed just by calling them.


1 Python gets the content of a web page (that is, its source code)

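import urllib2  # Python 2 only; in Python 3 use urllib.request

url = 'http://movie.douban.com/top250?format=text'
page = urllib2.urlopen(url)
contents = page.read()
# contents now holds the entire page, i.e. its source code
print(contents)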

Here url is the address of the page and contents is the source code returned for that address. urllib2 is the package that does the work; it exists only in Python 2 (in Python 3, urlopen lives in urllib.request). These few lines are enough to get the entire source code of a web page.
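
For readers on Python 3, where urllib2 no longer exists, the same fetch can be sketched with urllib.request. The function name fetch_page, the utf-8 default, and the 10-second timeout are assumptions made for illustration; they are not part of the original code.

```python
from urllib.request import urlopen


def fetch_page(url, encoding="utf-8", timeout=10):
    """Return the source of the page at `url` as text.

    `encoding` and `timeout` are illustrative defaults, not values
    taken from the article.
    """
    with urlopen(url, timeout=timeout) as page:
        # read() returns bytes in Python 3, so decode them explicitly
        return page.read().decode(encoding)


# A data: URL lets the function be tried without network access
print(fetch_page("data:text/plain,hello%20world"))
```

To fetch a real page, pass its address instead, for example fetch_page('http://movie.douban.com/top250?format=text').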

2 Obtain the desired content from the web page (first get the page source, then analyze it to find the relevant tags, and finally extract the content inside those tags)
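
The "find the tag, then extract its content" step can be sketched on a small HTML string before touching a real page. This sketch deliberately uses only the standard library's html.parser instead of BeautifulSoup, which the script in this article relies on and which is far more convenient in practice. The class SpanTextExtractor and the sample fragment are hypothetical; the fragment only imitates the class names that the script looks for.

```python
from html.parser import HTMLParser


class SpanTextExtractor(HTMLParser):
    """Collect the text of every <span> whose class attribute equals class_name.

    Simplified on purpose: exact class match only, nested spans not handled.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.inside = False  # True while inside a matching <span>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("class") == self.class_name:
            self.inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


# Hypothetical fragment shaped like the markup the script expects
sample = (
    '<div class="info">'
    '<span class="title">Movie A</span>'
    '<span class="rating_num">9.1</span>'
    '</div>'
)

parser = SpanTextExtractor("title")
parser.feed(sample)
titles = parser.texts
print(titles)
```

Creating the parser with "rating_num" instead of "title" would collect the rating text in the same way.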

Take the Douban movie ranking as an example.

The goal is to get the name, rating, number of reviews, and link of every movie on the current page.

# coding: utf-8
'''
@author: jsjxy
'''
import urllib2  # Python 2 only; in Python 3 use urllib.request
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://movie.douban.com/top250?format=text')
contents = page.read()
soup = BeautifulSoup(contents, "html.parser")
print("Douban Movie Top 250" + "\n" + " Title              Rating     Reviews     Link ")
# Each movie on the page sits in its own <div class="info"> block
for tag in soup.find_all('div', class_='info'):
    m_name = tag.find('span', class_='title').get_text()
    m_rating_score = float(tag.find('span', class_='rating_num').get_text())
    m_people = tag.find('div', class_="star")
    m_span = m_people.find_all('span')
    m_peoplecount = m_span[3].contents[0]  # the fourth <span> holds the review count
    m_url = tag.find('a').get('href')
    print(m_name + "        " + str(m_rating_score) + "           " + m_peoplecount + "    " + m_url)

The script prints its results to the console; you can also write them to a file.
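
One possible way to write the results to a file is the standard csv module. This is a minimal sketch in Python 3: the function save_movies, the column names, and the sample row are hypothetical, and the temporary directory is used only so that the demonstration leaves no files behind.

```python
import csv
import os
import tempfile


def save_movies(rows, path):
    """Write (name, rating, review count, link) rows to a UTF-8 CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "rating", "reviews", "link"])  # header row
        writer.writerows(rows)


# Hypothetical row standing in for the values collected in the loop
rows = [("Movie A", 9.1, "1000", "http://example.com/a")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movies.csv")
    save_movies(rows, path)
    # Read the file back to check what was stored
    with open(path, newline="", encoding="utf-8") as f:
        saved = list(csv.reader(f))

print(saved)
```

In the script, each pass through the loop would append a tuple such as (m_name, m_rating_score, m_peoplecount, m_url) to a list, and that list would be passed to save_movies once the loop ends.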

