Use a Python web crawler to see what movies are currently playing in theaters-Python Tutorial-php.cn


Released: 2023-07-25 17:21:57



# /1 Foreword/

Maoyan Movies is a movie platform, built jointly with Taobao, that offers one of the most complete ranges of movie categories and tells users as early as possible when the latest movies will be released. Today I will show you how to get the details of upcoming movies from Maoyan Movies.


# /2 Project Goal/

Get details of upcoming movies from Maoyan Movies.

# /3 Project Preparation/

Software: PyCharm

Required libraries: requests, lxml (third-party), random, time (standard library)

Plug-in: XPath

The website URL is as follows:

```
https://maoyan.com/films?showType=2&offset={}
```
Click the next-page button and observe how the URL changes:

```
https://maoyan.com/films?showType=2&offset=30
https://maoyan.com/films?showType=2&offset=60
https://maoyan.com/films?showType=2&offset=90
```
Each time you move to the next page, the `offset` parameter increases by 30. You can therefore replace the changing value with `{}` in the URL and use a `for` loop to fill in each offset, sending one request per page.
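
A minimal sketch of that loop, assuming page 1 corresponds to `offset=0` and that each page holds 30 movies as observed above. The helper name `build_page_urls` is hypothetical and not part of the original code.

```python
BASE_URL = "https://maoyan.com/films?showType=2&offset={}"
PAGE_SIZE = 30  # offset step observed when clicking "next page"


def build_page_urls(start_page, end_page):
    """Return listing URLs for pages start_page..end_page (1-based, inclusive).

    Hypothetical helper: assumes page 1 uses offset=0, page 2 offset=30, and so on.
    """
    urls = []
    for page in range(start_page, end_page + 1):
        offset = (page - 1) * PAGE_SIZE
        urls.append(BASE_URL.format(offset))
    return urls


if __name__ == "__main__":
    for url in build_page_urls(1, 3):
        print(url)
```

Under this assumption, pages 2 to 4 map to offsets 30, 60 and 90, which are the URLs seen when clicking through the listing.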
# /4 Project Implementation/
1. Define a class that inherits from `object`, with an `__init__` method and a `main` method (both taking `self`). Import the required libraries and set the base URL. The code is as follows:
```python
import requests
from lxml import etree
import time
import random


class MaoyanSpider(object):
    def __init__(self):
        # Listing URL; {} is filled in with the page offset (0, 30, 60, ...)
        self.url = "https://maoyan.com/films?showType=2&offset={}"

    def main(self):
        pass


if __name__ == '__main__':
    spider = MaoyanSpider()
    spider.main()
```

2. Generate a random User-Agent for each request.
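```python
# Assumption: ua is a fake_useragent.UserAgent() instance, e.g.
#   from fake_useragent import UserAgent
#   ua = UserAgent()
for i in range(1, 50):
    # ua.random must be read here, inside the loop, so that
    # every request gets a randomly chosen User-Agent
    self.headers = {
        'User-Agent': ua.random,
    }
```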
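
If you would rather not depend on a third-party package for `ua.random`, a standard-library sketch can pick a User-Agent with `random.choice()` from a small pool. The `USER_AGENTS` strings and the `random_headers` name are illustrative assumptions, not values from the original article.

```python
import random

# Illustrative pool of browser User-Agent strings (assumed values;
# replace them with whatever strings you want to rotate through)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
]


def random_headers():
    """Build a fresh headers dict with a randomly chosen User-Agent.

    Call it once per request so that each request can use a different string.
    """
    return {"User-Agent": random.choice(USER_AGENTS)}


if __name__ == "__main__":
    for _ in range(3):
        print(random_headers())
```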

3. Send the request and get the page response.
```python
def get_page(self, url):
    # self.headers carries the User-Agent chosen at random for this request
    res = requests.get(url, headers=self.headers)
    res.encoding = 'utf-8'
    html = res.text
    # Return the HTML so that the caller can hand it to parse_page()
    return html
```
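
A slightly more defensive variant of `get_page`, sketched as a standalone function: it adds a request timeout and raises on HTTP error codes instead of silently parsing an error page. The name `fetch_html`, the 10-second default and the optional `session` parameter are assumptions, not part of the original article.

```python
import requests


def fetch_html(url, headers, session=None, timeout=10):
    """Download url and return the response body decoded as UTF-8.

    Hypothetical helper: the timeout default and the optional session are
    assumptions, not part of the original article.
    """
    # Fall back to the module-level requests.get() when no Session is supplied;
    # passing a requests.Session lets several pages reuse one connection pool
    http = session if session is not None else requests
    res = http.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    res.raise_for_status()
    # Decode the body as UTF-8, as the original get_page does
    res.encoding = "utf-8"
    return res.text
```

For example, `with requests.Session() as s: html = fetch_html(url, headers, session=s)` reuses one session for every page fetched inside the `with` block.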

4. Parse the first-level (listing) page with XPath to extract the page information.
1) Build the base list of XPath node objects.
```python
# Create the parse object
parse_html = etree.HTML(html)
# Base list of XPath node objects: every <dd> inside the movie list
dd_list = parse_html.xpath('//dl[@class="movie-list"]//dd')
```

2) Iterate over the node objects one by one and extract the data.
```python
for dd in dd_list:
    # Movie title
    name = dd.xpath('.//div[@class="movie-hover-title"]//span[@class="name noscore"]/text()')[0].strip()
    # Cast: text of the 3rd movie-hover-title div
    star = dd.xpath('.//div[@class="movie-hover-info"]//div[@class="movie-hover-title"][3]/text()')[1].strip()
    # Genre: text of the 2nd movie-hover-title div
    type = dd.xpath('.//div[@class="movie-hover-info"]//div[@class="movie-hover-title"][2]/text()')[1].strip()
    # Relative link to the movie's detail page
    dowld = dd.xpath('.//div[@class="movie-item-hover"]/a/@href')[0].strip()
    # The movie string is built and printed in step 5
```
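
The live Maoyan markup cannot be checked here, so the sketch below runs the same XPath expressions against a small, made-up HTML snippet that only imitates the class names used above. Indexing with `[0]` or `[1]` can raise `IndexError` when a movie lacks a field (for example, no cast listed yet) and depends on which whitespace text nodes the parser keeps; the hypothetical helper `joined_text` instead joins every matched string and strips it, returning an empty string when nothing matches. `SAMPLE_HTML` and `parse_movies` are illustrative names.

```python
from lxml import etree

# Made-up sample markup that only imitates the class names used in the
# article's XPath expressions; the real Maoyan page may differ.
SAMPLE_HTML = """
<dl class="movie-list">
  <dd>
    <div class="movie-item-hover">
      <a href="/films/1001">Movie A</a>
      <div class="movie-hover-info">
        <div class="movie-hover-title"><span class="name noscore">Movie A</span></div>
        <div class="movie-hover-title">
          <span class="hover-tag">Type:</span> Drama
        </div>
        <div class="movie-hover-title">
          <span class="hover-tag">Starring:</span> Actor One
        </div>
      </div>
    </div>
  </dd>
  <dd>
    <div class="movie-item-hover">
      <a href="/films/1002">Movie B</a>
      <div class="movie-hover-info">
        <div class="movie-hover-title"><span class="name noscore">Movie B</span></div>
        <div class="movie-hover-title">
          <span class="hover-tag">Type:</span> Comedy
        </div>
      </div>
    </div>
  </dd>
</dl>
"""


def joined_text(node, xpath):
    """Join all strings matched by xpath and strip them ('' if nothing matches)."""
    return "".join(node.xpath(xpath)).strip()


def parse_movies(html):
    """Return one dict per <dd> movie node, using the article's XPath expressions."""
    parse_html = etree.HTML(html)
    movies = []
    for dd in parse_html.xpath('//dl[@class="movie-list"]//dd'):
        movies.append({
            "name": joined_text(dd, './/div[@class="movie-hover-title"]//span[@class="name noscore"]/text()'),
            "type": joined_text(dd, './/div[@class="movie-hover-info"]//div[@class="movie-hover-title"][2]/text()'),
            "star": joined_text(dd, './/div[@class="movie-hover-info"]//div[@class="movie-hover-title"][3]/text()'),
            "link": joined_text(dd, './/div[@class="movie-item-hover"]/a/@href'),
        })
    return movies


if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie)
```

Note that `/text()` returns only the div's own text nodes, so the label inside `<span class="hover-tag">` is not included in the extracted value.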

5. Inside the loop, define `movie` to hold the formatted record, then print it.
```python
movie = '''[Coming soon]
Movie name: %s
Starring: %s
Type: %s
Details link: https://maoyan.com%s
=========================================================
''' % (name, star, type, dowld)
print(movie)
```
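
This step only prints each record. If you also want to keep the data, one option (an assumption, not the article's method) is the standard `csv` module. The sketch assumes each movie is a dict with `name`, `star`, `type` and `link` keys; `save_movies` is a hypothetical name.

```python
import csv

FIELDS = ["name", "star", "type", "link"]
DETAIL_PREFIX = "https://maoyan.com"


def save_movies(movies, file):
    """Write movie dicts (keys: name, star, type, link) to an open text file as CSV.

    Relative detail links such as "/films/1001" get the https://maoyan.com
    prefix, matching the detail-link format printed in step 5.
    """
    # extrasaction="ignore" skips any extra keys a record might carry
    writer = csv.DictWriter(file, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for movie in movies:
        row = dict(movie)
        row["link"] = DETAIL_PREFIX + movie["link"]
        writer.writerow(row)
```

For example, `with open("maoyan_upcoming.csv", "w", newline="", encoding="utf-8-sig") as f: save_movies(movies, f)` writes a file (the name is illustrative). `newline=""` is what the `csv` documentation recommends, and `utf-8-sig` adds a byte-order mark so Excel recognizes the Chinese titles as UTF-8.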

6. Use `random.randint()` to add a random delay between requests.
```python
# Wait a random 1, 2 or 3 seconds before the next request
time.sleep(random.randint(1, 3))
```
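
A sketch that wraps the delay in a helper so the waiting time can be tuned or checked. `polite_sleep` and its `sleep` parameter (which lets a fake sleeper be swapped in when testing) are hypothetical additions.

```python
import random
import time


def polite_sleep(min_seconds=1, max_seconds=3, sleep=time.sleep):
    """Pause for a random whole number of seconds in [min_seconds, max_seconds].

    random.randint includes both end points, so the defaults give 1, 2 or 3 seconds.
    Returns the delay that was used.
    """
    delay = random.randint(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

If whole-second pauses feel too regular, `random.uniform(1, 3)` returns a fractional delay in the same range instead.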

7. Call the methods to run the crawler.

```python
# For each page URL: download the HTML (get_page must return it), then parse it
html = self.get_page(url)
self.parse_page(html)
```
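
Putting the steps together, a minimal sketch of the driver loop might read a start and end page, build each page URL, fetch and parse it, and pause between requests. To keep it self-contained and testable, the fetching, parsing and sleeping functions are passed in as parameters; `crawl`, `main` and these parameter names are hypothetical, not the article's code.

```python
import random
import time

URL_TEMPLATE = "https://maoyan.com/films?showType=2&offset={}"


def crawl(start_page, end_page, fetch, parse, sleep=time.sleep):
    """Fetch and parse listing pages start_page..end_page (1-based, inclusive).

    fetch(url) -> html and parse(html) -> list of records are supplied by the
    caller, e.g. wrappers around get_page() and the XPath parsing from step 4.
    Returns all collected records in page order.
    """
    results = []
    for page in range(start_page, end_page + 1):
        # offset grows by 30 per page (page 1 -> offset 0)
        url = URL_TEMPLATE.format((page - 1) * 30)
        html = fetch(url)
        results.extend(parse(html))
        # random 1 to 3 second pause between requests, as in step 6
        if page < end_page:
            sleep(random.randint(1, 3))
    return results


def main(fetch, parse):
    """Ask for the page range on the console, then print every record."""
    start = int(input("Start page: "))
    end = int(input("End page: "))
    for record in crawl(start, end, fetch, parse):
        print(record)
```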

# /5 Results/

1. Click the green triangle (Run) in PyCharm to start the program, then enter the start page and the end page.

2. After the program runs, the results are printed to the console.

3. Click the blue details link in the console output to view the movie's details online.

# /6 Summary/
1. Avoid scraping too much data, since heavy crawling can easily overload the server; just try it out briefly.
2. This article used a Python web crawler, built with the requests and lxml libraries, to scrape upcoming movies from Maoyan Movies.
