How Can I Extract Hyperlinks from a Webpage Using Python and BeautifulSoup?

Linda Hamilton

Released: 2024-12-11 11:06:10

Retrieving Links from Web Pages with Python and BeautifulSoup

This article shows how to find the links on a web page and collect their URLs using Python and the BeautifulSoup library.

Problem:

How do you extract the URLs of links embedded in a webpage using Python?

Solution:

One approach uses BeautifulSoup's SoupStrainer class, which limits parsing to the tags you ask for. The following snippet shows the process:

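import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
# httplib2 returns a (response headers, body) pair
response, content = http.request('http://www.nytimes.com')

# Parse only <a> tags instead of building a tree for the whole page
soup = BeautifulSoup(content, 'html.parser', parse_only=SoupStrainer('a'))

# find_all skips any non-tag nodes (such as a doctype) that may survive the
# strainer; href=True keeps only anchors that have an href attribute
for link in soup.find_all('a', href=True):
    print(link['href'])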

This code requests the page at 'http://www.nytimes.com' and passes the response body to BeautifulSoup. The SoupStrainer('a') filter restricts parsing to 'a' tags (anchors, which represent links), so the rest of the document is never built into the parse tree. For each link that has an 'href' attribute, the code prints the attribute's value, which is the URL the link points to.
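Many 'href' values are relative paths rather than full URLs, so it can help to resolve them against the page address. The following is a minimal sketch of that idea, not part of the original snippet: the extract_links helper and the sample markup are hypothetical names introduced for illustration, and the HTML is parsed from a string so the example runs without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag with an href, in document order."""
    # Parse only <a> tags; the rest of the markup is skipped.
    soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))
    # href=True keeps only anchors that actually carry an href attribute.
    # urljoin resolves relative paths against base_url and leaves
    # absolute URLs unchanged.
    return [urljoin(base_url, a['href']) for a in soup.find_all('a', href=True)]


# Hypothetical sample markup, used here instead of a live request.
sample_html = """
<html><body>
  <a href="/section/world">World</a>
  <a name="top">No href here</a>
  <a href="https://example.com/about">About</a>
</body></html>
"""

links = extract_links(sample_html, 'http://www.nytimes.com')
for url in links:
    print(url)
```

With a live page, the same helper could be called with the response body and the URL that was requested.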
