How to Efficiently Extract HREF Attributes from HTML Using BeautifulSoup? - Python Tutorial - PHP中文網 (PHP Chinese Website)


Mary-Kate Olsen

Published: 2024-10-30 18:36:03

Original



Extracting HREF Values with BeautifulSoup

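When processing HTML documents with BeautifulSoup, you often need to extract a specific attribute such as href. This article shows how to retrieve href values efficiently, even when the document contains many tags.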

Retrieving HREF Values with find_all

To locate only the a tags that actually have an href attribute, pass href=True to the find_all method, as shown below:

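<code class="python"># Python 3 with BeautifulSoup 4 (the bs4 package)
from bs4 import BeautifulSoup

html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>'''

# Name the parser explicitly; otherwise bs4 warns that none was specified
soup = BeautifulSoup(html, 'html.parser')

# href=True matches only <a> tags that carry an href attribute
for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])</code>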


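This loops over every a tag that has an href attribute and prints its value. Because href=True excludes a tags without the attribute, a['href'] never raises a KeyError. Note that in BeautifulSoup 3, the version before BeautifulSoup 4, the method is named findAll; BeautifulSoup 4 renamed it to find_all and keeps findAll only as a legacy alias.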
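The loop above prints each URL; if you would rather collect the URLs, the same find_all call can feed a list comprehension. The following is a minimal sketch, assuming BeautifulSoup 4 (the bs4 package) and Python's built-in html.parser; the helper names extract_hrefs and extract_hrefs_css are hypothetical. The second helper swaps in a different technique, the CSS attribute selector a[href] via soup.select, which should give the same result for this input.

```python
from bs4 import BeautifulSoup


def extract_hrefs(html: str) -> list:
    """Hypothetical helper: href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags without an href, so a["href"] cannot raise KeyError
    return [a["href"] for a in soup.find_all("a", href=True)]


def extract_hrefs_css(html: str) -> list:
    """Hypothetical alternative: the same selection via the CSS selector a[href]."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("a[href]")]


html = '''<a href="some_url">next</a>
<span class="class"><a href="another_url">later</a></span>
<a name="no-link">anchor without href</a>'''

print(extract_hrefs(html))      # ['some_url', 'another_url']
print(extract_hrefs_css(html))  # ['some_url', 'another_url']
```

The trailing a tag without an href is ignored by both helpers, which is the point of filtering on the attribute rather than on the tag name alone.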

Retrieving All Tags That Have an HREF

To get every tag that has an href attribute, not just a tags (for example link or area elements), simply omit the name argument:

<code class="python">href_tags = soup.find_all(href=True)</code>
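As an illustration, the sketch below runs find_all(href=True) with no tag name over a small document; it assumes BeautifulSoup 4 with html.parser, and the sample markup is invented for this example.

```python
from bs4 import BeautifulSoup

# Made-up sample markup mixing several tag types
html = '''<html><head><link rel="stylesheet" href="style.css"></head>
<body>
  <a href="/home">Home</a>
  <a name="top">anchor without href</a>
  <area href="/map-region" alt="region">
</body></html>'''

soup = BeautifulSoup(html, "html.parser")

# No tag name given: any element with an href attribute matches
href_tags = soup.find_all(href=True)

# Pair each matched tag's name with its href to see what was selected
pairs = [(tag.name, tag["href"]) for tag in href_tags]
print(pairs)  # [('link', 'style.css'), ('a', '/home'), ('area', '/map-region')]
```

Because no name is given, the result includes the link and area elements alongside the a tag, while the a tag without an href is skipped; check tag.name if you only want hyperlinks.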

