How to read text content in an HTML file

下次还敢
Release: 2024-04-11 13:57:24

To read the text content of an HTML file: load the HTML, parse it, extract the text with the text attribute or the get_text() method, optionally clean the text (remove whitespace and special characters, convert to lowercase), and output the result (print it, write it to a file, and so on).


To extract text content from an HTML file, you can use the following steps:

1. Load the HTML file

import requests

url = 'https://example.com'
response = requests.get(url)
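
The snippet above fetches a page over HTTP. When the HTML is already saved on disk, it can be read directly instead. A minimal sketch, assuming a UTF-8 encoded file; the `load_html` helper and the `page.html` file name are illustrative and not part of the original steps:

```python
from pathlib import Path


def load_html(path):
    """Return the contents of a local HTML file as a string."""
    # Assumption: the file is UTF-8 encoded; change the encoding if it is not.
    return Path(path).read_text(encoding="utf-8")


if __name__ == "__main__":
    # Create a small sample file so the example runs on its own.
    Path("page.html").write_text("<p>Hello, world!</p>", encoding="utf-8")
    print(load_html("page.html"))
```

The returned string can be passed to BeautifulSoup in place of response.text.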

2. Parse the HTML

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')
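
To check that parsing worked, the parsed tree can be inspected before any text is extracted. The sketch below uses a hard-coded HTML string in place of response.text (an assumption made so that it runs without network access):

```python
from bs4 import BeautifulSoup

# Sample markup standing in for response.text (illustrative only).
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string        # text of the <title> element
first_paragraph = soup.p.string  # text of the first <p> element

print(title)
print(first_paragraph)
```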

3. Extract text content

There are two ways to extract text content:

  • Use the text attribute: returns all the text inside the element with the tags themselves stripped out. It is shorthand for calling get_text() with no arguments.
text = soup.text
  • Use the get_text() method: returns the same text, and additionally accepts optional arguments such as separator and strip to control how the pieces are joined.
text = soup.get_text()
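
get_text() also accepts optional arguments that control how text from neighbouring tags is joined. A short sketch on an illustrative HTML string, using the separator and strip parameters of Beautiful Soup 4:

```python
from bs4 import BeautifulSoup

# Illustrative markup; any parsed document works the same way.
html = "<div><h1>Title</h1><p>First <b>bold</b> paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Default: text nodes are concatenated with nothing in between.
plain = soup.get_text()

# Strip each text node and join the pieces with a single space.
spaced = soup.get_text(separator=" ", strip=True)

print(repr(plain))   # 'TitleFirst bold paragraph.'
print(repr(spaced))  # 'Title First bold paragraph.'
```

With the defaults, the heading and the paragraph run together, so passing a separator is usually the safer choice when the text will be split into words later.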

4. Clean text content (optional)

If you need to further clean up text content, you can perform the following operations:

  • Normalize whitespace (collapse runs of spaces, tabs, and newlines into a single space):
text = ' '.join(text.split())
  • Remove special characters (string.punctuation covers ASCII punctuation only):
import string
text = text.translate(str.maketrans('', '', string.punctuation))
  • Convert to lowercase:
text = text.lower()
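
The cleaning operations above can be combined into one helper. A minimal sketch; the `clean_text` name is hypothetical, and it collapses runs of whitespace to a single space rather than deleting every space, which keeps words separated:

```python
import string


def clean_text(text):
    """Strip ASCII punctuation, collapse whitespace, and lowercase the text."""
    # Remove punctuation first so that no double spaces are left behind.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments splits on any run of whitespace.
    text = " ".join(text.split())
    return text.lower()


print(clean_text("  Hello,\n  World!  "))  # hello world
```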

5. Output text content

You can output text content in a variety of ways:

  • Print to console:
print(text)
  • Write to file:
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(text)
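
Writing and then reading the file back is a quick way to confirm the output step worked. A small sketch, assuming UTF-8 output; the `save_text` helper is a hypothetical name used only for illustration:

```python
from pathlib import Path


def save_text(text, path):
    """Write text to a file as UTF-8 and return the number of characters written."""
    # An explicit encoding avoids depending on the platform default.
    return Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    save_text("hello world", "output.txt")
    # Read the file back to verify its contents.
    print(Path("output.txt").read_text(encoding="utf-8"))
```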

Source: php.cn