Removing HTML Elements for Plain Text Extraction with JavaScript
When working with HTML content, you may need to extract only its text, without the HTML tags. In JavaScript, the DOM properties innerText and textContent make this straightforward.
Problem Statement:
You have an HTML document with a button and an element (with the ID 'txt') whose content contains other HTML elements. When the button is clicked, you want to remove all HTML tags from the content of that element, leaving only the plain text.
Solution:
The following JavaScript function does this:
<code class="javascript">function get_content() {
  // Get the element by its ID
  const element = document.getElementById('txt');

  // Extract the plain text: innerText gives the rendered text,
  // textContent the text of every descendant text node
  const text = element.innerText || element.textContent;

  // Replace the element's children with that text. Assigning to
  // textContent (not innerHTML) keeps the text from being parsed
  // back into markup, so no HTML elements remain.
  element.textContent = text;
}</code>
When this function runs in response to a button click, the HTML tags inside the 'txt' element are removed, leaving only the plain text.
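The original snippet does not show how the button is connected to the function. A minimal sketch follows, assuming the button has the hypothetical ID strip-btn (only the ID txt comes from the original). The hypothetical helper stripTags takes the element as a parameter, so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: replace an element's children with its plain text.
function stripTags(element) {
  // Text of all descendant text nodes; the tags themselves are dropped.
  const text = element.textContent;
  // Assigning to textContent removes every child node and inserts a
  // single text node, so no HTML elements remain inside the element.
  element.textContent = text;
  return text;
}

// Connect the button to the helper (browser only).
function wireStripButton() {
  // "strip-btn" is an assumed ID; "txt" matches the original example.
  const button = document.getElementById("strip-btn");
  const target = document.getElementById("txt");
  if (button && target) {
    button.addEventListener("click", () => stripTags(target));
  }
}

if (typeof document !== "undefined") {
  // Wait for parsing to finish if it has not yet; otherwise wire immediately.
  if (document.readyState === "loading") {
    document.addEventListener("DOMContentLoaded", wireStripButton);
  } else {
    wireStripButton();
  }
}
```

Using textContent for the read keeps the result independent of CSS; reading innerText instead would keep only the visible text.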
innerText vs. textContent:
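The choice between innerText and textContent depends on your requirements. innerText returns the text as it is rendered: it is aware of CSS, so it leaves out text in hidden elements (for example, display: none), collapses whitespace the way the browser displays it, and turns br elements into line breaks. textContent returns the text of every descendant text node exactly as written in the source, including text inside hidden elements and inside script and style elements.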
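To make the textContent rule concrete, the sketch below computes the same value by walking the tree and concatenating every text node, without consulting any styles. The function name collectText is hypothetical; it reads only nodeType, nodeValue and childNodes, so it works on real DOM nodes and on plain objects shaped like them. It illustrates the rule and is not meant to replace the built-in property.

```javascript
// Node type constants from the DOM standard (same values as Node.TEXT_NODE etc.).
const TEXT_NODE = 3;
const CDATA_SECTION_NODE = 4;

// Concatenate the data of every descendant text node in document order,
// as textContent does for an element. CSS and visibility are ignored,
// so text in hidden elements and in script/style elements is included.
function collectText(node) {
  if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
    return node.nodeValue;
  }
  let result = "";
  // Comments and other non-text nodes contribute nothing themselves.
  for (const child of node.childNodes || []) {
    result += collectText(child);
  }
  return result;
}
```

For markup such as a paragraph containing "Visible " followed by a hidden span with "hidden", collectText (like textContent) includes the hidden word, whereas innerText would leave it out.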
Compatibility:
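innerText began as a proprietary Internet Explorer property, while textContent is the standard DOM property and is not available in Internet Explorer 8 and earlier. Firefox supported only textContent until version 45 added innerText; today both properties are standardized and supported in all current browsers. Because innerText depends on the rendered layout, reading it can force the browser to recompute styles and layout, which makes it slower than textContent on large or complex documents. textContent is therefore usually the more predictable and efficient choice, while innerText is useful when you want the text as the user sees it.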
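The original snippet combines the two properties with ||, which falls back to textContent whenever innerText is an empty string, not only when innerText is unsupported. A sketch that uses feature detection instead, preferring the standard property and using innerText only where textContent is missing (historically Internet Explorer 8 and earlier), might look like this; the function name getPlainText is hypothetical.

```javascript
// Prefer the standard textContent property; fall back to innerText only
// when textContent does not exist on the element (very old IE versions).
function getPlainText(element) {
  if ("textContent" in element) {
    return element.textContent;
  }
  return element.innerText;
}
```

In current browsers this always returns textContent; if you specifically want the rendered text, read innerText directly instead.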