
How can DOMDocument and XPath be used to Target and Extract Specific Text Content from HTML?

Mary-Kate Olsen
Release: 2024-10-30 09:51:27


DOMDocument Parsing for Targeting Specific Content

"DOMDocument", a class in PHP's DOM extension, parses an HTML document into a tree that can be queried. "getElementsByTagName" returns every element with a given tag name, regardless of where it sits in the document. Pairing "DOMDocument" with "DOMXPath" lets an XPath query select only the elements that match a given attribute value and position in the tree.
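
To make the contrast concrete, here is a minimal sketch that runs both approaches against the same markup. The sample HTML, including the "other" class, is an assumption made up for illustration.

```php
<?php
// Sample markup, assumed here only for illustration.
$html = '<div class="main"><div class="text">Capture this text 1</div></div>'
      . '<div class="other"><div class="text">Ignore this text</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName matches on the tag name alone: all four <div> elements.
$allDivs = $dom->getElementsByTagName('div');

// An XPath query can also constrain attributes and nesting: one match.
$xpath = new DOMXPath($dom);
$targeted = $xpath->query('//div[@class="main"]/div[@class="text"]');

echo $allDivs->length, "\n";   // 4
echo $targeted->length, "\n";  // 1
```

With "getElementsByTagName" alone, the filtering by class and parent would have to be written by hand inside the loop.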

Capture Text Nodes within Specific Contexts

To extract specific text content, the process involves:

  • Loading the HTML string into a DOM object using "DOMDocument::loadHTML".
  • Creating a "DOMXPath" object bound to that document using "new DOMXPath($dom)".
  • Running an XPath query that specifies the target nodes. For instance:
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

This query retrieves all <div> tags whose "class" attribute is exactly "text" and that are direct children of <div> tags whose "class" attribute is exactly "main".
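
One consequence of comparing "@class" with "=" is that an element carrying several classes does not match. The sketch below compares the exact test with a commonly used XPath 1.0 token-matching idiom. The sample markup and the "highlight" class are assumptions for illustration.

```php
<?php
// Sample markup, assumed for illustration: the second inner <div> has two classes.
$html = '<div class="main">'
      . '<div class="text">Exact class</div>'
      . '<div class="text highlight">Multiple classes</div>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the whole attribute value must be the string "text".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token match: pad the class list with spaces, then look for " text ".
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2
```

The single "/" between the two steps selects direct children only. Writing "//" there instead would match "text" elements at any depth below "main".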

Iterating through the resulting node list using a "foreach" loop allows for the extraction of each element's "nodeValue", which holds its text content. "trim" removes the surrounding whitespace:

foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
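
In PHP, the "nodeValue" of an element is the concatenated text of all its descendants, so nested inline tags do not interrupt the result. A small sketch, assuming sample markup with a nested <b> tag and collecting the values into an array (the "$texts" name is hypothetical) rather than dumping them:

```php
<?php
// Sample markup, assumed for illustration: the target <div> contains a nested <b>.
$html = '<div class="main"><div class="text">  Capture <b>this</b> text 1  </div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

$texts = [];
foreach ($tags as $tag) {
    // nodeValue joins the text of the <div> and of the nested <b>.
    $texts[] = trim($tag->nodeValue);
}

print_r($texts);
```

Collecting into an array keeps the extracted strings available for later processing.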

Example Implementation

Consider the following HTML snippet:

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Using the provided query, the output would be:

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
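
For reference, the pieces can be assembled into one script. This is a sketch rather than a definitive implementation. The "libxml_use_internal_errors(true)" call is an addition not present in the original steps, included on the assumption that real-world pages often contain markup that libxml would otherwise warn about.

```php
<?php
// The HTML snippet from the example above.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

// Assumption: collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');

foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
```

The output format shown above resembles the one produced when Xdebug overrides "var_dump". On a stock PHP CLI the same values should print as string(19) "Capture this text 1" and string(19) "Capture this text 2".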

This demonstrates the ability to precisely extract specific text content within a hierarchical HTML structure using "DOMDocument" and XPath.
