
How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

DDD
Release: 2024-11-15 17:18:03

Extracting Flat Text from Elements with a Designated Class Using PHP DOM

Extracting text from specific HTML elements is a common task in web development. PHP's DOM extension provides DOMDocument and DOMXPath for parsing HTML and querying its contents. This article shows how to collect the text of elements with a given class into flat arrays, one array per class.

Problem

Given HTML in which text is spread across multiple p elements with alternating class names, the task is to collect the text into two arrays: one for headings and one for content. For instance, given HTML that repeats the following pattern for each of three chapters:

<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>

We need to obtain the following output:

$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

Solution

To accomplish this extraction using PHP DOM, we employ DOMDocument and DOMXPath. The solution involves the following steps:

  1. Load the HTML into a DOMDocument object:
$dom = new DOMDocument();
$dom->loadHTML($test);
  2. Create a DOMXPath object to run XPath queries against the document:
$xpath = new DOMXPath($dom);
  3. Call the parseToArray() helper function to extract the text from elements with each class:
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
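
Real-world HTML is often not well formed, and loadHTML() reports parse problems as PHP warnings by default. The following is a minimal sketch, not part of the original solution, of how the loading step could collect those problems quietly instead. The sample markup in $html is a hypothetical input with a stray closing tag; the libxml_* functions are standard PHP.

```php
<?php
// Hypothetical sample input: the stray </div> is deliberately malformed.
$html = '<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></div></p>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Inspect (or log) whatever the parser complained about, then reset state.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with message, line, and level.
    echo 'libxml: ', trim($error->message), PHP_EOL;
}

// The document is still queryable despite the malformed markup.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//span[@class='Heading1-H']");
echo $nodes->length, PHP_EOL;
```

Restoring the previous error-handling setting afterwards keeps the change local to this one parse.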

The parseToArray() function does the following:

  • Runs an XPath query for elements with the designated class.
  • Iterates over the matched elements and reads the text of each of their child nodes.
  • Appends each piece of text to an array and returns that array.
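
As a variation on the steps above, here is a hedged sketch of a stricter helper. The name extractTextByClass() is hypothetical and does not appear in the original article. It makes two assumptions that differ from parseToArray(): the class is matched as one whitespace-separated token, so elements carrying several classes still match, and each element contributes exactly one trimmed string via textContent, even when it contains nested markup. It also assumes the class name contains no single quote.

```php
<?php
// Hypothetical helper, sketched as an alternative to parseToArray().
function extractTextByClass(DOMXPath $xpath, string $class): array
{
    // Pad the class attribute with spaces so the name matches as a whole token.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $element) {
        // textContent concatenates the text of all descendants.
        $result[] = trim($element->textContent);
    }

    return $result;
}

// Hypothetical sample markup: one multi-class span, one span with nested <b>.
$dom = new DOMDocument();
$dom->loadHTML(
    '<p><span class="Heading1-H intro"> Chapter 1 </span></p>'
    . '<p><span class="Normal-H">This is <b>chapter</b> 1</span></p>'
);
$xpath = new DOMXPath($dom);

$headings = extractTextByClass($xpath, 'Heading1-H');
$contents = extractTextByClass($xpath, 'Normal-H');

print_r($headings); // one entry: "Chapter 1"
print_r($contents); // one entry: "This is chapter 1"
```

By contrast, looping over childNodes yields one array entry per child node, so an element with nested tags would contribute several entries. Which behavior is preferable depends on how flat the result needs to be.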

Here's the complete PHP code:

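<?php
function parseToArray(DOMXPath $xpath, string $class): array
{
    // Select every span whose class attribute equals $class.
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    $resultarray = [];
    foreach ($elements as $element) {
        // Collect the text of each child node of the matched element.
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }

    return $resultarray;
}

$test = <<<HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";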

This approach uses DOMDocument and DOMXPath to pull text out of an HTML document by class name. Because the selection logic lives in the XPath expression, the same pattern can be adapted to more complex and targeted queries.
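
One possible follow-up step, sketched here as an illustration and not taken from the original solution, is to pair each heading with its content once both flat arrays exist. The sample values are copied from the expected output in the Problem section, and the $chapters name is an assumption.

```php
<?php
// Sample values copied from the expected output shown earlier.
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

// array_combine() needs arrays of equal length, so check before pairing.
if (count($heading) !== count($content)) {
    throw new LengthException('Headings and content are out of step.');
}

// Build a heading => content map (duplicate headings would overwrite each other).
$chapters = array_combine($heading, $content);

foreach ($chapters as $title => $text) {
    echo $title, ': ', $text, PHP_EOL;
}
```

Keeping the two flat arrays separate, as the article does, remains the simpler choice when headings may repeat.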

source:php.cn