Extracting Flat Text from Elements with a Designated Class Using PHP DOM
Extracting text from specific HTML elements is a common task in web development. PHP's DOM extension provides the tools for parsing HTML and accessing its contents. This article addresses one specific requirement: extracting the text of elements with a given class into two flat arrays.
Problem
Given HTML content containing text distributed between multiple p elements with alternating class names, the task is to save the text into two arrays: one for headings and one for content. For instance, given the following HTML:
<p class="Heading1-P"> <span class="Heading1-H">Chapter 1</span> </p> <p class="Normal-P"> <span class="Normal-H">This is chapter 1</span> </p>
With the same pattern repeated for chapters 2 and 3, we need to obtain the following output:
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3']; $content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];
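The two arrays are index-aligned: entry i of $heading belongs with entry i of $content. As a minimal sketch of how that output might be consumed (the $chapters variable and the length check are illustrative assumptions, not part of the original task), the arrays can be paired with array_combine():

```php
<?php
// Sample values taken from the expected output above.
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

// array_combine() needs both arrays to have the same number of elements,
// so check first (assumption: a mismatch should be treated as an error).
if (count($heading) !== count($content)) {
    throw new LengthException('Headings and content are out of step.');
}

// Use each heading as the key for its content.
$chapters = array_combine($heading, $content);

foreach ($chapters as $title => $text) {
    echo $title, ': ', $text, "\n";
}
```

Note that duplicate headings would collapse into a single key, so this pairing only suits documents whose headings are unique.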
Solution
To accomplish this extraction using PHP DOM, we use DOMDocument and DOMXPath. The solution involves the following steps:
Load the HTML into a DOMDocument:
$dom = new DOMDocument();
$dom->loadHTML($test);
Create a DOMXPath object for querying the document:
$xpath = new DOMXPath($dom);
Call parseToArray() once per class to build the two arrays:
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
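The loading steps above can be wrapped so that parser warnings do not leak into the output, which can happen when loadHTML() meets markup that libxml does not recognize. This is a minimal sketch, not part of the original solution; the helper name loadXPath() is hypothetical:

```php
<?php
// Hypothetical helper (not from the original article): builds a DOMXPath
// from an HTML string while keeping libxml parser warnings off the output.
function loadXPath(string $html): DOMXPath
{
    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return new DOMXPath($dom);
}

$xpath = loadXPath('<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>');
echo $xpath->query("//span[@class='Heading1-H']")->length, "\n";
```

The returned DOMXPath can be passed to parseToArray() exactly like the $xpath object created in the steps above.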
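In the parseToArray() function:
An XPath query built from the class name selects the matching span elements.
The function loops over the matched elements and appends the nodeValue of each child node to a result array.
The flat result array is returned.
Here's the complete PHP code: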
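<?php

// Collect the text of every span with the given class into a flat array.
function parseToArray(DOMXPath $xpath, string $class): array
{
    // Select span elements whose class attribute is exactly $class.
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    $resultarray = [];
    foreach ($elements as $element) {
        // Each child node (here, the text node) contributes one entry.
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }
    return $resultarray;
}

$test = <<<HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);

$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";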
This approach combines PHP DOM with XPath to pull text out of an HTML document by class name. Changing the XPath query is enough to target other elements or classes.
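One case where the query may need adjusting: an XPath test of the form @class='Heading1-H' compares the whole attribute value, so it does not match an element that carries additional classes. The sketch below is a variant, not the original solution. The function name parseByClassToken() is hypothetical, the use of textContent with trim() is a design choice made here, and $class is assumed to be a plain class name with no quote characters:

```php
<?php
// Hypothetical variant of parseToArray(); the name, the token matching and
// the trimming are assumptions, not part of the original solution.
function parseByClassToken(DOMXPath $xpath, string $class): array
{
    // Match $class as one whitespace-separated token of the class attribute.
    $query = sprintf(
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $result = [];
    foreach ($xpath->query($query) as $element) {
        // textContent concatenates the text of all descendants,
        // giving one entry per matched element.
        $result[] = trim($element->textContent);
    }
    return $result;
}

$dom = new DOMDocument();
$dom->loadHTML('<p><span class="Heading1-H first">Chapter <b>1</b></span></p>');
$xpath = new DOMXPath($dom);

$found = parseByClassToken($xpath, 'Heading1-H');
var_dump($found);
```

Unlike a loop over childNodes, which would add "Chapter " and "1" as two separate entries for this markup, the variant produces a single entry per span.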