Advanced XPath Functions for Powerful Queries - XML/RSS Tutorial - PHP中文網

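Advanced XPath functions can markedly improve the precision and flexibility of node selection in XML or HTML data. 1. String functions such as contains(), starts-with(), and normalize-space() match text by substring, by prefix, or after collapsing extra whitespace; 2. The positional functions position() and last() select elements by index or from the end, such as the first or last node; 3. The Boolean function not() and the operators and and or combine conditions for complex filtering; 4. The functions count() and string-length() filter elements by number of child nodes or by text length; 5. Axes such as following-sibling:: and ancestor:: navigate across levels to reach sibling, ancestor, or descendant nodes; 6. The string() function matches the full text content of an element that contains child elements, and pairing it with normalize-space() makes the match more robust; 7. Where the environment supports it, XPath 2.0 adds regular-expression and string-processing functions such as matches() and replace(). In practice, avoid overly long expressions, prefer stable attributes, post-process results in code, and test expressions in the browser console to build XPath queries that are efficient and resilient to page changes.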

Advanced XPath Functions for Powerful Queries

When working with XML or HTML data—especially in web scraping, test automation, or data extraction—XPath is a powerful tool for navigating and selecting nodes. While basic XPath expressions like //div[@class='example'] are common, leveraging advanced XPath functions can dramatically improve precision, efficiency, and flexibility in your queries.

Here’s a breakdown of key advanced XPath functions and how to use them effectively:

1. String Functions: Refine Text-Based Selections

XPath includes several string functions that help match elements based on partial or transformed text content.

`contains()`

Finds elements whose attribute or text contains a substring.

//a[contains(@href, 'example.com')]
//p[contains(text(), 'Welcome')]

Useful for dynamic attributes (e.g., classes with changing order).

`starts-with()` and `ends-with()`

Match attributes or text based on prefix or suffix.

//input[starts-with(@id, 'user_')]
//span[ends-with(text(), ':')]

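Note: ends-with() is an XPath 2.0 function, so it is not supported in all tools (e.g., Selenium uses XPath 1.0 by default). For XPath 1.0, simulate it using substring(), starting the substring at string-length(s) - string-length(suffix) + 1:

//span[substring(text(), string-length(text()) - string-length(':') + 1) = ':']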
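The substring() workaround generalizes to a suffix of any length, because XPath's substring() is 1-based. The following Python sketch builds such an expression as a string and checks the index arithmetic with a small stand-in for substring(). The helper names `ends_with_xpath1` and `xpath_substring_from` are hypothetical, and the sketch assumes the suffix contains no single quote.

```python
def ends_with_xpath1(value_expr: str, suffix: str) -> str:
    """Build an XPath 1.0 predicate body that behaves like ends-with()."""
    if "'" in suffix:
        # Assumption of this sketch: no quote handling is attempted.
        raise ValueError("suffix must not contain a single quote")
    n = len(suffix)
    # substring() is 1-based, so the suffix starts at length - n + 1.
    return (
        f"substring({value_expr}, string-length({value_expr}) - {n} + 1)"
        f" = '{suffix}'"
    )


def xpath_substring_from(s: str, start: int) -> str:
    """Stand-in for two-argument XPath substring(s, start) with an integer start."""
    # A start below 1 simply yields the whole string, as in XPath.
    return s[max(start, 1) - 1:]


xpath = f"//span[{ends_with_xpath1('text()', ':')}]"
print(xpath)
```

The generated string can be passed to any XPath 1.0 engine; the stand-in function is only there to make the offset calculation easy to verify.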

`normalize-space()`

Strips leading and trailing whitespace and collapses each internal run of whitespace into a single space.

//p[normalize-space(text()) = 'Hello World']

Essential when dealing with inconsistently formatted HTML.
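When text has already been pulled into application code, the same cleanup can be reproduced there. This is a minimal Python sketch of what normalize-space() does; the function name `normalize_space` is just an illustrative stand-in, and it treats only space, tab, carriage return, and line feed as whitespace, as XPath does.

```python
import re

# XPath whitespace is limited to space, tab, carriage return, and line feed.
_XPATH_WS = " \t\r\n"


def normalize_space(s: str) -> str:
    """Trim both ends and collapse internal whitespace runs to one space."""
    return re.sub(f"[{_XPATH_WS}]+", " ", s.strip(_XPATH_WS))


raw = "\n   Hello \t  World  \n"
print(repr(normalize_space(raw)))
```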

2. Positional and Indexing Functions

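XPath allows you to select elements by position. Inside a predicate, the position is counted among the nodes selected by the current step, so //li[1] returns the first li of every parent rather than the first li in the document; use (//li)[1] for the latter.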

`position()` and `last()`

Select nodes by their index or from the end.

//li[position() = 1]     <!-- First item -->
//li[last()]             <!-- Last item -->
//li[position() > 5]     <!-- Items after the 5th -->
//li[position() mod 2 = 0] <!-- Every even item -->

`last()` with ranges

//tr[position() >= last() - 5]  <!-- Last 6 rows -->

Handy for tables where the footer or latest entries are at the end.
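To see how positional predicates behave per parent, here is a small Python sketch using the standard library's xml.etree.ElementTree, which supports only a limited XPath subset that includes [n] and [last()]. The sample document is made up for illustration, and the range selection is done with a Python slice because range predicates are outside that subset.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<root>"
    "<ul><li>a1</li><li>a2</li><li>a3</li></ul>"
    "<ul><li>b1</li><li>b2</li></ul>"
    "</root>"
)

# [1] and [last()] are evaluated among the siblings of each parent,
# so each <ul> contributes one item.
first_items = [li.text for li in doc.findall(".//li[1]")]
last_items = [li.text for li in doc.findall(".//li[last()]")]

# A range such as position() >= last() - 1 is not in ElementTree's subset,
# so this sketch slices the sibling list in Python instead.
last_two_of_first_list = [li.text for li in doc.find("ul").findall("li")[-2:]]

print(first_items, last_items, last_two_of_first_list)
```

With a full XPath 1.0 engine the slice would not be needed; the expression itself can carry the range.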

3. Boolean and Comparison Functions

XPath supports logical operations that return true/false for filtering.

`not()`

Negate a condition.

//input[not(@disabled)]
//div[not(contains(@class, 'hidden'))]

Combining conditions with `and`, `or`

//input[@type='text' and @required]
//button[@class='btn' or @class='button']
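For readers who want to check what these predicates mean, the following Python sketch mirrors them with plain attribute tests on elements parsed by the standard library. The sample form is invented for illustration, and the third filter is a variation on the `or` example rather than one of the expressions above.

```python
import xml.etree.ElementTree as ET

form = ET.fromstring(
    "<form>"
    "<input type='text' name='a' required='required'/>"
    "<input type='text' name='b' disabled='disabled'/>"
    "<input type='checkbox' name='c' required='required'/>"
    "</form>"
)
inputs = list(form.iter("input"))

# Mirrors //input[not(@disabled)]: the attribute must be absent.
enabled = [i.get("name") for i in inputs if "disabled" not in i.attrib]

# Mirrors //input[@type='text' and @required]: both tests must hold.
required_text = [
    i.get("name")
    for i in inputs
    if i.get("type") == "text" and "required" in i.attrib
]

# Mirrors //input[@type='checkbox' or @disabled]: either test may hold.
checkbox_or_disabled = [
    i.get("name")
    for i in inputs
    if i.get("type") == "checkbox" or "disabled" in i.attrib
]

print(enabled, required_text, checkbox_or_disabled)
```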

Value comparisons

//product[price > 100]
//user[age >= 18 and age <= 65]

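Numeric comparisons also work in XPath 1.0: the element's text is converted to a number before comparing. Text that is not numeric converts to NaN, and any comparison with NaN is false, so such elements are silently excluded.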
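The number conversion behind a comparison like //product[price > 100] can be sketched in Python. The helper `to_number` below is a rough, hypothetical stand-in for XPath's number() (Python's float() accepts a few forms that XPath does not), and the catalog data is made up.

```python
import math
import xml.etree.ElementTree as ET


def to_number(text):
    """Rough stand-in for XPath number(): NaN when the text is not numeric."""
    try:
        return float(text.strip())
    except (AttributeError, ValueError):
        return math.nan


catalog = ET.fromstring(
    "<catalog>"
    "<product><name>Lamp</name><price>149.99</price></product>"
    "<product><name>Mug</name><price>12</price></product>"
    "<product><name>Desk</name><price>call us</price></product>"
    "</catalog>"
)

# NaN > 100 is False, so the non-numeric price is filtered out
# instead of raising an error.
expensive = [
    p.findtext("name")
    for p in catalog.iter("product")
    if to_number(p.findtext("price")) > 100
]

print(expensive)
```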

4. Node Set Functions

These help manipulate or evaluate collections of nodes.

`count()`

Counts the nodes in a node-set, such as matching children or descendants.

//div[count(p) > 3]           <!-- Divs with more than 3 paragraphs -->
//form[count(.//input[@required]) = 0] <!-- Forms with no required fields -->

`string-length()`

Filter based on text length.

//a[string-length(text()) > 20]
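The two filters above can be approximated in Python when only a limited XPath engine is available. This sketch uses the standard library's ElementTree on an invented page; note that `.text` holds the text before the first child element, which only approximates text().

```python
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<body>"
    "<div id='short'><p>one</p><p>two</p></div>"
    "<div id='long'><p>1</p><p>2</p><p>3</p><p>4</p></div>"
    "<a href='/a'>Home</a>"
    "<a href='/b'>A considerably longer link label</a>"
    "</body>"
)

# Like //div[count(p) > 3]: findall("p") returns direct children only.
busy_divs = [d.get("id") for d in page.iter("div") if len(d.findall("p")) > 3]

# Like //a[string-length(text()) > 20]: .text is the text before the
# first child element, or None when there is none.
long_links = [a.get("href") for a in page.iter("a") if len(a.text or "") > 20]

print(busy_divs, long_links)
```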

5. Axes: Navigate Beyond Basic Hierarchy

XPath axes let you traverse non-linear paths (siblings, ancestors, descendants, etc.).

`following-sibling::`, `preceding-sibling::`

Select siblings relative to current node.

//label[text()='Username']/following-sibling::input
//h3[text()='Contact']/following-sibling::p[1]
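The sibling axes have no counterpart in the XPath subset of Python's standard ElementTree, so the sketch below emulates following-sibling:: by hand to show what the axis selects. The function name `following_siblings` and the sample form are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

form = ET.fromstring(
    "<form>"
    "<div><label>Username</label><input name='user'/></div>"
    "<div><label>Password</label><span>*</span><input name='pass'/></div>"
    "</form>"
)


def following_siblings(root, node, tag=None):
    """Yield siblings after `node` in document order, optionally by tag."""
    # ElementTree elements keep no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(node)
    if parent is None:
        return
    children = list(parent)
    for sibling in children[children.index(node) + 1:]:
        if tag is None or sibling.tag == tag:
            yield sibling


label = next(el for el in form.iter("label") if el.text == "Password")
inputs = [el.get("name") for el in following_siblings(form, label, "input")]
print(inputs)
```

Only siblings under the same parent are returned, which is why the input in the first div is not picked up for the Password label.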

`ancestor::`, `descendant::`

Go up or down multiple levels.

//span[@class='error']/ancestor::form
//div[@id='content']/descendant::a[@href]

`parent::` and `child::`

Explicit forms of the abbreviated syntax: parent:: corresponds to .. and child:: is the default axis used by /.

//input[@name='email']/parent::div

6. Advanced Text Matching

Sometimes text is split across child elements. Use string() to get concatenated text.

//div[string() = 'Total: $50.00']

string() returns the concatenated text of a node and all its descendants.

Or combine normalize-space() and contains() for robust matching:

//div[contains(normalize-space(), 'Error occurred')]
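The difference between the first text node and the full string value is easy to see in Python. In this sketch on an invented snippet, joining `itertext()` plays the role of string(), and a regular expression plays the role of normalize-space().

```python
import re
import xml.etree.ElementTree as ET

div = ET.fromstring("<div>\n  Total: <b>$50.00</b>\n</div>")

# itertext() walks every descendant text node in document order,
# which is what string() concatenates.
string_value = "".join(div.itertext())

# Collapse whitespace the way normalize-space() would before comparing.
normalized = re.sub(r"[ \t\r\n]+", " ", string_value).strip(" \t\r\n")

print(repr(div.text))      # only the text before <b>
print(repr(normalized))    # the full, cleaned-up text
```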

7. XPath 2.0 Functions (If Supported)

Some tools (like XML databases or XSLT processors) support XPath 2.0, which adds powerful functions:

matches(text(), 'regex') – Regex pattern matching
replace(text(), 'old', 'new') – String replacement
tokenize() – Split strings
upper-case(), lower-case() – Case manipulation

Example:

//a[matches(@href, '^https://.*\.pdf$')]

Note: Browsers and Selenium typically support only XPath 1.0, so these may not work in all environments.
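Where matches() is unavailable, one possible fallback is to narrow the candidates with a simple path and apply the regular expression in code. The Python sketch below does this with the standard library on an invented page; the pattern is the same one used in the example above, although XPath and Python regex dialects differ in other details.

```python
import re
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<div>"
    "<a href='https://example.com/report.pdf'>Report</a>"
    "<a href='http://example.com/old.pdf'>Old</a>"
    "<a href='https://example.com/index.html'>Home</a>"
    "<a>No link</a>"
    "</div>"
)

PDF_OVER_HTTPS = re.compile(r"^https://.*\.pdf$")

# Step 1: a plain path narrows to anchors that carry an href attribute.
candidates = page.findall(".//a[@href]")

# Step 2: the regular expression runs in Python instead of matches().
pdf_links = [
    a.get("href") for a in candidates if PDF_OVER_HTTPS.search(a.get("href"))
]

print(pdf_links)
```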

Pro Tips for Real-World Use

Avoid over-complexity: Long XPath expressions can break easily with minor HTML changes.
Prefer stable attributes: Use id, semantic classes, or data attributes when possible.
Combine with other tools: Use XPath to narrow results, then process with code (e.g., Python, JavaScript).
Test in browser console: Use $x("//your/xpath") in DevTools to validate.

Advanced XPath functions give you surgical precision when extracting or validating structured data. While basic selectors work for simple cases, mastering functions like contains, position, normalize-space, and axes like following-sibling or ancestor turns XPath into a robust querying language.

With careful use, you can write expressions that are both powerful and resilient—especially when dealing with messy or dynamic markup.
