Using XPath and DOM4J to Parse XML in Java

高洛峰

Jan 11, 2017, 01:08 PM

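1 Four Ways to Parse XML Files

There are four classic ways to parse an XML file. The two basic approaches are SAX and DOM: SAX parses the document as a stream of events, while DOM parses it into a tree that mirrors the document structure. JDOM was built on top of these to cut down the amount of code that DOM and SAX require. Its advantage is that it follows the 20-80 rule (the Pareto principle) and greatly reduces the code you have to write. JDOM is typically a good fit when the task is simple, such as parsing or creating documents. Under the hood, though, JDOM still works with SAX (most commonly), DOM, and Xalan documents. The fourth option is DOM4J, an excellent Java XML API that combines strong performance and rich functionality with great ease of use; it is also open source. More and more Java software uses DOM4J to read and write XML, and notably even Sun's JAXM uses DOM4J. Detailed introductions to all four methods are easy to find with a web search.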
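The difference between the two basic approaches can be sketched with the parsers that ship with the JDK. This is a minimal illustration, not code from the original article: the class name `SaxVsDomSketch`, its two helper methods, and the sample XML string are all hypothetical. Both helpers count elements with a given name, one from SAX events and one from a DOM tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: counts elements by name with SAX and with DOM.
public class SaxVsDomSketch {

    // SAX: the parser pushes events at a handler; no tree is kept in memory.
    public static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    // DOM: the whole document is loaded into a tree, which is then queried.
    public static int countWithDom(String xml, String elementName) {
        try {
            org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Table><Row><Cell>a</Cell></Row><Row><Cell>b</Cell></Row></Table>";
        System.out.println("SAX count: " + countWithSax(xml, "Row"));
        System.out.println("DOM count: " + countWithDom(xml, "Row"));
    }
}
```

The SAX version never holds the document in memory, which suits large files; the DOM version keeps the tree around, which suits repeated queries against the same document.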

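2 A Brief Introduction to XPath

XPath is a language for finding information in an XML document. It is used to navigate through the elements and attributes of an XML document and to traverse them. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions. Understanding XPath is therefore the foundation for many advanced XML applications. XPath is much like SQL for databases, or like jQuery: it lets developers easily grab what they need from a document. DOM4J also supports XPath.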
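To give a feel for the syntax, here is a small sketch that evaluates a few XPath expressions with the JDK's built-in `javax.xml.xpath` API (used here instead of DOM4J so that it runs without extra JARs). The class name `XPathBasicsSketch`, its helper methods, and the `library`/`book` sample document are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: basic XPath navigation by element and attribute.
public class XPathBasicsSketch {

    // Hypothetical sample document, not taken from the article.
    static final String XML =
            "<library>"
          + "<book id='b1'><title>First</title></book>"
          + "<book id='b2'><title>Second</title></book>"
          + "</library>";

    // Evaluates the expression and returns the string value of the first match.
    public static String selectString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Evaluates the expression and returns how many nodes it selects.
    public static int countNodes(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select by position, by attribute value, and count all matches.
        System.out.println(selectString("/library/book[1]/title"));
        System.out.println(selectString("/library/book[@id='b2']/title"));
        System.out.println(countNodes("//book"));
    }
}
```

Note that XPath positions are 1-based, so `book[1]` is the first `book` child.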

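3 Using XPath with DOM4J

To parse an XML document with XPath in DOM4J, first add two JAR files to the project:

dom4j-1.6.1.jar: the DOM4J package, available from http://sourceforge.net/projects/dom4j/;

jaxen-xx.xx.jar: if this JAR is not added, an exception is usually thrown (java.lang.NoClassDefFoundError: org/jaxen/JaxenException); available from http://www.jaxen.org/releases.html.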
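One way to catch a missing Jaxen JAR early, rather than at the first XPath call, is to probe the classpath at startup. The sketch below is an assumption-laden illustration: the class name `JaxenCheckSketch` and the helper `isClassPresent` are hypothetical, and only the class name `org.jaxen.JaxenException` comes from the error message quoted above.

```java
// Hypothetical sketch: checks whether a class can be found on the classpath.
public class JaxenCheckSketch {

    // Returns true if the named class can be located, without initializing it.
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, JaxenCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError quoted in the article.
        boolean jaxenPresent = isClassPresent("org.jaxen.JaxenException");
        System.out.println(jaxenPresent
                ? "Jaxen found: DOM4J XPath calls should be able to load it."
                : "Jaxen missing: add the jaxen JAR before using XPath with DOM4J.");
    }
}
```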

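3.1 Interference from Namespaces

When working with XML files converted from Excel or other formats, an XPath query often returns no results. This is usually caused by the presence of a namespace. Take the XML file below as an example: a simple query with XPath="//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" typically returns nothing. The cause is the default namespace (xmlns="urn:schemas-microsoft-com:office:spreadsheet"): every element in the file belongs to it, while the unprefixed names in the XPath expression refer to elements in no namespace.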

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="Sheet1">
    <Table ss:ExpandedColumnCount="81" ss:ExpandedRowCount="687" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="52.5" ss:DefaultRowHeight="15.5625">
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">敲代码的耗子</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Sunny</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
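The effect can be reproduced with a cut-down version of the sample above. This sketch uses the JDK's namespace-aware DOM parser and `javax.xml.xpath` rather than DOM4J, so it is an analogous demonstration and not the article's own code; the class name `NamespaceInterferenceSketch`, the helper `countMatches`, and the shortened XML string are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: shows a default namespace hiding elements from plain XPath names.
public class NamespaceInterferenceSketch {

    // Shortened form of the article's sample: one row, one cell.
    static final String XML =
            "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'>"
          + "<Worksheet><Table><Row><Cell><Data>Sunny</Data></Cell></Row></Table></Worksheet>"
          + "</Workbook>";

    // Parses XML with namespace awareness on and counts the nodes the expression selects.
    public static int countMatches(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names look for elements in no namespace, so nothing matches.
        System.out.println(countMatches("//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]"));
        // Matching on local-name() ignores the namespace and finds the element.
        System.out.println(countMatches("//*[local-name()='Data']"));
    }
}
```

The first query selects no nodes and the second selects one, even though both target the same `Data` element.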

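3.2 Parsing Namespaced XML Files with XPath

Method 1 (the read1() function): use XPath's built-in local-name() and namespace-uri() functions to specify the node name and namespace you want. The XPath expressions are fairly tedious to write.

Method 2 (the read2() function): set the namespace on the XPath object with setNamespaceURIs().

Method 3 (the read3() function): set the namespace on the DocumentFactory with setXPathNamespaceURIs(). With methods 2 and 3, the XPath expressions are comparatively simple to write.

Method 4 (the read4() function): the same as method 3 but with a different XPath expression (see the code). Its purpose is to check whether differences in the XPath expression, mainly how complete the path is, affect query performance.

(All four methods above parse the XML file with DOM4J combined with XPath.)

Method 5 (the read5() function): parse the XML file with DOM combined with XPath, mainly to compare performance.

Nothing makes the point better than code, so here it is: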

package XPath;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Parses XML with XPath, using DOM4J (read1 to read4) and DOM (read5).
 */
public class TestDom4jXpath {

    public static void main(String[] args) {
        read1();
        read2();
        read3();
        read4(); // same as read3(), but with a different XPath expression
        read5();
    }

    /*
     * Use local-name() and namespace-uri() in XPath.
     */
    public static void read1() {
        try {
            long startTime = System.currentTimeMillis();
            SAXReader reader = new SAXReader();
            // Class-loader resource paths use '/' as the separator on every platform.
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            // Fuller alternative that also checks the namespace URI:
            // String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
            //         + "/*[local-name()='Worksheet']"
            //         + "/*[local-name()='Table']"
            //         + "/*[local-name()='Row'][4]"
            //         + "/*[local-name()='Cell'][3]"
            //         + "/*[local-name()='Data'][1]";
            String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
            System.err.println("===== use local-name() and namespace-uri() in XPath ====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /*
     * Set the namespace on the XPath object (setNamespaceURIs).
     */
    public static void read2() {
        try {
            long startTime = System.currentTimeMillis();
            // Maps the prefix used in the XPath expression to the namespace URI.
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setNamespaceURIs() to set xpath namespace ====");
            System.err.println("XPath: " + xpath);
            XPath x = doc.createXPath(xpath);
            x.setNamespaceURIs(map);
            List<?> list = x.selectNodes(doc);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /*
     * Set the namespace on the DocumentFactory (setXPathNamespaceURIs).
     */
    public static void read3() {
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            // Register the prefix before reading, so the document's XPath queries pick it up.
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set DocumentFactory() namespace ====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /*
     * Same as read3(), but with a more complete XPath expression.
     */
    public static void read4() {
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<String, String>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("===== use setXPathNamespaceURIs() to set DocumentFactory() namespace ====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                String show = e.getStringValue();
                System.out.println("show = " + show);
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    /*
     * DOM and XPath.
     */
    public static void read5() {
        try {
            long startTime = System.currentTimeMillis();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Namespace awareness is off, so the unprefixed XPath below can match.
            dbf.setNamespaceAware(false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            org.w3c.dom.Document doc = builder.parse(in);
            XPathFactory factory = XPathFactory.newInstance();
            javax.xml.xpath.XPath x = factory.newXPath();
            // Select the first Data element in the third Cell of the fourth Row.
            String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
            System.err.println("===== Dom XPath ====");
            System.err.println("XPath: " + xpath);
            XPathExpression expr = x.compile(xpath);
            // NODESET is required for the NodeList cast; NODE would return a single Node.
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getNodeValue() is null for elements; getTextContent() returns the text.
                System.out.println("show = " + nodes.item(i).getTextContent());
                long endTime = System.currentTimeMillis();
                System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
            }
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
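read5() sidesteps the namespace by switching namespace awareness off. If the DOM parser has to stay namespace-aware, the JDK offers a rough counterpart to DOM4J's setNamespaceURIs(): binding a prefix through a `NamespaceContext`. The following is a sketch under stated assumptions, not part of the original program: the class name `NamespaceContextSketch`, the prefix `ss`, the helper `selectText`, and the shortened two-row XML (with placeholder text in the first row) are all chosen for illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: namespace-aware DOM plus XPath with a bound prefix.
public class NamespaceContextSketch {

    static final String NS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Shortened two-row document modelled on the article's sample.
    static final String XML =
            "<Workbook xmlns='" + NS + "'><Worksheet><Table>"
          + "<Row><Cell><Data>Mouse</Data></Cell></Row>"
          + "<Row><Cell><Data>Sunny</Data></Cell></Row>"
          + "</Table></Worksheet></Workbook>";

    // Binds the single prefix "ss" to the spreadsheet namespace.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "ss".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return NS.equals(uri) ? "ss" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return NS.equals(uri)
                    ? Collections.singletonList("ss").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Returns the text of the first node the expression selects, or "" if none.
    public static String selectText(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectText("//ss:Row[2]/ss:Cell[1]/ss:Data[1]"));
    }
}
```

The prefix in the expression only has to match the one bound in the context; it does not need to match any prefix used in the XML file itself, which is also why the DOM4J examples can use `Workbook` as a prefix.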
