Frequently asked questions about getting started with XML (3)-XML/RSS Tutorial-php.cn

黄舟

Dec 22, 2016 pm 05:38 PM

How to load documents with foreign and special characters?

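Documents can contain foreign (non-ASCII) characters, for example:

    <test>foreign characters (úóíá)</test>

An XML parser assumes UTF-8 (or UTF-16) unless the document says otherwise, so characters such as úóíá cause a load error when the file was saved in a different encoding. Either save the document as UTF-8 or declare the actual encoding in the XML declaration, as shown below:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <test>foreign characters (úóíá)</test>

The XML now loads correctly.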
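As a rough illustration of the encoding rule, the sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in parser (an assumption; the FAQ itself is about MSXML). The element name `test`, the ISO-8859-1 encoding, and the helper `try_parse` are hypothetical choices made for the example.

```python
import xml.etree.ElementTree as ET

TEXT = "foreign characters (úóíá)"


def try_parse(data: bytes):
    """Return the root element's text, or None if the document fails to load."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# The same document saved as ISO-8859-1, without and with an encoding declaration.
latin1_body = f"<test>{TEXT}</test>".encode("iso-8859-1")
undeclared = try_parse(latin1_body)
declared = try_parse(b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_body)

# Saved as UTF-8, no declaration is needed because UTF-8 is a default encoding.
utf8 = try_parse(f"<test>{TEXT}</test>".encode("utf-8"))

print(undeclared, declared, utf8, sep="\n")
```

Only the undeclared ISO-8859-1 variant fails to load; the other two return the original text.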

Other characters are reserved in XML and need to be handled differently. The following XML:

    <foo>This & that</foo>

produces the following error:

    Whitespace is not allowed at this location.
    Line 0000001: <foo>This & that</foo>
    Position 0000012: ----------^

The ampersand is part of XML's syntax, so a bare & placed in an XML data source is not interpreted as a literal ampersand. Replace it with a special character sequence called an "entity":

    <foo>This &amp; that</foo>

The following characters require the corresponding entities:

    <    &lt;
    &    &amp;
    >    &gt;
    "    &quot;
    '    &apos;

Quote characters serve as attribute delimiters and therefore generally cannot be used inside an attribute value. For example, the following returns an error:

    <foo description='John's Stuff'/>

The single quote here is used both as the attribute delimiter and as an apostrophe within the attribute value itself. To correct this, change the attribute delimiter to double quotes:

    <foo description="John's Stuff"/>

Or escape the single quote as the entity &apos;:

    <foo description='John&apos;s Stuff'/>

Both forms return the attribute value John's Stuff through the getAttribute method in the XML object model. Likewise, a double quote can be written as the entity &quot;.
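The escaping rules can be tried out with a minimal sketch like the following. It uses Python's standard library rather than MSXML (an assumption), and the element name `foo`, the attribute name `description`, and the helper `loads` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def loads(xml_text: str) -> bool:
    """Report whether the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "This & that"
bare_ok = loads(f"<foo>{raw}</foo>")             # bare ampersand in content
escaped_ok = loads(f"<foo>{escape(raw)}</foo>")  # ampersand written as &amp;

# quoteattr picks a delimiter that does not clash with the value.
attr = quoteattr("John's Stuff")
via_double_quotes = ET.fromstring(f"<foo description={attr}/>").get("description")
via_entity = ET.fromstring("<foo description='John&apos;s Stuff'/>").get("description")

print(bare_ok, escaped_ok, attr, via_double_quotes, via_entity)
```

Both attribute forms yield the same value once parsed, which is the point the text makes about getAttribute.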

You can also handle special characters in element content by placing the text in a CDATA section. The following is correct:

    <xml><![CDATA[This & that is just "text" content.]]></xml>

In this example, the XML object model exposes the CDATA node as a child of the xml node, and that node returns the string

    This & that is just "text" content.

as its nodeValue.
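A small sketch of the same idea, assuming Python's `xml.dom.minidom` as a stand-in for the MSXML object model: the CDATA section shows up as a child node of the element, and its nodeValue is the raw text.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<xml><![CDATA[This & that is just "text" content.]]></xml>')
element = doc.documentElement
cdata = element.firstChild

# The child is a CDATA section node, and the element itself has no nodeValue.
is_cdata = cdata.nodeType == Node.CDATA_SECTION_NODE
print(is_cdata)
print(cdata.nodeValue)
print(element.nodeValue)
```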

How to use the MSXML COM component in Visual C++ 6.0?

In Visual C++ 6.0, the easiest way to use the MSXML COM component is the #import directive:

    #import "msxml.dll" named_guids no_namespace

This defines all the IXML* interfaces and interface IDs so that you can use them in your application. The MSXML type library and header files are also available from the INETSDK, along with uuid.lib, which contains the class IDs and IIDs.

How to use HTML entities in XML?

The following XML contains an HTML entity:

    <copyright>Copyright &copy; 2000, Microsoft Inc, All rights reserved.</copyright>

It generates the following error:

    Reference to undefined entity 'copy'.
    Line: 1, Position: 23, ErrorCode: 0xC00CE002
    <copyright>Copyright &copy; 2000, ...
    ---------------------^

This is because XML has only five built-in entities. For details about the built-in entities, see "How to load documents with foreign and special characters?" To use HTML entities, you must define them with a DTD. For details on DTDs, see the W3C XML Recommendation. A DTD that defines the HTML entities is referenced from the DOCTYPE tag as follows, where "..." stands for the location of that DTD:

    <!DOCTYPE copyright SYSTEM "...">
    <copyright>Copyright &copy; 2000, Microsoft Inc, All rights reserved.</copyright>

To load this document, turn off the validateOnParse property of the IXMLDOMDocument interface. Try pasting it into the Validator test page, turn off DTD validation, and then click Validate. Notice that the document loads and the copyright character appears in the DOM tree at the bottom of the Validator page.

If DTD validation is already in use, include the HTML entities in your existing DTD as a parameter entity, as follows:

    <!ENTITY % HTMLENT SYSTEM "...">
    %HTMLENT;

This defines all the HTML entities so that you can use them in your XML document.
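The effect of defining an entity can be sketched in Python. Note the swap: instead of an external HTML entity DTD, this example declares a single `copy` entity inline in the internal DTD subset, and it uses `xml.etree.ElementTree` rather than MSXML. The element name `copyright` and the helper `load_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

BODY = "<copyright>Copyright &copy; 2000</copyright>"


def load_text(xml_text: str) -> str:
    """Return the root element's text, or an 'error: ...' string on failure."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return f"error: {err}"


# Without any declaration, &copy; is an undefined entity.
without_dtd = load_text(BODY)

# Inline stand-in for the external HTML entity DTD: declare just &copy;.
DOCTYPE = '<!DOCTYPE copyright [<!ENTITY copy "&#169;">]>'
with_dtd = load_text(DOCTYPE + BODY)

print(without_dtd)
print(with_dtd)
```

A full HTML entity set would declare every entity this way, which is what a shared DTD pulled in through a parameter entity provides.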

How to deal with whitespace characters in element content?

The XML DOM has three ways of accessing the textual content of elements:

    Property     Behavior
    nodeValue    Returns the original text content (including whitespace) on TEXT, CDATA, COMMENT and PI nodes, as specified in the original XML source. Returns null on ELEMENT nodes and on the DOCUMENT itself.
    data         Same as nodeValue.
    text         Recursively concatenates the TEXT and CDATA nodes in the specified subtree and returns the combined result.

Note: Whitespace characters include new lines, tabs and spaces.

The nodeValue property returns the content as it appears in the original document, regardless of how the document was loaded and of the current xml:space scope.

The text property concatenates all text in the specified subtree and expands entities. Its result depends on how the document was loaded, the current state of the preserveWhiteSpace switch, and the current xml:space scope, as follows:

preserveWhiteSpace = true when the document is loaded:

    Current preserveWhiteSpace    xml:space    Value of text
    true                          preserve     preserved
    true                          default      preserved
    false                         preserve     preserved
    false                         default      preserved and trimmed

preserveWhiteSpace = false when the document is loaded:

    Current preserveWhiteSpace    xml:space    Value of text
    true                          preserve     half-preserved
    true                          default      half-preserved and trimmed
    false                         preserve     half-preserved
    false                         default      half-preserved and trimmed

Here, "preserved" means the text content is exactly as it appears in the original XML document, "trimmed" means that leading and trailing whitespace has been removed, and "half-preserved" means that significant whitespace is preserved while insignificant whitespace is normalized. Significant whitespace is whitespace inside text content. Insignificant whitespace is whitespace between tags, as in the following (\n and \t denote newline and tab characters):

    <name>\n
    \t<first> Jane</first>\n
    \t<last>Smith </last>\n
    </name>

In this example, the \n and \t characters between tags are insignificant whitespace that can be ignored, while the space before Jane and the space after Smith are significant because they are part of the text content. So in this example, the text property returns the following:

    State                         Returned value
    preserved                     "\n\t Jane\n\tSmith \n"
    preserved and trimmed         "Jane\n\tSmith"
    half-preserved                " Jane Smith "
    half-preserved and trimmed    "Jane Smith"

Note that "half-preserved" normalizes insignificant whitespace; for example, newline and tab characters are reduced to a single space. If you change the xml:space attribute or the preserveWhiteSpace switch, the text property returns correspondingly different values.
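The four states can be approximated in Python to make the distinction concrete. This is only a sketch under stated assumptions: `xml.dom.minidom` has no preserveWhiteSpace switch or text property, the sample document (elements `name`, `first`, `last`) is assumed, and the whitespace-collapsing regular expression is a simplification that reproduces the values for this input, not MSXML's actual algorithm.

```python
import re
from xml.dom import minidom, Node

SOURCE = "<name>\n\t<first> Jane</first>\n\t<last>Smith </last>\n</name>"


def text_chunks(node):
    """Yield the data of every TEXT and CDATA node below node, in document order."""
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            yield child.data
        else:
            yield from text_chunks(child)


root = minidom.parseString(SOURCE).documentElement
chunks = list(text_chunks(root))

# Whitespace-only chunks sit between tags: the insignificant whitespace.
insignificant = [c for c in chunks if not c.strip()]

preserved = "".join(chunks)
preserved_trimmed = preserved.strip()
# Approximation only: collapse every whitespace run to a single space.
half_preserved = re.sub(r"\s+", " ", preserved)
half_preserved_trimmed = half_preserved.strip()

for value in (preserved, preserved_trimmed, half_preserved, half_preserved_trimmed):
    print(repr(value))
```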

CDATA and xml:space="preserve" subtree boundaries

In the example below, the contents of CDATA nodes or xml:space="preserve" nodes are concatenated as they are, because they do not participate in the normalization of insignificant whitespace (\n and \t denote newline and tab characters). For example:

    <name>\n
    \t<first> Jane </first>\n
    \t<last><![CDATA[ Smith ]]></last>\n
    </name>

In this case, the whitespace inside the CDATA node is no longer merged with the insignificant whitespace and is not trimmed. So the "half-preserved and trimmed" case returns the following:

    "Jane  Smith "

Here, the insignificant whitespace between the </first> and <last> tags is included, regardless of the contents of the CDATA node. The same result is returned if you replace the CDATA section with the following:

    <last xml:space="preserve"> Smith </last>

Entities are special

Entities are loaded and parsed as part of the DTD and appear under the DOCTYPE node. They do not necessarily have any xml:space scope. For example (\n and \t denote newline and tab characters):

    <!DOCTYPE foo [
    <!ENTITY Jane "<name> Jane </name>\n
    \t<title>Software Design Engineer</title>\n">
    ]>
    <foo xml:space="preserve">&Jane;</foo>

Assuming preserveWhiteSpace=false (in the scope of the DOCTYPE tag), insignificant whitespace is lost when the entity is parsed. The entity will not have WHITESPACE nodes. The tree looks like this:

    DOCTYPE foo
        ENTITY: Jane
            ELEMENT: name
                TEXT: Jane
            ELEMENT: title
                TEXT: Software Design Engineer
    ELEMENT: foo
        ATTRIBUTE: xml:space="preserve"
        ENTITYREF: Jane

Note that the DOM tree exposed under the ENTITY node inside the DOCTYPE does not contain any WHITESPACE nodes. This means that the children of the ENTITYREF node do not have WHITESPACE nodes either, even though the entity reference is in the scope of xml:space="preserve". Every instance of an entity referenced in a given document has the same tree. If an entity absolutely must preserve whitespace, it must specify its own xml:space attribute inside itself, or the document's preserveWhiteSpace switch must be set to true.
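The point that every reference to an entity yields the same subtree can be sketched in Python. This is an illustration under assumptions: `xml.etree.ElementTree` expands entity references in place and has neither ENTITYREF nodes nor a preserveWhiteSpace switch, so it does not reproduce MSXML's whitespace handling. The entity content shown here is a simplified, hypothetical version of the example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE foo [
<!ENTITY Jane "<name> Jane </name><title>Software Design Engineer</title>">
]>
<foo>&Jane;&Jane;</foo>"""

root = ET.fromstring(DOCUMENT)

# Each &Jane; reference expands to the same pair of elements.
tags = [child.tag for child in root]
names = [n.text for n in root.iter("name")]
titles = [t.text for t in root.iter("title")]

print(tags)
print(names)
print(titles)
```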

How to deal with whitespace characters in attributes?

There are several ways to access an attribute value. The IXMLDOMAttribute interface has a nodeValue property, which is equivalent to the value property, along with the Microsoft extensions nodeTypedValue and text. These properties return the following:

    Property                                                    Text returned
    attrNode.nodeValue, attrNode.value, getAttribute("name")    Exactly the content of the original document (with entities expanded).
    attrNode.nodeTypedValue                                     Null
    attrNode.text                                               Same as nodeValue, except that leading and trailing whitespace is trimmed.

The XML language specification defines the following behavior for XML applications:

    Attribute type                                                Text returned
    CDATA                                                         Half-normalized
    ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, enumeration    Fully normalized

Here, half-normalized means that new lines and tab characters are converted to spaces, but multiple spaces are not collapsed into one space. Fully normalized additionally removes leading and trailing spaces and collapses runs of spaces into a single space.
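Attribute-value normalization can be observed with a short Python sketch. It assumes `xml.etree.ElementTree` (backed by the expat parser) instead of MSXML, and the attribute names `note` and `refs` are hypothetical: `note` is undeclared and therefore treated as CDATA, while `refs` is declared as IDREFS in an internal DTD subset.

```python
import xml.etree.ElementTree as ET

DOCUMENT = (
    "<!DOCTYPE foo [\n"
    "<!ATTLIST foo refs IDREFS #IMPLIED>\n"
    "]>\n"
    '<foo note="one\ttwo\nthree  four" refs="  a   b  "/>'
)

root = ET.fromstring(DOCUMENT)

# CDATA attribute: tab and newline become spaces, double space is kept.
note = root.get("note")
# Tokenized attribute: also trimmed, and runs of spaces collapsed.
refs = root.get("refs")

print(repr(note))
print(repr(refs))
```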
