Document object model


- pugixml 0.9 manual
+ pugixml 1.0 manual
@@ -46,10 +47,10 @@

pugixml stores XML data in DOM-like way: the entire XML document (both document structure and element data) is stored in memory as a tree. The tree can be - loaded from character stream (file, string, C++ I/O stream), then traversed - via special API or XPath expressions. The whole tree is mutable: both node - structure and node/attribute data can be changed at any time. Finally, the - result of document transformations can be saved to a character stream (file, + loaded from a character stream (file, string, C++ I/O stream), then traversed + with the special API or XPath expressions. The whole tree is mutable: both + node structure and node/attribute data can be changed at any time. Finally, + the result of document transformations can be saved to a character stream (file, C++ I/O stream or custom transport).

@@ -58,12 +59,11 @@

The XML document is represented with a tree data structure. The root of the - tree is the document itself, which corresponds to C++ type xml_document. Document has one or more - child nodes, which correspond to C++ type xml_node. - Nodes have different types; depending on a type, a node can have a collection - of child nodes, a collection of attributes, which correspond to C++ type - xml_attribute, and some additional - data (i.e. name). + tree is the document itself, which corresponds to C++ type xml_document. + Document has one or more child nodes, which correspond to C++ type xml_node. Nodes have different types; depending + on a type, a node can have a collection of child nodes, a collection of attributes, + which correspond to C++ type xml_attribute, + and some additional data (i.e. name).

The tree nodes can be of one of the following types (which together form @@ -73,13 +73,13 @@

Document node (node_document) - this is the root of the tree, which consists of several child nodes. This - node corresponds to xml_document - class; note that xml_document - is a sub-class of xml_node, - so the entire node interface is also available. However, document node - is special in several ways, which will be covered below. There can be - only one document node in the tree; document node does not have any XML - representation.

+ node corresponds to xml_document + class; note that xml_document is + a sub-class of xml_node, so the entire + node interface is also available. However, document node is special in + several ways, which are covered below. There can be only one document + node in the tree; document node does not have any XML representation. +

@@ -87,13 +87,13 @@ is the most common type of node, which represents XML elements. Element nodes have a name, a collection of attributes and a collection of child nodes (both of which may be empty). The attribute is a simple name/value - pair. The example XML representation of element node is as follows: + pair. The example XML representation of element nodes is as follows:

<node attr="value"><child/></node>

- There are two element nodes here; one has name "node", + There are two element nodes here: one has name "node", single attribute "attr" and single child "child", another has name "child" @@ -102,10 +102,10 @@
Plain character data nodes (node_pcdata) represent plain text in XML. PCDATA nodes have a value, but do not have - name or children/attributes. Note that plain character data is not a - part of the element node but instead has its own node; for example, an - element node can have several child PCDATA nodes. The example XML representation - of text node is as follows: + a name or children/attributes. Note that plain character data is not + a part of the element node but instead has its own node; for example, + an element node can have several child PCDATA nodes. The example XML + representation of text nodes is as follows:
<node> text1 <child/> text2 </node>
 
@@ -128,9 +128,9 @@

Comment nodes (node_comment) represent - comments in XML. Comment nodes have a value, but do not have name or - children/attributes. The example XML representation of comment node is - as follows: + comments in XML. Comment nodes have a value, but do not have a name or + children/attributes. The example XML representation of a comment node + is as follows:

<!-- comment text -->

@@ -138,14 +138,14 @@ Here the comment node has value "comment text". By default comment nodes are treated as non-essential part of XML markup and are not loaded during XML parsing. You can override - this behavior by adding parse_comments + this behavior with parse_comments flag.

Processing instruction node (node_pi) represent processing instructions (PI) in XML. PI nodes have a name and an optional value, but do not have children/attributes. The example XML representation - of PI node is as follows: + of a PI node is as follows:

<?name value?>

@@ -153,17 +153,17 @@ Here the name (also called PI target) is "name", and the value is "value". By default PI nodes are treated as non-essential part of XML markup and - are not loaded during XML parsing. You can override this behavior by adding - parse_pi flag. + are not loaded during XML parsing. You can override this behavior with + parse_pi flag.

Declaration node (node_declaration) represents document declarations in XML. Declaration nodes have a name ("xml") and an - optional collection of attributes, but does not have value or children. + optional collection of attributes, but do not have value or children. There can be only one declaration node in a document; moreover, it should be the topmost node (its parent should be the document). The example - XML representation of declaration node is as follows: + XML representation of a declaration node is as follows:

<?xml version="1.0"?>

@@ -172,12 +172,28 @@ and a single attribute with name "version" and value "1.0". By default declaration nodes are treated as non-essential part of XML markup - and are not loaded during XML parsing. You can override this behavior by - adding parse_declaration - flag. Also, by default a dummy declaration is output when XML document - is saved unless there is already a declaration in the document; you can - disable this by adding format_no_declaration - flag. + and are not loaded during XML parsing. You can override this behavior with + parse_declaration flag. Also, + by default a dummy declaration is output when XML document is saved unless + there is already a declaration in the document; you can disable this with + format_no_declaration flag. +

+ Document type declaration node (node_doctype) + represents document type declarations in XML. Document type declaration + nodes have a value, which corresponds to the entire document type contents; + no additional nodes are created for inner elements like <!ENTITY>. There can be only one document type + declaration node in a document; moreover, it should be the topmost node + (its parent should be the document). The example XML representation of + a document type declaration node is as follows: +

<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>
+

+ Here the node has value "greeting [ <!ELEMENT + greeting (#PCDATA)> ]". By default document type + declaration nodes are treated as non-essential part of XML markup and are + not loaded during XML parsing. You can override this behavior with parse_doctype flag.

Finally, here is a complete example of XML document and the corresponding @@ -227,40 +243,45 @@

Note

- All pugixml classes and functions are located in pugi + All pugixml classes and functions are located in the pugi namespace; you have to either use explicit name qualification (i.e. pugi::xml_node), or to gain access to relevant symbols via using directive (i.e. using pugi::xml_node; or using - namespace pugi;). The namespace will be omitted from declarations - in this documentation hereafter; all code examples will use fully-qualified - names. + namespace pugi;). The namespace will be omitted from all + declarations in this documentation hereafter; all code examples will use + fully qualified names.

@@ -352,12 +382,14 @@

There are two choices of interface and internal representation when configuring pugixml: you can either choose the UTF-8 (also called char) interface or - UTF-16/32 (also called wchar_t) one. The choice is controlled via PUGIXML_WCHAR_MODE define; you can set - it via pugiconfig.hpp or via preprocessor options, as discussed in Additional configuration - options. - If this define is set, the wchar_t interface is used; otherwise (by default) - the char interface is used. The exact wide character encoding is assumed - to be either UTF-16 or UTF-32 and is determined based on size of wchar_t type. + UTF-16/32 (also called wchar_t) one. The choice is controlled via PUGIXML_WCHAR_MODE + define; you can set it via pugiconfig.hpp or via preprocessor options, as + discussed in Additional configuration + options. If this define is set, the wchar_t + interface is used; otherwise (by default) the char interface is used. The + exact wide character encoding is assumed to be either UTF-16 or UTF-32 and + is determined based on the size of wchar_t + type.

@@ -365,9 +397,9 @@

Note
- If size of `wchar_t` is 2, pugixml - assumes UTF-16 encoding instead of UCS-2, which means that some characters - are represented as two code points. + If the size of `wchar_t` is + 2, pugixml assumes UTF-16 encoding instead of UCS-2, which means that some + characters are represented as two code points.
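The difference between UTF-16 and UCS-2 that this note refers to can be shown without pugixml. The sketch below is a standalone illustration (not pugixml code) of how a character outside the Basic Multilingual Plane takes two 16-bit units, a surrogate pair, in UTF-16; the function name `encode_utf16` is an assumption made for the example.

```cpp
#include <string>

// Standalone illustration: encodes one Unicode code point as UTF-16.
// Assumes a valid code point (not a surrogate, not above 0x10FFFF);
// no validation is performed.
std::u16string encode_utf16(char32_t code_point)
{
    std::u16string units;
    if (code_point < 0x10000)
    {
        // Basic Multilingual Plane: a single 16-bit unit, same as UCS-2.
        units.push_back(static_cast<char16_t>(code_point));
    }
    else
    {
        // Supplementary planes: split the 20-bit offset into a
        // high and a low surrogate.
        char32_t offset = code_point - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

For example, U+0041 encodes to one unit, while U+1F600 encodes to the pair 0xD83D, 0xDE00. Code that indexes a `wchar_t` string on a platform with 2-byte `wchar_t` therefore cannot assume one element per character.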

@@ -399,7 +431,7 @@ pugi::char_t upon document saving happen automatically, which also carries minor performance penalty. The general advice however is to select the character mode based on usage scenario, i.e. if UTF-8 is - inconvenient to process and most of your XML data is localized, wchar_t mode + inconvenient to process and most of your XML data is non-ASCII, wchar_t mode is probably a better choice.

@@ -410,13 +442,18 @@ std::wstring as_wide(const char* str);

- Both functions accept null-terminated string as an argument str, and return the converted string. + Both functions accept a null-terminated string as an argument str, and return the converted string. as_utf8 performs conversion from UTF-16/32 to UTF-8; as_wide performs conversion from UTF-8 to UTF-16/32. Invalid UTF sequences are silently discarded upon conversion. str has to be a valid string; passing null pointer results in undefined behavior. + There are also two overloads with the same semantics which accept a string + as an argument:

std::string as_utf8(const std::wstring& str);
+std::wstring as_wide(const std::string& str);
+

@@ -425,8 +462,8 @@

Most examples in this documentation assume char interface and therefore - will not compile with PUGIXML_WCHAR_MODE. - This is to simplify the documentation; usually the only changes you'll + will not compile with PUGIXML_WCHAR_MODE. + This is done to simplify the documentation; usually the only changes you'll have to make is to pass wchar_t string literals, i.e. instead of

@@ -453,7 +490,7 @@

- it is safe to call free functions from multiple threads + it is safe to call free (non-member) functions from multiple threads
it is safe to perform concurrent read-only accesses to the same tree @@ -470,7 +507,7 @@ structure and altering individual node/attribute data, i.e. changing names/values.

- The only exception is set_memory_management_functions; + The only exception is set_memory_management_functions; it modifies global variables and as such is not thread-safe. Its usage policy has more restrictions, see Custom memory allocation/deallocation functions. @@ -488,15 +525,16 @@ This is not applicable to functions that operate on STL strings or IOstreams; such functions have either strong guarantee (functions that operate on strings) or basic guarantee (functions that operate on streams). Also functions that - call user-defined callbacks (i.e. xml_node::traverse - or xml_node::find_node) do not provide any exception - guarantees beyond the ones provided by callback. + call user-defined callbacks (i.e. xml_node::traverse + or xml_node::find_node) do not + provide any exception guarantees beyond the ones provided by the callback.

- XPath functions may throw xpath_exception - on parsing error; also, XPath implementation uses STL, and thus may throw - i.e. std::bad_alloc in low memory conditions. Still, - XPath functions provide strong exception guarantee. + If exception handling is not disabled with PUGIXML_NO_EXCEPTIONS + define, XPath functions may throw xpath_exception + on parsing errors; also, XPath functions may throw std::bad_alloc + in low memory conditions. Still, XPath functions provide strong exception + guarantee.

@@ -514,10 +552,10 @@ functions

- All memory for tree structure/data is allocated via globally specified - functions, which default to malloc/free. You can set your own allocation - functions with set_memory_management functions. The function interfaces - are the same as that of malloc/free: + All memory for tree structure, tree data and XPath objects is allocated + via globally specified functions, which default to malloc/free. You can + set your own allocation functions with set_memory_management function. + The function interfaces are the same as that of malloc/free:

typedef void* (*allocation_function)(size_t size);
 typedef void (*deallocation_function)(void* ptr);
@@ -532,14 +570,18 @@

Allocation function is called with the size (in bytes) as an argument and - should return a pointer to memory block with alignment that is suitable - for pointer storage and size that is greater or equal to the requested - one. If the allocation fails, the function has to return null pointer (throwing - an exception from allocation function results in undefined behavior). Deallocation - function is called with the pointer that was returned by the previous call - or with a null pointer; null pointer deallocation should be handled as - a no-op. If memory management functions are not thread-safe, library thread - safety is not guaranteed. + should return a pointer to a memory block with alignment that is suitable + for storage of primitive types (usually a maximum of void* and double + types alignment is sufficient) and size that is greater than or equal to + the requested one. If the allocation fails, the function has to return + null pointer (throwing an exception from allocation function results in + undefined behavior). +

+ Deallocation function is called with the pointer that was returned by some + call to allocation function; it is never called with a null pointer. If + memory management functions are not thread-safe, library thread safety + is not guaranteed.
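A pair of functions matching the interfaces above can be sketched without reference to the shipped sample. The example below is a standalone illustration that wraps malloc/free and counts live blocks; the names `counting_allocate`, `counting_deallocate` and `live_blocks` are assumptions made for the example, and the counter is deliberately not thread-safe. Such a pair would be handed to set_memory_management_functions before any documents are created.

```cpp
#include <cstddef>
#include <cstdlib>

// Illustration only: number of blocks currently allocated through
// counting_allocate. Not thread-safe.
static std::size_t g_live_blocks = 0;

// Matches the allocation_function interface: returns a block of at least
// `size` bytes, or a null pointer on failure (it must not throw).
void* counting_allocate(std::size_t size)
{
    void* ptr = std::malloc(size);
    if (ptr) ++g_live_blocks;
    return ptr;
}

// Matches the deallocation_function interface: releases a block previously
// returned by counting_allocate.
void counting_deallocate(void* ptr)
{
    if (ptr) --g_live_blocks;
    std::free(ptr);
}

// Reports how many blocks are still outstanding.
std::size_t live_blocks()
{
    return g_live_blocks;
}
```

Because std::malloc returns memory aligned for any fundamental type, the alignment requirement described above is met. A counter like this can help check that all documents were destroyed before the functions are swapped again.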

This is a simple example of custom memory management (samples/custom_memory_management.cpp): @@ -572,16 +614,6 @@ are destroyed, the new deallocation function will be called with the memory obtained by the old allocation function, resulting in undefined behavior.

- Note
- Currently memory for XPath objects is allocated using default operators
- new/delete; this will change in the next version.

@@ -590,7 +622,7 @@

Constructing a document object using the default constructor does not result - in any allocations; document node is stored inside the xml_document + in any allocations; document node is stored inside the xml_document object.

@@ -598,11 +630,11 @@ function is used (see Loading document from memory), a complete copy of character stream is made; all names/values of nodes and attributes are allocated in this buffer. This buffer is allocated via a single large allocation - and is only freed when document memory is reclaimed (i.e. if the xml_document object is destroyed or if - another document is loaded in the same object). Also when loading from - file or stream, an additional large allocation may be performed if encoding - conversion is required; a temporary buffer is allocated, and it is freed - before load function returns. + and is only freed when document memory is reclaimed (i.e. if the xml_document object is destroyed or if another + document is loaded in the same object). Also when loading from file or + stream, an additional large allocation may be performed if encoding conversion + is required; a temporary buffer is allocated, and it is freed before load + function returns.

All additional memory, such as memory for document structure (node/attribute @@ -632,7 +664,8 @@
