From f9a2dec792d9a52e1b9004793cfca9b0a463049a Mon Sep 17 00:00:00 2001 From: "arseny.kapoulkine" Date: Sun, 11 Jul 2010 16:27:23 +0000 Subject: docs: Added generated HTML documentation git-svn-id: http://pugixml.googlecode.com/svn/trunk@596 99668b35-9821-0410-8761-19e4c4f06640 --- docs/manual.html | 191 ++++++++ docs/manual/access.html | 721 +++++++++++++++++++++++++++++ docs/manual/apiref.html | 1151 ++++++++++++++++++++++++++++++++++++++++++++++ docs/manual/changes.html | 574 +++++++++++++++++++++++ docs/manual/dom.html | 649 ++++++++++++++++++++++++++ docs/manual/install.html | 445 ++++++++++++++++++ docs/manual/loading.html | 840 +++++++++++++++++++++++++++++++++ docs/manual/modify.html | 541 ++++++++++++++++++++++ docs/manual/saving.html | 473 +++++++++++++++++++ docs/manual/toc.html | 130 ++++++ docs/manual/xpath.html | 494 ++++++++++++++++++++ docs/quickstart.html | 828 +++++++++++++++++++++++++++++++++ 12 files changed, 7037 insertions(+) create mode 100644 docs/manual.html create mode 100644 docs/manual/access.html create mode 100644 docs/manual/apiref.html create mode 100644 docs/manual/changes.html create mode 100644 docs/manual/dom.html create mode 100644 docs/manual/install.html create mode 100644 docs/manual/loading.html create mode 100644 docs/manual/modify.html create mode 100644 docs/manual/saving.html create mode 100644 docs/manual/toc.html create mode 100644 docs/manual/xpath.html create mode 100644 docs/quickstart.html diff --git a/docs/manual.html b/docs/manual.html new file mode 100644 index 0000000..940b1cc --- /dev/null +++ b/docs/manual.html @@ -0,0 +1,191 @@ + + + +pugixml 0.9 + + + + + + + + + +
pugixml 0.9 manual | + Overview | + Installation | + Document: + Object model · Loading · Accessing · Modifying · Saving | + XPath | + API Reference | + Table of Contents +
Next
+
+
+ + +
+ +

+ pugixml is a light-weight C++ XML processing library. It consists of a DOM-like + interface with rich traversal/modification capabilities, an extremely fast + XML parser which constructs the DOM tree from an XML file/buffer, and an + XPath 1.0 implementation for complex data-driven tree queries. Full Unicode + support is also available, with two Unicode + interface variants and conversions between different Unicode encodings + (which happen automatically during parsing/saving). The library is extremely portable and easy to + integrate and use. pugixml has been developed and maintained since 2006 and has + many users. All code is distributed under the MIT license, making it completely + free to use in both open-source and proprietary applications. +

+

+ pugixml enables very fast, convenient and memory-efficient XML document processing. + However, since pugixml has a DOM parser, it can't process XML documents that + do not fit in memory; also the parser is a non-validating one, so if you + need DTD/Schema validation, the library is not for you. +

+

+ This is the complete manual for pugixml, which describes all features of + the library in detail. If you want to start writing code as quickly as possible, + you are advised to read the quick start guide + first. +

+
+ + + + + +
[Note]Note

+ No documentation is perfect, and this one is no exception. If you encounter a description + that is unclear, please file an issue as described in Feedback. + Also, if you can spare the time for a full proofreading pass, including spelling + and grammar, that would be great! Please send me + an e-mail; as a token of appreciation, your name will be included + in the corresponding section + of this documentation. +

+
+
+ +

+ If you believe you've found a bug in pugixml (bugs include compilation problems + (errors/warnings), crashes, performance degradation and incorrect behavior), + please file an issue via the issue + submission form. Be sure to include the relevant information so that + the bug can be reproduced: the version of pugixml, compiler version and target + architecture, the code that uses pugixml and exhibits the bug, etc. +

+

+ Feature requests can be reported the same way as bugs, so if you're missing + some functionality in pugixml or if the API is rough in some places and you + can suggest an improvement, file an issue. Please note, however, that many + factors are weighed when considering API changes (compatibility with previous + versions, API redundancy, etc.), so features that can be implemented + via a small function without modifying pugixml are generally not accepted. That said, + all rules have exceptions. +

+

+ If you have a contribution to pugixml, such as a build script for some build + system/IDE, or a well-designed set of helper functions, or a binding to some + language other than C++, please file an issue. You can include the relevant + patches as issue attachments. Your contribution has to be distributed under + the terms of a license that's compatible with the pugixml license; for example, GPL/LGPL + licensed code is not accepted. +

+

+ If filing an issue is not possible due to privacy or other concerns, you + can contact the pugixml author by e-mail directly: arseny.kapoulkine@gmail.com. +

+
+
+ +

+ pugixml could not have been developed without help from many people; some of + them are listed in this section. If you've played a part in pugixml development + and you cannot find yourself on this list, I'm truly sorry; please send me an e-mail so I can fix this. +

+

+ Thanks to Kristen Wegner for the pugxml parser, + which was used as a basis for pugixml. +

+

+ Thanks to Neville Franks for contributions + to the pugxml parser. +

+

+ Thanks to Artyom Palvelev for suggesting + a lazy gap contraction approach. +

+

+ Thanks to Vyacheslav Egorov for documentation + proofreading. +

+
+
+ +

+ The pugixml library is distributed under the MIT license: +

+
+

+ Copyright (c) 2006-2010 Arseny Kapoulkine +

+

+ Permission is hereby granted, free of charge, to any person obtaining a + copy of this software and associated documentation files (the "Software"), + to deal in the Software without restriction, including without limitation + the rights to use, copy, modify, merge, publish, distribute, sublicense, + and/or sell copies of the Software, and to permit persons to whom the Software + is furnished to do so, subject to the following conditions: +

+

+ The above copyright notice and this permission notice shall be included + in all copies or substantial portions of the Software. +

+

+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + IN THE SOFTWARE. +

+
+
+
+ + + +

Last revised: July 11, 2010 at 16:10:06 GMT

+
+ + + +
+ + diff --git a/docs/manual/access.html b/docs/manual/access.html new file mode 100644 index 0000000..4581583 --- /dev/null +++ b/docs/manual/access.html @@ -0,0 +1,721 @@ + + + +Accessing document data + + + + + + + + + + + +
+
+
+ + +

+ pugixml features an extensive interface for getting various types of data from + the document and for traversing the document. This section provides documentation + for all such functions that do not modify the tree except for XPath-related + functions; see XPath for XPath reference. As discussed in C++ interface, + there are two types of handles to tree data - xml_node + and xml_attribute. The handles have special + null (empty) values which propagate through various functions and thus are + useful for writing more concise code; see this description + for details. The documentation in this section explicitly states the results + of all functions in case of null inputs. +

+
+ +

+ The internal representation of the document is a tree, where each node has + a list of child nodes (the order of children corresponds to their order in + the XML representation), and additionally element nodes have a list of attributes, + which is also ordered. Several functions are provided to let you + get from one node in the tree to another. These functions roughly correspond + to the internal representation, and thus are usually building blocks for + other methods of traversing (e.g. XPath traversals are based on these functions). +

+
xml_node xml_node::parent() const;
+xml_node xml_node::first_child() const;
+xml_node xml_node::last_child() const;
+xml_node xml_node::next_sibling() const;
+xml_node xml_node::previous_sibling() const;
+
+xml_attribute xml_node::first_attribute() const;
+xml_attribute xml_node::last_attribute() const;
+xml_attribute xml_attribute::next_attribute() const;
+xml_attribute xml_attribute::previous_attribute() const;
+
+

+ parent function returns the + node's parent; all nodes except the document have a non-null parent. first_child and last_child + return the first and last child of the node, respectively; note that only + document nodes and element nodes can have a non-empty child node list. If the node + has no children, both functions return null nodes. next_sibling + and previous_sibling return + the node that's immediately to the right/left of this node in the children + list, respectively - for example, in <a/><b/><c/>, + calling next_sibling for + a handle that points to <b/> + results in a handle pointing to <c/>, + and calling previous_sibling + results in a handle pointing to <a/>. + If the node does not have a next/previous sibling (this happens if it is the last/first + node in the list, respectively), the functions return null nodes. first_attribute, last_attribute, + next_attribute and previous_attribute functions behave the + same way as the corresponding child node functions and allow you to iterate through + the attribute list in the same way. +

+
+ + + + + +
[Note]Note

+ Because of memory consumption reasons, attributes do not have a link to + their parent nodes. Thus there is no xml_attribute::parent() function. +

+

+ Calling any of the functions above on a null handle results in a null handle + - e.g. node.first_child().next_sibling() + returns the second child of node, + or a null handle if node has no children or only one. +
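The null-propagation rule above can be modelled in a few lines. This is only a sketch of the idea, not pugixml code: `Item` and `Handle` are hypothetical stand-ins for the library's internal node data and for `xml_node`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree storage; stands in for the library's internal node data.
struct Item
{
    std::string name;
    std::vector<Item> children;
};

// Hypothetical handle: a null handle has no sibling list to point into.
class Handle
{
public:
    Handle(): siblings_(nullptr), index_(0) {}
    Handle(const std::vector<Item>* siblings, std::size_t index): siblings_(siblings), index_(index) {}

    bool is_null() const { return siblings_ == nullptr; }

    // Accessors on a null handle return an empty string, never a null pointer.
    const char* name() const { return is_null() ? "" : (*siblings_)[index_].name.c_str(); }

    // Navigation on a null handle (or past the end of a list) yields a null handle.
    Handle first_child() const
    {
        if (is_null() || (*siblings_)[index_].children.empty()) return Handle();
        return Handle(&(*siblings_)[index_].children, 0);
    }

    Handle next_sibling() const
    {
        if (is_null() || index_ + 1 >= siblings_->size()) return Handle();
        return Handle(siblings_, index_ + 1);
    }

private:
    const std::vector<Item>* siblings_;
    std::size_t index_;
};
```

Because every navigation call tolerates a null receiver, a chain such as `h.first_child().next_sibling().name()` needs no intermediate checks; only the final result has to be tested.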

+

+ With these functions, you can iterate through all child nodes and display + all attributes like this (samples/traverse_base.cpp): +

+

+ +

+
for (pugi::xml_node tool = tools.first_child(); tool; tool = tool.next_sibling())
+{
+    std::cout << "Tool:";
+
+    for (pugi::xml_attribute attr = tool.first_attribute(); attr; attr = attr.next_attribute())
+    {
+        std::cout << " " << attr.name() << "=" << attr.value();
+    }
+
+    std::cout << std::endl;
+}
+
+

+

+
+
+ +

+ Apart from structural information (parent, child nodes, attributes), nodes + can have name and value, both of which are strings. Depending on node type, + name or value may be absent. node_document + nodes do not have name or value, node_element + and node_declaration nodes + always have a name but never have a value, node_pcdata, + node_cdata and node_comment nodes never have a name but + always have a value (it may be empty though), node_pi + nodes always have a name and a value (again, value may be empty). In order + to get node's name or value, you can use the following functions: +

+
const char_t* xml_node::name() const;
+const char_t* xml_node::value() const;
+
+

+ In case node does not have a name or value or if the node handle is null, + both functions return empty strings - they never return null pointers. +

+

+ It is common to store data as text contents of some node - i.e. <node><description>This is a node</description></node>. + In this case, <description> node does not have a value, but instead + has a child of type node_pcdata + with value "This is a node". + pugixml provides two helper functions to parse such data: +

+
const char_t* xml_node::child_value() const;
+const char_t* xml_node::child_value(const char_t* name) const;
+
+

+ child_value() + returns the value of the first child with type node_pcdata + or node_cdata; child_value(name) is + a simple wrapper for child(name).child_value(). + For the above example, calling node.child_value("description") and description.child_value() both produce the string "This is a node". If there is no + child with a relevant type, or if the handle is null, the child_value + functions return an empty string. +

+

+ There is an example of using some of these functions at + the end of the next section. +

+
+
+ +

+ All attributes have name and value, both of which are strings (value may + be empty). There are two corresponding accessors, like for xml_node: +

+
const char_t* xml_attribute::name() const;
+const char_t* xml_attribute::value() const;
+
+

+ In case attribute handle is null, both functions return empty strings - they + never return null pointers. +

+

+ In many cases attribute values have types that are not strings - i.e. an + attribute may always contain values that should be treated as integers, despite + the fact that they are represented as strings in XML. pugixml provides several + accessors that convert attribute value to some other type. The accessors + are as follows: +

+
int xml_attribute::as_int() const;
+unsigned int xml_attribute::as_uint() const;
+double xml_attribute::as_double() const;
+float xml_attribute::as_float() const;
+bool xml_attribute::as_bool() const;
+
+

+ as_int, as_uint, + as_double and as_float convert attribute values to numbers. + If the attribute handle is null or the attribute value is empty, 0 + is returned. Otherwise, all leading whitespace characters are skipped, + and the remaining string is parsed as a decimal number (as_int + or as_uint) or as a floating + point number in either decimal or scientific form (as_double + or as_float). Any trailing characters + are silently discarded, e.g. as_int + will return 1 for the string "1abc". +

+

+ In case the input string contains a number that is out of the target numeric + range, the result is undefined. +

+
+ + + + + +
[Caution]Caution

+ Number conversion functions depend on current C locale as set with setlocale, so may return unexpected results + if the locale is different from "C". +

+

+ as_bool converts attribute + value to boolean as follows: if attribute handle is null or attribute value + is empty, false is returned. + Otherwise, true is returned + if first character is one of '1', 't', + 'T', 'y', 'Y'. + This means that strings like "true" + and "yes" are recognized + as true, while strings like + "false" and "no" are recognized as false. For more complex matching you'll have + to write your own function. +
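The first-character rule is easy to restate as a standalone function. The sketch below only models the rule as described above; `first_char_is_truthy` is a hypothetical name, and this is not the library's implementation.

```cpp
// Models the documented as_bool rule: only the first character is inspected.
bool first_char_is_truthy(const char* value)
{
    if (!value) return false; // treat a missing value like an empty one

    switch (value[0])
    {
    case '1':
    case 't':
    case 'T':
    case 'y':
    case 'Y':
        return true;
    default:
        return false; // includes the empty string, "false", "no", "on", ...
    }
}
```

Note that under this rule a value such as "on" is false, which is one reason a custom matching function may be needed.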

+
+ + + + + +
[Note]Note

+ There are no portable 64-bit types in C++, so there is no corresponding + conversion function. If your platform has a 64-bit integer, you can easily + write a conversion function yourself. +

+
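As a rough illustration of such a user-written conversion, the helper below parses a 64-bit value from the string returned by `xml_attribute::value()`. It assumes a compiler that provides `long long` and `std::strtoll` (standard since C++11) and that `char_t` is `char`; `to_int64` is a hypothetical name, not part of pugixml.

```cpp
#include <cstdlib>

// Hypothetical helper: parses a decimal 64-bit integer from an attribute value.
// Like as_int, it skips leading whitespace, ignores trailing characters,
// and yields 0 for an empty string.
long long to_int64(const char* value)
{
    if (!value) return 0;

    return std::strtoll(value, nullptr, 10);
}
```

It would be called as `to_int64(attr.value())`; since `value()` never returns a null pointer, the null check is only a safeguard for other callers.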

+ This is an example of using these functions, along with node data retrieval + ones (samples/traverse_base.cpp): +

+

+ +

+
for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
+{
+    std::cout << "Tool " << tool.attribute("Filename").value();
+    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
+    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
+    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
+}
+
+

+

+
+
+ +

+ Since a lot of document traversal consists of finding the node/attribute + with the correct name, there are special functions for that purpose: +

+
xml_node xml_node::child(const char_t* name) const;
+xml_attribute xml_node::attribute(const char_t* name) const;
+xml_node xml_node::next_sibling(const char_t* name) const;
+xml_node xml_node::previous_sibling(const char_t* name) const;
+
+

+ child and attribute + return the first child/attribute with the specified name; next_sibling + and previous_sibling return + the first sibling in the corresponding direction with the specified name. + All string comparisons are case-sensitive. In case the node handle is null + or there is no node/attribute with the specified name, null handle is returned. +

+

+ child and next_sibling + functions can be used together to loop through all child nodes with the desired + name like this: +

+
for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
+
+

+ Occasionally the needed node is specified not by the unique name but instead + by the value of some attribute; for example, it is common to have node collections + with each node having a unique id: <group><item id="1"/> <item id="2"/></group>. There are two functions for finding + child nodes based on the attribute values: +

+
xml_node xml_node::find_child_by_attribute(const char_t* name, const char_t* attr_name, const char_t* attr_value) const;
+xml_node xml_node::find_child_by_attribute(const char_t* attr_name, const char_t* attr_value) const;
+
+

+ The three-argument function returns the first child node with the specified + name which has an attribute with the specified name/value; the two-argument + function skips the name test for the node, which can be useful for searching + in heterogeneous collections. If the node handle is null or if no node is + found, null handle is returned. All string comparisons are case-sensitive. +

+

+ In all of the above functions, all arguments have to be valid strings; passing + null pointers results in undefined behavior. +

+

+ This is an example of using these functions (samples/traverse_base.cpp): +

+

+ +

+
std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";
+
+for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
+{
+    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
+}
+
+

+

+
+
+ +

+ Child node lists and attribute lists are simply double-linked lists; while + you can use previous_sibling/next_sibling and other such functions for + iteration, pugixml additionally provides node and attribute iterators, so + that you can treat nodes as containers of other nodes or attributes: +

+
class xml_node_iterator;
+class xml_attribute_iterator;
+
+typedef xml_node_iterator xml_node::iterator;
+iterator xml_node::begin() const;
+iterator xml_node::end() const;
+
+typedef xml_attribute_iterator xml_node::attribute_iterator;
+attribute_iterator xml_node::attributes_begin() const;
+attribute_iterator xml_node::attributes_end() const;
+
+

+ begin and attributes_begin + return iterators that point to the first node/attribute, respectively; end and attributes_end + return past-the-end iterator for node/attribute list, respectively - this + iterator can't be dereferenced, but decrementing it results in an iterator + pointing to the last element in the list (except for empty lists, where decrementing + past-the-end iterator is not defined). Past-the-end iterator is commonly + used as a termination value for iteration loops (see sample below). If you + want to get an iterator that points to an existing handle, you can construct + the iterator with the handle as a single constructor argument, like so: + xml_node_iterator(node). + For xml_attribute_iterator, + you'll have to provide both an attribute and its parent node. +

+

+ begin and end + return equal iterators if called on null node; such iterators can't be dereferenced. + attributes_begin and attributes_end behave the same way. For + correct iterator usage this means that child node/attribute collections of + null nodes appear to be empty. +

+

+ Both types of iterators have bidirectional iterator semantics (i.e. they + can be incremented and decremented, but efficient random access is not supported) + and support all usual iterator operations - comparison, dereference, etc. + The iterators are invalidated if the node/attribute objects they're pointing + to are removed from the tree; adding nodes/attributes does not invalidate + any iterators. +

+

+ Here is an example of using iterators for document traversal (samples/traverse_iter.cpp): +

+

+ +

+
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
+{
+    std::cout << "Tool:";
+
+    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
+    {
+        std::cout << " " << ait->name() << "=" << ait->value();
+    }
+
+    std::cout << std::endl;
+}
+
+

+

+
+ + + + + +
[Caution]Caution

+ Node and attribute iterators are somewhere in the middle between const + and non-const iterators. While dereference operation yields a non-constant + reference to the object, so that you can use it for tree modification operations, + modifying this reference by assignment - i.e. passing iterators to a function + like std::sort - will not give expected results, + as assignment modifies local handle that's stored in the iterator. +

+
+
+ +

+ The methods described above allow traversal of immediate children of some + node; if you want to do a deep tree traversal, you'll have to do it via a + recursive function or some equivalent method. However, pugixml provides a + helper for depth-first traversal of a subtree. In order to use it, you have + to implement xml_tree_walker + interface and to call traverse + function: +

+
class xml_tree_walker
+{
+public:
+    virtual bool begin(xml_node& node);
+    virtual bool for_each(xml_node& node) = 0;
+    virtual bool end(xml_node& node);
+
+    int depth() const;
+};
+
+bool xml_node::traverse(xml_tree_walker& walker);
+
+

+ The traversal is launched by calling traverse + function on traversal root and proceeds as follows: +

+
    +
  • + First, begin function + is called with traversal root as its argument. +
  • +
  • + Then, for_each function + is called for all nodes in the traversal subtree in depth first order, + excluding the traversal root. Node is passed as an argument. +
  • +
  • + Finally, end function + is called with traversal root as its argument. +
  • +
+

+ If begin, end + or any of the for_each calls + return false, the traversal + is terminated and false is returned + as the traversal result; otherwise, the traversal results in true. Note that you don't have to override + begin or end + functions; their default implementations return true. +

+

+ You can get the node's depth relative to the traversal root at any point + by calling depth function. + It returns -1 + if called from begin/end, and returns 0-based depth if called + from for_each - depth is + 0 for all children of the traversal root, 1 for all grandchildren and so + on. +
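The traversal protocol and depth numbering described above can be sketched on a toy tree. `Node`, `Walker` and `traverse` here are hypothetical stand-ins rather than pugixml types, and the depth is passed as an argument instead of being queried through `depth()`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a document node; not a pugixml type.
struct Node
{
    std::string name;
    std::vector<Node> children;
};

// Hypothetical visitor mirroring the documented begin/for_each/end protocol.
struct Walker
{
    std::vector<std::pair<std::string, int> > visited; // (name, depth) pairs

    virtual ~Walker() {}
    virtual bool begin(const Node&) { return true; }
    virtual bool for_each(const Node& node, int depth)
    {
        visited.push_back(std::make_pair(node.name, depth));
        return true; // continue traversal
    }
    virtual bool end(const Node&) { return true; }
};

// Visits all descendants of 'node' in depth-first order; stops on the first false.
static bool walk_children(const Node& node, Walker& walker, int depth)
{
    for (const Node& child : node.children)
    {
        if (!walker.for_each(child, depth)) return false;
        if (!walk_children(child, walker, depth + 1)) return false;
    }

    return true;
}

// begin(root), then for_each for every descendant (root excluded), then end(root).
bool traverse(const Node& root, Walker& walker)
{
    if (!walker.begin(root)) return false;
    if (!walk_children(root, walker, 0)) return false;

    return walker.end(root);
}
```

Children of the traversal root are reported with depth 0 and grandchildren with depth 1, matching the numbering described above.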

+

+ This is an example of traversing tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp): +

+

+ +

+
struct simple_walker: pugi::xml_tree_walker
+{
+    virtual bool for_each(pugi::xml_node& node)
+    {
+        for (int i = 0; i < depth(); ++i) std::cout << "  "; // indentation
+
+        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";
+
+        return true; // continue traversal
+    }
+};
+
+

+

+

+ +

+
simple_walker walker;
+doc.traverse(walker);
+
+

+

+
+
+ +

+ While there are existing functions for getting a node/attribute with known + contents, they are often not sufficient for simple queries. As an alternative + to iterating manually through nodes/attributes until the needed one is found, + you can make a predicate and call one of find_ + functions: +

+
template <typename Predicate> xml_attribute xml_node::find_attribute(Predicate pred) const;
+template <typename Predicate> xml_node xml_node::find_child(Predicate pred) const;
+template <typename Predicate> xml_node xml_node::find_node(Predicate pred) const;
+
+

+ The predicate should be either a plain function or a function object which + accepts one argument of type xml_attribute + (for find_attribute) or + xml_node (for find_child and find_node), + and returns bool. The predicate + is never called with null handle as an argument. +

+

+ find_attribute function iterates + through all attributes of the specified node, and returns the first attribute + for which predicate returned true. + If predicate returned false + for all attributes or if there were no attributes (including the case where + the node is null), null attribute is returned. +

+

+ find_child function iterates + through all child nodes of the specified node, and returns the first node + for which predicate returned true. + If predicate returned false + for all nodes or if there were no child nodes (including the case where the + node is null), null node is returned. +

+

+ find_node function performs + a depth-first traversal through the subtree of the specified node (excluding + the node itself), and returns the first node for which predicate returned + true. If predicate returned + false for all nodes or if subtree + was empty, null node is returned. +

+

+ This is an example of using predicate-based functions (samples/traverse_predicate.cpp): +

+

+ +

+
bool small_timeout(pugi::xml_node node)
+{
+    return node.attribute("Timeout").as_int() < 20;
+}
+
+struct allow_remote_predicate
+{
+    bool operator()(pugi::xml_attribute attr) const
+    {
+        return strcmp(attr.name(), "AllowRemote") == 0;
+    }
+
+    bool operator()(pugi::xml_node node) const
+    {
+        return node.attribute("AllowRemote").as_bool();
+    }
+};
+
+

+

+

+ +

+
// Find child via predicate (looks for direct children only)
+std::cout << tools.find_child(allow_remote_predicate()).attribute("Filename").value() << std::endl;
+
+// Find node via predicate (looks for all descendants in depth-first order)
+std::cout << doc.find_node(allow_remote_predicate()).attribute("Filename").value() << std::endl;
+
+// Find attribute via predicate
+std::cout << tools.last_child().find_attribute(allow_remote_predicate()).value() << std::endl;
+
+// We can use simple functions instead of function objects
+std::cout << tools.find_child(small_timeout).attribute("Filename").value() << std::endl;
+
+

+

+
+
+ +

+ If you need to get the document root of some node, you can use the following + function: +

+
xml_node xml_node::root() const;
+
+

+ This function returns the node with type node_document, + which is the root node of the document the node belongs to (unless the node + is null, in which case null node is returned). Currently the cost of this function + is proportional to the depth of the node, since it simply walks up to the ancestor of the given + node that itself has no parent. +

+

+ While pugixml supports complex XPath expressions, sometimes a simple path + handling facility is needed. There are two functions, for getting node path + and for converting path to a node: +

+
string_t xml_node::path(char_t delimiter = '/') const;
+xml_node xml_node::first_element_by_path(const char_t* path, char_t delimiter = '/') const;
+
+

+ Node paths consist of node names, separated with a delimiter (which is / by default); paths can also contain self + (.) and parent (..) pseudo-names, so this is a valid + path: "../../foo/./bar". + path returns the path to + the node from the document root; first_element_by_path + looks for the node represented by a given path. A path can be absolute + (absolute paths start with the delimiter), in which case the rest of the path + is resolved relative to the document root, or relative, in which case it is resolved relative to the given node. For + example, in the following document: <a><b><c/></b></a>, + node <c/> has path "a/b/c"; + calling first_element_by_path + for document with path "a/b" + results in node <b/>; calling first_element_by_path + for node <a/> with path "../a/./b/../." + results in node <a/>; calling first_element_by_path + with path "/a" results + in node <a/> for any node. +

+

+ In case path component is ambiguous (if there are two nodes with given name), + the first one is selected; paths are not guaranteed to uniquely identify + nodes in a document. If any component of a path is not found, the result + of first_element_by_path + is null node; also first_element_by_path + returns null node for null nodes, in which case the path does not matter. + path returns an empty string + for null nodes. +
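The resolution rules above (absolute vs. relative paths, the `.` and `..` pseudo-names, first match wins, null on a missing component) can be sketched as follows. `PathNode` and `resolve` are hypothetical names operating on a toy tree; this is a model of the documented behavior, not the library's code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy tree with parent links; not a pugixml type.
struct PathNode
{
    std::string name;
    PathNode* parent = nullptr;
    std::vector<std::unique_ptr<PathNode> > children;

    PathNode* add(const std::string& child_name)
    {
        children.push_back(std::unique_ptr<PathNode>(new PathNode));
        children.back()->name = child_name;
        children.back()->parent = this;
        return children.back().get();
    }
};

// Resolves 'path' starting at 'start'; returns nullptr if any component is missing.
const PathNode* resolve(const PathNode* start, const std::string& path, char delimiter = '/')
{
    const PathNode* cur = start;
    if (!cur) return nullptr; // null input yields null, whatever the path

    std::size_t pos = 0;

    // An absolute path restarts the search from the tree root.
    if (!path.empty() && path[0] == delimiter)
    {
        while (cur->parent) cur = cur->parent;
        pos = 1;
    }

    while (cur && pos < path.size())
    {
        std::size_t next = path.find(delimiter, pos);
        if (next == std::string::npos) next = path.size();

        std::string part = path.substr(pos, next - pos);
        pos = next + 1;

        if (part.empty() || part == ".") continue;       // self
        if (part == "..") { cur = cur->parent; continue; } // parent

        // The first child with a matching name wins.
        const PathNode* found = nullptr;
        for (const auto& child : cur->children)
            if (child->name == part) { found = child.get(); break; }

        cur = found;
    }

    return cur;
}
```

With a tree built as document, `a`, `b`, `c`, calling `resolve` on `a` with `"../a/./b/../."` walks up to the document, down to `a` and `b`, and back up to `a`.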

+
+ + + + + +
[Note]Note

+ path function returns the + result as STL string, and thus is not available if PUGIXML_NO_STL + is defined. +

+

+ pugixml does not record row/column information for nodes upon parsing for + efficiency reasons. However, if the node has not changed in a significant + way since parsing (the name/value are not changed, and the node itself is + the original one, i.e. it was not deleted from the tree and re-added later), + it is possible to get the offset from the beginning of XML buffer: +

+
ptrdiff_t xml_node::offset_debug() const;
+
+

+ If the offset is not available (this happens if the node is null, was not + originally parsed from a stream, or has changed in a significant way), the + function returns -1. Otherwise it returns the offset to node's data from + the beginning of XML buffer in pugi::char_t + units. For more information on parsing offsets, see parsing + error handling documentation. +
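If line/column information is needed, it can be derived from the offset, provided the original buffer is still around. The helper below is a sketch under a few assumptions: `char_t` is `char`, lines end with `'\n'`, and the buffer is the exact text that was parsed. `offset_to_line_column` is a hypothetical name, not a pugixml function.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Converts an offset into 'buffer' to a 1-based (line, column) pair.
// Returns (0, 0) for a negative or out-of-range offset, e.g. the -1
// that offset_debug reports when no offset is available.
std::pair<std::size_t, std::size_t> offset_to_line_column(const std::string& buffer, std::ptrdiff_t offset)
{
    if (offset < 0 || static_cast<std::size_t>(offset) > buffer.size())
        return std::make_pair(std::size_t(0), std::size_t(0));

    std::size_t line = 1, column = 1;

    for (std::size_t i = 0; i < static_cast<std::size_t>(offset); ++i)
    {
        if (buffer[i] == '\n')
        {
            ++line;
            column = 1;
        }
        else
        {
            ++column;
        }
    }

    return std::make_pair(line, column);
}
```

The scan is linear in the offset, so for many lookups against the same buffer it may be worth precomputing a table of line start offsets instead.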

+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/apiref.html b/docs/manual/apiref.html new file mode 100644 index 0000000..4648697 --- /dev/null +++ b/docs/manual/apiref.html @@ -0,0 +1,1151 @@ + + + +API Reference + + + + + + + + + + + +
+
+
+ +

+ This is the reference for all macros, types, enumerations, classes and functions + in pugixml. Each symbol is a link that leads to the relevant section of the + manual. +

+

+ Macros: +

+
+

+ Types: +

+
+

+ Enumerations: +

+ +

+ Constants: +

+
+

+ Classes: +

+
    +
  • + class xml_attribute +
    +
  • +
  • + class xml_node +
      +
    • + xml_node(); +

      + +
    • +
    • + bool empty() const; +
    • +
    • + operator unspecified_bool_type() const;

      + +
    • +
    • + bool operator==(const xml_node& + r) + const; +
    • +
    • + bool operator!=(const xml_node& + r) + const; +
    • +
    • + bool operator<(const xml_node& + r) + const; +
    • +
    • + bool operator>(const xml_node& + r) + const; +
    • +
    • + bool operator<=(const xml_node& + r) + const; +
    • +
    • + bool operator>=(const xml_node& + r) + const; +

      + +
    • +
    • + xml_node_type type() + const; +

      + +
    • +
    • + const char_t* name() const; +
    • +
    • + const char_t* value() const;

      + +
    • +
    • + xml_node parent() const; +
    • +
    • + xml_node first_child() const; +
    • +
    • + xml_node last_child() const; +
    • +
    • + xml_node next_sibling() const; +
    • +
    • + xml_node previous_sibling() const;

      + +
    • +
    • + xml_attribute first_attribute() const; +
    • +
    • + xml_attribute last_attribute() const;

      + +
    • +
    • + xml_node child(const char_t* + name) + const; +
    • +
    • + xml_attribute attribute(const char_t* name) const; +
    • +
    • + xml_node next_sibling(const char_t* + name) + const; +
    • +
    • + xml_node previous_sibling(const char_t* + name) + const; +
    • +
    • + xml_node find_child_by_attribute(const char_t* + name, + const char_t* attr_name, const + char_t* + attr_value) + const; +
    • +
    • + xml_node find_child_by_attribute(const char_t* + attr_name, + const char_t* attr_value) const;

      + +
    • +
    • + const char_t* child_value() const; +
    • +
    • + const char_t* child_value(const char_t* + name) + const; +

      + +
    • +
    • + typedef xml_node_iterator + iterator; +
    • +
    • + iterator begin() const; +
    • +
    • + iterator end() const;

      + +
    • +
    • + typedef xml_attribute_iterator + attribute_iterator; +
    • +
    • + attribute_iterator attributes_begin() const; +
    • +
    • + attribute_iterator attributes_end() const;

      + +
    • +
    • + bool traverse(xml_tree_walker& walker);

      + +
    • +
    • + template <typename Predicate> xml_attribute + find_attribute(Predicate + pred) + const; +
    • +
    • + template <typename Predicate> xml_node + find_child(Predicate + pred) + const; +
    • +
    • + template <typename Predicate> xml_node + find_node(Predicate + pred) + const; +

      + +
    • +
    • + string_t path(char_t + delimiter = + '/') + const; +
    • +
    • + xml_node first_element_by_path(const char_t* path, char_t delimiter = '/') const;
    • +
    • + xml_node root() const; +
    • +
    • + ptrdiff_t offset_debug() const;

      + +
    • +
    • + bool set_name(const char_t* + rhs); +
    • +
    • + bool set_value(const char_t* + rhs); +

      + +
    • +
    • + xml_attribute append_attribute(const char_t* + name); +
    • +
    • + xml_attribute insert_attribute_after(const char_t* + name, + const xml_attribute& attr); +
    • +
    • + xml_attribute insert_attribute_before(const char_t* + name, + const xml_attribute& attr);

      + +
    • +
    • + xml_node append_child(xml_node_type + type = + node_element); +
    • +
    • + xml_node insert_child_after(xml_node_type + type, + const xml_node& node); +
    • +
    • + xml_node insert_child_before(xml_node_type + type, + const xml_node& node);

      + +
    • +
    • + xml_attribute append_copy(const xml_attribute& proto); +
    • +
    • + xml_attribute insert_copy_after(const xml_attribute& + proto, + const xml_attribute& attr); +
    • +
    • + xml_attribute insert_copy_before(const xml_attribute& + proto, + const xml_attribute& attr);

      + +
    • +
    • + xml_node append_copy(const xml_node& + proto); +
    • +
    • + xml_node insert_copy_after(const xml_node& + proto, + const xml_node& node); +
    • +
    • + xml_node insert_copy_before(const xml_node& + proto, + const xml_node& node);

      + +
    • +
    • + bool remove_attribute(const xml_attribute& + a); +
    • +
    • + bool remove_attribute(const char_t* + name); +
    • +
    • + bool remove_child(const xml_node& + n); +
    • +
    • + bool remove_child(const char_t* + name); +

      + +
    • +
    • + void print(xml_writer& writer, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default, + xml_encoding encoding + = encoding_auto, unsigned + int depth + = 0) const; +
    • +
    • + void print(std::ostream& os, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default, + xml_encoding encoding + = encoding_auto, unsigned + int depth + = 0) const; +
    • +
    • + void print(std::wostream& os, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default, + unsigned int + depth = + 0) + const; +

      + +
    • +
    • + xpath_node select_single_node(const char_t* + query) + const; +
    • +
    • + xpath_node select_single_node(const xpath_query& + query) + const; +
    • +
    • + xpath_node_set select_nodes(const char_t* + query) + const; +
    • +
    • + xpath_node_set select_nodes(const xpath_query& + query) + const; +

      + +
    • +
    +
  • +
  • + class xml_document +
      +
    • + xml_document(); +
    • +
    • + ~xml_document();

      + +
    • +
    • + xml_parse_result load(std::istream& + stream, + unsigned int + options = + parse_default, + xml_encoding encoding + = encoding_auto); +
    • +
    • + xml_parse_result load(std::wistream& + stream, + unsigned int + options = + parse_default); +

      + +
    • +
    • + xml_parse_result load(const char_t* contents, unsigned + int options + = parse_default);

      + +
    • +
    • + xml_parse_result load_file(const char* path, unsigned + int options + = parse_default, xml_encoding + encoding = + encoding_auto); +

      + +
    • +
    • + xml_parse_result load_buffer(const void* contents, + size_t size, unsigned + int options + = parse_default, xml_encoding + encoding = + encoding_auto); +
    • +
    • + xml_parse_result load_buffer_inplace(void* contents, size_t + size, + unsigned int + options = + parse_default, + xml_encoding encoding + = encoding_auto); +
    • +
    • + xml_parse_result load_buffer_inplace_own(void* contents, size_t + size, + unsigned int + options = + parse_default, + xml_encoding encoding + = encoding_auto);

      + +
    • +
    • + bool save_file(const char* path, + const char_t* indent + = "\t", unsigned + int flags + = format_default, xml_encoding + encoding = + encoding_auto) + const; +

      + +
    • +
    • + void save(std::ostream& stream, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default, + xml_encoding encoding + = encoding_auto) const; +
    • +
    • + void save(std::wostream& stream, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default) + const; +

      + +
    • +
    • + void save(xml_writer& writer, const + char_t* + indent = + "\t", + unsigned int + flags = + format_default, + xml_encoding encoding + = encoding_auto) const;

      + +
    • +
    +
  • +
  • + struct xml_parse_result +
    +
  • +
  • + class xml_node_iterator +
  • +
  • + class xml_attribute_iterator +

    + +
  • +
  • + class xml_tree_walker +
      +
    • + virtual bool + begin(xml_node& node); +
    • +
    • + virtual bool + for_each(xml_node& node) = 0; +
    • +
    • + virtual bool + end(xml_node& node);

      + +
    • +
    • + int depth() const;

      + +
    • +
    +
  • +
  • + class xml_writer +
    • + virtual void + write(const void* data, + size_t size) = 0; +

      + +
    +
  • +
  • + class xml_writer_file: public xml_writer +
    +
  • +
  • + class xml_writer_stream: public xml_writer +
    +
  • +
  • + class xpath_query +
    +
  • +
  • + class xpath_exception: public std::exception +
    • + virtual const + char* + what() const + throw(); +

      + +
    +
  • +
  • + class xpath_node +
    +
  • +
  • + class xpath_node_set +
    +
  • +
+
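The xml_writer entry above declares a single pure virtual function, write(const void* data, size_t size). As a rough illustration of how such an interface is typically implemented, the sketch below defines a hypothetical stand-in base class (writer) with the same shape and a string_writer that collects every chunk in memory. Neither class name comes from pugixml; real code would presumably derive from pugi::xml_writer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in with the same shape as the xml_writer interface
// listed above: one pure virtual write(data, size). Not pugixml's class.
class writer {
public:
    virtual ~writer() {}
    virtual void write(const void* data, std::size_t size) = 0;
};

// Collects every chunk into a std::string instead of a file or stream.
class string_writer : public writer {
public:
    virtual void write(const void* data, std::size_t size) {
        // The data is a raw byte range; it is not null-terminated.
        result.append(static_cast<const char*>(data), size);
    }

    std::string result;
};
```

An object of this kind would be passed wherever the reference lists an xml_writer& parameter, for example the save(xml_writer& writer, ...) overload of xml_document.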

+ Functions: +

+ +
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/changes.html b/docs/manual/changes.html new file mode 100644 index 0000000..48e8325 --- /dev/null +++ b/docs/manual/changes.html @@ -0,0 +1,574 @@ + + + +Changelog + + + + + + + + + + + +
+
+
+ +
+ + 1.07.2010 - version + 0.9 +
+

+ Major release, featuring extended and improved Unicode support, miscellaneous + performance improvements, bug fixes and more. +

+
    +
  • + Major Unicode improvements: +
      +
    1. + Introduced encoding support (automatic/manual encoding detection + on load, manual encoding selection on save, conversion from/to UTF8, + UTF16 LE/BE, UTF32 LE/BE) +
    2. +
    3. + Introduced wchar_t mode (you can set PUGIXML_WCHAR_MODE define to + switch pugixml internal encoding from UTF8 to wchar_t; all functions + are switched to their Unicode variants) +
    4. +
    5. + Load/save functions now support wide streams +
    6. +
    +
  • +
  • + Bug fixes: +
      +
    1. + Fixed document corruption on failed parsing bug +
    2. +
    3. + XPath string <-> number conversion improvements (increased + precision, fixed crash for huge numbers) +
    4. +
    5. + Improved DOCTYPE parsing: now parser recognizes all well-formed DOCTYPE + declarations +
    6. +
    7. + Fixed xml_attribute::as_uint() for large numbers (i.e. 2^32-1) +
    8. +
    9. + Fixed xml_node::first_element_by_path for path components that are + prefixes of node names, but are not exactly equal to them. +
    10. +
    +
  • +
  • + Specification changes: +
      +
    1. + parse() API changed to load_buffer/load_buffer_inplace/load_buffer_inplace_own; + load_buffer APIs do not require zero-terminated strings. +
    2. +
    3. + Renamed as_utf16 to as_wide +
    4. +
    5. + Changed xml_node::offset_debug return type and xml_parse_result::offset + type to ptrdiff_t +
    6. +
    7. + Nodes/attributes with empty names are now printed as :anonymous +
    8. +
    +
  • +
  • + Performance improvements: +
      +
    1. + Optimized document parsing and saving +
    2. +
    3. + Changed internal memory management: internal allocator is used for + both metadata and name/value data; allocated pages are deleted if + all allocations from them are deleted +
    4. +
    5. + Optimized memory consumption: sizeof(xml_node_struct) reduced from + 40 bytes to 32 bytes on x86 +
    6. +
    7. + Optimized debug mode parsing/saving by an order of magnitude
    8. +
    +
  • +
  • + Miscellaneous: +
      +
    1. + All STL includes except <exception> in pugixml.hpp are replaced + with forward declarations +
    2. +
    3. + xml_node::remove_child and xml_node::remove_attribute now return + the operation result +
    4. +
    +
  • +
  • + Compatibility: +
      +
    1. + parse() and as_utf16 are left for compatibility (these functions + are deprecated and will be removed in version 1.0) +
    2. +
    3. + Wildcard functions, document_order/precompute_document_order functions, + all_elements_by_name function and format_write_bom_utf8 flag are + deprecated and will be removed in version 1.0 +
    4. +
    5. + xpath_type_t enumeration was renamed to xpath_value_type; xpath_type_t + is deprecated and will be removed in version 1.0 +
    6. +
    +
  • +
+
+ + 8.11.2009 - version + 0.5 +
+

+ Major bugfix release. Changes: +

+
    +
  • + XPath bugfixes: +
      +
    1. + Fixed translate(), lang() and concat() functions (infinite loops/crashes) +
    2. +
    3. + Fixed compilation of queries with empty literal strings ("") +
    4. +
    5. + Fixed axis tests: they never add empty nodes/attributes to the resulting + node set now +
    6. +
    7. + Fixed string-value evaluation for node-set (the result excluded some + text descendants) +
    8. +
    9. + Fixed self:: axis (it behaved like ancestor-or-self::) +
    10. +
    11. + Fixed following:: and preceding:: axes (they included descendant and ancestor nodes, respectively)
    12. +
    13. + Minor fix for namespace-uri() function (namespace declaration scope + includes the parent element of namespace declaration attribute) +
    14. +
    15. + Some incorrect queries are no longer parsed now (i.e. foo: *) +
    16. +
    17. + Fixed text()/etc. node test parsing bug (i.e. foo[text()] failed + to compile) +
    18. +
    19. + Fixed root step (/) - it now selects empty node set if query is evaluated + on empty node +
    20. +
    21. + Fixed string to number conversion ("123 " converted to + NaN, "123 .456" converted to 123.456 - now the results + are 123 and NaN, respectively) +
    22. +
    23. + Node set copying now preserves sorted type; leads to better performance + on some queries +
    24. +
    +
  • +
  • + Miscellaneous bugfixes: +
      +
    1. + Fixed xml_node::offset_debug for PI nodes +
    2. +
    3. + Added empty attribute checks to xml_node::remove_attribute +
    4. +
    5. + Fixed node_pi and node_declaration copying +
    6. +
    7. + Const-correctness fixes +
    8. +
    +
  • +
  • + Specification changes: +
      +
    1. + xpath_node::select_nodes() and related functions now throw exception + if expression return type is not node set (instead of assertion) +
    2. +
    3. + xml_node::traverse() now sets depth to -1 for both begin() and end() + callbacks (was 0 at begin() and -1 at end()) +
    4. +
    5. + In case of non-raw node printing a newline is output after PCDATA + inside nodes if the PCDATA has siblings +
    6. +
    7. + UTF8 -> wchar_t conversion now considers 5-byte UTF8-like sequences + as invalid +
    8. +
    +
  • +
  • + New features: +
      +
    1. + Added xpath_node_set::operator[] for index-based iteration +
    2. +
    3. + Added xpath_query::return_type() +
    4. +
    5. + Added getter accessors for memory-management functions +
    6. +
    +
  • +
+
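The string to number fix listed above ("123 " now converts to 123, "123 .456" to NaN) matches the XPath 1.0 rule for number(): optional whitespace, an optional minus sign, a number, optional whitespace, and NaN for anything else. The following is a minimal sketch of that rule, not the library's code; xpath_number and its helpers are hypothetical names, and the default "C" locale is assumed for strtod.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <string>

inline bool is_xml_space(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }
inline bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: converts a string to a number following the
// XPath 1.0 grammar  S? '-'? (Digits ('.' Digits?)? | '.' Digits) S?
inline double xpath_number(const std::string& s) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::size_t i = 0;
    const std::size_t n = s.size();

    while (i < n && is_xml_space(s[i])) ++i;      // leading whitespace
    const std::size_t start = i;
    if (i < n && s[i] == '-') ++i;                // optional minus sign

    std::size_t int_digits = 0, frac_digits = 0;
    while (i < n && is_ascii_digit(s[i])) { ++i; ++int_digits; }
    if (i < n && s[i] == '.') {
        ++i;
        while (i < n && is_ascii_digit(s[i])) { ++i; ++frac_digits; }
    }
    if (int_digits + frac_digits == 0) return nan; // no digits at all

    const std::size_t stop = i;
    while (i < n && is_xml_space(s[i])) ++i;      // trailing whitespace
    if (i != n) return nan;                        // anything left over is an error

    return std::strtod(s.substr(start, stop - start).c_str(), 0);
}
```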
+ + 17.09.2009 - version + 0.42 +
+

+ Maintenance release. Changes: +

+
    +
  • + Bug fixes: +
      +
    1. + Fixed deallocation in case of custom allocation functions or if delete[] + / free are incompatible +
    2. +
    3. + XPath parser fixed for incorrect queries (i.e. incorrect XPath queries + should now always fail to compile) +
    4. +
    5. + Const-correctness fixes for find_child_by_attribute +
    6. +
    7. + Improved compatibility (miscellaneous warning fixes, fixed cstring + include dependency for GCC) +
    8. +
    9. + Fixed iterator begin/end and print function to work correctly for + empty nodes +
    10. +
    +
  • +
  • + New features: +
      +
    1. + Added PUGIXML_API/PUGIXML_CLASS/PUGIXML_FUNCTION configuration macros + to control class/function attributes +
    2. +
    3. + Added xml_attribute::set_value overloads for different types +
    4. +
    +
  • +
+
+ + 8.02.2009 - version + 0.41 +
+

+ Maintenance release. Changes: +

+
  • + Bug fixes: +
    1. + Fixed bug with node printing (occasionally some content was not written + to output stream) +
    +
+
+ + 18.01.2009 - version + 0.4 +
+

+ Changes: +

+
    +
  • + Bug fixes: +
      +
    1. + Documentation fix in samples for parse() with manual lifetime control +
    2. +
    3. + Fixed document order sorting in XPath (it caused wrong order of nodes + after xpath_node_set::sort and wrong results of some XPath queries) +
    4. +
    +
  • +
  • + Node printing changes: +
      +
    1. + Single quotes are no longer escaped when printing nodes +
    2. +
    3. + Symbols in second half of ASCII table are no longer escaped when + printing nodes; because of this, format_utf8 flag is deleted as it's + no longer needed and format_write_bom is renamed to format_write_bom_utf8. +
    4. +
    5. + Reworked node printing - now it works via xml_writer interface; implementations + for FILE* and std::ostream are available. As a side-effect, xml_document::save_file + now works without STL. +
    6. +
    +
  • +
  • + New features: +
      +
    1. + Added unsigned integer support for attributes (xml_attribute::as_uint, + xml_attribute::operator=) +
    2. +
    3. + Now document declaration (<?xml ...?>) is parsed as node with + type node_declaration when parse_declaration flag is specified (access + to encoding/version is performed as if they were attributes, i.e. + doc.child("xml").attribute("version").as_float()); + corresponding flags for node printing were also added +
    4. +
    5. + Added support for custom memory management (see set_memory_management_functions + for details) +
    6. +
    7. + Implemented node/attribute copying (see xml_node::insert_copy_* and + xml_node::append_copy for details) +
    8. +
    9. + Added find_child_by_attribute and find_child_by_attribute_w to simplify + parsing code in some cases (i.e. COLLADA files) +
    10. +
    11. + Added file offset information querying for debugging purposes (now + you're able to determine exact location of any xml_node in parsed + file, see xml_node::offset_debug for details) +
    12. +
    13. + Improved error handling for parsing - now load(), load_file() and parse() return xml_parse_result, which contains error code and last parsed offset; this does not break old interface as xml_parse_result can be implicitly converted to bool.
    14. +
    +
  • +
+
+ + 31.10.2007 - version + 0.34 +
+

+ Maintenance release. Changes: +

+
    +
  • + Bug fixes: +
      +
    1. + Fixed bug with loading from text-mode iostreams +
    2. +
    3. + Fixed leak when transfer_ownership is true and parsing is failing +
    4. +
    5. + Fixed bug in saving (\r and \n are now escaped in attribute values) +
    6. +
    7. + Renamed free() to destroy() - some macro conflicts were reported +
    8. +
    +
  • +
  • + New features: +
      +
    1. + Improved compatibility (supported Digital Mars C++, MSVC 6, CodeWarrior + 8, PGI C++, Comeau, supported PS3 and XBox360) +
    2. +
    3. + PUGIXML_NO_EXCEPTION flag for platforms without exception handling +
    4. +
    +
  • +
+
+ + 21.02.2007 - version + 0.3 +
+

+ Refactored, reworked and improved version. Changes: +

+
    +
  • + Interface: +
      +
    1. + Added XPath +
    2. +
    3. + Added tree modification functions +
    4. +
    5. + Added no STL compilation mode +
    6. +
    7. + Added saving document to file +
    8. +
    9. + Refactored parsing flags +
    10. +
    11. + Removed xml_parser class in favor of xml_document +
    12. +
    13. + Added transfer ownership parsing mode +
    14. +
    15. + Modified the way xml_tree_walker works +
    16. +
    17. + Iterators are now non-constant +
    18. +
    +
  • +
  • + Implementation: +
      +
    1. + Support of several compilers and platforms +
    2. +
    3. + Refactored and sped up parsing core +
    4. +
    5. + Improved standard compliance
    6. +
    7. + Added XPath implementation +
    8. +
    9. + Fixed several bugs +
    10. +
    +
  • +
+
+ + 6.11.2006 - version + 0.2 +
+

+ First public release. Changes: +

+
    +
  • + Bug fixes: +
      +
    1. + Fixed child_value() (for empty nodes) +
    2. +
    3. + Fixed xml_parser_impl warning at W4 +
    4. +
    +
  • +
  • + New features: +
      +
    1. + Introduced child_value(name) and child_value_w(name) +
    2. +
    3. + parse_eol_pcdata and parse_eol_attribute flags + parse_minimal optimizations +
    4. +
    5. + Optimizations of strconv_t +
    6. +
    +
  • +
+
+ + 15.07.2006 - version + 0.1 +
+

+ First private release for testing purposes +

+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/dom.html b/docs/manual/dom.html new file mode 100644 index 0000000..e4f1579 --- /dev/null +++ b/docs/manual/dom.html @@ -0,0 +1,649 @@ + + + +Document object model + + + + + + + + + + + +
+
+
+ + +

+ pugixml stores XML data in a DOM-like way: the entire XML document (both document structure and element data) is stored in memory as a tree. The tree can be loaded from a character stream (file, string, C++ I/O stream), then traversed via a special API or XPath expressions. The whole tree is mutable: both node structure and node/attribute data can be changed at any time. Finally, the result of document transformations can be saved to a character stream (file, C++ I/O stream or custom transport).

+
+ +

+ The XML document is represented with a tree data structure. The root of the tree is the document itself, which corresponds to C++ type xml_document. The document has one or more child nodes, which correspond to C++ type xml_node. Nodes have different types; depending on its type, a node can have a collection of child nodes, a collection of attributes (which correspond to C++ type xml_attribute), and some additional data (e.g. a name).

+

+ The tree nodes can be of one of the following types (which together form + the enumeration xml_node_type): +

+
    +
  • + Document node (node_document) - this is the root of the tree, which consists of several child nodes. This node corresponds to the xml_document class; note that xml_document is a sub-class of xml_node, so the entire node interface is also available. However, the document node is special in several ways, which will be covered below. There can be only one document node in the tree; the document node does not have any XML representation.

    + +
  • +
  • + Element/tag node (node_element) - this is the most common type of node, which represents XML elements. Element nodes have a name, a collection of attributes and a collection of child nodes (both of which may be empty). An attribute is a simple name/value pair. An example XML representation of an element node is as follows:
  • +
+
<node attr="value"><child/></node>
+
+

+ There are two element nodes here; one has name "node", + single attribute "attr" + and single child "child", + another has name "child" + and does not have any attributes or child nodes. +

+
  • + Plain character data nodes (node_pcdata) represent plain text in XML. PCDATA nodes have a value, but do not have a name or children/attributes. Note that plain character data is not a part of the element node but instead has its own node; for example, an element node can have several child PCDATA nodes. An example XML representation of text nodes is as follows:
+
<node> text1 <child/> text2 </node>
+
+

+ Here "node" element + has three children, two of which are PCDATA nodes with values "text1" and "text2". +

+
  • + Character data nodes (node_cdata) represent text in XML that is quoted in a special way. CDATA nodes do not differ from PCDATA nodes except in XML representation - the above text example looks like this with CDATA:
+
<node> <![CDATA[text1]]> <child/> <![CDATA[text2]]> </node>
+
+

+ CDATA nodes make it easy to include non-escaped <, & and > characters + in plain text. CDATA value can not contain the character sequence ]]>, + since it is used to determine the end of node contents. +

+
  • + Comment nodes (node_comment) represent comments in XML. Comment nodes have a value, but do not have a name or children/attributes. An example XML representation of a comment node is as follows:
+
<!-- comment text -->
+
+

+ Here the comment node has value "comment + text". By default comment nodes are treated as non-essential + part of XML markup and are not loaded during XML parsing. You can override + this behavior by adding parse_comments + flag. +

+
  • + Processing instruction nodes (node_pi) represent processing instructions (PI) in XML. PI nodes have a name and an optional value, but do not have children/attributes. An example XML representation of a PI node is as follows:
+
<?name value?>
+
+

+ Here the name (also called PI target) is "name", + and the value is "value". + By default PI nodes are treated as non-essential part of XML markup and + are not loaded during XML parsing. You can override this behavior by adding + parse_pi flag. +

+
  • + Declaration nodes (node_declaration) represent document declarations in XML. Declaration nodes have a name ("xml") and an optional collection of attributes, but do not have a value or children. There can be only one declaration node in a document; moreover, it should be the topmost node (its parent should be the document). An example XML representation of a declaration node is as follows:
+
<?xml version="1.0"?>
+
+

+ Here the node has name "xml" + and a single attribute with name "version" + and value "1.0". + By default declaration nodes are treated as non-essential part of XML markup + and are not loaded during XML parsing. You can override this behavior by + adding parse_declaration + flag. Also, by default a dummy declaration is output when XML document + is saved unless there is already a declaration in the document; you can + disable this by adding format_no_declaration + flag. +
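The rules above about which node type carries a name, a value, attributes or children can be condensed into a small lookup. This is only a summary sketch of the text: node_kind, node_traits and traits_of are hypothetical names and are not part of pugixml.

```cpp
#include <cassert>

// Hypothetical stand-in for the node types described above; the names follow
// the manual's wording, but this is not pugixml's own enumeration.
enum class node_kind { document, element, pcdata, cdata, comment, pi, declaration };

// What a node of a given kind may carry, as stated in the text above.
struct node_traits {
    bool has_name;
    bool has_value;
    bool has_attributes;
    bool has_children;
};

inline node_traits traits_of(node_kind kind) {
    switch (kind) {
        case node_kind::document:    return {false, false, false, true};
        case node_kind::element:     return {true,  false, true,  true};
        case node_kind::pcdata:      return {false, true,  false, false};
        case node_kind::cdata:       return {false, true,  false, false};
        case node_kind::comment:     return {false, true,  false, false};
        case node_kind::pi:          return {true,  true,  false, false}; // value is optional
        case node_kind::declaration: return {true,  false, true,  false};
    }
    return {false, false, false, false}; // not reached for valid enumerators
}
```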

+

+ Finally, here is a complete example of XML document and the corresponding + tree representation (samples/tree.xml): +

+
++++ + + + + +
+

+ +

+
<?xml version="1.0"?>
+<mesh name="mesh_root">
+    <!-- here is a mesh node -->
+    some text
+    <![CDATA[someothertext]]>
+    some more text
+    <node attr1="value1" attr2="value2" />
+    <node attr1="value2">
+        <innernode/>
+    </node>
+</mesh>
+<?include somedata?>
+
+

+

+
+

+ [Figure: tree representation of the document above (dom_tree_thumb)]

+
+
+
+ +
+ + + + + +
Note

+ All pugixml classes and functions are located in the pugi namespace; you have to either use explicit name qualification (e.g. pugi::xml_node), or gain access to the relevant symbols via a using declaration (e.g. using pugi::xml_node;) or a using directive (using namespace pugi;). The namespace will be omitted from declarations in this documentation hereafter; all code examples will use fully-qualified names.

+

+ Despite the fact that there are several node types, there are only three + C++ types representing the tree (xml_document, + xml_node, xml_attribute); + some operations on xml_node + are only valid for certain node types. They are described below. +

+

+ xml_document is the owner + of the entire document structure; it is a non-copyable class. The interface + of xml_document consists + of loading functions (see Loading document), saving functions (see Saving document) + and the interface of xml_node, + which allows for document inspection and/or modification. Note that while + xml_document is a sub-class + of xml_node, xml_node is not a polymorphic type; the + inheritance is only used to simplify usage. +

+

+ The default constructor of xml_document initializes the document to a tree with only a root node (the document node). You can then populate it with data using either tree modification functions or loading functions; all loading functions destroy the previous tree and release all memory it occupied, which invalidates every existing node/attribute handle that refers to this document. The destructor of xml_document also destroys the tree, thus the lifetime of the document object should exceed the lifetimes of any node/attribute handles that point to the tree.

+
+ + + + + +
Caution

+ While technically node/attribute handles can be alive when the tree they're + referring to is destroyed, calling any member function of these handles + results in undefined behavior. Thus it is recommended to make sure that + the document is destroyed only after all references to its nodes/attributes + are destroyed. +

+

+ xml_node is a handle to a node of the document tree; it can point to any node in the document, including the document itself. There is a common interface for nodes of all types; the actual node type can be queried via the xml_node::type() method. Note that xml_node is only a handle to the actual node, not the node itself - you can have several xml_node handles pointing to the same underlying object. Destroying an xml_node handle does not destroy the node and does not remove it from the tree. The size of xml_node is equal to that of a pointer, so it is nothing more than a lightweight wrapper around a pointer; you can safely pass or return xml_node objects by value without additional overhead.

+

+ There is a special value of xml_node type, known as null node or empty node (such nodes have type node_null). It does not correspond to any node in any document, and thus resembles a null pointer. However, all operations are defined on empty nodes; generally the operations don't do anything and return empty nodes/attributes or empty strings as their result (see documentation for specific functions for more detailed information). This is useful for chaining calls; for example, you can get the grandparent of a node like so: node.parent().parent(); if a node is a null node or it does not have a parent, the first parent() call returns a null node; the second parent() call then also returns a null node, so you don't have to check for errors twice.
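To make the chaining behavior concrete, here is a miniature of the handle idea, assuming a node record that stores nothing but a parent pointer. The types node_data and node_handle are hypothetical and this is not how pugixml is implemented; the sketch only shows why node.parent().parent() needs no intermediate checks.

```cpp
#include <cassert>

// Hypothetical node record: only the parent link matters for this sketch.
struct node_data {
    node_data* parent;
};

// Pointer-sized handle; a default-constructed handle is the "null node".
class node_handle {
public:
    node_handle() : data_(nullptr) {}
    explicit node_handle(node_data* data) : data_(data) {}

    bool empty() const { return data_ == nullptr; }

    // Defined for null handles too: yields another null handle instead of failing.
    node_handle parent() const {
        return data_ ? node_handle(data_->parent) : node_handle();
    }

    bool operator==(const node_handle& rhs) const { return data_ == rhs.data_; }

private:
    node_data* data_;
};
```

Because every operation on a null handle returns a null handle again, only the final result of a chain has to be tested with empty().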

+

+ xml_attribute is the handle + to an XML attribute; it has the same semantics as xml_node, + i.e. there can be several xml_attribute + handles pointing to the same underlying object, there is a special null attribute + value, which propagates to function results. +

+

+ Both xml_node and xml_attribute have the default constructor + which initializes them to null objects. +

+

+ xml_node and xml_attribute try to behave like pointers, + that is, they can be compared with other objects of the same type, making + it possible to use them as keys of associative containers. All handles to + the same underlying object are equal, and any two handles to different underlying + objects are not equal. Null handles only compare as equal to themselves. + The result of relational comparison can not be reliably determined from the + order of nodes in file or other ways. Do not use relational comparison operators + except for search optimization (i.e. associative container keys). +
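The use of handles as associative container keys can be sketched as follows. The node_ref class is a hypothetical pointer wrapper, not a pugixml type; it assumes that equality and ordering are both derived from the address of the underlying object, which is consistent with the description above but is not confirmed as the library's implementation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

struct node_data { int payload; };

// Hypothetical handle: compares like the pointer it wraps.
class node_ref {
public:
    node_ref() : data_(nullptr) {}
    explicit node_ref(node_data* data) : data_(data) {}

    bool operator==(const node_ref& rhs) const { return data_ == rhs.data_; }
    bool operator!=(const node_ref& rhs) const { return data_ != rhs.data_; }

    // std::less gives a total order even for pointers into unrelated objects.
    // The order is arbitrary and says nothing about document order.
    bool operator<(const node_ref& rhs) const {
        return std::less<const node_data*>()(data_, rhs.data_);
    }

private:
    node_data* data_;
};

// Attaches user data to nodes without storing anything in the tree itself.
typedef std::map<node_ref, std::string> node_labels;
```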

+

+ Additionally, handles can be implicitly cast to boolean-like objects, so that you can test if the node/attribute is empty by just doing if (node) { ... } or if (!node) { ... } else { ... }. Alternatively you can check if a given xml_node/xml_attribute handle is null by calling the following methods:

+
bool xml_attribute::empty() const;
+bool xml_node::empty() const;
+
+

+ Nodes and attributes do not exist outside of document tree, so you can't + create them without adding them to some document. Once underlying node/attribute + objects are destroyed, the handles to those objects become invalid. While + this means that destruction of the entire tree invalidates all node/attribute + handles, it also means that destroying a subtree (by calling remove_child) or removing an attribute + invalidates the corresponding handles. There is no way to check handle validity; + you have to ensure correctness through external mechanisms. +

+
+
+ +

+ There are two choices of interface and internal representation when configuring + pugixml: you can either choose the UTF-8 (also called char) interface or + UTF-16/32 (also called wchar_t) one. The choice is controlled via PUGIXML_WCHAR_MODE define; you can set + it via pugiconfig.hpp or via preprocessor options, as discussed in Additional configuration + options. + If this define is set, the wchar_t interface is used; otherwise (by default) + the char interface is used. The exact wide character encoding is assumed + to be either UTF-16 or UTF-32 and is determined based on size of wchar_t type. +

+
+ + + + + +
Note

+ If the size of wchar_t is 2, pugixml assumes UTF-16 encoding instead of UCS-2, which means that some characters are represented as two code units (a surrogate pair).
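For readers unfamiliar with UTF-16, the sketch below shows how a single code point outside the Basic Multilingual Plane becomes two 16-bit units. The function to_utf16 is a hypothetical helper written for this illustration and is not part of pugixml; it assumes its input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as UTF-16 code units (one or two of them).
// Assumes cp is a valid scalar value: <= 0x10FFFF and not itself a surrogate.
inline std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000; // 20 bits remain, split into two groups of 10
        units.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));   // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return units;
}
```

For example, U+20AC fits in one unit, while U+1D11E is stored as the pair 0xD834, 0xDD1E.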

+

+ All tree functions that work with strings work with either C-style null terminated + strings or STL strings of the selected character type. For example, node + name accessors look like this in char mode: +

+
const char* xml_node::name() const;
+bool xml_node::set_name(const char* value);
+
+

+ and like this in wchar_t mode: +

+
const wchar_t* xml_node::name() const;
+bool xml_node::set_name(const wchar_t* value);
+
+

+ There is a special type, pugi::char_t, + that is defined as the character type and depends on the library configuration; + it will be also used in the documentation hereafter. There is also a type + pugi::string_t, which is defined as the STL string + of the character type; it corresponds to std::string + in char mode and to std::wstring in wchar_t mode. +
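The relationship between the configuration define and the two types can be pictured roughly like this. The namespace sketch and the helper length_of are hypothetical and exist only for this illustration; the actual definitions live in the pugixml headers and may differ in detail.

```cpp
#include <cassert>
#include <string>

namespace sketch {
#ifdef PUGIXML_WCHAR_MODE
    typedef wchar_t char_t;   // wchar_t interface
#else
    typedef char char_t;      // char (UTF-8) interface, the default
#endif

    // STL string of the selected character type:
    // std::string in char mode, std::wstring in wchar_t mode.
    typedef std::basic_string<char_t> string_t;

    // Code written against char_t/string_t compiles in either mode.
    inline string_t::size_type length_of(const char_t* s) {
        return string_t(s).size();
    }
}
```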

+

+ In addition to the interface, the internal implementation changes to store + XML data as pugi::char_t; this means that these two modes + have different memory usage characteristics. The conversion to pugi::char_t upon document loading and from + pugi::char_t upon document saving happen automatically, + which also carries minor performance penalty. The general advice however + is to select the character mode based on usage scenario, i.e. if UTF-8 is + inconvenient to process and most of your XML data is localized, wchar_t mode + is probably a better choice. +

+

+ There are cases when you'll have to convert string data between UTF-8 and + wchar_t encodings; the following helper functions are provided for such purposes: +

+
std::string as_utf8(const wchar_t* str);
+std::wstring as_wide(const char* str);
+
+

+ Both functions accept a null-terminated string as an argument str, and return the converted string. + as_utf8 performs conversion + from UTF-16/32 to UTF-8; as_wide + performs conversion from UTF-8 to UTF-16/32. Invalid UTF sequences are silently + discarded upon conversion. str + has to be a valid string; passing a null pointer results in undefined behavior. +

+
+ + + + + +
[Note]Note
+

+ Most examples in this documentation assume the char interface and therefore + will not compile with PUGIXML_WCHAR_MODE. + This is to simplify the documentation; usually the only change you'll + have to make is to pass wchar_t + string literals, e.g. instead of +

+

+ pugi::xml_node node + = doc.child("bookstore").find_child_by_attribute("book", "id", "12345"); +

+

+ you'll have to do +

+

+ pugi::xml_node node + = doc.child(L"bookstore").find_child_by_attribute(L"book", L"id", L"12345"); +

+
+
+
+ +

+ Almost all functions in pugixml have the following thread-safety guarantees: +

+
    +
  • + it is safe to call free functions from multiple threads +
  • +
  • + it is safe to perform concurrent read-only accesses to the same tree + (all constant member functions do not modify the tree) +
  • +
  • + it is safe to perform concurrent read/write accesses to different trees, as long as there is only + one read or write access to any single tree at a time +
  • +
+

+ Concurrent modification and traversing of a single tree requires synchronization, + for example via a reader-writer lock. Modification includes altering document + structure and altering individual node/attribute data, i.e. changing names/values. +
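One possible shape for such synchronization is sketched below with the standard C++14 std::shared_timed_mutex. The guarded_tree class is hypothetical, and a std::map stands in for the document tree so that the example stays self-contained; with pugixml the guarded object would be the document itself.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Sketch only: guarded_tree is a hypothetical wrapper, and std::map stands in
// for a document tree; with pugixml you would guard a pugi::xml_document.
class guarded_tree
{
public:
    // Read-only access: many readers may hold the shared lock at once
    std::string get(const std::string& name) const
    {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        std::map<std::string, std::string>::const_iterator it = data_.find(name);
        return it == data_.end() ? std::string() : it->second;
    }

    // Modification: the exclusive lock blocks all readers and other writers
    void set(const std::string& name, const std::string& value)
    {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        data_[name] = value;
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::map<std::string, std::string> data_;
};
```

The design choice here is to keep the lock inside the wrapper, so that callers cannot reach the tree without taking it; returning values by copy avoids handing out references that outlive the lock.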

+

+ The only exception is set_memory_management_functions; + it modifies global variables and as such is not thread-safe. Its usage policy + has more restrictions, see Custom memory allocation/deallocation + functions. +

+
+
+ +

+ With the exception of XPath, pugixml itself does not throw any exceptions. + Additionally, most pugixml functions have a no-throw exception guarantee. +

+

+ This is not applicable to functions that operate on STL strings or IOstreams; + such functions have either a strong guarantee (functions that operate on strings) + or a basic guarantee (functions that operate on streams). Also, functions that + call user-defined callbacks (e.g. xml_node::traverse + or xml_node::find_node) do not provide any exception + guarantees beyond the ones provided by the callback. +

+

+ XPath functions may throw xpath_exception + on a parsing error; also, the XPath implementation uses STL, and thus may throw + e.g. std::bad_alloc in low memory conditions. Still, + XPath functions provide a strong exception guarantee. +

+
+
+ +

+ pugixml requests the memory needed for document storage in big chunks, and + allocates document data inside those chunks. This section discusses replacing + functions used for chunk allocation and internal memory management implementation. +

+
+ +

+ All memory for tree structure/data is allocated via globally specified + functions, which default to malloc/free. You can set your own allocation + functions with the set_memory_management_functions function. The function interfaces + are the same as those of malloc/free: +

+
typedef void* (*allocation_function)(size_t size);
+typedef void (*deallocation_function)(void* ptr);
+
+

+ You can use the following accessor functions to change or get current memory + management functions: +

+
void set_memory_management_functions(allocation_function allocate, deallocation_function deallocate);
+allocation_function get_memory_allocation_function();
+deallocation_function get_memory_deallocation_function();
+
+

+ The allocation function is called with the size (in bytes) as an argument and + should return a pointer to a memory block with alignment that is suitable + for pointer storage and size that is greater than or equal to the requested + one. If the allocation fails, the function has to return a null pointer (throwing + an exception from the allocation function results in undefined behavior). The deallocation + function is called with a pointer that was returned by a previous call to the allocation function + or with a null pointer; null pointer deallocation should be handled as + a no-op. If the memory management functions are not thread-safe, library thread + safety is not guaranteed. +

+

+ This is a simple example of custom memory management (samples/custom_memory_management.cpp): +

+

+ +

+
void* custom_allocate(size_t size)
+{
+    return new (std::nothrow) char[size];
+}
+
+void custom_deallocate(void* ptr)
+{
+    delete[] static_cast<char*>(ptr);
+}
+
+

+

+

+ +

+
pugi::set_memory_management_functions(custom_allocate, custom_deallocate);
+
+

+

+

+ When setting new memory management functions, care must be taken to make + sure that there are no live pugixml objects. Otherwise when the objects + are destroyed, the new deallocation function will be called with the memory + obtained by the old allocation function, resulting in undefined behavior. +
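One way to catch this situation during development is to count the blocks that are still outstanding. The sketch below has the same malloc/free-shaped interface as the typedefs shown earlier; the names counting_allocate, counting_deallocate and no_live_blocks are hypothetical, and the counter is deliberately not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helpers with the same shape as the allocation_function and
// deallocation_function typedefs; not part of pugixml. Not thread-safe:
// the counter would need to be atomic if used from several threads.
static std::size_t g_live_blocks = 0;

void* counting_allocate(std::size_t size)
{
    void* ptr = std::malloc(size);
    if (ptr) ++g_live_blocks; // count only successful allocations
    return ptr;               // null on failure, never throws
}

void counting_deallocate(void* ptr)
{
    if (!ptr) return; // null pointer deallocation is a no-op
    --g_live_blocks;
    std::free(ptr);
}

// True when every block handed out so far has been returned, i.e. no memory
// obtained from counting_allocate is still in use
bool no_live_blocks()
{
    return g_live_blocks == 0;
}
```

These functions would be registered with pugi::set_memory_management_functions(counting_allocate, counting_deallocate) before any document is created; checking no_live_blocks() before installing a different pair is then a cheap sanity test.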

+
+ + + + + +
[Note]Note

+ Currently memory for XPath objects is allocated using default operators + new/delete; this will change in the next version. +

+
+
+ +

+ Constructing a document object using the default constructor does not result + in any allocations; document node is stored inside the xml_document + object. +

+

+ When the document is loaded from file/buffer, unless an inplace loading + function is used (see Loading document from memory), a complete copy of character + stream is made; all names/values of nodes and attributes are allocated + in this buffer. This buffer is allocated via a single large allocation + and is only freed when document memory is reclaimed (i.e. if the xml_document object is destroyed or if + another document is loaded in the same object). Also when loading from + file or stream, an additional large allocation may be performed if encoding + conversion is required; a temporary buffer is allocated, and it is freed + before load function returns. +

+

+ All additional memory, such as memory for document structure (node/attribute + objects) and memory for node/attribute names/values is allocated in pages + on the order of 32 kilobytes; actual objects are allocated inside the pages + using a memory management scheme optimized for fast allocation/deallocation + of many small objects. Because of the scheme specifics, the pages are only + destroyed if all objects inside them are destroyed; also, generally destroying + an object does not mean that subsequent object creation will reuse the + same memory. This means that it is possible to devise a usage scheme which + will lead to higher memory usage than expected; one example is adding a + lot of nodes, and then removing all even-numbered ones; not a single page + is reclaimed in the process. However, this is an example specifically crafted + to produce unsatisfying behavior; in all practical usage scenarios the + memory consumption is less than that of a general-purpose allocator because + allocation meta-data is very small in size. +
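The even-numbered example above can be reproduced with a toy model. This is a simulation under simplifying assumptions, not pugixml's allocator: the page_model class is hypothetical, objects are placed into fixed-capacity pages in creation order, freed slots are never reused, and a page is released only when it holds no live objects.

```cpp
#include <cstddef>
#include <vector>

// Toy model, not pugixml's allocator: objects are placed into fixed-capacity
// pages in creation order, freed slots are never reused, and a page is
// released only when every object in it has been destroyed.
class page_model
{
public:
    // objects_per_page must be greater than zero
    explicit page_model(std::size_t objects_per_page)
        : capacity_(objects_per_page), created_(0) {}

    // Creates an object and returns its index
    std::size_t create()
    {
        if (created_ % capacity_ == 0) live_.push_back(0); // start a new page
        ++live_.back();
        return created_++;
    }

    // Destroys the object with the given index (each index at most once)
    void destroy(std::size_t index)
    {
        --live_[index / capacity_];
    }

    // Number of pages that still hold at least one live object
    std::size_t pages_in_use() const
    {
        std::size_t count = 0;
        for (std::size_t i = 0; i < live_.size(); ++i)
            if (live_[i] > 0) ++count;
        return count;
    }

private:
    std::size_t capacity_;
    std::size_t created_;
    std::vector<std::size_t> live_; // live object count per page
};
```

With 1000 objects at 100 per page, destroying every even-numbered object leaves all 10 pages in use, because each page still holds 50 live objects; only destroying the remaining ones brings the count to zero.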

+
+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/install.html b/docs/manual/install.html new file mode 100644 index 0000000..0c3e94e --- /dev/null +++ b/docs/manual/install.html @@ -0,0 +1,445 @@ + + + +Installation + + + + + + + + + + + +
+
+
+ + +
+ +

+ pugixml is distributed in source form. You can either download a source distribution + or check out the Subversion repository. +

+
+ +

+ You can download the latest source distribution via one of the following + links: +

+
http://pugixml.googlecode.com/files/pugixml-0.9.zip
+http://pugixml.googlecode.com/files/pugixml-0.9.tar.gz
+
+

+ The distribution contains library source, documentation (the manual you're + reading now and the quick start guide) and some code examples. After downloading + the distribution, install pugixml by extracting all files from the compressed + archive. +

+

+ If you need an older version, you can download it from the version + archive. +

+
+
+ +

+ The Subversion repository is located at http://pugixml.googlecode.com/svn/. + There is a Subversion tag "release-{version}" for each version; + also there is the "latest" tag, which always points to the latest + stable release. +

+

+ For example, to check out the current version, you can use this command: +

+
svn checkout http://pugixml.googlecode.com/svn/tags/release-0.9 pugixml
+

+ To check out the latest version, you can use this command: +

+
svn checkout http://pugixml.googlecode.com/svn/tags/latest pugixml
+

+ The repository contains library source, documentation, code examples and + full unit test suite. +

+

+ Use the latest version tag if you want to automatically get new versions via + svn update. Use other tags if you want to switch to + new versions only explicitly (for example, using the svn switch + command). Also please note that the Subversion trunk contains the work-in-progress + version of the code; while this means that you can get new features and + bug fixes from trunk without waiting for a new release, this also means + that occasionally the code can be broken in some configurations. +

+
+
+
+ +

+ pugixml is distributed in source form without any pre-built binaries; you + have to build them yourself. +

+

+ The complete pugixml source consists of four files - two source files, pugixml.cpp and + pugixpath.cpp, and two header files, pugixml.hpp and pugiconfig.hpp. pugixml.hpp is + the primary header which you need to include in order to use pugixml classes/functions; + pugiconfig.hpp is a supplementary configuration file (see Additional configuration + options). + The rest of this guide assumes that pugixml.hpp is either in the current directory + or in one of the include directories of your project, so that #include "pugixml.hpp" + can find the header; however, you can also use a relative path (e.g. #include "../libs/pugixml/src/pugixml.hpp") + or an include directory-relative path (e.g. #include + <xml/thirdparty/pugixml/src/pugixml.hpp>). +

+
+ + + + + +
[Note]Note

+ You don't need to compile pugixpath.cpp unless you use XPath. +

+
+ +

+ The easiest way to build pugixml is to compile two source files, pugixml.cpp and + pugixpath.cpp, along with the existing library/executable. This process + depends on the method of building your application; for example, if you're + using Microsoft Visual Studio[1], Apple Xcode, Code::Blocks or any other IDE, just add pugixml.cpp and + pugixpath.cpp to one of your projects. +

+

+ If you're using Microsoft Visual Studio and the project has precompiled + headers turned on, you'll see the following error messages: +

+
pugixpath.cpp(3477) : fatal error C1010: unexpected end of file while looking for precompiled header. Did you forget to add '#include "stdafx.h"' to your source?
+

+ The correct way to resolve this is to disable precompiled headers for pugixml.cpp and + pugixpath.cpp; you have to set "Create/Use Precompiled Header" + option (Properties dialog -> C/C++ -> Precompiled Headers -> Create/Use + Precompiled Header) to "Not Using Precompiled Headers". You'll + have to do it for both pugixml.cpp and pugixpath.cpp, for all project configurations/platforms + (you can select Configuration "All Configurations" and Platform + "All Platforms" before editing the option): +

+
+ + +
+

+ [Screenshots, in order: vs2005_pch1, vs2005_pch2, vs2005_pch3, vs2005_pch4] +

+
+
+
+ +

+ It's possible to compile pugixml as a standalone static library. This process + depends on the method of building your application; the pugixml distribution + comes with project files for several popular IDEs/build systems. There + are project files for Apple Xcode 3, Code::Blocks, CodeLite, Microsoft Visual + Studio 2005, 2008, 2010, and configuration scripts for CMake and premake4. + You're welcome to submit project files/build scripts for other software; + see Feedback. +

+

+ There are two projects for each version of Microsoft Visual Studio: one + for dynamically linked CRT, which has a name like pugixml_vs2008.vcproj, + and another one for statically linked CRT, which has a name like pugixml_vs2008_static.vcproj. + You should select the version that matches the CRT used in your application; + the default option for new projects created by Microsoft Visual Studio + is dynamically linked CRT, so unless you changed the defaults, you should + use the version with dynamic CRT (i.e. pugixml_vs2008.vcproj for Microsoft + Visual Studio 2008). +

+

+ In addition to adding the pugixml project to your workspace, you'll have to + make sure that your application links with the pugixml library. If you're using + Microsoft Visual Studio 2005/2008, you can add a dependency from your application + project to the pugixml one. If you're using Microsoft Visual Studio 2010, you'll + have to add a reference to your application project instead. For other + IDEs/systems, consult the relevant documentation. +

+
+ Microsoft Visual Studio 2005/2008: [Screenshots, in order: vs2005_link1, vs2005_link2] +

+ Microsoft Visual Studio 2010: [Screenshots, in order: vs2010_link1, vs2010_link2] +
+
+ +

+ It's possible to compile pugixml as a standalone shared library. The process + is usually similar to the static library approach; however, no preconfigured + projects/scripts are included in the pugixml distribution, so you'll have + to do it yourself. Generally, if you're using a GCC-based toolchain, the + process does not differ from building any other library as a DLL (adding + -shared to compilation flags should suffice); if you're using an MSVC-based + toolchain, you'll have to explicitly mark exported symbols with a declspec + attribute. You can do it by defining the PUGIXML_API + macro, e.g. via pugiconfig.hpp: +

+
#ifdef _DLL
+#define PUGIXML_API __declspec(dllexport)
+#else
+#define PUGIXML_API __declspec(dllimport)
+#endif
+
+
+
+ +

+ pugixml uses several defines to control the compilation process. There + are two ways to define them: either put the needed definitions in pugiconfig.hpp (it + has some examples that are commented out) or provide them via the compiler + command line. Define consistency is important, i.e. the definitions should + match in all source files that include pugixml.hpp (including pugixml sources) + throughout the application. Adding defines to pugiconfig.hpp lets you guarantee + this, unless your macro definition is wrapped in a preprocessor #if/#ifdef + directive and this directive is not consistent. The distributed pugiconfig.hpp will never + contain anything but comments, which means that when upgrading to a new version, + you can safely leave your modified version intact. +

+

+ PUGIXML_WCHAR_MODE define toggles + between UTF-8 style interface (the in-memory text encoding is assumed to + be UTF-8, most functions use char + as character type) and UTF-16/32 style interface (the in-memory text encoding + is assumed to be UTF-16/32, depending on wchar_t + size, most functions use wchar_t + as character type). See Unicode interface for more details. +

+

+ PUGIXML_NO_XPATH define disables XPath. + Both XPath interfaces and XPath implementation are excluded from compilation; + you can still compile the file pugixpath.cpp (it will result in an empty + translation unit). This option is provided in case you do not need XPath + functionality and need to save code space. +

+

+ PUGIXML_NO_STL define disables use of + STL in pugixml. The functions that operate on STL types are no longer present + (i.e. load/save via iostream) if this macro is defined. This option is + provided in case your target platform does not have a standard-compliant + STL implementation. +

+
+ + + + + +
[Note]Note

+ As of version 0.9, STL is used in XPath implementation; therefore, XPath + is also disabled if this macro is defined. This will change in version + 1.0. +

+

+ PUGIXML_NO_EXCEPTIONS define disables + use of exceptions in pugixml. This option is provided in case your target + platform does not have exception handling capabilities. +

+
+ + + + + +
[Note]Note

+ As of version 0.9, exceptions are only + used in XPath implementation; therefore, XPath is also disabled if this + macro is defined. This will change in version 1.0. +

+

+ PUGIXML_API, PUGIXML_CLASS + and PUGIXML_FUNCTION defines let you + specify custom attributes (e.g. declspec or calling conventions) for pugixml + classes and non-member functions. In the absence of PUGIXML_CLASS + or PUGIXML_FUNCTION definitions, + the PUGIXML_API definition + is used instead. For example, to specify a fixed calling convention, you + can define PUGIXML_FUNCTION + to e.g. __fastcall. Another + example is DLL import/export attributes in MSVC (see Building pugixml as + a standalone shared library). +

+
+ + + + + +
[Note]Note

+ In that example PUGIXML_API + is inconsistent between several source files; this is an exception to + the consistency rule. +

+
+
+
+ +

+ pugixml is written in standard-compliant C++ with some compiler-specific + workarounds where appropriate. pugixml is compatible with the upcoming C++0x + standard (verified using GCC 4.5). Each version is tested with a unit test + suite (with code coverage about 99%) on the following platforms: +

+
    +
  • + Microsoft Windows: +
      +
    • + Borland C++ Compiler 5.82 +
    • +
    • + Digital Mars C++ Compiler 8.51 +
    • +
    • + Intel C++ Compiler 8.0, 9.0 x86/x64, 10.0 x86/x64, 11.0 x86/x64 +
    • +
    • + Metrowerks CodeWarrior 8.0 +
    • +
    • + Microsoft Visual C++ 6.0, 7.0 (2002), 7.1 (2003), 8.0 (2005) x86/x64, + 9.0 (2008) x86/x64, 10.0 (2010) x86/x64 +
    • +
    • + MinGW (GCC) 3.4, 4.4, 4.5, 4.6 x64 +
    • +
    +
  • +
  • + Linux (GCC 4.4.3 x86/x64) +
  • +
  • + FreeBSD (GCC 4.2.1 x86/x64) +
  • +
  • + Apple MacOSX (GCC 4.0.1 x86/x64/PowerPC) +
  • +
  • + Microsoft Xbox 360 +
  • +
  • + Nintendo Wii (Metrowerks CodeWarrior 4.1) +
  • +
  • + Sony Playstation Portable (GCC 3.4.2) +
  • +
  • + Sony Playstation 3 (GCC 4.1.1, SNC 310.1) +
  • +
+
+
+

+

[1] + All trademarks used are properties of their respective owners. +

+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/loading.html b/docs/manual/loading.html new file mode 100644 index 0000000..a3c1515 --- /dev/null +++ b/docs/manual/loading.html @@ -0,0 +1,840 @@ + + + +Loading document + + + + + + + + + + + +
+
+
+ + +

+ pugixml provides several functions for loading XML data from various places + - files, C++ iostreams, memory buffers. All functions use an extremely fast + non-validating parser. This parser is not fully W3C conformant - it can load + any valid XML document, but does not perform some well-formedness checks. While + considerable effort is made to reject invalid XML documents, some validation + is not performed for performance reasons. Also some XML transformations + (e.g. EOL handling or attribute value normalization) can impact parsing speed + and thus can be disabled. However, for the vast majority of XML documents there + is no performance difference between different parsing options. Parsing options + also control whether certain XML nodes are parsed; see Parsing options for + more information. +

+

+ XML data is always converted to the internal character format (see Unicode interface) + before parsing. pugixml supports all popular Unicode encodings (UTF-8, UTF-16 + (big and little endian), UTF-32 (big and little endian); UCS-2 is naturally + supported since it's a strict subset of UTF-16) and handles all encoding conversions + automatically. Unless an explicit encoding is specified, loading functions perform + automatic encoding detection based on the first few characters of XML data, so + in almost all cases you do not have to specify document encoding. Encoding + conversion is described in more detail in Encodings. +
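As an illustration of what detection from the first few bytes can look like, the sketch below checks only for a leading byte order mark. It is not pugixml's detection code, which may use additional heuristics; the demo_encoding enum and detect_bom function are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical names for illustration; pugixml has its own xml_encoding enum
enum demo_encoding
{
    demo_utf8,
    demo_utf16_le,
    demo_utf16_be,
    demo_utf32_le,
    demo_utf32_be,
    demo_unknown // no BOM: a real detector would inspect the data further
};

// Guesses the encoding from a leading byte order mark, if there is one
demo_encoding detect_bom(const unsigned char* data, std::size_t size)
{
    // UTF-32 must be tested before UTF-16: the UTF-32 LE mark FF FE 00 00
    // begins with the UTF-16 LE mark FF FE
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF) return demo_utf32_be;
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00) return demo_utf32_le;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) return demo_utf16_be;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) return demo_utf16_le;
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return demo_utf8;

    return demo_unknown;
}
```

Data without a byte order mark falls through to demo_unknown here; that is the case where a detector has to look at the actual characters, and where passing an explicit encoding to the loading function removes the guesswork.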

+
+ +

+ The most common source of XML data is files; pugixml provides a separate + function for loading XML document from file: +

+
xml_parse_result xml_document::load_file(const char* path, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+
+

+ This function accepts file path as its first argument, and also two optional + arguments, which specify parsing options (see Parsing options) and + input data encoding (see Encodings). The path has the target + operating system format, so it can be a relative or absolute one, it should + have the delimiters of target system, it should have the exact case if target + file system is case-sensitive, etc. File path is passed to system file opening + function as is. +

+

+ load_file destroys the existing + document tree and then tries to load the new tree from the specified file. + The result of the operation is returned in an xml_parse_result + object; this object contains the operation status, and the related information + (i.e. last successfully parsed position in the input file, if parsing fails). + See Handling parsing errors for error handling details. +

+
+ + + + + +
[Note]Note

+ As of version 0.9, there is no function for loading an XML document from a wide + character path. Unfortunately, there is no portable way to do this; + version 1.0 will provide such a function only for platforms with the corresponding + functionality. You can use stream-loading functions as a workaround if + your STL implementation can open file streams via wchar_t + paths. +

+

+ This is an example of loading XML document from file (samples/load_file.cpp): +

+

+ +

+
pugi::xml_document doc;
+
+pugi::xml_parse_result result = doc.load_file("tree.xml");
+
+std::cout << "Load result: " << result.description() << ", mesh name: " << doc.child("mesh").attribute("name").value() << std::endl;
+
+

+

+
+
+ +

+ Sometimes XML data should be loaded from a source other than a file, e.g. + an HTTP URL; also you may want to load XML data from a file using non-standard + functions, e.g. to use your virtual file system facilities or to load XML + from gzip-compressed files. All these scenarios require loading the document + from memory. First you should prepare a contiguous memory block with all + XML data; then you have to invoke one of the buffer loading functions. These + functions will handle the necessary encoding conversions, if any, and then + will parse the data into the corresponding XML tree. There are several buffer + loading functions, which differ in behavior and thus in performance/memory + usage: +

+
xml_parse_result xml_document::load_buffer(const void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+xml_parse_result xml_document::load_buffer_inplace(void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+xml_parse_result xml_document::load_buffer_inplace_own(void* contents, size_t size, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+
+

+ All functions accept the buffer which is represented by a pointer to XML + data, contents, and data + size in bytes. Also there are two optional arguments, which specify parsing + options (see Parsing options) and input data encoding (see Encodings). + The buffer does not have to be zero-terminated. +

+

+ load_buffer function works + with immutable buffer - it does not ever modify the buffer. Because of this + restriction it has to create a private buffer and copy XML data to it before + parsing (applying encoding conversions if necessary). This copy operation + carries a performance penalty, so inplace functions are provided - load_buffer_inplace and load_buffer_inplace_own + store the document data in the buffer, modifying it in the process. In order + for the document to stay valid, you have to make sure that the buffer's lifetime + exceeds that of the tree if you're using inplace functions. In addition to + that, load_buffer_inplace + does not assume ownership of the buffer, so you'll have to destroy it yourself; + load_buffer_inplace_own assumes + ownership of the buffer and destroys it once it is not needed. This means + that if you're using load_buffer_inplace_own, + you have to allocate memory with pugixml allocation function (you can get + it via get_memory_allocation_function). +

+

+ The best way from the performance/memory point of view is to load document + using load_buffer_inplace_own; + this function has maximum control of the buffer with XML data so it is able + to avoid redundant copies and reduce peak memory usage while parsing. This + is the recommended function if you have to load the document from memory + and performance is critical. +

+

+ There is also a simple helper function for cases when you want to load the + XML document from null-terminated character string: +

+
xml_parse_result xml_document::load(const char_t* contents, unsigned int options = parse_default);
+
+

+ It is equivalent to calling load_buffer + with size = + strlen(contents). + This function assumes native encoding for input data, so it does not do any + encoding conversion. In general, this function is fine for loading small + documents from string literals, but has more overhead and less functionality + than buffer loading functions. +

+

+ This is an example of loading XML document from memory using different functions + (samples/load_memory.cpp): +

+

+ +

+
const char source[] = "<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>";
+size_t size = sizeof(source);
+
+

+

+

+ +

+
// You can use load_buffer to load document from immutable memory block:
+pugi::xml_parse_result result = doc.load_buffer(source, size);
+
+

+

+

+ +

+
// You can use load_buffer_inplace to load document from mutable memory block; the block's lifetime must exceed that of document
+char* buffer = new char[size];
+memcpy(buffer, source, size);
+
+// The block can be allocated by any method; the block is modified during parsing
+pugi::xml_parse_result result = doc.load_buffer_inplace(buffer, size);
+
+// You have to destroy the block yourself after the document is no longer used
+delete[] buffer;
+
+

+

+

+ +

+
// You can use load_buffer_inplace_own to load document from mutable memory block and to pass the ownership of this block
+// The block has to be allocated via pugixml allocation function - using i.e. operator new here is incorrect
+char* buffer = static_cast<char*>(pugi::get_memory_allocation_function()(size));
+memcpy(buffer, source, size);
+
+// The block will be deleted by the document
+pugi::xml_parse_result result = doc.load_buffer_inplace_own(buffer, size);
+
+

+

+

+ +

+
// You can use load to load document from null-terminated strings, for example literals:
+pugi::xml_parse_result result = doc.load("<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>");
+
+

+

+
+
+ +

+ For additional interoperability pugixml provides functions for loading a document + from any object which implements the C++ std::istream + interface. This allows you to load documents from any standard C++ stream + (e.g. file stream) or any third-party compliant implementation (e.g. Boost + Iostreams). There are two functions: one works with narrow character streams, + the other handles wide character ones: +

+
xml_parse_result xml_document::load(std::istream& stream, unsigned int options = parse_default, xml_encoding encoding = encoding_auto);
+xml_parse_result xml_document::load(std::wistream& stream, unsigned int options = parse_default);
+
+

+ load with std::istream + argument loads the document from stream from the current read position to + the end, treating the stream contents as a byte stream of the specified encoding + (with encoding autodetection as necessary). Thus calling xml_document::load + on an opened std::ifstream object is equivalent to calling + xml_document::load_file. +

+

+ load with std::wistream + argument treats the stream contents as a wide character stream (encoding + is always encoding_wchar). + Because of this, using load + with wide character streams requires careful (usually platform-specific) + stream setup (e.g. using the imbue + function). Generally use of wide streams is discouraged; however, it gives + you the ability to load documents from non-Unicode encodings, e.g. you can + load Shift-JIS encoded data if you set the correct locale. +

+

+ This is a simple example of loading XML document from file using streams + (samples/load_stream.cpp); read + the sample code for more complex examples involving wide streams and locales: +

+

+ +

+
std::ifstream stream("weekly-utf-8.xml");
+pugi::xml_parse_result result = doc.load(stream);
+
+

+

+

+ Stream loading requires working seek/tell functions and therefore may fail + when used with some stream implementations like gzstream. +

+
+
+ +

+ All document loading functions return the parsing result via xml_parse_result object. It contains parsing + status, the offset of last successfully parsed character from the beginning + of the source stream, and the encoding of the source stream: +

+
struct xml_parse_result
+{
+    xml_parse_status status;
+    ptrdiff_t offset;
+    xml_encoding encoding;
+
+    operator bool() const;
+    const char* description() const;
+};
+
+

+ Parsing status is represented as the xml_parse_status + enumeration and can be one of the following: +

+
    +
  • + status_ok means that no error was encountered + during parsing; the source stream represents the valid XML document which + was fully parsed and converted to a tree.

    + +
  • +
  • + status_file_not_found is only + returned by load_file + function and means that file could not be opened. +
  • +
  • + status_io_error is returned by load_file function and by load functions with std::istream/std::wistream arguments; it means that some + I/O error has occurred during reading the file/stream. +
  • +
  • + status_out_of_memory means that + there was not enough memory during some allocation; any allocation failure + during parsing results in this error. +
  • +
  • + status_internal_error means that + something went horribly wrong; currently this error does not occur

    + +
  • +
  • + status_unrecognized_tag means + that parsing stopped due to a tag with either an empty name or a name + which starts with incorrect character, such as #. +
  • +
  • + status_bad_pi means that parsing stopped + due to incorrect document declaration/processing instruction +
  • +
  • + status_bad_comment, status_bad_cdata, + status_bad_doctype and status_bad_pcdata + mean that parsing stopped due to the invalid construct of the respective + type +
  • +
  • + status_bad_start_element means + that parsing stopped because starting tag either had no closing > symbol or contained some incorrect + symbol +
  • +
  • + status_bad_attribute means that + parsing stopped because there was an incorrect attribute, such as an + attribute without value or with value that is not quoted (note that + <node + attr=1> is + incorrect in XML) +
  • +
  • + status_bad_end_element means + that parsing stopped because ending tag had incorrect syntax (i.e. extra + non-whitespace symbols between tag name and >) +
  • +
  • + status_end_element_mismatch + means that parsing stopped because the closing tag did not match the + opening one (i.e. <node></nedo>) or because some tag was not closed + at all +
  • +
+

+ description() + member function can be used to convert parsing status to a string; the returned + message is always in English, so you'll have to write your own function if + you need a localized string. However please note that the exact messages + returned by description() + function may change from version to version, so any complex status handling + should be based on status + value. +

+

+ If parsing failed because the source data was not valid XML, the resulting + tree is not destroyed: although the load function returns an error, + you can still use the part of the tree that was successfully parsed. Note that + the last parsed element may have an unexpected name/value; for example, if an attribute + value does not end with the necessary quotation mark, as in <node + attr="value>some data</node>, the value of + attribute attr will contain + the string value>some data</node>. +

+

+ In addition to the status code, the parsing result has an offset + member, which contains the offset of the last successfully parsed character if + parsing failed because of an error in the source data; otherwise offset is 0. For parsing efficiency reasons, + pugixml does not track the current line during parsing; this offset is in + units of pugi::char_t (bytes for character mode, wide + characters for wide character mode). Many text editors support a 'Go To Position' + feature, which you can use to locate the exact error position. Alternatively, + if you're loading the document from memory, you can display the error chunk + along with the error description (see the example code below). +

+
+ + + + + +
[Caution]Caution

+ Offset is calculated in the XML buffer in native encoding; if encoding + conversion is performed during parsing, offset can not be used to reliably + track the error position. +

+
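Since the offset is a plain position in the buffer, a line/column pair can be recovered afterwards when the document was loaded from memory. The following is only a sketch: text_position and locate_offset are hypothetical names, not part of pugixml, and the code assumes character mode with no encoding conversion (see the caution above).

```cpp
#include <cstddef>

// Hypothetical helper type: 1-based line and column of a buffer position
struct text_position
{
    std::size_t line;
    std::size_t column;
};

// Walks the buffer up to 'offset' and counts line breaks.
// The column is counted in chars (bytes), not in Unicode characters.
text_position locate_offset(const char* buffer, std::size_t size, std::size_t offset)
{
    text_position pos = {1, 1};

    // Clamp the offset so that we never read past the end of the buffer
    if (offset > size) offset = size;

    for (std::size_t i = 0; i < offset; ++i)
    {
        if (buffer[i] == '\n')
        {
            ++pos.line;
            pos.column = 1;
        }
        else
        {
            ++pos.column;
        }
    }

    return pos;
}
```

With a failed parse, the offset member of the parsing result would be passed as the third argument, together with the same buffer that was given to the load function.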

+ Parsing result also has an encoding + member, which can be used to check that the source data encoding was correctly + guessed. It is equal to the exact encoding used during parsing (i.e. with + the exact endianness); see Encodings for more information. +

+

+ Parsing result object can be implicitly converted to bool; + if you do not want to handle parsing errors thoroughly, you can just check + the return value of load functions as if it was a bool: + if (doc.load_file("file.xml")) { ... + } else { ... }. +

+

+ This is an example of handling loading errors (samples/load_error_handling.cpp): +

+

+ +

+
pugi::xml_document doc;
+pugi::xml_parse_result result = doc.load(source);
+
+if (result)
+    std::cout << "XML [" << source << "] parsed without errors, attr value: [" << doc.child("node").attribute("attr").value() << "]\n\n";
+else
+{
+    std::cout << "XML [" << source << "] parsed with errors, attr value: [" << doc.child("node").attribute("attr").value() << "]\n";
+    std::cout << "Error description: " << result.description() << "\n";
+    std::cout << "Error offset: " << result.offset << " (error at [..." << (source + result.offset) << "])\n\n";
+}
+
+

+

+
+
+ +

+ All document loading functions accept the optional parameter options. This is a bitmask that customizes + the parsing process: you can select the node types that are parsed and various + transformations that are performed on the XML text. Disabling certain transformations + can improve parsing performance for some documents; however, the code for + all transformations is well optimized, so the majority of documents + won't see any performance benefit. As a rule of thumb, only modify parsing + flags if you want to get some nodes in the document that are excluded by + default (e.g. declaration or comment nodes). +

+
+ + + + + +
[Note]Note

+ You should use the usual bitwise operations to manipulate the bitmask: + to enable a flag, use mask | flag; + to disable a flag, use mask & ~flag. +

+

+ These flags control the resulting tree contents: +

+
    +
  • + parse_declaration determines if the XML + document declaration (node with type node_declaration) + is to be put in the DOM tree. If this flag is off, it is not put in the + tree, but is still parsed and checked for correctness. This flag is + off by default.

    + +
  • +
  • + parse_pi determines if processing instructions + (nodes with type node_pi) are to be put + in DOM tree. If this flag is off, they are not put in the tree, but are + still parsed and checked for correctness. Note that <?xml ...?> + (document declaration) is not considered to be a PI. This flag is off by default.

    + +
  • +
  • + parse_comments determines if comments + (nodes with type node_comment) are + to be put in DOM tree. If this flag is off, they are not put in the tree, + but are still parsed and checked for correctness. This flag is off by default.

    + +
  • +
  • + parse_cdata determines if CDATA sections + (nodes with type node_cdata) are to + be put in DOM tree. If this flag is off, they are not put in the tree, + but are still parsed and checked for correctness. This flag is on by default.

    + +
  • +
  • + parse_ws_pcdata determines if PCDATA + nodes (nodes with type node_pcdata) + that consist only of whitespace characters are to be put in DOM tree. + Often whitespace-only data is not significant for the application, and + the cost of allocating and storing such nodes (both memory and speed-wise) + can be significant. For example, after parsing XML string <node> <a/> </node>, <node> + element will have three children when parse_ws_pcdata + is set (child with type node_pcdata + and value " ", + child with type node_element + and name "a", and + another child with type node_pcdata + and value " "), + and only one child when parse_ws_pcdata + is not set. This flag is off by default. +
  • +
+

+ These flags control the transformation of tree element contents: +

+
    +
  • + parse_escapes determines if character + and entity references are to be expanded during the parsing process. + Character references have the form &#...; or + &#x...; (... is Unicode numeric + representation of character in either decimal (&#...;) + or hexadecimal (&#x...;) form), entity references + are &lt;, &gt;, &amp;, + &apos; and &quot; (note + that as pugixml does not handle DTD, the only allowed entities are predefined + ones). If character/entity reference can not be expanded, it is left + as is, so you can do additional processing later. Reference expansion + is performed in attribute values and PCDATA content. This flag is on by default.

    + +
  • +
  • + parse_eol determines if EOL handling (that + is, replacing sequences 0x0d 0x0a by a single 0x0a + character, and replacing all standalone 0x0d + characters by 0x0a) is to + be performed on input data (that is, comments contents, PCDATA/CDATA + contents and attribute values). This flag is on + by default.

    + +
  • +
  • + parse_wconv_attribute determines + if attribute value normalization should be performed for all attributes. + This means that whitespace characters (new line, tab and space) are + replaced with a space (' '). + New line characters are always treated as if parse_eol + is set, i.e. \r\n + is converted to a single space. This flag is on + by default.

    + +
  • +
  • + parse_wnorm_attribute determines + if extended attribute value normalization should be performed for all + attributes. This means that after attribute values are normalized as + if parse_wconv_attribute + was set, leading and trailing space characters are removed, and all sequences + of space characters are replaced by a single space character. The value + of parse_wconv_attribute + has no effect if this flag is on. This flag is off + by default. +
  • +
+
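The entity part of parse_escapes can be illustrated with a small standalone function. This is a sketch of the described behaviour only, not pugixml's implementation: expand_predefined_entities is a hypothetical name, and numeric character references (&#...; and &#x...;) are deliberately not handled here, so they are copied through unchanged.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Expands the five predefined entity references in a single pass.
// Anything else that starts with '&' is left as is, mirroring the
// "left as is, so you can do additional processing later" rule.
std::string expand_predefined_entities(const std::string& in)
{
    static const struct { const char* ref; char ch; } table[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'}, {"&apos;", '\''}, {"&quot;", '"'}
    };

    std::string out;

    for (std::size_t i = 0; i < in.size(); )
    {
        bool matched = false;

        if (in[i] == '&')
        {
            for (std::size_t k = 0; k < sizeof(table) / sizeof(table[0]); ++k)
            {
                std::size_t length = std::strlen(table[k].ref);

                if (in.compare(i, length, table[k].ref) == 0)
                {
                    out += table[k].ch;
                    i += length;
                    matched = true;
                    break;
                }
            }
        }

        // Not a recognized reference: copy the character unchanged
        if (!matched) out += in[i++];
    }

    return out;
}
```

Because the input is scanned once, the result of an expansion is never expanded again: &amp;lt; becomes the text &lt; rather than <.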
+ + + + + +
[Note]Note

+ parse_wconv_attribute option + performs transformations that are required by W3C specification for attributes + that are declared as CDATA; parse_wnorm_attribute + performs transformations required for NMTOKENS attributes. + In the absence of document type declaration all attributes behave as if + they are declared as CDATA, thus parse_wconv_attribute + is the default option. +

+
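The three whitespace-related transformations (parse_eol, parse_wconv_attribute and parse_wnorm_attribute) build on each other, which a short sketch may make easier to see. The function names below are hypothetical and the code works on std::string copies for clarity; it follows the descriptions above and is not taken from pugixml.

```cpp
#include <cstddef>
#include <string>

// parse_eol-style handling: "\r\n" and standalone '\r' both become '\n'
std::string normalize_eol(const std::string& in)
{
    std::string out;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '\r')
        {
            out += '\n';

            // Swallow the '\n' of a "\r\n" pair so that it yields one character
            if (i + 1 < in.size() && in[i + 1] == '\n') ++i;
        }
        else
        {
            out += in[i];
        }
    }

    return out;
}

// parse_wconv_attribute-style conversion: EOL handling first,
// then every new line or tab is replaced with a space
std::string convert_whitespace(const std::string& in)
{
    std::string out = normalize_eol(in);

    for (std::size_t i = 0; i < out.size(); ++i)
        if (out[i] == '\n' || out[i] == '\t') out[i] = ' ';

    return out;
}

// parse_wnorm_attribute-style normalization: convert as above,
// drop leading/trailing spaces and collapse runs of spaces
std::string normalize_whitespace(const std::string& in)
{
    std::string converted = convert_whitespace(in);
    std::string out;

    for (std::size_t i = 0; i < converted.size(); ++i)
    {
        if (converted[i] == ' ')
        {
            // Keep a space only after a non-space character
            if (!out.empty() && out[out.size() - 1] != ' ') out += ' ';
        }
        else
        {
            out += converted[i];
        }
    }

    // At most one trailing space can remain at this point
    if (!out.empty() && out[out.size() - 1] == ' ') out.erase(out.size() - 1);

    return out;
}
```

Note how convert_whitespace turns \r\n into one space rather than two, which matches the statement that new line characters are always treated as if parse_eol is set.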

+ Additionally there are two predefined option masks: +

+
    +
  • + parse_minimal has all options turned + off. This option mask means that pugixml does not add declaration nodes, + PI nodes, CDATA sections and comments to the resulting tree and does + not perform any conversion for input data, so theoretically it is the + fastest mode. However, as discussed above, in practice parse_default is usually equally fast. +

    + +
  • +
  • + parse_default is the default set of flags, + i.e. it has all options set to their default values. It includes parsing + CDATA sections (comments/PIs are not added to the tree), performing character and + entity reference expansion, replacing whitespace characters with spaces + in attribute values and performing EOL handling. Note that PCDATA sections + consisting only of whitespace characters are not added to the tree (by default) + for performance reasons. +
  • +
+

+ This is an example of using different parsing options (samples/load_options.cpp): +

+

+ +

+
const char* source = "<!--comment--><node>&lt;</node>";
+
+// Parsing with default options; note that comment node is not added to the tree, and entity reference &lt; is expanded
+doc.load(source);
+std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
+
+// Parsing with additional parse_comments option; comment node is now added to the tree
+doc.load(source, pugi::parse_default | pugi::parse_comments);
+std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
+
+// Parsing with additional parse_comments option and without the (default) parse_escapes option; &lt; is not expanded
+doc.load(source, (pugi::parse_default | pugi::parse_comments) & ~pugi::parse_escapes);
+std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
+
+// Parsing with minimal option mask; comment node is not added to the tree, and &lt; is not expanded
+doc.load(source, pugi::parse_minimal);
+std::cout << "First node value: [" << doc.first_child().value() << "], node child value: [" << doc.child_value("node") << "]\n";
+
+

+

+
+
+ +

+ pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little + endian), UTF-32 (big and little endian); UCS-2 is naturally supported since + it's a strict subset of UTF-16) and handles all encoding conversions. Most + loading functions accept the optional parameter encoding. + This is a value of enumeration type xml_encoding, + that can have the following values: +

+
    +
  • + encoding_auto means that pugixml will + try to guess the encoding based on source XML data. The algorithm is + a modified version of the one presented in Appendix F.1 of XML recommendation; + it tries to match the first few bytes of input data with the following + patterns in strict order:

    +
      +
    • + If first four bytes match UTF-32 BOM (Byte Order Mark), encoding + is assumed to be UTF-32 with the endianness equal to that of BOM; +
    • +
    • + If first two bytes match UTF-16 BOM, encoding is assumed to be + UTF-16 with the endianness equal to that of BOM; +
    • +
    • + If first three bytes match UTF-8 BOM, encoding is assumed to be + UTF-8; +
    • +
    • + If first four bytes match UTF-32 representation of <, + encoding is assumed to be UTF-32 with the corresponding endianness; +
    • +
    • + If first four bytes match UTF-16 representation of <?, + encoding is assumed to be UTF-16 with the corresponding endianness; +
    • +
    • + If first two bytes match UTF-16 representation of <, + encoding is assumed to be UTF-16 with the corresponding endianness + (this guess may yield incorrect result, but it's better than UTF-8); +
    • +
    • + Otherwise encoding is assumed to be UTF-8.

      + +
    • +
    +
  • +
  • + encoding_utf8 corresponds to UTF-8 encoding + as defined in Unicode standard; UTF-8 sequences with length equal to + 5 or 6 are not standard and are rejected. +
  • +
  • + encoding_utf16_le corresponds to + little-endian UTF-16 encoding as defined in Unicode standard; surrogate + pairs are supported. +
  • +
  • + encoding_utf16_be corresponds to + big-endian UTF-16 encoding as defined in Unicode standard; surrogate + pairs are supported. +
  • +
  • + encoding_utf16 corresponds to UTF-16 + encoding as defined in Unicode standard; the endianness is assumed to + be that of target platform. +
  • +
  • + encoding_utf32_le corresponds to + little-endian UTF-32 encoding as defined in Unicode standard. +
  • +
  • + encoding_utf32_be corresponds to + big-endian UTF-32 encoding as defined in Unicode standard. +
  • +
  • + encoding_utf32 corresponds to UTF-32 + encoding as defined in Unicode standard; the endianness is assumed to + be that of target platform. +
  • +
  • + encoding_wchar corresponds to the encoding + of wchar_t type; it has + the same meaning as either encoding_utf16 + or encoding_utf32, depending + on wchar_t size. +
  • +
+
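The detection order for encoding_auto listed above can be sketched as a chain of prefix checks. This is an illustration of the listed steps only, assuming the standard BOM byte sequences; guessed_encoding, has_prefix and guess_encoding are hypothetical names rather than pugixml functions.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result type for the sketch
enum guessed_encoding
{
    guess_utf8,
    guess_utf16_le,
    guess_utf16_be,
    guess_utf32_le,
    guess_utf32_be
};

// True if the buffer holds at least 'length' bytes and they equal 'pattern'
static bool has_prefix(const char* data, std::size_t size, const char* pattern, std::size_t length)
{
    return size >= length && std::memcmp(data, pattern, length) == 0;
}

guessed_encoding guess_encoding(const char* data, std::size_t size)
{
    // 1. UTF-32 BOM; must come before UTF-16, since FF FE 00 00 also starts with FF FE
    if (has_prefix(data, size, "\x00\x00\xFE\xFF", 4)) return guess_utf32_be;
    if (has_prefix(data, size, "\xFF\xFE\x00\x00", 4)) return guess_utf32_le;

    // 2. UTF-16 BOM
    if (has_prefix(data, size, "\xFE\xFF", 2)) return guess_utf16_be;
    if (has_prefix(data, size, "\xFF\xFE", 2)) return guess_utf16_le;

    // 3. UTF-8 BOM
    if (has_prefix(data, size, "\xEF\xBB\xBF", 3)) return guess_utf8;

    // 4. '<' encoded as UTF-32
    if (has_prefix(data, size, "\x00\x00\x00\x3C", 4)) return guess_utf32_be;
    if (has_prefix(data, size, "\x3C\x00\x00\x00", 4)) return guess_utf32_le;

    // 5. "<?" encoded as UTF-16
    if (has_prefix(data, size, "\x00\x3C\x00\x3F", 4)) return guess_utf16_be;
    if (has_prefix(data, size, "\x3C\x00\x3F\x00", 4)) return guess_utf16_le;

    // 6. '<' encoded as UTF-16 (a weaker guess, as noted above)
    if (has_prefix(data, size, "\x00\x3C", 2)) return guess_utf16_be;
    if (has_prefix(data, size, "\x3C\x00", 2)) return guess_utf16_le;

    // 7. Everything else is treated as UTF-8
    return guess_utf8;
}
```

The strict order matters: swapping steps 1 and 2 would classify a little-endian UTF-32 document with a BOM as UTF-16.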

+ The algorithm used for encoding_auto + correctly detects any supported Unicode encoding for all well-formed XML + documents (since they start with document declaration) and for all other + XML documents that start with <; if your XML document + does not start with < and has an encoding other than + UTF-8, specify the encoding explicitly. +

+
+ + + + + +
[Note]Note

+ The current behavior for Unicode conversion is to skip all invalid UTF + sequences during conversion. This behavior should not be relied upon; moreover, + in case no encoding conversion is performed, the invalid sequences are + not removed, so you'll get them as is in node/attribute contents. +

+
+
+ +

+ pugixml is not fully W3C conformant - it can load any valid XML document, + but does not perform some well-formedness checks. While considerable effort + is made to reject invalid XML documents, some validation is not performed + for performance reasons. +

+

+ There is only one non-conformant behavior when dealing with valid XML documents: + pugixml does not use information supplied in document type declaration for + parsing. This means that entities declared in DOCTYPE are not expanded, and + all attribute/PCDATA values are always processed in a uniform way that depends + only on parsing options. +

+

+ As for rejecting invalid XML documents, there are a number of incompatibilities + with W3C specification, including: +

+
    +
  • + Multiple attributes of the same node can have equal names. +
  • +
  • + All non-ASCII characters are treated in the same way as symbols of English + alphabet, so some invalid tag names are not rejected. +
  • +
  • + Attribute values which contain < are not rejected. +
  • +
  • + Invalid entity/character references are not rejected and are instead + left as is. +
  • +
  • + Comment values can contain --. +
  • +
  • + XML data is not required to begin with document declaration; additionally, + document declaration can appear after comments and other nodes. +
  • +
  • + Invalid document type declarations are silently ignored in some cases. +
  • +
+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/modify.html b/docs/manual/modify.html new file mode 100644 index 0000000..f00e657 --- /dev/null +++ b/docs/manual/modify.html @@ -0,0 +1,541 @@ + + + +Modifying document data + + + + + + + + + + + +
+
+
+ + +

+ The document in pugixml is fully mutable: you can completely change the document + structure and modify the data of nodes/attributes. This section provides documentation + for the relevant functions. All functions take care of memory management and + structural integrity themselves, so they always result in a structurally valid + tree; however, it is possible to create a tree that is not valid XML (for example, + by adding two attributes with the same name or by setting an attribute/node name + to an empty/invalid string). Tree modification is optimized for performance and + for memory consumption, so if you have enough memory you can create documents + from scratch with pugixml and later save them to a file/stream, without too much overhead, instead of relying + on error-prone manual text writing. +

+

+ All member functions that change node/attribute data or structure are non-constant + and thus can not be called on constant handles. However, you can easily convert + a constant handle to a non-constant one by simple assignment: void + foo(const pugi::xml_node& n) + { pugi::xml_node nc = n; }, so const-correctness + here mainly provides additional documentation. +

+
+ +

+ As discussed before, nodes can have name and value, both of which are strings. + Depending on node type, name or value may be absent. node_document + nodes do not have name or value, node_element + and node_declaration nodes + always have a name but never have a value, node_pcdata, + node_cdata and node_comment nodes never have a name but + always have a value (it may be empty though), node_pi + nodes always have a name and a value (again, value may be empty). In order + to set node's name or value, you can use the following functions: +

+
bool xml_node::set_name(const char_t* rhs);
+bool xml_node::set_value(const char_t* rhs);
+
+

+ Both functions try to set the name/value to the specified string, and return + the operation result. The operation fails if the node can not have name or + value (for instance, when trying to call set_name + on a node_pcdata node), if + the node handle is null, or if there is insufficient memory to handle the + request. The provided string is copied into document managed memory and can + be destroyed after the function returns (for example, you can safely pass + stack-allocated buffers to these functions). The name/value content is not + verified, so take care to use only valid XML names, or the document may become + malformed. +

+

+ There is no equivalent of child_value + function for modifying text children of the node. +

+

+ This is an example of setting node name and value (samples/modify_base.cpp): +

+

+ +

+
pugi::xml_node node = doc.child("node");
+
+// change node name
+std::cout << node.set_name("notnode");
+std::cout << ", new node name: " << node.name() << std::endl;
+
+// change comment text
+std::cout << doc.last_child().set_value("useless comment");
+std::cout << ", new comment text: " << doc.last_child().value() << std::endl;
+
+// we can't change value of the element or name of the comment
+std::cout << node.set_value("1") << ", " << doc.last_child().set_name("2") << std::endl;
+
+

+

+
+
+ +

+ All attributes have name and value, both of which are strings (value may + be empty). You can set them with the following functions: +

+
bool xml_attribute::set_name(const char_t* rhs);
+bool xml_attribute::set_value(const char_t* rhs);
+
+

+ Both functions try to set the name/value to the specified string, and return + the operation result. The operation fails if the attribute handle is null, + or if there is insufficient memory to handle the request. The provided string + is copied into document managed memory and can be destroyed after the function + returns (for example, you can safely pass stack-allocated buffers to these + functions). The name/value content is not verified, so take care to use only + valid XML names, or the document may become malformed. +

+

+ In addition to string functions, several functions are provided for handling + attributes with numbers and booleans as values: +

+
bool xml_attribute::set_value(int rhs);
+bool xml_attribute::set_value(unsigned int rhs);
+bool xml_attribute::set_value(double rhs);
+bool xml_attribute::set_value(bool rhs);
+
+

+ The above functions convert the argument to string and then call the base + set_value function. Integers + are converted to a decimal form, floating-point numbers are converted to + either decimal or scientific form, depending on the number magnitude, boolean + values are converted to either "true" + or "false". +

+
+ + + + + +
[Caution]Caution

+ Number conversion functions depend on current C locale as set with setlocale, so may generate unexpected + results if the locale is different from "C". +

+
+ + + + + +
[Note]Note

+ There are no portable 64-bit types in C++, so there is no corresponding + set_value function. If + your platform has a 64-bit integer, you can easily write such a function + yourself. +

+
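Both the locale caveat and the missing 64-bit overload can be worked around by formatting the number yourself and passing the result to the string version of set_value. The helpers below are a sketch with hypothetical names; they assume that the compiler provides long long and they use the default stream precision (6 significant digits) for floating-point values.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Formats a double using the classic "C" locale,
// regardless of the global C or C++ locale
std::string format_double(double value)
{
    std::ostringstream stream;
    stream.imbue(std::locale::classic());
    stream << value;
    return stream.str();
}

// Formats a 64-bit integer; long long is assumed to be
// available on the target compiler
std::string format_int64(long long value)
{
    std::ostringstream stream;
    stream.imbue(std::locale::classic());
    stream << value;
    return stream.str();
}
```

A call would then look like attr.set_value(format_int64(id).c_str()); since set_value copies the string into document memory, passing a pointer into the temporary is safe.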

+ For convenience, all set_value + functions have the corresponding assignment operators: +

+
xml_attribute& xml_attribute::operator=(const char_t* rhs);
+xml_attribute& xml_attribute::operator=(int rhs);
+xml_attribute& xml_attribute::operator=(unsigned int rhs);
+xml_attribute& xml_attribute::operator=(double rhs);
+xml_attribute& xml_attribute::operator=(bool rhs);
+
+

+ These operators simply call the right set_value + function and return the attribute they're called on; the return value of + set_value is ignored, so + errors are not detected. +

+

+ This is an example of setting attribute name and value (samples/modify_base.cpp): +

+

+ +

+
pugi::xml_attribute attr = node.attribute("id");
+
+// change attribute name/value
+std::cout << attr.set_name("key") << ", " << attr.set_value("345");
+std::cout << ", new attribute: " << attr.name() << "=" << attr.value() << std::endl;
+
+// we can use numbers or booleans
+attr.set_value(1.234);
+std::cout << "new attribute value: " << attr.value() << std::endl;
+
+// we can also use assignment operators for more concise code
+attr = true;
+std::cout << "final attribute value: " << attr.value() << std::endl;
+
+

+

+
+
+ +

+ Nodes and attributes do not exist outside of document tree, so you can't + create them without adding them to some document. A node or attribute can + be created at the end of node/attribute list or before/after some other node: +

+
xml_attribute xml_node::append_attribute(const char_t* name);
+xml_attribute xml_node::insert_attribute_after(const char_t* name, const xml_attribute& attr);
+xml_attribute xml_node::insert_attribute_before(const char_t* name, const xml_attribute& attr);
+
+xml_node xml_node::append_child(xml_node_type type = node_element);
+xml_node xml_node::insert_child_after(xml_node_type type, const xml_node& node);
+xml_node xml_node::insert_child_before(xml_node_type type, const xml_node& node);
+
+

+ append_attribute and append_child create a new node/attribute + at the end of the corresponding list of the node the method is called on; + insert_attribute_after, + insert_attribute_before, + insert_child_after and insert_child_before add the node/attribute + before or after the specified node/attribute. +

+

+ Attribute functions create an attribute with the specified name; you can + specify the empty name and change the name later if you want to. Node functions + create the node with the specified type; since node type can't be changed, + you have to know the desired type beforehand. Also note that not all types + can be added as children; see below for clarification. +

+

+ All functions return the handle to newly created object on success, and null + handle on failure. There are several reasons for failure: +

+
    +
  • + Adding fails if the target node is null; +
  • +
  • + Only node_element nodes + can contain attributes, so attribute adding fails if node is not an element; +
  • +
  • + Only node_document and + node_element nodes can + contain children, so child node adding fails if target node is not an + element or a document; +
  • +
  • + node_document and node_null nodes can not be inserted + as children, so passing node_document + or node_null value as + type results in operation failure; +
  • +
  • + node_declaration nodes + can only be added as children of the document node; attempt to insert + declaration node as a child of an element node fails; +
  • +
  • + Adding node/attribute results in memory allocation, which may fail; +
  • +
  • + Insertion functions fail if the specified node or attribute is not in + the target node's children/attribute list. +
  • +
+

+ Even if the operation fails, the document remains in consistent state, but + the requested node/attribute is not added. +

+
+ + + + + +
[Caution]Caution

+ attribute() and child() functions do not add attributes or nodes to the + tree, so code like node.attribute("id") = 123; will not do anything if node does not have an attribute with + name "id". Make sure + you're operating with existing attributes/nodes by adding them if necessary. +

+

+ This is an example of adding new attributes/nodes to the document (samples/modify_add.cpp): +

+

+ +

+
// add node with some name
+pugi::xml_node node = doc.append_child();
+node.set_name("node");
+
+// add description node with text child
+pugi::xml_node descr = node.append_child();
+descr.set_name("description");
+descr.append_child(pugi::node_pcdata).set_value("Simple node");
+
+// add param node before the description
+pugi::xml_node param = node.insert_child_before(pugi::node_element, descr);
+param.set_name("param");
+
+// add attributes to param node
+param.append_attribute("name") = "version";
+param.append_attribute("value") = 1.1;
+param.insert_attribute_after("type", param.attribute("name")) = "float";
+
+

+

+
+
+ +

+ If you do not want your document to contain some node or attribute, you can + remove it with one of the following functions: +

+
bool xml_node::remove_attribute(const xml_attribute& a);
+bool xml_node::remove_child(const xml_node& n);
+
+

+ remove_attribute removes + the attribute from the attribute list of the node, and returns the operation + result. remove_child removes + the child node with the entire subtree (including all descendant nodes and + attributes) from the document, and returns the operation result. Removing + fails if one of the following is true: +

+
    +
  • + The node the function is called on is null; +
  • +
  • + The attribute/node to be removed is null; +
  • +
  • + The attribute/node to be removed is not in the node's attribute/child + list. +
  • +
+

+ Removing the attribute or node invalidates all handles to the same underlying + object, and also invalidates all iterators pointing to the same object. Removing + node also invalidates all past-the-end iterators to its attribute or child + node list. Be careful to ensure that all such handles and iterators either + do not exist or are not used after the attribute/node is removed. +

+

+ If you want to remove the attribute or child node by its name, two additional + helper functions are available: +

+
bool xml_node::remove_attribute(const char_t* name);
+bool xml_node::remove_child(const char_t* name);
+
+

+ These functions look for the first attribute or child with the specified + name, and then remove it, returning the result. If there is no attribute + or child with such name, the function returns false; + if there are two nodes with the given name, only the first node is deleted. + If you want to delete all nodes with the specified name, you can use code + like this: while (node.remove_child("tool")) ;. +

+

+ This is an example of removing attributes/nodes from the document (samples/modify_remove.cpp): +

+

+ +

+
// remove description node with the whole subtree
+pugi::xml_node node = doc.child("node");
+node.remove_child("description");
+
+// remove value attribute
+pugi::xml_node param = node.child("param");
+param.remove_attribute("value");
+
+// we can also remove nodes/attributes by handles
+pugi::xml_attribute id = param.attribute("name");
+param.remove_attribute(id);
+
+

+

+
+
+ +

+ With the help of previously described functions, it is possible to create + trees with any contents and structure, including cloning the existing data. + However since this is an often needed operation, pugixml provides built-in + node/attribute cloning facilities. Since nodes and attributes do not exist + outside of document tree, you can't create a standalone copy - you have to + immediately insert it somewhere in the tree. For this, you can use one of + the following functions: +

+
xml_attribute xml_node::append_copy(const xml_attribute& proto);
+xml_attribute xml_node::insert_copy_after(const xml_attribute& proto, const xml_attribute& attr);
+xml_attribute xml_node::insert_copy_before(const xml_attribute& proto, const xml_attribute& attr);
+xml_node xml_node::append_copy(const xml_node& proto);
+xml_node xml_node::insert_copy_after(const xml_node& proto, const xml_node& node);
+xml_node xml_node::insert_copy_before(const xml_node& proto, const xml_node& node);
+
+

+ These functions mirror the structure of append_child, + insert_child_before and related + functions - they take the handle to the prototype object, which is to be + cloned, insert a new attribute/node at the appropriate place, and then copy + the attribute data or the whole node subtree to the new object. The functions + return the handle to the resulting duplicate object, or null handle on failure. +

+

+ The attribute is copied along with the name and value; the node is copied + along with its type, name and value; additionally attribute list and all + children are recursively cloned, resulting in the deep subtree clone. The + prototype object can be a part of the same document, or a part of any other + document. +

+

+ The failure conditions resemble those of append_child, + insert_child_before and related + functions, consult their documentation + for more information. There are additional caveats specific to cloning + functions: +

+
    +
  • + Cloning null handles results in operation failure; +
  • +
  • + Node cloning starts with insertion of the node of the same type as that + of the prototype; for this reason, cloning functions can not be directly + used to clone entire documents, since node_document + is not a valid insertion type. The example below provides a workaround. +
  • +
  • + It is possible to copy a subtree as a child of some node inside this + subtree, e.g. node.append_copy(node.parent().parent());. + This is a valid operation, and it results in a clone of the subtree in + the state before cloning started, i.e. no infinite recursion takes place. +
  • +
+

+ This is an example with one possible implementation of include tags in XML + (samples/include.cpp). It illustrates + node cloning and usage of other document modification functions: +

+

+ +

+
bool load_preprocess(pugi::xml_document& doc, const char* path);
+
+bool preprocess(pugi::xml_node node)
+{
+    for (pugi::xml_node child = node.first_child(); child; )
+    {
+        if (child.type() == pugi::node_pi && strcmp(child.name(), "include") == 0)
+        {
+            pugi::xml_node include = child;
+
+            // load new preprocessed document (note: ideally this should handle relative paths)
+            const char* path = include.value();
+
+            pugi::xml_document doc;
+            if (!load_preprocess(doc, path)) return false;
+
+            // insert the comment marker above include directive
+            node.insert_child_before(pugi::node_comment, include).set_value(path);
+
+            // copy the document above the include directive (this retains the original order!)
+            for (pugi::xml_node ic = doc.first_child(); ic; ic = ic.next_sibling())
+            {
+                node.insert_copy_before(ic, include);
+            }
+
+            // remove the include node and move to the next child
+            child = child.next_sibling();
+
+            node.remove_child(include);
+        }
+        else
+        {
+            if (!preprocess(child)) return false;
+
+            child = child.next_sibling();
+        }
+    }
+
+    return true;
+}
+
+bool load_preprocess(pugi::xml_document& doc, const char* path)
+{
+    pugi::xml_parse_result result = doc.load_file(path, pugi::parse_default | pugi::parse_pi); // for <?include?>
+    
+    return result ? preprocess(doc) : false;
+}
+
+
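The comment in the sample notes that include paths should ideally be resolved relative to the including file. A minimal sketch of such a resolution step follows; `resolve_include_path` is a hypothetical helper that is not part of pugixml, and it assumes that a path starting with a separator or a drive letter is absolute.

```cpp
#include <string>

// Hypothetical helper (not part of pugixml): resolves an include target
// relative to the directory of the file that contains the directive.
std::string resolve_include_path(const std::string& including_file, const std::string& target)
{
    // Assumption: targets starting with a separator or a drive letter are absolute.
    bool absolute = !target.empty() &&
        (target[0] == '/' || target[0] == '\\' || (target.size() > 1 && target[1] == ':'));
    if (absolute) return target;

    // Keep everything up to and including the last separator of the including file.
    std::string::size_type slash = including_file.find_last_of("/\\");
    if (slash == std::string::npos) return target; // including file is in the current directory

    return including_file.substr(0, slash + 1) + target;
}
```

To use something like this, `preprocess` would need to receive the path of the file it is processing so that it can resolve `include.value()` before calling `load_preprocess`.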

+

+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/saving.html b/docs/manual/saving.html new file mode 100644 index 0000000..e12b31d --- /dev/null +++ b/docs/manual/saving.html @@ -0,0 +1,473 @@ + + + +Saving document + + + + + + + + + + + +
+
+
+ + +

+ Often after creating a new document or loading an existing one and processing + it, it is necessary to save the result back to a file. It is also occasionally + useful to output the whole document or a subtree to some stream; use cases + include debug printing, serialization via network or other text-oriented medium, + etc. pugixml provides several functions to output any subtree of the document + to a file, stream or another generic transport interface; these functions let you + customize the output format (see Output options), and also perform + the necessary encoding conversions (see Encodings). This section documents + the relevant functionality. +

+

+ The node/attribute data is written to the destination properly formatted according + to the node type; all special XML symbols, such as < and &, are properly + escaped. In order to guard against forgotten node/attribute names, empty node/attribute + names are printed as ":anonymous". + For proper output, make sure all node and attribute names are set to meaningful + values. +

+
+ + + + + +
[Caution]Caution

+ Currently the content of CDATA sections is not escaped, so CDATA sections + with values that contain "]]>" + will result in a malformed document. This will be fixed in version 1.0. +

+
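Until that fix is available, one possible workaround is to split the text so that no single CDATA section contains the terminator. The sketch below shows only the splitting step; `split_cdata` is a hypothetical helper, not a pugixml function.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of pugixml): splits text so that no piece
// contains the "]]>" terminator; each piece can then be stored in its own
// CDATA node.
std::vector<std::string> split_cdata(const std::string& text)
{
    std::vector<std::string> pieces;
    std::string::size_type start = 0;

    for (;;)
    {
        std::string::size_type pos = text.find("]]>", start);
        if (pos == std::string::npos) break;

        // Cut between "]]" and ">" so the terminator is spread over two pieces.
        pieces.push_back(text.substr(start, pos + 2 - start));
        start = pos + 2;
    }

    // Whatever is left (possibly the whole text) forms the last piece.
    pieces.push_back(text.substr(start));
    return pieces;
}
```

Each piece could then be added as a separate node of type `node_cdata`; a parser reading the output back sees adjacent CDATA sections whose concatenated contents equal the original text.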
+ +

+ If you want to save the whole document to a file, you can use the following + function: +

+
bool xml_document::save_file(const char* path, const char_t* indent = "\t", unsigned int flags = format_default, xml_encoding encoding = encoding_auto) const;
+
+

+ This function accepts file path as its first argument, and also three optional + arguments, which specify indentation and other output options (see Output options) + and output data encoding (see Encodings). The path has the target + operating system format, so it can be a relative or absolute one, it should + have the delimiters of target system, it should have the exact case if target + file system is case-sensitive, etc. File path is passed to system file opening + function as is. +

+

+ save_file opens the target + file for writing, outputs the requested header (by default a document declaration + is output, unless the document already has one), and then saves the document + contents. If the file could not be opened, the function returns false. Calling save_file + is equivalent to creating an xml_writer_file + object with FILE* + handle as the only constructor argument and then calling save; + see Saving document via writer interface for writer interface details. +

+
+ + + + + +
[Note]Note

+ As of version 0.9, there is no function for saving an XML document to a wide + character path. Unfortunately, there is no portable way to do this; + version 1.0 will provide such a function only for platforms with the corresponding + functionality. You can use stream-saving functions as a workaround if your + STL implementation can open file streams via wchar_t paths. +

+

+ This is a simple example of saving XML document to file (samples/save_file.cpp): +

+

+ +

+
// save document to file
+std::cout << "Saving result: " << doc.save_file("save_file_output.xml") << std::endl;
+
+

+

+
+
+ +

+ For additional interoperability pugixml provides functions for saving a document + to any object which implements the C++ std::ostream interface. This allows you + to save documents to any standard C++ stream (e.g. a file stream) or any third-party + compliant implementation (e.g. Boost Iostreams). Most notably, this allows + for easy debug output, since you can use the std::cout + stream as the saving target. There are two functions: one works with narrow character + streams, the other handles wide character ones: +

+
void xml_document::save(std::ostream& stream, const char_t* indent = "\t", unsigned int flags = format_default, xml_encoding encoding = encoding_auto) const;
+void xml_document::save(std::wostream& stream, const char_t* indent = "\t", unsigned int flags = format_default) const;
+
+

+ save with std::ostream + argument saves the document to the stream in the same way as save_file (i.e. with requested header and + with encoding conversions). On the other hand, save + with std::wostream argument saves the document to + the wide stream with encoding_wchar + encoding. Because of this, using save + with wide character streams requires careful (usually platform-specific) + stream setup (e.g. using the imbue + function). Generally use of wide streams is discouraged, however it provides + you with the ability to save documents to non-Unicode encodings, e.g. you + can save Shift-JIS encoded data if you set the correct locale. +

+

+ Calling save with stream + target is equivalent to creating an xml_writer_stream + object with stream as the only constructor argument and then calling save; see Saving document via writer interface for writer + interface details. +

+

+ This is a simple example of saving XML document to standard output (samples/save_stream.cpp): +

+

+ +

+
// save document to standard output
+std::cout << "Document:\n";
+doc.save(std::cout);
+
+

+

+
+
+ +

+ All of the above saving functions are implemented in terms of writer interface. + This is a simple interface with a single function, which is called several + times during output process with chunks of document data as input: +

+
class xml_writer
+{
+public:
+    virtual void write(const void* data, size_t size) = 0;
+};
+
+void xml_document::save(xml_writer& writer, const char_t* indent = "\t", unsigned int flags = format_default, xml_encoding encoding = encoding_auto) const;
+
+

+ In order to output the document via some custom transport, for example sockets, + you should create an object which implements xml_writer + interface and pass it to save + function. xml_writer::write + function is called with a buffer as an input, where data + points to buffer start, and size + is equal to the buffer size in bytes. write + implementation must write the buffer to the transport; it can not save the + passed buffer pointer, as the buffer contents will change after write returns. The buffer contains the + chunk of document data in the desired encoding. +

+

+ write function is called + with relatively large blocks (size is usually several kilobytes, except for + the first block with BOM, which is output only if format_write_bom + is set, and last block, which may be small), so there is often no need for + additional buffering in the implementation. +

+

+ This is a simple example of custom writer for saving document data to STL + string (samples/save_custom_writer.cpp); + read the sample code for more complex examples: +

+

+ +

+
struct xml_string_writer: pugi::xml_writer
+{
+    std::string result;
+
+    virtual void write(const void* data, size_t size)
+    {
+        result.append(static_cast<const char*>(data), size);
+    }
+};
+
+
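The rule that write must finish with the buffer before it returns matters most for transports that may accept only part of the data per call, as sockets often do. The following sketch illustrates the loop such a writer needs. To stay compilable without pugixml it declares a local stand-in, `chunk_writer`, with the same shape as the writer interface above; `fake_transport` and its 4-byte limit are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Stand-in with the same shape as the writer interface shown above,
// declared locally so that the sketch compiles without pugixml.
struct chunk_writer
{
    virtual ~chunk_writer() {}
    virtual void write(const void* data, std::size_t size) = 0;
};

// Hypothetical transport that accepts at most 4 bytes per call, the way a
// socket send may accept only part of a buffer.
struct fake_transport
{
    std::string received;

    std::size_t send_some(const char* data, std::size_t size)
    {
        std::size_t accepted = std::min<std::size_t>(size, 4);
        received.append(data, accepted);
        return accepted;
    }
};

// Pushes every chunk to the transport before returning; it never stores
// the data pointer, because the buffer is reused after write returns.
struct transport_writer: chunk_writer
{
    fake_transport& transport;

    explicit transport_writer(fake_transport& t): transport(t) {}

    virtual void write(const void* data, std::size_t size)
    {
        const char* bytes = static_cast<const char*>(data);

        // Loop until the whole chunk has been handed over.
        while (size > 0)
        {
            std::size_t sent = transport.send_some(bytes, size);
            bytes += sent;
            size -= sent;
        }
    }
};
```

In real code the writer would derive from `pugi::xml_writer` and be passed to `save` or `print`; a real transport would also need error handling, which is omitted here.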

+

+
+
+ +

+ While the previously described functions saved the whole document to the + destination, it is easy to save a single subtree. The following functions + are provided: +

+
void xml_node::print(std::ostream& os, const char_t* indent = "\t", unsigned int flags = format_default, xml_encoding encoding = encoding_auto, unsigned int depth = 0) const;
+void xml_node::print(std::wostream& os, const char_t* indent = "\t", unsigned int flags = format_default, unsigned int depth = 0) const;
+void xml_node::print(xml_writer& writer, const char_t* indent = "\t", unsigned int flags = format_default, xml_encoding encoding = encoding_auto, unsigned int depth = 0) const;
+
+

+ These functions have the same arguments with the same meaning as the corresponding + xml_document::save functions, and allow you to save the + subtree to either a C++ IOstream or to any object that implements xml_writer interface. +

+

+ Saving a subtree differs from saving the whole document: the process behaves + as if format_write_bom is + off, and format_no_declaration + is on, even if actual values of the flags are different. This means that + BOM is not written to the destination, and document declaration is only written + if it is the node itself or is one of node's children. Note that this also + holds if you're saving a document; this example (samples/save_subtree.cpp) + illustrates the difference: +

+

+ +

+
// get a test document
+pugi::xml_document doc;
+doc.load("<foo bar='baz'><call>hey</call></foo>");
+
+// print document to standard output (prints <?xml version="1.0"?><foo bar="baz"><call>hey</call></foo>)
+doc.save(std::cout, "", pugi::format_raw);
+std::cout << std::endl;
+
+// print document to standard output as a regular node (prints <foo bar="baz"><call>hey</call></foo>)
+doc.print(std::cout, "", pugi::format_raw);
+std::cout << std::endl;
+
+// print a subtree to standard output (prints <call>hey</call>)
+doc.child("foo").child("call").print(std::cout, "", pugi::format_raw);
+std::cout << std::endl;
+
+

+

+
+
+ +

+ All saving functions accept the optional parameter flags. + This is a bitmask that customizes the output format; you can select the way + the document nodes are printed and select the needed additional information + that is output before the document contents. +

+
+ + + + + +
[Note]Note

+ You should use the usual bitwise arithmetic to manipulate the bitmask: + to enable a flag, use mask | flag; + to disable a flag, use mask & ~flag. +

+
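The two bitmask operations from the note can be sketched as small helpers. The constants below live in a local `sketch` namespace, and their numeric values are assumptions chosen for illustration; real code should use the `pugi::` constants.

```cpp
namespace sketch
{
    // Illustrative stand-ins for the pugixml flags; the numeric values are
    // assumptions made for this sketch only.
    const unsigned int format_indent = 0x01;
    const unsigned int format_write_bom = 0x02;
    const unsigned int format_raw = 0x04;
    const unsigned int format_no_declaration = 0x08;
    const unsigned int format_default = format_indent;

    // Turns a flag on without touching the other bits.
    unsigned int enable_flag(unsigned int mask, unsigned int flag) { return mask | flag; }

    // Turns a flag off without touching the other bits.
    unsigned int disable_flag(unsigned int mask, unsigned int flag) { return mask & ~flag; }

    // Checks whether a flag is set.
    bool has_flag(unsigned int mask, unsigned int flag) { return (mask & flag) != 0; }
}
```

Several flags can be combined in one expression, for example `sketch::format_raw | sketch::format_no_declaration`.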

+ These flags control the resulting tree contents: +

+
    +
  • + format_indent determines if all nodes + should be indented with the indentation string (this is an additional + parameter for all saving functions, and is "\t" + by default). If this flag is on, before every node the indentation string + is output several times, where the amount of indentation depends on the + node's depth relative to the output subtree. This flag has no effect + if format_raw is enabled. + This flag is on by default.

    + +
  • +
  • + format_raw switches between formatted and + raw output. If this flag is on, the nodes are not indented in any way, + and also no newlines that are not part of document text are printed. + Raw mode can be used for serialization where the result is not intended + to be read by humans; also it can be useful if the document was parsed + with parse_ws_pcdata + flag, to preserve the original document formatting as much as possible. + This flag is off by default. +
  • +
+

+ These flags control the additional output information: +

+
    +
  • + format_no_declaration disables + the default document declaration output. By default, if the document + is saved via save or + save_file function, and + it does not have any document declaration, a default declaration is output + before the document contents. Enabling this flag disables this declaration. + This flag has no effect in xml_node::print + functions: they never output the default declaration. This flag is off by default.

    + +
  • +
  • + format_write_bom enables + Byte Order Mark (BOM) output. By default, no BOM is output, so in case + of non UTF-8 encodings the resulting document's encoding may not be recognized + by some parsers and text editors, if they do not implement sophisticated + encoding detection. Enabling this flag adds an encoding-specific BOM + to the output. This flag has no effect in xml_node::print + functions: they never output the BOM. This flag is off + by default. +
  • +
+
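For reference, the encoding-specific BOM is the character U+FEFF written in the target encoding. The sketch below lists the resulting byte sequences; `sketch_encoding` and `bom_bytes` are hypothetical names used only here and are not pugixml API.

```cpp
#include <string>

// Hypothetical enumeration for this sketch; pugixml has its own xml_encoding type.
enum sketch_encoding { enc_utf8, enc_utf16_le, enc_utf16_be, enc_utf32_le, enc_utf32_be };

// Returns the byte order mark for the given encoding (U+FEFF in that encoding).
std::string bom_bytes(sketch_encoding encoding)
{
    switch (encoding)
    {
    case enc_utf8:     return std::string("\xEF\xBB\xBF", 3);
    case enc_utf16_le: return std::string("\xFF\xFE", 2);
    case enc_utf16_be: return std::string("\xFE\xFF", 2);
    case enc_utf32_le: return std::string("\xFF\xFE\x00\x00", 4);
    case enc_utf32_be: return std::string("\x00\x00\xFE\xFF", 4);
    }

    return std::string(); // unknown encoding: no BOM
}
```

A consumer that checks the first bytes of a file against these sequences can tell the encodings apart, provided it tests the 4-byte UTF-32 little endian mark before the 2-byte UTF-16 little endian mark, since the former starts with the latter.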

+ Additionally, there is one predefined option mask: +

+
  • + format_default is the default set of + flags, i.e. it has all options set to their default values. It sets formatted + output with indentation, without BOM and with default node declaration, + if necessary. +
+

+ This is an example that shows the outputs of different output options (samples/save_options.cpp): +

+

+ +

+
// get a test document
+pugi::xml_document doc;
+doc.load("<foo bar='baz'><call>hey</call></foo>");
+
+// default options; prints
+// <?xml version="1.0"?>
+// <foo bar="baz">
+//         <call>hey</call>
+// </foo>
+doc.save(std::cout);
+std::cout << std::endl;
+
+// default options with custom indentation string; prints
+// <?xml version="1.0"?>
+// <foo bar="baz">
+// --<call>hey</call>
+// </foo>
+doc.save(std::cout, "--");
+std::cout << std::endl;
+
+// default options without indentation; prints
+// <?xml version="1.0"?>
+// <foo bar="baz">
+// <call>hey</call>
+// </foo>
+doc.save(std::cout, "\t", pugi::format_default & ~pugi::format_indent); // can also pass "" instead of indentation string for the same effect
+std::cout << std::endl;
+
+// raw output; prints
+// <?xml version="1.0"?><foo bar="baz"><call>hey</call></foo>
+doc.save(std::cout, "\t", pugi::format_raw);
+std::cout << std::endl << std::endl;
+
+// raw output without declaration; prints
+// <foo bar="baz"><call>hey</call></foo>
+doc.save(std::cout, "\t", pugi::format_raw | pugi::format_no_declaration);
+std::cout << std::endl;
+
+

+

+
+
+ +

+ pugixml supports all popular Unicode encodings (UTF-8, UTF-16 (big and little + endian), UTF-32 (big and little endian); UCS-2 is naturally supported since + it's a strict subset of UTF-16) and handles all encoding conversions during + output. The output encoding is set via the encoding + parameter of saving functions, which is of type xml_encoding. + The possible values for the encoding are documented in Encodings; + the only flag that has a different meaning is encoding_auto. +

+

+ While all other flags set the exact encoding, encoding_auto + is meant for automatic encoding detection. The automatic detection does not + make sense for output encoding, since there is usually nothing to infer the + actual encoding from, so here encoding_auto + means UTF-8 encoding, which is the most popular encoding for XML data storage. + This is also the default value of output encoding; specify another value + if you do not want UTF-8 encoded output. +

+

+ Also note that wide stream saving functions do not have encoding + argument and always assume encoding_wchar + encoding. +

+
+ + + + + +
[Note]Note

+ The current behavior for Unicode conversion is to skip all invalid UTF + sequences during conversion. This behavior should not be relied upon; if + your node/attribute names do not contain any valid UTF sequences, they + may be output as if they are empty, which will result in malformed XML + document. +

+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/toc.html b/docs/manual/toc.html new file mode 100644 index 0000000..60a054a --- /dev/null +++ b/docs/manual/toc.html @@ -0,0 +1,130 @@ + + + +Table of Contents + + + + + + + + + + +
+
+
+ + +
+ + + +
+
+ + + +
+ + diff --git a/docs/manual/xpath.html b/docs/manual/xpath.html new file mode 100644 index 0000000..731a969 --- /dev/null +++ b/docs/manual/xpath.html @@ -0,0 +1,494 @@ + + + +XPath + + + + + + + + + + + +
+
+
+

+ XPath +

+ +

+ If the task at hand is to select a subset of document nodes that match some + criteria, it is possible to code a function using the existing traversal functionality + for any practical criteria. However, often either a data-driven approach is + desirable, in case the criteria are not predefined and come from a file, or + it is inconvenient to use traversal interfaces and a higher-level DSL is required. + There is a standard language for XML processing, XPath, that can be useful + for these cases. pugixml implements an almost complete subset of XPath 1.0. + Because of differences in document object model and some performance implications, + there are minor violations of the official specifications, which can be found + in Conformance to W3C specification. The rest of this section describes the interface for XPath + functionality. Please note that if you wish to learn to use XPath language, + you have to look for other tutorials or manuals; for example, you can read + W3Schools XPath tutorial, + XPath tutorial + at tizag.com, and the XPath + 1.0 specification. +

+
+ + + + + +
[Note]Note

+ As of version 0.9, you need both STL and exception support to use XPath; + XPath is disabled if either PUGIXML_NO_STL + or PUGIXML_NO_EXCEPTIONS + is defined. +

+
+ +

+ Each XPath expression can have one of the following types: boolean, number, + string or node set. Boolean type corresponds to bool + type, number type corresponds to double + type, string type corresponds to either std::string + or std::wstring, depending on whether wide + character interface is enabled, and node set corresponds to xpath_node_set type. There is an enumeration, + xpath_value_type, which can + take the values xpath_type_boolean, + xpath_type_number, xpath_type_string or xpath_type_node_set, + accordingly. +

+

+ Because an XPath node can be either a node or an attribute, there is a special + type, xpath_node, which is + a discriminated union of these types. A value of this type contains two node + handles, one of xml_node + type, and another one of xml_attribute + type; at most one of them can be non-null. The accessors to get these handles + are available: +

+
xml_node xpath_node::node() const;
+xml_attribute xpath_node::attribute() const;
+
+

+ XPath nodes can be null, in which case both accessors return null handles. +

+

+ Note that as per XPath specification, each XPath node has a parent, which + can be retrieved via this function: +

+
xml_node xpath_node::parent() const;
+
+

+ parent function returns the + node's parent if the XPath node corresponds to xml_node + handle (equivalent to node().parent()), or the node to which the attribute belongs, + if the XPath node corresponds to xml_attribute + handle. For null nodes, parent + returns null handle. +

+

+ Like node and attribute handles, XPath node handles can be implicitly cast + to boolean-like object to check if it is a null node, and also can be compared + for equality with each other. +

+

+ You can also create XPath nodes with one of three constructors: the default + constructor, the constructor that takes node argument, and the constructor + that takes attribute and node arguments (in which case the attribute must + belong to the attribute list of the node). However, usually you don't need + to create your own XPath node objects, since they are returned to you via + selection functions. +

+

+ XPath expressions operate not on single nodes, but instead on node sets. + A node set is a collection of nodes, which can be optionally ordered in either + a forward document order or a reverse one. Document order is defined in XPath + specification; an XPath node is before another node in document order if + it appears before it in XML representation of the corresponding document. +

+

+ Node sets are represented by xpath_node_set + object, which has an interface that resembles one of sequential random-access + containers. It has an iterator type along with usual begin/past-the-end iterator + accessors: +

+
typedef const xpath_node* xpath_node_set::const_iterator;
+const_iterator xpath_node_set::begin() const;
+const_iterator xpath_node_set::end() const;
+
+

+ And it also can be iterated via indices, just like std::vector: +

+
const xpath_node& xpath_node_set::operator[](size_t index) const;
+size_t xpath_node_set::size() const;
+bool xpath_node_set::empty() const;
+
+

+ All of the above operations have the same semantics as those of std::vector: + the iterators are random-access, all of the above operations are constant + time, and accessing the element at an index that is greater than or equal to the + set size results in undefined behavior. You can use both iterator-based and + index-based access for iteration, however the iterator-based one can be faster. +

+

+ The order of iteration depends on the order of nodes inside the set; the + order can be queried via the following function: +

+
enum xpath_node_set::type_t {type_unsorted, type_sorted, type_sorted_reverse};
+type_t xpath_node_set::type() const;
+
+

+ type function returns the + current order of nodes; type_sorted + means that the nodes are in forward document order, type_sorted_reverse + means that the nodes are in reverse document order, and type_unsorted + means that neither order is guaranteed (nodes can accidentally be in a sorted + order even if type() + returns type_unsorted). If + you require a specific order of iteration, you can change it via sort function: +

+
void xpath_node_set::sort(bool reverse = false);
+
+

+ Calling sort sorts the nodes + in either forward or reverse document order, depending on the argument; after + this call type() + will return type_sorted or + type_sorted_reverse. +

+

+ Often the actual iteration is not needed; instead, only the first element + in document order is required. For this, a special accessor is provided: +

+
xpath_node xpath_node_set::first() const;
+
+

+ This function returns the first node in forward document order from the set, + or null node if the set is empty. Note that while the result + does not depend on the order of nodes in the set (i.e. on the result of + type()), + the complexity does - if the set is sorted, the complexity is constant, otherwise + it is linear in the number of elements or worse. +
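The complexity difference can be illustrated with a conceptual sketch. It is not pugixml's implementation: a node is reduced to an integer position in document order, and `node_set_sketch`, `order_t` and `first_in_document_order` are stand-in names invented for this example.

```cpp
#include <algorithm>
#include <vector>

// Stand-ins for this sketch only: a node is reduced to its position in
// document order, and the set records how its elements are ordered.
enum order_t { order_unsorted, order_sorted, order_sorted_reverse };

struct node_set_sketch
{
    std::vector<int> positions; // document-order position of each node
    order_t order;
};

// Returns the smallest document-order position in the set, or -1 if the
// set is empty. Sorted sets are answered in constant time; unsorted sets
// need a linear scan.
int first_in_document_order(const node_set_sketch& set)
{
    if (set.positions.empty()) return -1;

    switch (set.order)
    {
    case order_sorted:         return set.positions.front();
    case order_sorted_reverse: return set.positions.back();
    default:                   return *std::min_element(set.positions.begin(), set.positions.end());
    }
}
```

In the real library comparing two nodes in document order can itself be costly, which is presumably why the text above says "or worse" for the unsorted case.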

+
+
+ +

+ If you want to select nodes that match some XPath expression, you can do + it with the following functions: +

+
xpath_node xml_node::select_single_node(const char_t* query) const;
+xpath_node_set xml_node::select_nodes(const char_t* query) const;
+
+

+ select_nodes function compiles + the expression and then executes it with the node as a context node, and + returns the resulting node set. select_single_node + returns only the first node in document order from the result, and is equivalent + to calling select_nodes(query).first(). + If the XPath expression does not match anything, or the node handle is null, + select_nodes returns an empty + set, and select_single_node + returns null XPath node. +

+

+ Both functions throw xpath_exception + if the query can not be compiled or if it returns a value with type other + than node set; see Error handling for details. +

+

+ While compiling expressions is fast, the compilation time can introduce a + significant overhead if the same expression is used many times on small subtrees. + If you're doing many similar queries, consider compiling them into query + objects (see Using query objects for further reference). Once you get a compiled + query object, you can pass it to select functions instead of an expression + string: +

+
xpath_node xml_node::select_single_node(const xpath_query& query) const;
+xpath_node_set xml_node::select_nodes(const xpath_query& query) const;
+
+

+ Both functions throw xpath_exception + if the query returns a value with type other than node set. +

+

+ This is an example of selecting nodes using XPath expressions (samples/xpath_select.cpp): +

+

+ +

+
pugi::xpath_node_set tools = doc.select_nodes("/Profile/Tools/Tool[@AllowRemote='true' and @DeriveCaptionFrom='lastparam']");
+
+std::cout << "Tools:";
+
+for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it)
+{
+    pugi::xpath_node node = *it;
+    std::cout << " " << node.node().attribute("Filename").value();
+}
+
+pugi::xpath_node build_tool = doc.select_single_node("//Tool[contains(Description, 'build system')]");
+
+std::cout << "\nBuild tool: " << build_tool.node().attribute("Filename").value() << "\n";
+
+

+

+
+
+ +

+ When you call select_nodes + with an expression string as an argument, a query object is created behind + the scenes. A query object represents a compiled XPath expression. Query objects + can be needed in the following circumstances: +

+
    +
  • + You can precompile expressions to query objects to save compilation time + if it becomes an issue; +
  • +
  • + You can use query objects to evaluate XPath expressions which result + in booleans, numbers or strings; +
  • +
  • + You can get the type of expression value via query object. +
  • +
+

+ Query objects correspond to xpath_query + type. They are immutable and non-copyable: they are bound to the expression + at creation time and can not be cloned. If you want to put query objects + in a container, allocate them on heap via new + operator and store pointers to xpath_query + in the container. +
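As a variation on the advice above, a compiler with C++11 support lets the container own the heap-allocated objects through smart pointers, so no manual delete is needed. The sketch below uses a local non-copyable stand-in, `query_sketch`, so that it compiles without pugixml; in real code the element type would be `pugi::xpath_query`. Both the stand-in and the use of `std::unique_ptr` are assumptions of this example rather than something the manual prescribes.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for a non-copyable query type, declared locally so the sketch
// compiles without pugixml.
class query_sketch
{
public:
    explicit query_sketch(const std::string& expression): expression_(expression) {}

    const std::string& expression() const { return expression_; }

private:
    query_sketch(const query_sketch&);            // not copyable
    query_sketch& operator=(const query_sketch&); // not assignable

    std::string expression_;
};

// Owns heap-allocated queries through smart pointers (requires C++11).
typedef std::vector<std::unique_ptr<query_sketch> > query_list;

// Allocates a query on the heap and hands ownership to the list.
void add_query(query_list& list, const std::string& expression)
{
    list.push_back(std::unique_ptr<query_sketch>(new query_sketch(expression)));
}
```

With a pre-C++11 compiler, the raw-pointer approach described above still applies; the owner then has to delete every stored pointer itself.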

+

+ You can create a query object with the constructor that takes XPath expression + as an argument: +

+
explicit xpath_query::xpath_query(const char_t* query);
+
+

+ The expression is compiled and the compiled representation is stored in the + new query object. If compilation fails, xpath_exception + is thrown (see Error handling for details). After the query is created, + you can query the type of the evaluation result using the following function: +

+
xpath_value_type xpath_query::return_type() const;
+
+

+ You can evaluate the query using one of the following functions: +

+
bool xpath_query::evaluate_boolean(const xml_node& n) const;
+double xpath_query::evaluate_number(const xml_node& n) const;
+string_t xpath_query::evaluate_string(const xml_node& n) const;
+xpath_node_set xpath_query::evaluate_node_set(const xml_node& n) const;
+
+

+ All functions take the context node as an argument, compute the expression + and return the result, converted to the requested type. By XPath specification, + value of any type can be converted to boolean, number or string value, but + no type other than node set can be converted to node set. Because of this, + evaluate_boolean, evaluate_number and evaluate_string + always return a result, but evaluate_node_set + throws an xpath_exception + if the return type is not node set. +

+
+ + + + + +
[Note]Note

+ Calling node.select_nodes("query") + is equivalent to calling xpath_query("query").evaluate_node_set(node). +

+

+ This is an example of using query objects (samples/xpath_query.cpp): +

+

+ +

+
// Select nodes via compiled query
+pugi::xpath_query query_remote_tools("/Profile/Tools/Tool[@AllowRemote='true']");
+
+pugi::xpath_node_set tools = query_remote_tools.evaluate_node_set(doc);
+std::cout << "Remote tool: ";
+tools[2].node().print(std::cout);
+
+// Evaluate numbers via compiled query
+pugi::xpath_query query_timeouts("sum(//Tool/@Timeout)");
+std::cout << query_timeouts.evaluate_number(doc) << std::endl;
+
+// Evaluate strings via compiled query for different context nodes
+pugi::xpath_query query_name_valid("string-length(substring-before(@Filename, '_')) > 0 and @OutputFileMasks");
+pugi::xpath_query query_name("concat(substring-before(@Filename, '_'), ' produces ', @OutputFileMasks)");
+
+for (pugi::xml_node tool = doc.first_element_by_path("Profile/Tools/Tool"); tool; tool = tool.next_sibling())
+{
+    std::string s = query_name.evaluate_string(tool);
+
+    if (query_name_valid.evaluate_boolean(tool)) std::cout << s << std::endl;
+}
+
+

+

+
+
+ +

+ As of version 0.9, all XPath errors result in thrown exceptions. The errors + can arise during expression compilation or node set evaluation. In both cases, + an xpath_exception object + is thrown. This is an exception object that implements std::exception + interface, and thus has a single function what(): +

+
virtual const char* xpath_exception::what() const throw();
+
+

+ This function returns the error message. Currently it is impossible to get + the exact place where query compilation failed. This functionality, along + with optional error handling without exceptions, will be available in version + 1.0. +

+

+ This is an example of XPath error handling (samples/xpath_error.cpp): +

+

+ +

+
// Exception is thrown for incorrect query syntax
+try
+{
+    doc.select_nodes("//nodes[#true()]");
+}
+catch (const pugi::xpath_exception& e)
+{
+    std::cout << "Select failed: " << e.what() << std::endl;
+}
+
+// Exception is thrown for incorrect query semantics
+try
+{
+    doc.select_nodes("(123)/next");
+}
+catch (const pugi::xpath_exception& e)
+{
+    std::cout << "Select failed: " << e.what() << std::endl;
+}
+
+// Exception is thrown for query with incorrect return type
+try
+{
+    doc.select_nodes("123");
+}
+catch (const pugi::xpath_exception& e)
+{
+    std::cout << "Select failed: " << e.what() << std::endl;
+}
+
+

+

+
+
+ +

+ Because of the differences in document object models, performance considerations + and implementation complexity, pugixml does not provide a fully conformant + XPath 1.0 implementation. This is the current list of incompatibilities: +

+
    +
  • + Consecutive text nodes sharing the same parent are not merged, i.e. in + <node>text1 + <![CDATA[data]]> text2</node> node should have one text node child, + but instead has three. +
  • +
  • + Since document can't have a document type declaration, id() + function always returns an empty node set. +
  • +
  • + Namespace nodes are not supported (affects namespace:: axis). +
  • +
  • + Name tests are performed on QNames in XML document instead of expanded + names; for <foo + xmlns:ns1='uri' xmlns:ns2='uri'><ns1:child/><ns2:child/></foo>, + query foo/ns1:* + will return only the first child, not both of them. Compliant XPath implementations + can return both nodes if the user provides appropriate namespace declarations. +
  • +
  • + String functions consider a character to be either a single char value or a single wchar_t + value, depending on the library configuration; this means that some string + functions are not fully Unicode-aware. This affects substring(), string-length() and translate() functions. +
  • +
  • + Variable references are not supported. +
  • +
+
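The effect of the string function limitation is easiest to see by comparing the number of char values in a UTF-8 string with the number of Unicode code points it encodes. The helper below is a hypothetical illustration, not a pugixml function, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;

    for (std::string::size_type i = 0; i < utf8.size(); ++i)
    {
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) ++count; // lead byte or ASCII starts a new code point
    }

    return count;
}
```

For the UTF-8 string "caf\xC3\xA9" (the word "café") this returns 4, while the string holds 5 char values; a string function that treats every char as a character works with the larger number.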

+ Some of these incompatibilities will be fixed in version 1.0. +

+
+
+ + + +
+
+ + + +
+ + diff --git a/docs/quickstart.html b/docs/quickstart.html new file mode 100644 index 0000000..4fc4524 --- /dev/null +++ b/docs/quickstart.html @@ -0,0 +1,828 @@ + + + +pugixml 0.9 + + + + + +
+ + +
+ +

+ pugixml is a light-weight C++ XML processing library. It consists of a DOM-like + interface with rich traversal/modification capabilities, an extremely fast + XML parser which constructs the DOM tree from an XML file/buffer, and an + XPath 1.0 implementation for complex data-driven tree queries. Full Unicode + support is also available, with Unicode interface variants and conversions + between different Unicode encodings (which happen automatically during parsing/saving). + The library is extremely portable and easy to integrate and use. pugixml + has been developed and maintained since 2006 and has many users. All code is distributed + under the MIT license, making it completely free to use in both open-source + and proprietary applications. +

+

+ pugixml enables very fast, convenient and memory-efficient XML document processing. + However, since pugixml has a DOM parser, it can't process XML documents that + do not fit in memory; also the parser is a non-validating one, so if you + need DTD/Schema validation, the library is not for you. +

+

+ This is the quick start guide for pugixml, whose purpose is to enable you + to start using the library quickly. Many important library features are either + not described at all or only mentioned briefly; for more complete information + you should read the complete manual. +

+
+ + + + + +
[Note]Note

+ No documentation is perfect, and neither is this one. If you encounter a description + that is unclear, please file an issue as described in Feedback. Also if + you can spare the time for a full proof-reading, including spelling and + grammar, that would be great! Please send me an e-mail; + as a token of appreciation, your name will be included in the corresponding + section of the manual. +

+
+
+ +

+ pugixml is distributed in source form. You can download a source distribution + via one of the following links: +

+
http://pugixml.googlecode.com/files/pugixml-0.9.zip
+http://pugixml.googlecode.com/files/pugixml-0.9.tar.gz
+
+

+ The distribution contains library source, documentation (the guide you're + reading now and the manual) and some code examples. After downloading the + distribution, install pugixml by extracting all files from the compressed + archive. +

+

+ The complete pugixml source consists of four files - two source files, pugixml.cpp and + pugixpath.cpp, and two header files, pugixml.hpp and pugiconfig.hpp. pugixml.hpp is + the primary header which you need to include in order to use pugixml classes/functions. + The rest of this guide assumes that pugixml.hpp is either in the current directory + or in one of the include directories of your project, so that #include "pugixml.hpp" + can find the header; however you can also use a relative path (e.g. #include "../libs/pugixml/src/pugixml.hpp") + or an include directory-relative path (e.g. #include + <xml/thirdparty/pugixml/src/pugixml.hpp>). +

+

+ The easiest way to build pugixml is to compile the two source files, pugixml.cpp and + pugixpath.cpp, along with the existing library/executable. This process depends + on the method of building your application; for example, if you're using + Microsoft Visual Studio[1], Apple Xcode, Code::Blocks or any other IDE, just add pugixml.cpp and + pugixpath.cpp to one of your projects. Other build methods are available, + including building pugixml as a standalone static/shared library; read the + manual for further information. +

+
+
+ +

+ pugixml stores XML data in a DOM-like way: the entire XML document (both document + structure and element data) is stored in memory as a tree. The tree can be + loaded from a character stream (file, string, C++ I/O stream), then traversed + via a special API or XPath expressions. The whole tree is mutable: both node + structure and node/attribute data can be changed at any time. Finally, the + result of document transformations can be saved to a character stream (file, + C++ I/O stream or custom transport). +

+

+ The root of the tree is the document itself, which corresponds to the C++ type + xml_document. The document has + one or more child nodes, which correspond to the C++ type xml_node. + Nodes have different types; depending on its type, a node can have a collection + of child nodes, a collection of attributes, which correspond to the C++ type + xml_attribute, and some additional + data (e.g. a name). +

+

+ The most common node types are: +

+
    +
  • + Document node (node_document) + - this is the root of the tree, which consists of several child nodes. + This node corresponds to xml_document + class; note that xml_document + is a sub-class of xml_node, + so the entire node interface is also available. +
  • +
  • + Element/tag node (node_element) + - this is the most common type of node, which represents XML elements. + Element nodes have a name, a collection of attributes and a collection + of child nodes (both of which may be empty). The attribute is a simple + name/value pair. +
  • +
  • + Plain character data nodes (node_pcdata) + represent plain text in XML. PCDATA nodes have a value, but do not have + a name or children/attributes. Note that plain character data is not a + part of the element node but instead has its own node; for example, an + element node can have several child PCDATA nodes. +
  • +
+

+ Despite the fact that there are several node types, there are only three + C++ types representing the tree (xml_document, + xml_node, xml_attribute); + some operations on xml_node + are only valid for certain node types. They are described below. +

+
+ + + + + +
[Note]Note

+ All pugixml classes and functions are located in the pugi + namespace; you have to either use explicit name qualification (e.g. pugi::xml_node), or gain access to the relevant + symbols via a using declaration or using directive + (e.g. using pugi::xml_node; or using + namespace pugi;). +

+

+ xml_document is the owner + of the entire document structure; destroying the document destroys the whole + tree. The interface of xml_document + consists of loading functions, saving functions and the interface of xml_node, which allows for document inspection + and/or modification. Note that while xml_document + is a sub-class of xml_node, + xml_node is not a polymorphic + type; the inheritance is only used to simplify usage. +

+

+ xml_node is a handle to + a document node; it can point to any node in the document, including the document + itself. There is a common interface for nodes of all types. Note that xml_node is only a handle to the actual + node, not the node itself - you can have several xml_node + handles pointing to the same underlying object. Destroying an xml_node handle does not destroy the node + and does not remove it from the tree. +

+

+ There is a special value of xml_node + type, known as the null node or empty node. It does not correspond to any node + in any document, and thus resembles a null pointer. However, all operations + are defined on empty nodes; generally the operations don't do anything and + return empty nodes/attributes or empty strings as their result. This is useful + for chaining calls; e.g. you can get the grandparent of a node like so: + node.parent().parent(); + if a node is a null node or it does not have a parent, the first parent() + call returns a null node; the second parent() call then also returns a null node, so you + don't have to check for errors twice. You can test if a handle is null via + an implicit boolean cast: if (node) { ... } + or if (!node) { ... }. +
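To see why chaining is safe, here is a minimal sketch of the null-handle idea. The `raw_node` and `handle` types are hypothetical stand-ins invented for this illustration, not pugixml classes, and the `explicit operator bool` is a C++11 convenience chosen for the sketch:

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are not pugixml types.
struct raw_node
{
    raw_node* parent;
    const char* name;
};

class handle
{
public:
    handle(): ptr_(nullptr) {}
    explicit handle(raw_node* p): ptr_(p) {}

    // Every operation is defined on a null handle and yields an empty result
    handle parent() const { return ptr_ ? handle(ptr_->parent) : handle(); }
    const char* name() const { return ptr_ ? ptr_->name : ""; }

    // Null test, usable as if (h) { ... } or if (!h) { ... }
    explicit operator bool() const { return ptr_ != nullptr; }

private:
    raw_node* ptr_;
};

// Grandparent lookup without any intermediate checks
inline std::string grandparent_name(handle h)
{
    return h.parent().parent().name();
}
```

With this shape, a missing parent yields a null handle, and every call on a null handle yields another empty result, so only the final value needs to be inspected.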

+

+ xml_attribute is a handle + to an XML attribute; it has the same semantics as xml_node, + i.e. there can be several xml_attribute + handles pointing to the same underlying object, and there is a special null attribute + value, which propagates to function results. +

+

+ There are two choices of interface and internal representation when configuring + pugixml: you can choose either the UTF-8 (also called char) interface or + the UTF-16/32 (also called wchar_t) one. The choice is controlled via the PUGIXML_WCHAR_MODE define; you can set + it via pugiconfig.hpp or via preprocessor options. All tree functions that + work with strings work with either C-style null-terminated strings or STL + strings of the selected character type. Read the manual for additional information + on the Unicode interface. +
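As a rough illustration of how a single define can switch the character type of a whole interface, consider the sketch below. `DEMO_WCHAR_MODE`, `DEMO_TEXT` and the `demo` namespace are made-up names for this example only; the manual lists the names pugixml itself uses:

```cpp
#include <cstddef>
#include <string>

// DEMO_WCHAR_MODE is a hypothetical stand-in for a mode-selection define.
// Uncomment it (or pass -DDEMO_WCHAR_MODE) to switch everything below to wchar_t.
// #define DEMO_WCHAR_MODE

#ifdef DEMO_WCHAR_MODE
#   define DEMO_TEXT(s) L##s
#else
#   define DEMO_TEXT(s) s
#endif

namespace demo
{
#ifdef DEMO_WCHAR_MODE
    typedef wchar_t char_type;
#else
    typedef char char_type;
#endif

    // STL string of the selected character type
    typedef std::basic_string<char_type> string_type;

    // Written once against char_type, this works in either mode
    inline std::size_t length(const char_type* s)
    {
        std::size_t n = 0;
        while (s[n]) ++n;
        return n;
    }
}
```

Code that wraps its literals in the text macro and uses only `char_type`/`string_type` compiles unchanged in both configurations.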

+
+
+ +

+ pugixml provides several functions for loading XML data from various places + - files, C++ iostreams, memory buffers. All functions use an extremely fast + non-validating parser. This parser is not fully W3C conformant - it can load + any valid XML document, but does not perform some well-formedness checks. + While considerable effort is made to reject invalid XML documents, some validation + is not performed for performance reasons. XML data is always converted + to the internal character format before parsing. pugixml supports all popular + Unicode encodings (UTF-8, UTF-16 (big and little endian), UTF-32 (big and + little endian); UCS-2 is naturally supported since it's a strict subset of + UTF-16) and handles all encoding conversions automatically. +
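The remark about UCS-2 can be made concrete with a small UTF-16 decoder. This is a sketch written for illustration, not pugixml's conversion code; `decode_utf16` is a hypothetical helper, it assumes the code units are already in native byte order, and replacing unpaired surrogates with U+FFFD is a choice made here:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode UTF-16 code units (native byte order) into Unicode code points.
inline std::vector<std::uint32_t> decode_utf16(const std::vector<std::uint16_t>& units)
{
    std::vector<std::uint32_t> out;

    for (std::size_t i = 0; i < units.size(); ++i)
    {
        std::uint32_t u = units[i];

        if (u < 0xD800 || u > 0xDFFF)
        {
            // Not a surrogate: the unit is the code point (this is the UCS-2 case)
            out.push_back(u);
        }
        else if (u <= 0xDBFF && i + 1 < units.size() &&
                 units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
        {
            // High surrogate followed by low surrogate: combine into one code point
            std::uint32_t low = units[++i];
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
        }
        else
        {
            // Unpaired surrogate: emit the replacement character (sketch-specific choice)
            out.push_back(0xFFFD);
        }
    }

    return out;
}
```

Every code unit outside the surrogate range decodes to itself, so text that only uses such units is handled by the first branch alone; only surrogate pairs need extra work.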

+

+ The most common source of XML data is files; pugixml provides a separate + function for loading an XML document from a file. This function accepts a file path + as its first argument, and also two optional arguments that specify parsing + options and input data encoding; both are described in the manual. +

+

+ This is an example of loading XML document from file (samples/load_file.cpp): +

+

+ +

+
pugi::xml_document doc;
+
+pugi::xml_parse_result result = doc.load_file("tree.xml");
+
+std::cout << "Load result: " << result.description() << ", mesh name: " << doc.child("mesh").attribute("name").value() << std::endl;
+
+

+

+

+ load_file, as well as the other + loading functions, destroys the existing document tree and then tries to + load the new tree from the specified file. The result of the operation is + returned in an xml_parse_result + object; this object contains the operation status and related information + (e.g. the last successfully parsed position in the input file, if parsing fails). +

+

+ The parsing result object can be implicitly converted to bool; + if you do not want to handle parsing errors thoroughly, you can just check + the return value of load functions as if it were a bool: + if (doc.load_file("file.xml")) { ... + } else { ... }. + Otherwise you can use the status + member to get the parsing status, or the description() member function to get the status in + string form. +

+

+ This is an example of handling loading errors (samples/load_error_handling.cpp): +

+

+ +

+
pugi::xml_document doc;
+pugi::xml_parse_result result = doc.load(source);
+
+if (result)
+    std::cout << "XML [" << source << "] parsed without errors, attr value: [" << doc.child("node").attribute("attr").value() << "]\n\n";
+else
+{
+    std::cout << "XML [" << source << "] parsed with errors, attr value: [" << doc.child("node").attribute("attr").value() << "]\n";
+    std::cout << "Error description: " << result.description() << "\n";
+    std::cout << "Error offset: " << result.offset << " (error at [..." << (source + result.offset) << "]\n\n";
+}
+
+

+

+

+ Sometimes XML data should be loaded from a source other than a file, e.g. + an HTTP URL; also you may want to load XML data from a file using non-standard + functions, e.g. to use your virtual file system facilities or to load XML + from gzip-compressed files. These scenarios require either loading the document + from memory, in which case you should prepare a contiguous memory block with + all XML data and pass it to one of the buffer loading functions, or loading + the document from a C++ IOstream, in which case you should provide an object which + implements the std::istream or std::wistream + interface. +

+

+ There are different functions for loading a document from memory; they treat + the passed buffer as either an immutable one (load_buffer), + a mutable buffer which is owned by the caller (load_buffer_inplace), + or a mutable buffer whose ownership belongs to pugixml (load_buffer_inplace_own). + There is also a simple helper function, xml_document::load, + for cases when you want to load the XML document from a null-terminated character + string. +

+

+ This is an example of loading XML document from memory using one of these + functions (samples/load_memory.cpp); + read the sample code for more examples: +

+

+ +

+
const char source[] = "<mesh name='sphere'><bounds>0 0 1 1</bounds></mesh>";
+size_t size = sizeof(source);
+
+

+

+

+ +

+
// You can use load_buffer_inplace to load document from mutable memory block; the block's lifetime must exceed that of document
+char* buffer = new char[size];
+memcpy(buffer, source, size);
+
+// The block can be allocated by any method; the block is modified during parsing
+pugi::xml_parse_result result = doc.load_buffer_inplace(buffer, size);
+
+// You have to destroy the block yourself after the document is no longer used
+delete[] buffer;
+
+

+

+

+ This is a simple example of loading XML document from file using streams + (samples/load_stream.cpp); read + the sample code for more complex examples involving wide streams and locales: +

+

+ +

+
std::ifstream stream("weekly-utf-8.xml");
+pugi::xml_parse_result result = doc.load(stream);
+
+

+

+
+
+ +

+ pugixml features an extensive interface for getting various types of data + from the document and for traversing the document. You can use various accessors + to get node/attribute data, you can traverse the child node/attribute lists + via accessors or iterators, you can do depth-first traversals with xml_tree_walker objects, and you can use + XPath for complex data-driven queries. +

+

+ You can get a node or attribute name via the name() accessor, and its value via the value() accessor. Note that both functions never + return null pointers - they either return a string with the relevant content, + or an empty string if the name/value is absent or if the handle is null. There + are also two things worth noting when reading values: +

+
    +
  • + It is common to store data as text contents of some node - e.g. <node><description>This + is a + node</description></node>. + In this case, the <description> node does not have a value, but instead + has a child of type node_pcdata + with value "This is a node". + pugixml provides child_value() helper functions to read such data. +
  • +
  • + In many cases attribute values have types that are not strings - e.g. + an attribute may always contain values that should be treated as integers, + despite the fact that they are represented as strings in XML. pugixml + provides several accessors that convert the attribute value to some other + type. +
  • +
+

+ This is an example of using these functions (samples/traverse_base.cpp): +

+

+ +

+
for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
+{
+    std::cout << "Tool " << tool.attribute("Filename").value();
+    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
+    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
+    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
+}
+
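The typed accessors used in the sample above turn attribute text into other types. The exact conversion rules are covered in the manual; the sketch below only shows the general idea with hypothetical helpers (`as_int_sketch`, `as_bool_sketch`) whose rules are assumptions made for this example:

```cpp
#include <cstdlib>

// Hypothetical helpers for illustration; the rules below are assumptions,
// not a statement of what the library's accessors do.

// Integer conversion: parse a leading decimal number, 0 if there is none
inline int as_int_sketch(const char* value)
{
    return value ? static_cast<int>(std::strtol(value, nullptr, 10)) : 0;
}

// Boolean conversion: values starting with '1', 't', 'T', 'y' or 'Y' count as true
inline bool as_bool_sketch(const char* value)
{
    if (!value) return false;

    char c = value[0];
    return c == '1' || c == 't' || c == 'T' || c == 'y' || c == 'Y';
}
```

Both helpers accept an empty string and return a default, which mirrors the way value() never returns a null pointer and lets conversions be applied without checking the handle first.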
+

+

+

+ Since a lot of document traversal consists of finding the node/attribute + with the correct name, there are special functions for that purpose. For + example, child("Tool") + returns the first child node which has the name "Tool", + or a null handle if there is no such node. This is an example of using such + functions (samples/traverse_base.cpp): +

+

+ +

+
std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";
+
+for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
+{
+    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
+}
+
+

+

+

+ Child node lists and attribute lists are simply doubly linked lists; while + you can use previous_sibling/next_sibling and other such functions for + iteration, pugixml additionally provides node and attribute iterators, so + that you can treat nodes as containers of other nodes or attributes. All + iterators are bidirectional and support all the usual iterator operations. The + iterators are invalidated if the node/attribute objects they're pointing + to are removed from the tree; adding nodes/attributes does not invalidate + any iterators. +
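These invalidation rules resemble those of `std::list`, which is also a doubly linked list. The sketch below uses `std::list` as a stand-in for a child list (it does not use pugixml), and `join_after_edit` is a name made up for the demonstration:

```cpp
#include <list>
#include <string>

// std::list stands in for a node's child list here.
inline std::string join_after_edit()
{
    std::list<std::string> children;
    children.push_back("a");
    children.push_back("b");
    children.push_back("c");

    std::list<std::string>::iterator keep = children.begin(); // points to "a"
    std::list<std::string>::iterator victim = keep;
    ++victim;                                                 // points to "b"

    children.push_front("new"); // insertion: 'keep' and 'victim' remain valid
    children.erase(victim);     // removal: only 'victim' is invalidated

    // 'keep' still points to "a" and can move in both directions
    std::string result = *keep; // "a"
    --keep;
    result += *keep;            // "new"
    ++keep;
    ++keep;
    result += *keep;            // "c"

    return result;
}
```

The practical rule is the same in both cases: never touch an iterator to an element after that element has been removed, but feel free to keep iterators across insertions.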

+

+ Here is an example of using iterators for document traversal (samples/traverse_iter.cpp): +

+

+ +

+
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
+{
+    std::cout << "Tool:";
+
+    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
+    {
+        std::cout << " " << ait->name() << "=" << ait->value();
+    }
+
+    std::cout << std::endl;
+}
+
+

+

+

+ The methods described above allow traversal of immediate children of some + node; if you want to do a deep tree traversal, you'll have to do it via a + recursive function or some equivalent method. However, pugixml provides a + helper for depth-first traversal of a subtree. In order to use it, you have + to implement the xml_tree_walker + interface and call the traverse + function. +
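For comparison, the recursive approach mentioned above can be sketched on a toy tree. Nothing here is pugixml API: `toy_node`, `toy_walker`, `traverse` and `outline_walker` are hypothetical names, and the visiting order (pre-order, with the root itself not reported) is an assumption of this sketch:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy tree used only for this sketch
struct toy_node
{
    std::string name;
    std::vector<toy_node> children;
};

// Visitor interface: return false from for_each to stop the traversal early
struct toy_walker
{
    virtual ~toy_walker() {}
    virtual bool for_each(const toy_node& node, int depth) = 0;
};

// Pre-order depth-first traversal of all descendants of 'root'
inline bool traverse(const toy_node& root, toy_walker& walker, int depth = 0)
{
    for (std::size_t i = 0; i < root.children.size(); ++i)
    {
        const toy_node& child = root.children[i];

        if (!walker.for_each(child, depth)) return false;
        if (!traverse(child, walker, depth + 1)) return false;
    }

    return true;
}

// Example walker: records an indented outline of the tree
struct outline_walker: toy_walker
{
    std::string text;

    virtual bool for_each(const toy_node& node, int depth)
    {
        text += std::string(static_cast<std::size_t>(depth) * 2, ' ') + node.name + "\n";
        return true; // continue traversal
    }
};
```

Separating the traversal from the visitor keeps the recursion in one place; each new task only needs a new walker.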

+

+ This is an example of traversing tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp): +

+

+ +

+
struct simple_walker: pugi::xml_tree_walker
+{
+    virtual bool for_each(pugi::xml_node& node)
+    {
+        for (int i = 0; i < depth(); ++i) std::cout << "  "; // indentation
+
+        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";
+
+        return true; // continue traversal
+    }
+};
+
+

+

+

+ +

+
simple_walker walker;
+doc.traverse(walker);
+
+

+

+

+ Finally, complex queries often need a higher-level DSL. pugixml + provides an implementation of the XPath 1.0 language for such queries. The complete + description of XPath usage can be found in the manual, but here are some + examples: +

+

+ +

+
pugi::xpath_node_set tools = doc.select_nodes("/Profile/Tools/Tool[@AllowRemote='true' and @DeriveCaptionFrom='lastparam']");
+
+std::cout << "Tools:";
+
+for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it)
+{
+    pugi::xpath_node node = *it;
+    std::cout << " " << node.node().attribute("Filename").value();
+}
+
+pugi::xpath_node build_tool = doc.select_single_node("//Tool[contains(Description, 'build system')]");
+
+std::cout << "\nBuild tool: " << build_tool.node().attribute("Filename").value() << "\n";
+
+

+

+
+ + + + + +
[Caution]Caution

+ XPath functions throw xpath_exception + objects on error; the sample above does not catch these exceptions. +

+
+
+ +

+ The document in pugixml is fully mutable: you can completely change the document + structure and modify the data of nodes/attributes. All functions take care + of memory management and structural integrity themselves, so they always + result in a structurally valid tree; however, it is still possible to create a + tree that is not valid XML (for example, by adding two attributes with the same name + or by setting an attribute/node name to an empty/invalid string). Tree modification + is optimized for performance and for memory consumption, so if you have enough + memory you can create documents from scratch with pugixml and later save + them to file/stream instead of relying on error-prone manual text writing + and without too much overhead. +

+

+ All member functions that change node/attribute data or structure are non-constant + and thus cannot be called on constant handles. However, you can easily convert + a constant handle to a non-constant one by simple assignment: void + foo(const pugi::xml_node& n) { pugi::xml_node nc = n; }, so const-correctness + here mainly provides additional documentation. +

+

+ As discussed before, nodes can have a name and a value, both of which are strings. + Depending on the node type, the name or value may be absent. You can use the set_name and set_value + member functions to set them. Similar functions are available for attributes; + however, the set_value function + is also overloaded for types other than strings, such as floating-point numbers. + Also, an attribute value can be set using an assignment operator. This is an + example of setting node/attribute name and value (samples/modify_base.cpp): +

+

+ +

+
pugi::xml_node node = doc.child("node");
+
+// change node name
+std::cout << node.set_name("notnode");
+std::cout << ", new node name: " << node.name() << std::endl;
+
+// change comment text
+std::cout << doc.last_child().set_value("useless comment");
+std::cout << ", new comment text: " << doc.last_child().value() << std::endl;
+
+// we can't change value of the element or name of the comment
+std::cout << node.set_value("1") << ", " << doc.last_child().set_name("2") << std::endl;
+
+

+

+

+ +

+
pugi::xml_attribute attr = node.attribute("id");
+
+// change attribute name/value
+std::cout << attr.set_name("key") << ", " << attr.set_value("345");
+std::cout << ", new attribute: " << attr.name() << "=" << attr.value() << std::endl;
+
+// we can use numbers or booleans
+attr.set_value(1.234);
+std::cout << "new attribute value: " << attr.value() << std::endl;
+
+// we can also use assignment operators for more concise code
+attr = true;
+std::cout << "final attribute value: " << attr.value() << std::endl;
+
+

+

+

+ Nodes and attributes do not exist outside of a document tree, so you can't + create them without adding them to some document. A node or attribute can + be created at the end of the node/attribute list or before/after some other node/attribute. + All insertion functions return a handle to the newly created object on success, + and a null handle on failure. Even if the operation fails (for example, if + you're trying to add a child node to a PCDATA node), the document remains in + a consistent state, but the requested node/attribute is not added. +

+
+ + + + + +
[Caution]Caution

+ attribute() and child() functions do not add attributes or nodes to the + tree, so code like node.attribute("id") = 123; will not do anything if node does not have an attribute with + name "id". Make sure + you're operating with existing attributes/nodes by adding them if necessary. +

+

+ This is an example of adding new attributes/nodes to the document (samples/modify_add.cpp): +

+

+ +

+
// add node with some name
+pugi::xml_node node = doc.append_child();
+node.set_name("node");
+
+// add description node with text child
+pugi::xml_node descr = node.append_child();
+descr.set_name("description");
+descr.append_child(pugi::node_pcdata).set_value("Simple node");
+
+// add param node before the description
+pugi::xml_node param = node.insert_child_before(pugi::node_element, descr);
+param.set_name("param");
+
+// add attributes to param node
+param.append_attribute("name") = "version";
+param.append_attribute("value") = 1.1;
+param.insert_attribute_after("type", param.attribute("name")) = "float";
+
+

+

+

+ If you do not want your document to contain some node or attribute, you can + remove it with the remove_attribute + and remove_child functions. + Removing an attribute or node invalidates all handles to the same underlying + object, and also invalidates all iterators pointing to the same object. Removing + a node also invalidates all past-the-end iterators to its attribute or child + node list. Be careful to ensure that all such handles and iterators either + do not exist or are not used after the attribute/node is removed. +

+

+ This is an example of removing attributes/nodes from the document (samples/modify_remove.cpp): +

+

+ +

+
// remove description node with the whole subtree
+pugi::xml_node node = doc.child("node");
+node.remove_child("description");
+
+// remove value attribute
+pugi::xml_node param = node.child("param");
+param.remove_attribute("value");
+
+// we can also remove nodes/attributes by handles
+pugi::xml_attribute id = param.attribute("name");
+param.remove_attribute(id);
+
+

+

+
+
+ +

+ Often after creating a new document or loading an existing one and processing + it, it is necessary to save the result back to a file. Also it is occasionally + useful to output the whole document or a subtree to some stream; use cases + include debug printing, serialization via network or other text-oriented + medium, etc. pugixml provides several functions to output any subtree of + the document to a file, stream or another generic transport interface; these + functions let you customize the output format, and also perform the necessary + encoding conversions. +

+

+ The node/attribute data is written to the destination properly formatted + according to the node type; all special XML symbols, such as < and &, + are properly escaped. In order to guard against forgotten node/attribute + names, empty node/attribute names are printed as ":anonymous". + For proper output, make sure all node and attribute names are set to meaningful + values. +
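To make the escaping rule concrete, here is a sketch of what a serializer has to do with text before writing it. `escape_xml` and `printable_name` are hypothetical helpers, and the exact set of characters escaped is an assumption; pugixml performs this step for you when saving:

```cpp
#include <string>

// Escape characters that have special meaning in XML text and attribute values.
// The set handled here (&, <, >, ") is an assumption made for this sketch.
inline std::string escape_xml(const std::string& text)
{
    std::string out;

    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        switch (text[i])
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += text[i];
        }
    }

    return out;
}

// Placeholder for forgotten names, as described above
inline std::string printable_name(const std::string& name)
{
    return name.empty() ? std::string(":anonymous") : name;
}
```

Note that the ampersand must be handled like any other special character; since each input character is examined exactly once, already-produced entities are never escaped a second time.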

+

+ If you want to save the whole document to a file, you can use the save_file function, which returns true on success. This is a simple example + of saving XML document to file (samples/save_file.cpp): +

+

+ +

+
// save document to file
+std::cout << "Saving result: " << doc.save_file("save_file_output.xml") << std::endl;
+
+

+

+

+ For additional interoperability pugixml provides functions for saving a document + to any object which implements the C++ std::ostream interface. This allows you + to save documents to any standard C++ stream (e.g. a file stream) or any third-party + compliant implementation (e.g. Boost Iostreams). Most notably, this allows + for easy debug output, since you can use the std::cout + stream as the saving target. There are two functions: one works with narrow character + streams, the other handles wide character ones. +

+

+ This is a simple example of saving XML document to standard output (samples/save_stream.cpp): +

+

+ +

+
// save document to standard output
+std::cout << "Document:\n";
+doc.save(std::cout);
+
+

+

+

+ All of the above saving functions are implemented in terms of the writer interface. + This is a simple interface with a single function, which is called several + times during the output process with chunks of document data as input. In order + to output the document via some custom transport, for example sockets, you + should create an object which implements the xml_writer + interface and pass it to the xml_document::save + function. +

+

+ This is a simple example of custom writer for saving document data to STL + string (samples/save_custom_writer.cpp); + read the sample code for more complex examples: +

+

+ +

+
struct xml_string_writer: pugi::xml_writer
+{
+    std::string result;
+
+    virtual void write(const void* data, size_t size)
+    {
+        result += std::string(static_cast<const char*>(data), size);
+    }
+};
+
+

+

+

+ While the previously described functions save the whole document to the + destination, it is just as easy to save a single subtree. Instead of calling xml_document::save, just call the xml_node::print + function on the target node. You can save node contents to a C++ IOstream object + or a custom writer in this way. Saving a subtree slightly differs from saving + the whole document; read the manual for more information. +

+
+
+ +

+ If you believe you've found a bug in pugixml, please file an issue via issue submission form. + Be sure to include the relevant information so that the bug can be reproduced: + the version of pugixml, compiler version and target architecture, the code + that uses pugixml and exhibits the bug, etc. Feature requests and contributions + can be filed as issues, too. +

+

+ If filing an issue is not possible due to privacy or other concerns, you + can contact the pugixml author by e-mail directly: arseny.kapoulkine@gmail.com. +

+
+
+ +

+ The pugixml library is distributed under the MIT license: +

+
+

+ Copyright (c) 2006-2010 Arseny Kapoulkine +

+

+ Permission is hereby granted, free of charge, to any person obtaining a + copy of this software and associated documentation files (the "Software"), + to deal in the Software without restriction, including without limitation + the rights to use, copy, modify, merge, publish, distribute, sublicense, + and/or sell copies of the Software, and to permit persons to whom the Software + is furnished to do so, subject to the following conditions: +

+

+ The above copyright notice and this permission notice shall be included + in all copies or substantial portions of the Software. +

+

+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + IN THE SOFTWARE. +

+
+
+
+ + + +

Last revised: July 11, 2010 at 16:13:56 GMT

+ + -- cgit v1.2.3