XPath


XPath types
Selecting nodes via XPath expression
Using query objects
Error handling
Conformance to W3C specification

If the task at hand is to select a subset of document nodes that match some criteria, it is possible to code a function using the existing traversal functionality for any practical criteria. However, a data-driven approach is often desirable when the criteria are not predefined and come from a file, or when the traversal interfaces are inconvenient and a higher-level DSL is required. There is a standard language for XML processing, XPath, that can be useful in these cases. pugixml implements an almost complete subset of XPath 1.0. Because of differences in the document object model and some performance implications, there are minor violations of the official specification, which are listed in Conformance to W3C specification. The rest of this section describes the interface for XPath functionality. Please note that if you wish to learn the XPath language itself, you have to look for other tutorials or manuals; for example, you can read the W3Schools XPath tutorial, the XPath tutorial at tizag.com, and the XPath 1.0 specification.


	Note
	As of version 0.9, you need both STL and exception support to use XPath; XPath is disabled if either `PUGIXML_NO_STL` or `PUGIXML_NO_EXCEPTIONS` is defined.

XPath types

Each XPath expression can have one of the following types: boolean, number, string or node set. The boolean type corresponds to the bool type, the number type corresponds to the double type, the string type corresponds to either std::string or std::wstring, depending on whether the wide character interface is enabled, and the node set type corresponds to the xpath_node_set type. There is an enumeration, xpath_value_type, which can take the values xpath_type_boolean, xpath_type_number, xpath_type_string or xpath_type_node_set, accordingly.

Because an XPath node can be either a node or an attribute, there is a special type, xpath_node, which is a discriminated union of these types. A value of this type contains two node handles, one of xml_node type and another one of xml_attribute type; at most one of them can be non-null. The following accessors return these handles:

xml_node xpath_node::node() const;
xml_attribute xpath_node::attribute() const;

XPath nodes can be null, in which case both accessors return null handles.

Note that, as per the XPath specification, each XPath node has a parent, which can be retrieved via this function:

xml_node xpath_node::parent() const;

The parent function returns the node's parent if the XPath node corresponds to an xml_node handle (equivalent to node().parent()), or the node to which the attribute belongs if the XPath node corresponds to an xml_attribute handle. For null nodes, parent returns a null handle.

Like node and attribute handles, XPath node handles can be implicitly cast to a boolean-like object to check whether they are null, and can also be compared with each other for equality.

You can also create XPath nodes with one of three constructors: the default constructor, the constructor that takes a node argument, and the constructor that takes attribute and node arguments (in which case the attribute must belong to the attribute list of the node). However, you usually don't need to create your own XPath node objects, since they are returned to you by the selection functions.

XPath expressions operate not on single nodes, but instead on node sets. A node set is a collection of nodes, which can optionally be ordered in either forward or reverse document order. Document order is defined in the XPath specification; an XPath node is before another node in document order if it appears before it in the XML representation of the corresponding document.

Node sets are represented by the xpath_node_set object, which has an interface that resembles that of sequential random-access containers. It has an iterator type along with the usual begin/past-the-end iterator accessors:

typedef const xpath_node* xpath_node_set::const_iterator;
const_iterator xpath_node_set::begin() const;
const_iterator xpath_node_set::end() const;

It can also be iterated via indices, just like std::vector:

const xpath_node& xpath_node_set::operator[](size_t index) const;
size_t xpath_node_set::size() const;
bool xpath_node_set::empty() const;

All of the above operations have the same semantics as those of std::vector: the iterators are random-access, all of the above operations are constant time, and accessing the element at an index that is greater than or equal to the set size results in undefined behavior. You can use either iterator-based or index-based access for iteration, although iterator-based access can be faster.

The order of iteration depends on the order of nodes inside the set; the order can be queried via the following function:

enum xpath_node_set::type_t {type_unsorted, type_sorted, type_sorted_reverse};
type_t xpath_node_set::type() const;

The type function returns the current order of nodes; type_sorted means that the nodes are in forward document order, type_sorted_reverse means that the nodes are in reverse document order, and type_unsorted means that neither order is guaranteed (nodes can accidentally be in a sorted order even if type() returns type_unsorted). If you require a specific order of iteration, you can change it via the sort function:

void xpath_node_set::sort(bool reverse = false);

Calling sort sorts the nodes in either forward or reverse document order, depending on the argument; after this call type() will return type_sorted or type_sorted_reverse.

Often the actual iteration is not needed; instead, only the first element in document order is required. For this, a special accessor is provided:

xpath_node xpath_node_set::first() const;

This function returns the first node in forward document order from the set, or a null node if the set is empty. Note that while the result of the function does not depend on the order of nodes in the set (i.e. on the result of type()), the complexity does: if the set is sorted, the complexity is constant, otherwise it is linear in the number of elements or worse.

Selecting nodes via XPath expression

If you want to select nodes that match some XPath expression, you can do it with the following functions:

xpath_node xml_node::select_single_node(const char_t* query) const;
xpath_node_set xml_node::select_nodes(const char_t* query) const;

The select_nodes function compiles the expression, executes it with the node as the context node, and returns the resulting node set. select_single_node returns only the first node in document order from the result, and is equivalent to calling select_nodes(query).first(). If the XPath expression does not match anything, or the node handle is null, select_nodes returns an empty set, and select_single_node returns a null XPath node.

Both functions throw xpath_exception if the query cannot be compiled or if it returns a value with a type other than node set; see Error handling for details.

While compiling expressions is fast, the compilation time can introduce a significant overhead if the same expression is used many times on small subtrees. If you're doing many similar queries, consider compiling them into query objects (see Using query objects for further reference). Once you have a compiled query object, you can pass it to the select functions instead of an expression string:

xpath_node xml_node::select_single_node(const xpath_query& query) const;
xpath_node_set xml_node::select_nodes(const xpath_query& query) const;

Both functions throw xpath_exception if the query returns a value with a type other than node set.

This is an example of selecting nodes using XPath expressions (samples/xpath_select.cpp):

pugi::xpath_node_set tools = doc.select_nodes("/Profile/Tools/Tool[@AllowRemote='true' and @DeriveCaptionFrom='lastparam']");

std::cout << "Tools:";

for (pugi::xpath_node_set::const_iterator it = tools.begin(); it != tools.end(); ++it)
{
    pugi::xpath_node node = *it;
    std::cout << " " << node.node().attribute("Filename").value();
}

pugi::xpath_node build_tool = doc.select_single_node("//Tool[contains(Description, 'build system')]");

std::cout << "\nBuild tool: " << build_tool.node().attribute("Filename").value() << "\n";

Using query objects

When you call select_nodes with an expression string as an argument, a query object is created behind the scenes. A query object represents a compiled XPath expression. Query objects can be needed in the following circumstances:

- You can precompile expressions to query objects to save compilation time if it becomes an issue;
- You can use query objects to evaluate XPath expressions which result in booleans, numbers or strings;
- You can get the type of the expression value via the query object.

Query objects correspond to the xpath_query type. They are immutable and non-copyable: they are bound to the expression at creation time and cannot be cloned. If you want to put query objects in a container, allocate them on the heap via the new operator and store pointers to xpath_query in the container.

You can create a query object with the constructor that takes an XPath expression as an argument:

explicit xpath_query::xpath_query(const char_t* query);

The expression is compiled and the compiled representation is stored in the new query object. If compilation fails, xpath_exception is thrown (see Error handling for details). After the query is created, you can query the type of the evaluation result using the following function:

xpath_value_type xpath_query::return_type() const;

You can evaluate the query using one of the following functions:

bool xpath_query::evaluate_boolean(const xml_node& n) const;
double xpath_query::evaluate_number(const xml_node& n) const;
string_t xpath_query::evaluate_string(const xml_node& n) const;
xpath_node_set xpath_query::evaluate_node_set(const xml_node& n) const;

All functions take the context node as an argument, compute the expression and return the result, converted to the requested type. According to the XPath specification, a value of any type can be converted to a boolean, number or string value, but no type other than node set can be converted to node set. Because of this, evaluate_boolean, evaluate_number and evaluate_string always return a result, but evaluate_node_set throws an xpath_exception if the return type is not node set.


	Note
	Calling `node.select_nodes("query")` is equivalent to calling `xpath_query("query").evaluate_node_set(node)`.

This is an example of using query objects (samples/xpath_query.cpp):

// Select nodes via compiled query
pugi::xpath_query query_remote_tools("/Profile/Tools/Tool[@AllowRemote='true']");

pugi::xpath_node_set tools = query_remote_tools.evaluate_node_set(doc);
std::cout << "Remote tool: ";
tools[2].node().print(std::cout);

// Evaluate numbers via compiled query
pugi::xpath_query query_timeouts("sum(//Tool/@Timeout)");
std::cout << query_timeouts.evaluate_number(doc) << std::endl;

// Evaluate strings via compiled query for different context nodes
pugi::xpath_query query_name_valid("string-length(substring-before(@Filename, '_')) > 0 and @OutputFileMasks");
pugi::xpath_query query_name("concat(substring-before(@Filename, '_'), ' produces ', @OutputFileMasks)");

for (pugi::xml_node tool = doc.first_element_by_path("Profile/Tools/Tool"); tool; tool = tool.next_sibling())
{
    std::string s = query_name.evaluate_string(tool);

    if (query_name_valid.evaluate_boolean(tool)) std::cout << s << std::endl;
}

Error handling

As of version 0.9, all XPath errors result in thrown exceptions. Errors can arise during expression compilation or node set evaluation. In both cases, an xpath_exception object is thrown. This is an exception object that implements the std::exception interface, and thus has a single function, what():

virtual const char* xpath_exception::what() const throw();

This function returns the error message. Currently it is impossible to get the exact place where query compilation failed. This functionality, along with optional error handling without exceptions, will be available in version 1.0.

This is an example of XPath error handling (samples/xpath_error.cpp):

// Exception is thrown for incorrect query syntax
try
{
    doc.select_nodes("//nodes[#true()]");
}
catch (const pugi::xpath_exception& e)
{
    std::cout << "Select failed: " << e.what() << std::endl;
}

// Exception is thrown for incorrect query semantics
try
{
    doc.select_nodes("(123)/next");
}
catch (const pugi::xpath_exception& e)
{
    std::cout << "Select failed: " << e.what() << std::endl;
}

// Exception is thrown for query with incorrect return type
try
{
    doc.select_nodes("123");
}
catch (const pugi::xpath_exception& e)
{
    std::cout << "Select failed: " << e.what() << std::endl;
}

Conformance to W3C specification

Because of the differences in document object models, performance considerations and implementation complexity, pugixml does not provide a fully conformant XPath 1.0 implementation. This is the current list of incompatibilities:

- Consecutive text nodes sharing the same parent are not merged, i.e. in <node>text1 <![CDATA[data]]> text2</node> node should have one text node child, but instead has three.
- Since a document can't have a document type declaration, the id() function always returns an empty node set.
- Namespace nodes are not supported (affects the namespace:: axis).
- Name tests are performed on QNames in the XML document instead of expanded names; for <foo xmlns:ns1='uri' xmlns:ns2='uri'><ns1:child/><ns2:child/></foo>, the query foo/ns1:* will return only the first child, not both of them. Compliant XPath implementations can return both nodes if the user provides appropriate namespace declarations.
- String functions consider a character to be either a single char value or a single wchar_t value, depending on the library configuration; this means that some string functions are not fully Unicode-aware. This affects the substring(), string-length() and translate() functions.
- Variable references are not supported.

Some of these incompatibilities will be fixed in version 1.0.