author | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2016-01-14 07:52:40 -0800 |
---|---|---|
committer | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2016-01-14 07:52:40 -0800 |
commit | c388dbeba4f5de655ca74eb21d0a6d29c5eaaee2 (patch) | |
tree | 2e4f67bf33ac0f4b982831b4cc31f61d50cec836 /docs/manual.adoc | |
parent | ad3b492c1a4b3bf3a3163aa2af1641f422dba33f (diff) | |
parent | 4f3be7616729cbf0c8768caf861331d710d457a8 (diff) |
Merge pull request #79 from zeux/embed-pcdata
Add parse_embed_pcdata flag
This flag determines whether plain character data is stored in the parent element's value. This significantly changes the structure of the document; the flag is only recommended for parsing documents with a lot of PCDATA nodes in a very memory-constrained environment.
Most high-level APIs continue to work; code that inspects DOM using first_child()/value() will have to be adapted.
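To make the structural change concrete, here is a minimal sketch using a toy DOM model. It is not pugixml's implementation: `ToyNode`, `add_text`, `first_child_value` and `text_of` are hypothetical names introduced only for illustration, and the embedding rule is an assumption based on the description in the manual (text is embedded only when it would be the element's first child).

```cpp
#include <string>
#include <vector>

// Toy stand-in for a DOM node; NOT pugixml's real data structure.
struct ToyNode {
    enum Kind { Element, Pcdata } kind;
    std::string name;               // used by element nodes
    std::string value;              // used by PCDATA nodes, and by elements holding embedded text
    std::vector<ToyNode> children;
};

// Attach text to `parent`. With embedding enabled, the text is stored in the
// element's own value only if it would otherwise become the first child;
// in every other case a separate PCDATA child node is created as usual.
void add_text(ToyNode& parent, const std::string& text, bool embed_pcdata) {
    if (embed_pcdata && parent.children.empty() && parent.value.empty()) {
        parent.value = text;
    } else {
        parent.children.push_back(ToyNode{ToyNode::Pcdata, "", text, {}});
    }
}

// Layout-specific lookup: assumes the text always lives in the first child.
// This is the style of code that has to be adapted when text is embedded.
std::string first_child_value(const ToyNode& element) {
    return element.children.empty() ? std::string() : element.children.front().value;
}

// Layout-independent lookup: checks the element's own value first, then
// falls back to the first PCDATA child.
std::string text_of(const ToyNode& element) {
    if (!element.value.empty()) return element.value;
    for (const ToyNode& child : element.children) {
        if (child.kind == ToyNode::Pcdata) return child.value;
    }
    return std::string();
}
```

In this model, `first_child_value` returns an empty string once the text is embedded, while `text_of` returns the same text in both layouts. That is the sense in which higher-level accessors such as `child_value` or `text` keep working while direct first_child()/value() inspection does not.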
Diffstat (limited to 'docs/manual.adoc')
-rw-r--r-- | docs/manual.adoc | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/docs/manual.adoc b/docs/manual.adoc
index ccf3fe7..1d8d88a 100644
--- a/docs/manual.adoc
+++ b/docs/manual.adoc
@@ -746,6 +746,9 @@ These flags control the resulting tree contents:
 * [[parse_ws_pcdata_single]]`parse_ws_pcdata_single` determines if whitespace-only PCDATA nodes that have no sibling nodes are to be put in DOM tree. In some cases application needs to parse the whitespace-only contents of nodes, i.e. `<node> </node>`, but is not interested in whitespace markup elsewhere. It is possible to use <<parse_ws_pcdata,parse_ws_pcdata>> flag in this case, but it results in excessive allocations and complicates document processing; this flag can be used to avoid that. As an example, after parsing XML string `<node> <a> </a> </node>` with `parse_ws_pcdata_single` flag set, `<node>` element will have one child `<a>`, and `<a>` element will have one child with type <<node_pcdata,node_pcdata>> and value `" "`. This flag has no effect if <<parse_ws_pcdata,parse_ws_pcdata>> is enabled. This flag is *off* by default.
+* [[parse_embed_pcdata]]`parse_embed_pcdata` determines if PCDATA contents is to be saved as element values. Normally element nodes have names but not values; this flag forces the parser to store the contents as a value if PCDATA is the first child of the element node (otherwise PCDATA node is created as usual). This can significantly reduce the memory required for documents with many PCDATA nodes. To retrieve the data you can use `xml_node::value()` on the element nodes or any of the higher-level functions like `child_value` or `text`. This flag is *off* by default.
+Since this flag significantly changes the DOM structure it is only recommended for parsing documents with many PCDATA nodes in memory-constrained environments. This flag is *off* by default.
+
 * [[parse_fragment]]`parse_fragment` determines if document should be treated as a fragment of a valid XML. Parsing document as a fragment leads to top-level PCDATA content (i.e. text that is not located inside a node) to be added to a tree, and additionally treats documents without element nodes as valid. This flag is *off* by default.
 CAUTION: Using in-place parsing (<<xml_document::load_buffer_inplace,load_buffer_inplace>>) with `parse_fragment` flag may result in the loss of the last character of the buffer if it is a part of PCDATA. Since PCDATA values are null-terminated strings, the only way to resolve this is to provide a null-terminated buffer as an input to `load_buffer_inplace` - i.e. `doc.load_buffer_inplace("test\0", 5, pugi::parse_default | pugi::parse_fragment)`.
@@ -2611,6 +2614,7 @@ const unsigned int +++<a href="#parse_pi">parse_pi</a>+++
 const unsigned int +++<a href="#parse_trim_pcdata">parse_trim_pcdata</a>+++
 const unsigned int +++<a href="#parse_ws_pcdata">parse_ws_pcdata</a>+++
 const unsigned int +++<a href="#parse_ws_pcdata_single">parse_ws_pcdata_single</a>+++
+const unsigned int +++<a href="#parse_embed_pcdata">parse_embed_pcdata</a>+++
 const unsigned int +++<a href="#parse_wconv_attribute">parse_wconv_attribute</a>+++
 const unsigned int +++<a href="#parse_wnorm_attribute">parse_wnorm_attribute</a>+++
 ----