Age | Commit message (Collapse) | Author |
|
This makes code more consistent between wchar/utf8 mode.
|
|
Extra argument 'hint' is used to start the attribute lookup; if the attribute
is not found the lookup is restarted from the beginning of the attriubte list.
This allows to optimize attribute lookups if you need to get many attributes
from the node and can make assumptions about the likely ordering. The code is
correct regardless of the order, but it is faster than using vanilla lookups
if the order matches the calling order.
Fixes #30.
|
|
auto_deleter is now used in all modes so we can't exclude it from compilation.
|
|
|
|
|
|
|
|
Now compact_string matches compact_pointer_parent.
Turns out PUGI__UNLIKELY is good at reordering conditions but usually does not
really affect performance. Since MSVC should treat "if" branches as taken and
does not support branch probabilities, don't use them if we don't need to.
|
|
|
|
Instead of checking if the object being removed allocated a marker, mark the
marker block as deleted immediately upon allocation. This simplifies the logic
and prevents extra markers from being inserted if we allocate/deallocate the
same node indefinitely.
Also change marker pointer type to uint32_t*.
|
|
First assignment uses a fast path; second assignment uses a specialized
path as well.
|
|
When we deallocate nodes/attributes that allocated the marker we have to
adjust the size accordingly, and dismiss the marker in case it gets
overwritten with something else...
|
|
Header is now just 2 bytes, with optional additonal 4 bytes that are only
allocated for every 85 nodes / 128 attributes.
|
|
|
|
|
|
This temporarily increases the node size to 16 bytes - we'll bring it back.
It allows us to remove the horrible node_pi hack and to reduce the amount of
changes against master. This comes at the price of not decreasing basline
xml_node_struct size.
The compact xml_node_struct is also increased by this change but a followup
change will reduce *both* xml_attribute_struct and xml_node_struct (to 8/12
bytes).
|
|
Split a long line into multiple statements.
|
|
Also remove useless comments.
|
|
|
|
|
|
We used this in two cases - to get the page pointer and to test flags.
We now use PUGI__GETPAGE for getting the page pointer and operator& to test
flags - this makes getting node type significantly faster since it does not
require page pointer reconstruction.
|
|
|
|
|
|
Clarify the offset applied when encoding the pointer difference.
Make decoding diff slightly more clear - no effect on performance.
Adjust branch weighting in compact_string encoding - 0.5% faster.
Use uint16_t in compact_pointer_parent - 2% faster.
|
|
Make sure compact_hash_table::rehash() is not inlined - that way reserve() is
inlined so the fast path has no extra function calls.
Also use subtraction instead of multiplication when checking capacity.
|
|
|
|
xpath_query, xpath_node_set and xpath_variable_set are now moveable.
This is a nice performance optimization for variable/node sets, and enables
storing xpath_query in containers without using pointers (it's only possible
now since the query is not copyable).
|
|
|
|
xpath_variable_set is essentially an associative container; it's about time it
became copyable.
Implementation is slightly tricky due to out of memory handling. Both copy ctor
and assignment operator have strong exception guarantee (even if exceptions are
disabled! which translates to "roll back on allocation errors").
|
|
The type of the variable is now initialized correctly in the ctor, so that there
is no interim invalid state.
|
|
Since the type of the set was updated before assignment, assigning in
out-of-memory condition could change the type to not match the content.
|
|
If xml_writer::write throws an exception while being called from flush(), the
exception is thrown from destructor. Clang in C++11 mode calls std::terminate
in this case.
|
|
Fix code style and revert redundant parameters/whitespace changes.
Also remove format_each_attribute_on_new_line - we're only introducing one
extra formatting flag. The flag implies format_indent but does not include its
bitmask.
Also add a few more tests.
Fixes #14.
|
|
|
|
|
|
Also fix test in wchar_t mode.
|
|
Ensure that all the necessary cleanup is performed in case the allocation fails
with an exception - files are closed, buffers are reclaimed, etc.
Any test that triggers a simulated out-of-memory condition is ran once again
with a throwing allocation function. Unobserved std::bad_alloc count as test
failures and require CHECK_ALLOC_FAIL macro.
Fixes #17.
|
|
|
|
|
|
Previously attributes that were copied with their node used string sharing,
but standalone attributes that were copied using xml_node::*_copy(xml_attribute)
were not.
|
|
Instead of reallocating the string for every tree level just do two passes
over the ancestor chain.
|
|
as_utf8_end was used with std::string, where writing an extra zero-terminating
character should *probably* always work (at least if size is positive) but is
not ideal.
The only place that needed to zero-terminate was convert_path_heap.
|
|
When parsing XPath variables, we need to perform a heap allocation; if it
fails, an xpath_exception instead of bad_alloc used to be thrown.
Now we throw the exception of a correct type so that xpath_exception means
'parsing error'.
|
|
|
|
|
|
|
|
Previously we omitted extra whitespace for single PCDATA/CDATA children, but in
mixed content there was extra indentation before/after text nodes.
One of the problems with that is that the text that you saved is not exactly
the same as the parsing result using default flags (parse_trim_pcdata helps).
Another problem is that parse-format cycles do not have a fixed point for mixed
content - the result expands indefinitely. Some XML libraries, like Python
minidom, have the same issue, but this is definitely a problem.
Pretty-printing mixed content is hard. It seems that the only other sensible
choice is to switch mixed content nodes to raw formatting. In a way the code in
this change is a weaker version of that - it removes indentation around text
nodes but still keeps it around element siblings/children.
Thus we can switch to mixed-raw formatting at some point later, which will be
a superset of the current behavior.
To do this we have to either switch at the first text node (.NET XmlDocument
does that), or scan the children of each element for a possible text node and
switch before we output the first child.
The former behavior seems non-intuitive (and a bit broken); unfortunately, the
latter behavior can cost up to 20% of the output time for trees *without* mixed
content.
Fixes #13.
|
|
|
|
|
|
Since in compact mode we only ever have a guaranteed alignment on 4, the pages
are limited to 256k even if pointers are 64 bit.
|
|
|