Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and speed improvement #3

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 26 additions & 3 deletions rapidxml.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -657,6 +657,7 @@ namespace rapidxml
xml_base()
: m_name(0)
, m_value(0)
, m_value_size(0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code deliberately left m_value_size uninitialized (it's only accessed if m_value != nullptr)

, m_parent(0)
{
}
Expand Down Expand Up @@ -689,7 +690,7 @@ namespace rapidxml
//! <br><br>
//! Use value_size() function to determine length of the value.
//! \return Value of node, or empty string if node has no value.
Ch *value() const
const Ch *value() const
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least have a non-cost value() overload that returns a non-const Ch*

(Even then I'm not all too happy about breaking the public interface, especially since iterators etc. only return const nodes/attributes.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not good idea to change the value and there is always static_cast or some dirty tricks to break that rule ;-)

{
return m_value ? m_value : nullstr();
}
Expand Down Expand Up @@ -914,10 +915,19 @@ namespace rapidxml
xml_node(node_type type)
: m_type(type)
, m_first_node(0)
, m_last_node(0)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. This was left uninitialized deliberately.

, m_first_attribute(0)
, m_last_attribute(0)
{
}

void resetRest() {
m_last_attribute = NULL;
m_prev_sibling = NULL;
m_next_sibling = NULL;
this->m_name_size = 0;
}

///////////////////////////////////////////////////////////////////////////
// Node data access

Expand Down Expand Up @@ -1395,6 +1405,8 @@ namespace rapidxml
{
assert(text);

this->resetRest();

// Remove current contents
this->remove_all_nodes();
this->remove_all_attributes();
Expand Down Expand Up @@ -1597,6 +1609,7 @@ namespace rapidxml
// Use translation skip
Ch *src = text;
Ch *dest = src;
bool shorter = false;
while (StopPred::test(*src))
{
// If entity translation is enabled
Expand All @@ -1615,13 +1628,15 @@ namespace rapidxml
*dest = Ch('&');
++dest;
src += 5;
shorter = true;
continue;
}
if (src[2] == Ch('p') && src[3] == Ch('o') && src[4] == Ch('s') && src[5] == Ch(';'))
{
*dest = Ch('\'');
++dest;
src += 6;
shorter = true;
continue;
}
break;
Expand All @@ -1633,6 +1648,7 @@ namespace rapidxml
*dest = Ch('"');
++dest;
src += 6;
shorter = true;
continue;
}
break;
Expand All @@ -1644,6 +1660,7 @@ namespace rapidxml
*dest = Ch('>');
++dest;
src += 4;
shorter = true;
continue;
}
break;
Expand All @@ -1655,6 +1672,7 @@ namespace rapidxml
*dest = Ch('<');
++dest;
src += 4;
shorter = true;
continue;
}
break;
Expand Down Expand Up @@ -1723,6 +1741,11 @@ namespace rapidxml
*dest++ = *src++;

}
if(shorter) {
// Add terminating zero after decoded shorter value 2
if (!(Flags & parse_no_string_terminators))
*dest = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe the issue this is supposed to fix?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some tests for character references in 3ade903.
To me, everything (nul-terminated strings & length) seems to be just fine. Do you have a test-case that shows your issue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another bad feature - default unescape affect also ex. CR & LF (&#13; &#10; &#xA; &#xD;) so in multi-platform environment you cannot process XML without unrecoverable CR/LF mismatch sometimes...

}

// Return new end
text = src;
Expand Down Expand Up @@ -1930,7 +1953,7 @@ namespace rapidxml
if (!(Flags & parse_no_string_terminators))
{
pi->name()[pi->name_size()] = Ch('\0');
pi->value()[pi->value_size()] = Ch('\0');
const_cast<Ch *>(pi->value())[pi->value_size()] = Ch('\0');
}

text += 2; // Skip '?>'
Expand Down Expand Up @@ -2309,7 +2332,7 @@ namespace rapidxml

// Add terminating zero after value
if (!(Flags & parse_no_string_terminators))
attribute->value()[attribute->value_size()] = 0;
const_cast<Ch *>(attribute->value())[attribute->value_size()] = 0;

// Skip whitespace after attribute value
skip<whitespace_pred, Flags>(text);
Expand Down