Skip to content

Commit

Permalink
Merge pull request #431 from biojppm/post_newparser
Browse files Browse the repository at this point in the history
newparser: post-merge fixes
  • Loading branch information
biojppm committed May 21, 2024
2 parents 1f8a10d + 76ece7d commit 6c19ea9
Show file tree
Hide file tree
Showing 21 changed files with 1,361 additions and 309 deletions.
31 changes: 27 additions & 4 deletions changelog/current.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,36 @@
**All the changes described come from a single PR: [#PR414](https://github.com/biojppm/rapidyaml/pull/414).**
Most of the changes are from the giant Parser refactor described below. Before getting to that, a couple of other minor changes first.


### Fixes

- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - Emitter: prevent stack overflows when emitting malicious trees by providing a max tree depth for the emit visitor. This was done by adding an `EmitOptions` structure as an argument both to the emitter and to the emit functions, which is then forwarded to the emitter. This `EmitOptions` structure has a max tree depth setting with a default value of 64.
- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - Fix `_RYML_CB_ALLOC()` using `(T)` in parenthesis, making the macro unusable.


### New features

- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - append-emitting to existing containers in the `emitrs_` functions, suggested in [#345](https://github.com/biojppm/rapidyaml/issues/345). This was achieved by adding a `bool append=false` as the last parameter of these functions.
- [#PR431](https://github.com/biojppm/rapidyaml/pull/431) - add depth query methods:
```cpp
Tree::depth_asc(id_type) const; // O(log(num_tree_nodes)) get the depth of a node ascending (ie, from root to node)
Tree::depth_desc(id_type) const; // O(num_tree_nodes) get the depth of a node descending (ie, from node to deep-most leaf node)
ConstNodeRef::depth_asc() const; // likewise
ConstNodeRef::depth_desc() const;
NodeRef::depth_asc() const;
NodeRef::depth_desc() const;
```
------
All other changes come from [#PR414](https://github.com/biojppm/rapidyaml/pull/414).
### Parser refactor
The parser was completely refactored ([#PR414](https://github.com/biojppm/rapidyaml/pull/414)). This was a large and hard job carried out over several months, and the result is:
The parser was completely refactored ([#PR414](https://github.com/biojppm/rapidyaml/pull/414)). This was a large and hard job carried out over several months, but it brings important improvements.
- A new event-based parser engine is now in place, enabling the improvements described below. This engine uses a templated event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much more simple (albeit longer), and much easier to work with and fix.
- The new parser is an event-based parser, based on an event dispatcher engine. This engine is templated on event handler, where each event is a function call, which spares branches on the event handler. The parsing code was fully rewritten, and is now much more simple (albeit longer), and much easier to work with and fix.
- YAML standard-conformance was improved significantly. Along with many smaller fixes and additions, (too many to list here), the main changes are the following:
- The parser engine can now successfully parse container keys, emitting all the events in the correct , **but** as before, the ryml tree cannot accomodate these (and this constraint is no longer enforced by the parser, but instead by `EventHandlerTree`). For an example of a handler which can accomodate key containers, see the one which is used for the test suite at `test/test_suite/test_suite_event_handler.hpp`
- The parser engine can now successfully parse container keys, emitting all the events in correctly, **but** as before, the ryml tree cannot accomodate these (and this constraint is no longer enforced by the parser, but instead by `EventHandlerTree`). For an example of a handler which can accomodate key containers, see the one which is used for the test suite at `test/test_suite/test_suite_event_handler.hpp`
- Anchor keys can now be terminated with colon (eg, `&anchor: key: val`), as dictated by the standard.
- The parser engine can now be used to create native trees in other programming languages, or in cases where the user *must* have container keys.
- Performance of both parsing and emitting improved significantly; see some figures below.
Expand Down
2 changes: 1 addition & 1 deletion src/c4/yml/common.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -468,7 +468,7 @@ do \
} \
} while(0)
#define _RYML_CB_ALLOC_HINT(cb, T, num, hint) (T*) (cb).m_allocate((num) * sizeof(T), (hint), (cb).m_user_data)
#define _RYML_CB_ALLOC(cb, T, num) _RYML_CB_ALLOC_HINT((cb), (T), (num), nullptr)
#define _RYML_CB_ALLOC(cb, T, num) _RYML_CB_ALLOC_HINT((cb), T, (num), nullptr)
#define _RYML_CB_FREE(cb, buf, T, num) \
do { \
(cb).m_free((buf), (num) * sizeof(T), (cb).m_user_data); \
Expand Down
61 changes: 29 additions & 32 deletions src/c4/yml/emit.def.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -29,28 +29,13 @@ substr Emitter<Writer>::emit_as(EmitType_e type, Tree const& t, id_type id, bool
if(type == EMIT_YAML)
_emit_yaml(id);
else if(type == EMIT_JSON)
_do_visit_json(id);
_do_visit_json(id, 0);
else
_RYML_CB_ERR(m_tree->callbacks(), "unknown emit type");
m_tree = nullptr;
return this->Writer::_get(error_on_excess);
}

template<class Writer>
substr Emitter<Writer>::emit_as(EmitType_e type, Tree const& t, bool error_on_excess)
{
if(t.empty())
return {};
return this->emit_as(type, t, t.root_id(), error_on_excess);
}

template<class Writer>
substr Emitter<Writer>::emit_as(EmitType_e type, ConstNodeRef const& n, bool error_on_excess)
{
_RYML_CB_CHECK(n.tree()->callbacks(), n.readable());
return this->emit_as(type, *n.tree(), n.id(), error_on_excess);
}


//-----------------------------------------------------------------------------

Expand Down Expand Up @@ -82,7 +67,7 @@ void Emitter<Writer>::_emit_yaml(id_type id)
this->Writer::_do_write(":\n");
++ilevel;
}
_do_visit_block_container(id, ilevel, ilevel);
_do_visit_block_container(id, 0, ilevel, ilevel);
return;
}
}
Expand Down Expand Up @@ -256,13 +241,15 @@ void Emitter<Writer>::_write_doc(id_type id)
}

template<class Writer>
void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type ilevel)
void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type depth, id_type ilevel)
{
const bool prev_flow = m_flow;
m_flow = true;
RYML_ASSERT(!m_tree->is_stream(node));
RYML_ASSERT(m_tree->is_container(node) || m_tree->is_doc(node));
RYML_ASSERT(m_tree->is_root(node) || (m_tree->parent_is_map(node) || m_tree->parent_is_seq(node)));
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");

if(m_tree->is_doc(node))
{
Expand Down Expand Up @@ -345,7 +332,7 @@ void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type ilevel)
else
{
// with single-line flow, we can never go back to block
_do_visit_flow_sl(child, ilevel + 1);
_do_visit_flow_sl(child, depth + 1, ilevel + 1);
}
}

Expand All @@ -363,19 +350,25 @@ void Emitter<Writer>::_do_visit_flow_sl(id_type node, id_type ilevel)
C4_SUPPRESS_WARNING_MSVC_WITH_PUSH(4702) // unreachable error, triggered by flow_ml not implemented

template<class Writer>
void Emitter<Writer>::_do_visit_flow_ml(id_type id, id_type ilevel, id_type do_indent)
void Emitter<Writer>::_do_visit_flow_ml(id_type id, id_type depth, id_type ilevel, id_type do_indent)
{
C4_UNUSED(id);
C4_UNUSED(depth);
C4_UNUSED(ilevel);
C4_UNUSED(do_indent);
c4::yml::error("not implemented");
#ifdef THIS_IS_A_WORK_IN_PROGRESS
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
const bool prev_flow = m_flow;
m_flow = true;
c4::yml::error("not implemented");
// do it...
m_flow = prev_flow;
#endif
}

template<class Writer>
void Emitter<Writer>::_do_visit_block_container(id_type node, id_type level, bool do_indent)
void Emitter<Writer>::_do_visit_block_container(id_type node, id_type depth, id_type level, bool do_indent)
{
if(m_tree->is_seq(node))
{
Expand All @@ -397,19 +390,19 @@ void Emitter<Writer>::_do_visit_block_container(id_type node, id_type level, boo
{
_indent(level, do_indent);
this->Writer::_do_write("- ");
_do_visit_flow_sl(child, 0u);
_do_visit_flow_sl(child, depth+1, 0u);
this->Writer::_do_write('\n');
}
else if(ty.is_flow_ml())
{
_indent(level, do_indent);
this->Writer::_do_write("- ");
_do_visit_flow_ml(child, 0u, do_indent);
_do_visit_flow_ml(child, depth+1, 0u, do_indent);
this->Writer::_do_write('\n');
}
else
{
_do_visit_block(child, level, do_indent); // same level
_do_visit_block(child, depth+1, level, do_indent); // same indentation level
}
}
do_indent = true;
Expand All @@ -436,18 +429,18 @@ void Emitter<Writer>::_do_visit_block_container(id_type node, id_type level, boo
if(ty.is_flow_sl())
{
_indent(level, do_indent);
_do_visit_flow_sl(ich, 0u);
_do_visit_flow_sl(ich, depth+1, 0u);
this->Writer::_do_write('\n');
}
else if(ty.is_flow_ml())
{
_indent(level, do_indent);
_do_visit_flow_ml(ich, 0u);
_do_visit_flow_ml(ich, depth+1, 0u);
this->Writer::_do_write('\n');
}
else
{
_do_visit_block(ich, level, do_indent); // same level!
_do_visit_block(ich, depth+1, level, do_indent); // same level!
}
} // keyval vs container
do_indent = true;
Expand All @@ -456,11 +449,13 @@ void Emitter<Writer>::_do_visit_block_container(id_type node, id_type level, boo
}

template<class Writer>
void Emitter<Writer>::_do_visit_block(id_type node, id_type ilevel, id_type do_indent)
void Emitter<Writer>::_do_visit_block(id_type node, id_type depth, id_type ilevel, id_type do_indent)
{
RYML_ASSERT(!m_tree->is_stream(node));
RYML_ASSERT(m_tree->is_container(node) || m_tree->is_doc(node));
RYML_ASSERT(m_tree->is_root(node) || (m_tree->parent_is_map(node) || m_tree->parent_is_seq(node)));
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
if(m_tree->is_doc(node))
{
_write_doc(node);
Expand Down Expand Up @@ -537,16 +532,18 @@ void Emitter<Writer>::_do_visit_block(id_type node, id_type ilevel, id_type do_i
if(m_tree->is_root(node) || m_tree->is_doc(node))
next_level = ilevel; // do not indent at top level

_do_visit_block_container(node, next_level, do_indent);
_do_visit_block_container(node, depth, next_level, do_indent);
}

C4_SUPPRESS_WARNING_MSVC_POP


template<class Writer>
void Emitter<Writer>::_do_visit_json(id_type id)
void Emitter<Writer>::_do_visit_json(id_type id, id_type depth)
{
_RYML_CB_CHECK(m_tree->callbacks(), !m_tree->is_stream(id)); // JSON does not have streams
if(C4_UNLIKELY(depth > m_opts.max_depth()))
_RYML_CB_ERR(m_tree->callbacks(), "max depth exceeded");
if(m_tree->is_keyval(id))
{
_writek_json(id);
Expand Down Expand Up @@ -574,7 +571,7 @@ void Emitter<Writer>::_do_visit_json(id_type id)
{
if(ich != m_tree->first_child(id))
this->Writer::_do_write(',');
_do_visit_json(ich);
_do_visit_json(ich, depth+1);
}

if(m_tree->is_seq(id))
Expand Down
Loading

0 comments on commit 6c19ea9

Please sign in to comment.