The current iteration of the analyzer's parser is a hand-written, recursive descent parser that "recovers" on errors instead of failing at the very first one it encounters, which is a different approach from the one taken by both the mainline compiler and the PureScript-based parser. Recovery allows the parser to extract as much information as possible from the source file, so that the semantic analyzer core has as much surface area as possible to work with.
One of the challenges we're seeing in the parser currently is loop safety, specifically for nodes that require repeated parsing of elements. For instance, as of a9cb011, the following declaration fails to parse with an infinite loop error (explanation):
b | if 0 else 1 then 2 = 0
For a moment, let's take a step back and think about what the parser does:
It consumes atoms (i.e. tokens)
It can build nodes (i.e. groups of tokens and nodes)
For unexpected tokens, it can yield an Error node which consumes the incorrect token and marks it with an error message saying what was expected. This is effectively recovery at the atom level (see the sketch after this list).
For nodes, we currently don't have a concrete solution. I'll be talking more about possible ones in the next section.
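To make the atom-level recovery above concrete, here is a minimal sketch in Rust. It assumes a flat token stream and a made-up `expect` helper; the names and types are illustrative, not the analyzer's actual API.

```rust
// A minimal sketch of atom-level recovery, assuming a flat token stream.
// All names here are hypothetical, not the analyzer's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    If,
    Then,
    Else,
    Int,
    Eof,
}

#[derive(Debug)]
enum Node {
    Token(TokenKind),
    // An Error node consumes the unexpected token and records what was
    // expected, so the parse can continue instead of aborting.
    Error { got: TokenKind, expected: &'static str },
}

struct Parser {
    tokens: Vec<TokenKind>,
    pos: usize,
    output: Vec<Node>,
}

impl Parser {
    fn peek(&self) -> TokenKind {
        self.tokens.get(self.pos).copied().unwrap_or(TokenKind::Eof)
    }

    fn bump(&mut self) -> TokenKind {
        let token = self.peek();
        self.pos += 1;
        token
    }

    // Consume the expected token, or consume whatever is there and emit
    // an Error node in its place.
    fn expect(&mut self, kind: TokenKind, expected: &'static str) {
        if self.peek() == kind {
            let token = self.bump();
            self.output.push(Node::Token(token));
        } else {
            let got = self.bump();
            self.output.push(Node::Error { got, expected });
        }
    }
}

fn main() {
    // `if 0 else ...`: the parser expects `then` but finds `else`.
    let mut parser = Parser {
        tokens: vec![TokenKind::If, TokenKind::Int, TokenKind::Else],
        pos: 0,
        output: Vec::new(),
    };
    parser.expect(TokenKind::If, "`if`");
    parser.expect(TokenKind::Int, "an expression");
    parser.expect(TokenKind::Then, "`then`"); // yields an Error node
    println!("{:#?}", parser.output);
}
```

The key point is that the unexpected token is consumed and recorded rather than aborting the parse, so the parser always moves forward by at least one atom.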
Nodes can be further subdivided into two kinds:
Composite nodes
Container nodes
Every container node is a composite node, but not every composite node is a container node. More specifically, composite nodes are collections of tokens and further nodes, while container nodes have the additional property of having an unbounded (read: bounded by RAM) number of children.
IfThenElse is an example of a composite node: it contains the tokens for the keywords, as well as the accompanying nodes for the expression.
Import/export lists, declaration lists, variable lists, array expressions, record expressions, etc. are examples of container nodes. They all involve repeated parsing of nodes until a condition is met, which is usually when a "terminating" token is seen (e.g. ] for arrays).
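The distinction can be pictured with a small sketch (hypothetical types, not the analyzer's real syntax tree): a composite node like IfThenElse has a fixed shape, while a container node like an array expression is built by a loop that keeps parsing children until the terminating token appears.

```rust
// A minimal sketch of the two node kinds; hypothetical types, not the
// analyzer's real syntax tree.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token { LBracket, RBracket, Comma, Int(i64), Eof }

#[derive(Debug)]
#[allow(dead_code)]
enum Expr {
    Int(i64),
    // Composite node: a fixed shape with one slot per keyword/child.
    IfThenElse { cond: Box<Expr>, then_branch: Box<Expr>, else_branch: Box<Expr> },
    // Container node: an unbounded number of children, parsed in a loop
    // that stops when the terminating token (`]`) is seen.
    Array(Vec<Expr>),
}

// Parse the items of `[ 1, 2 ]`: keep going until `]` (or end of input).
fn parse_array(tokens: &[Token], mut pos: usize) -> (Expr, usize) {
    assert_eq!(tokens[pos], Token::LBracket);
    pos += 1;
    let mut items = Vec::new();
    while tokens[pos] != Token::RBracket && tokens[pos] != Token::Eof {
        if let Token::Int(n) = tokens[pos] {
            items.push(Expr::Int(n));
        }
        pos += 1; // steps over items and commas alike in this toy version
    }
    (Expr::Array(items), pos + 1)
}

fn main() {
    let tokens = [
        Token::LBracket, Token::Int(1), Token::Comma, Token::Int(2),
        Token::RBracket, Token::Eof,
    ];
    println!("{:?}", parse_array(&tokens, 0));
}
```

In a real parser the loop body is "parse one item", and if that sub-parse can return without consuming anything, the loop never advances; that is exactly the loop-safety problem described next.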
One of the problems we currently have with container nodes is how they handle "recovery" when items fail to parse. The example I mentioned earlier demonstrates this: while parsing the list of guarded expressions, the parser fails to parse the ternary, which leaves it stuck on a single token that it never consumes.
An ideal approach to recovery here would involve "attempting" to parse the item. If the attempt succeeds, then no further action is needed; if the attempt fails, then the parser consumes tokens until the next nearest position where it can confidently begin parsing again.
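Below is a sketch of what that could look like, again with hypothetical names: the container loop only attempts an item when the current token can start one, and otherwise skips tokens (wrapping them in an error node) until it reaches a token that can either start an item or terminate the container. Every iteration then either consumes at least one token or exits, which rules out the infinite loop.

```rust
// A sketch of item-level recovery in a container loop; the names and
// token kinds are illustrative, not the analyzer's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token { Int(i64), Comma, RBracket, Junk, Eof }

#[derive(Debug)]
enum Node {
    Int(i64),
    // Skipped tokens are wrapped in an error node so the tree still
    // covers the whole source and later items can be parsed normally.
    Error(Vec<Token>),
    Array(Vec<Node>),
}

fn starts_item(t: Token) -> bool {
    matches!(t, Token::Int(_))
}

fn is_terminator(t: Token) -> bool {
    matches!(t, Token::RBracket | Token::Eof)
}

fn parse_array_items(tokens: &[Token], mut pos: usize) -> (Node, usize) {
    let mut items = Vec::new();
    while !is_terminator(tokens[pos]) {
        if starts_item(tokens[pos]) {
            // The "attempt" succeeds: parse the item as usual.
            if let Token::Int(n) = tokens[pos] {
                items.push(Node::Int(n));
            }
            pos += 1;
            if tokens[pos] == Token::Comma {
                pos += 1;
            }
        } else {
            // The "attempt" fails: consume tokens until the next position
            // where an item or the terminator can confidently begin.
            // This always makes progress, so the loop cannot get stuck.
            let mut skipped = Vec::new();
            while !starts_item(tokens[pos]) && !is_terminator(tokens[pos]) {
                skipped.push(tokens[pos]);
                pos += 1;
            }
            items.push(Node::Error(skipped));
        }
    }
    (Node::Array(items), pos)
}

fn main() {
    // `[ 1, <junk>, 2 ]` with the brackets already consumed.
    let tokens = [
        Token::Int(1), Token::Comma, Token::Junk, Token::Junk,
        Token::Int(2), Token::RBracket, Token::Eof,
    ];
    println!("{:?}", parse_array_items(&tokens, 0));
}
```

The `starts_item`/`is_terminator` predicates play the role of a per-container "recovery set": they are the positions where the parser can confidently begin parsing again.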
In summary:
The parser has the mechanisms to perform recovery for unexpected tokens and march on with parsing.
Container nodes are currently parsed aggressively without recovery for nodes.
A "safe" parser would involve some degree of backtracking and recovery for items in container nodes.
Great write up! I think the elm parser does something similar - it jumps to the next "start of line" expression and carries on from there. So:
a = [ 1, 2, 3, 4 ] ] ] ] ]
^ - error, we recover by skipping until we find another top-level declaration
# another function
>>= and stuff
b = 1
^ - Great! Now we know we're at the start of a definition so we can continue from here.
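A rough sketch of that strategy (hypothetical, and not the actual Elm parser implementation): on an error, skip tokens until one appears at column zero, which is where the next top-level declaration can confidently begin, and resume parsing there.

```rust
// A sketch of "skip to the next top-level declaration" recovery; this is
// an illustration of the idea above, not the actual Elm parser.
#[derive(Debug, Clone)]
struct Token {
    text: String,
    column: usize, // column 0 marks the start of a top-level declaration
}

// After an error at `pos`, advance until a token that sits at column 0,
// which is where the next definition can confidently begin.
fn recover_to_top_level(tokens: &[Token], mut pos: usize) -> usize {
    pos += 1; // always consume at least the offending token
    while pos < tokens.len() && tokens[pos].column != 0 {
        pos += 1;
    }
    pos
}

fn main() {
    let tokens = vec![
        Token { text: "]".into(), column: 19 },  // stray bracket: error here
        Token { text: ">>=".into(), column: 4 }, // still inside the bad decl
        Token { text: "b".into(), column: 0 },   // next top-level definition
        Token { text: "=".into(), column: 2 },
        Token { text: "1".into(), column: 4 },
    ];
    let resume = recover_to_top_level(&tokens, 0);
    println!("resume at token {resume}: {:?}", tokens[resume].text);
}
```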