Java: Improving error recovery by doing a look-ahead match counting? #4530

MarkARussell · 2024-02-14T19:11:37Z

I'm trying to improve the error recovery handling for my particular use case. One of the things I'm attempting is to figure out what the "best" recovery action would be at any given time by calculating the number of tokens that would match if I took a particular recovery action. For example, I calculate the number of matches I might achieve if I inserted a "missing" token, deleted an "extraneous" token, or substituted an "invalid" token. Using these counts, I then select the action with the highest match count as the recovery action to perform.

My current code recursively evaluates ATNState#getTransitions(), similar to the limited token look-ahead that DefaultErrorStrategy uses. While it mostly works, I don't think I have it 100% correct; I suspect something is off in how it handles epsilon and rule transitions.

At the moment I'm only using this logic in the recoverInline() function of my custom error strategy. However, I'd like to be able to do something similar in sync(), which is a much more complex case. Due to the current shortcomings of the code, I haven't yet tried integrating it with sync().

Code:

    int getBestMatchCount(
        int tokenIndex, ATNState stateToEval, int currentDepth,
        ATN atn, ParserRuleContext ruleContext, TokenStream tokenStream) {

        int tokenType = tokenStream.LA(tokenIndex);
        IntervalSet expectingAtState = atn.nextTokens(stateToEval, ruleContext);
        if (!expectingAtState.contains(tokenType)) {
            return 0; // Token cannot match from this state, so it adds nothing to the count
        }

        // Increment the token index (so when we recurse we eval the next token)
        int matchCount = 1;
        tokenIndex++;

        // If we haven't hit max depth, loop through each transition available from our current state and recurse
        if (currentDepth <= RECOVERY_EVAL_MAX_DEPTH && tokenType != Token.EOF) {
            currentDepth++; // Increment current depth so we have an ending

            var bestCount = 0;
            for (var transition : stateToEval.getTransitions()) {
                int transitionMatchCount = getBestMatchCount(
                        tokenIndex, transition.target, currentDepth, atn, ruleContext, tokenStream);
                bestCount = Math.max(bestCount, transitionMatchCount);
            }
            matchCount += bestCount;
        }
        return matchCount;
    }
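
One possible source of the miscount around epsilon transitions: the loop above advances the token index for every transition it follows, including transitions that consume no input. The following is a minimal, self-contained sketch of the counting idea on a toy state graph, not on the ANTLR runtime. `State`, `Edge`, `EPSILON`, `budget`, and `sampleGraph()` are hypothetical names invented for illustration. The only point it makes is that an epsilon edge moves to its target without counting or consuming a token.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a state graph; none of these types come from the ANTLR runtime. */
class LookaheadSketch {
    /** Arbitrary stand-in label for an edge that consumes no token. */
    static final int EPSILON = -2;

    /** One edge: either consumes a token of type {@code label} or is an epsilon edge. */
    static final class Edge {
        final int label;
        final State target;
        Edge(int label, State target) { this.label = label; this.target = target; }
    }

    static final class State {
        final List<Edge> edges = new ArrayList<>();
        void edge(int label, State target) { edges.add(new Edge(label, target)); }
    }

    /**
     * Longest run of tokens, starting at tokens[pos], that can be matched from state.
     * budget caps the number of edges followed so that cycles terminate.
     */
    static int bestMatchCount(State state, int[] tokens, int pos, int budget) {
        if (budget <= 0 || pos >= tokens.length) {
            return 0;
        }
        int best = 0;
        for (Edge e : state.edges) {
            if (e.label == EPSILON) {
                // Epsilon edge: move on without counting or consuming a token.
                best = Math.max(best, bestMatchCount(e.target, tokens, pos, budget - 1));
            } else if (e.label == tokens[pos]) {
                // Matching edge: count one token and advance the input position.
                best = Math.max(best, 1 + bestMatchCount(e.target, tokens, pos + 1, budget - 1));
            }
        }
        return best;
    }

    /** Sample graph: s0 -eps-> s1 -1-> s2 -eps-> s3 -2-> s4. */
    static State sampleGraph() {
        State s0 = new State(), s1 = new State(), s2 = new State();
        State s3 = new State(), s4 = new State();
        s0.edge(EPSILON, s1);
        s1.edge(1, s2);
        s2.edge(EPSILON, s3);
        s3.edge(2, s4);
        return s0;
    }
}
```

For the sample graph and the token types 1 then 2, `bestMatchCount(sampleGraph(), new int[]{1, 2}, 0, 16)` counts both tokens even though two epsilon edges sit between the matching edges. A version that advanced `pos` on every edge would stop after the first token.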

Two Questions:

Are there any APIs in the ANTLR Java runtime that would make calculations of this sort easier or more straightforward?
Is it even possible to do something like this in sync()? I'm not clear on how I would insert a missing/invalid token there.

In case it's relevant, my use case is parsing textual aviation notices, specifically AIRMETs. These tend to be short, with a fairly well-defined format, and there is zero chance of getting any errors fixed at the source, as they are produced by various government agencies around the world. Further, I fully realize that what I suggest above has a performance cost, but given the shortness and relative frequency, I'm pretty sure the cost will be manageable.

Example bulletin:

    WAIY32 LIIB 190526
    LIRR AIRMET 5 VALID 190530/190830 LIIB-
    LIRR ROMA FIR MOD TURB FCST ENTIRE FIR SFC/FL150 STNR NC

Thanks for any assistance you can provide.

jimidle · 2024-02-15T17:12:01Z

I am not sure if this is specifically to do with ANTLR5?

However, a good way to get your head around this is to not override the recovery strategy, then debug what happens in the runtime "as is" for various scenarios. You can also see that there are other strategies you can install. You should also be able to see how the follow sets are used and stacked. That came out of me not wanting to drop out of an inner repeating rule because of a single missing or extra token in the JavaFX parser in v3. Ter then put that into v4 with supporting methods.

Then try other strategies and your own custom strategies. Following in the debugger is a great way to see what methods the existing strategies use to support their actions.

It looks like a custom strategy for AIRMET is a doable thing. Don't worry about performance in error recovery - that's already out the window by that point anyway :)

1 reply

MarkARussell Feb 15, 2024
Author

Yeah, this doesn't have anything to do with ANTLR5. I originally created it as an ANTLR4 Q&A, but I think someone may have moved it thinking I was looking for a code change or enhancement to ANTLR.

I did start with the DefaultErrorStrategy and ran into a number of cases where inserting a "missing" token was more appropriate than the delete action it was performing, and other cases where substituting an error token was a better solution than either the missing or delete actions. That's what started me down this path of scanning ahead to see which recovery approach was "best".

When you talk about follow sets, are you talking about the followState on RuleTransition, as used within DefaultErrorStrategy#getErrorRecoverySet() for building the recovery set?

As to other ErrorStrategy implementations, the only other one I'm aware of is BailErrorStrategy, and that one is useless for my use case, as I need to maximize the content I can extract from an AIRMET even in the face of errors. Are there other ErrorStrategy implementations somewhere that I'm not aware of? (I didn't see any others in the ANTLR Java runtime.)

Last, do you know how I might insert a missing token if/when I detect one in the sync() function? Is it as simple as _ctx.addErrorNode(createErrorNode(_ctx,t))? (At least that's what it looks like the Parser is doing.)
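
To make the comparison concrete, here is a minimal sketch of the scoring idea, assuming a purely linear list of expected token types rather than a real ATN. `RecoveryChoiceSketch`, `Action`, `matchRun`, and `chooseAction` are hypothetical names for illustration and are not part of the ANTLR runtime. Ties are broken in the order insert, delete, substitute, which is an arbitrary choice.

```java
import java.util.List;

/** Hypothetical sketch: pick a recovery action by counting how many tokens match afterwards. */
class RecoveryChoiceSketch {
    enum Action { INSERT_MISSING, DELETE_EXTRANEOUS, SUBSTITUTE_INVALID }

    /** Number of consecutive positions where expected (from e) equals input (from i). */
    static int matchRun(List<String> expected, int e, List<String> input, int i) {
        int n = 0;
        while (e + n < expected.size() && i + n < input.size()
                && expected.get(e + n).equals(input.get(i + n))) {
            n++;
        }
        return n;
    }

    /** Called at a mismatch: expected.get(e) was wanted but input.get(i) was seen. */
    static Action chooseAction(List<String> expected, int e, List<String> input, int i) {
        // Insert: pretend the expected token was present; re-examine the current input token.
        int insert = matchRun(expected, e + 1, input, i);
        // Delete: skip the current input token; the expectation stays the same.
        int delete = matchRun(expected, e, input, i + 1);
        // Substitute: treat the current input token as if it were the expected one.
        int substitute = matchRun(expected, e + 1, input, i + 1);

        Action best = Action.INSERT_MISSING;
        int bestScore = insert;
        if (delete > bestScore) {
            best = Action.DELETE_EXTRANEOUS;
            bestScore = delete;
        }
        if (substitute > bestScore) {
            best = Action.SUBSTITUTE_INVALID;
        }
        return best;
    }
}
```

For example, with the expected types `FIR, AIRMET, NUM, VALID, PERIOD` and the input `FIR, NUM, VALID, PERIOD`, the mismatch occurs at index 1 in both lists. Inserting the missing `AIRMET` lets three further tokens match, while deleting or substituting lets none match, so `chooseAction` returns `INSERT_MISSING`.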

ericvergnaud · 2024-02-16T16:38:33Z

Maintainer

Moved back to antlr4 as requested, thanks for clarifying.

0 replies