Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[css3] Grammar very slow due to numerous ambiguities #4256

Open
kaby76 opened this issue Sep 23, 2024 · 0 comments
Open

[css3] Grammar very slow due to numerous ambiguities #4256

kaby76 opened this issue Sep 23, 2024 · 0 comments

Comments

@kaby76
Copy link
Contributor

kaby76 commented Sep 23, 2024

See https://groups.google.com/g/antlr-discussion/c/8KQC7fwcXpQ

Using the latest ANTLR 4.13.2, with the official example repository's CSS3 grammar (https://github.com/antlr/grammars-v4/tree/master/css3), I generated a C++ parser to parse a simple CSS file of about 200 lines. Without adding any custom listener logic, on an M1 MacBook Pro, it takes 1 second for the first run! I'll consider that as a warm-up, but even after that, it takes 0.14 seconds each time! In contrast, my own homemade approach relying purely on string find only takes 0.002 seconds. Why is there such a performance difference? Is it because I'm using it incorrectly?

As with many of the grammars in the grammars-v4 repo, you never know what to expect.

The problem is that the css3 grammar is ambiguous. To debug the grammar, you can use "trperf" for a quick explanation of the ambiguity or "trparse --ambig" for a slow but detailed explanation of the ambiguity. These tools are from the Trash toolkit. https://github.com/kaby76/Trash. For "trparse --ambig", I had to pare down the input to a few hundred lines, otherwise it would take too long.

To name a few ambiguities, we have Decisions 55 in "ruleSet", 60 in "declaration", 98 in "fontFaceDeclaration", and 112 in "ws".

"ws" is particularly egregious because it is the most time-consuming expression. Using "trparse --ambig", and Bash diff, I found these rules to be a problem.

expr
: term (operator_? term)*
;

term
: number ws # knownTerm
| percentage ws # knownTerm
| dimension ws # knownTerm
| String_ ws # knownTerm
| UnicodeRange ws # knownTerm
| ident ws # knownTerm
| var_ # knownTerm
| url ws # knownTerm
| hexcolor # knownTerm
| calc # knownTerm
| function_ # knownTerm
| unknownDimension ws # unknownTerm
| dxImageTransform # badTerm
;

operator_
: '/' ws # goodOperator
| Comma ws # goodOperator
| Space ws # goodOperator
| '=' ws # badOperator // IE filter and DXImageTransform function
;

For partial input

abbr[title] {
border-bottom: 1px dotted;
}

we see two parses for "1px",

examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws (Space " "))) (term (ident (Ident "dotted")) (ws)))
examples/xxx.css.112: (expr (term (dimension (Dimension "1px")) (ws)) (operator_ (Space " ") (ws)) (term (ident (Ident "dotted")) (ws)))

(dotnet trparse -- --ambig examples/xxx.css | dotnet trxgrep -- ' //expr[.//Dimension[text()="1px"]]' | dotnet trtree -- -a)

A space should not be an operator!

Ken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant