-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix errors in the tokenizer and tree construction tests. #136
Commits on Jun 26, 2021
-
-- in a comment isn't an error
`<!-- -- -->` is not an error. The middle `--` cause the tokenizer to enter the comment end state. The "anything else" clause appends two `-` to the comment token's data and the current input character is reconsumed in the comment state.
Configuration menu - View commit details
-
Copy full SHA for 5a7149f - Browse repository at this point
Copy the full SHA 5a7149fView commit details -
Starting from the data state, we have 1. `<` switches to the tag open state 2. `!` switches to the markup declaration open state 3. `--` switches to the comment start state 4. `-` (the third one) switches to the comment start dash state 5. `-` (the fourth one) switches to the comment end state 6. `-` (the fifth one) appends `-` to the comment and does not change state 7. `>` emits the comment token and switches to the data state.
Configuration menu - View commit details
-
Copy full SHA for c70ef7d - Browse repository at this point
Copy the full SHA c70ef7dView commit details -
Add DOCTYPE errors (and remove one)
A DOCTYPE token is an error in one of three cases: 1. The token's name is not `html`; 2. The token's public identifier is not missing; 3. The token's system identifier is not missing and the token's system identifier isn't `about:legacy-compat`. This appears to have changed at some point from a much more complex set of conditions.
Configuration menu - View commit details
-
Copy full SHA for 4c2b23a - Browse repository at this point
Copy the full SHA 4c2b23aView commit details -
A `>` after `DOCTYPE` is a missing-doctype-name parse error but it is not also a missing-whitespace-before-doctype-name parse error. Doctypes with a public identifier is a (currently unnamed) parse error.
Configuration menu - View commit details
-
Copy full SHA for 4022182 - Browse repository at this point
Copy the full SHA 4022182View commit details -
Named character references in attributes whose last character is not `;` and for which the next input character is `=` (or ASCII alphanumeric, but this isn't tested here), flushes the code points consumed as a character reference _without_ adding a parse error. Named character references not in attributes whose last character is not `;` are errors, regardless of the following character as noted in the `#new-errors` section but without an entry in `#errors`, the number of errors are wrong. (See html5lib#107). Separately, this adds the missing expected-doctype-but-got-start-tag error.
Configuration menu - View commit details
-
Copy full SHA for 607e334 - Browse repository at this point
Copy the full SHA 607e334View commit details -
<!doctype html><script><!
has only one errorHere are the (abbreviated) steps (intermingling the tree-construction and tokenizer). 1. `<script>` token causes `html` and `head` elements to be inserted and is processed in the "in head" insertion mode 2. `<script>` token switches the tokenizer to the script data state and switches to the "text" insertion mode 3. `<` switches to the script data less-than sign state 4. `!` switches to the script data escape start state and emits `<!` 5. EOF is reconsumed in the script data state 6. EOF emits an EOF token 7. EOF token (in the "text" insertion mode) is a parse error, `<script>` is popped off the stack of open elements, switches back to the "in head" insertion mode and reprocesses the token 8. EOF token (in "in head") triggers the "anything else clause" which pops the `head` element, switches to "after head", inserts a `body` token, switches to "in body", and reprocesses 9. EOF token (in "in body") stops parsing with no error because the stack of open elements contains `html` and `body` Only step 7 adds a parse error.
Configuration menu - View commit details
-
Copy full SHA for 745869f - Browse repository at this point
Copy the full SHA 745869fView commit details -
Include the new error in
#errors
Without it, the number of errors is incorrect.
Configuration menu - View commit details
-
Copy full SHA for e295593 - Browse repository at this point
Copy the full SHA e295593View commit details -
Configuration menu - View commit details
-
Copy full SHA for c9e881e - Browse repository at this point
Copy the full SHA c9e881eView commit details -
The space is not foster parented
``` A<table><tr> B</tr> </em>C</table> ``` The second space isn't foster parented. The `</tr>` triggers the "anything else" clause of the in table text insertion mode which causes ` B` to be foster parented. The following space clears the pending table character tokens list and thus the `</em>` inserts the space without foster parenting. This is also clear from the text node `A BC` in the result. A foster parented second space would result in `A B C`.
Configuration menu - View commit details
-
Copy full SHA for 25b393d - Browse repository at this point
Copy the full SHA 25b393dView commit details -
End tag causes two parse errors
``` <math><annotation-xml></svg>x ``` The `</svg>` is parsed in foreign context because 1. The stack of open elements is not empty; 2. The adjusted current node (`annotation-xml` element) is not in the HTML namespace; 3. The adjusted current node is not a MathML text integration point; 4. The adjusted current node is not a MathML text integration point (separate condition from 3); 5. The adjusted current node is a MathML `annotation-xml` element but the token (`</svg>`) is not a start tag; 6. The token is not a start tag (also the `annotation-xml` isn't an HTML integration point because it lacks a particular attribute); 7. The token isn't a character token (also the `annotation-xml` isn't an HTML integration point). Thus the "any other end tag" clause of 12.2.6.5 "The rules for parsing tokens in foreign context" applies. The current node's tag name (`annotation-xml`) is not the same as the tag name of the token (`svg`) so this is a parse error. Since there's no `svg` element in the stack of open elements, the loop in step 3 through 6 of the "any other end tag" clause exits when the `body` element is reached. and the token is processed according to the current insertion mode, "in body." The `</svg>` matches the "any other end tag" of "in body." Since the MathML `annotation-xml` element is not an HTML element but is special which is a second parse error.
Configuration menu - View commit details
-
Copy full SHA for 20620b5 - Browse repository at this point
Copy the full SHA 20620b5View commit details -
These are tricky. A `<math>` tag as the first token in a document fragment whose context node is `td` (or `th`, but this isn't tested here) is fine. One in a document fragment whose context node is `tr`, `thead`, `tbody`, or `tfoot` is a parse error and the elements are foster parented. The table element start tags after the `<mo>` tag are parsed as html and there are no `tr` (for `tr` elements) elements or `thead`, `tbody`, or `tfoot` (for the others) elements in table scope which is a parse error. I just invented some sames for those. The `</table>` tag after the `<mo>` is parsed as foreign but it doesn't match `<mo>` so it's a parse error. Finally, the EOF occurs while a bunch of elements are open which is a parse error.
Configuration menu - View commit details
-
Copy full SHA for 5ef4bd4 - Browse repository at this point
Copy the full SHA 5ef4bd4View commit details -
The `</td>` is parsed in the "in cell" insertion mode and is an error because it doesn't match the open `span` element. Each of the three characters in `Foo` is reparented and is an error.
Configuration menu - View commit details
-
Copy full SHA for 45d2577 - Browse repository at this point
Copy the full SHA 45d2577View commit details -
Configuration menu - View commit details
-
Copy full SHA for dd97d25 - Browse repository at this point
Copy the full SHA dd97d25View commit details -
Configuration menu - View commit details
-
Copy full SHA for cc5b122 - Browse repository at this point
Copy the full SHA cc5b122View commit details -
Configuration menu - View commit details
-
Copy full SHA for e599f25 - Browse repository at this point
Copy the full SHA e599f25View commit details -
If the `#errors` section should have the same number of lines as errors (see html5lib#107), then the NULL-character errors need to be accounted for.
Configuration menu - View commit details
-
Copy full SHA for 5e88d5c - Browse repository at this point
Copy the full SHA 5e88d5cView commit details -
Configuration menu - View commit details
-
Copy full SHA for be61df3 - Browse repository at this point
Copy the full SHA be61df3View commit details -
Also fix up the `#new-errors` section to make the line and column numbers match the others.
Configuration menu - View commit details
-
Copy full SHA for 05319d0 - Browse repository at this point
Copy the full SHA 05319d0View commit details -
These all appear to be duplicated errors (or more like an explanation of the error).
Configuration menu - View commit details
-
Copy full SHA for b6520ef - Browse repository at this point
Copy the full SHA b6520efView commit details -
Each character in the table (caused by the `<col>`) gets reparented and causes an error. The second `<a>` gets reparented but there's already one open, so that's a second error but the open one is not in scope so that's a third error.
Configuration menu - View commit details
-
Copy full SHA for fa35532 - Browse repository at this point
Copy the full SHA fa35532View commit details -
Fix EOF error line/columns, add to #errors
Fix the line and column numbers. Currently, the number of errors in the `#errors` section of the test is the correct number of errors. Counting the ones in `#new-errors` breaks hundreds of tests because they contain an equivalent error in `#errors`. So this adds the eof in comment error to `#errors` as well.
Configuration menu - View commit details
-
Copy full SHA for 65e3699 - Browse repository at this point
Copy the full SHA 65e3699View commit details -
Adds the other spec-mandated errors. When an error is caused by a tag, the line and column numbers are the line and column the tag starts in.
Configuration menu - View commit details
-
Copy full SHA for 26f2b8a - Browse repository at this point
Copy the full SHA 26f2b8aView commit details -
This error line appears to have been duplicated by mistake.
Configuration menu - View commit details
-
Copy full SHA for 535e74b - Browse repository at this point
Copy the full SHA 535e74bView commit details