From 5a7149f804288bbcc94bf38bb67750375e71030d Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Mon, 1 Oct 2018 14:20:28 -0400
Subject: [PATCH 01/23] -- in a comment isn't an error

`` is not an error. The middle `--` causes the tokenizer to enter the
comment end state. The comment end state's "anything else" clause then
appends two `-` to the comment token's data, and the current input
character is reconsumed in the comment state.
---
 tree-construction/comments01.dat    | 8 --------
 tree-construction/html5test-com.dat | 1 -
 2 files changed, 9 deletions(-)

diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index fa79c2b1..e0619028 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -57,7 +57,6 @@ FOOBAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
-(1,24): unexpected-char-in-comment
 #document
 |
 |
@@ -86,8 +83,6 @@ FOOBAZ
 FOOBAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,15): unexpected-char-in-comment
-(1,24): unexpected-char-in-comment
 (1,31): unexpected-bang-after-double-dash-in-comment
 #new-errors
 (1:32) incorrectly-closed-comment
@@ -103,9 +98,6 @@ FOOBAZ
 FOO
 #errors
-(1,10): unexpected-char-in-comment
 (1,15): expected-doctype-but-got-eof
 #document
 |
From c70ef7db74634c2c6433287dcbcaf2eee3bf1fd8 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Mon, 1 Oct 2018 14:09:43 -0400
Subject: [PATCH 02/23] is not an error

Starting from the data state, we have

1. `<` switches to the tag open state.
2. `!` switches to the markup declaration open state.
3. `--` switches to the comment start state.
4. `-` (the third one) switches to the comment start dash state.
5. `-` (the fourth one) switches to the comment end state.
6. `-` (the fifth one) appends `-` to the comment and does not change state.
7. `>` emits the comment token and switches to the data state.
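
A minimal Python sketch of the comment states walked through above, written from the WHATWG tokenizer rules. `tokenize_comment` is a hypothetical helper and not html5lib API. It makes several simplifying assumptions:

- the input starts with `<!--`;
- `<` is treated as an ordinary character, so the comment less-than sign states are skipped and nested-comment is never reported;
- U+0000 is not handled specially.

It returns the comment data together with any parse-error codes.

```python
def tokenize_comment(text: str) -> tuple[str, list[str]]:
    """Return (comment data, parse-error codes) for a comment starting at text[0].

    Sketch only: assumes text starts with "<!--", treats "<" as an ordinary
    character (no comment less-than sign states, so no nested-comment error),
    ignores U+0000, and stops at the first emitted comment.
    """
    if not text.startswith("<!--"):
        raise ValueError("sketch only handles input starting with '<!--'")
    data: list[str] = []
    errors: list[str] = []
    state = "comment start"  # reached via tag open -> markup declaration open
    i = 4  # index of the first character after "<!--"

    def done() -> tuple[str, list[str]]:
        return "".join(data), errors

    while True:
        c = text[i] if i < len(text) else None  # None stands for EOF
        if state == "comment start":
            if c == "-":
                state = "comment start dash"
            elif c == ">":
                errors.append("abrupt-closing-of-empty-comment")
                return done()
            else:
                state = "comment"
                continue  # reconsume in the comment state
        elif state == "comment start dash":
            if c == "-":
                state = "comment end"
            elif c == ">":
                errors.append("abrupt-closing-of-empty-comment")
                return done()
            elif c is None:
                errors.append("eof-in-comment")
                return done()
            else:
                data.append("-")
                state = "comment"
                continue
        elif state == "comment":
            if c == "-":
                state = "comment end dash"
            elif c is None:
                errors.append("eof-in-comment")
                return done()
            else:
                data.append(c)
        elif state == "comment end dash":
            if c == "-":
                state = "comment end"
            elif c is None:
                errors.append("eof-in-comment")
                return done()
            else:
                data.append("-")
                state = "comment"
                continue
        elif state == "comment end":
            if c == ">":
                return done()  # emit the comment; back to the data state
            elif c == "!":
                state = "comment end bang"
            elif c == "-":
                data.append("-")  # extra dash: append it, stay in this state
            elif c is None:
                errors.append("eof-in-comment")
                return done()
            else:
                data.append("--")  # "anything else": no parse error
                state = "comment"
                continue
        elif state == "comment end bang":
            if c == "-":
                data.append("--!")
                state = "comment end dash"
            elif c == ">":
                errors.append("incorrectly-closed-comment")
                return done()
            elif c is None:
                errors.append("eof-in-comment")
                return done()
            else:
                data.append("--!")
                state = "comment"
                continue
        i += 1
```

With this sketch, the five-dash input `<!----->` yields the comment data `-` with no errors, and `tokenize_comment("<!-- a -- b -->")` returns `(" a -- b ", [])`: the middle `--` is simply kept in the data.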
---
 tree-construction/comments01.dat | 1 -
 tree-construction/tests1.dat     | 1 -
 2 files changed, 2 deletions(-)

diff --git a/tree-construction/comments01.dat b/tree-construction/comments01.dat
index e0619028..cb508b01 100644
--- a/tree-construction/comments01.dat
+++ b/tree-construction/comments01.dat
@@ -194,7 +194,6 @@ FOOBAZ
 FOOBAZ
 #errors
 (1,3): expected-doctype-but-got-chars
-(1,10): unexpected-dash-after-double-dash-in-comment
 #document
 |
 |
diff --git a/tree-construction/tests1.dat b/tree-construction/tests1.dat
index 1c36c1b8..86632deb 100644
--- a/tree-construction/tests1.dat
+++ b/tree-construction/tests1.dat
@@ -425,7 +425,6 @@ Line1
Line2
Line3
Line4
 #data
 helloexcite!me!
 #errors
-(1,7): unexpected-dash-after-double-dash-in-comment
 (1,14): expected-doctype-but-got-start-tag
 (1,41): unexpected-start-tag-implies-table-voodoo
 (1,48): foster-parenting-character-in-table
From 4c2b23a83fb7cc0e88eca34d6d401885b0fc8e66 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Wed, 3 Oct 2018 02:18:50 -0400
Subject: [PATCH 03/23] Add DOCTYPE errors (and remove one)

A DOCTYPE token is a parse error in any of three cases:

1. The token's name is not `html`;
2. The token's public identifier is not missing; or
3. The token's system identifier is neither missing nor
   `about:legacy-compat`.

This appears to have changed at some point from a much more complex
set of conditions.
---
 tree-construction/doctype01.dat | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tree-construction/doctype01.dat b/tree-construction/doctype01.dat
index c845becf..9efdaf70 100644
--- a/tree-construction/doctype01.dat
+++ b/tree-construction/doctype01.dat
@@ -34,7 +34,6 @@
 #data
 Hello
 #errors
-(1,9): need-space-after-doctype
 (1,10): expected-doctype-name-but-got-right-bracket
 (1,10): unknown-doctype
 #new-errors
@@ -337,6 +336,7 @@
 Hello
 #errors
+(2,43): unknown-doctype
 #document
 |
 |
@@ -421,6 +421,7 @@
 #errors
 (1,50): unexpected-char-in-doctype
+(1,89): unknown-doctype
 #new-errors
 (1:50) missing-whitespace-between-doctype-public-and-system-identifiers
 #document
@@ -433,6 +434,7 @@
 #errors
 (1,50): unexpected-char-in-doctype
+(1,89): unknown-doctype
 #new-errors
 (1:50) missing-whitespace-between-doctype-public-and-system-identifiers
 #document
@@ -446,6 +448,7 @@
 #errors
 (1,21): unexpected-char-in-doctype
 (1,49): unexpected-char-in-doctype
+(1,88): unknown-doctype
 #new-errors
 (1:22) missing-whitespace-after-doctype-public-keyword
 (1:49) missing-whitespace-between-doctype-public-and-system-identifiers
@@ -460,6 +463,7 @@
 #errors
 (1,21): unexpected-char-in-doctype
 (1,49): unexpected-char-in-doctype
+(1,88): unknown-doctype
 #new-errors
 (1:22) missing-whitespace-after-doctype-public-keyword
 (1:49) missing-whitespace-between-doctype-public-and-system-identifiers
From 4022182c33de2ce597297d87fb77c0f9e51a9c3c Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Tue, 2 Oct 2018 15:49:49 -0400
Subject: [PATCH 04/23] Fix errors in doctypes

A `>` immediately after `DOCTYPE` is a missing-doctype-name parse error,
but it is not also a missing-whitespace-before-doctype-name parse error.

A DOCTYPE with a public identifier is a (currently unnamed) parse error.
---
 tree-construction/tests6.dat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tree-construction/tests6.dat b/tree-construction/tests6.dat
index f3991232..8c36dd3d 100644
--- a/tree-construction/tests6.dat
+++ b/tree-construction/tests6.dat
@@ -48,7 +48,6 @@
 #data
 #errors
-(1,9): need-space-after-doctype
 (1,10): expected-doctype-name-but-got-right-bracket
 (1,10): unknown-doctype
 #new-errors
@@ -604,6 +603,7 @@ html
 #data
 #errors
+(1,50): doctype-has-public-identifier
 #document
 |
 |
From 607e334b5188eea743a76fa3cfa70657c9b36f97 Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Mon, 1 Oct 2018 22:02:07 -0400
Subject: [PATCH 05/23] Fix entity errors

In an attribute value, a named character reference whose last matched
character is not `;` and whose next input character is `=` (or an ASCII
alphanumeric, which isn't tested here) flushes the code points consumed
as a character reference _without_ a parse error.

Outside an attribute value, a named character reference whose last
character is not `;` is a parse error regardless of the following
character. The `#new-errors` section already records this, but without a
matching entry in `#errors` the error count is wrong (see
https://github.com/html5lib/html5lib-tests/issues/107).

Separately, this adds the missing expected-doctype-but-got-start-tag error.
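
A minimal Python sketch of the semicolon rule described above. It assumes a four-entry excerpt of the named character reference table. `TABLE` and `decode_named` are hypothetical names. The sketch skips numeric references and does not model the ambiguous ampersand state, so an unmatched `&` is simply copied through.

```python
# Hypothetical four-entry excerpt of the named character reference table.
# The real table has both "AElig;" and the legacy "AElig" (and likewise
# "pound;" and "pound"), which is why these can match without a ";".
TABLE = {
    "AElig;": "\u00c6",
    "AElig": "\u00c6",
    "pound;": "\u00a3",
    "pound": "\u00a3",
}

MISSING_SEMICOLON = "missing-semicolon-after-character-reference"


def decode_named(text: str, in_attribute: bool) -> tuple[str, list[str]]:
    """Decode named references in text; return (output, parse-error codes).

    Sketch only: no numeric references, and an unmatched "&" is copied
    through (the ambiguous ampersand state is not modelled).
    """
    out: list[str] = []
    errors: list[str] = []
    i = 0
    while i < len(text):
        if text[i] != "&":
            out.append(text[i])
            i += 1
            continue
        # Consume the longest table entry that follows the "&".
        match = max(
            (name for name in TABLE if text.startswith(name, i + 1)),
            key=len,
            default=None,
        )
        if match is None:
            out.append("&")
            i += 1
            continue
        end = i + 1 + len(match)
        next_char = text[end] if end < len(text) else None
        historical = (
            in_attribute
            and not match.endswith(";")
            and next_char is not None
            and (next_char == "=" or (next_char.isascii() and next_char.isalnum()))
        )
        if historical:
            # Flush the consumed code points unchanged, with no parse error.
            out.append(text[i:end])
        else:
            if not match.endswith(";"):
                errors.append(MISSING_SEMICOLON)
            out.append(TABLE[match])
        i = end
    return "".join(out), errors
```

In text content, `ZZ&AElig=` decodes to `ZZÆ=` with one missing-semicolon-after-character-reference error. In an attribute value, the same characters stay as `ZZ&AElig=` with no error. A reference that ends in `;` decodes without an error in either context.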
---
 tree-construction/entities02.dat | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tree-construction/entities02.dat b/tree-construction/entities02.dat
index 0c6e898c..74965a35 100644
--- a/tree-construction/entities02.dat
+++ b/tree-construction/entities02.dat
@@ -45,7 +45,6 @@
 #data
 #errors
-(1,15): named-entity-without-semicolon
 (1,20): expected-doctype-but-got-start-tag
 #document
 |
@@ -204,7 +203,6 @@
 #data
 #errors
-(1,18): named-entity-without-semicolon
 (1,23): expected-doctype-but-got-start-tag
 #document
 |
@@ -299,6 +297,8 @@
 #data
ZZÆ=
 #errors
+(1,5): expected-doctype-but-got-start-tag
+(1:14) missing-semicolon-after-character-reference
 #new-errors
 (1:14) missing-semicolon-after-character-reference
 #document
From 745869fc026203512f6856a0cb003d1c4466e88b Mon Sep 17 00:00:00 2001
From: Stephen Checkoway
Date: Mon, 1 Oct 2018 14:52:27 -0400
Subject: [PATCH 06/23] `
please!