Split encoding tests to those determined pre-parse and during-parse #28

gsnedders · 2013-12-23T00:03:24Z

At the moment we only have tests that check we get the right encoding at the end. Really we want to be testing the meta pre-scan too (which html5lib currently does by using any tests with fewer than 512 characters, or something like that).
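To make the distinction concrete, here is a rough sketch of what a "pre-parse" check looks at. It is not the spec's prescan and not html5lib's implementation: `rough_prescan` is a hypothetical name, the regex ignores comments, `http-equiv` requirements and encoding-label normalization, and the 1024-byte default for `limit` is an assumption made for illustration.

```python
import re

# Hypothetical, simplified pattern: finds charset=... inside a <meta ...> tag,
# either as its own attribute or inside content="text/html; charset=...".
# The real prescan handles many more cases than this.
_META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def rough_prescan(data: bytes, limit: int = 1024):
    """Return a lowercased charset label found in the first `limit` bytes, or None.

    `limit` is an assumed cut-off; anything declared after it is only
    discoverable by actually parsing the document.
    """
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A declaration that sits past the cut-off is invisible to a check like this, which is exactly the case where a "pre-parse" expectation and a "during-parse" expectation can differ.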

This change adds a `preparsed` subdirectory to the `encoding` directory. It holds tests whose expected result is the output of running *only* the encoding sniffing algorithm at https://html.spec.whatwg.org/#encoding-sniffing-algorithm (whose main sub-algorithm is the so-called “meta prescan”), without also running the tokenization state machine and tree-construction stage.

This change also adds a README file that explicitly documents what the expected results for the encoding tests are, based on whether or not they’re in the `preparsed` subdirectory.

Without those changes, it’s unclear whether the expected results shown in the existing tests are the output of fully parsing the test data (through the tokenization state machine and tree-construction stage) or just the output of the encoding sniffing algorithm. We also don’t have any tests a system can use to test only the output of the encoding sniffing algorithm.

Fixes #28
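As a sketch of how a test harness might act on that split, the helper below decides which stage a test file's expected result refers to, based only on its location. The function name `expected_stage`, the return labels, and the example file names are hypothetical; only the `encoding` and `preparsed` directory names come from the description above.

```python
from pathlib import PurePosixPath


def expected_stage(test_path: str) -> str:
    """Say which stage a test's expected encoding refers to.

    Returns "sniffing-only" for files under encoding/preparsed/, and
    "full-parse" for any other file under encoding/. Both labels are
    made up for this sketch.
    """
    parts = PurePosixPath(test_path).parts
    if "encoding" not in parts:
        raise ValueError(f"not an encoding test: {test_path}")
    # Only look at the directories below `encoding`, not the file name itself.
    below = parts[parts.index("encoding") + 1 : -1]
    return "sniffing-only" if "preparsed" in below else "full-parse"
```

A consumer that only implements the encoding sniffing algorithm could then run just the files for which this returns `"sniffing-only"`.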

gsnedders mentioned this issue Nov 8, 2017

Charset in meta content does not correctly parse for trailing semi-colon html5lib/html5lib-python#92

Open

sideshowbarker linked a pull request Aug 21, 2020 that will close this issue

Test the encoding sniffing algorithm (aka meta prescan) #130

Open
