Add test automation #44

sideshowbarker · 2020-08-17T14:57:17Z

The changes in this pull-request branch enable us, for all pull requests and pushes, to automate testing of the parser against the html5lib-tests suite — including CI execution (using GitHub Actions) of the tests .

The changes include the following commits:

15955c0 Add Html5libTest
c1c1035 Add html5lib-tests as a submodule
ff46888 Add GitHub Actions build.yml to run Maven in CI
419cae5 Exit 1 for test harness if any tests fail
2dcd4da Make Html5libTest handle double-escaped tests
27c5068 Make Html5libTest only check .dat and .test files
500a9f5 Make EncodingTester usable in testing parsed state (from PR Enable separate testing of encoding for both “pre-parsed” and “parsed” output #49)
8d31a0b Make HtmlInputStreamReader sniffing limit settable (from PR Enable separate testing of encoding for both “pre-parsed” and “parsed” output #49)

…along with some related minor changes to get things working smoothly.

pom.xml

carlosame · 2020-08-22T12:25:30Z

I pulled your branch to investigate the error, but I ran into this:

[INFO]      [java] Exception in thread "main" java.io.FileNotFoundException: C:\Users\****\oss\htmlparser\html5lib-tests\encoding (Acceso denegado)
[INFO]      [java]      at java.base/java.io.FileInputStream.open0(Native Method)
[INFO]      [java]      at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
[INFO]      [java]      at nu.validator.htmlparser.test.EncodingTester.main(EncodingTester.java:116)

which is caused by ant giving a directory name to EncodingTester while that class expects a filename. Obviously the FileInputStream is not happy.

Is that branch fully updated?

anthonyvdotbe · 2020-08-22T12:32:09Z

Indeed, EncodingTester hasn't been adapted yet to take a directory, though the others have ( see 62baf8a )

carlosame · 2020-08-22T12:38:24Z

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0. Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

sideshowbarker · 2020-08-22T13:06:15Z

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0

Yes — I changed the version locally, and removed the fork="true" attributes, and confirmed it works as expected.

Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

OK, went ahead and made that change too.

Pushed the updates to this branch. Thanks again very much.

sideshowbarker · 2020-08-22T13:10:28Z

Well I just now see that dropping fork="true" causes the Java 8 build to fail.

https://github.com/validator/htmlparser/pull/44/checks?check_run_id=1015950576

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

carlosame · 2020-08-22T13:41:57Z

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

If you test with Java 8, ant is going to look for tools.jar which you removed from classpath in d0dd5ab.

Testing with version 11 and then the latest JDK (14 currently) makes sense, Java 8 is possibly unnecessary.

carlosame · 2020-08-23T12:38:42Z

Incidentally, in css4j I'm compiling my tests for target 8 due to a JPMS bug in the maven-compiler-plugin.

That bug is caused by using test classes in one module from the test classes in another module (yes the test-jar is listed as a dependency), and in principle this project has nothing close to that so it should not be affected.

sideshowbarker · 2020-08-24T17:49:43Z

So as y’all may have noticed, I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

That approach currently requires adding an extra config setting, to teach Surefire where the html5lib-tests directory is.

That config setting would be unnecessary if/when we end up moving to a new directory layout that follows Maven conventions. But after we do ever get there, the POM can be updated to drop the extra config setting.

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

If so, do you want to have a Copyright (c) 2020 Anthony Vanelverdinghe line included in the license/copyright header comment in the source — along with a Copyright (c) 2020 Mozilla Foundation line?

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

anthonyvdotbe · 2020-08-24T19:14:53Z

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

Yes, I confirm that I agree.

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

I confirm that I'm waiving my right to have my name included in the copyright statement.

carlosame · 2020-08-24T20:19:49Z

I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

The conversion of all the tests (including Html5libTest) to JUnit is something that could be done after this PR is merged, and mostly looks like a search-and-replace trick.

This change causes com.sun.tools to be included as a dependency only if the JDK version is 1.8. Otherwise, without this change, the build fails when run under any Java version later than Java 8.

This change causes the Maven AntRun plugin to invoke the javac and java commands with fork=true when run in a Java 8 environment. Otherwise, without this change, the build fails when run under Java 8.

This change updates the TokenizerTester code to expect its input test data to have the string "Data state" to identify PCDATA tests — rather than the string "DATA state". The test data in the html5lib-tests suite uses "Data state", so without this change, running TokenizerTester against html5lib-tests causes TokenizerTester to fail with a “Broken test data” harness failure.

This change makes TokenizerTester, TreeTester, and EncodingTester exit with status code 1 if any test cases fail. Otherwise, without this change, we won’t catch the test failures when running tests under CI.

This change makes TokenizerTester, TreeTester, and EncodingTester stop emitting the word “Success” to standard error every time a test passes. Otherwise, without this change, in test output, we end up with so many “Success” lines that the actual test failures are effectively obscured (you have to scroll back through the output/log to hunt for failures).

This change makes the sniffing limit in HtmlInputStreamReader settable. Without this change, the HtmlInputStreamReader sniffing limit is hardcoded to 1024 — and in the context of testing, that has the effect of limiting HtmlInputStreamReader to only being useful for testing expected output of the meta prescan. So this change makes it possible for HtmlInputStreamReader to also be used for testing the results for the state where the expected character encoding is not limited to what can be determined by checking the first 1024 bytes of the input stream.

This change updates EncodingTester to make it test the result for cases when the expected character encoding is not limited to what can be determined by checking only the first 1024 bytes of the input stream. Otherwise, without this change, EncodingTester is limited to only being useful for testing the output of the meta prescan.

This change makes the TokenizerTester(InputStream stream) constructor public — as the corresponding constructors for TreeTester and EncodingTester already are.

This change refines how Html5libTest handles filenames; it adds a mechanism to allow a required/expected file extension to be specified for each test type, and uses the mechanism to specify that ".test" is the required/expected extension for tokenizer tests, and that ".dat" is the required/expected extension for tree-construction test files and for encoding test files. Without this change, Html5libTest only deals correctly with the ".test" extension for tokenizer test files — but not with the ".dat" extension for tree-construction test files and encoding test files.

This change makes Html5libTest correctly handle tests in the html5lib-tests suite which have cases with so-called “double-escaped” “input” and “output” values — for example, values that contain the literals “\\u0000” and “\\uFFFD" rather than “\u0000” and “\uFFFD”.

This change replaces Path.of() calls in Html5libTest with Paths.get(). Per https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html#of(java.net.URI) Path.of() was introduced in Java 11. So Java 8 has no Path.of(); see also https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html We need to continue to support Java 8 for the time being. It seems Paths.get() will eventually end up being formally deprecated; by the time it finally is, we may also be able to quit supporting Java 8 — and so we can just switch to Path.of() then.

carlosame · 2020-09-04T11:52:40Z

I see that you re-enabled testing for Java 8, which is a good thing (commits 05b9024 and a27c7b6), but using Java 8 is unlikely to get along very well with the current modularization patches (as long as there is the requirement that the minimum JDK to process the POM is 11). And running automated testing with Java 11 or higher is going to be a quite different game once the project is modularized.

It's not that you cannot do either of those things (you can even use Java 8 if the POM is complicated enough) but modularization is highly coupled with testing, and setting up an automated testing infrastructure without accounting for modules has the potential of being a waste of time.

IMHO either this PR or the modularization PR should be merged as soon as possible (BTW after making any necessary changes), and then the other PR can rebase on top of it. Yes I assume that you can't do a lot either way, just making sure that you are aware of this.

sideshowbarker · 2020-09-04T14:13:00Z

@carlosame Thanks for the heads-up about the issue dealing with Java 8 in the face of the modularization patches.

It’d be pretty disappointing if Maven, in all its baroque complexity, didn’t provide a way to build a modularized jar for current Java versions while also remaining able to ensure the code at least continues to compile under Java 8.

I personally am not willing to cavalierly ignore all the users who possibly might be constrained to needing to use the parser in a Java 8 environment. I don’t have a way to measure how many such users the parser might have. For all I know, it could be there are so few users of the parser that need Java 8 compatibility that we don’t need to worry about it. Or it could be that there are some important users for whom we’d be abandoning when we rightly shouldn’t be.

But if we can’t even ourselves continue to have an automated way to test that the code continues compiles under Java 8, then we have no way to know when some code change we made has broken compatibility with Java 8.

sideshowbarker · 2020-09-04T15:45:24Z

On further reflection, it occurs to me to ask: As far as building with Maven goes, couldn’t we just have one pom.xml for building/testing the “normal” build for Java 11+, and then a second pom.xml for building/testing the Java 8 build?

carlosame · 2020-09-04T16:58:02Z

ensure the code at least continues to compile under Java 8

Hmm that's not what I was meaning. The modularization patches in PR #43 generate bytecode for Java 8 (IMHO should be 7) except for the module-info file. One thing is "compiling for Java 8", another is "testing with Java 8". In fact, the most likely outcome for this project's modularization is that the actual bytecode being tested will be version 8 but the JDK used during the tests is 11, 15 or whatever.

Actually, your current patches in this PR are always testing Java 8 bytecode, you are just producing and testing that level-8 bytecode with different JDKs. And source-level compatibility is set to 8 as well.

carlosame · 2020-09-04T18:16:40Z

Additional clarification to my previous post: the idea is that Java 8 users will be able to use the compiled jar files, but won't be able to build them. A modular JDK is required to build.

If you still want to test with Java 8 and want to avoid undesirable POM hacks, you could set up a separate test project that would depend on a level-8 test-jar produced by the main build. But I see no real reason to do that.

sideshowbarker · 2021-09-02T11:42:22Z

I believe everything in this PR branch is now also in the main branch — so I’m going ahead and closing this.

sideshowbarker force-pushed the sideshowbarker/test-automation branch 5 times, most recently from 3c71e71 to 48e6304 Compare August 17, 2020 17:54

This was referenced Aug 17, 2020

Help wanted: review Maven pom.xml changes for test automation #45

Closed

Help wanted: review GitHub Actions workflow for test automation #46

Closed

carlosame reviewed Aug 20, 2020

View reviewed changes

pom.xml Outdated Show resolved Hide resolved

sideshowbarker force-pushed the sideshowbarker/test-automation branch from 48e6304 to 80c53bc Compare August 21, 2020 05:16

sideshowbarker force-pushed the validator-nu branch from 61325c3 to 5ae26d8 Compare August 21, 2020 09:53

sideshowbarker changed the base branch from validator-nu to master August 21, 2020 16:45

sideshowbarker force-pushed the sideshowbarker/test-automation branch from 6b61599 to 04904c4 Compare August 21, 2020 17:42

sideshowbarker force-pushed the sideshowbarker/test-automation branch 7 times, most recently from 1b971e8 to 016b9ce Compare August 23, 2020 22:30

sideshowbarker force-pushed the sideshowbarker/test-automation branch from 016b9ce to 2317de9 Compare August 26, 2020 13:38

sideshowbarker force-pushed the sideshowbarker/test-automation branch 6 times, most recently from f6a787f to d84dbbb Compare September 4, 2020 04:41

sideshowbarker and others added 15 commits September 4, 2020 14:25

Include com.sun.tools as dependency only if Java 8

05b9024

This change causes com.sun.tools to be included as a dependency only if the JDK version is 1.8. Otherwise, without this change, the build fails when run under any Java version later than Java 8.

Set fork=true for Maven AntRun javac/java if Java8

a27c7b6

This change causes the Maven AntRun plugin to invoke the javac and java commands with fork=true when run in a Java 8 environment. Otherwise, without this change, the build fails when run under Java 8.

Exit 1 for test harness if any tests fail

d419264

This change makes TokenizerTester, TreeTester, and EncodingTester exit with status code 1 if any test cases fail. Otherwise, without this change, we won’t catch the test failures when running tests under CI.

Add GitHub Actions build.yml to run Maven in CI

ebbd0c5

Make TokenizerTester() constructor public

196bd48

This change makes the TokenizerTester(InputStream stream) constructor public — as the corresponding constructors for TreeTester and EncodingTester already are.

Add Html5libTest

d4edb40

Add html5lib-tests as a submodule

ef95b92

Html5libTest: Add copyright; match project style

fcd18b1

sideshowbarker force-pushed the sideshowbarker/test-automation branch from d84dbbb to 390d9a9 Compare September 4, 2020 05:27

sideshowbarker closed this Sep 2, 2021

sideshowbarker deleted the sideshowbarker/test-automation branch September 2, 2021 11:42

Add test automation #44

Add test automation #44

Uh oh!

Conversation

sideshowbarker commented Aug 17, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

carlosame commented Aug 22, 2020

Uh oh!

anthonyvdotbe commented Aug 22, 2020

Uh oh!

carlosame commented Aug 22, 2020

Uh oh!

sideshowbarker commented Aug 22, 2020

Uh oh!

sideshowbarker commented Aug 22, 2020

Uh oh!

carlosame commented Aug 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carlosame commented Aug 23, 2020

Uh oh!

sideshowbarker commented Aug 24, 2020

Uh oh!

anthonyvdotbe commented Aug 24, 2020

Uh oh!

carlosame commented Aug 24, 2020

Uh oh!

carlosame commented Sep 4, 2020

Uh oh!

sideshowbarker commented Sep 4, 2020

Uh oh!

sideshowbarker commented Sep 4, 2020

Uh oh!

carlosame commented Sep 4, 2020

Uh oh!

carlosame commented Sep 4, 2020

Uh oh!

sideshowbarker commented Sep 2, 2021

Uh oh!

Uh oh!

sideshowbarker commented Aug 17, 2020 •

edited

Loading

carlosame commented Aug 22, 2020 •

edited

Loading