Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test automation #44

Closed
wants to merge 15 commits into from
Closed

Conversation

sideshowbarker
Copy link
Contributor

@sideshowbarker sideshowbarker commented Aug 17, 2020

The changes in this pull-request branch enable us, for all pull requests and pushes, to automate testing of the parser against the html5lib-tests suite — including CI execution (using GitHub Actions) of the tests .

The changes include the following commits:

…along with some related minor changes to get things working smoothly.

pom.xml Outdated Show resolved Hide resolved
@sideshowbarker sideshowbarker changed the base branch from validator-nu to master August 21, 2020 16:45
@carlosame
Copy link

I pulled your branch to investigate the error, but I ran into this:

[INFO]      [java] Exception in thread "main" java.io.FileNotFoundException: C:\Users\****\oss\htmlparser\html5lib-tests\encoding (Acceso denegado)
[INFO]      [java]      at java.base/java.io.FileInputStream.open0(Native Method)
[INFO]      [java]      at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
[INFO]      [java]      at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
[INFO]      [java]      at nu.validator.htmlparser.test.EncodingTester.main(EncodingTester.java:116)

which is caused by ant giving a directory name to EncodingTester while that class expects a filename. Obviously the FileInputStream is not happy.

Is that branch fully updated?

@anthonyvdotbe
Copy link
Contributor

Indeed, EncodingTester hasn't been adapted yet to take a directory, though the others have ( see 62baf8a )

@carlosame
Copy link

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0. Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

@sideshowbarker
Copy link
Contributor Author

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0

Yes — I changed the version locally, and removed the fork="true" attributes, and confirmed it works as expected.

Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

OK, went ahead and made that change too.

Pushed the updates to this branch. Thanks again very much.

@sideshowbarker
Copy link
Contributor Author

Well I just now see that dropping fork="true" causes the Java 8 build to fail.

https://github.com/validator/htmlparser/pull/44/checks?check_run_id=1015950576

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

@carlosame
Copy link

carlosame commented Aug 22, 2020

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

If you test with Java 8, ant is going to look for tools.jar which you removed from classpath in d0dd5ab.

Testing with version 11 and then the latest JDK (14 currently) makes sense, Java 8 is possibly unnecessary.

@carlosame
Copy link

Incidentally, in css4j I'm compiling my tests for target 8 due to a JPMS bug in the maven-compiler-plugin.

That bug is caused by using test classes in one module from the test classes in another module (yes the test-jar is listed as a dependency), and in principle this project has nothing close to that so it should not be affected.

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/test-automation branch 7 times, most recently from 1b971e8 to 016b9ce Compare August 23, 2020 22:30
@sideshowbarker
Copy link
Contributor Author

So as y’all may have noticed, I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

That approach currently requires adding an extra config setting, to teach Surefire where the html5lib-tests directory is.

That config setting would be unnecessary if/when we end up moving to a new directory layout that follows Maven conventions. But after we do ever get there, the POM can be updated to drop the extra config setting.

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

If so, do you want to have a Copyright (c) 2020 Anthony Vanelverdinghe line included in the license/copyright header comment in the source — along with a Copyright (c) 2020 Mozilla Foundation line?

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

@anthonyvdotbe
Copy link
Contributor

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

Yes, I confirm that I agree.

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

I confirm that I'm waiving my right to have my name included in the copyright statement.

@carlosame
Copy link

I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

The conversion of all the tests (including Html5libTest) to JUnit is something that could be done after this PR is merged, and mostly looks like a search-and-replace trick.

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/test-automation branch 6 times, most recently from f6a787f to d84dbbb Compare September 4, 2020 04:41
sideshowbarker and others added 15 commits September 4, 2020 14:25
This change causes com.sun.tools to be included as a dependency
only if the JDK version is 1.8.

Otherwise, without this change, the build fails when run under any
Java version later than Java 8.
This change causes the Maven AntRun plugin to invoke the javac and java
commands with fork=true when run in a Java 8 environment.

Otherwise, without this change, the build fails when run under Java 8.
This change updates the TokenizerTester code to expect its input test
data to have the string "Data state" to identify PCDATA tests — rather
than the string "DATA state".

The test data in the html5lib-tests suite uses "Data state", so without
this change, running TokenizerTester against html5lib-tests causes
TokenizerTester to fail with a “Broken test data” harness failure.
This change makes TokenizerTester, TreeTester, and EncodingTester exit
with status code 1 if any test cases fail.

Otherwise, without this change, we won’t catch the test failures when
running tests under CI.
This change makes TokenizerTester, TreeTester, and EncodingTester stop
emitting the word “Success” to standard error every time a test passes.

Otherwise, without this change, in test output, we end up with so many
“Success” lines that the actual test failures are effectively obscured
(you have to scroll back through the output/log to hunt for failures).
This change makes the sniffing limit in HtmlInputStreamReader settable.

Without this change, the HtmlInputStreamReader sniffing limit is
hardcoded to 1024 — and in the context of testing, that has the effect
of limiting HtmlInputStreamReader to only being useful for testing
expected output of the meta prescan.

So this change makes it possible for HtmlInputStreamReader to also be
used for testing the results for the state where the expected character
encoding is not limited to what can be determined by checking the first
1024 bytes of the input stream.
This change updates EncodingTester to make it test the result for cases
when the expected character encoding is not limited to what can be
determined by checking only the first 1024 bytes of the input stream.

Otherwise, without this change, EncodingTester is limited to only being
useful for testing the output of the meta prescan.
This change makes the TokenizerTester(InputStream stream) constructor
public — as the corresponding constructors for TreeTester and
EncodingTester already are.
This change refines how Html5libTest handles filenames; it adds a
mechanism to allow a required/expected file extension to be specified
for each test type, and uses the mechanism to specify that ".test" is
the required/expected extension for tokenizer tests, and that ".dat" is
the required/expected extension for tree-construction test files and for
encoding test files.

Without this change, Html5libTest only deals correctly with the ".test"
extension for tokenizer test files — but not with the ".dat" extension
for tree-construction test files and encoding test files.
This change makes Html5libTest correctly handle tests in the
html5lib-tests suite which have cases with so-called “double-escaped”
“input” and “output” values — for example, values that contain the
literals “\\u0000” and “\\uFFFD" rather than “\u0000” and “\uFFFD”.
This change replaces Path.of() calls in Html5libTest with Paths.get().

Per https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html#of(java.net.URI)
Path.of() was introduced in Java 11. So Java 8 has no Path.of(); see
also https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html

We need to continue to support Java 8 for the time being. It seems
Paths.get() will eventually end up being formally deprecated; by the
time it finally is, we may also be able to quit supporting Java 8 — and
so we can just switch to Path.of() then.
@carlosame
Copy link

I see that you re-enabled testing for Java 8, which is a good thing (commits 05b9024 and a27c7b6), but using Java 8 is unlikely to get along very well with the current modularization patches (as long as there is the requirement that the minimum JDK to process the POM is 11). And running automated testing with Java 11 or higher is going to be a quite different game once the project is modularized.

It's not that you cannot do either of those things (you can even use Java 8 if the POM is complicated enough) but modularization is highly coupled with testing, and setting up an automated testing infrastructure without accounting for modules has the potential of being a waste of time.

IMHO either this PR or the modularization PR should be merged as soon as possible (BTW after making any necessary changes), and then the other PR can rebase on top of it. Yes I assume that you can't do a lot either way, just making sure that you are aware of this.

@sideshowbarker
Copy link
Contributor Author

@carlosame Thanks for the heads-up about the issue dealing with Java 8 in the face of the modularization patches.

It’d be pretty disappointing if Maven, in all its baroque complexity, didn’t provide a way to build a modularized jar for current Java versions while also remaining able to ensure the code at least continues to compile under Java 8.

I personally am not willing to cavalierly ignore all the users who possibly might be constrained to needing to use the parser in a Java 8 environment. I don’t have a way to measure how many such users the parser might have. For all I know, it could be there are so few users of the parser that need Java 8 compatibility that we don’t need to worry about it. Or it could be that there are some important users for whom we’d be abandoning when we rightly shouldn’t be.

But if we can’t even ourselves continue to have an automated way to test that the code continues compiles under Java 8, then we have no way to know when some code change we made has broken compatibility with Java 8.

@sideshowbarker
Copy link
Contributor Author

On further reflection, it occurs to me to ask: As far as building with Maven goes, couldn’t we just have one pom.xml for building/testing the “normal” build for Java 11+, and then a second pom.xml for building/testing the Java 8 build?

@carlosame
Copy link

ensure the code at least continues to compile under Java 8

Hmm that's not what I was meaning. The modularization patches in PR #43 generate bytecode for Java 8 (IMHO should be 7) except for the module-info file. One thing is "compiling for Java 8", another is "testing with Java 8". In fact, the most likely outcome for this project's modularization is that the actual bytecode being tested will be version 8 but the JDK used during the tests is 11, 15 or whatever.

Actually, your current patches in this PR are always testing Java 8 bytecode, you are just producing and testing that level-8 bytecode with different JDKs. And source-level compatibility is set to 8 as well.

@carlosame
Copy link

Additional clarification to my previous post: the idea is that Java 8 users will be able to use the compiled jar files, but won't be able to build them. A modular JDK is required to build.

If you still want to test with Java 8 and want to avoid undesirable POM hacks, you could set up a separate test project that would depend on a level-8 test-jar produced by the main build. But I see no real reason to do that.

@sideshowbarker
Copy link
Contributor Author

I believe everything in this PR branch is now also in the main branch — so I’m going ahead and closing this.

@sideshowbarker sideshowbarker deleted the sideshowbarker/test-automation branch September 2, 2021 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants