Lucene/Solr 9; Java 17; Tomcat 10 #526

jan-niestadt · 2024-07-11T09:42:16Z

Thanks to @eduarddrenth for this branch, which updates our previous experiment and solves more issues.

CURRENT STATUS: working, experimental. Will probably be merged after v4 is released (soon).

Old comments:

The main issue now seems to be that indexes created with Lucene 8 cannot be read by this Lucene 9 version. Normally, Lucene 9 should have no problem reading Lucene 8 indexes, but our custom codec probably interferes with this. You can see this when running the tests, some of which run against a test index that was pre-created with Lucene 8.


This contains the more "interesting" migrations (more than just a removed function or import path):
- POMs updated to use the latest versions of jackson/jersey/xml-bind (a runtime dependency may still be missing). Servlet-api is still on v4 because the Jetty server packaged with Solr is stuck on v4.
- Implemented most simple (one-liner) migrations; these should be double-checked.
- One compilation error remains in BLSpanOrQuery and still needs to be solved.

Moving BLSpanOrQuery into Lucene's org.apache.lucene.queries.spans package gives it access to the package-private class used by SpanOrQuery. This might seem like a hack, but these classes are so intertwined with how Lucene works internally that they might as well live in the same package. Arguably, most subclasses of BLSpanQuery and BLSpans should eventually be moved to that package as well. We may still need to double-check that BLSpanTermQuery and BLSpanMultiTermQueryWrapper are up to date with their Lucene 9 counterparts.

update

1 solr test failing

exclude solr module until a jakarta-compatible version is out

# Conflicts:
#   engine/pom.xml
#   engine/src/main/java/nl/inl/blacklab/indexers/config/saxon/XPathFinder.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansCaptureRelationsWithinSpan.java

no try catch needed LoggingWatcher

…riment/tomcat-10
# Conflicts:
#   engine/pom.xml
#   engine/src/main/java/nl/inl/blacklab/codec/BlackLab40StoredFieldsReader.java
#   engine/src/main/java/nl/inl/blacklab/search/SingleDocIdFilter.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpans.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpansInBuckets.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLFilterDocsSpans.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanMultiTermQueryWrapper.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanQuery.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryAnyToken.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryNoHits.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQuerySequence.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansAnd.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansFiltered.java
#   engine/src/main/java/nl/inl/blacklab/search/results/HitsInternal.java
#   engine/src/main/java/nl/inl/util/DocValuesUtil.java
#   engine/src/main/java/org/apache/lucene/queries/spans/BLSpanOrQuery.java
#   pom.xml
#   proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/ParsePatternResponse.java
#   proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/SummaryTextPattern.java
#   solr/pom.xml
#   solr/src/main/java/org/ivdnt/blacklab/solr/DocSetFilter.java
#   wslib/src/main/java/nl/inl/blacklab/server/exceptions/BadRequest.java
#   wslib/src/main/java/nl/inl/blacklab/server/lib/Response.java

correct analyzers version solr.XSLTResponseWriter not found

full classname for xsltresponsewriter
new lucene version => rebuild index in test resources
jakarta needed in solr tests

jan-niestadt · 2024-07-11T11:19:34Z

After making sure BlackLab40Codec uses Lucene87 as the delegate codec (previously it requested the default codec from Lucene, which is obviously different in Lucene 9), we still have a problem reading back our custom terms file.

The cause seems to be that DataInput/DataOutput have been switched to little-endian. Presumably we need to update our custom codec to explicitly read big-endian, which is how these indexes were written.

We should also add a new codec version, e.g. BlackLab41, that uses the new 9.x Lucene codec as its delegate (or even adapts: it would use the default codec when creating an index and record which delegate codec was used, so the same one can be reinstantiated when reading the index later). That version of the codec can use little-endian for the custom terms files as well.
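The "record which delegate codec was used" idea can be sketched without Lucene as follows. This is a hypothetical illustration, not BlackLab's code: `FakeCodec`, `DelegateCodecChooser`, the `delegateCodec` metadata key and the registry map are all invented here. In real Lucene the registry role would be played by `Codec.forName(String)` and the version-dependent default by `Codec.getDefault()`. The fallback for indexes without a recorded name mirrors the fix above of pinning BlackLab40Codec to Lucene87 instead of asking for the default.

```java
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Lucene codec; only its name matters in this sketch. */
record FakeCodec(String name) {}

/**
 * Hypothetical sketch: remember which delegate codec wrote an index, so the same
 * codec can be looked up by name when the index is opened by a later version.
 */
final class DelegateCodecChooser {
    /** Hypothetical metadata key under which the delegate codec name is stored. */
    static final String KEY = "delegateCodec";

    private final Map<String, Supplier<FakeCodec>> registry; // stands in for lookup by name
    private final String currentDefault; // what "the default codec" means in this version
    private final String legacyName;     // pinned delegate for indexes written before KEY existed

    DelegateCodecChooser(Map<String, Supplier<FakeCodec>> registry,
                         String currentDefault, String legacyName) {
        this.registry = registry;
        this.currentDefault = currentDefault;
        this.legacyName = legacyName;
    }

    /** Creating an index: use the current default and record its name in the metadata. */
    FakeCodec delegateForNewIndex(Properties meta) {
        meta.setProperty(KEY, currentDefault);
        return lookup(currentDefault);
    }

    /** Opening an index: use the recorded codec (or the pinned legacy one), never the current default. */
    FakeCodec delegateForExistingIndex(Properties meta) {
        return lookup(meta.getProperty(KEY, legacyName));
    }

    private FakeCodec lookup(String name) {
        Supplier<FakeCodec> factory = registry.get(name);
        if (factory == null) {
            throw new IllegalStateException("Delegate codec not available: " + name);
        }
        return factory.get();
    }
}
```

The point of the design is that the read path never depends on whatever the library's default happens to be at read time, which is exactly what broke when the default changed between Lucene 8 and 9.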

When reading an index written using Lucene 8, we need to make sure we use EndiannessReverserUtil when opening the file, because Lucene 9 switched to little-endian for DataInput/Output.
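The endianness problem can be shown with plain `java.nio` (no Lucene involved): an int written big-endian, as the Lucene 8-era custom terms file was, comes back as a different number when read little-endian, and reversing the bytes recovers it. This is only a sketch of the underlying idea; the class and method names are hypothetical, and in the actual codec the compensation would come from wrapping the input with EndiannessReverserUtil rather than from hand-written code like this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical demo of reading big-endian data with a little-endian reader. */
final class EndiannessDemo {
    /** Serialize an int big-endian, like the old (Lucene 8-era) custom files. */
    static byte[] writeBigEndian(int value) {
        return ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.BIG_ENDIAN).putInt(value).array();
    }

    /** Naive read that assumes little-endian, like Lucene 9's DataInput. */
    static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Compensating read for legacy data: reverse the bytes of the little-endian result. */
    static int readLegacy(byte[] bytes) {
        return Integer.reverseBytes(readLittleEndian(bytes));
    }

    public static void main(String[] args) {
        byte[] onDisk = writeBigEndian(42);
        System.out.println(readLittleEndian(onDisk)); // 704643072 (0x2A000000): wrong value
        System.out.println(readLegacy(onDisk));       // 42
    }
}
```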

API documentation was grouped by subject. BLS configuration docs were updated and expanded.

Squashed commit of the following:

commit 0a0cf33 (Jan Niestadt, Wed Jul 9 09:03:52 2025 +0200): Redirect.
commit 269459a (Jan Niestadt, Tue Jun 24 16:02:39 2025 +0200): Configuration, more.
commit 876bb1b (Jan Niestadt, Mon Jun 23 14:40:20 2025 +0200): Typo
commit 639cdd0 (Jan Niestadt, Thu Jun 19 15:34:00 2025 +0200): Work on API docs.
commit 8e8125a (Jan Niestadt, Wed Jun 18 15:25:34 2025 +0200): Use Docker tag dev instead of latest.
commit d95bffe (Jan Niestadt, Tue Jun 17 16:27:50 2025 +0200): Restructure API reference.
commit c7b1baa (Jan Niestadt, Tue Jun 17 13:26:52 2025 +0200): API v4/5.

Values larger than 32K cannot be indexed in Lucene, so we truncate them even if no maxValueLength was set. Created WarnOnce to be able to issue warnings only once during indexing (where a problem may often occur many times, flooding the logs).
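A minimal sketch of both ideas, assuming the limit in question is Lucene's per-term maximum of 32766 UTF-8 bytes (`IndexWriter.MAX_TERM_LENGTH`); BlackLab's actual truncation length and WarnOnce API may differ, and `IndexValueTruncation` and `WarnOnceSketch` are hypothetical names. The truncation counts UTF-8 bytes per code point so it never cuts a multi-byte character in half.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class IndexValueTruncation {
    /** Lucene's per-term limit in UTF-8 bytes (IndexWriter.MAX_TERM_LENGTH). */
    static final int MAX_TERM_BYTES = 32766;

    /** Longest prefix of value whose UTF-8 encoding fits in maxBytes, without splitting a code point. */
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // Number of bytes this code point needs in UTF-8.
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }
}

/** Hypothetical WarnOnce: forwards each distinct message to the sink only the first time. */
final class WarnOnceSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet(); // thread-safe for parallel indexing
    private final Consumer<String> sink;

    WarnOnceSketch(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Returns true if the message was emitted, false if it was suppressed as a repeat. */
    boolean warn(String message) {
        if (seen.add(message)) {
            sink.accept(message);
            return true;
        }
        return false;
    }
}
```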

EphemeralHit's instance variables had the same names as its getter methods, which could be confusing. Their access was also reduced to package-private, and the variables are now accessed directly only in the HitsInternal classes; other classes use the getter methods. The JVM should normally inline these getters in hot code anyway.
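The naming and access pattern can be sketched as below. Everything here is hypothetical: the field and getter names, and the assumption that a hits list fills a caller-supplied, reusable hit object. The sketch only shows package-private fields with names distinct from the getters, written directly by one list class and read through getters everywhere else.

```java
/** Hypothetical hit holder: package-private fields, named differently from the getters. */
final class HitSketch {
    int docId;
    int startPos;
    int endPos;

    int doc() { return docId; }
    int start() { return startPos; }
    int end() { return endPos; }
}

/** Hypothetical stand-in for a HitsInternal class: the one place that writes the fields directly. */
final class HitsArraySketch {
    private final int[] docs;
    private final int[] starts;
    private final int[] ends;

    HitsArraySketch(int[] docs, int[] starts, int[] ends) {
        this.docs = docs;
        this.starts = starts;
        this.ends = ends;
    }

    /** Copies hit number `index` into a caller-supplied holder instead of allocating a new object. */
    void getEphemeral(int index, HitSketch into) {
        into.docId = docs[index];
        into.startPos = starts[index];
        into.endPos = ends[index];
    }
}
```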

When you catch this exception and don't re-throw it, you should re-set the thread's interrupted flag so the status is not lost.
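The standard Java idiom for this looks as follows; `sleepQuietly` is a hypothetical helper, not a BlackLab method. `Thread.sleep` clears the thread's interrupted status when it throws `InterruptedException`, so a catch block that swallows the exception calls `Thread.currentThread().interrupt()` to set it again for code further up the stack.

```java
/** Hypothetical example of swallowing InterruptedException without losing the interrupt. */
final class InterruptHandling {
    /** Sleeps; returns false instead of throwing if the thread was interrupted. */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // The exception cleared the interrupted flag; restore it so callers can still see it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```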

sonarqubecloud · 2025-07-14T13:57:23Z

Quality Gate failed

Failed conditions
35.7% Duplication on New Code (required ≤ 3%)
B Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


jan-niestadt · 2025-07-15T07:41:10Z

I've created a new branch jakarta where we will continue developing this version. Closing this PR, opening a new one.

KCMertens and others added 27 commits November 7, 2023 15:10

Some import migrations

25b9446

Some import migrations

dfc466f

Some import migrations

4566e7b

Update most spans to lucene 9 (update imports, remove extractTerms function)

618f3c3

Merge pull request #1 from INL/dev

ef684af

update

npe

e66a8d9

Merge remote-tracking branch 'origin/dev' into dev

3f52983

Merge branch 'INL:dev' into dev

cf3415f

compiling with ee 10 jdk 17

ec0bc69


revert solr module

e7feefe


Merge branch 'dev' into experiment/tomcat-10

d03bd24


building after merge in dev

f6f4a46

new solr

3983214

wrong version release plugin

772503e

new lucene

675d81a

new lucene

208af31

versions, dependencies

88c3bda


completing merge

5545da4


completing merge

82e1d73


unnecessary analyzers version

5896394

solrconfig.xml:

84acdf2


todo solved

35ee10d

Update Dockerfiles builder base image.

a5f349e

Use Lucene87 codec as default in BlackLab40Codec.

9917246

jan-niestadt marked this pull request as draft July 11, 2024 09:42

Deal with endianness switch.

7ce202e


jan-niestadt added 26 commits July 9, 2025 08:53

Minor code improvements.

406871f

Site fixes.

ddbff7c

Update changelog.

03f2ea9

Auto-truncate values >32K. WarnOnce class.

0543e7a


Proxy: remove clone() methods.

1f675af

CQ (code quality): various small improvements.

656e5be

Add distributionManagement to pom.xml.

2747f09

Remove distributionManagement from build-tools/pom.xml.

3108213

Add distributionManagement to build-tools (no parent).

7fb21e8

Tweaks to get maven-release-plugin to work.

430dc74

Remove build-tools from build for release.

3933fbc

Upgrade central-publishing plugin to 0.8.0.

dd99cca

[maven-release-plugin] prepare release v4.0.0

f455594

[maven-release-plugin] prepare for next development iteration

c50041a

Site updates.

44a2db4

Remove experimental solr and proxy from build for now.

c6ccd2d

site: highlight 4.0 release.

00dbbf8

Improve handling of InterruptedException.

7995727


Restore catch in FileUploadHandler.

0597563

Merge branch 'dev' into experiment/fa-tomcat-10

2453b40

WIP fix Solr+proxy.

956fbf1

Solr: fix TextPattern serialization.

fdf9421

Catch and handle SecurityException when looking for formats.

0c22653

Add doc/technical/design/todo-jakarta-tomcat10.txt.

2a2aecf

jan-niestadt closed this Jul 15, 2025

jan-niestadt mentioned this pull request Jul 15, 2025

Jakarta / Tomcat 10 / Lucene 9 version, to become BlackLab 5.0 #572

Closed
