Lucene/Solr 9; Java 17; Tomcat 10 #526

Closed
wants to merge 132 commits into from

jan-niestadt
Member

@jan-niestadt jan-niestadt commented Jul 11, 2024

Thanks to @eduarddrenth for this branch, which updates our previous experiment and solves more issues.

CURRENT STATUS: working, experimental. Will probably be merged after v4 is released, which should be soon.

Old comments:

The main issue now seems to be that indexes created with Lucene 8 cannot be read by this Lucene 9 version. Normally, Lucene 9 should have no trouble reading Lucene 8 indexes, but our custom Codec probably interferes. You can see this when running the tests, some of which run against a test index that was pre-created with Lucene 8.

KCMertens and others added 27 commits November 7, 2023 15:10
This contains the more "interesting" migrations (more than just a removed function or import path)

- poms updated to use latest version of jackson/jersey/xml-bind (might miss a runtime dependency still). Servlet-api still on v4 due to packaged jetty server in SOLR being stuck on v4.
- implement most simple (one-liner) migrations; these should be double-checked
- one remaining compilation error in BLSpanOrQuery that should be solved.
This gives it access to the package-private class used by SpanOrQuery.

Might seem like a hack, but these classes are so intertwined with how
Lucene works internally that they might as well live in the same package.

Arguably most subclasses of BLSpanQuery and BLSpans should be moved to
the same package eventually.

We might still need to double-check that BLSpanTermQuery and BLSpanMultiTermQueryWrapper
are up to date with their Lucene 9 counterparts.
1 solr test failing
exclude solr module until jakarta compatible version is out
# Conflicts:
#	engine/pom.xml
#	engine/src/main/java/nl/inl/blacklab/indexers/config/saxon/XPathFinder.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpansCaptureRelationsWithinSpan.java
no try catch needed
LoggingWatcher
…riment/tomcat-10

# Conflicts:
#	engine/pom.xml
#	engine/src/main/java/nl/inl/blacklab/codec/BlackLab40StoredFieldsReader.java
#	engine/src/main/java/nl/inl/blacklab/search/SingleDocIdFilter.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpans.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpansInBuckets.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/BLFilterDocsSpans.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanMultiTermQueryWrapper.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanQuery.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryAnyToken.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryNoHits.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQuerySequence.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpansAnd.java
#	engine/src/main/java/nl/inl/blacklab/search/lucene/SpansFiltered.java
#	engine/src/main/java/nl/inl/blacklab/search/results/HitsInternal.java
#	engine/src/main/java/nl/inl/util/DocValuesUtil.java
#	engine/src/main/java/org/apache/lucene/queries/spans/BLSpanOrQuery.java
#	pom.xml
#	proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/ParsePatternResponse.java
#	proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/SummaryTextPattern.java
#	solr/pom.xml
#	solr/src/main/java/org/ivdnt/blacklab/solr/DocSetFilter.java
#	wslib/src/main/java/nl/inl/blacklab/server/exceptions/BadRequest.java
#	wslib/src/main/java/nl/inl/blacklab/server/lib/Response.java
correct analyzers version
solr.XSLTResponseWriter not found
correct analyzers version
solr.XSLTResponseWriter not found
full classname for xsltresponsewriter
new lucene version => rebuild index in test resources
jakarta needed in solr tests
@jan-niestadt jan-niestadt marked this pull request as draft July 11, 2024 09:42
@jan-niestadt
Member Author

After making sure BlackLab40Codec uses Lucene87 as the delegate codec (previously it requested the default codec from Lucene, which is obviously different in Lucene 9), we still have a problem reading back our custom terms file.

The cause seems to be that DataInput/DataOutput have been switched to little-endian. Presumably we need to update our custom codec to explicitly read big-endian, which is how these indexes were written.

We should also add a new codec version, e.g. BlackLab41, that will use the new 9.x Lucene codec as delegate. It could even adapt: use the default codec when creating an index, and record which delegate codec was used so it can be reinstantiated when reading later. That version of the codec can use little-endian for the custom terms files as well.
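
To illustrate the endianness mismatch described above, here is a minimal sketch in plain `java.nio`. It does not use Lucene's DataInput/DataOutput or EndiannessReverserUtil; the class and method names are made up for this example. It only shows why a value written big-endian comes back wrong when read little-endian, and how reversing the bytes recovers it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration only: plain java.nio, not Lucene's DataInput/DataOutput. */
class EndiannessDemo {

    /** Encodes an int most significant byte first, as an older big-endian writer would. */
    static byte[] writeBigEndian(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(value)
                .array();
    }

    /** Decodes an int assuming little-endian order, as a newer reader would by default. */
    static int readLittleEndian(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Decodes little-endian, then reverses the bytes to recover a big-endian value. */
    static int readLittleEndianReversed(byte[] bytes) {
        return Integer.reverseBytes(readLittleEndian(bytes));
    }

    public static void main(String[] args) {
        byte[] onDisk = writeBigEndian(1);
        System.out.println(readLittleEndian(onDisk));         // 16777216, not 1
        System.out.println(readLittleEndianReversed(onDisk)); // 1
    }
}
```

The same reasoning applies to longs and shorts: every multi-byte value read from an old file needs its byte order flipped, which is why the fix belongs at the point where the file is opened rather than at each individual read.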

When reading an index written using Lucene 8, we need to
make sure we use EndiannessReverserUtil when opening the file,
because Lucene 9 switched to little-endian for DataInput/Output.
API documentation was grouped by subject.
BLS configuration docs were updated and expanded.

Squashed commit of the following:

commit 0a0cf33
Author: Jan Niestadt <[email protected]>
Date:   Wed Jul 9 09:03:52 2025 +0200

    Redirect.

commit 269459a
Author: Jan Niestadt <[email protected]>
Date:   Tue Jun 24 16:02:39 2025 +0200

    Configuration, more.

commit 876bb1b
Author: Jan Niestadt <[email protected]>
Date:   Mon Jun 23 14:40:20 2025 +0200

    Typo

commit 639cdd0
Author: Jan Niestadt <[email protected]>
Date:   Thu Jun 19 15:34:00 2025 +0200

    Work on API docs.

commit 8e8125a
Author: Jan Niestadt <[email protected]>
Date:   Wed Jun 18 15:25:34 2025 +0200

    Use Docker tag dev instead of latest.

commit d95bffe
Author: Jan Niestadt <[email protected]>
Date:   Tue Jun 17 16:27:50 2025 +0200

    Restructure API reference.

commit c7b1baa
Author: Jan Niestadt <[email protected]>
Date:   Tue Jun 17 13:26:52 2025 +0200

    API v4/5.
Values larger than 32K cannot be indexed in Lucene, so we truncate
them even if no maxValueLength was set.
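
A possible sketch of such a truncation, not the code from this commit. It assumes the limit is measured in UTF-8 bytes and uses 32766, the value of Lucene's `IndexWriter.MAX_TERM_LENGTH`; the commit message only says "32K", so the exact bound used in BlackLab is an assumption here. `ValueTruncator` and `truncateUtf8` are hypothetical names.

```java
/** Hypothetical helper; the actual truncation code in this branch may differ. */
class ValueTruncator {

    /** Assumed limit: equals Lucene's IndexWriter.MAX_TERM_LENGTH (bytes of UTF-8). */
    static final int MAX_BYTES = 32766;

    /**
     * Returns the longest prefix of value whose UTF-8 encoding fits in maxBytes,
     * without splitting a code point (so no broken surrogate pairs).
     */
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            // UTF-8 length of this code point; unpaired surrogates are counted
            // as 3 bytes, which can only overestimate, never underestimate.
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }
}
```

Counting bytes rather than chars matters because a value of fewer than 32K characters can still exceed the byte limit if it contains non-ASCII text.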

Created WarnOnce to be able to issue warnings only once during indexing
(where a problem may often occur many times, flooding the logs).
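
For readers unfamiliar with the pattern, a minimal sketch of a warn-once helper follows. This is not the WarnOnce class from this commit, whose API is not shown here; `WarnOnceSketch`, its constructor and the `warn(key, message)` signature are assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch; the real WarnOnce in this branch may look different. */
class WarnOnceSketch {

    // Keys for which a warning has already been issued; safe for concurrent indexing threads.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Where warnings go, e.g. a logger's warn method.
    private final Consumer<String> sink;

    WarnOnceSketch(Consumer<String> sink) {
        this.sink = sink;
    }

    /** Emits message the first time key is seen; returns true if it was emitted. */
    boolean warn(String key, String message) {
        if (seen.add(key)) { // add() returns false if the key was already present
            sink.accept(message);
            return true;
        }
        return false;
    }
}
```

Usage could look like `warnOnce.warn("value-too-long:" + fieldName, "Value truncated for field " + fieldName)`, so each field logs the problem once instead of once per document.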
EphemeralHit's instance variables had the same name as
the getter methods, which could be confusing.

Reduced access to package private as well, and only
used direct access to variables in HitsInternal classes.
In other classes, we use the getter methods. The JVM
should normally inline these anyway for hot code.
When you catch this exception and don't re-throw it, you
should re-set the thread's interrupted flag so the status
is not lost.
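
Assuming the exception meant here is `InterruptedException` (the commit title is not shown), the pattern can be sketched as follows. `InterruptRestoreDemo` and `sleepQuietly` are hypothetical names, not code from this branch.

```java
/** Illustration of restoring the interrupted status after catching InterruptedException. */
class InterruptRestoreDemo {

    /**
     * Sleeps for up to millis. Returns false if the thread was interrupted,
     * leaving the interrupted flag set so callers can still observe it.
     */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // The flag was cleared when the exception was thrown; set it again
            // because we are not re-throwing.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without the `interrupt()` call, code further up the stack that checks `Thread.currentThread().isInterrupted()` would never learn that cancellation was requested.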

Quality Gate failed

Failed conditions
35.7% Duplication on New Code (required ≤ 3%)
B Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@jan-niestadt
Copy link
Member Author

I've created a new branch jakarta where we will continue developing this version. Closing this PR, opening a new one.

3 participants