Lucene/Solr 9; Java 17; Tomcat 10 #526
Conversation
This contains the more "interesting" migrations (more than just a removed function or changed import path):
- poms updated to use the latest versions of jackson/jersey/xml-bind (a runtime dependency might still be missing)
- servlet-api is still on v4, because the Jetty server packaged with Solr is stuck on v4
- most simple (one-liner) migrations are implemented, but should be double-checked
- one remaining compilation error in BLSpanOrQuery still needs to be solved
Moving BLSpanOrQuery into Lucene's own package gives it access to the package-private class used by SpanOrQuery. It might seem like a hack, but these classes are so intertwined with Lucene's internals that they might as well live in the same package. Arguably, most subclasses of BLSpanQuery and BLSpans should eventually be moved to the same package. We should still double-check that BLSpanTermQuery and BLSpanMultiTermQueryWrapper are up to date with their Lucene 9 counterparts.
update
1 solr test failing
exclude solr module until a jakarta-compatible version is out
# Conflicts:
#   engine/pom.xml
#   engine/src/main/java/nl/inl/blacklab/indexers/config/saxon/XPathFinder.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansCaptureRelationsWithinSpan.java
no try/catch needed in LoggingWatcher
…riment/tomcat-10
# Conflicts:
#   engine/pom.xml
#   engine/src/main/java/nl/inl/blacklab/codec/BlackLab40StoredFieldsReader.java
#   engine/src/main/java/nl/inl/blacklab/search/SingleDocIdFilter.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpans.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLConjunctionSpansInBuckets.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLFilterDocsSpans.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanMultiTermQueryWrapper.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/BLSpanQuery.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryAnyToken.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQueryNoHits.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpanQuerySequence.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansAnd.java
#   engine/src/main/java/nl/inl/blacklab/search/lucene/SpansFiltered.java
#   engine/src/main/java/nl/inl/blacklab/search/results/HitsInternal.java
#   engine/src/main/java/nl/inl/util/DocValuesUtil.java
#   engine/src/main/java/org/apache/lucene/queries/spans/BLSpanOrQuery.java
#   pom.xml
#   proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/ParsePatternResponse.java
#   proxy/jaxb/src/main/java/org/ivdnt/blacklab/proxy/representation/SummaryTextPattern.java
#   solr/pom.xml
#   solr/src/main/java/org/ivdnt/blacklab/solr/DocSetFilter.java
#   wslib/src/main/java/nl/inl/blacklab/server/exceptions/BadRequest.java
#   wslib/src/main/java/nl/inl/blacklab/server/lib/Response.java
correct analyzers version; solr.XSLTResponseWriter not found
full classname for XSLTResponseWriter
new lucene version => rebuild index in test resources
jakarta needed in solr tests
After making sure BlackLab40Codec uses Lucene87 as the delegate codec (previously it requested the default codec from Lucene, which is obviously different in Lucene 9), we still have a problem reading back our custom terms file. The cause seems to be that DataInput/DataOutput have been switched to little-endian. Presumably we need to update our custom codec to explicitly read big-endian, which is how these indexes were written. We should also add a new codec version, e.g. BlackLab41, that will use the new Lucene 9.x codec as its delegate (or even adapt: use the default codec when creating, and record which delegate codec was used so it can be reinstantiated when reading later). That codec version can then use little-endian for the custom terms files as well.
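The "record the delegate codec" idea above could be sketched as follows. This is a hypothetical illustration using only the JDK, not actual BlackLab code: the class and method names are invented, and a real implementation would write the name via Lucene's codec header machinery rather than DataOutput directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch: store the delegate codec's name in the file header, so the reader
 *  can reinstantiate the same codec later, regardless of which codec happens
 *  to be Lucene's current default at read time. */
public class DelegateCodecHeader {

    // Hypothetical registry mapping codec names to factories.
    static final Map<String, Supplier<Object>> CODECS = Map.of(
            "Lucene87", Object::new,
            "Lucene99", Object::new);

    /** Record which delegate codec wrote the data. */
    static void writeHeader(DataOutput out, String delegateName) {
        try {
            out.writeUTF(delegateName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Recover the recorded delegate and reinstantiate it. */
    static Object readDelegate(DataInput in) {
        try {
            String name = in.readUTF();
            Supplier<Object> factory = CODECS.get(name);
            if (factory == null)
                throw new IOException("Unknown delegate codec: " + name);
            return factory.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeHeader(new DataOutputStream(buf), "Lucene87");
        Object delegate = readDelegate(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("Reinstantiated delegate: " + (delegate != null));
    }
}
```

The benefit of recording the name rather than hardcoding it: new indexes automatically use whatever codec is current, while old ones keep reading with the codec that wrote them.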
When reading an index written using Lucene 8, we need to make sure we use EndiannessReverserUtil when opening the file, because Lucene 9 switched to little-endian for DataInput/Output.
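The failure mode can be illustrated with the plain JDK (no Lucene API involved): the same on-disk bytes decode to a different value depending on the byte order the reader assumes, which is why files written big-endian by a Lucene 8 codec must be read with reversed byte order under Lucene 9's little-endian default.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        // A Lucene 8-era file wrote this int big-endian.
        byte[] onDisk = ByteBuffer.allocate(4)
                .order(ByteOrder.BIG_ENDIAN).putInt(42).array();

        // Reading it back little-endian (the Lucene 9 default) garbles it...
        int wrong = ByteBuffer.wrap(onDisk).order(ByteOrder.LITTLE_ENDIAN).getInt();

        // ...while reading with the original big-endian order (conceptually
        // what wrapping the input in an endianness reverser achieves)
        // recovers the value that was written.
        int right = ByteBuffer.wrap(onDisk).order(ByteOrder.BIG_ENDIAN).getInt();

        System.out.println(wrong + " " + right);  // 704643072 42
    }
}
```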
API documentation was grouped by subject. BLS configuration docs were updated and expanded.

Squashed commit of the following (all by Jan Niestadt <[email protected]>):
- 0a0cf33 (Wed Jul 9 09:03:52 2025 +0200) Redirect.
- 269459a (Tue Jun 24 16:02:39 2025 +0200) Configuration, more.
- 876bb1b (Mon Jun 23 14:40:20 2025 +0200) Typo
- 639cdd0 (Thu Jun 19 15:34:00 2025 +0200) Work on API docs.
- 8e8125a (Wed Jun 18 15:25:34 2025 +0200) Use Docker tag dev instead of latest.
- d95bffe (Tue Jun 17 16:27:50 2025 +0200) Restructure API reference.
- c7b1baa (Tue Jun 17 13:26:52 2025 +0200) API v4/5.
Values larger than 32K cannot be indexed by Lucene, so we truncate them even if no maxValueLength was set. Created WarnOnce to be able to issue a warning only once during indexing (where a problem may often recur many times, flooding the logs).
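A minimal sketch of what such a helper might look like. This is illustrative only and not the actual BlackLab class: the constructor and method signatures are assumptions, and the real WarnOnce may key warnings differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Issues each distinct warning at most once, so a problem that recurs
 *  thousands of times during indexing produces a single log line. */
class WarnOnce {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<String> logger;

    WarnOnce(Consumer<String> logger) {
        this.logger = logger;
    }

    /** Log the message only the first time this key is seen. */
    void warn(String key, String message) {
        if (seen.add(key))          // add() returns false on repeat keys
            logger.accept(message);
    }
}
```

Usage during indexing would then be something like `warnOnce.warn("value-truncated", "Value exceeded 32K and was truncated")` inside the per-document loop; only the first call actually logs.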
EphemeralHit's instance variables had the same name as the getter methods, which could be confusing. Reduced access to package private as well, and only used direct access to variables in HitsInternal classes. In other classes, we use the getter methods. The JVM should normally inline these anyway for hot code.
When you catch InterruptedException and don't re-throw it, you should reset the thread's interrupted flag so the status is not lost.
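This is the standard Java pattern: `Thread.sleep()` and friends clear the interrupted status when they throw, so a catch block that swallows the exception must call `Thread.currentThread().interrupt()` to make the interrupt visible to callers further up the stack. A minimal self-contained illustration (names are ours, not BlackLab code):

```java
public class InterruptDemo {

    /** Sleeps briefly; returns false if interrupted, restoring the flag. */
    static boolean waitBriefly() {
        try {
            Thread.sleep(10);
            return true;
        } catch (InterruptedException e) {
            // Not re-throwing, so restore the flag: sleep() cleared it,
            // and callers still need to see the interrupt request.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();   // simulate a pending interrupt
        boolean completed = waitBriefly();    // sleep() throws immediately
        // The flag survives because the catch block restored it.
        System.out.println(completed + " " + Thread.currentThread().isInterrupted());
        // prints: false true
    }
}
```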
I've created a new branch.
Thanks to @eduarddrenth for this branch, which updates our previous experiment and solves more issues.
CURRENT STATUS: working, but experimental. Will probably be merged after the upcoming v4 release.
Old comments: