GH-5447 LMDB Store query performance improvements #5448

hmottestad · 2025-09-22T19:10:44Z

GitHub issue resolved: #5447

Briefly describe the changes proposed in this PR:

Currently a collection of various performance improvements. Still WIP.

PR Author Checklist (see the contributor guidelines for more details):

my pull request is self-contained
I've added tests for the changes I made
I've applied code formatting (you can use mvn process-resources to format from the command line)
I've squashed my commits where necessary
every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

hmottestad · 2025-09-23T05:02:42Z

AGENTS.md

    * `-Dtest=ClassName`
    * `-Dtest=ClassName#method`
-    * `-Dit.test=ITClass#method`
+    * `-Dit.test=ITClassName[#method]`


hmottestad · 2025-09-23T05:04:55Z

...bra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/ArrayBindingSet.java

 	}

+//	@Override
+//	public boolean equals(Object other){


TODO: During GROUP BY I noticed that ArrayBindingSet could use its own equals method, which could be faster.

We should also check whether the hash code is cached in the superclass.
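
A rough sketch of the idea, not the actual RDF4J code: `ArrayBackedBindings` below is a hypothetical stand-in for an array-backed binding set. It assumes that rows produced by the same query share one `names` array, so equality can take a positional fast path, and it caches the hash code on first use. Values are plain `Object`s here to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an array-backed binding set; not the RDF4J class.
final class ArrayBackedBindings {
	private final String[] names; // assumed to be shared by all rows of one query
	private final Object[] values; // null means "unbound"; treated as immutable here
	private int cachedHash; // 0 means "not computed yet"

	ArrayBackedBindings(String[] names, Object[] values) {
		if (names.length != values.length) {
			throw new IllegalArgumentException("names and values must have the same length");
		}
		this.names = names;
		this.values = values;
	}

	@Override
	public boolean equals(Object other) {
		if (this == other) {
			return true;
		}
		if (!(other instanceof ArrayBackedBindings)) {
			return false;
		}
		ArrayBackedBindings that = (ArrayBackedBindings) other;
		if (names == that.names) {
			// Fast path: identical layout, so compare values position by position.
			return Arrays.equals(values, that.values);
		}
		// Slow path: layouts differ, compare the bound name/value pairs regardless of order.
		return toMap().equals(that.toMap());
	}

	@Override
	public int hashCode() {
		int h = cachedHash;
		if (h == 0) {
			// Order-independent sum, so it agrees with both equality paths.
			for (int i = 0; i < names.length; i++) {
				if (values[i] != null) {
					h += names[i].hashCode() ^ values[i].hashCode();
				}
			}
			cachedHash = h; // a genuine hash of 0 is simply recomputed on each call
		}
		return h;
	}

	private Map<String, Object> toMap() {
		Map<String, Object> map = new HashMap<>();
		for (int i = 0; i < names.length; i++) {
			if (values[i] != null) {
				map.put(names[i], values[i]);
			}
		}
		return map;
	}
}
```

Caching is only safe if the values never change after the first `hashCode()` call, which would need to be verified for the real class.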

hmottestad · 2025-09-23T05:08:32Z

core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStore.java

 			Txn txn, Resource subj, IRI pred, Value obj, boolean explicit, Resource... contexts) throws IOException {
+		if (!explicit && !mayHaveInferred) {
+			// there are no inferred statements and the iterator should only return inferred statements
+			return EMPTY_ITERATION;


This makes a very big difference when not using inferencing. We can very quickly return an empty iterator without any interaction with the underlying LMDB implementation. I have had a similar optimisation for the Memory Store for a while now and it's been working very well.
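
To illustrate the shape of this short-circuit outside of LMDB, here is a minimal sketch. `InferredAwareStore`, its string "statements" and the `backendLookups` counter are all made up for the example; only the `!explicit && !mayHaveInferred` check mirrors the diff above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy store; statements are plain strings to keep the sketch small.
final class InferredAwareStore {
	private final List<String> explicitStatements = new ArrayList<>();
	private final List<String> inferredStatements = new ArrayList<>();
	private volatile boolean mayHaveInferred; // flipped once an inferred statement is added
	int backendLookups; // counts how often the "backend" is actually consulted

	void addExplicit(String statement) {
		explicitStatements.add(statement);
	}

	void addInferred(String statement) {
		inferredStatements.add(statement);
		mayHaveInferred = true;
	}

	Iterator<String> statements(boolean explicit) {
		if (!explicit && !mayHaveInferred) {
			// Only inferred statements were requested and none can exist: skip the backend.
			return Collections.emptyIterator();
		}
		backendLookups++;
		return (explicit ? explicitStatements : inferredStatements).iterator();
	}
}
```

The flag only ever needs to be conservative: it may stay `true` after all inferred statements are removed, but it must never be `false` while inferred statements exist.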

hmottestad · 2025-09-23T05:11:31Z

core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/IndexKeyWriters.java

+
+import java.nio.ByteBuffer;
+
+final class IndexKeyWriters {


This class provides hard-coded variants of the key generation and shouldMatch logic. It's faster to pick a specific method once than to have a single method that supports all key combinations by looping over the fields.
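
A simplified sketch of the difference, assuming fixed-width `long` fields for brevity (the real key encoding may differ). `KeyWriterSketch`, `KeyWriter`, `generic` and `specialised` are illustrative names, not the API of `IndexKeyWriters`.

```java
import java.nio.ByteBuffer;

// Illustrative only: contrasts a loop-driven key writer with hard-coded variants.
final class KeyWriterSketch {

	interface KeyWriter {
		void write(ByteBuffer bb, long subj, long pred, long obj, long context);
	}

	// Generic variant: interprets the field sequence on every call.
	static KeyWriter generic(String fieldSeq) {
		final char[] fields = fieldSeq.toCharArray();
		return (bb, subj, pred, obj, context) -> {
			for (char field : fields) {
				switch (field) {
				case 's':
					bb.putLong(subj);
					break;
				case 'p':
					bb.putLong(pred);
					break;
				case 'o':
					bb.putLong(obj);
					break;
				case 'c':
					bb.putLong(context);
					break;
				default:
					throw new IllegalArgumentException("Unknown field: " + field);
				}
			}
		};
	}

	// Specialised variant: the field order is resolved once, when the writer is chosen.
	static KeyWriter specialised(String fieldSeq) {
		switch (fieldSeq) {
		case "spoc":
			return KeyWriterSketch::spoc;
		case "posc":
			return KeyWriterSketch::posc;
		default:
			return generic(fieldSeq); // fall back for orders without a hard-coded variant
		}
	}

	static void spoc(ByteBuffer bb, long subj, long pred, long obj, long context) {
		bb.putLong(subj).putLong(pred).putLong(obj).putLong(context);
	}

	static void posc(ByteBuffer bb, long subj, long pred, long obj, long context) {
		bb.putLong(pred).putLong(obj).putLong(subj).putLong(context);
	}
}
```

Both variants write the same bytes for a given field order; the specialised one just avoids the per-field loop and switch on the hot path.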

hmottestad · 2025-09-23T05:12:30Z

core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbRecordIterator.java

+	public GroupMatcher getGroupMatcher() {
+		if (groupMatcher != null)
+			return groupMatcher;
+		if (matchValues) {
+			this.groupMatcher = index.createMatcher(subj, pred, obj, context);
+		}
+		return groupMatcher;
+	}


Lazy group matcher. Not sure this makes much of a difference.

hmottestad · 2025-09-23T05:15:49Z

core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbStatementIterator.java

-class LmdbStatementIterator extends LookAheadIteration<Statement> {
+class LmdbStatementIterator extends AbstractCloseableIteration<Statement> {


From experience, we should try to avoid inheritance in these very low-level iterators. Object orientation, inheritance and polymorphism can be unexpectedly expensive. I'm not sure it makes as much of a difference here as it did for the MemoryStore when I made the same change there, though.

hmottestad · 2025-09-23T07:50:07Z

core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/IndexKeyWriters.java

+	static MatcherFactory matcherFactory(String fieldSeq) {
+		switch (fieldSeq) {
+		case "spoc":
+			return IndexKeyWriters::spocShouldMatch;
+		case "spco":
+			return IndexKeyWriters::spcoShouldMatch;
+		case "sopc":
+			return IndexKeyWriters::sopcShouldMatch;


We might benefit from fieldSeq being an enum instead of a string. I'm not sure it would make much of a performance difference, but the code might be cleaner.
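
One possible shape for that, as a sketch only: the `FieldSeq` enum, its `parse` helper and `matcherName` are hypothetical, and only the three orders visible in the diff above are listed.

```java
import java.util.Locale;

// Hypothetical enum replacing the string-typed field sequence.
enum FieldSeq {
	SPOC,
	SPCO,
	SOPC;

	// Parses the existing lowercase form, e.g. "spoc"; unknown values throw IllegalArgumentException.
	static FieldSeq parse(String fieldSeq) {
		return valueOf(fieldSeq.toUpperCase(Locale.ROOT));
	}

	// Stand-in for selecting a matcher: returns the name of the method that would be picked.
	String matcherName() {
		switch (this) {
		case SPOC:
			return "spocShouldMatch";
		case SPCO:
			return "spcoShouldMatch";
		case SOPC:
			return "sopcShouldMatch";
		default:
			throw new AssertionError(this);
		}
	}
}
```

With an enum, a misspelled field order fails once at parse time instead of falling through a string switch later.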

hmottestad · 2025-09-23T08:00:11Z