diff --git a/AGENTS.md b/AGENTS.md index c0961300c37..a175478a36c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,4 +1,4 @@ -AGENTS.md +# AGENTS.md Welcome, AI Agent! Your persistence, curiosity, and craftsmanship make a difference. Take your time, work methodically, validate thoroughly, and iterate. This repository is large and tests can take time — that’s expected and supported. @@ -8,113 +8,141 @@ You need to read the entire AGENTS.md file and follow all instructions exactly. --- -## Read‑Me‑Now: Zero‑Exception Test‑First Rule (Stricter) +## Read‑Me‑Now: Proportional Test‑First Rule (Default) -**You may not touch production code until a smallest‑scope failing automated test exists inside this repo and you have captured its report snippet.** -No exceptions. A user‑provided stack trace or “obvious” contract violation is **not** a substitute for a failing test in the repository. +**Default:** Use **test‑first (TDD)** for any change that alters externally observable behavior. -**Auto‑stop:** If you realize you patched production before creating the failing test, **stop**, revert the patch, and resume from “Reproduce first”. +**Proportional exceptions:** You may **skip writing a new failing test** *only* when **all** Routine B gates (below) pass, or when using Routine C (Spike/Investigate) with **no production code changes**. -**Traceability trio (must appear in your handoff):** +**You may not touch production code for behavior‑changing work until a smallest‑scope failing automated test exists inside this repo and you have captured its report snippet.** A user‑provided stack trace or “obvious” contract violation is **not** a substitute for an in‑repo failing test. + +**Auto‑stop:** If you realize you patched production before creating/observing the failing test for behavior‑changing work, **stop**, revert the patch, and resume from “Reproduce first”. -1. **Preamble** (what you’re about to do + exact commands) -2. 
**Evidence** (surefire/failsafe snippet from this repo) +**Traceability trio (must appear in your handoff):** +1. **Description** (what you’re about to do) +2. **Evidence** (Surefire/Failsafe snippet from this repo) 3. **Plan** (one and only one `in_progress` step) It is illegal to `-am` when running tests! It is illegal to `-q` when running tests! +> **Clarification:** For **strictly behavior‑neutral refactors** that are already **fully exercised by existing tests**, or for **bugfixes with an existing failing test**, you may use **Routine B — Change without new tests**. In that case you must capture **pre‑change passing evidence** at the smallest scope that hits the code you’re about to edit, provide **Hit Proof**, then show **post‑change passing evidence** from the **same selection**. +> **No exceptions for behavior‑changing work** — for those, you must follow **Routine A — Full TDD**. + --- -## Purpose & Contract +## Three Routines: Choose Your Path -* **Bold goal:** deliver correct, minimal, well‑tested changes with clear handoff. No monkey‑patching or band‑aid fixes — always fix the underlying problem at its source. -* **Bias to action:** when inputs are ambiguous, choose a reasonable path, state assumptions, and proceed. -* **Ask only when blocked or irreversible:** escalate only if truly blocked (permissions, missing deps, conflicting requirements) or if a choice is high‑risk/irreversible. -* **Definition of Done** +**Routine A — Full TDD (Default)** +**Routine B — Change without new tests (Proportional, gated)** +**Routine C — Spike/Investigate (No production changes)** - * Code formatted and imports sorted. - * Compiles with a quick profile / targeted modules. - * Relevant module tests pass; failures triaged or crisply explained. - * Only necessary files changed; headers correct for new files. - * Clear final summary: what changed, why, where, how verified, next steps.
- * **Evidence present:** failing test output (pre‑fix) and passing output (post‑fix) are both shown. +### Decision quickstart -### No Monkey‑Patching or Band‑Aid Fixes (Non‑Negotiable) +1. **Is new externally observable behavior required?** + → **Yes:** **Routine A (Full TDD)**. Add the smallest failing test first. + → **No:** continue. -This repository requires durable, root‑cause fixes. Superficial changes that mask symptoms, mute tests, or add ad‑hoc toggles are not acceptable. +2. **Does a failing test already exist in this repo that pinpoints the issue?** + → **Yes:** **Routine B (Bugfix using existing failing test).** + → **No:** continue. -What this means in practice +3. **Is the edit strictly behavior‑neutral, local in scope, and clearly hit by existing tests?** + → **Yes:** **Routine B (Refactor/micro‑perf/documentation/build).** + → **No or unsure:** continue. -* Find and fix the root cause in the correct layer/module. -* Add or adjust targeted tests that fail before the fix and pass after. -* Keep changes minimal and surgical; do not widen APIs/configs to “make tests green”. -* Maintain consistency with existing style and architecture; prefer refactoring over hacks. +4. **Is this purely an investigation/design spike with no production code changes?** + → **Yes:** **Routine C (Spike/Investigate).** + → **No or unsure:** **Routine A.** -Strictly avoid +**When in doubt, choose Routine A (Full TDD).** Ambiguity is risk; tests are insurance. -* Sleeping/timeouts to hide race conditions or flakiness. -* Broad catch‑and‑ignore or logging‑and‑continue of exceptions. -* Muting, deleting, or weakening assertions in tests to pass builds. -* Reflection or internal state manipulation to bypass proper interfaces. -* Feature flags/toggles that disable validation or logic instead of fixing it. -* Changing public APIs or configs without necessity and clear rationale tied to the root cause. 
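The decision quickstart is an ordered sequence of checks where the first “yes” wins and anything left over falls back to Routine A. As a purely illustrative sketch (the `RoutineChooser` class, its `Routine` enum and the boolean parameters are hypothetical, not repository code), an “unsure” answer is modeled as `false`:

```java
/**
 * Hypothetical sketch of the "Decision quickstart": the questions are checked in order,
 * and any path that answers "no or unsure" to everything falls back to Routine A.
 */
public final class RoutineChooser {

    /** The three routines described in AGENTS.md. */
    public enum Routine { A_FULL_TDD, B_NO_NEW_TESTS, C_SPIKE }

    private RoutineChooser() {
    }

    /**
     * @param newObservableBehavior   step 1: is new externally observable behavior required?
     * @param existingFailingTest     step 2: does an in-repo failing test already pinpoint the issue?
     * @param neutralLocalAndCovered  step 3: strictly behavior-neutral, local, and hit by existing tests?
     * @param spikeWithoutProdChanges step 4: pure investigation with no production code changes?
     */
    public static Routine choose(boolean newObservableBehavior, boolean existingFailingTest,
            boolean neutralLocalAndCovered, boolean spikeWithoutProdChanges) {
        if (newObservableBehavior) {
            return Routine.A_FULL_TDD;
        }
        if (existingFailingTest || neutralLocalAndCovered) {
            return Routine.B_NO_NEW_TESTS;
        }
        if (spikeWithoutProdChanges) {
            return Routine.C_SPIKE;
        }
        // "When in doubt, choose Routine A (Full TDD)."
        return Routine.A_FULL_TDD;
    }

    public static void main(String[] args) {
        // A pure refactor already exercised by existing tests is a Routine B candidate.
        System.out.println(choose(false, false, true, false)); // prints B_NO_NEW_TESTS
        // Nothing applies (all "no or unsure"): fall back to Routine A.
        System.out.println(choose(false, false, false, false)); // prints A_FULL_TDD
    }
}
```

Landing on Routine B here only makes it a candidate: all Routine B Gates still have to pass, and if any one fails the rules send you back to Routine A.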
+--- -Preferred approach (fast and rigorous) +## Proportionality Model (Think before you test) -* Reproduce the issue and isolate the smallest failing test (class → method). -* Trace to the true source; fix it in the right module. -* Add focused tests covering the behavior and any critical edge cases. -* Run tight, targeted verifies for the impacted module(s) and broaden scope only if needed. +Score the change on these lenses. If any are **High**, prefer **Routine A**. -Review bar and enforcement +- **Behavioral surface:** affects outputs, serialization, parsing, APIs, error text, timing/order? +- **Blast radius:** number of modules/classes touched; public vs internal. +- **Reversibility:** quick revert vs migration/data change. +- **Observability:** can existing tests or assertions expose regressions? +- **Coverage depth:** do existing tests directly hit the edited code? +- **Concurrency / IO / Time:** any risk here is **High** by default. -* Treat this policy as a blocking requirement. Changes that resemble workarounds will be rejected. -* Your final handoff must demonstrate: failing test before the fix, explanation of the root cause, minimal fix at source, and passing targeted tests after. +--- + +## Purpose & Contract + +* **Bold goal:** deliver correct, minimal, well‑tested changes with clear handoff. Fix root causes; avoid hacks. +* **Bias to action:** when inputs are ambiguous, choose a reasonable path, state assumptions, and proceed. +* **Ask only when blocked or irreversible:** permissions, missing deps, conflicting requirements, destructive repo‑wide changes. +* **Definition of Done** + * Code formatted and imports sorted. + * Compiles with a quick profile / targeted modules. + * Relevant module tests pass; failures triaged or crisply explained. + * Only necessary files changed; headers correct for new files. + * Clear final summary: what changed, why, where, how verified, next steps. 
+ * **Evidence present:** failing test output (pre‑fix) and passing output (post‑fix) are shown for Routine A; for Routine B show **pre/post green** from the **same selection** plus **Hit Proof**. + +### No Monkey‑Patching or Band‑Aid Fixes (Non‑Negotiable) + +Durable, root‑cause fixes only. No muting tests, no broad catch‑and‑ignore, no widening APIs “to make green”. + +**Strictly avoid** +* Sleeping/timeouts to hide flakiness. +* Swallowing exceptions or weakening assertions. +* Reflection/internal state manipulation to bypass interfaces. +* Feature flags that disable validation instead of fixing logic. +* Changing public APIs/configs without necessity tied to root cause. + +**Preferred approach** +* Reproduce the issue and isolate the smallest failing test (class → method). +* Trace to the true source; fix in the right module. +* Add focused tests for behavior/edge cases (Routine A) or prove coverage/neutrality (Routine B). +* Run tight, targeted verifies; broaden only if needed. --- -## Enforcement & Auto‑Fail Triggers +## Enforcement & Auto‑Fail Triggers -Your run is **invalid** and must be restarted from “Reproduce first” if any of the following occur: +Your run is **invalid** and must be restarted from “Reproduce first” if any occur: -* You modify production code before adding and running the smallest failing test in this repo. -* You proceed without pasting a surefire/failsafe report snippet from `target/*-reports/`. +* You modify production code before adding and running the smallest failing test in this repo **for behavior‑changing work**. +* You proceed without pasting a Surefire/Failsafe report snippet from `target/*-reports/`. * Your plan does not have **exactly one** `in_progress` step. * You run tests using `-am` or `-q`. * You treat a narrative failure description or external stack trace as equivalent to an in‑repo failing test. 
+* **Routine B specific:** you cannot demonstrate that existing tests exercise the edited code (**Hit Proof**), or you fail to capture both pre‑ and post‑change **matching** passing snippets from the same selection. +* **Routine C breach:** you change production code while in a spike. **Recovery procedure:** -Update the plan (`in_progress: create failing test`), post a preamble, create the failing test, run it, capture the report snippet, then resume. +Update the plan (`in_progress: create failing test`), post a description of your next step, create the failing test, run it, capture the report snippet, then resume. +For Routine B refactors: if any gate fails, **switch to Full TDD** and add the smallest failing test. --- -## Preamble & Evidence Protocol (Mandatory) - -Before any grouped actions (builds, tests, patches), post a **short preamble**: - -**Preamble template** - -``` -Preamble: Reproduce bug at smallest scope. -Module: -Commands: - mvn -o -pl -Dtest=Class#method verify | tail -500 -Expectation: test fails with the reported error. -``` +## Evidence Protocol (Mandatory) -After each grouped action, post an **Evidence block**: +After each grouped action, post an **Evidence block**, then continue working: **Evidence template** - ``` Evidence: Command: mvn -o -pl -Dtest=Class#method verify Report: /target/surefire-reports/.txt Snippet: - +\ ``` +**Routine B additions** +* **Pre‑green:** capture a pre‑change **passing** snippet from the **most specific** test selection that hits your code (ideally a class or method). +* **Hit Proof (choose one):** + * An existing test class/method that directly calls the edited class/method, plus a short `rg -n` snippet showing the call site; **or** + * A Surefire/Failsafe output line containing the edited class/method names; **or** + * A temporary assertion or deliberate, isolated failing check in a **scratch test** proving the path is executed (then remove). 
+* **Post‑green:** after the patch, re‑run the **same selection** and capture a passing snippet. + --- ## Living Plan Protocol (Sharper) @@ -122,17 +150,19 @@ Snippet: Maintain a **living plan** with checklist items (5–7 words each). Keep **exactly one** `in_progress`. **Plan format** - ``` + Plan -- [done] sanity build quick profile -- [in_progress] add smallest failing test -- [todo] minimal root-cause fix -- [todo] rerun focused then module tests -- [todo] format, verify, summary -``` -**Rule:** If you deviate, update the plan **first** (switch `in_progress`), then proceed. Do not let plan and actions drift out of sync. +- [done] sanity build quick profile +- [in_progress] add smallest failing test +- [todo] minimal root-cause fix +- [todo] rerun focused then module tests +- [todo] format, verify, summary + +``` + +**Rule:** If you deviate, update the plan **first**, then proceed. --- @@ -140,22 +170,18 @@ Plan * **JDK:** 11 (minimum). The project builds and runs on Java 11+. * **Maven default:** run **offline** using `-o` whenever possible. -* **Network:** only when needed to fetch missing deps/plugins; then rerun the exact command **without** `-o` once, and return to offline. -* **Large project:** some module test suites can take **5–10 minutes**. Be patient, but bias toward **targeted** runs to keep momentum. +* **Network:** only to fetch missing deps/plugins; then rerun once without `-o`, and return offline. +* **Large project:** some module test suites can take **5–10 minutes**. Prefer **targeted** runs. ### Maven `-am` usage (house rule) -`-am` (also-make) pulls in required upstream modules. That’s helpful for **compiles**, but hazardous for **tests**: Maven will advance included modules to the same lifecycle phase and run **their** tests too. +`-am` (also-make) is helpful for **compiles** but hazardous for **tests**: Maven advances the upstream modules to the same lifecycle phase and runs **their** tests too. -**Rule of thumb** - -* ✅ Use `-am` **only** for compile/verify with tests skipped (e.g.
`-Pquick`).: * `mvn -o -pl -am -Pquick install` +* ✅ Use `-am` **only** for compile/verify with tests skipped (e.g. `-Pquick`): + * `mvn -o -pl -am -Pquick install` * ❌ Do **not** use `-am` with `verify` when tests are enabled. **Two-step pattern (fast + safe)** - 1. **Compile deps fast (skip tests):** `mvn -o -pl -am -Pquick install` 2. **Run tests:** @@ -171,119 +197,125 @@ It is illegal to `-q` when running tests! The Maven reactor resolves inter-module dependencies from the local Maven repository (`~/.m2/repository`). Running `install` publishes your changed modules there so downstream modules and tests pick up the correct versions. -- Always run `mvn -o -Pquick install | tail -200` before you start working. This command typically takes between 10 and 30 seconds. -- Always run `mvn -o -pl -am -Pquick install | tail -200` before any `verify` or test runs. -- If offline resolution fails due to a missing dependency or plugin, rerun the exact `install` command once without `-o`, then return offline. -- Skipping this step can lead to stale or missing artifacts during tests, producing confusing compilation or linkage errors. -- Never ever change the repo location. Never use `-Dmaven.repo.local=.m2_repo`. Instead, ask for permission the first time you run `mvn -o -Pquick install | tail -200`. +* Always run `mvn -o -Pquick install | tail -200` before you start working. This command typically takes up to 30 seconds. Never use a command timeout shorter than 30,000 ms. +* Always run `mvn -o -Pquick install | tail -200` before any `verify` or test runs. +* If offline resolution fails due to a missing dependency or plugin, rerun the exact `install` command once without `-o`, then return offline. +* Skipping this step can lead to stale or missing artifacts during tests, producing confusing compilation or linkage errors. +* Never ever change the repo location. Never use `-Dmaven.repo.local=.m2_repo`.
+* Always try these commands first to check whether they run in the sandbox without needing user approval. --- ## Quick Start (First 10 Minutes) 1. **Discover** - - * List modules: inspect root `pom.xml` (aggregator) and the module tree (see “Maven Module Overview” below). - * Search fast with ripgrep: `rg -n ""` + * Inspect root `pom.xml` and module tree (see “Maven Module Overview”). + * Search fast with ripgrep: `rg -n ""` 2. **Build sanity (fast, skip tests)** - - * **Preferred:** `mvn -o -Pquick install | tail -200` - * **Alternative:** `mvn -o -Pquick install | tail -200` - * This step is required before any tests. It installs artifacts to `~/.m2` so the reactor resolves fresh inter-module dependencies. + * `mvn -o -Pquick install | tail -200` 3. **Format (Java, imports, XML)** - - * `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` + * `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` 4. **Targeted tests (tight loops)** - - * By module: `mvn -o -pl verify | tail -500` - * Single class: `mvn -o -pl -Dtest=ClassName verify | tail -500` - * Single method: `mvn -o -pl -Dtest=ClassName#method verify | tail -500` - * Prerequisite: ensure `mvn -o -Pquick install` (root or `-pl -am`) has just run so artifacts are available in `~/.m2`. + * Module: `mvn -o -pl verify | tail -500` + * Class: `mvn -o -pl -Dtest=ClassName verify | tail -500` + * Method: `mvn -o -pl -Dtest=ClassName#method verify | tail -500` 5. **Inspect failures** - - * **Unit (Surefire):** `/target/surefire-reports/` - * **IT (Failsafe):** `/target/failsafe-reports/` It is illegal to `-am` when running tests! It is illegal to `-q` when running tests!
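Both the Evidence block and the Routine B pre/post‑green comparison hinge on the summary line of a per‑class Surefire/Failsafe `.txt` report. The helper below is a hypothetical sketch (`EvidenceSnippet`, `extract` and `isGreen` are invented names; the repository does not ship it). Its regular expression assumes the `Tests run: N, Failures: N, Errors: N, Skipped: N` summary format, which can vary slightly between Surefire versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the repository) that pulls an Evidence snippet out of a
 * Surefire/Failsafe plain-text report. Assumes the usual
 * "Tests run: N, Failures: N, Errors: N, Skipped: N" summary line.
 */
public final class EvidenceSnippet {

    private static final Pattern SUMMARY = Pattern
            .compile("Tests run: (\\d+), Failures: (\\d+), Errors: (\\d+), Skipped: (\\d+)");

    private EvidenceSnippet() {
    }

    /** Returns the first summary line plus up to {@code maxExtra} following lines (e.g. a failure). */
    public static List<String> extract(List<String> reportLines, int maxExtra) {
        List<String> snippet = new ArrayList<>();
        for (int i = 0; i < reportLines.size(); i++) {
            if (SUMMARY.matcher(reportLines.get(i)).find()) {
                int end = Math.min(reportLines.size(), i + 1 + maxExtra);
                snippet.addAll(reportLines.subList(i, end));
                break;
            }
        }
        return snippet;
    }

    /** True only if the summary shows at least one test run and zero failures and errors. */
    public static boolean isGreen(List<String> reportLines) {
        for (String line : reportLines) {
            Matcher m = SUMMARY.matcher(line);
            if (m.find()) {
                int run = Integer.parseInt(m.group(1));
                int failures = Integer.parseInt(m.group(2));
                int errors = Integer.parseInt(m.group(3));
                // A selection that ran zero tests proves nothing, so it is not "green".
                return run > 0 && failures == 0 && errors == 0;
            }
        }
        return false; // no summary line at all: treat as not green
    }

    public static void main(String[] args) throws IOException {
        Path report = Paths.get(args[0]);
        List<String> lines = Files.readAllLines(report);
        System.out.println("Report: " + report);
        System.out.println("Snippet:");
        extract(lines, 5).forEach(l -> System.out.println("  " + l));
        System.out.println("Green: " + isGreen(lines));
    }
}
```

With Java 11's single‑file launcher it can be run as `java EvidenceSnippet.java path/to/target/surefire-reports/ClassName.txt`. Running it against the same report selection before and after a Routine B change yields the matching pre/post‑green pair; a run with zero tests is deliberately not counted as green.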
--- -## Bugfix Workflow (Mandatory) +## Routine A — Full TDD (Default) -* **Reproduce first:** write the smallest focused test (class/method) that reproduces the reported bug **inside this repo**. Run it and confirm it fails with the same error/stacktrace. **Do not proceed without this.** -* **Keep the test as‑is:** do not weaken assertions or mute the failure. The failing test is your proof you’ve hit the right code path. -* **Fix at the root:** implement the minimal, surgical change in the correct module that addresses the underlying cause (no band‑aids). -* **Verify locally:** re‑run the focused test, then the surrounding module’s tests. Use targeted Maven invocations (class/method → module). Avoid `-am` with tests. -* **Broaden if needed:** only after green targeted runs, expand scope to neighboring modules when changes cross boundaries. -* **Document clearly:** in your final handoff, show the failing test before the fix, the root cause, the minimal fix, and passing tests after. Include **preamble** and **evidence** blocks. +> Use for **all behavior‑changing work** and whenever Routine B gates do not all pass. -### Hard Gates (Do Not Proceed Unless True) +### Bugfix Workflow (Mandatory) -* A failing test exists at the smallest scope (method/class) reproducing the report. +* **Reproduce first:** write the smallest focused test (class/method) that reproduces the reported bug **inside this repo**. Confirm it fails. +* **Keep the test as‑is:** do not weaken assertions or mute the failure. +* **Fix at the root:** minimal, surgical change in the correct module. +* **Verify locally:** re‑run the focused test, then the module’s tests. Avoid `-am`/`-q` with tests. +* **Broaden if needed:** expand scope only after targeted greens. +* **Document clearly:** failing output (pre‑fix), root cause, minimal fix, passing output (post‑fix). - * Show the failing command and include a snippet of the error/stack from `target/*-reports/`. 
+### Hard Gates + +* A failing test exists at the smallest scope (method/class). * **No production patch before the failing test is observed and recorded.** * Test runs avoid `-am` and `-q`. - * Use `-am` only with `-Pquick` to compile deps with tests skipped, then run tests without `-am`. -* Maintain a living plan with exactly one `in_progress` step; send a short preamble before long actions. +--- -### Required Sequence +## Routine B — Change without new tests (Proportional, gated) -1. **Reproduce first** +> Use **only** when at least one Allowed Case applies **and** all Routine B **Gates** pass. - * Add the smallest failing test in the correct module. - * Run it directly: `mvn -o -pl -Dtest=Class#method verify | tail -500` - * Inspect `target/surefire-reports/` (or `target/failsafe-reports/`) and capture the failure. -2. **Fix at the root (minimal, surgical)** +### Allowed cases (one or more) +1. **Bugfix with existing failing test** in this repo (pinpoints class/method). +2. **Strictly behavior‑neutral refactor / cleanup / micro‑perf** with clear existing coverage hitting the edited path. +3. **Migration/rename/autogen refresh** where behavior is already characterized by existing tests. +4. **Build/CI/docs/logging/message changes** that do not alter runtime behavior or asserted outputs. +5. **Data/resource tweaks** not asserted by tests and not affecting behavior. - * Change the correct layer; avoid widening APIs/configs. -3. **Verify locally (tight loops)** +### Routine B Gates (all must pass) +- **Neutrality/Scope:** No externally observable behavior change. Localized edit. +- **Hit Proof:** Demonstrate tests exercise the edited code. +- **Pre/Post Green Match:** Same smallest‑scope selection, passing before and after. +- **Risk Check:** No concurrency/time/IO semantics touched; no public API, serialization, parsing, or ordering changes. +- **Reversibility:** Change is easy to revert if needed. - * Re-run the exact test selection; then run the whole module. 
-4. **Broaden only if necessary** +**If any gate fails → switch to Routine A.** - * Expand scope when changes cross module boundaries or neighbors fail. -5. **Document clearly** +--- - * Include: failing output (pre‑fix), root cause, minimal fix, passing output (post‑fix). +## Routine C — Spike / Investigate (No production changes) -### Quick Self‑Check Before First Code Patch +> Use for exploration, triage, design spikes, and measurement. **No production code edits.** -1. Do I have a failing test and its report snippet saved? -2. Am I using legal Maven flags for tests (no `-am`, no `-q`)? -3. Is my next step in the plan marked `in_progress` and did I state a preamble? -4. Is my fix located at the correct source of truth, not a workaround? +**You may:** +- Add temporary scratch tests, assertions, scripts, or notes. +- Capture measurements, traces, logs. ---- +**Hand‑off must include:** +- Description, commands, and artifacts (logs/notes). +- Findings, options, and a proposed next routine (A or B). +- Removal of any temporary code if not adopted. -## Working Loop - -* **Plan** +--- - * Break task into **small, verifiable steps**; keep one step in progress. - * Announce a short preamble before long actions (builds/tests). - * Decide and proceed autonomously; document assumptions inline. -* **Change** +## Where to Draw the Line — A Short Debate - * Make minimal, surgical edits. Keep style and structure consistent. -* **Format** +> **Purist:** “All changes must start with a failing test.” +> **Pragmatist:** “For refactors that can’t fail first without faking it, prove coverage and equality of behavior.” - * `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` -* **Compile (fast)** +**In‑scope for Routine B (examples)** +* Rename private methods; extract helper; dead‑code removal. +* Replace straightforward loop with stream (same results, same ordering). +* Tighten generics/nullability/annotations without observable change. 
+* Micro‑perf cache within a method with deterministic inputs and strong coverage. +* Logging/message tweaks **not** asserted by tests. +* Build/CI config that doesn’t alter runtime behavior. - * **Iterate locally:** `mvn -o -pl -am -Pquick install | tail -500` -* **Test** +**Out‑of‑scope (use Routine A)** +* Changing query results, serialization, or parsing behavior. +* Altering error messages that tests assert. +* Anything touching concurrency, timeouts, IO, or ordering. +* New SPARQL function support or extended syntax (even “tiny”). +* Public API changes or cross‑module migrations with unclear blast radius. - * Start with the smallest scope that exercises your change (class → module). - * For integration‑impacted changes, run module `verify` (includes ITs). -* **Triage** +--- - * Read reports; fix root cause; expand scope **only when needed**. -* **Iterate** +## Working Loop - * Keep moving without waiting for permission between steps. Escalate only at blocking points. - * Repeat until **Definition of Done** is satisfied. +* **Plan:** small, verifiable steps; keep one `in_progress`. +* **Change:** minimal, surgical edits; keep style/structure consistent. +* **Format:** `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` +* **Compile (fast):** `mvn -o -pl -am -Pquick install | tail -500` +* **Test:** start smallest (class/method → module). For integration, run module `verify`. +* **Triage:** read reports; fix root cause; expand scope only when needed. +* **Iterate:** keep momentum; escalate only when blocked or irreversible. It is illegal to `-am` when running tests! It is illegal to `-q` when running tests! @@ -293,174 +325,66 @@ It is illegal to `-q` when running tests! ## Testing Strategy * **Prefer module tests you touched:** `-pl ` -* **Narrow further** to a class/method for tight loops; then broaden to the module. 
-* **Expand scope** when: - - * Your change crosses module boundaries, or - * Neighbor module failures indicate integration impact. +* **Narrow further** to a class/method; then broaden to the module. +* **Expand scope** when changes cross boundaries or neighbor modules fail. * **Read reports** - - * Surefire (unit): `target/surefire-reports/` - * Failsafe (IT): `target/failsafe-reports/` + * Surefire (unit): `target/surefire-reports/` + * Failsafe (IT): `target/failsafe-reports/` * **Helpful flags** - - * `-Dtest=Class#method` (unit selection) - * `-Dit.test=ITClass#method` (integration selection) - * `-DtrimStackTrace=false` (full traces) - * `-DskipITs` (focus on unit tests) - * `-DfailIfNoTests=false` (when selecting a class that has no tests on some platforms) + * `-Dtest=Class#method` (unit selection) + * `-Dit.test=ITClass#method` (integration selection) + * `-DtrimStackTrace=false` (full traces) + * `-DskipITs` (focus on unit tests) + * `-DfailIfNoTests=false` (when selecting a class that has no tests on some platforms) ### Optional: Redirect test stdout/stderr to files - -To help automated agents inspect what a test printed to the console, you may redirect `System.out`/`System.err` to per‑class files generated by Surefire/Failsafe. This is **optional** and should be used only when it aids triage—house rules still apply (no `-am` with tests, no `-q`). - -**Unit tests (Surefire):** - ```bash mvn -o -pl -Dtest=ClassName[#method] -Dmaven.test.redirectTestOutputToFile=true verify | tail -500 -``` +``` -Logs will appear under: +Logs are written to: ``` /target/surefire-reports/ClassName-output.txt ``` -**Integration tests (Failsafe):** - -```bash -mvn -o -pl -Dit.test=ITClassName[#method] -Dmaven.test.redirectTestOutputToFile=true verify | tail -500 -``` - -Logs will appear under: - -``` -/target/failsafe-reports/ITClassName-output.txt -``` - -Notes: -* Capture is **per test class**, not per method. Multiple methods in the same class share one `*-output.txt`.
-* Only output actually written to the console is captured. If your logging configuration writes solely to files, you won’t see it here. -* Continue to include the normal **Evidence** snippet from the Surefire/Failsafe report. You may additionally quote lines from the corresponding `*-output.txt` when useful for debugging. +(Failsafe works the same way: select with `-Dit.test=ITClassName[#method]`; output lands in `target/failsafe-reports/ITClassName-output.txt`.) --- ## Assertions: Make invariants explicit -Assertions are executable claims about what must be true. They’re the fastest way to surface “impossible” states and to localize bugs at the line that crossed a boundary it had no business crossing. Use them both as **temporary tripwires** during investigation and as **permanent contracts** once an invariant is known to matter. - -**Two useful flavors** - -* **Temporary tripwires (debug asserts):** Add while hunting a failing test or weird behavior. Keep them cheap, contextual, and local to the suspect path. Remove after the mystery is solved **or** convert to permanent checks if the invariant is genuinely important. -* **Permanent contracts:** Encode **preconditions** (valid inputs), **postconditions** (valid outputs), and **invariants** (state that must always hold). These stay and prevent regressions. - -**Where to add assertions** +Assertions are executable claims about what must be true. Use **temporary tripwires** during investigation and **permanent contracts** once an invariant matters. -* At **module boundaries** and **after parsing/external calls** (validate assumptions about returned/decoded data). -* Around **state transitions** (illegal transitions should fail loudly). -* In **concurrency hotspots** (e.g., “lock must be held”, “no concurrent mutation”). -* Before/after **caching, batching, or memoization** (keys, sizes, ordering, monotonicity). -* For **exhaustive enums** in `switch` statements (treat unexpected values as hard errors). - -**How to write good assertions** - -* One fact per assert. Fail **fast**, fail **usefully**.
-* Include **stable context** in the message (ids, sizes, states) so the failure is self‑explanatory. -* Avoid side effects in the condition or message. Assertions may be disabled in some runtimes. -* Keep them **cheap**: no I/O, heavy allocations, or deep logging in the message. -* Don’t use asserts for **user‑facing validation**. Raise exceptions for expected bad inputs. +* One fact per assert; fail fast and usefully. +* Include stable context in messages; avoid side effects. +* Keep asserts cheap; don’t replace user input validation with asserts. **Java specifics** -* **Enable VM assertions in tests.** Tests must run with `-ea` so `assert` is active. -* Use **`assert`** for debug‑only invariants that “cannot happen.” Use **exceptions** for runtime guarantees: - - * Preconditions: `IllegalArgumentException` / `Objects.requireNonNull` (or Guava `Preconditions` if present). - * Invariants: `IllegalStateException`. -* Prefer treating unexpected enum values as **hard errors** rather than adding a quiet `default` path. - -**Concrete examples** - -Precondition (permanent) - -```java -void setPort(int port) { - if (port < 1 || port > 65_535) { - throw new IllegalArgumentException("port out of range: " + port); - } - this.port = port; -} -``` - -Invariant (permanent) - -```java -void advance(State next) { - if (!allowedTransitions.get(state).contains(next)) { - throw new IllegalStateException("Illegal transition " + state + " → " + next); - } - state = next; -} -``` - -Debug tripwire (temporary; remove or convert later) - -```java -// Narrow a flaky failure around ordering -assert isSorted(results) : "unsorted results, size=" + results.size() + " ids=" + ids(results); -``` - -Unreachable (hard error) - -```java -switch (kind) { - case A: return handleA(); - case B: return handleB(); - default: - throw new IllegalStateException("Unhandled kind: " + kind); -} -``` +* Enable VM assertions in tests (`-ea`). 
+* Use exceptions for runtime guarantees; `assert` for “cannot happen”. -Concurrency assumption - -```java -synchronized void put(String k, String v) { - assert Thread.holdsLock(this) : "put must hold instance monitor"; - // ... -} -``` - -House rule: Asserts are allowed and encouraged. Removing or weakening an assertion to “make it pass” is strictly forbidden — fix the cause, not the guardrail. +(Concrete examples omitted here for brevity; keep your current patterns.) --- ## Triage Playbook -* **Missing dep/plugin offline** - - * Remedy: **rerun the exact command without `-o`** once to fetch; then return offline. -* **Compilation errors** - - * Fix imports, generics, visibility; re‑run quick install (skip tests) in the **module**. -* **Flaky/slow tests** - - * Run the specific failing test; read its report; stabilize root cause before broad runs. -* **Formatting failures** - - * Run formatter/import/XML sort; re‑verify. -* **License header missing** - - * Add header for **new** files only (see “Source File Headers”); **do not** change years on existing files. +* **Missing dep/plugin offline:** rerun the exact command once **without** `-o`, then return offline. +* **Compilation errors:** fix imports/generics/visibility; quick install in the module. +* **Flaky/slow tests:** run the specific failing test; stabilize root cause before broad runs. +* **Formatting failures:** run formatter/import/XML sort; re‑verify. +* **License header missing:** add for **new** files only; do not change years on existing files. --- ## Code Formatting -* **Always run before finalizing:** +* Always run before finalizing: * `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` -* **Style:** no wildcard imports; 120‑char width; curly braces always; LF line endings. -* **Tip:** formatting/import sort may be validated during `verify`. Running the commands proactively avoids CI/style failures. +* Style: no wildcard imports; 120‑char width; curly braces always; LF endings. 
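The Java guidance under “Assertions: Make invariants explicit” (exceptions for runtime guarantees, `assert` for “cannot happen”) can be illustrated with a small class. `Endpoint` and its methods are hypothetical, written only for this sketch; it shows a permanent precondition, a permanent invariant on state transitions, an unreachable `switch` branch, and a cheap debug tripwire.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/**
 * Hypothetical example (not repository code): exceptions enforce runtime contracts,
 * while {@code assert} documents "cannot happen" states and only runs under -ea.
 */
public final class Endpoint {

    enum State { NEW, OPEN, CLOSED }

    // Allowed transitions; anything else is an invariant violation.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.NEW, EnumSet.of(State.OPEN, State.CLOSED));
        ALLOWED.put(State.OPEN, EnumSet.of(State.CLOSED));
        ALLOWED.put(State.CLOSED, EnumSet.noneOf(State.class));
    }

    private final String host;
    private final int port;
    private State state = State.NEW;

    /** Preconditions (permanent): bad input is the caller's fault, so throw. */
    Endpoint(String host, int port) {
        this.host = Objects.requireNonNull(host, "host must not be null");
        if (port < 1 || port > 65_535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        this.port = port;
    }

    /** Invariant (permanent): illegal transitions fail loudly, with stable context. */
    void advance(State next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException("Illegal transition " + state + " -> " + next);
        }
        state = next;
        // Debug tripwire: cheap and side-effect free; only evaluated under -ea.
        assert state != State.NEW : "state can never return to NEW";
    }

    /** Unreachable branch: an unexpected enum value is a hard error, not a quiet default. */
    String describe() {
        switch (state) {
        case NEW:
            return "new";
        case OPEN:
            return "open " + host + ":" + port;
        case CLOSED:
            return "closed";
        default:
            throw new IllegalStateException("Unhandled state: " + state);
        }
    }
}
```

Only the `assert` line depends on `-ea`; the exception‑based checks fire in every runtime, which is why user‑facing validation must never rely on `assert`.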
--- @@ -481,8 +405,6 @@ Use this exact header for **new Java files only** (replace `${year}` with curren *******************************************************************************/ ``` -Use this exact header. Be very precise. - Do **not** modify existing headers’ years. --- @@ -491,50 +413,35 @@ Do **not** modify existing headers’ years. * **Format:** `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` * **Compile (fast path):** `mvn -o -Pquick install | tail -200` -* **Tests (targeted):** `mvn -o -pl verify | tail -500` (broaden scope if needed) -* **Reports:** zero new failures in `target/surefire-reports/` or `target/failsafe-reports/`, or explain precisely. -* **Evidence:** include pre‑fix failing snippet and post‑fix passing summary. +* **Tests (targeted):** `mvn -o -pl verify | tail -500` (broaden as needed) +* **Reports:** zero new failures in Surefire/Failsafe, or explain precisely. +* **Evidence:** Routine A — failing pre‑fix + passing post‑fix. + Routine B — **pre/post green** from same selection + **Hit Proof**. --- ## Branching & Commit Conventions -- Branch names: start with `GH-XXXX` where `XXXX` is the GitHub issue number. Prefer a short, kebab‑case slug after the number when helpful, e.g., `GH-1234-add-trig-writer-check`. -- Commit messages: start with the same prefix, `GH-XXXX `, on every commit in the branch. -- Keep summaries concise, in imperative mood (e.g., “Fix NPE in TriG writer”). -- Example: - - Branch: `GH-1234-add-shacl-validation-metric` - - Commit: `GH-1234 Fix NPE when serializing empty graph` +* Branch names: start with `GH-XXXX` (GitHub issue number). Optional short slug, e.g., `GH-1234-trig-writer-check`. +* Commit messages: `GH-XXXX ` on every commit. --- ## Branch & PR Workflow (Agent) -- Confirm issue number first (mandatory): before creating a branch, pause and request/confirm the GitHub issue number. Do not proceed to branch creation until the issue number is provided or confirmed. 
-- Name branch: `GH--` (kebab‑case slug). -- Create branch: `git checkout -b GH-XXXX-your-slug`. -- Stage changes: `git add -A` (ensure new Java files have the required header). -- Optional but recommended: run format + quick install. - - `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format` - - `mvn -o -Pquick install | tail -200` -- Commit: `git commit -m "GH-XXXX "`. -- Push branch: `git push -u origin GH-XXXX-your-slug`. -- Create PR using default template: - - Preferred: `gh pr create --title "GH-XXXX " --body-file .github/pull_request_template.md` - - Fallback: `gh pr create --title "GH-XXXX " --body "$(cat .github/pull_request_template.md)"` -- Immediately fill the template (do not leave placeholders): - - Set `GitHub issue resolved: #XXXX`. - - Write a short, accurate change summary (what/why). - - Tick applicable checklist items only (self-contained, tests, squashed, commit message prefix, formatting if run). - - Include `Fixes #XXXX` to auto-close the issue on merge. -- Target the repo default branch (e.g., `origin/HEAD`). +* Confirm issue number first (mandatory). +* Branch: `git checkout -b GH-XXXX-your-slug` +* Stage: `git add -A` (ensure new Java files have the required header). +* Optional: formatter + quick install. +* Commit: `git commit -m "GH-XXXX "` +* Push & PR: use the default template; fill all fields; include `Fixes #XXXX`. --- ## Navigation & Search -* Fast file search: `rg --files` -* Fast content search: `rg -n ""` +* Files: `rg --files` +* Content: `rg -n ""` * Read big files in chunks: * `sed -n '1,200p' path/to/File.java` @@ -544,29 +451,17 @@ Do **not** modify existing headers’ years. ## Autonomy Rules (Act > Ask) -* **Default:** act with assumptions. Document assumptions in your plan and final answer. -* **Keep going:** chain steps without waiting for permission; send short progress updates before long actions. -* **Ask only when:** - - * Blocked by sandbox/approvals/network policy or missing secrets. 
- * The decision is destructive/irreversible, repo‑wide, or impacts public APIs. - * Adding dependencies, changing build profiles, or altering licensing. -* **Prefer reversible moves:** take the smallest local change that unblocks progress; validate with targeted tests before expanding scope. -* **Choose defaults** - - * **Tests:** start with `-pl `, then `-Dtest=Class#method` / `-Dit.test=ITClass#method`. - * **Build:** use `-o` quick/profiled commands; briefly drop `-o` to fetch missing deps, then return offline. - * **Formatting:** run formatter/impsort/xml‑format proactively before verify. - * **Reports:** read surefire/failsafe locally; expand scope only when necessary. -* **Error handling** +* **Default:** act with assumptions; document them. +* **Keep going:** chain steps; short progress updates before long actions. +* **Ask only when:** blocked by sandbox/approvals/network, or change is destructive/irreversible, or impacts public APIs/dependencies/licensing. +* **Prefer reversible moves:** smallest local change that unblocks progress; validate with targeted tests first. - * On compile/test failure: fix root cause locally, rerun targeted tests, then broaden. - * On flaky tests: rerun class/method; stabilize cause before repo‑wide runs. - * On formatting/license issues: apply prescribed commands/headers immediately. -* **Communication** +**Defaults** - * **Preambles:** 1–2 sentences grouping upcoming actions. - * **Updates:** inform to maintain visibility; do **not** request permission unless in “Ask only when” above. +* **Tests:** start with `-pl `, then `-Dtest=Class#method` / `-Dit.test=ITClass#method`. +* **Build:** use `-o`; drop `-o` once only to fetch; return offline. +* **Formatting:** run formatter/import/XML before verify. +* **Reports:** read surefire/failsafe locally; expand scope only when necessary. --- @@ -576,36 +471,31 @@ Do **not** modify existing headers’ years. * **Files touched:** list file paths. 
* **Commands run:** key build/test commands. * **Verification:** which tests passed, where you checked reports. -* **Evidence:** failing output (pre‑fix) and passing output (post‑fix) snippets. -* **Assumptions:** key assumptions and autonomous decisions you made. +* **Evidence:** + *Routine A:* failing output (pre‑fix) and passing output (post‑fix). + *Routine B:* pre‑ and post‑green snippets from the **same selection** + **Hit Proof**. + *Routine C:* artifacts from investigation (logs/notes/measurements) and proposed next steps. +* **Assumptions:** key assumptions and autonomous decisions. * **Limitations:** anything left or risky edge cases. -* **Next steps:** optional suggestions for follow‑ups. +* **Next steps:** optional follow‑ups. --- ## Running Tests -* By module: - - * `mvn -o -pl core/sail/shacl verify | tail -500` -* Entire repo: - - * `mvn -o verify` (long; only when appropriate) +* By module: `mvn -o -pl core/sail/shacl verify | tail -500` +* Entire repo: `mvn -o verify` (long; only when appropriate) * Slow tests (entire repo): - - * `mvn -o verify -PslowTestsOnly,-skipSlowTests | tail -500` + `mvn -o verify -PslowTestsOnly,-skipSlowTests | tail -500` * Slow tests (by module): - - * `mvn -o -pl verify -PslowTestsOnly,-skipSlowTests | tail -500` + `mvn -o -pl verify -PslowTestsOnly,-skipSlowTests | tail -500` * Slow tests (specific test): * `mvn -o -pl core/sail/shacl -PslowTestsOnly,-skipSlowTests -Dtest=ClassName#method verify | tail -500` * Integration tests (entire repo): - - * `mvn -o verify -PskipUnitTests | tail -500` + `mvn -o verify -PskipUnitTests | tail -500` * Integration tests (by module): - - * `mvn -o -pl verify -PskipUnitTests | tail -500` + `mvn -o -pl verify -PskipUnitTests | tail -500` * Useful flags: * `-Dtest=ClassName` @@ -618,23 +508,20 @@ Do **not** modify existing headers’ years. 
## Build * **Build without tests (fast path):** - - * `mvn -o -Pquick install` + `mvn -o -Pquick install` * **Verify with tests:** - - * Targeted module(s): `mvn -o -pl verify` - * Entire repo: `mvn -o verify` (use only when appropriate) + Targeted module(s): `mvn -o -pl verify` + Entire repo: `mvn -o verify` (use judiciously) * **When offline fails due to missing deps:** - - * Re‑run the **exact** command **without** `-o` once to fetch, then return to `-o`. + Re‑run the **exact** command **without** `-o` once to fetch, then return to `-o`. --- -## Prohibited Misinterpretations +## Prohibited Misinterpretations -* A user stack trace, reproduction script, or verbal description **is not evidence**. You must implement the smallest failing test **inside this repo**. -* “Obvious” contract violations (e.g., iterator returns `null`) still require a failing test first. Assumptions are not substitutes for evidence. -* “Quick fixes” are not quick if they bypass the workflow. They create audit gaps and regressions. +* A user stack trace, reproduction script, or verbal description **is not evidence** for behavior‑changing work. You must implement the smallest failing test **inside this repo**. +* For Routine B, a stack trace is neither required nor sufficient; **Hit Proof** plus **pre/post green** snippets are mandatory. +* Routine C must not change production code. 
--- diff --git a/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/CloseableIteration.java b/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/CloseableIteration.java index 15890fe2aa7..6023a7eb0c0 100644 --- a/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/CloseableIteration.java +++ b/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/CloseableIteration.java @@ -14,6 +14,8 @@ import java.util.Iterator; import java.util.stream.Stream; +import org.eclipse.rdf4j.model.Statement; + /** * An {@link CloseableIteration} that can be closed to free resources that it is holding. CloseableIterations * automatically free their resources when exhausted. If not read until exhaustion or if you want to make sure the @@ -33,6 +35,8 @@ */ public interface CloseableIteration extends Iterator, AutoCloseable { + EmptyIteration EMPTY_STATEMENT_ITERATION = new EmptyIteration<>(); + /** * Convert the results to a Java 8 Stream. 
* diff --git a/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/DualUnionIteration.java b/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/DualUnionIteration.java index 40a95ec76bd..2a490e43f2a 100644 --- a/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/DualUnionIteration.java +++ b/core/common/iterator/src/main/java/org/eclipse/rdf4j/common/iteration/DualUnionIteration.java @@ -50,7 +50,11 @@ public DualUnionIteration(Comparator cmp, public static CloseableIteration getWildcardInstance( CloseableIteration leftIteration, CloseableIteration rightIteration) { - if (rightIteration instanceof EmptyIteration) { + if (leftIteration == EMPTY_STATEMENT_ITERATION) { + return rightIteration; + } else if (rightIteration == EMPTY_STATEMENT_ITERATION) { + return leftIteration; + } else if (rightIteration instanceof EmptyIteration) { return leftIteration; } else if (leftIteration instanceof EmptyIteration) { return rightIteration; @@ -63,7 +67,11 @@ public static CloseableIteration getWildcardInstance( public static CloseableIteration getWildcardInstance(Comparator cmp, CloseableIteration leftIteration, CloseableIteration rightIteration) { - if (rightIteration instanceof EmptyIteration) { + if (leftIteration == EMPTY_STATEMENT_ITERATION) { + return rightIteration; + } else if (rightIteration == EMPTY_STATEMENT_ITERATION) { + return leftIteration; + } else if (rightIteration instanceof EmptyIteration) { return leftIteration; } else if (leftIteration instanceof EmptyIteration) { return rightIteration; @@ -75,7 +83,11 @@ public static CloseableIteration getWildcardInstance(Comparator public static CloseableIteration getInstance(CloseableIteration leftIteration, CloseableIteration rightIteration) { - if (rightIteration instanceof EmptyIteration) { + if (leftIteration == EMPTY_STATEMENT_ITERATION) { + return rightIteration; + } else if (rightIteration == EMPTY_STATEMENT_ITERATION) { + return leftIteration; + } else if 
(rightIteration instanceof EmptyIteration) { return leftIteration; } else if (leftIteration instanceof EmptyIteration) { return rightIteration; diff --git a/core/query/src/main/java/org/eclipse/rdf4j/query/AbstractBindingSet.java b/core/query/src/main/java/org/eclipse/rdf4j/query/AbstractBindingSet.java index dd5108c77cf..22aad0a5ef8 100644 --- a/core/query/src/main/java/org/eclipse/rdf4j/query/AbstractBindingSet.java +++ b/core/query/src/main/java/org/eclipse/rdf4j/query/AbstractBindingSet.java @@ -26,6 +26,9 @@ public abstract class AbstractBindingSet implements BindingSet { @Override public boolean equals(Object other) { + if (other == null) { + return false; + } if (this == other) { return true; } diff --git a/core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/ArrayBindingSet.java b/core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/ArrayBindingSet.java index 08b7960e499..f22b6acb07e 100644 --- a/core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/ArrayBindingSet.java +++ b/core/queryalgebra/evaluation/src/main/java/org/eclipse/rdf4j/query/algebra/evaluation/ArrayBindingSet.java @@ -403,6 +403,23 @@ public void addAll(ArrayBindingSet other) { } +// @Override +// public boolean equals(Object other){ +// if(other == null) return false; +// if(other == this) return true; +// +// if(other.getClass() != ArrayBindingSet.class){ +// return super.equals(other); +// } +// +// ArrayBindingSet that = (ArrayBindingSet) other; +// +// // TODO make a faster equals for ArrayBindingSet +// return super.equals(that); +// +// +// } + private class ArrayBindingSetIterator implements Iterator { private int index = 0; diff --git a/core/sail/base/src/main/java/org/eclipse/rdf4j/sail/base/SailDatasetImpl.java b/core/sail/base/src/main/java/org/eclipse/rdf4j/sail/base/SailDatasetImpl.java index b90a9008657..5f6fd74407d 100644 --- 
a/core/sail/base/src/main/java/org/eclipse/rdf4j/sail/base/SailDatasetImpl.java +++ b/core/sail/base/src/main/java/org/eclipse/rdf4j/sail/base/SailDatasetImpl.java @@ -46,7 +46,6 @@ class SailDatasetImpl implements SailDataset { private static final EmptyIteration TRIPLE_EMPTY_ITERATION = new EmptyIteration<>(); private static final EmptyIteration NAMESPACES_EMPTY_ITERATION = new EmptyIteration<>(); - private static final EmptyIteration STATEMENT_EMPTY_ITERATION = new EmptyIteration<>(); /** * {@link SailDataset} of the backing {@link SailSource}. @@ -286,7 +285,7 @@ public CloseableIteration getStatements(Resource subj, IRI } else if (iter != null) { return iter; } else { - return STATEMENT_EMPTY_ITERATION; + return CloseableIteration.EMPTY_STATEMENT_ITERATION; } } diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/IndexKeyWriters.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/IndexKeyWriters.java new file mode 100644 index 00000000000..dbc039913d0 --- /dev/null +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/IndexKeyWriters.java @@ -0,0 +1,403 @@ +/******************************************************************************* + * Copyright (c) 2025 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. 
+ * + * SPDX-License-Identifier: BSD-3-Clause + *******************************************************************************/ +package org.eclipse.rdf4j.sail.lmdb; + +import java.nio.ByteBuffer; + +final class IndexKeyWriters { + + private IndexKeyWriters() { + } + + @FunctionalInterface + interface KeyWriter { + void write(ByteBuffer bb, long subj, long pred, long obj, long context); + } + + @FunctionalInterface + interface MatcherFactory { + boolean[] create(long subj, long pred, long obj, long context); + } + + static KeyWriter forFieldSeq(String fieldSeq) { + switch (fieldSeq) { + case "spoc": + return IndexKeyWriters::spoc; + case "spco": + return IndexKeyWriters::spco; + case "sopc": + return IndexKeyWriters::sopc; + case "socp": + return IndexKeyWriters::socp; + case "scpo": + return IndexKeyWriters::scpo; + case "scop": + return IndexKeyWriters::scop; + case "psoc": + return IndexKeyWriters::psoc; + case "psco": + return IndexKeyWriters::psco; + case "posc": + return IndexKeyWriters::posc; + case "pocs": + return IndexKeyWriters::pocs; + case "pcso": + return IndexKeyWriters::pcso; + case "pcos": + return IndexKeyWriters::pcos; + case "ospc": + return IndexKeyWriters::ospc; + case "oscp": + return IndexKeyWriters::oscp; + case "opsc": + return IndexKeyWriters::opsc; + case "opcs": + return IndexKeyWriters::opcs; + case "ocsp": + return IndexKeyWriters::ocsp; + case "ocps": + return IndexKeyWriters::ocps; + case "cspo": + return IndexKeyWriters::cspo; + case "csop": + return IndexKeyWriters::csop; + case "cpso": + return IndexKeyWriters::cpso; + case "cpos": + return IndexKeyWriters::cpos; + case "cosp": + return IndexKeyWriters::cosp; + case "cops": + return IndexKeyWriters::cops; + default: + throw new IllegalArgumentException("Unsupported field sequence: " + fieldSeq); + } + } + + static MatcherFactory matcherFactory(String fieldSeq) { + switch (fieldSeq) { + case "spoc": + return IndexKeyWriters::spocShouldMatch; + case "spco": + return 
IndexKeyWriters::spcoShouldMatch; + case "sopc": + return IndexKeyWriters::sopcShouldMatch; + case "socp": + return IndexKeyWriters::socpShouldMatch; + case "scpo": + return IndexKeyWriters::scpoShouldMatch; + case "scop": + return IndexKeyWriters::scopShouldMatch; + case "psoc": + return IndexKeyWriters::psocShouldMatch; + case "psco": + return IndexKeyWriters::pscoShouldMatch; + case "posc": + return IndexKeyWriters::poscShouldMatch; + case "pocs": + return IndexKeyWriters::pocsShouldMatch; + case "pcso": + return IndexKeyWriters::pcsoShouldMatch; + case "pcos": + return IndexKeyWriters::pcosShouldMatch; + case "ospc": + return IndexKeyWriters::ospcShouldMatch; + case "oscp": + return IndexKeyWriters::oscpShouldMatch; + case "opsc": + return IndexKeyWriters::opscShouldMatch; + case "opcs": + return IndexKeyWriters::opcsShouldMatch; + case "ocsp": + return IndexKeyWriters::ocspShouldMatch; + case "ocps": + return IndexKeyWriters::ocpsShouldMatch; + case "cspo": + return IndexKeyWriters::cspoShouldMatch; + case "csop": + return IndexKeyWriters::csopShouldMatch; + case "cpso": + return IndexKeyWriters::cpsoShouldMatch; + case "cpos": + return IndexKeyWriters::cposShouldMatch; + case "cosp": + return IndexKeyWriters::cospShouldMatch; + case "cops": + return IndexKeyWriters::copsShouldMatch; + default: + throw new IllegalArgumentException("Unsupported field sequence: " + fieldSeq); + } + } + + static void spoc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + } + + static void spco(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + } + + static void sopc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + 
Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + } + + static void socp(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + } + + static void scpo(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + } + + static void scop(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + } + + static void psoc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + } + + static void psco(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + } + + static void posc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + } + + static void pocs(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + } + + static void pcso(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, obj); + } + + static void pcos(ByteBuffer bb, long subj, long pred, long obj, long context) { + 
Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + } + + static void ospc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + } + + static void oscp(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + } + + static void opsc(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, context); + } + + static void opcs(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + } + + static void ocsp(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + } + + static void ocps(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + } + + static void cspo(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + } + + static void csop(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + } + + static void cpso(ByteBuffer bb, long subj, long 
pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, obj); + } + + static void cpos(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + } + + static void cosp(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, subj); + Varint.writeUnsigned(bb, pred); + } + + static void cops(ByteBuffer bb, long subj, long pred, long obj, long context) { + Varint.writeUnsigned(bb, context); + Varint.writeUnsigned(bb, obj); + Varint.writeUnsigned(bb, pred); + Varint.writeUnsigned(bb, subj); + } + + static boolean[] spocShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, pred > 0, obj > 0, context >= 0 }; + } + + static boolean[] spcoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, pred > 0, context >= 0, obj > 0 }; + } + + static boolean[] sopcShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, obj > 0, pred > 0, context >= 0 }; + } + + static boolean[] socpShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, obj > 0, context >= 0, pred > 0 }; + } + + static boolean[] scpoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, context >= 0, pred > 0, obj > 0 }; + } + + static boolean[] scopShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { subj > 0, context >= 0, obj > 0, pred > 0 }; + } + + static boolean[] psocShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, subj > 0, obj > 0, context >= 0 }; + } + + static boolean[] 
pscoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, subj > 0, context >= 0, obj > 0 }; + } + + static boolean[] poscShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, obj > 0, subj > 0, context >= 0 }; + } + + static boolean[] pocsShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, obj > 0, context >= 0, subj > 0 }; + } + + static boolean[] pcsoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, context >= 0, subj > 0, obj > 0 }; + } + + static boolean[] pcosShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { pred > 0, context >= 0, obj > 0, subj > 0 }; + } + + static boolean[] ospcShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, subj > 0, pred > 0, context >= 0 }; + } + + static boolean[] oscpShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, subj > 0, context >= 0, pred > 0 }; + } + + static boolean[] opscShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, pred > 0, subj > 0, context >= 0 }; + } + + static boolean[] opcsShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, pred > 0, context >= 0, subj > 0 }; + } + + static boolean[] ocspShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, context >= 0, subj > 0, pred > 0 }; + } + + static boolean[] ocpsShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { obj > 0, context >= 0, pred > 0, subj > 0 }; + } + + static boolean[] cspoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { context >= 0, subj > 0, pred > 0, obj > 0 }; + } + + static boolean[] csopShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { 
context >= 0, subj > 0, obj > 0, pred > 0 }; + } + + static boolean[] cpsoShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { context >= 0, pred > 0, subj > 0, obj > 0 }; + } + + static boolean[] cposShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { context >= 0, pred > 0, obj > 0, subj > 0 }; + } + + static boolean[] cospShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { context >= 0, obj > 0, subj > 0, pred > 0 }; + } + + static boolean[] copsShouldMatch(long subj, long pred, long obj, long context) { + return new boolean[] { context >= 0, obj > 0, pred > 0, subj > 0 }; + } +} diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbRecordIterator.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbRecordIterator.java index 197c68deb5f..c9553f0fc3b 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbRecordIterator.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbRecordIterator.java @@ -29,7 +29,7 @@ import org.eclipse.rdf4j.sail.SailException; import org.eclipse.rdf4j.sail.lmdb.TripleStore.TripleIndex; import org.eclipse.rdf4j.sail.lmdb.TxnManager.Txn; -import org.eclipse.rdf4j.sail.lmdb.Varint.GroupMatcher; +import org.eclipse.rdf4j.sail.lmdb.util.GroupMatcher; import org.lwjgl.PointerBuffer; import org.lwjgl.system.MemoryStack; import org.lwjgl.util.lmdb.MDBVal; @@ -45,11 +45,17 @@ class LmdbRecordIterator implements RecordIterator { private final TripleIndex index; + private final long subj; + private final long pred; + private final long obj; + private final long context; + private final long cursor; private final MDBVal maxKey; - private final GroupMatcher groupMatcher; + private final boolean matchValues; + private GroupMatcher groupMatcher; private final Txn txnRef; @@ -81,6 +87,10 @@ class LmdbRecordIterator implements RecordIterator { LmdbRecordIterator(TripleIndex index, 
boolean rangeSearch, long subj, long pred, long obj, long context, boolean explicit, Txn txnRef) throws IOException { + this.subj = subj; + this.pred = pred; + this.obj = obj; + this.context = context; this.pool = Pool.get(); this.keyData = pool.getVal(); this.valueData = pool.getVal(); @@ -100,12 +110,8 @@ class LmdbRecordIterator implements RecordIterator { this.maxKey = null; } - boolean matchValues = subj > 0 || pred > 0 || obj > 0 || context >= 0; - if (matchValues) { - this.groupMatcher = index.createMatcher(subj, pred, obj, context); - } else { - this.groupMatcher = null; - } + this.matchValues = subj > 0 || pred > 0 || obj > 0 || context >= 0; + this.dbi = index.getDB(explicit); this.txnRef = txnRef; this.txnLockManager = txnRef.lockManager(); @@ -145,6 +151,7 @@ public long[] next() { } if (txnRefVersion != txnRef.version()) { + // TODO: None of the tests in the LMDB Store cover this case! // cursor must be renewed mdb_cursor_renew(txn, cursor); if (fetchNext) { @@ -188,7 +195,7 @@ public long[] next() { // if (maxKey != null && TripleStore.COMPARATOR.compare(keyData.mv_data(), maxKey.mv_data()) > 0) { if (maxKey != null && mdb_cmp(txn, dbi, keyData, maxKey) > 0) { lastResult = MDB_NOTFOUND; - } else if (groupMatcher != null && !groupMatcher.matches(keyData.mv_data())) { + } else if (matches()) { // value doesn't match search key/mask, fetch next value lastResult = mdb_cursor_get(cursor, keyData, valueData, MDB_NEXT); } else { @@ -206,6 +213,18 @@ public long[] next() { } } + private boolean matches() { + + if (groupMatcher != null) { + return !this.groupMatcher.matches(keyData.mv_data()); + } else if (matchValues) { + this.groupMatcher = index.createMatcher(subj, pred, obj, context); + return !this.groupMatcher.matches(keyData.mv_data()); + } else { + return false; + } + } + private void closeInternal(boolean maybeCalledAsync) { if (!closed) { long writeStamp = 0L; diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStore.java 
b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStore.java index 90212ad598b..3765e15d563 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStore.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStore.java @@ -29,7 +29,6 @@ import org.eclipse.rdf4j.common.iteration.CloseableIteration; import org.eclipse.rdf4j.common.iteration.CloseableIteratorIteration; import org.eclipse.rdf4j.common.iteration.ConvertingIteration; -import org.eclipse.rdf4j.common.iteration.EmptyIteration; import org.eclipse.rdf4j.common.iteration.FilterIteration; import org.eclipse.rdf4j.common.iteration.UnionIteration; import org.eclipse.rdf4j.common.order.StatementOrder; @@ -72,6 +71,7 @@ class LmdbSailStore implements SailStore { private boolean multiThreadingActive; private volatile boolean asyncTransactionFinished; private volatile boolean nextTransactionAsync; + private volatile boolean mayHaveInferred; boolean enableMultiThreading = true; @@ -143,6 +143,9 @@ class AddQuadOperation implements Operation { @Override public void execute() throws IOException { + if (!explicit) { + mayHaveInferred = true; + } if (!unusedIds.isEmpty()) { // these ids are used again unusedIds.remove(s); @@ -193,6 +196,7 @@ public LmdbSailStore(File dataDir, LmdbStoreConfig config) throws IOException, S namespaceStore = new NamespaceStore(dataDir); valueStore = new ValueStore(new File(dataDir, "values"), config); tripleStore = new TripleStore(new File(dataDir, "triples"), config); + mayHaveInferred = tripleStore.hasTriples(false); initialized = true; } finally { if (!initialized) { @@ -348,11 +352,15 @@ protected void handleClose() throws SailException { */ CloseableIteration createStatementIterator( Txn txn, Resource subj, IRI pred, Value obj, boolean explicit, Resource... 
contexts) throws IOException { + if (!explicit && !mayHaveInferred) { + // there are no inferred statements and the iterator should only return inferred statements + return CloseableIteration.EMPTY_STATEMENT_ITERATION; + } long subjID = LmdbValue.UNKNOWN_ID; if (subj != null) { subjID = valueStore.getId(subj); if (subjID == LmdbValue.UNKNOWN_ID) { - return new EmptyIteration<>(); + return CloseableIteration.EMPTY_STATEMENT_ITERATION; } } @@ -360,7 +368,7 @@ CloseableIteration createStatementIterator( if (pred != null) { predID = valueStore.getId(pred); if (predID == LmdbValue.UNKNOWN_ID) { - return new EmptyIteration<>(); + return CloseableIteration.EMPTY_STATEMENT_ITERATION; } } @@ -369,7 +377,7 @@ CloseableIteration createStatementIterator( objID = valueStore.getId(obj); if (objID == LmdbValue.UNKNOWN_ID) { - return new EmptyIteration<>(); + return CloseableIteration.EMPTY_STATEMENT_ITERATION; } } diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbStatementIterator.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbStatementIterator.java index 5fccf7113a1..e4b6429afa8 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbStatementIterator.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/LmdbStatementIterator.java @@ -11,8 +11,9 @@ package org.eclipse.rdf4j.sail.lmdb; import java.io.IOException; +import java.util.NoSuchElementException; -import org.eclipse.rdf4j.common.iteration.LookAheadIteration; +import org.eclipse.rdf4j.common.iteration.AbstractCloseableIteration; import org.eclipse.rdf4j.model.IRI; import org.eclipse.rdf4j.model.Resource; import org.eclipse.rdf4j.model.Statement; @@ -23,7 +24,7 @@ * A statement iterator that wraps a RecordIterator containing statement records and translates these records to * {@link Statement} objects. 
*/ -class LmdbStatementIterator extends LookAheadIteration { +class LmdbStatementIterator extends AbstractCloseableIteration { /*-----------* * Variables * @@ -32,6 +33,7 @@ class LmdbStatementIterator extends LookAheadIteration { private final RecordIterator recordIt; private final ValueStore valueStore; + private Statement nextElement; /*--------------* * Constructors * @@ -49,7 +51,6 @@ public LmdbStatementIterator(RecordIterator recordIt, ValueStore valueStore) { * Methods * *---------*/ - @Override public Statement getNextElement() throws SailException { try { long[] quad = recordIt.next(); @@ -86,4 +87,52 @@ protected void handleClose() throws SailException { private SailException causeIOException(IOException e) { return new SailException(e); } + + @Override + public final boolean hasNext() { + if (isClosed()) { + return false; + } + + return lookAhead() != null; + } + + @Override + public final Statement next() { + if (isClosed()) { + throw new NoSuchElementException("The iteration has been closed."); + } + Statement result = lookAhead(); + + if (result != null) { + nextElement = null; + return result; + } else { + throw new NoSuchElementException(); + } + } + + /** + * Fetches the next element if it hasn't been fetched yet and stores it in {@link #nextElement}. + * + * @return The next element, or null if there are no more results. + */ + private Statement lookAhead() { + if (nextElement == null) { + nextElement = getNextElement(); + + if (nextElement == null) { + close(); + } + } + return nextElement; + } + + /** + * Throws an {@link UnsupportedOperationException}. 
+ */ + @Override + public void remove() { + throw new UnsupportedOperationException(); + } } diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TripleStore.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TripleStore.java index a39762135f1..768508238c7 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TripleStore.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/TripleStore.java @@ -14,8 +14,7 @@ import static org.eclipse.rdf4j.sail.lmdb.LmdbUtil.openDatabase; import static org.eclipse.rdf4j.sail.lmdb.LmdbUtil.readTransaction; import static org.eclipse.rdf4j.sail.lmdb.LmdbUtil.transaction; -import static org.eclipse.rdf4j.sail.lmdb.Varint.readListUnsigned; -import static org.eclipse.rdf4j.sail.lmdb.Varint.writeUnsigned; +import static org.eclipse.rdf4j.sail.lmdb.Varint.readQuadUnsigned; import static org.lwjgl.system.MemoryStack.stackPush; import static org.lwjgl.system.MemoryUtil.NULL; import static org.lwjgl.util.lmdb.LMDB.MDB_CREATE; @@ -75,7 +74,6 @@ import java.util.Properties; import java.util.Set; import java.util.StringTokenizer; -import java.util.concurrent.locks.StampedLock; import java.util.function.Consumer; import org.eclipse.rdf4j.common.concurrent.locks.StampedLongAdderLockManager; @@ -84,8 +82,8 @@ import org.eclipse.rdf4j.sail.lmdb.TxnManager.Txn; import org.eclipse.rdf4j.sail.lmdb.TxnRecordCache.Record; import org.eclipse.rdf4j.sail.lmdb.TxnRecordCache.RecordCacheIterator; -import org.eclipse.rdf4j.sail.lmdb.Varint.GroupMatcher; import org.eclipse.rdf4j.sail.lmdb.config.LmdbStoreConfig; +import org.eclipse.rdf4j.sail.lmdb.util.GroupMatcher; import org.lwjgl.PointerBuffer; import org.lwjgl.system.MemoryStack; import org.lwjgl.util.lmdb.MDBEnvInfo; @@ -508,6 +506,15 @@ public RecordIterator getTriples(Txn txn, long subj, long pred, long obj, long c return getTriplesUsingIndex(txn, subj, pred, obj, context, explicit, index, doRangeSearch); } + boolean hasTriples(boolean explicit) 
throws IOException { + TripleIndex mainIndex = indexes.get(0); + return txnManager.doWith((stack, txn) -> { + MDBStat stat = MDBStat.mallocStack(stack); + mdb_stat(txn, mainIndex.getDB(explicit), stat); + return stat.ms_entries() > 0; + }); + } + private RecordIterator getTriplesUsingIndex(Txn txn, long subj, long pred, long obj, long context, boolean explicit, TripleIndex index, boolean rangeSearch) throws IOException { return new LmdbRecordIterator(index, rangeSearch, subj, pred, obj, context, explicit, txn); @@ -1163,11 +1170,15 @@ private void storeProperties(File propFile) throws IOException { class TripleIndex { private final char[] fieldSeq; + private final IndexKeyWriters.KeyWriter keyWriter; + private final IndexKeyWriters.MatcherFactory matcherFactory; private final int dbiExplicit, dbiInferred; private final int[] indexMap; public TripleIndex(String fieldSeq) throws IOException { this.fieldSeq = fieldSeq.toCharArray(); + this.keyWriter = IndexKeyWriters.forFieldSeq(fieldSeq); + this.matcherFactory = IndexKeyWriters.matcherFactory(fieldSeq); this.indexMap = getIndexes(this.fieldSeq); // open database and use native sort order without comparator dbiExplicit = openDatabase(env, fieldSeq, MDB_CREATE, null); @@ -1273,52 +1284,27 @@ void getMaxKey(ByteBuffer bb, long subj, long pred, long obj, long context) { } GroupMatcher createMatcher(long subj, long pred, long obj, long context) { - ByteBuffer bb = ByteBuffer.allocate(TripleStore.MAX_KEY_LENGTH); - toKey(bb, subj == -1 ? 0 : subj, pred == -1 ? 0 : pred, obj == -1 ? 0 : obj, context == -1 ? 
0 : context); - bb.flip(); - - boolean[] shouldMatch = new boolean[4]; - for (int i = 0; i < fieldSeq.length; i++) { - switch (fieldSeq[i]) { - case 's': - shouldMatch[i] = subj > 0; - break; - case 'p': - shouldMatch[i] = pred > 0; - break; - case 'o': - shouldMatch[i] = obj > 0; - break; - case 'c': - shouldMatch[i] = context >= 0; - break; - } - } - return new GroupMatcher(bb, shouldMatch); + long sanitizedSubj = subj == -1 ? 0 : subj; + long sanitizedPred = pred == -1 ? 0 : pred; + long sanitizedObj = obj == -1 ? 0 : obj; + long sanitizedContext = context == -1 ? 0 : context; + + long[] values = { sanitizedSubj, sanitizedPred, sanitizedObj, sanitizedContext }; + long value0 = values[indexMap[0]]; + long value1 = values[indexMap[1]]; + long value2 = values[indexMap[2]]; + long value3 = values[indexMap[3]]; + + return new GroupMatcher(value0, value1, value2, value3, matcherFactory.create(subj, pred, obj, context)); } void toKey(ByteBuffer bb, long subj, long pred, long obj, long context) { - for (int i = 0; i < fieldSeq.length; i++) { - switch (fieldSeq[i]) { - case 's': - writeUnsigned(bb, subj); - break; - case 'p': - writeUnsigned(bb, pred); - break; - case 'o': - writeUnsigned(bb, obj); - break; - case 'c': - writeUnsigned(bb, context); - break; - } - } + keyWriter.write(bb, subj, pred, obj, context); } void keyToQuad(ByteBuffer key, long[] quad) { // directly use index map to read values in to correct positions - readListUnsigned(key, indexMap, quad); + readQuadUnsigned(key, indexMap, quad); } @Override diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/ValueStore.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/ValueStore.java index 4cfc59a5be0..ba5a1609e1e 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/ValueStore.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/ValueStore.java @@ -1390,7 +1390,7 @@ private LmdbIRI data2uri(long id, byte[] data, LmdbIRI value) throws IOException 
ByteBuffer bb = ByteBuffer.wrap(data); // skip type marker bb.get(); - long nsID = Varint.readUnsigned(bb); + long nsID = Varint.readUnsignedHeap(bb); String namespace = getNamespace(nsID); String localName = new String(data, bb.position(), bb.remaining(), StandardCharsets.UTF_8); @@ -1417,7 +1417,7 @@ private LmdbLiteral data2literal(long id, byte[] data, LmdbLiteral value) throws // skip type marker bb.get(); // Get datatype - long datatypeID = Varint.readUnsigned(bb); + long datatypeID = Varint.readUnsignedHeap(bb); IRI datatype = null; if (datatypeID != LmdbValue.UNKNOWN_ID) { datatype = (IRI) getValue(datatypeID); diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java index 283186c0246..f6c6e002f0d 100644 --- a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/Varint.java @@ -11,12 +11,21 @@ package org.eclipse.rdf4j.sail.lmdb; import java.nio.ByteBuffer; +import java.nio.ByteOrder; + +import org.eclipse.rdf4j.sail.lmdb.util.SignificantBytesBE; /** * Encodes and decodes unsigned values using variable-length encoding. 
*/ public final class Varint { + public static final byte[] ENCODED_LONG_MAX = new byte[] { + (byte) 0xFF, // header: 8 payload bytes + 0x7F, // MSB of Long.MAX_VALUE + (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF + }; + private Varint() { } @@ -89,19 +98,101 @@ private Varint() { * @param value value to encode */ public static void writeUnsigned(final ByteBuffer bb, final long value) { + // Fast path for Long.MAX_VALUE (0xFF header + 8 data bytes) + if (value == Long.MAX_VALUE) { + final ByteOrder prev = bb.order(); + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(ByteOrder.BIG_ENDIAN); + } + try { + bb.put(ENCODED_LONG_MAX); + } finally { + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(prev); + } + } + return; + } + if (value <= 240) { bb.put((byte) value); } else if (value <= 2287) { - bb.put((byte) ((value - 240) / 256 + 241)); - bb.put((byte) ((value - 240) % 256)); + // header: 241..248, then 1 payload byte + // Using bit ops instead of div/mod and putShort to batch the two bytes. 
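+ // Worked example (illustrative): value = 1000 → v = 760 = 2 * 256 + 248 → hi = 241 + 2 = 243 (0xF3), lo = 248 (0xF8),
+ // so the encoding is F3 F8, and readUnsigned decodes it as 240 + 256 * (243 - 241) + 248 = 1000.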
+ long v = value - 240; // 1..2047 + final ByteOrder prev = bb.order(); + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(ByteOrder.BIG_ENDIAN); + } + try { + int hi = (int) (v >>> 8) + 241; // 241..248 + int lo = (int) (v & 0xFF); // 0..255 + bb.putShort((short) ((hi << 8) | lo)); + } finally { + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(prev); + } + } } else if (value <= 67823) { + // header 249, then 2 payload bytes (value - 2288), big-endian + long v = value - 2288; // 0..65535 bb.put((byte) 249); - bb.put((byte) ((value - 2288) / 256)); - bb.put((byte) ((value - 2288) % 256)); + final ByteOrder prev = bb.order(); + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(ByteOrder.BIG_ENDIAN); + } + try { + bb.putShort((short) v); + } finally { + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(prev); + } + } } else { - int bytes = descriptor(value) + 1; - bb.put((byte) (250 + (bytes - 3))); - writeSignificantBits(bb, value, bytes); + int bytes = descriptor(value) + 1; // 3..8 + bb.put((byte) (250 + (bytes - 3))); // header 250..255 + writeSignificantBits(bb, value, bytes); // payload (batched) + } + } + + // Writes the low-order `bytes` bytes of `value` (its significant bytes) in big-endian order. + // Uses putLong/putInt/putShort to batch writes, plus a single leading put when `bytes` is odd.
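+ // Illustrative chunking by byte count: bytes = 3 → put + putShort; 4 → putInt; 5 → put + putInt;
+ // 6 → putInt + putShort; 7 → put + putInt + putShort; 8 → a single putLong.
+ // The leading put (odd counts only) writes bits ((bytes - 1) * 8)..(bytes * 8 - 1) of value.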
+ public static void writeSignificantBits(ByteBuffer bb, long value, int bytes) { + final ByteOrder prev = bb.order(); + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(ByteOrder.BIG_ENDIAN); + } + try { + int i = bytes; + + // If odd number of bytes, write the leading MSB first + if ((i & 1) != 0) { + bb.put((byte) (value >>> ((i - 1) * 8))); + i--; + } + + // Now i is even: prefer largest chunks first + if (i == 8) { // exactly 8 bytes + bb.putLong(value); + return; + } + + if (i >= 4) { // write next 4 bytes, if any + int shift = (i - 4) * 8; + bb.putInt((int) (value >>> shift)); + i -= 4; + } + + while (i >= 2) { // write remaining pairs + int shift = (i - 2) * 8; + bb.putShort((short) (value >>> shift)); + i -= 2; + } + // i must be 0 here. + } finally { + if (prev != ByteOrder.BIG_ENDIAN) { + bb.order(prev); + } } } @@ -147,13 +238,15 @@ public static int calcListLengthUnsigned(long a, long b, long c, long d) { * @param value the long value * @return the descriptor encoded as byte */ - private static byte descriptor(long value) { + public static byte descriptor(long value) { return value == 0 ? 0 : (byte) (7 - Long.numberOfLeadingZeros(value) / 8); } /** - * Decodes a value using the variable-length encoding of - * SQLite. + * Decodes a value using SQLite's variable-length integer encoding. 
+ * <p> + * Lead-byte layout → number of additional bytes: + * <ul> + * <li>0..240 → 0</li> + * <li>241..248 → 1</li> + * <li>249 → 2</li> + * <li>250..255 → 3..8 (250 → 3, 251 → 4, …, 255 → 8)</li> + * </ul> * * @param bb buffer for reading bytes * @return decoded value @@ -161,7 +254,74 @@ private static byte descriptor(long value) { * @see #writeUnsigned(ByteBuffer, long) */ public static long readUnsigned(ByteBuffer bb) throws IllegalArgumentException { + final int a0 = bb.get() & 0xFF; // lead byte, unsigned + + if (a0 <= 240) { + return a0; + } + + final int extra = VARINT_EXTRA_BYTES[a0]; // 1..8 additional bytes, since a0 > 240 + + switch (extra) { + case 1: { + // 1 extra byte; 241..248 + final int a1 = bb.get() & 0xFF; + // 240 + 256*(a0-241) + a1 + return 240L + ((long) (a0 - 241) << 8) + a1; + } + + case 2: { + // 2 extra bytes; lead byte == 249 + final int a1 = bb.get() & 0xFF; + final int a2 = bb.get() & 0xFF; + // 2288 + 256*a1 + a2 + return 2288L + ((long) a1 << 8) + a2; + } + + case 3: + case 4: + case 5: + case 6: + case 7: + case 8: + // 3..8 extra bytes; 250..255 + return readSignificantBitsDirect(bb, extra); + default: + throw new IllegalArgumentException("Bytes is higher than 8: " + extra); + + } + } + + /** Lookup: lead byte (0..255) → number of additional bytes (0..8).
*/ + private static final byte[] VARINT_EXTRA_BYTES = buildVarintExtraBytes(); + + private static byte[] buildVarintExtraBytes() { + final byte[] t = new byte[256]; + + // 0..240 → 0 extra bytes + for (int i = 0; i <= 240; i++) { + t[i] = 0; + } + + // 241..248 → 1 extra byte + for (int i = 241; i <= 248; i++) { + t[i] = 1; + } + + // 249 → 2 extra bytes + t[249] = 2; + + // 250..255 → 3..8 extra bytes + for (int i = 250; i <= 255; i++) { + t[i] = (byte) (i - 247); // 250→3, …, 255→8 + } + + return t; + } + + public static long readUnsignedHeap(ByteBuffer bb) throws IllegalArgumentException { int a0 = bb.get() & 0xFF; + if (a0 <= 240) { return a0; } else if (a0 <= 248) { @@ -173,7 +333,7 @@ public static long readUnsigned(ByteBuffer bb) throws IllegalArgumentException { return 2288 + 256 * a1 + a2; } else { int bytes = a0 - 250 + 3; - return readSignificantBits(bb, bytes); + return readSignificantBitsHeap(bb, bytes); } } @@ -195,7 +355,7 @@ public static long readUnsigned(ByteBuffer bb, int pos) throws IllegalArgumentEx return 240 + 256 * (a0 - 241) + a1; } else if (a0 == 249) { int a1 = bb.get(pos + 1) & 0xFF; - int a2 = bb.get(pos + 1) & 0xFF; + int a2 = bb.get(pos + 2) & 0xFF; return 2288 + 256 * a1 + a2; } else { int bytes = a0 - 250 + 3; @@ -203,6 +363,27 @@ public static long readUnsigned(ByteBuffer bb, int pos) throws IllegalArgumentEx } } + private static final int[] FIRST_TO_LENGTH = buildFirstToLength(); + + private static int[] buildFirstToLength() { + int[] t = new int[256]; + // 0..240 → 1 + for (int i = 0; i <= 240; i++) { + t[i] = 1; + } + // 241..248 → 2 + for (int i = 241; i <= 248; i++) { + t[i] = 2; + } + // 249 → 3 + t[249] = 3; + // 250..255 → 4..9 + for (int i = 250; i <= 255; i++) { + t[i] = i - 246; // 250→4, 255→9 + } + return t; + } + /** * Determines length of an encoded varint value by inspecting the first byte. 
* @@ -210,17 +391,7 @@ public static long readUnsigned(ByteBuffer bb, int pos) throws IllegalArgumentEx * @return decoded value */ public static int firstToLength(byte a0) { - int a0Unsigned = a0 & 0xFF; - if (a0Unsigned <= 240) { - return 1; - } else if (a0Unsigned <= 248) { - return 2; - } else if (a0Unsigned == 249) { - return 3; - } else { - int bytes = a0Unsigned - 250 + 3; - return 1 + bytes; - } + return FIRST_TO_LENGTH[a0 & 0xFF]; } /** @@ -245,6 +416,7 @@ public static long readListElementUnsigned(ByteBuffer bb, int index) { * @param values array with values to write */ public static void writeListUnsigned(final ByteBuffer bb, final long[] values) { + // TODO: Optimise for quads and also call writeUnsigned for (int i = 0; i < values.length; i++) { final long value = values[i]; if (value <= 240) { @@ -276,6 +448,13 @@ public static void readListUnsigned(ByteBuffer bb, long[] values) { } } + public static void readQuadUnsigned(ByteBuffer bb, long[] values) { + values[0] = readUnsigned(bb); + values[1] = readUnsigned(bb); + values[2] = readUnsigned(bb); + values[3] = readUnsigned(bb); + } + /** * Decodes multiple values using variable-length encoding from the given buffer. * @@ -289,32 +468,33 @@ public static void readListUnsigned(ByteBuffer bb, int[] indexMap, long[] values } } - /** - * Writes only the significant bytes of the given value in big-endian order. 
- * - * @param bb buffer for writing bytes - * @param value value to encode - * @param bytes number of significant bytes - */ - private static void writeSignificantBits(ByteBuffer bb, long value, int bytes) { - while (bytes-- > 0) { - bb.put((byte) (0xFF & (value >>> (bytes * 8)))); - } + public static void readQuadUnsigned(ByteBuffer bb, int[] indexMap, long[] values) { + values[indexMap[0]] = readUnsigned(bb); + values[indexMap[1]] = readUnsigned(bb); + values[indexMap[2]] = readUnsigned(bb); + values[indexMap[3]] = readUnsigned(bb); } /** * Reads only the significant bytes of the given value in big-endian order. * - * @param bb buffer for reading bytes - * @param bytes number of significant bytes + * @param bb buffer for reading bytes + * @param n number of significant bytes */ - private static long readSignificantBits(ByteBuffer bb, int bytes) { - bytes--; - long value = (long) (bb.get() & 0xFF) << (bytes * 8); - while (bytes-- > 0) { - value |= (long) (bb.get() & 0xFF) << (bytes * 8); + private static long readSignificantBits(ByteBuffer bb, int n) { + if (bb.isDirect()) { + return readSignificantBitsDirect(bb, n); + } else { + return readSignificantBitsHeap(bb, n); } - return value; + } + + private static long readSignificantBitsDirect(ByteBuffer bb, int n) { + return SignificantBytesBE.readDirect(bb, n); + } + + private static long readSignificantBitsHeap(ByteBuffer bb, int n) { + return SignificantBytesBE.read(bb, n); } /** @@ -342,8 +522,9 @@ private static int compareRegion(ByteBuffer bb1, int startIdx1, ByteBuffer bb2, } /** - * A matcher for partial equality tests of varint lists. + * Use of this class is deprecated, use {@link org.eclipse.rdf4j.sail.lmdb.util.GroupMatcher} instead. 
*/ + @Deprecated(forRemoval = true) public static class GroupMatcher { final ByteBuffer value; @@ -383,4 +564,5 @@ public boolean matches(ByteBuffer other) { return true; } } + } diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/Bytes.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/Bytes.java new file mode 100644 index 00000000000..613137646c8 --- /dev/null +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/Bytes.java @@ -0,0 +1,631 @@ +/******************************************************************************* + * Copyright (c) 2025 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. + * + * SPDX-License-Identifier: BSD-3-Clause + ******************************************************************************/ + +package org.eclipse.rdf4j.sail.lmdb.util; + +import java.nio.ByteBuffer; +import java.nio.ByteOrder; + +public final class Bytes { + private Bytes() { + } + + @FunctionalInterface + public interface RegionComparator { + boolean equals(byte firstByte, ByteBuffer other); + } + + private static boolean equals(int a, int b) { + return a == b; + } + + private static short toShort(byte[] array, int offset) { + return (short) (((array[offset] & 0xFF) << 8) | (array[offset + 1] & 0xFF)); + } + + private static int toInt(byte[] array, int offset) { + return ((array[offset] & 0xFF) << 24) + | ((array[offset + 1] & 0xFF) << 16) + | ((array[offset + 2] & 0xFF) << 8) + | (array[offset + 3] & 0xFF); + } + + public static RegionComparator capturedComparator(byte[] array, int len) { + if (len <= 0) { + return (firstByte, b) -> true; + } + switch (len) { + case 1: + return comparatorLen1(array); + case 2: + return comparatorLen2(array); + case 3: + return 
comparatorLen3(array); + case 4: + return comparatorLen4(array); + case 5: + return comparatorLen5(array); + case 6: + return comparatorLen6(array); + case 7: + return comparatorLen7(array); + case 8: + return comparatorLen8(array); + case 9: + return comparatorLen9(array); + case 10: + return comparatorLen10(array); + case 11: + return comparatorLen11(array); + case 12: + return comparatorLen12(array); + case 13: + return comparatorLen13(array); + case 14: + return comparatorLen14(array); + case 15: + return comparatorLen15(array); + case 16: + return comparatorLen16(array); + case 17: + return comparatorLen17(array); + case 18: + return comparatorLen18(array); + case 19: + return comparatorLen19(array); + case 20: + return comparatorLen20(array); + default: + return comparatorGeneric(array, len); + } + } + + private static RegionComparator comparatorLen1(byte[] array) { + return (firstByte, b) -> equals(array[0], firstByte); + } + + private static RegionComparator comparatorLen2(byte[] array) { + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + return equals(array[0 + 1], b.get()); + }; + } + + private static RegionComparator comparatorLen3(byte[] array) { + + final short expected = toShort(array, 0 + 1); + final short expectedLE = Short.reverseBytes(expected); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + return equals(bigEndian ? expected : expectedLE, b.getShort()); + }; + } + + private static RegionComparator comparatorLen4(byte[] array) { + + final short expected = toShort(array, 0 + 1); + final short expectedLE = Short.reverseBytes(expected); + final byte expectedTail = array[0 + 3]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? 
expected : expectedLE, b.getShort())) { + return false; + } + + return equals(expectedTail, b.get()); + }; + } + + private static RegionComparator comparatorLen5(byte[] array) { + + final int expected = toInt(array, 0 + 1); + final int expectedLE = Integer.reverseBytes(expected); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + return equals(bigEndian ? expected : expectedLE, b.getInt()); + }; + } + + private static RegionComparator comparatorLen6(byte[] array) { + + final int expected = toInt(array, 0 + 1); + final int expectedLE = Integer.reverseBytes(expected); + final byte tail = array[0 + 5]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected : expectedLE, b.getInt())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen7(byte[] array) { + + final int expected = toInt(array, 0 + 1); + final int expectedLE = Integer.reverseBytes(expected); + final short expectedTail = toShort(array, 0 + 5); + final short expectedTailLE = Short.reverseBytes(expectedTail); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected : expectedLE, b.getInt())) { + return false; + } + + return equals(bigEndian ? 
expectedTail : expectedTailLE, b.getShort()); + }; + } + + private static RegionComparator comparatorLen8(byte[] array) { + + final int expected = toInt(array, 0 + 1); + final int expectedLE = Integer.reverseBytes(expected); + final short expectedShort = toShort(array, 0 + 5); + final short expectedShortLE = Short.reverseBytes(expectedShort); + final byte tail = array[0 + 7]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected : expectedLE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expectedShort : expectedShortLE, b.getShort())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen9(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? expected2 : expected2LE, b.getInt()); + }; + } + + private static RegionComparator comparatorLen10(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final byte tail = array[0 + 9]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expected2 : expected2LE, b.getInt())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen11(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final short expectedShort = toShort(array, 0 + 9); + final short expectedShortLE = Short.reverseBytes(expectedShort); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? expectedShort : expectedShortLE, b.getShort()); + }; + } + + private static RegionComparator comparatorLen12(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final short expectedShort = toShort(array, 0 + 9); + final short expectedShortLE = Short.reverseBytes(expectedShort); + final byte tail = array[0 + 11]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expectedShort : expectedShortLE, b.getShort())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen13(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? expected3 : expected3LE, b.getInt()); + }; + } + + private static RegionComparator comparatorLen14(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final byte tail = array[0 + 13]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expected3 : expected3LE, b.getInt())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen15(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final short expectedShort = toShort(array, 0 + 13); + final short expectedShortLE = Short.reverseBytes(expectedShort); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected3 : expected3LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? expectedShort : expectedShortLE, b.getShort()); + }; + } + + private static RegionComparator comparatorLen16(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final short expectedShort = toShort(array, 0 + 13); + final short expectedShortLE = Short.reverseBytes(expectedShort); + final byte tail = array[0 + 15]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expected3 : expected3LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expectedShort : expectedShortLE, b.getShort())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen17(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final int expected4 = toInt(array, 0 + 13); + final int expected4LE = Integer.reverseBytes(expected4); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected3 : expected3LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? expected4 : expected4LE, b.getInt()); + }; + } + + private static RegionComparator comparatorLen18(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final int expected4 = toInt(array, 0 + 13); + final int expected4LE = Integer.reverseBytes(expected4); + final byte tail = array[0 + 17]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected3 : expected3LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected4 : expected4LE, b.getInt())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorLen19(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final int expected4 = toInt(array, 0 + 13); + final int expected4LE = Integer.reverseBytes(expected4); + final short expectedShort = toShort(array, 0 + 17); + final short expectedShortLE = Short.reverseBytes(expectedShort); + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected3 : expected3LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected4 : expected4LE, b.getInt())) { + return false; + } + + return equals(bigEndian ? 
expectedShort : expectedShortLE, b.getShort()); + }; + } + + private static RegionComparator comparatorLen20(byte[] array) { + + final int expected1 = toInt(array, 0 + 1); + final int expected1LE = Integer.reverseBytes(expected1); + final int expected2 = toInt(array, 0 + 5); + final int expected2LE = Integer.reverseBytes(expected2); + final int expected3 = toInt(array, 0 + 9); + final int expected3LE = Integer.reverseBytes(expected3); + final int expected4 = toInt(array, 0 + 13); + final int expected4LE = Integer.reverseBytes(expected4); + final short expectedShort = toShort(array, 0 + 17); + final short expectedShortLE = Short.reverseBytes(expectedShort); + final byte tail = array[0 + 19]; + + return (firstByte, b) -> { + if (!equals(array[0], firstByte)) { + return false; + } + + boolean bigEndian = b.order() == ByteOrder.BIG_ENDIAN; + if (!equals(bigEndian ? expected1 : expected1LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected2 : expected2LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected3 : expected3LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? expected4 : expected4LE, b.getInt())) { + return false; + } + + if (!equals(bigEndian ? 
expectedShort : expectedShortLE, b.getShort())) { + return false; + } + + return equals(tail, b.get()); + }; + } + + private static RegionComparator comparatorGeneric(byte[] array, int len) { + final int start = 0; + final int end = 0 + len; + return (firstByte, b) -> { + if (!equals(array[start], firstByte)) { + return false; + } + + int idx = start + 1; + while (idx < end) { + if (!equals(array[idx], b.get())) { + return false; + } + idx++; + } + return true; + }; + } +} diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/GroupMatcher.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/GroupMatcher.java new file mode 100644 index 00000000000..78cdbe8c118 --- /dev/null +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/GroupMatcher.java @@ -0,0 +1,455 @@ +/******************************************************************************* + * Copyright (c) 2025 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. + * + * SPDX-License-Identifier: BSD-3-Clause + ******************************************************************************/ + +package org.eclipse.rdf4j.sail.lmdb.util; + +import static org.eclipse.rdf4j.sail.lmdb.Varint.firstToLength; + +import java.nio.ByteBuffer; + +import org.eclipse.rdf4j.sail.lmdb.Varint; + +/** + * A matcher for partial equality tests of varint lists. 
+ */ +public class GroupMatcher { + + public static final byte[] BYTES_FOR_ZERO = { 0 }; + private final int length0; + private final int length1; + private final int length2; + private final int length3; + private final Bytes.RegionComparator cmp0; + private final Bytes.RegionComparator cmp1; + private final Bytes.RegionComparator cmp2; + private final Bytes.RegionComparator cmp3; + private final byte firstByte0; + private final byte firstByte1; + private final byte firstByte2; + private final byte firstByte3; + private final MatchFn matcher; + + public GroupMatcher(long value0, long value1, long value2, long value3, boolean[] shouldMatch) { + assert shouldMatch.length == 4; + + // Loop is unrolled for performance. Do not change back to a loop, do not extract into method, unless you + // benchmark with QueryBenchmark first! + { + if (!shouldMatch[0]) { + this.length0 = 1; + this.firstByte0 = BYTES_FOR_ZERO[0]; + this.cmp0 = null; + } else { + byte[] valueArray = getByteArray(value0); + + this.length0 = valueArray.length; + this.firstByte0 = valueArray[0]; + this.cmp0 = Bytes.capturedComparator(valueArray, length0); + } + } + { + if (!shouldMatch[1]) { + byte[] valueArray = BYTES_FOR_ZERO; + this.length1 = valueArray.length; + this.firstByte1 = valueArray[0]; + this.cmp1 = null; + } else { + byte[] valueArray = getByteArray(value1); + + this.length1 = valueArray.length; + this.firstByte1 = valueArray[0]; + this.cmp1 = Bytes.capturedComparator(valueArray, length1); + } + } + { + if (!shouldMatch[2]) { + byte[] valueArray = BYTES_FOR_ZERO; + + this.length2 = valueArray.length; + this.firstByte2 = valueArray[0]; + this.cmp2 = null; + } else { + byte[] valueArray = getByteArray(value2); + + this.length2 = valueArray.length; + this.firstByte2 = valueArray[0]; + this.cmp2 = Bytes.capturedComparator(valueArray, length2); + } + } + { + if (!shouldMatch[3]) { + byte[] valueArray = BYTES_FOR_ZERO; + + this.length3 = valueArray.length; + this.firstByte3 = valueArray[0]; + 
this.cmp3 = null; + } else { + byte[] valueArray = getByteArray(value3); + + this.length3 = valueArray.length; + this.firstByte3 = valueArray[0]; + this.cmp3 = Bytes.capturedComparator(valueArray, length3); + } + } + + this.matcher = selectMatcher(shouldMatch); + + } + + private static byte[] getByteArray(long value0) { + + if (value0 <= 240) { + return new byte[] { (byte) value0 }; + } + + if (value0 <= 2287) { + long v = value0 - 240; // 1..2047 + int hi = (int) (v >>> 8) + 241; // 241..248 + int lo = (int) (v & 0xFF); // 0..255 + return new byte[] { (byte) hi, (byte) lo }; + } + + if (value0 <= 67823) { + long v = value0 - 2288; // 0..65535 + return new byte[] { (byte) 249, (byte) (v >>> 8), (byte) v }; + } + + int bytes = Varint.descriptor(value0) + 1; // 3..8 + byte[] result = new byte[bytes + 1]; + result[0] = (byte) (250 + (bytes - 3)); // header 250..255 + for (int i = 0; i < bytes; i++) { + int shift = (bytes - 1 - i) * 8; + result[1 + i] = (byte) (value0 >>> shift); + } + return result; + } + + public boolean matches(ByteBuffer other) { + return matcher.matches(other); + } + + @FunctionalInterface + private interface MatchFn { + boolean matches(ByteBuffer other); + } + + private MatchFn selectMatcher(boolean[] shouldMatch) { + byte mask = 0; + if (shouldMatch[0]) { + mask |= 0b0001; + } + if (shouldMatch[1]) { + mask |= 0b0010; + } + if (shouldMatch[2]) { + mask |= 0b0100; + } + if (shouldMatch[3]) { + mask |= 0b1000; + } + + switch (mask) { + case 0b0000: + return this::match0000; + case 0b0001: + return this::match0001; + case 0b0010: + return this::match0010; + case 0b0011: + return this::match0011; + case 0b0100: + return this::match0100; + case 0b0101: + return this::match0101; + case 0b0110: + return this::match0110; + case 0b0111: + return this::match0111; + case 0b1000: + return this::match1000; + case 0b1001: + return this::match1001; + case 0b1010: + return this::match1010; + case 0b1011: + return this::match1011; + case 0b1100: + return 
this::match1100; + case 0b1101: + return this::match1101; + case 0b1110: + return this::match1110; + case 0b1111: + return this::match1111; + default: + throw new IllegalStateException("Unsupported matcher mask: " + mask); + } + } + + private boolean match0000(ByteBuffer other) { + return true; + } + + private boolean match0001(ByteBuffer other) { + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + return length0 == 1 || cmp0.equals(otherFirst0, other); + } + return false; + } + + private boolean match0010(ByteBuffer other) { + + skipAhead(other); + + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + return length1 == 1 || cmp1.equals(otherFirst1, other); + } + return false; + } + + private boolean match0011(ByteBuffer other) { + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + return length1 == 1 || cmp1.equals(otherFirst1, other); + } + } + } + + return false; + } + + private boolean match0100(ByteBuffer other) { + + skipAhead(other); + skipAhead(other); + + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + return length2 == 1 || cmp2.equals(otherFirst2, other); + } + return false; + } + + private boolean match0101(ByteBuffer other) { + + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + skipAhead(other); + + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + return length2 == 1 || cmp2.equals(otherFirst2, other); + } + } + } + return false; + } + + private boolean match0110(ByteBuffer other) { + + skipAhead(other); + + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + return length2 == 1 || cmp2.equals(otherFirst2, other); + } + 
} + } + return false; + } + + private void skipAhead(ByteBuffer other) { + int i = firstToLength(other.get()) - 1; + assert i >= 0; + if (i > 0) { + other.position(i + other.position()); + } + } + + private boolean match0111(ByteBuffer other) { + + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + return length2 == 1 || cmp2.equals(otherFirst2, other); + } + } + } + } + } + return false; + } + + private boolean match1000(ByteBuffer other) { + + skipAhead(other); + skipAhead(other); + skipAhead(other); + + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + return false; + } + + private boolean match1001(ByteBuffer other) { + + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + skipAhead(other); + skipAhead(other); + + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + return false; + } + + private boolean match1010(ByteBuffer other) { + + skipAhead(other); + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + skipAhead(other); + + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + return false; + } + + private boolean match1011(ByteBuffer other) { + + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + 
skipAhead(other); + + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + } + } + return false; + } + + private boolean match1100(ByteBuffer other) { + + skipAhead(other); + skipAhead(other); + + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + if (length2 == 1 || cmp2.equals(otherFirst2, other)) { + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + return false; + } + + private boolean match1101(ByteBuffer other) { + + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + skipAhead(other); + + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + if (length2 == 1 || cmp2.equals(otherFirst2, other)) { + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + } + } + return false; + } + + private boolean match1110(ByteBuffer other) { + + skipAhead(other); + + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + if (length2 == 1 || cmp2.equals(otherFirst2, other)) { + byte otherFirst3 = other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + } + } + return false; + } + + private boolean match1111(ByteBuffer other) { + byte otherFirst0 = other.get(); + if (firstByte0 == otherFirst0) { + if (length0 == 1 || cmp0.equals(otherFirst0, other)) { + byte otherFirst1 = other.get(); + if (firstByte1 == otherFirst1) { + if (length1 == 1 || cmp1.equals(otherFirst1, other)) { + byte otherFirst2 = other.get(); + if (firstByte2 == otherFirst2) { + if (length2 == 1 || cmp2.equals(otherFirst2, other)) { + byte otherFirst3 
= other.get(); + if (firstByte3 == otherFirst3) { + return length3 == 1 || cmp3.equals(otherFirst3, other); + } + } + } + } + } + } + } + return false; + } + +} diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/SignificantBytesBE.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/SignificantBytesBE.java new file mode 100644 index 00000000000..a335023bb73 --- /dev/null +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/SignificantBytesBE.java @@ -0,0 +1,95 @@ +/******************************************************************************* + * Copyright (c) 2025 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. + * + * SPDX-License-Identifier: BSD-3-Clause + ******************************************************************************/ + +package org.eclipse.rdf4j.sail.lmdb.util; + +import java.nio.BufferUnderflowException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; + +/** + * Fast reader for up to 8 "significant" big-endian bytes from a ByteBuffer. Uses relative wide reads (getLong, getInt, + * getShort, get) and reverses the bytes when the buffer order is little-endian, so the result is big-endian regardless + * of buffer order. Only 3..8 bytes are currently supported; 1- and 2-byte reads are not yet implemented. + * + * Returns an unsigned value in the low bits of the long (0 .. 2^(8*n)-1). + */ +public final class SignificantBytesBE { + private SignificantBytesBE() { + } + + /** + * Read n (3..8) big-endian significant bytes from the buffer and advance position by n.
+ * + * @throws IllegalArgumentException if n is not in [3,8] + * @throws BufferUnderflowException if fewer than n bytes remain + */ + public static long read(ByteBuffer bb, int n) { + return readDirect(bb, n); + } + + // -------- Wide-read path (relative wide reads + conditional byte swap) -------- + + private static long u32(int x) { + return x & 0xFFFF_FFFFL; + } + + private static int u16(short x) { + return x & 0xFFFF; + } + + private static short getShortBE(ByteBuffer bb, boolean littleEndian) { + short s = bb.getShort(); + return (littleEndian) ? Short.reverseBytes(s) : s; + } + + private static int getIntBE(ByteBuffer bb, boolean littleEndian) { + int i = bb.getInt(); + return (littleEndian) ? Integer.reverseBytes(i) : i; + } + + private static long getLongBE(ByteBuffer bb, boolean littleEndian) { + long l = bb.getLong(); + return (littleEndian) ? Long.reverseBytes(l) : l; + } + + public static long readDirect(ByteBuffer bb, int n) { + if (n < 3 || n > 8) { + throw new IllegalArgumentException("n must be in [3,8]"); + } + + boolean littleEndian = bb.order() == ByteOrder.LITTLE_ENDIAN; + + switch (n) { + case 8: + return getLongBE(bb, littleEndian); + case 7: + return ((bb.get() & 0xFFL) << 48) + | ((u32(getIntBE(bb, littleEndian)) << 16)) + | (u16(getShortBE(bb, littleEndian))); + case 6: + return (((long) u16(getShortBE(bb, littleEndian)) << 32)) + | (u32(getIntBE(bb, littleEndian))); + case 5: + return ((bb.get() & 0xFFL) << 32) + | (u32(getIntBE(bb, littleEndian))); + case 4: + return u32(getIntBE(bb, littleEndian)); + case 3: + return (((long) u16(getShortBE(bb, littleEndian)) << 8)) + | (bb.get() & 0xFFL); + // TODO: add 1 and 2 byte cases here!!!
+ default: + throw new AssertionError("unreachable"); + } + } + +} diff --git a/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/package-info.java b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/package-info.java new file mode 100644 index 00000000000..6501904cc48 --- /dev/null +++ b/core/sail/lmdb/src/main/java/org/eclipse/rdf4j/sail/lmdb/util/package-info.java @@ -0,0 +1,21 @@ +/******************************************************************************* + * Copyright (c) 2020 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. + * + * SPDX-License-Identifier: BSD-3-Clause + *******************************************************************************/ + +/** + * @apiNote This feature is for internal use only: its existence, signature or behavior may change without warning from + * one release to the next. + */ + +@InternalUseOnly + +package org.eclipse.rdf4j.sail.lmdb.util; + +import org.eclipse.rdf4j.common.annotation.InternalUseOnly; diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest.java b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest.java new file mode 100644 index 00000000000..fe9121fbddc --- /dev/null +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest.java @@ -0,0 +1,278 @@ +/******************************************************************************* + * Copyright (c) 2025 Eclipse RDF4J contributors. + * + * All rights reserved. This program and the accompanying materials + * are made available under the terms of the Eclipse Distribution License v1.0 + * which accompanies this distribution, and is available at + * http://www.eclipse.org/org/documents/edl-v10.php. 
+ * + * SPDX-License-Identifier: BSD-3-Clause + ******************************************************************************/ +package org.eclipse.rdf4j.sail.lmdb; + +import static org.junit.jupiter.api.Assertions.assertFalse; +import static org.junit.jupiter.api.Assertions.assertTrue; + +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +import org.eclipse.rdf4j.sail.lmdb.util.GroupMatcher; +import org.junit.jupiter.api.Test; + +class GroupMatcherTest { + + private static final int FIELD_COUNT = 4; + private static final int MAX_LENGTH = 9; + + private static final ValueVariants[] VALUE_VARIANTS = buildValueVariants(); + private static final List<byte[]> ALL_LENGTH_COMBINATIONS = buildAllLengthCombinations(); + private static final CandidateStrategy[] CANDIDATE_STRATEGIES = CandidateStrategy.values(); + + @Test + void coversEveryMatcherMaskAcrossAllLengthCombinations() { + for (int mask = 0; mask < (1 << FIELD_COUNT); mask++) { + final int maskBits = mask; + boolean[] shouldMatch = maskToArray(mask); + for (byte[] valueLengths : ALL_LENGTH_COMBINATIONS) { + final byte[] lengthsRef = valueLengths; + long[] referenceValues = valuesForLengths(valueLengths); + GroupMatcher matcher = new GroupMatcher(referenceValues[0], referenceValues[1], + referenceValues[2], referenceValues[3], shouldMatch); + + for (CandidateStrategy strategy : CANDIDATE_STRATEGIES) { + final CandidateStrategy strategyRef = strategy; + long[] candidateValues = buildCandidateValues(referenceValues, valueLengths, shouldMatch, strategy); + final long[] candidateCopy = candidateValues; + ByteBuffer matchBuffer = encode(candidateCopy); + + assertTrue(matcher.matches(matchBuffer.duplicate()), + () -> failureMessage("expected match", maskBits, lengthsRef, strategyRef, candidateCopy, + null)); + + if (hasMatch(shouldMatch)) { + for (int index = 0; index < FIELD_COUNT; index++) { + if (!shouldMatch[index]) { + continue; + } + for (MismatchType
mismatchType : MismatchType.values()) { + long[] mismatchValues = createMismatch(candidateCopy, lengthsRef, index, mismatchType); + if (mismatchValues == null) { + continue; + } + final long[] mismatchCopy = mismatchValues; + ByteBuffer mismatchBuffer = encode(mismatchCopy); + assertFalse(matcher.matches(mismatchBuffer.duplicate()), + () -> failureMessage("expected mismatch", + maskBits, lengthsRef, strategyRef, mismatchCopy, mismatchType)); + } + } + } + } + } + } + } + + private static long[] valuesForLengths(byte[] lengthIndices) { + long[] values = new long[FIELD_COUNT]; + for (int i = 0; i < FIELD_COUNT; i++) { + int lengthIndex = Byte.toUnsignedInt(lengthIndices[i]); + values[i] = VALUE_VARIANTS[lengthIndex].base; + } + return values; + } + + private static long[] buildCandidateValues(long[] referenceValues, byte[] valueLengths, boolean[] shouldMatch, + CandidateStrategy strategy) { + long[] candidateValues = new long[FIELD_COUNT]; + for (int i = 0; i < FIELD_COUNT; i++) { + if (shouldMatch[i]) { + candidateValues[i] = referenceValues[i]; + } else { + int lengthIndex = selectLengthIndex(valueLengths, i, strategy); + candidateValues[i] = VALUE_VARIANTS[lengthIndex].nonMatchingSameLength; + } + } + return candidateValues; + } + + private static int selectLengthIndex(byte[] lengths, int position, CandidateStrategy strategy) { + int base = Byte.toUnsignedInt(lengths[position]); + switch (strategy) { + case SAME_LENGTHS: + return base; + case ROTATED_LENGTHS: + return Byte.toUnsignedInt(lengths[(position + 1) % FIELD_COUNT]); + case INCREMENTED_LENGTHS: + return base == MAX_LENGTH ? 
1 : base + 1; + default: + throw new IllegalStateException("Unsupported strategy: " + strategy); + } + } + + private static ByteBuffer encode(long[] values) { + ByteBuffer buffer = ByteBuffer + .allocate(Varint.calcListLengthUnsigned(values[0], values[1], values[2], values[3])); + for (long value : values) { + Varint.writeUnsigned(buffer, value); + } + buffer.flip(); + return buffer; + } + + private static boolean[] maskToArray(int mask) { + boolean[] shouldMatch = new boolean[FIELD_COUNT]; + for (int i = 0; i < FIELD_COUNT; i++) { + shouldMatch[i] = (mask & (1 << i)) != 0; + } + return shouldMatch; + } + + private static boolean hasMatch(boolean[] shouldMatch) { + for (boolean flag : shouldMatch) { + if (flag) { + return true; + } + } + return false; + } + + private static int firstMatchedIndex(boolean[] shouldMatch) { + for (int i = 0; i < FIELD_COUNT; i++) { + if (shouldMatch[i]) { + return i; + } + } + return -1; + } + + private static List<byte[]> buildAllLengthCombinations() { + List<byte[]> combos = new ArrayList<>((int) Math.pow(MAX_LENGTH, FIELD_COUNT)); + buildCombos(combos, new byte[FIELD_COUNT], 0); + return combos; + } + + private static void buildCombos(List<byte[]> combos, byte[] current, int index) { + if (index == FIELD_COUNT) { + combos.add(current.clone()); + return; + } + for (int len = 1; len <= MAX_LENGTH; len++) { + current[index] = (byte) len; + buildCombos(combos, current, index + 1); + } + } + + private static String failureMessage(String expectation, int mask, byte[] valueLengths, CandidateStrategy strategy, + long[] candidateValues, MismatchType mismatchType) { + return expectation + " for mask " + toMask(mask) + ", valueLengths=" + Arrays.toString(toIntArray(valueLengths)) + + ", strategy=" + strategy + + (mismatchType == null ?
"" : ", mismatchType=" + mismatchType) + + ", candidate=" + Arrays.toString(candidateValues); + } + + private static String toMask(int mask) { + return String.format("%4s", Integer.toBinaryString(mask)).replace(' ', '0'); + } + + private static int[] toIntArray(byte[] values) { + int[] ints = new int[values.length]; + for (int i = 0; i < values.length; i++) { + ints[i] = Byte.toUnsignedInt(values[i]); + } + return ints; + } + + private static long[] createMismatch(long[] baseCandidate, byte[] valueLengths, int index, + MismatchType mismatchType) { + int lengthIndex = Byte.toUnsignedInt(valueLengths[index]); + ValueVariants variants = VALUE_VARIANTS[lengthIndex]; + long replacement; + switch (mismatchType) { + case SAME_FIRST_BYTE: + if (variants.sameFirstVariant == null) { + return null; + } + replacement = variants.sameFirstVariant; + break; + case DIFFERENT_FIRST_BYTE: + replacement = variants.differentFirstVariant; + break; + default: + throw new IllegalStateException("Unsupported mismatch type: " + mismatchType); + } + if (replacement == baseCandidate[index]) { + return null; + } + long[] mismatch = baseCandidate.clone(); + mismatch[index] = replacement; + return mismatch; + } + + private static ValueVariants[] buildValueVariants() { + ValueVariants[] variants = new ValueVariants[MAX_LENGTH + 1]; + variants[1] = new ValueVariants(42L, 99L, null, 99L); + variants[2] = new ValueVariants(241L, 330L, 330L, 600L); + variants[3] = new ValueVariants(50_000L, 60_000L, 60_000L, 70_000L); + variants[4] = new ValueVariants(1_048_576L, 1_048_577L, 1_048_577L, 16_777_216L); + variants[5] = new ValueVariants(16_777_216L, 16_777_217L, 16_777_217L, 4_294_967_296L); + variants[6] = new ValueVariants(4_294_967_296L, 4_294_967_297L, 4_294_967_297L, 1_099_511_627_776L); + variants[7] = new ValueVariants(1_099_511_627_776L, 1_099_511_627_777L, 1_099_511_627_777L, + 281_474_976_710_656L); + variants[8] = new ValueVariants(281_474_976_710_656L, 281_474_976_710_657L, 
281_474_976_710_657L, + 72_057_594_037_927_936L); + variants[9] = new ValueVariants(72_057_594_037_927_936L, 72_057_594_037_927_937L, + 72_057_594_037_927_937L, 281_474_976_710_656L); + + for (int len = 1; len <= MAX_LENGTH; len++) { + ValueVariants v = variants[len]; + if (Varint.calcLengthUnsigned(v.base) != len) { + throw new IllegalStateException("Unexpected length for base value " + v.base + " (len=" + len + ")"); + } + if (Varint.calcLengthUnsigned(v.nonMatchingSameLength) != len) { + throw new IllegalStateException( + "Unexpected length for same-length variant " + v.nonMatchingSameLength + " (len=" + len + ")"); + } + if (v.sameFirstVariant != null && firstByte(v.sameFirstVariant.longValue()) != firstByte(v.base)) { + throw new IllegalStateException("Expected same-first variant to share header for length " + len); + } + if (firstByte(v.differentFirstVariant) == firstByte(v.base)) { + throw new IllegalStateException("Expected different-first variant to differ for length " + len); + } + } + + return variants; + } + + private static byte firstByte(long value) { + ByteBuffer buffer = ByteBuffer.allocate(Varint.calcLengthUnsigned(value)); + Varint.writeUnsigned(buffer, value); + return buffer.array()[0]; + } + + private static final class ValueVariants { + final long base; + final long nonMatchingSameLength; + final Long sameFirstVariant; + final long differentFirstVariant; + + ValueVariants(long base, long nonMatchingSameLength, Long sameFirstVariant, long differentFirstVariant) { + this.base = base; + this.nonMatchingSameLength = nonMatchingSameLength; + this.sameFirstVariant = sameFirstVariant; + this.differentFirstVariant = differentFirstVariant; + } + } + + private enum MismatchType { + SAME_FIRST_BYTE, + DIFFERENT_FIRST_BYTE + } + + private enum CandidateStrategy { + SAME_LENGTHS, + ROTATED_LENGTHS, + INCREMENTED_LENGTHS + } +} diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest2.java 
b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest2.java new file mode 100644 index 00000000000..00f58e384ab --- /dev/null +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/GroupMatcherTest2.java @@ -0,0 +1,312 @@ +///******************************************************************************* +// * Copyright (c) 2025 Eclipse RDF4J contributors. +// * +// * All rights reserved. This program and the accompanying materials +// * are made available under the terms of the Eclipse Distribution License v1.0 +// * which accompanies this distribution, and is available at +// * http://www.eclipse.org/org/documents/edl-v10.php. +// * +// * SPDX-License-Identifier: BSD-3-Clause +// ******************************************************************************/ +//package org.eclipse.rdf4j.sail.lmdb; +// +//import org.eclipse.rdf4j.sail.lmdb.util.GroupMatcher; +//import org.junit.jupiter.api.DynamicTest; +//import org.junit.jupiter.api.TestFactory; +// +//import java.nio.ByteBuffer; +//import java.util.ArrayList; +//import java.util.Arrays; +//import java.util.List; +//import java.util.Optional; +//import java.util.stream.IntStream; +//import java.util.stream.Stream; +// +//import static org.junit.jupiter.api.Assertions.assertFalse; +//import static org.junit.jupiter.api.Assertions.assertTrue; +// +//class GroupMatcherTest2 { +// +// private static final int FIELD_COUNT = 4; +// private static final int MAX_LENGTH = 9; +// +// private static final ValueVariants[] VALUE_VARIANTS = buildValueVariants(); +// private static final List<byte[]> ALL_LENGTH_COMBINATIONS = buildAllLengthCombinations(); +// private static final CandidateStrategy[] CANDIDATE_STRATEGIES = CandidateStrategy.values(); +// +// @TestFactory +// Stream<DynamicTest> coversEveryMatcherMaskAcrossAllLengthCombinations() { +// return IntStream.range(0, 1 << FIELD_COUNT) +// .mapToObj(Integer::valueOf) +// .flatMap(this::dynamicTestsForMask); +// } +// +// private Stream<DynamicTest> dynamicTestsForMask(int
maskBits) { +// boolean[] shouldMatch = maskToArray(maskBits); +// return ALL_LENGTH_COMBINATIONS.stream() +// .flatMap(lengths -> dynamicTestsForLengths(maskBits, shouldMatch, lengths)); +// } +// +// private Stream dynamicTestsForLengths(int maskBits, boolean[] shouldMatch, byte[] valueLengths) { +// return Arrays.stream(CANDIDATE_STRATEGIES) +// .flatMap(strategy -> dynamicTestsForStrategy(maskBits, shouldMatch, valueLengths, strategy)); +// } +// +// private Stream dynamicTestsForStrategy(int maskBits, boolean[] shouldMatch, byte[] valueLengths, +// CandidateStrategy strategy) { +// long[] referenceValues = valuesForLengths(valueLengths); +// Stream matchTest = Stream.of(createMatchTest(maskBits, shouldMatch, valueLengths, strategy, +// referenceValues)); +// Stream mismatchTests = hasMatch(shouldMatch) +// ? dynamicMismatchTests(maskBits, shouldMatch, valueLengths, referenceValues, strategy) +// : Stream.empty(); +// return Stream.concat(matchTest, mismatchTests); +// } +// +// private DynamicTest createMatchTest(int maskBits, boolean[] shouldMatch, byte[] valueLengths, +// CandidateStrategy strategy, long[] referenceValues) { +// String displayName = "match mask=" + toMask(maskBits) + ", valueLengths=" +// + Arrays.toString(toIntArray(valueLengths)) +// + ", strategy=" + strategy; +// return DynamicTest.dynamicTest(displayName, () -> { +// boolean[] shouldMatchCopy = Arrays.copyOf(shouldMatch, shouldMatch.length); +// GroupMatcher matcher = new GroupMatcher(encode(referenceValues).duplicate(), shouldMatchCopy); +// long[] candidateValues = buildCandidateValues(referenceValues, valueLengths, shouldMatchCopy, strategy); +// assertTrue(matcher.matches(encode(candidateValues).duplicate()), +// () -> failureMessage("expected match", maskBits, valueLengths, strategy, candidateValues, null)); +// }); +// } +// +// private Stream dynamicMismatchTests(int maskBits, boolean[] shouldMatch, byte[] valueLengths, +// long[] referenceValues, CandidateStrategy strategy) { 
+// return IntStream.range(0, FIELD_COUNT) +// .filter(index -> shouldMatch[index]) +// .mapToObj(Integer::valueOf) +// .flatMap(index -> Arrays.stream(MismatchType.values()) +// .map(mismatchType -> createMismatchTest(maskBits, shouldMatch, valueLengths, referenceValues, +// strategy, +// index, mismatchType)) +// .flatMap(Optional::stream)); +// } +// +// private Optional createMismatchTest(int maskBits, boolean[] shouldMatch, byte[] valueLengths, +// long[] referenceValues, CandidateStrategy strategy, int index, MismatchType mismatchType) { +// long[] candidateValues = buildCandidateValues(referenceValues, valueLengths, shouldMatch, strategy); +// long[] mismatchValues = createMismatch(candidateValues, valueLengths, index, mismatchType); +// if (mismatchValues == null) { +// return Optional.empty(); +// } +// String displayName = "mismatch mask=" + toMask(maskBits) + ", valueLengths=" +// + Arrays.toString(toIntArray(valueLengths)) + ", strategy=" + strategy + ", index=" + index + ", type=" +// + mismatchType; +// return Optional.of(DynamicTest.dynamicTest(displayName, () -> { +// boolean[] shouldMatchCopy = Arrays.copyOf(shouldMatch, shouldMatch.length); +// GroupMatcher matcher = new GroupMatcher(encode(referenceValues).duplicate(), shouldMatchCopy); +// assertFalse(matcher.matches(encode(mismatchValues).duplicate()), +// () -> failureMessage("expected mismatch", maskBits, valueLengths, strategy, mismatchValues, +// mismatchType)); +// })); +// } +// +// private static long[] valuesForLengths(byte[] lengthIndices) { +// long[] values = new long[FIELD_COUNT]; +// for (int i = 0; i < FIELD_COUNT; i++) { +// int lengthIndex = Byte.toUnsignedInt(lengthIndices[i]); +// values[i] = VALUE_VARIANTS[lengthIndex].base; +// } +// return values; +// } +// +// private static long[] buildCandidateValues(long[] referenceValues, byte[] valueLengths, boolean[] shouldMatch, +// CandidateStrategy strategy) { +// long[] candidateValues = new long[FIELD_COUNT]; +// for (int i = 0; 
i < FIELD_COUNT; i++) { +// if (shouldMatch[i]) { +// candidateValues[i] = referenceValues[i]; +// } else { +// int lengthIndex = selectLengthIndex(valueLengths, i, strategy); +// candidateValues[i] = VALUE_VARIANTS[lengthIndex].nonMatchingSameLength; +// } +// } +// return candidateValues; +// } +// +// private static int selectLengthIndex(byte[] lengths, int position, CandidateStrategy strategy) { +// int base = Byte.toUnsignedInt(lengths[position]); +// switch (strategy) { +// case SAME_LENGTHS: +// return base; +// case ROTATED_LENGTHS: +// return Byte.toUnsignedInt(lengths[(position + 1) % FIELD_COUNT]); +// case INCREMENTED_LENGTHS: +// return base == MAX_LENGTH ? 1 : base + 1; +// default: +// throw new IllegalStateException("Unsupported strategy: " + strategy); +// } +// } +// +// private static ByteBuffer encode(long[] values) { +// ByteBuffer buffer = ByteBuffer +// .allocate(Varint.calcListLengthUnsigned(values[0], values[1], values[2], values[3])); +// for (long value : values) { +// Varint.writeUnsigned(buffer, value); +// } +// buffer.flip(); +// return buffer; +// } +// +// private static boolean[] maskToArray(int mask) { +// boolean[] shouldMatch = new boolean[FIELD_COUNT]; +// for (int i = 0; i < FIELD_COUNT; i++) { +// shouldMatch[i] = (mask & (1 << i)) != 0; +// } +// return shouldMatch; +// } +// +// private static boolean hasMatch(boolean[] shouldMatch) { +// for (boolean flag : shouldMatch) { +// if (flag) { +// return true; +// } +// } +// return false; +// } +// +// private static int firstMatchedIndex(boolean[] shouldMatch) { +// for (int i = 0; i < FIELD_COUNT; i++) { +// if (shouldMatch[i]) { +// return i; +// } +// } +// return -1; +// } +// +// private static List buildAllLengthCombinations() { +// List combos = new ArrayList<>((int) Math.pow(MAX_LENGTH, FIELD_COUNT)); +// buildCombos(combos, new byte[FIELD_COUNT], 0); +// return combos; +// } +// +// private static void buildCombos(List combos, byte[] current, int index) { +// if 
(index == FIELD_COUNT) { +// combos.add(current.clone()); +// return; +// } +// for (int len = 1; len <= MAX_LENGTH; len++) { +// current[index] = (byte) len; +// buildCombos(combos, current, index + 1); +// } +// } +// +// private static String failureMessage(String expectation, int mask, byte[] valueLengths, CandidateStrategy strategy, +// long[] candidateValues, MismatchType mismatchType) { +// return expectation + " for mask " + toMask(mask) + ", valueLengths=" + Arrays.toString(toIntArray(valueLengths)) +// + ", strategy=" + strategy +// + (mismatchType == null ? "" : ", mismatchType=" + mismatchType) +// + ", candidate=" + Arrays.toString(candidateValues); +// } +// +// private static String toMask(int mask) { +// return String.format("%4s", Integer.toBinaryString(mask)).replace(' ', '0'); +// } +// +// private static int[] toIntArray(byte[] values) { +// int[] ints = new int[values.length]; +// for (int i = 0; i < values.length; i++) { +// ints[i] = Byte.toUnsignedInt(values[i]); +// } +// return ints; +// } +// +// private static long[] createMismatch(long[] baseCandidate, byte[] valueLengths, int index, +// MismatchType mismatchType) { +// int lengthIndex = Byte.toUnsignedInt(valueLengths[index]); +// ValueVariants variants = VALUE_VARIANTS[lengthIndex]; +// long replacement; +// switch (mismatchType) { +// case SAME_FIRST_BYTE: +// if (variants.sameFirstVariant == null) { +// return null; +// } +// replacement = variants.sameFirstVariant; +// break; +// case DIFFERENT_FIRST_BYTE: +// replacement = variants.differentFirstVariant; +// break; +// default: +// throw new IllegalStateException("Unsupported mismatch type: " + mismatchType); +// } +// if (replacement == baseCandidate[index]) { +// return null; +// } +// long[] mismatch = baseCandidate.clone(); +// mismatch[index] = replacement; +// return mismatch; +// } +// +// private static ValueVariants[] buildValueVariants() { +// ValueVariants[] variants = new ValueVariants[MAX_LENGTH + 1]; +// variants[1] 
= new ValueVariants(42L, 99L, null, 99L); +// variants[2] = new ValueVariants(241L, 330L, 330L, 600L); +// variants[3] = new ValueVariants(50_000L, 60_000L, 60_000L, 70_000L); +// variants[4] = new ValueVariants(1_048_576L, 1_048_577L, 1_048_577L, 16_777_216L); +// variants[5] = new ValueVariants(16_777_216L, 16_777_217L, 16_777_217L, 4_294_967_296L); +// variants[6] = new ValueVariants(4_294_967_296L, 4_294_967_297L, 4_294_967_297L, 1_099_511_627_776L); +// variants[7] = new ValueVariants(1_099_511_627_776L, 1_099_511_627_777L, 1_099_511_627_777L, +// 281_474_976_710_656L); +// variants[8] = new ValueVariants(281_474_976_710_656L, 281_474_976_710_657L, 281_474_976_710_657L, +// 72_057_594_037_927_936L); +// variants[9] = new ValueVariants(72_057_594_037_927_936L, 72_057_594_037_927_937L, +// 72_057_594_037_927_937L, 281_474_976_710_656L); +// +// for (int len = 1; len <= MAX_LENGTH; len++) { +// ValueVariants v = variants[len]; +// if (Varint.calcLengthUnsigned(v.base) != len) { +// throw new IllegalStateException("Unexpected length for base value " + v.base + " (len=" + len + ")"); +// } +// if (Varint.calcLengthUnsigned(v.nonMatchingSameLength) != len) { +// throw new IllegalStateException( +// "Unexpected length for same-length variant " + v.nonMatchingSameLength + " (len=" + len + ")"); +// } +// if (v.sameFirstVariant != null && firstByte(v.sameFirstVariant.longValue()) != firstByte(v.base)) { +// throw new IllegalStateException("Expected same-first variant to share header for length " + len); +// } +// if (firstByte(v.differentFirstVariant) == firstByte(v.base)) { +// throw new IllegalStateException("Expected different-first variant to differ for length " + len); +// } +// } +// +// return variants; +// } +// +// private static byte firstByte(long value) { +// ByteBuffer buffer = ByteBuffer.allocate(Varint.calcLengthUnsigned(value)); +// Varint.writeUnsigned(buffer, value); +// return buffer.array()[0]; +// } +// +// private static final class ValueVariants 
{ +// final long base; +// final long nonMatchingSameLength; +// final Long sameFirstVariant; +// final long differentFirstVariant; +// +// ValueVariants(long base, long nonMatchingSameLength, Long sameFirstVariant, long differentFirstVariant) { +// this.base = base; +// this.nonMatchingSameLength = nonMatchingSameLength; +// this.sameFirstVariant = sameFirstVariant; +// this.differentFirstVariant = differentFirstVariant; +// } +// } +// +// private enum MismatchType { +// SAME_FIRST_BYTE, +// DIFFERENT_FIRST_BYTE +// } +// +// private enum CandidateStrategy { +// SAME_LENGTHS, +// ROTATED_LENGTHS, +// INCREMENTED_LENGTHS +// } +//} diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStoreTest.java b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStoreTest.java index 2e416067a18..b735074c00c 100644 --- a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStoreTest.java +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/LmdbSailStoreTest.java @@ -16,6 +16,8 @@ import java.io.File; +import org.eclipse.rdf4j.common.iteration.CloseableIteration; +import org.eclipse.rdf4j.common.iteration.EmptyIteration; import org.eclipse.rdf4j.common.transaction.IsolationLevels; import org.eclipse.rdf4j.model.IRI; import org.eclipse.rdf4j.model.Resource; @@ -28,6 +30,8 @@ import org.eclipse.rdf4j.repository.Repository; import org.eclipse.rdf4j.repository.RepositoryConnection; import org.eclipse.rdf4j.repository.sail.SailRepository; +import org.eclipse.rdf4j.sail.SailException; +import org.eclipse.rdf4j.sail.base.SailDataset; import org.eclipse.rdf4j.sail.lmdb.config.LmdbStoreConfig; import org.junit.jupiter.api.AfterEach; import org.junit.jupiter.api.BeforeEach; @@ -193,6 +197,18 @@ public void testPassConnectionBetweenThreadsWithTx() throws InterruptedException } } + @Test + public void testInferredSourceHasEmptyIterationWithoutInferredStatements() throws SailException { + LmdbStore sail = (LmdbStore) 
((SailRepository) repo).getSail(); + LmdbSailStore backingStore = sail.getBackingStore(); + + try (SailDataset dataset = backingStore.getInferredSailSource().dataset(IsolationLevels.NONE); + CloseableIteration iteration = dataset.getStatements(null, null, null)) { + assertTrue(iteration instanceof EmptyIteration); + assertFalse(iteration.hasNext()); + } + } + @AfterEach public void after() { repo.shutDown(); diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/README.md b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/README.md new file mode 100644 index 00000000000..58f6273cc2e --- /dev/null +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/README.md @@ -0,0 +1,17 @@ +Original benchmark results + +``` +Benchmark Mode Cnt Score Error Units +QueryBenchmark.complexQuery avgt 5 6.786 ± 0.968 ms/op +QueryBenchmark.different_datasets_with_similar_distributions avgt 5 4.056 ± 0.040 ms/op +QueryBenchmark.groupByQuery avgt 5 1.425 ± 0.005 ms/op +QueryBenchmark.long_chain avgt 5 1180.336 ± 46.383 ms/op +QueryBenchmark.lots_of_optional avgt 5 428.926 ± 7.985 ms/op +QueryBenchmark.minus avgt 5 1042.468 ± 46.901 ms/op +QueryBenchmark.nested_optionals avgt 5 254.052 ± 4.293 ms/op +QueryBenchmark.pathExpressionQuery1 avgt 5 44.147 ± 1.200 ms/op +QueryBenchmark.pathExpressionQuery2 avgt 5 10.732 ± 0.176 ms/op +QueryBenchmark.query_distinct_predicates avgt 5 70.255 ± 3.541 ms/op +QueryBenchmark.simple_filter_not avgt 5 11.890 ± 0.761 ms/op + +``` diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/RecordIteratorBenchmark.java b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/RecordIteratorBenchmark.java index b26b86a6279..25a4a689eb6 100644 --- a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/RecordIteratorBenchmark.java +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/RecordIteratorBenchmark.java @@ -18,8 +18,12 @@ import org.apache.commons.io.FileUtils; import org.assertj.core.util.Files; 
import org.eclipse.rdf4j.sail.lmdb.config.LmdbStoreConfig; +import org.openjdk.jmh.Main; import org.openjdk.jmh.annotations.*; import org.openjdk.jmh.infra.Blackhole; +import org.openjdk.jmh.runner.Runner; +import org.openjdk.jmh.runner.options.Options; +import org.openjdk.jmh.runner.options.OptionsBuilder; /** * @author Piotr Sowiński @@ -27,10 +31,10 @@ @State(Scope.Benchmark) @Warmup(iterations = 5) @BenchmarkMode({ Mode.AverageTime }) -@Fork(value = 4, jvmArgs = { "-Xms1G", "-Xmx1G" }) +@Fork(value = 1, jvmArgs = { "-Xms1G", "-Xmx1G" }) //@Fork(value = 1, jvmArgs = {"-Xms1G", "-Xmx1G", "-XX:StartFlightRecording=jdk.CPUTimeSample#enabled=true,filename=profile.jfr,method-profiling=max","-XX:FlightRecorderOptions=stackdepth=1024", "-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"}) @Threads(value = 8) -@Measurement(iterations = 10) +@Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MILLISECONDS) public class RecordIteratorBenchmark { @@ -67,4 +71,18 @@ public void iterateAll(Blackhole blackhole) throws IOException { } } } + + public static void main(String[] args) throws Exception { + if (args != null && args.length > 0) { + Main.main(args); + return; + } + + Options options = new OptionsBuilder() + .include(RecordIteratorBenchmark.class.getSimpleName() + ".iterateAll") + .forks(0) + .build(); + + new Runner(options).run(); + } } diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/VarintTest.java b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/VarintTest.java index 79cdf1b293d..fef3bd67313 100644 --- a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/VarintTest.java +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/VarintTest.java @@ -14,6 +14,7 @@ import static org.junit.Assert.assertEquals; import java.nio.ByteBuffer; +import java.nio.ByteOrder; import org.junit.jupiter.api.Test; @@ -26,7 +27,7 @@ public class VarintTest { @Test public void testVarint() { - ByteBuffer bb = ByteBuffer.allocate(9); + 
ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); for (int i = 0; i < values.length; i++) { bb.clear(); Varint.writeUnsigned(bb, values[i]); @@ -36,9 +37,94 @@ public void testVarint() { } } + @Test + public void testVarint2() { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, values[1]); + bb.flip(); + assertEquals("Encoding should use " + (2) + " bytes", 2, bb.remaining()); + assertEquals("Encoded and decoded value should be equal", values[1], Varint.readUnsigned(bb)); + + } + + @Test + public void testVarint3() { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, 67823); + bb.flip(); + assertEquals("Encoded and decoded value should be equal", 67823, Varint.readUnsigned(bb)); + + } + + @Test + public void testVarint4() { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, 67824); + bb.flip(); + assertEquals("Encoded and decoded value should be equal", 67824, Varint.readUnsigned(bb)); + + } + + @Test + public void testVarint5() { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, 4299999999L); + bb.flip(); + assertEquals("Encoded and decoded value should be equal", 4299999999L, Varint.readUnsigned(bb)); + + } + + @Test + public void testVarintSequential() { + for (long i = 0; i < 99999999; i++) { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, i); + bb.flip(); + try { + assertEquals("Encoded and decoded value should be equal", i, Varint.readUnsigned(bb)); + } catch (Exception e) { + System.err.println("Failed for i=" + i); + throw e; + } + } + + for (long i = 99999999; i < 999999999999999L; i += 10000000) { + try { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, i); 
+ bb.flip(); + + assertEquals("Encoded and decoded value should be equal", i, Varint.readUnsigned(bb)); + } catch (Exception e) { + System.err.println("Failed for i=" + i); + throw e; + } + } + + for (long i = Long.MAX_VALUE; i > Long.MAX_VALUE - 999999L; i -= 1) { + ByteBuffer bb = ByteBuffer.allocate(9).order(ByteOrder.nativeOrder()); + bb.clear(); + Varint.writeUnsigned(bb, i); + bb.flip(); + try { + assertEquals("Encoded and decoded value should be equal", i, Varint.readUnsigned(bb)); + } catch (Exception e) { + System.err.println("Failed for i=" + i); + throw e; + } + } + + } + @Test public void testVarintList() { - ByteBuffer bb = ByteBuffer.allocate(2 + 4 * Long.BYTES); + ByteBuffer bb = ByteBuffer.allocate(2 + 4 * Long.BYTES).order(ByteOrder.nativeOrder()); for (int i = 0; i < values.length - 4; i++) { long[] expected = new long[4]; System.arraycopy(values, 0, expected, 0, 4); @@ -50,4 +136,16 @@ public void testVarintList() { assertArrayEquals("Encoded and decoded value should be equal", expected, actual); } } + + @Test + public void testVarintReadUnsignedAtPositionThreeByteEncoding() { + long value = 3000L; + ByteBuffer bb = ByteBuffer.allocate(Varint.calcLengthUnsigned(value)) + .order(ByteOrder.nativeOrder()); + Varint.writeUnsigned(bb, value); + bb.flip(); + assertEquals("Expected three byte encoding", 3, bb.remaining()); + long decoded = Varint.readUnsigned(bb, 0); + assertEquals("Encoded and decoded value using positional read should match", value, decoded); + } } diff --git a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/benchmark/QueryBenchmark.java b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/benchmark/QueryBenchmark.java index 504b9cd3b5c..03df3ebee1d 100644 --- a/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/benchmark/QueryBenchmark.java +++ b/core/sail/lmdb/src/test/java/org/eclipse/rdf4j/sail/lmdb/benchmark/QueryBenchmark.java @@ -52,7 +52,7 @@ @Warmup(iterations = 5) @BenchmarkMode({ Mode.AverageTime }) 
@Fork(value = 1, jvmArgs = { "-Xms1G", "-Xmx1G" }) -//@Fork(value = 1, jvmArgs = {"-Xms1G", "-Xmx1G", "-XX:StartFlightRecording=delay=60s,duration=120s,filename=recording.jfr,settings=profile", "-XX:FlightRecorderOptions=samplethreads=true,stackdepth=1024", "-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"}) +//@Fork(value = 1, jvmArgs = {"-Xms1G", "-Xmx1G", "-XX:StartFlightRecording=jdk.CPUTimeSample#enabled=true,filename=profile.jfr,method-profiling=max","-XX:FlightRecorderOptions=stackdepth=1024", "-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints"}) @Measurement(iterations = 5) @OutputTimeUnit(TimeUnit.MILLISECONDS) public class QueryBenchmark { @@ -112,8 +112,8 @@ public class QueryBenchmark { public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() - .include("QueryBenchmark.*") // adapt to run other benchmark tests - .forks(1) + .include("QueryBenchmark.complexQuery$") // adapt to run other benchmark tests + .forks(0) .build(); new Runner(opt).run();