Skip to content

fix(query): clarify condition resolution semantics for label queries#2994

Open
contrueCT wants to merge 10 commits into
apache:masterfrom
contrueCT:task/improve-condition-query-semantics
Open

fix(query): clarify condition resolution semantics for label queries#2994
contrueCT wants to merge 10 commits into
apache:masterfrom
contrueCT:task/improve-condition-query-semantics

Conversation

@contrueCT

Copy link
Copy Markdown
Contributor

Purpose of the PR

ConditionQuery.condition() currently mixes several different meanings in one API, including:

  • no condition
  • conflicting conditions resolved to empty
  • a unique resolved value
  • a raw multi-value result
  • an exception for ambiguous resolved values

This PR keeps the legacy condition() behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-risk LABEL call sites to use the clearer semantics.

Main Changes

  • Add explicit condition-resolution APIs to ConditionQuery
    • containsCondition(Object key)
    • conditionValues(Object key)
    • conditionValue(Object key)
  • Keep the legacy condition() method backward-compatible
  • Document the semantic differences between the legacy API and the new explicit APIs
  • Migrate LABEL-related high-risk callers to the new APIs in:
    • graph/index transactions
    • serializers
    • traversers
    • in-memory / hstore paths
  • Preserve the old behavior for non-LABEL legacy usages in this first step

Verifying these changes

Added and extended regression coverage for the new semantics:

  • QueryTest#testConditionWithoutLabel
  • QueryTest#testConditionWithEqAndIn
  • QueryTest#testConditionWithSingleInValues
  • QueryTest#testConditionWithConflictingEqAndIn
  • QueryTest#testConditionWithMultipleMatchedInValues

Added a targeted regression for the label-index fallback path:

  • VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedProperties

This test verifies:

  • a multi-label query can conservatively fall back and still match the indexed label
  • conflicting label conditions produce no matched indexes

Existing label-query regressions were also rechecked to ensure no behavior regression:

  • EdgeCoreTest#testQueryInEdgesOfVertexByLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexBySortkey
  • VertexCoreTest#testQueryByJointLabels
  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='QueryTest#testConditionWithoutLabel+testConditionWithEqAndIn+testConditionWithSingleInValues+testConditionWithConflictingEqAndIn+testConditionWithMultipleMatchedInValues' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='EdgeCoreTest#testQueryInEdgesOfVertexByLabels+testQueryInEdgesOfVertexByConflictingLabels+testQueryInEdgesOfVertexBySortkey' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='VertexCoreTest#testQueryByJointLabels+testCollectMatchedIndexesByJointLabelsWithIndexedProperties' test

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 13, 2026
@contrueCT contrueCT changed the title improve(query): clarify condition resolution semantics for label queries fix(query): clarify condition resolution semantics for label queries Apr 19, 2026
@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 72.43816% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 31.60%. Comparing base (1f61c48) to head (9300691).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...he/hugegraph/backend/tx/GraphIndexTransaction.java 65.69% 40 Missing and 19 partials ⚠️
...apache/hugegraph/backend/query/ConditionQuery.java 92.42% 1 Missing and 4 partials ⚠️
...e/hugegraph/backend/serializer/TextSerializer.java 0.00% 5 Missing ⚠️
...he/hugegraph/backend/store/hstore/HstoreStore.java 0.00% 3 Missing ⚠️
...g/apache/hugegraph/backend/store/ram/RamTable.java 0.00% 2 Missing ⚠️
.../apache/hugegraph/backend/tx/GraphTransaction.java 81.81% 0 Missing and 2 partials ⚠️
...hugegraph/backend/serializer/BinarySerializer.java 80.00% 1 Missing ⚠️
...e/hugegraph/traversal/algorithm/HugeTraverser.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2994      +/-   ##
============================================
- Coverage     36.11%   31.60%   -4.51%     
- Complexity      338      500     +162     
============================================
  Files           803      814      +11     
  Lines         68234    69612    +1378     
  Branches       8964     9211     +247     
============================================
- Hits          24640    22004    -2636     
- Misses        40936    45143    +4207     
+ Partials       2658     2465     -193     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch 2 times, most recently from 4c42786 to cc9af24 Compare May 26, 2026 12:29

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one correctness issue in the latest revision. The CI failures were posted separately as a PR-level reminder.

CI/status checks are failing on the latest head (cc9af24929e42af1c90e1f55f3e60adc351e0318). Could you check the failed jobs before the next review round?

Failed checks include:

@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from cc9af24 to 2e82f83 Compare May 30, 2026 10:20

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a clear blocking correctness issue in the latest head, and the previous LABEL-resolution comments look addressed. One remaining merge risk is that the latest checks are still red: hstore failed in VertexCoreTest#testQueryByDateProperty.

Since this PR also touches HstoreStore, could you rerun or clarify whether the hstore failure is an existing flaky/environment issue?

contrueCT added 5 commits June 5, 2026 12:59
Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics.

Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from 94408b7 to b10e3c2 Compare June 5, 2026 05:10
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 5, 2026
@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from 801923a to ebc31c8 Compare June 5, 2026 18:06
@contrueCT

contrueCT commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for your patience. The hstore CI failure exposed an existing latent issue in hstore's range-index query path. For range-index scans with limit/paging, the upper layer assumed that backend scan results were globally ordered by the range-index key and that the returned page state could be reused as a HugeGraph range cursor. In hstore, multi-node/tablet scans can return entries in backend iterator order, and the page state is an internal storage cursor, so those assumptions may lead to unstable ordering or skipped results. This PR keeps the fix intentionally scoped: hstore range-index queries whose visible result depends on limit/offset/paging are sorted and sliced in the index layer, while unbounded scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths. I think this is enough for the current PR, but the underlying hstore scan/page-state contract should be handled in a dedicated follow-up, ideally by defining whether range scans must be globally ordered and fixing the hstore iterator/page-state semantics at the storage-client layer.

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: HStore range-index offset queries can skip too many sorted results. Evidence: static review of GraphIndexTransaction/query offset handling.

@contrueCT

Copy link
Copy Markdown
Contributor Author

Thanks. I fixed this by resetting scanQuery.offset(0L) before the full sorted range-index scan, so the fallback now reads the complete matched range first and lets the original query apply offset/limit only once after sorting. I also added range-offset coverage to VertexCoreTest#testQueryByDateProperty to guard the double-skip case. Local checks passed with git diff --check, hugegraph-core compile, and VertexCoreTest#testQueryByDateProperty under the rocksdb core-test profile.

@VGalaxies VGalaxies left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

  • Blocking: yes
  • Summary: The new HStore range-index holder can drop results in joint-index filtering and does not preserve the sorted range-index order into returned elements.
  • Evidence:
    • Static review of git diff origin/master...HEAD and surrounding IdHolder / QueryList code

@contrueCT contrueCT requested review from VGalaxies and imbajin June 11, 2026 03:15

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: Found one label-query regression in the current head. Evidence: related unit tests pass, but static path shows non-EQ label predicates still route into the strict label-index lookup.

assert label != null;
// Query-by-label builds a label index entry and requires one
// deterministically resolved label instead of best-effort fallback.
Id label = query.conditionValue(HugeKeys.LABEL);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Do not route non-EQ label predicates into queryByLabel()

Evidence: queryByUserprop() selects queryByLabel() whenever the only sysprop is LABEL, but queryByLabel() now immediately requires conditionValue(HugeKeys.LABEL), which only resolves EQ/IN label relations. A pure label predicate such as g.V().has(T.label, P.neq("author")) is converted by TraversalUtil into Condition.neq(HugeKeys.LABEL, ...), so it reaches this branch and then fails the single-label check.

Impact: label-only non-EQ traversals regress, while the new testQueryByNonEqLabelAndIndexedProperty() does not catch it because the extra city predicate sends that query through the user-prop path. Please gate the label-index fast path on an EQ/IN-resolvable label, for example containsConditionValues(HugeKeys.LABEL) or uniqueConditionValue(HugeKeys.LABEL) != null, and add pure label-only neq coverage for vertices and edges.

@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions

3 participants