Add FederatedQueryPlanner #2216

lbschanno · 2024-01-12T21:24:55Z

Adds a FederatedQueryPlanner that splits a query into multiple sub-queries, each scanning a subset of the original target date range, when field index holes are found for the query within that range.
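To illustrate the idea of splitting a target date range around index holes, here is a minimal sketch. It is not the planner's actual code: `DateRangeSplitter`, `DateRange`, and `split` are hypothetical names, and the sketch assumes holes are given as sorted, non-overlapping inclusive date ranges (Java 16+ for the record syntax).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this PR: shows one way a date range
// could be split so each sub-range is entirely inside or outside a hole.
public class DateRangeSplitter {

    /** An inclusive date range. */
    public record DateRange(LocalDate start, LocalDate end) {}

    /**
     * Splits [start, end] into consecutive sub-ranges around the given holes.
     * Assumption: holes are sorted by start date and do not overlap.
     */
    public static List<DateRange> split(LocalDate start, LocalDate end, List<DateRange> holes) {
        List<DateRange> result = new ArrayList<>();
        LocalDate cursor = start;
        for (DateRange hole : holes) {
            // Skip holes that fall completely outside the remaining target range.
            if (hole.end().isBefore(cursor) || hole.start().isAfter(end)) {
                continue;
            }
            // Clamp the hole to the part of the target range not yet covered.
            LocalDate holeStart = hole.start().isBefore(cursor) ? cursor : hole.start();
            LocalDate holeEnd = hole.end().isAfter(end) ? end : hole.end();
            // Emit the hole-free stretch before the hole, if any.
            if (cursor.isBefore(holeStart)) {
                result.add(new DateRange(cursor, holeStart.minusDays(1)));
            }
            // Emit the hole itself as its own sub-range.
            result.add(new DateRange(holeStart, holeEnd));
            cursor = holeEnd.plusDays(1);
        }
        // Emit whatever remains after the last hole.
        if (!cursor.isAfter(end)) {
            result.add(new DateRange(cursor, end));
        }
        return result;
    }
}
```

Under these assumptions, a target range of 2015-04-04 through 2015-11-11 with a single hole on 2015-10-10 yields three sub-ranges: 2015-04-04 to 2015-10-09, 2015-10-10 to 2015-10-10, and 2015-10-11 to 2015-11-11.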

Note: the work in this PR is dependent on the work in:

Closes #825

Modify the generation of 'i' (indexed rows) and 'ri' (reverse indexed rows) in the metadata table such that the column qualifier contains the event date. This is required as a first step to support efforts for issue #825 so that we can identify dates when an event was ingested and included in a frequency count for an associated 'f' row, but was not indexed.
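As a rough illustration of why per-date counts on the 'i' and 'ri' rows help, the sketch below compares per-date frequency counts with per-date index counts for a single field. Everything here is an assumption made for illustration: `IndexHoleFinder` and `findHoleDates` are hypothetical names, the counts are passed in as plain maps rather than read from the metadata table, and the rule "indexed count lower than frequency count" is a simplification, not necessarily the rule this work uses.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of this PR or of metadata-utils.
public class IndexHoleFinder {

    /**
     * Returns the dates on which a field has a non-zero frequency count but a
     * missing or lower index count. Assumption: such dates are treated as
     * candidate index holes.
     */
    public static SortedSet<LocalDate> findHoleDates(Map<LocalDate, Long> frequencyCounts,
                                                     Map<LocalDate, Long> indexCounts) {
        SortedSet<LocalDate> holes = new TreeSet<>();
        for (Map.Entry<LocalDate, Long> entry : frequencyCounts.entrySet()) {
            long ingested = entry.getValue();
            // A date with no index entry at all counts as zero indexed events.
            long indexed = indexCounts.getOrDefault(entry.getKey(), 0L);
            if (ingested > 0 && indexed < ingested) {
                holes.add(entry.getKey());
            }
        }
        return holes;
    }
}
```

The sorted set of hole dates could then be collapsed into contiguous ranges before deciding how to partition the query's date range.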

warehouse/query-core/src/main/java/datawave/query/tables/ShardQueryLogic.java

warehouse/query-core/src/main/java/datawave/query/planner/FederatedQueryPlanner.java


ivakegg · 2024-03-13T18:54:29Z

From @lbschanno

I have been working on getting tests to pass when the FederatedQueryPlanner is the default query planner. Some cases have shown that it may not be enough to simply use the first config returned from a sub-query as the finalized query string.

For instance, in MaxExpansionIndexOnlyQueryTest.testMaxAnyField(), the sub-queries have the following results when the max value expansion threshold is set to 20:

Sub-query 0 over 2015/04/04-2015/10/09
Query String: false
Query Data Iterable: Empty

Sub-query 1 over 2015/10/10-2015/10/10
Query String: (CODE == 'b-code' || CITY == 'b-city' || CITY == 'b-2' || CITY == 'b-1' || STATE == 'b-state') && (CODE == 'a-code' || CITY == 'a-1' || STATE == 'a-state' || STATE == 'a-s2')
Query Data Iterable: contains 3 query datas

Sub-query 2 over 2015/10/11-2015/11/11
Query String: false
Query Data Iterable: Empty

In MaxExpansionIndexOnlyQueryTest.testMaxValueRegexIndexOnly(), we receive the following sub-query results when the max expansion threshold is set to 20:

Sub-query 0 over 2015/04/04-2015/10/09
Query String: CITY == 'a-1' && STATE =~ 'b.*'
Query Data Iterable: Empty

Sub-query 1 over 2015/10/10-2015/10/10
Query String: CITY == 'a-1' && (STATE == 'b3-state' || STATE == 'b-state' || STATE == 'bi-s' || STATE == 'b2-state' || STATE == 'ba-s2')
Query Data Iterable: contains 2 query datas

Sub-query 2 over 2015/10/11-2015/11/11
Query String: CITY == 'a-1' && STATE =~ 'b.*'
Query Data Iterable: Empty

In MaxExpansionIndexOnlyQueryTest.testMaxValueNegAnyField(), we receive the following sub-query results when the max expansion threshold is set to 10:

Sub-query 0 over 2015/04/04-2015/10/09
Query String: false
Query Data Iterable: Empty

Sub-query 1 over 2015/10/10-2015/10/10
Query String: STATE == 'b-state' && !(((Delayed = true) && (ANYFIELD =~ 'a.*')) || CODE == 'a-code' || CITY == 'a-1' || STATE == 'a-state' || STATE == 'a-s2')
Query Data Iterable: contains 4 query datas

Sub-query 2 over 2015/10/11-2015/11/11
Query String: false
Query Data Iterable: Empty

I have seen similar results for MaxExpansionQueryTest and AnyFieldQueryTest. Given that the query strings can differ, how do you want to determine which query string to set in the original config that is passed into the FederatedQueryPlanner.process() method? Do we need the query string specific to the query data iterable returned from each sub-query when setting up the schedulers in ShardQueryLogic.setUpQuery(GenericQueryConfiguration config)? Is there anywhere else where we need to know the specific query strings from each sub-query?

I have pushed an update that adds tests to MaxExpansionIndexOnlyQueryTest with versions of each test using either the DefaultQueryPlanner or FederatedQueryPlanner so that you can see the results for yourself.

I suppose another question would be: do the results above even look correct to you?

ivakegg · 2024-03-15T12:30:20Z

For documentation purposes, here is the response in the conversation we had:

The tests that test the query plan would need to be changed to handle the one returned by the federated query planner.
The federated query plan that gets returned should probably concatenate the plans (as a unique set) into something like this:
((plan = 1) && ()) || ((plan = 2) && ()) ...
Simply use the plan as-is if only one plan is in the set.
We can work later to allow the query metrics to handle multiple top-level plans after the sub-plan work is pulled in and proven viable.
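The concatenation described above could be sketched as follows. This is only an illustration of the suggested format, not the code that was committed: `SubPlanCombiner` and `combine` are hypothetical names, plans are treated as plain strings, and returning an empty string for an empty input is an assumption.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper illustrating the suggested plan concatenation.
public class SubPlanCombiner {

    /**
     * Combines sub-query plans into a single plan string. Duplicates are
     * dropped (first-seen order is kept); a single unique plan is returned
     * unchanged, otherwise each plan is wrapped as ((plan = N) && (...)).
     */
    public static String combine(Collection<String> subPlans) {
        Set<String> unique = new LinkedHashSet<>(subPlans);
        if (unique.isEmpty()) {
            return ""; // Assumption: no sub-plans means no combined plan.
        }
        if (unique.size() == 1) {
            return unique.iterator().next();
        }
        StringBuilder sb = new StringBuilder();
        int planNumber = 1;
        for (String plan : unique) {
            if (sb.length() > 0) {
                sb.append(" || ");
            }
            sb.append("((plan = ").append(planNumber++).append(") && (").append(plan).append("))");
        }
        return sb.toString();
    }
}
```

With the sub-query results quoted earlier for testMaxValueRegexIndexOnly(), the first and third plans are identical, so this sketch would produce a combined string containing two numbered plans rather than three.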


lbschanno and others added 9 commits September 19, 2023 04:57

Merge branch 'integration' into task/datedIndexMetadata

0e4f806

Merge branch 'integration' into task/datedIndexMetadata

f2aa20b

Merge branch 'integration' into task/datedIndexMetadata

ab9ee4c

Merge branch 'integration' into task/datedIndexMetadata

43aeae8

Merge branch 'integration' into task/datedIndexMetadata

83014c1

Add counts to 'i' and 'ri' rows

ab14fe2

Merge branch 'integration' into task/datedIndexMetadata

9aad9f6

Initial federated query planner implementation

da6ee69

lbschanno requested a review from ivakegg January 12, 2024 21:24

lbschanno added 3 commits January 12, 2024 17:00

code formatting

b40201b

Fixed issues with FederatedQueryIterable

8ca62b3

Fix test failures

59d3be9

lbschanno changed the title ~~Task/federated query planner~~ Add FederatedQueryPlanner Jan 13, 2024

lbschanno added 2 commits January 13, 2024 12:45

Fix failing tests

bde374d

Additional test fixes

461a526

ivakegg requested changes Jan 16, 2024


lbschanno added 13 commits January 18, 2024 13:54

pr feedback

7784b4c

Use new MetadataHelper function version

b288196

Extract fields to filter index holes

c34a543

Correct logic for determining sub date ranges

e0ef160

Merge branch 'integration' into task/federatedQueryPlanner

9ea3a84

Remove unnecessary check

7052bcb

code formatting

d44e0f0

Merge branch 'integration' into task/federatedQueryPlanner

fca2fff

Add check for null query model

e341c72

Limit config arg to function scope

8779a99

Update metadata-utils submodule commit

a875a20

code formatting

cd20ca5

Merge branch 'integration' into task/federatedQueryPlanner

56e3bad

pr feedback

c019c78

lbschanno mentioned this pull request Feb 29, 2024

Allow fetching counts for date ranges spanning one day NationalSecurityAgency/datawave-metadata-utils#31

Merged

ivakegg and others added 12 commits March 6, 2024 14:07

metadata-utils 3.0.3 tag

9c72dab

Fixed the index hole data ingest to set appropriate time stamps on th…

e5af693

…e keys Removed some of the code which I believe was trying to diagnose the test issues

Merge branch 'integration' into task/federatedQueryPlanner

0842cec

Updated applyModel to use the passed in script

d774f6f

Remove unneeded changes

3da641d

Make FederatedQueryPlanner the default

235dad8

Restore original log4j.properties

b63b24c

Merge branch 'integration' into task/federatedQueryPlanner

0c4ccff

code formatting

5515fdc

Fix QueryPlanTest

fede1a0

Updated to test with teardown

6ed7a39

Test debugging edits

1d506e3

ivakegg and others added 5 commits March 13, 2024 18:54

Updated formatting

0578bc2

Concatenate sub-plans

e9a76e6

Merge branch 'integration' into task/federatedQueryPlanner

195dabe

Make FederatedQueryPlanner implement Cloneable

d6354ce

code formatting

c2a57f0

lbschanno and others added 9 commits June 3, 2024 19:54

Merge branch 'integration' into task/federatedQueryPlanner

6e1f131

Merge branch 'integration' into task/federatedQueryPlanner

a2da80c

Merge branch 'integration' into task/federatedQueryPlanner

49c4bc6

Merge branch 'integration' into task/federatedQueryPlanner

53a3ea8

Merge remote-tracking branch 'origin/integration' into task/federated…

cbe8d87

…QueryPlanner

* Updated with metadata-utils 4.0.5 (index markers and avoid non-inde…

74c9db8

…xed fields for holes) * Fixed test cases with correct responses and periodic failing test cases * Updated AncestorQueryLogic to handle federate query planner

Merge branch 'integration' into task/federatedQueryPlanner

6608e5f

* Allow subclasses of ShardQueryConfiguration

390ae48

Merge remote-tracking branch 'origin/integration' into task/federated…

5b52f8b

…QueryPlanner
