Add FederatedQueryPlanner #2216
base: integration
Modify the generation of 'i' (indexed rows) and 'ri' (reverse indexed rows) in the metadata table such that the column qualifier contains the event date. This is required as a first step to support efforts for issue #825 so that we can identify dates when an event was ingested and included in a frequency count for an associated 'f' row, but was not indexed.
…e keys Removed some of the code which I believe was trying to diagnose the test issues
From @lbschanno:

I have been working on getting tests to pass when the FederatedQueryPlanner is the default query planner. Some cases have shown that it may not be enough to simply use the first config returned from a sub-query as the finalized query string.

For instance, in MaxExpansionIndexOnlyQueryTest.testMaxAnyField(), the sub-queries have the following results when the max value expansion threshold is set to 20:

- Result 0 over 2015/04/04-2015/10/09
- Result 1 over 2015/10/10-2015/10/10
- Sub-query 2 over 2015/10/11-2015/11/11

In MaxExpansionIndexOnlyQueryTest.testMaxValueRegexIndexOnly(), we receive the following sub-query results when the max expansion threshold is set to 20:

- Sub-query 0 over 2015/04/04-2015/10/09
- Sub-query 1 over 2015/10/10-2015/10/10
- Sub-query 2 over 2015/10/11-2015/11/11

In MaxExpansionIndexOnlyQueryTest.testMaxValueNegAnyField(), we receive the following sub-query results when the max expansion threshold is set to 10:

- Sub-query 0 over 2015/04/04-2015/10/09
- Sub-query 1 over 2015/10/10-2015/10/10
- Sub-query 2 over 2015/10/11-2015/11/11

I have seen some similar results for MaxExpansionQueryTest and AnyFieldQueryTest.

Given that we can have differing query strings, how do you want to handle determining which query string to set in the original config that's passed into the FederatedQueryPlanner.process() method? Do we need the query string unique to the query data iterable returned from each sub-query when setting up the schedulers in ShardQueryLogic.setUpQuery(GenericQueryConfiguration config)? Is there somewhere else where we need to know the specific query strings from each sub-query?

I have pushed an update that adds tests to MaxExpansionIndexOnlyQueryTest with versions of each test using either the DefaultQueryPlanner or FederatedQueryPlanner so that you can see the results for yourself. I suppose another question would be: do the results above even look correct to you?
For documentation purposes, here is the response in the conversation we had:
…xed fields for holes)
* Fixed test cases with correct responses and periodic failing test cases
* Updated AncestorQueryLogic to handle federate query planner
Adds a FederatedQueryPlanner that will break up a query into multiple queries scanning over subsets of the original target date range if field index holes are identified to be present for the query in the target date range.
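To make the date-range splitting concrete, here is a minimal sketch of how a target range could be broken into consecutive sub-ranges around known field index holes. This is not the code in this PR: the `DateRangeSplitter` class, its nested `DateRange` type, and the `split` method are hypothetical names for illustration, and the sketch assumes the holes are already known, sorted by start date, and non-overlapping. The example dates mirror the sub-query ranges reported in the conversation above, under the assumption that 2015/10/10 is the only hole in the range.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the DataWave code base.
public class DateRangeSplitter {

    /** Inclusive date range (hypothetical type). */
    public static final class DateRange {
        public final LocalDate start;
        public final LocalDate end;

        public DateRange(LocalDate start, LocalDate end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + ".." + end;
        }
    }

    /**
     * Splits the target range into consecutive sub-ranges so that every
     * sub-range lies either entirely inside a hole or entirely outside all holes.
     * Assumes holes are sorted by start date and do not overlap each other.
     */
    public static List<DateRange> split(DateRange target, List<DateRange> holes) {
        List<DateRange> result = new ArrayList<>();
        LocalDate cursor = target.start;
        for (DateRange hole : holes) {
            // Ignore holes that fall completely outside the remaining target range.
            if (hole.end.isBefore(cursor) || hole.start.isAfter(target.end)) {
                continue;
            }
            // Clamp the hole to the remaining target range.
            LocalDate holeStart = hole.start.isBefore(cursor) ? cursor : hole.start;
            LocalDate holeEnd = hole.end.isAfter(target.end) ? target.end : hole.end;
            // Emit the hole-free stretch before the hole, if there is one.
            if (cursor.isBefore(holeStart)) {
                result.add(new DateRange(cursor, holeStart.minusDays(1)));
            }
            // Emit the hole itself as its own sub-range.
            result.add(new DateRange(holeStart, holeEnd));
            cursor = holeEnd.plusDays(1);
        }
        // Emit whatever remains after the last hole.
        if (!cursor.isAfter(target.end)) {
            result.add(new DateRange(cursor, target.end));
        }
        return result;
    }
}
```

Under these assumptions, splitting 2015-04-04 through 2015-11-11 around a single-day hole on 2015-10-10 yields three sub-ranges, each of which could then be planned as its own sub-query. How the real planner discovers holes and merges the resulting sub-query configurations is outside the scope of this sketch.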
Note: the work in this PR is dependent on the work in:
Closes #825