Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msqe-case-insensitive-enabled #14830

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

albertobastos
Copy link
Contributor

closes #14675

Backwards compatibility:

When enable.case.insensitive=true (the default value) responses using MSQE will return the contents of tablesQueried always in lowercase. That's due to how internally Calcite is used for column validation and how it handles a case-insensitive scenario. Given that it happens when a cluster is configured as case-insensitive, hopefully the consumer will also be prepared to ignore case when using that response field.

@codecov-commenter
Copy link

codecov-commenter commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 63.72%. Comparing base (59551e4) to head (946f2f2).
Report is 1598 commits behind head on master.

Files with missing lines Patch % Lines
...t/controller/api/resources/PinotQueryResource.java 0.00% 9 Missing ⚠️
.../java/org/apache/pinot/query/QueryEnvironment.java 85.71% 0 Missing and 1 partial ⚠️
...a/org/apache/pinot/query/catalog/PinotCatalog.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14830      +/-   ##
============================================
+ Coverage     61.75%   63.72%   +1.97%     
- Complexity      207     1612    +1405     
============================================
  Files          2436     2708     +272     
  Lines        133233   151297   +18064     
  Branches      20636    23359    +2723     
============================================
+ Hits          82274    96416   +14142     
- Misses        44911    47632    +2721     
- Partials       6048     7249    +1201     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.69% <50.00%> (+1.98%) ⬆️
java-21 63.62% <50.00%> (+2.00%) ⬆️
skip-bytebuffers-false 63.71% <50.00%> (+1.96%) ⬆️
skip-bytebuffers-true 63.60% <50.00%> (+35.87%) ⬆️
temurin 63.72% <50.00%> (+1.97%) ⬆️
unittests 63.72% <50.00%> (+1.97%) ⬆️
unittests1 56.29% <69.23%> (+9.40%) ⬆️
unittests2 34.04% <31.81%> (+6.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang Jackie-Jiang added backward-incompat Referenced by PRs that introduce or fix backward compat issues bugfix multi-stage Related to the multi-stage query engine labels Jan 17, 2025
.collect(Collectors.toSet());
//return _tableCache.getTableNameMap().keySet().stream().filter(n -> DatabaseUtils.isPartOfDatabase(n, _databaseName))
// .collect(Collectors.toSet());
return _tableCache.getTableNameMap().keySet();
Copy link
Contributor Author

@albertobastos albertobastos Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling calcite case-insensitive configuration led to a lot of unexpected consequences. One of them is that, during table name resolution and validation, it now relies in schema.getTableNames() to find if a table exists:

https://github.com/apache/calcite/blob/f9cc84f35aeba0effd6654da8471d4bef9eaa925/core/src/main/java/org/apache/calcite/jdbc/SimpleCalciteSchema.java#L122

As it was implemented, this method got rid of all non-prefixed tableNames when _databaseName != null, and probably that was not intended.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it was implemented, this method got rid of all non-prefixed tableNames when _databaseName != null, and probably that was not intended.

The non-prefixed table names are assumed to belong to the "default" database IIUC. So shouldn't we want to exclude them when there's a non-default database being used?

cc - @shounakmk219

Copy link
Collaborator

@yashmayya yashmayya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @albertobastos! I've left a few minor comments and questions.

@Test
public void testCaseInsensitiveNames() throws Exception {
String query = "select ACTualELAPsedTIMe from mYtABLE where actUALelAPSedTIMe > 0 limit 1";
JsonNode jsonNode = postQuery(query);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the broker query API; let's also add a call to the controller query API (via postQueryToController) to test that path as well here - I see that you've made the necessary changes for that to work already.

@@ -118,9 +111,15 @@ public QueryEnvironment(Config config) {
String database = config.getDatabase();
PinotCatalog catalog = new PinotCatalog(config.getTableCache(), database);
CalciteSchema rootSchema = CalciteSchema.createRootSchema(false, false, database, catalog);
Properties connectionConfigProperties = new Properties();
connectionConfigProperties.setProperty(CalciteConnectionProperty.CASE_SENSITIVE.camelName(), Boolean.toString(
config.getTableCache() == null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the table cache ever be null here? Doesn't look like it should be.

.collect(Collectors.toSet());
//return _tableCache.getTableNameMap().keySet().stream().filter(n -> DatabaseUtils.isPartOfDatabase(n, _databaseName))
// .collect(Collectors.toSet());
return _tableCache.getTableNameMap().keySet();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it was implemented, this method got rid of all non-prefixed tableNames when _databaseName != null, and probably that was not intended.

The non-prefixed table names are assumed to belong to the "default" database IIUC. So shouldn't we want to exclude them when there's a non-default database being used?

cc - @shounakmk219

Comment on lines +3003 to +3008
"SELECT count(*) FROM mytable WHERE DaysSinceEpoch = " + daysSinceEpoch + " limit 10000",
"SELECT count(*) FROM mytable WHERE timeConvert(DaysSinceEpoch,'DAYS','HOURS') = " + hoursSinceEpoch
+ " limit 10000",
"SELECT count(*) FROM mytable WHERE timeConvert(DaysSinceEpoch,'DAYS','SECONDS') = " + secondsSinceEpoch
+ " limit 10000",
"SELECT MAX(timeConvert(DaysSinceEpoch,'DAYS','SECONDS')) FROM mytable limit 10000",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these changes for?

testQueryError(query, QueryException.QUERY_PLANNING_ERROR_CODE);
//testQueryError(query, QueryException.QUERY_PLANNING_ERROR_CODE);
JsonNode response = postQuery(query);
assertTrue(response.get("numSegmentsProcessed").asLong() >= 1L, "Query: " + query + " failed");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intention is to change the assertion to verify that the query no longer fails (since we're case insensitive by default now), we can use the assertNoError method here.

List<String> queries = new ArrayList<>();
baseQueries.forEach(q -> queries.add(q.replace("mytable", "MYTABLE").replace("DaysSinceEpoch", "DAYSSinceEpOch")));
baseQueries.forEach(
q -> queries.add(q.replace("mytable", "MYDB.MYTABLE").replace("DaysSinceEpoch", "DAYSSinceEpOch")));
q -> queries.add(q.replace("mytable", "DEFAULT.MYTABLE").replace("DaysSinceEpoch", "DAYSSinceEpOch")));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the database name need to be changed here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues bugfix multi-stage Related to the multi-stage query engine
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-stage query engine doesn't respect the enable.case.insensitive cluster config
4 participants