-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for OpenSearch alias type #3246
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Heng Qian <[email protected]> (cherry picked from commit 2f475af)
Signed-off-by: Heng Qian <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice - thanks !
* The type of alias. See <a | ||
* href="https://opensearch.org/docs/latest/opensearch/supported-field-types/alias/">doc</a> | ||
*/ | ||
public class OpenSearchAliasType extends OpenSearchDataType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should an alias be treated as a new data type, or should it remain as just an alias for the original field? Additionally, what are the expectations for the output of the DESCRIBE TABLE command? for example, describe try
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I've thought of resolving alias type to
NamedExpression
which is a resolved expression forAlias
. In that proposal, we still need to maintain thisOpenSearchAliasType
inTypeEnvironment
when analyzing, so as to identifyident
of alias type, and resolve them toNamedExpression
(i.e. Alias) with itsdelegated
to be aReferenceExpression
to the original field. -
With the current code,
describe try
returns:
{
"schema": [{"name": "TABLE_CAT","type": "string"},{"name": "TABLE_SCHEM","type": "string"},{"name": "TABLE_NAME","type": "string"},{"name": "COLUMN_NAME","type": "string"},{"name": "DATA_TYPE","type": "string"},{"name": "TYPE_NAME","type": "string"},{"name": "COLUMN_SIZE","type": "string"},{"name": "BUFFER_LENGTH","type": "string"},{"name": "DECIMAL_DIGITS","type": "string"},{"name": "NUM_PREC_RADIX","type": "string"},{"name": "NULLABLE","type": "string"},{"name": "REMARKS","type": "string"},{"name": "COLUMN_DEF","type": "string"},{"name": "SQL_DATA_TYPE","type": "string"},{"name": "SQL_DATETIME_SUB","type": "string"},{"name": "CHAR_OCTET_LENGTH","type": "string"},{"name": "ORDINAL_POSITION","type": "string"},{"name": "IS_NULLABLE","type": "string"},{"name": "SCOPE_CATALOG","type": "string"},{"name": "SCOPE_SCHEMA","type": "string"},{"name": "SCOPE_TABLE","type": "string"},{"name": "SOURCE_DATA_TYPE","type": "string"},{"name": "IS_AUTOINCREMENT","type": "string"},{"name": "IS_GENERATEDCOLUMN","type": "string"}
],
"datarows": [
[ "opensearch", null, "try", "time1", null, "timestamp", null, null, null, 10, 2, null, null, null, null, null, 0, "", null, null, null, null, "NO", ""],
[ "opensearch", null, "try", "time2", null, "timestamp", null, null, null, 10, 2, null, null, null, null, null, 1, "", null, null, null, null, "NO", ""]
],
"total": 2,
"size": 2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought of resolving alias type to NamedExpression which is a resolved expression for Alias
+1, agree
instead create new data type, could we add alias
field when parseMapping? for instance,
Set<Alias> aliases = new HashSet<>()
indexMapping.forEach(
(k, v) -> {
var type = ((String) innerMap.getOrDefault("type", "object")).replace("_", "");
if (type == "alias") {
aliases.add(new Alias(aliasField, rawField)
}
// current code
result
});
for(alias: aliases) {
result.put(alias.aliasField, result.get(alias.rawField)
}
return result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding alias
field when parse mapping is more simply. But what if the original field definition changed? In DSL, the alias and field definitions are "static". But in PPL, the field definition can be changed by eval <original_field> = xxx
. Should the alias change its returns correspondingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@penghuo it seems not work to parse alias
field in the way like that, as the information of path
cannot be recorded anywhere. For the example in the description, in analyzing process, we only know the datatype of time2
but we cannot resolve its value when executing, because time2
isn't an actual exist value in the source of OpenSearch index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in analyzing process, we only know the datatype of time2 but we cannot resolve its value when executing
agree.
my concern is alias is not a data type even when move to Calcite, right? is it possible to resolve alias field as NamedExpression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking alias field will be "replaced" everywhere in the query plan after analyzing, similar as table alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we need to add alias type as well when moving to Calcite. But it's a data type only exist in analyzing process and will be resolved before execution.
is it possible to resolve alias field as NamedExpression?
Yes if using the way mentioned in the beginning of this thread. Taking the above example, we can change resolving time2
from
ReferenceExpression(attr = `time2`, path =`time1`, type = `date`)
to
NamedExpression(name = `time2`, delegated = ReferenceExpression(attr = `time1`, path =`time1`, type = `date`))
They two are nearly semantic equal I think.
Description
Add Support for OpenSearch alias type, so PPL can query fields of alias type correctly.
Implementation:
OpenSearchAliasType
OpenSearchDataType::parseMapping
ReferenceExpression
. For now, only alias type will have different original path and type._source
inOpenSearchRequestBuilder::pushDownProjects
, use path instead of attr since they are different for alias type. The former one is more accurate since OpenSearch don't accept fields of alias type in_source
.e.g. After this PR:
Related Issues
Resolves #3069
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.