-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved patterns command with new algorithm #3263
base: main
Are you sure you want to change the base?
Improved patterns command with new algorithm #3263
Conversation
Signed-off-by: Songkan Tang <[email protected]>
Thanks for the initiative!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general,
- add user facing doc to explain new BRAIN alg with example will be helpfull.
- add UT could help reviwer understand the logic also. @dai-chen did a good example. https://github.com/opensearch-project/sql/blob/main/core/src/main/java/org/opensearch/sql/expression/window/frame/PeerRowsWindowFrame.java
- add benchmark to compare BRAIN and SIMPLE also help a lot.
new Function( | ||
ctx.pattern_method != null | ||
? ctx.pattern_method.getText().toLowerCase(Locale.ROOT) | ||
: BuiltinFunctionName.BRAIN.name(), // By default, use new algorithm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we introduce a setting to control the default pattern algorithms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. Like a java property to specify default algorithm name? I can add that.
((NamedArgumentExpression) expression).getValue().valueOf().stringValue()) | ||
.findFirst() | ||
.orElse(""); | ||
return new StreamPatternRowWindowFrame( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need StreamPatternRowWindowFrame? could we just use CurrentRowWindowFrame instead? patternExpression could be member of StreamPatternWindowFunction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will remove the need of StreamPatternRowWindowFrame in the next revision.
List<Expression> sortFields = | ||
windowDefinition.getSortList().stream().map(Pair::getRight).collect(Collectors.toList()); | ||
|
||
ExprValue last = peers.get(peers.size() - 1); | ||
return resolve(sortFields, last).equals(resolve(sortFields, next)); | ||
} | ||
|
||
private boolean isSamePartition(ExprValue next) { | ||
protected boolean isSamePartition(ExprValue next) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert unnecessary change.
return value.stringValue(); | ||
}) | ||
.toList(); | ||
this.preprocessedMessages.addAll(logParser.preprocessAllLogs(logMessages)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BufferPatternRowsWindowFrame should have it own spec, doet it means over all rows? @dai-chen WindowFrame definition and How to use WIndowFrame should be seperate, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, I set the spec to empty partition by
and empty sort by
during AST tree parsing unresolved WindowFunction, which treats the window frame range is all rows on coordinator node. Because I haven't seen requirements on sorting and partitioning on other columns.
I think we can add partition by
and sort by
syntax if we see values in case users want to specify them. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think users are free to use pattern functions added with any window frame definition.
Signed-off-by: Songkan Tang <[email protected]>
return value.stringValue(); | ||
}) | ||
.toList(); | ||
this.preprocessedMessages.addAll(logParser.preprocessAllLogs(logMessages)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think users are free to use pattern functions added with any window frame definition.
repository.register(brain()); | ||
repository.register(simplePattern()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking are these new algorithm really window function? If users specify order by
or partition by
, does it still generate meaningful result?
UnresolvedExpression sourceField, | ||
String alias, | ||
java.util.Map<String, Literal> arguments) { | ||
return new Pattern( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wonder if pattern is mostly syntax sugar for pattern window function, is new logical operator still required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the integration with LogicalWindow, I see they are quite similar. Yeah, I think Pattern operator is probably not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the Window unresolvedPlan. So replace Pattern with Window instead.
import org.opensearch.sql.expression.window.frame.WindowFrame; | ||
|
||
@EqualsAndHashCode(callSuper = true) | ||
public class BufferPatternWindowFunction extends FunctionExpression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
order by
expression in window definition decides who're peers right? What's the order by expression for buffer (offline) pattern function? I didn't find it and may miss it in design doc:
Project[message#1, patterns_field#2]
+- Window[brain(message#1), windowsSpec(partitionBy=null, frame=PeerRowsWindowFrame)]
+- OpenSearchIndexScan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level question about the long-term approach: Currently, the pattern command always generates a new field, which offers flexibility as users can perform aggregation (stats) or other operations on it afterward. However, for large datasets, I assume most users would prefer functionality similar to CloudWatch Logs Query Syntax - Pattern.
Do you see any potential issues with the current implementation, which is based on SQL window functions combined with aggregate functions, as we scale this feature in the future?
@dai-chen It's a good question. Actually, it's one of my previous thought to use a new specific operator to directly return grouped logs with sample count or grouped samples. It's more like a compound aggregation operator with a default As to |
To scale the pattern function, I think other issues are pending investigation.
|
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
AvgTime benchmark:
|
Signed-off-by: Songkan Tang <[email protected]>
This PR still needs some action items.
|
Description
This PR introduces enhancement on original patterns command. It keeps the patterns command naming and rebuild it with specific window functions to parse log messages with different pattern method(log parser algorithms). See this RFC: #3251
The sample query input and output will be like:
Caveat:
For now, this is a draft PR for initial review so that we can collect major feedbacks to be addressed later.
Related Issues
Resolves #3251
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.