Flink support #8849
Replies: 9 comments 6 replies
-
Thanks to everyone putting effort into Flink support. Glad to see Flink experts willing to join. For the implementation, my 2 cents so far is to extract a Java accelerator library that can serve as the basis for our next framework support. I have been working on one, https://github.com/velox4j/velox4j, which you may want to take a look at.
@shuai-xu Substrait hasn't helped much with the extensibility of Gluten. We have had to make a bunch of customizations to it, which makes Gluten difficult to integrate with other Substrait consumer libraries. Moreover, Velox and CH make incompatible use of the same Substrait features, for example the …
My feeling is that a layered design is the better architecture for such accelerators: we build a Java library that binds tightly to the native libraries through JNI, then translate the framework's query plan into the library's query plan in Java in one shot. Things become much clearer and maintenance much simpler with such an architecture.
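To make the layered idea above a bit more concrete, here is a minimal sketch of a one-shot plan translation in Java. Everything in it is hypothetical: `FrameworkNode`, `LibraryNode`, `PlanTranslator` and `submitToNative` are illustration names, not velox4j or Gluten APIs, and the JNI call is replaced by a pure-Java stand-in so the snippet runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical framework-side plan node (stands in for a Flink or Spark operator).
final class FrameworkNode {
    final String kind;   // "Scan", "Filter" or "Project" in this sketch
    final String detail; // table name, predicate or projection list
    final List<FrameworkNode> inputs;

    FrameworkNode(String kind, String detail, FrameworkNode... inputs) {
        this.kind = kind;
        this.detail = detail;
        this.inputs = Arrays.asList(inputs);
    }
}

// Hypothetical library-side plan node: the only plan shape the native layer sees.
final class LibraryNode {
    final String op;
    final String detail;
    final List<LibraryNode> inputs;

    LibraryNode(String op, String detail, List<LibraryNode> inputs) {
        this.op = op;
        this.detail = detail;
        this.inputs = inputs;
    }

    // Renders the node followed by its inputs, e.g. "FilterNode[p] <- TableScanNode[t]".
    String render() {
        StringBuilder sb = new StringBuilder(op).append('[').append(detail).append(']');
        for (LibraryNode input : inputs) {
            sb.append(" <- ").append(input.render());
        }
        return sb.toString();
    }
}

final class PlanTranslator {
    // Translates the whole framework plan bottom-up in a single recursive pass.
    static LibraryNode translate(FrameworkNode node) {
        List<LibraryNode> inputs = new ArrayList<>();
        for (FrameworkNode input : node.inputs) {
            inputs.add(translate(input));
        }
        switch (node.kind) {
            case "Scan":
                return new LibraryNode("TableScanNode", node.detail, inputs);
            case "Filter":
                return new LibraryNode("FilterNode", node.detail, inputs);
            case "Project":
                return new LibraryNode("ProjectNode", node.detail, inputs);
            default:
                // Fail fast so the caller can fall back to the framework's own operator.
                throw new UnsupportedOperationException("No native operator for " + node.kind);
        }
    }

    // Pure-Java stand-in for the JNI entry point (assumption: a real library
    // would declare a native method here and pass a serialized plan).
    static String submitToNative(LibraryNode plan) {
        return plan.render();
    }

    static String demo() {
        FrameworkNode scan = new FrameworkNode("Scan", "bid");
        FrameworkNode filter = new FrameworkNode("Filter", "price > 10", scan);
        return submitToNative(translate(filter));
    }
}
```

The point of the sketch is only the layering: the framework adapter knows `FrameworkNode`, the native side knows `LibraryNode`, and no intermediate plan format sits between them.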
-
How much code can we reuse from Gluten? If we can't reuse much, it may be a better idea to create a subproject. @weiting-chen
-
Some Ideas:
Velox does not have an internal state. We can introduce RocksDB as the state, which is not too difficult. However, this could become a potential performance bottleneck, so careful consideration is necessary. Since RocksDB involves disk operations, it is essential to orchestrate IO as asynchronous batch reads.
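As a rough illustration of the asynchronous batch read idea, the sketch below collects the keys of one input batch, deduplicates them, and issues a single lookup on an IO thread. It is only a sketch under assumptions: `BatchedStateReader` is a hypothetical class, and a plain in-memory `Map` stands in for RocksDB so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical reader that fetches the state for a whole batch of keys at once.
final class BatchedStateReader {
    private final Map<String, Long> store; // stand-in for a disk-backed store such as RocksDB
    private final ExecutorService ioPool;  // dedicated threads so lookups do not block the operator

    BatchedStateReader(Map<String, Long> store, ExecutorService ioPool) {
        this.store = store;
        this.ioPool = ioPool;
    }

    // Issues one asynchronous lookup covering all distinct keys of the batch.
    CompletableFuture<Map<String, Long>> readBatch(List<String> keys) {
        final Set<String> distinct = new LinkedHashSet<>(keys);
        return CompletableFuture.supplyAsync(() -> {
            Map<String, Long> found = new HashMap<>();
            for (String key : distinct) {
                Long value = store.get(key);
                if (value != null) {
                    found.put(key, value); // absent keys are simply left out
                }
            }
            return found;
        }, ioPool);
    }

    static Map<String, Long> demo() {
        Map<String, Long> store = new HashMap<>();
        store.put("auction-1", 3L);
        store.put("auction-2", 7L);
        ExecutorService ioPool = Executors.newSingleThreadExecutor();
        try {
            BatchedStateReader reader = new BatchedStateReader(store, ioPool);
            CompletableFuture<Map<String, Long>> pending = reader.readBatch(
                    Arrays.asList("auction-1", "auction-2", "auction-1", "auction-9"));
            // The operator could keep working on another batch here before joining.
            return pending.join();
        } finally {
            ioPool.shutdown();
        }
    }
}
```

Whether one lookup per batch is enough to hide disk latency would need measuring; the sketch only shows the shape of the call.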
This is one of the fundamental differences between stream and batch processing. Streams are updated, while batches are not. For example, a table from MySQL naturally has updates and deletes. Flink introduces a stream retract mechanism to address this issue, which involves carrying metadata on the data to indicate whether it is an update, delete, or insert. Clearly, Velox does not have this capability. Several modifications are needed:
Once both the operators and the data stream are modified to support rollback, the overall execution flow should be fine.
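For readers unfamiliar with the retract mechanism described above, here is a small sketch of an aggregate that consumes a changelog. The four change kinds mirror the values of Flink's `RowKind` (INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE) but are redefined locally to keep the example self-contained; `ChangeRow` and `RetractingSum` are hypothetical names, and this is not how Velox or Gluten implement it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local copy of the four change kinds carried as metadata on each row.
enum ChangeKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

// Hypothetical changelog row: a key, a value, and the kind of change it represents.
final class ChangeRow {
    final ChangeKind kind;
    final String key;
    final long amount;

    ChangeRow(ChangeKind kind, String key, long amount) {
        this.kind = kind;
        this.key = key;
        this.amount = amount;
    }
}

// A per-key SUM that can take back values it has already accumulated.
final class RetractingSum {
    private final Map<String, Long> sums = new HashMap<>();

    void apply(ChangeRow row) {
        // INSERT and UPDATE_AFTER accumulate; UPDATE_BEFORE and DELETE retract.
        boolean accumulate = row.kind == ChangeKind.INSERT || row.kind == ChangeKind.UPDATE_AFTER;
        long signed = accumulate ? row.amount : -row.amount;
        sums.merge(row.key, signed, Long::sum);
    }

    long sumOf(String key) {
        return sums.getOrDefault(key, 0L);
    }

    static long demo() {
        RetractingSum agg = new RetractingSum();
        List<ChangeRow> changelog = Arrays.asList(
                new ChangeRow(ChangeKind.INSERT, "alice", 10),
                new ChangeRow(ChangeKind.UPDATE_BEFORE, "alice", 10), // old version of the row
                new ChangeRow(ChangeKind.UPDATE_AFTER, "alice", 25),  // new version of the row
                new ChangeRow(ChangeKind.INSERT, "alice", 5),
                new ChangeRow(ChangeKind.DELETE, "alice", 5));
        for (ChangeRow row : changelog) {
            agg.apply(row);
        }
        return agg.sumOf("alice"); // 10 - 10 + 25 + 5 - 5 = 25
    }
}
```

An operator without this notion would treat all five rows as inserts and report 55 instead of 25, which is why both the data stream and the operators need to understand the change kind.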
-
I noticed that in the Nexmark benchmark, a C++ implementation of the source has been developed. Would it be a fair comparison to evaluate its performance against the native Java version of Nexmark?
-
Yes, but only q0-q4 are ready to compare; the other queries are not yet semantically equivalent to Flink.
ryyyyyy1 wrote on Wed, Aug 13, 2025 at 11:51:
> I noticed that in the Nexmark benchmark, a C++ implementation of the source has been developed. Would it be a fair comparison to evaluate its performance against the native Java version of Nexmark?
-
Hello, I would like to know whether the
-
Yes, they are fully supported. You can file a bug report if you hit errors.
ryyyyyy1 wrote on Thu, Aug 14, 2025 at 14:18:
> Thanks for your reply. By the way, has the current community version already fully implemented queries Q0–Q4 (including Gluten, velox4j, and the Velox component)? I seemed to encounter an error when trying to test the stateful Q3.
-
GlutenVectorOneInputOperator processes RowVectors, and a RowVector contains multiple RowData. We generate data in batches from the sources, so all operators in Gluten process data in batches.
miao wrote on Mon, Aug 25, 2025 at 20:40:
> Hello, I would like to know whether the processElement method in GlutenVectorOneInputOperator and GlutenOneInputOperator processes data by converting a single RowData into a RowVector, or multiple RowData into a RowVector. Additionally, does the underlying Velox process one RowData at a time or multiple RowData at once? If it is the former, do you have any plans to implement batch processing in the future?
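To picture what "a RowVector contains multiple RowData" means, the following sketch pivots a list of row-wise records into one columnar batch. It is an illustration under assumptions only: `ColumnBatch` is a hypothetical class, rows are plain `Object[]` arrays, and none of this is the actual Gluten or velox4j conversion code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical columnar batch: one array per column, all of the same length.
final class ColumnBatch {
    final int rowCount;
    final Object[][] columns; // indexed as columns[column][row]

    private ColumnBatch(int rowCount, Object[][] columns) {
        this.rowCount = rowCount;
        this.columns = columns;
    }

    // Pivots row-wise records into a single batch instead of one batch per row.
    static ColumnBatch fromRows(List<Object[]> rows, int columnCount) {
        Object[][] columns = new Object[columnCount][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            for (int c = 0; c < columnCount; c++) {
                columns[c][r] = row[c];
            }
        }
        return new ColumnBatch(rows.size(), columns);
    }

    static String demo() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1L, "a"});
        rows.add(new Object[] {2L, "b"});
        rows.add(new Object[] {3L, "c"});
        ColumnBatch batch = fromRows(rows, 2);
        // One downstream call now sees three rows, laid out column by column.
        return batch.rowCount + " rows: " + Arrays.toString(batch.columns[0])
                + " " + Arrays.toString(batch.columns[1]);
    }
}
```

A downstream operator then receives one call per batch rather than one call per row, which is where a vectorized engine gets its advantage.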
-
Oh, sorry, Q3 has a join, which is not yet semantically equivalent to Flink's. We are still working on it.
ryyyyyy1 wrote on Tue, Aug 26, 2025 at 17:29:
> In q3, for the SQL condition WHERE A.category = 10 AND (P.state = 'OR' OR P.state = 'ID' OR P.state = 'CA'), Flink seems to use SARG for processing this part, but when it's converted to the Velox side, it appears that SARG is not handled, which causes the filter condition to become ineffective.
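The SARG issue in the quoted question can be illustrated with a toy model. Assuming the planner has folded the three equality checks into a single search argument over `P.state`, a translator that does not understand that form has to expand it back into an OR of equalities (or an IN list) before handing it to the native side. `PointSarg` below is a hypothetical, simplified stand-in that only covers discrete points; it is not Calcite's or Flink's class.

```java
import java.util.Arrays;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical search argument restricted to a set of discrete string values.
final class PointSarg {
    private final TreeSet<String> points; // kept sorted, so the expansion is deterministic

    PointSarg(String... values) {
        this.points = new TreeSet<>(Arrays.asList(values));
    }

    // Evaluates the search argument directly against one value.
    boolean matches(String value) {
        return value != null && points.contains(value);
    }

    // Expands the search argument into the equivalent OR of equality predicates.
    String expand(String column) {
        StringJoiner disjunction = new StringJoiner(" OR ");
        for (String point : points) {
            disjunction.add(column + " = '" + point + "'");
        }
        return disjunction.toString();
    }

    static String demo() {
        return new PointSarg("OR", "ID", "CA").expand("P.state");
    }
}
```

If the translation step silently drops the search argument instead of expanding it, every row passes the state filter, which matches the symptom described in the question.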
-
A PoC PR for Flink support was recently submitted: #8839 (author: @shuai-xu)
The background doc: https://docs.google.com/document/d/1VNMs1oR0c02kuQLFLQXZeTyk3CoPjrtAfiKGoLGxGzE/edit?usp=sharing (author: @weiting-chen)
The design doc: WIP
There are no existing issues or discussions tracking Flink support, so let's discuss here.