Replies: 4 comments 2 replies
-
Really great work! Out of curiosity, why was Avro chosen as the initial format? Is it because it's believed to be the most commonly adopted format amongst Kafka users? Also, how much additional work would it be to add Protobuf support? I assume JSON is more work, but Protobuf may be similar to Avro? Happy to hear about any intricacies in converting the different formats into the Parquet schema.
-
I know that the full-scale benchmark is mentioned in the upcoming work section, but I'd love to understand the impact on read and write latency, e.g., compared with tiered storage without Iceberg support. Would you mind sharing anything?
-
I went through the whitepaper and also saw the upcoming work, but I have a few concerns and questions that go back to what @sap1ens mentioned about benchmarks with and without Iceberg tiered storage. For one-way streaming, i.e., Kafka -> Iceberg, this approach might make sense, but I'm curious about the following:
There are a few more thoughts, but I want to focus on these for now. What I'm looking to understand, I guess, is what the actual benefit is of adding Iceberg or Delta Lake support natively into tiered storage. A "single copy", yes, but in this scenario, does it indeed make sense and yield more benefits than drawbacks? Eager to get your thoughts.
-
This is for discussing the Iceberg whitepaper.