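What type of enhancement is this?
Performance
What does the enhancement do?
The last-non-null dedup implementation is too slow, so a memtable may take a long time to flush.
If a single key has more than 2M rows, deduplication becomes very expensive and stalls write requests. For example, the inner iterator returned one batch of 2,578,923 rows: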
2024-12-24T09:24:09.200271Z INFO mito2::read::dedup: LastNonNullIter inner iter returns batch, region: 4483945857024(1044, 0), batch len: 2578923, timestamps: Some([1734968880013721400, 1734968880013721400, 1734968880013721400, 1734968880013721400, 1734968880013721400
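For readers unfamiliar with the semantics, here is a minimal sketch of what a last-non-null merge computes. It is not the mito2 implementation: the `Row` type, its `f64` fields, and the function name are hypothetical, and the input is assumed to be sorted by timestamp ascending and sequence descending. Rows that share a timestamp collapse into one row, and each null field is filled from the next older row that has a value.

```rust
/// Hypothetical row type for illustration only (not a mito2 type).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    ts: i64,
    seq: u64,
    fields: Vec<Option<f64>>,
}

/// Collapses rows that share a timestamp into one row.
/// Assumes `rows` is sorted by (ts ascending, seq descending), so the first
/// row seen for a timestamp is the newest one.
fn dedup_last_non_null(rows: &[Row]) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in rows {
        let same_ts = out.last().map_or(false, |last| last.ts == row.ts);
        if same_ts {
            // Older duplicate: only fill fields that are still null.
            let last = out.last_mut().unwrap();
            for (dst, src) in last.fields.iter_mut().zip(&row.fields) {
                if dst.is_none() {
                    *dst = *src;
                }
            }
        } else {
            // First (newest) row for this timestamp.
            out.push(row.clone());
        }
    }
    out
}
```

The timestamps printed in the log line above are all identical, so a batch like that one is dominated by duplicates that must be merged field by field, which is where the cost comes from.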
Some metrics
2024-12-24T09:24:02.606851Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 1, num_rows: 598196, num_splits: 235112, num_push_batches: 235112, num_return_batches: 3628, num_finish_batches: 0
2024-12-24T09:24:07.615855Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 11, num_rows: 931664, num_splits: 888700, num_push_batches: 888710, num_return_batches: 16853, num_finish_batches: 0
2024-12-24T09:24:12.617940Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 949850, num_push_batches: 949861, num_return_batches: 18191, num_finish_batches: 0
2024-12-24T09:24:17.621942Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1006868, num_push_batches: 1006879, num_return_batches: 19135, num_finish_batches: 0
2024-12-24T09:24:22.622635Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1065109, num_push_batches: 1065120, num_return_batches: 20084, num_finish_batches: 0
2024-12-24T09:24:27.625763Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1124782, num_push_batches: 1124793, num_return_batches: 21048, num_finish_batches: 0
2024-12-24T09:24:32.631291Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1185770, num_push_batches: 1185781, num_return_batches: 22046, num_finish_batches: 0
2024-12-24T09:24:37.633937Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1248650, num_push_batches: 1248661, num_return_batches: 23087, num_finish_batches: 0
2024-12-24T09:24:42.633966Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1313255, num_push_batches: 1313266, num_return_batches: 24107, num_finish_batches: 0
2024-12-24T09:24:47.634862Z INFO mito2::read::dedup: LastNonNullIter, region: 4483945857024(1044, 0), num_batches: 12, num_rows: 3510587, num_splits: 1380601, num_push_batches: 1380612, num_return_batches: 25152, num_finish_batches: 0
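To put the counters in perspective, the following sketch derives average rates from two of the logged samples (09:24:12.617940Z and 09:24:47.634862Z), an interval during which `num_rows` stays at 3510587. The `Sample` type and the helper names are made up for this illustration; only the numbers come from the log.

```rust
/// Counters copied from one `LastNonNullIter` log line.
#[derive(Debug, Clone, Copy)]
struct Sample {
    /// Seconds past 09:24:00 UTC, taken from the log timestamp.
    secs: f64,
    num_splits: u64,
    num_return_batches: u64,
}

/// Average growth of a counter per second between two samples.
fn per_second(before: u64, after: u64, elapsed_secs: f64) -> f64 {
    (after - before) as f64 / elapsed_secs
}

/// Returns (splits per second, returned batches per second,
/// splits per returned batch) for the interval between two samples.
fn summarize(first: Sample, last: Sample) -> (f64, f64, f64) {
    let elapsed = last.secs - first.secs;
    let splits = per_second(first.num_splits, last.num_splits, elapsed);
    let returns = per_second(first.num_return_batches, last.num_return_batches, elapsed);
    (splits, returns, splits / returns)
}

/// The samples logged at 09:24:12.617940Z and 09:24:47.634862Z.
fn logged_samples() -> (Sample, Sample) {
    let first = Sample { secs: 12.617940, num_splits: 949_850, num_return_batches: 18_191 };
    let last = Sample { secs: 47.634862, num_splits: 1_380_601, num_return_batches: 25_152 };
    (first, last)
}
```

Calling `summarize` on `logged_samples()` gives roughly 12,300 splits per second, about 200 returned batches per second, and about 62 splits for every returned batch, all while the iterator reports no new input rows.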
The hotspots
Implementation challenges
We may need to refactor the iterator or the memtable to make this computation faster.
evenyag