No need for caching in sorted-iterator #73
Conversation
|
What... How is that possible? When you reach a leaf block, you have to go back and seek elsewhere, no? |
|
I was also surprised that it just worked. What happens is that the parent blocks are kept in memory, so every time you need to read the next block from the file, it is always the "correct" block, because we saved the blocks in pre-order. |
|
Wow |
|
Are the parents explicitly cached by us? Do we make sure to uncache them? Just making sure there is no memory leak or something |
|
good point. BTW, I also need to verify this "Random IO" in the "search" iterator ( |
|
ok.. so after looking at |
097d0a7 to
207040e
Compare
|
Is this ready for release? Is it good for all three of our versions? |
Summary
In #71 we changed the block order. While checking full iteration on an object store (S3), we saw that it was extremely slow.
What happened is that it switched to Random IO mode:
.. 15:42:50,084 INFO fs.s3a.S3AInputStream - Switching to Random IO seek policy
Detailed Description
After debugging, it looks like this happened because we seek backward, to a position behind the "natural" readahead buffer.
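To illustrate why a single backward seek matters, here is a toy model of a stream that starts in sequential mode and falls back to random IO once it sees a backward seek. It is only a sketch: `ToyAdaptiveStream` and `policy_after` are hypothetical names, and this is not the actual `S3AInputStream` implementation.

```python
class ToyAdaptiveStream:
    """Toy model of an adaptive seek policy (hypothetical, not Hadoop code)."""

    def __init__(self):
        self.pos = 0                # position just after the last read
        self.policy = "sequential"  # optimistic default

    def read_at(self, offset, length):
        # A read that starts before the current position is a backward seek:
        # the readahead buffer no longer helps, so the toy stream switches
        # to random IO and never switches back.
        if offset < self.pos and self.policy == "sequential":
            self.policy = "random"
        self.pos = offset + length


def policy_after(offsets, block_size=10):
    """Return the policy the toy stream ends up with after reading offsets."""
    stream = ToyAdaptiveStream()
    for offset in offsets:
        stream.read_at(offset, block_size)
    return stream.policy
```

In this model, reading offsets `[0, 10, 20, 30]` keeps the stream sequential, while `[0, 20, 10, 30]` flips it to random on the third read.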
Luckily, now that we changed the block order we can just read the file sequentially without hopping at all. So just removing the seeking part works!
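A minimal sketch of why pre-order storage allows a purely sequential read, assuming a simplified tree of `("index", children)` and `("leaf", keys)` nodes. The names `write_pre_order`, `SequentialReader`, and `iterate_sorted` are hypothetical and do not come from this repository.

```python
def write_pre_order(node, blocks):
    """Append a node, then each of its child subtrees, to blocks (the 'file')."""
    kind, payload = node
    if kind == "leaf":
        blocks.append(("leaf", payload))
    else:
        # The reader only needs to know how many children follow.
        blocks.append(("index", len(payload)))
        for child in payload:
            write_pre_order(child, blocks)


class SequentialReader:
    """Reads blocks strictly in file order and logs which ones were read."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.next_index = 0
        self.reads = []

    def read_next(self):
        self.reads.append(self.next_index)
        block = self.blocks[self.next_index]
        self.next_index += 1
        return block


def iterate_sorted(reader):
    """Yield keys in sorted order without ever seeking."""
    kind, payload = reader.read_next()
    if kind == "leaf":
        yield from payload
    else:
        # The parent stays in memory (here, on the call stack) while its
        # children are consumed, so we never go back to re-read it.
        for _ in range(payload):
            yield from iterate_sorted(reader)


tree = ("index", [
    ("index", [("leaf", [1, 2]), ("leaf", [3, 4])]),
    ("leaf", [5, 6]),
])
blocks = []
write_pre_order(tree, blocks)
reader = SequentialReader(blocks)
keys = list(iterate_sorted(reader))
```

In this sketch the parents are held only for as long as their subtree is being iterated, and `reader.reads` ends up strictly increasing, which is the access pattern a readahead buffer favors.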
How was it tested?
Regular unit tests.