No need for caching in sorted-iterator #73

uzadude · 2023-01-04T14:56:11Z

Summary

In #71 we changed the block order. While testing a full iteration on an object store (S3), we saw that it was extremely slow.
The cause was that the S3A input stream switched to Random IO mode:
.. 15:42:50,084 INFO fs.s3a.S3AInputStream - Switching to Random IO seek policy

Detailed Description

After debugging, it looks like this happened because we tried to seek backwards, behind the "natural" readahead buffer.

Luckily, now that the block order has changed, we can read the file sequentially without hopping at all. So simply removing the seeking part works!

How was it tested?

Regular unit tests.

shay1bz · 2023-01-04T16:31:05Z

What... How is that possible? When you reach a leaf block, you have to go back and seek elsewhere, no?

uzadude · 2023-01-04T18:18:45Z

I was also surprised that it just worked. What happens is that the parent blocks are kept in memory, and since the blocks are written in pre-order, the next block in the file is always the "correct" one to read next.
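To illustrate why pre-order storage removes the need to seek, here is a minimal sketch in Python. It is not the project's code: `to_preorder_blocks`, `ForwardOnlyReader`, and `iterate_sorted` are hypothetical names, the tree is a plain nested list, and internal nodes store only a child count (no separator keys), which is a simplifying assumption.

```python
def to_preorder_blocks(node):
    """Flatten a nested-list tree into blocks written in pre-order.

    A list of lists is treated as an internal node; a list of keys is a leaf.
    """
    if node and isinstance(node[0], list):
        blocks = [("node", len(node))]  # parent is written before its children
        for child in node:
            blocks.extend(to_preorder_blocks(child))
        return blocks
    return [("leaf", list(node))]


class ForwardOnlyReader:
    """Stands in for a sequential stream: it can only read the next block."""

    def __init__(self, blocks):
        self._blocks = blocks
        self.position = 0

    def read_next(self):
        block = self._blocks[self.position]
        self.position += 1
        return block


def iterate_sorted(reader):
    """Yield all keys in order while reading blocks strictly front to back."""
    # Each entry counts the children still unread for one ancestor on the
    # current path; the initial 1 stands for the root block itself.
    pending = [1]
    while pending:
        if pending[-1] == 0:
            pending.pop()  # this ancestor is finished, go back up
            continue
        pending[-1] -= 1
        kind, payload = reader.read_next()
        if kind == "leaf":
            yield from payload
        else:
            pending.append(payload)  # descend: payload is the child count


if __name__ == "__main__":
    tree = [[[1, 2], [3, 4]], [[5, 6], [7]]]
    reader = ForwardOnlyReader(to_preorder_blocks(tree))
    print(list(iterate_sorted(reader)))
```

The only state kept besides the reader position is one counter per ancestor on the current path, so going "back up" to a parent never touches the file.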

shay1bz · 2023-01-05T07:21:07Z

Wow

shay1bz · 2023-01-05T07:31:05Z

Are the parents explicitly cached by us? Do we make sure to uncache them? Just making sure there is no memory leak or something

uzadude · 2023-01-05T07:45:03Z

Good point.
I actually thought about that and forgot to check, but it looks like we're good.
The sorted-iterator only holds one pointer, to the next Node, and each Node object only holds one pointer, to its parent. So we will have at most log(n) nodes in memory.
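A rough sketch of that memory argument, in Python and under assumptions: `Node`, `live_nodes`, and `walk_leaves` are hypothetical names, and the tree is a complete tree with a fixed fanout, created lazily the way a reader would materialize blocks. From any leaf, the only nodes still reachable are its ancestors.

```python
class Node:
    """A node that knows only its parent, as described in the comment above."""

    def __init__(self, parent):
        self.parent = parent


def live_nodes(node):
    """Count the nodes reachable from `node` by following parent pointers."""
    count = 0
    while node is not None:
        count += 1
        node = node.parent
    return count


def walk_leaves(fanout, height):
    """Yield each leaf of a complete tree, creating nodes only when visited."""

    def descend(parent, level):
        node = Node(parent)
        if level == 0:
            yield node
        else:
            for _ in range(fanout):
                yield from descend(node, level - 1)

    yield from descend(None, height)


if __name__ == "__main__":
    # 4**3 = 64 leaves, yet each leaf keeps only height + 1 nodes reachable.
    print(max(live_nodes(leaf) for leaf in walk_leaves(fanout=4, height=3)))
```

With 64 leaves and fanout 4, the chain from any leaf to the root is 4 nodes long, which is the log(n) bound in practice.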

BTW, I also need to verify this "Random IO" in the "search" iterator (get()).

uzadude · 2023-01-12T09:47:59Z

OK, so after looking at get(key), I saw that we could hop backwards there when get() is called multiple times.
So I added an assertion that allows only bigger keys, and added functionality to reuse the current state.
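The forward-only idea can be sketched as follows, in Python. This is an illustration under assumptions, not the code in this PR: `ForwardOnlyLookup` is a hypothetical name, the data is a flat sorted sequence of `(key, value)` pairs rather than a B-tree, the check is an explicit exception instead of an assertion, and keys are required to be strictly increasing.

```python
class ForwardOnlyLookup:
    """Answers get(key) calls over sorted data without ever moving backwards."""

    def __init__(self, sorted_items):
        self._items = iter(sorted_items)       # consumed front to back only
        self._current = next(self._items, None)
        self._last_key = None

    def get(self, key):
        # Reject keys that would require seeking back in the underlying data.
        if self._last_key is not None and key <= self._last_key:
            raise ValueError("keys must be requested in increasing order")
        self._last_key = key
        # Reuse the current position instead of restarting from the beginning.
        while self._current is not None and self._current[0] < key:
            self._current = next(self._items, None)
        if self._current is not None and self._current[0] == key:
            return self._current[1]
        return None


if __name__ == "__main__":
    lookup = ForwardOnlyLookup([(1, "a"), (3, "c"), (5, "e")])
    print(lookup.get(1), lookup.get(2), lookup.get(5))
```

Because each call starts from where the previous one stopped, a sequence of lookups reads the underlying data at most once in total.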

eyala · 2023-05-01T11:21:05Z

Is this ready for release? Is it good for all three of our versions?

No need for caching

b102569

uzadude changed the title ~~No need for caching~~ No need for caching in sorted-iterator Jan 4, 2023

uzadude requested a review from shay1bz January 4, 2023 14:56

This was referenced Jan 4, 2023

Changing the implementation of the joinWithIndex to use the B-Tree #74

Open

Adding LRU Caching for sorted iterator #72

Closed

remove sync()

fa8acbc

uzadude force-pushed the cache2 branch 2 times, most recently from 097d0a7 to 207040e Compare January 12, 2023 10:34

support continuous get(key) with forward only seeks

ec959c0

uzadude force-pushed the cache2 branch from 207040e to ec959c0 Compare January 12, 2023 11:08
