
Stop downloading entire segments when retrieving objects #3362

Open · wants to merge 6 commits into main from object-fewer-pieces
Conversation

Member

@teor2345 teor2345 commented Jan 31, 2025

What it does

This PR fixes two performance edge cases in object retrieval:

  • downloading a segment to work out how much padding the segment has
  • downloading a segment to remove the parent segment header at the start of that segment

Instead, it:

  • tries all the possible padding lengths against the object hash
  • manually decodes and discards the segment version variant, parent segment header, and the start of the block continuation containing the continuing object
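The first point, trying all the possible padding lengths against the object hash, might be sketched as follows. This is a minimal illustration, not the PR's actual API: the names, the padding bound, and the hash function are stand-ins (the real code uses a cryptographic object hash, and `DefaultHasher` here is only a placeholder).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative bound on segment padding, not the PR's actual constant.
const MAX_PADDING: usize = 3;

// Stand-in for the real (cryptographic) object hash.
fn object_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Try each candidate padding length in turn: strip `pad` trailing bytes
/// and compare the hash of the remainder against the expected object hash.
fn find_padding(data_with_padding: &[u8], expected: u64) -> Option<usize> {
    (0..=MAX_PADDING).find(|&pad| {
        data_with_padding
            .len()
            .checked_sub(pad)
            .map(|end| object_hash(&data_with_padding[..end]) == expected)
            .unwrap_or(false)
    })
}

fn main() {
    let object = b"object bytes".to_vec();
    let expected = object_hash(&object);

    // Simulate data retrieved with two trailing bytes of segment padding.
    let mut retrieved = object.clone();
    retrieved.extend_from_slice(&[0, 0]);

    assert_eq!(find_padding(&retrieved, expected), Some(2));
}
```

The point is that no extra download is needed: the candidate paddings are checked locally against data already retrieved.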

How it's written

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do.

Instead, each piece is turned into a PartialData struct. Then when there are enough pieces (2) to calculate the object's length(s), those pieces are turned into a PartialObject struct. After that, each new piece becomes a PartialData (to track padding and segment headers), then gets added to the PartialObject.
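The per-piece flow above can be sketched roughly like this. It is a simplified shape only: the PR's actual `PartialData` and `PartialObject` also track padding and segment headers, and the candidate lengths are decoded from the data rather than passed in as they are here.

```rust
/// Object bytes recovered from a single piece, after decoding and
/// discarding the segment version variant and any parent segment header.
/// (Simplified: the real struct tracks more state.)
struct PartialData {
    data: Vec<u8>,
}

/// An object being reassembled across pieces.
struct PartialObject {
    data: Vec<u8>,
    /// Candidate object lengths; the length prefix can leave several
    /// possibilities until enough bytes have been downloaded.
    possible_lengths: Vec<usize>,
}

impl PartialObject {
    /// Two pieces are enough to work out the candidate lengths
    /// (passed in here rather than decoded, to keep the sketch short).
    fn from_pieces(first: PartialData, second: PartialData, possible_lengths: Vec<usize>) -> Self {
        let mut data = first.data;
        data.extend(second.data);
        PartialObject { data, possible_lengths }
    }

    /// Each later piece becomes a PartialData, then is appended here.
    fn add_piece(&mut self, piece: PartialData) {
        self.data.extend(piece.data);
    }

    /// Is there enough data to check the shortest candidate length?
    fn has_shortest_length(&self) -> bool {
        self.possible_lengths
            .iter()
            .min()
            .map_or(false, |&len| self.data.len() >= len)
    }
}

fn main() {
    let first = PartialData { data: vec![1, 2] };
    let second = PartialData { data: vec![3] };
    let mut object = PartialObject::from_pieces(first, second, vec![5, 7]);
    assert!(!object.has_shortest_length());

    object.add_piece(PartialData { data: vec![4, 5] });
    assert!(object.has_shortest_length());
}
```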

When the PartialObject has enough data for its shortest length, the data (and corresponding padding) is checked against the object hash. If that fails, we check more padding lengths, or fetch more data.
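The check-or-fetch decision at the end of each step might look like the following. The enum and names are illustrative, not the PR's actual types; `matched_padding` stands in for the result of checking the remaining padding candidates against the object hash.

```rust
/// Outcome of one reassembly step (illustrative shape only).
#[derive(Debug, PartialEq)]
enum FetchStep {
    /// A padding candidate matched the object hash: reconstruction is done.
    Done { padding: usize },
    /// No candidate matched yet, or there aren't enough bytes for the
    /// shortest candidate length: download another piece.
    NeedMoreData,
}

fn next_step(data_len: usize, shortest_len: usize, matched_padding: Option<usize>) -> FetchStep {
    if data_len < shortest_len {
        // Not enough data to check even the shortest candidate length.
        return FetchStep::NeedMoreData;
    }
    match matched_padding {
        Some(padding) => FetchStep::Done { padding },
        None => FetchStep::NeedMoreData,
    }
}

fn main() {
    assert_eq!(next_step(10, 16, None), FetchStep::NeedMoreData);
    assert_eq!(next_step(16, 16, Some(2)), FetchStep::Done { padding: 2 });
    assert_eq!(next_step(16, 16, None), FetchStep::NeedMoreData);
}
```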

This code has more constants, structs, and documentation, because the assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

Other Changes

Edit: Moved to PR #3367.

Closes #3318.

TODO

  • Write tests for each object reconstruction edge case (in this PR, or a future PR)

Code contributor checklist:

@teor2345 teor2345 added the labels bug (Something isn't working) and improvement (it is already working, but can be better) on Jan 31, 2025
@teor2345 teor2345 self-assigned this Jan 31, 2025
@teor2345 teor2345 requested a review from nazar-pc as a code owner January 31, 2025 02:22
Member

@vedhavyas vedhavyas left a comment


Makes sense. Will look through again once the remaining tests are added.

Member

@nazar-pc nazar-pc left a comment


I went up to the "Stop downloading full segments during object retrieval" commit, and it was not obvious to me how it stops downloading full segments, due to the large number of changes and the new implementation not being an extension of the previous implementation, at least not in any obvious way.

Another challenge is that a huge public API surface is provided, but it is not clear why all of those bits need to be public, which makes it hard to figure out where to even start looking at this. The previous version had a somewhat straightforward approach: there was a single public struct + error enum + 2 constants (not sure if both needed to be public though). The struct had just a constructor and a fetching method; the fetching method in turn had a few conditional paths for the code to go through, each mostly self-contained.

I no longer see the fast paths in the code as clearly as before. Also, judging by the number of constants present and the extensive documentation, there are a lot of assumptions baked in, and not even all cases are covered yet.

The PR description was not enough for me to get even an overview of the chosen implementation approach here; there is just a brief mention of the outcome, plus 1451 changed lines in a single commit that changes everything. I could really benefit from a more extensive explanation of "what" and "why", maybe even a P2P discussion of the approach and the changes. I'll have to fully understand what this is doing, how, and why before I can approve it.

Non-controversial cleanups would better go into a separate PR, so they can be merged quickly, reducing the diff here.

Member Author

@teor2345 teor2345 left a comment


Nazar (and Ved if you want), let’s schedule a video call after I’m finished with the private EVM stuff?

I went up to the "Stop downloading full segments during object retrieval" commit, and it was not obvious to me how it stops downloading full segments, due to the large number of changes and the new implementation not being an extension of the previous implementation, at least not in any obvious way.

I understand it would be easier to review a minor rewrite, which added extra steps to the existing functions. But there’s a bunch of reasons why that wasn’t possible.

It was tricky to extend the old code, because it had a lot of unnecessary assumptions (and in some cases, they were wrong). That's why this code is a partial rewrite: we don't want to download segments, so I had to remove that entire codepath, and update the other codepath to handle all the edge cases.

Another challenge is that a huge public API surface is provided, but it is not clear why all of those bits need to be public, which makes it hard to figure out where to even start looking at this. The previous version had a somewhat straightforward approach: there was a single public struct + error enum + 2 constants (not sure if both needed to be public though). The struct had just a constructor and a fetching method; the fetching method in turn had a few conditional paths for the code to go through, each mostly self-contained.

Sure, happy to make most of the types and constants private.

I was trying to keep code roughly in the same place to minimise the diff, but it might be more readable if I move some of the lower-level operations to a submodule.

I no longer see the fast paths in the code as clearly as before. Also, judging by the number of constants present and the extensive documentation, there are a lot of assumptions baked in

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do. The assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

and not even all cases are covered yet.

I’m pretty sure this code covers all the necessary cases. Why are you concerned that some cases aren’t covered?

The PR description was not enough for me to get even an overview of the chosen implementation approach here; there is just a brief mention of the outcome, plus 1451 changed lines in a single commit that changes everything. I could really benefit from a more extensive explanation of "what" and "why", maybe even a P2P discussion of the approach and the changes. I'll have to fully understand what this is doing, how, and why before I can approve it.

Yep, happy to have a chat. Currently it’s not clear to me what the specific gaps are between the PR description and the code comments, so a call is a good way to work that out.

I’m also not sure how an incremental commit series could be used to make these changes, let’s talk about that too.

Non-controversial cleanups would better go into a separate PR, so they can be merged quickly, reducing the diff here.

Sure, happy to do that.

I feel like there’s a fairly complicated set of rules about what gets a separate PR, what gets put together in the same PR, what gets separate commits, and what gets put together in the same commit. Can one of the more experienced team members write it down somewhere?

Member

nazar-pc commented Feb 2, 2025

There is no slow path any more, because that path downloaded full segments, which this code doesn’t need to do. The assumptions in the old code were undocumented, or implemented as hard-coded calculations inside each function. That made them tricky to understand, review, maintain, and test. (Which is why multiple bugs have slipped through review and testing.)

Thanks, that is definitely a helpful context. Having a single code path that handles everything does make sense and would probably be better, though I'll have to understand better how this is achieved because I do believe they are somewhat distinct and not that difficult to understand in isolation (though they certainly must be non-buggy to be useful).

I’m pretty sure this code covers all the necessary cases. Why are you concerned that some cases aren’t covered?

I meant TODOs and some limitations (like size of supported objects) mentioned in the code. I don't necessarily have anything more specific than that.

Can one of the more experienced team members write it down somewhere?

I don't think there is, or even needs to be, a set of hard rules. Just looking at the changes, it is not hard to guess that some of them will take more time to understand and merge than others, while others are mostly trivial and would otherwise be carried through all of the potential rebases and reviews. So it just logically makes sense to send them separately, to be merged and done with.

@teor2345 teor2345 marked this pull request as draft February 3, 2025 08:38
@teor2345: This comment was marked as resolved.

@teor2345 teor2345 force-pushed the object-fewer-pieces branch 3 times, most recently from f289d71 to b24ed40 Compare February 13, 2025 04:10
@teor2345 teor2345 marked this pull request as ready for review February 13, 2025 04:12
Member Author

@teor2345 teor2345 left a comment


This is now ready for another review. Reviews are a low priority for now, because I'm focused on revisions and testing for Private EVM.

I resolved all the previous review comments, and also:

  • added a minor refactor as the first commit, let me know if you want this as a separate PR
  • split the reconstruction internals into separate modules
  • re-arranged the added/deleted errors to minimise the diff
  • added more module, function, and PR documentation

The tests aren't finished yet, I'd like to make sure we have a stable API/behaviour before I write them all.

Apologies for the rebase, I needed to pick up some changes from main, and drop a bunch of commits/changes from the previous branch.

@teor2345 teor2345 force-pushed the object-fewer-pieces branch from b24ed40 to 96e2da5 Compare February 13, 2025 04:37
Successfully merging this pull request may close these issues.

Reduce the number of pieces downloaded by the object fetcher in subspace-gateway