Remove boost assumptions that pieces are CARs #1878

Gozala · 2024-02-06T18:59:58Z

Checklist

This is not a question or a support request. If you have any boost related questions, please ask in the discussion forum.
This is not a new feature request. If it is, please file a feature request instead.
This is not an enhancement request. If it is, please file an improvement suggestion instead.
I have searched on the issue tracker and the discussion forum, and there is no existing related issue or discussion.
I am running the latest release, or the most recent RC (release candidate) for the upcoming release or the dev branch (master), or have an issue updating to any of these.
I did not make any code changes to boost.

Boost component

Boost Version

n/a

Describe the Bug

At the moment, boost assumes that the first piece in the aggregate is a CAR:

boost/storagemarket/deal_commp.go

Lines 148 to 154 in baf26c6

    // (willscott - oct 2023 - remove once raw byte supported): confirm file is a car file
    if _, err := carv2.ReadVersion(rd); err != nil {
        return nil, &dealMakingError{
            retry: types.DealRetryFatal,
            error: fmt.Errorf("failed to read car header: %w", err),
        }
    }

This is problematic for aggregators like web3.storage, because doing this verification has a non-negligible cost. Please see storacha/w3up#1304.

Please note that wrapping single-block raw files in a CAR is impractical: it introduces a (negligible) size overhead, but it also requires a much more complex processing pipeline for the aggregators and incurs added operational costs.

Also, please note that the aggregate piece specification FRC-0058 does not require pieces to be CARs. In fact, it breaks the assumption that a Filecoin piece is a CAR, because an aggregate is built up from segments which are in most cases CARs themselves. Having to unpack and concatenate those CARs would defeat the purpose of aggregates.

Logging Information

n/a

Repo Steps

n/a


LexLuthr · 2024-02-07T07:13:20Z

@willscott @masih Can you please chime in here? We have had multiple discussions around this in Singapore and, IIRC, Juan wants everything to be a CAR for retrievals. There was a discussion involving Juan and Alex about this, with differing views.

LexLuthr · 2024-02-07T07:15:09Z

@Gozala Without CARs we cannot serve partial retrievals. Every SP would need to run booster-http, which they don't, and serve back the whole piece with padded bytes. It would be up to the aggregators to deal with the client retrievals.

willscott · 2024-02-07T07:21:20Z

Boost, as a PL project, continues to push for content-addressed formatting of deals. This is part of broader ecosystem alignment as well. While you correctly note that there is a packing overhead incurred, it's worth noting that in a world where this requirement is relaxed in boost and you begin making deals in a non-CAR form, those deals would not pass Fil+ compliance checks, which attempt retrieval validation and would fail at it.

More relevant is the question of whether boost is the right place to build out the full PoDSI market. In particular, boost retrieval today doesn't fully implement the partial retrieval API defined in the FRC for HTTP retrieval of individual segments. It should be evaluated whether this would be faster to implement as a special-purpose, simpler DDO market pathway.

Gozala · 2024-02-07T19:46:28Z

I am sorry, but I'm not able to follow most of this as I'm not familiar with the details mentioned. I do, however, want to call out a few things:

Operating an aggregator that has to turn segments into CAR segments in order to make the whole piece a valid CAR is not financially viable.
Content addressing does not imply CARs; you could have content-addressed files (like raw codec ones).
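
To illustrate the second point, here is a minimal sketch of content addressing a blob without any CAR framing. It assumes a CIDv1 with the raw multicodec (0x55) and a sha2-256 multihash, rendered as lowercase base32 multibase. `rawCID` is a hypothetical helper written for this comment; it is not part of boost or go-cid.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// rawCID is a hypothetical helper (not a boost or go-cid API). It builds a
// CIDv1 string for arbitrary bytes using the raw codec and sha2-256, so the
// bytes are content addressed without being wrapped in a CAR.
func rawCID(data []byte) string {
	digest := sha256.Sum256(data)
	// Binary CID layout: version 0x01, raw codec 0x55, then the multihash
	// header (sha2-256 code 0x12, digest length 0x20) followed by the digest.
	buf := append([]byte{0x01, 0x55, 0x12, 0x20}, digest[:]...)
	// Multibase prefix "b" means lowercase base32 without padding.
	enc := base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)
	return "b" + enc.EncodeToString(buf)
}

func main() {
	fmt.Println(rawCID([]byte("hello")))
}
```

The resulting identifier starts with `bafkrei`, the usual prefix of raw-codec sha2-256 CIDs, and it depends only on the bytes themselves.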

Given my lack of domain knowledge, I'm unable to propose anything actionable, so perhaps experts in the domain could propose a path forward that is financially viable for the aggregators?

willscott · 2024-02-08T08:59:12Z

The padding implied by PODSI packing is a significantly higher overhead and financial issue to work through than CAR overhead.
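
For a rough sense of scale, the sketch below estimates the padding for a single segment. It assumes the usual Filecoin piece rules: Fr32 expansion turns every 127 payload bytes into 128, and piece sizes are rounded up to a power of two. `paddedPieceSize` is a hypothetical helper, not a boost API, and it ignores the data segment index that an FRC-0058 aggregate also carries.

```go
package main

import (
	"fmt"
	"math/bits"
)

// paddedPieceSize is a hypothetical helper (not a boost API). It returns the
// smallest power-of-two piece size that can hold payload bytes after Fr32
// expansion. A zero payload yields 0.
func paddedPieceSize(payload uint64) uint64 {
	// Fr32 expansion: ceil(payload / 127) chunks of 128 bytes each.
	fr32 := (payload + 126) / 127 * 128
	// Already a power of two (or zero): no further padding needed.
	if fr32&(fr32-1) == 0 {
		return fr32
	}
	// Otherwise round up to the next power of two.
	return 1 << bits.Len64(fr32)
}

func main() {
	payload := uint64(1 << 20) // 1 MiB of payload
	piece := paddedPieceSize(payload)
	fmt.Printf("payload=%d piece=%d padding=%d\n", payload, piece, piece-payload)
}
```

Under these assumptions a 1 MiB payload lands in a 2 MiB piece, so padding can approach half of the piece in the worst case.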

Gozala added kind/bug Kind: Bug need/triage labels Feb 6, 2024

github-project-automation bot added this to Boost Feb 6, 2024
