feat(dag-protobuf): cache dag pb directory structure and block indexes #147

Merged (9 commits) on Jan 28, 2025


Member

@fforbeck fforbeck commented Jan 23, 2025

Context

A request for a DAG Protobuf directory structure by CID goes through the following steps:

  1. Look up all content claims to identify where the data is stored
  2. Fetch cid bafy...cid, which represents the folder containing the target file, to determine the verifiable cid for the file (let's call that bafy...file)
  3. Fetch cid bafy...file to get the root block of the file, which in UnixFS contains NO raw data but is instead a list of the sub-blocks that make up the file (let's call those bafy...bytes1 and bafy...bytes2)
  4. Fetch the first raw data blocks so the first byte can be sent

This PR adds a caching strategy for steps 2 to 4: instead of fetching the directory structure from the locator and navigating the DAG on every request, it caches the DAG when it has a Protobuf structure and a content size <= 2MB.
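
The strategy described above is roughly a cache-aside lookup. The snippet below is a minimal sketch of that idea, not the code in this PR: `getDagPbContent`, `fetchDag`, `isDagPb` and the shape of `kv` are hypothetical names chosen for illustration, and the 2MB limit is assumed to mean 2 * 1024 * 1024 bytes.

```javascript
// Hypothetical sketch of a cache-aside lookup for DAG PB content.
// `kv` mimics the get/put shape of a key-value store, `fetchDag` stands in
// for the locator lookup plus DAG traversal (steps 2 to 4), and `isDagPb`
// is an assumed predicate that tells whether the CID points at DAG PB data.
const DAGPB_MAX_BYTES = 2 * 1024 * 1024 // assumed reading of the 2MB limit

async function getDagPbContent (cid, { kv, fetchDag, isDagPb }) {
  // 1. Try the cache first.
  const cached = await kv.get(cid)
  if (cached !== null) return { bytes: cached, hit: true }

  // 2. Cache miss: do the expensive fetch and traversal.
  const bytes = await fetchDag(cid)

  // 3. Only store DAG PB content that fits under the size limit.
  if (isDagPb(cid) && bytes.byteLength <= DAGPB_MAX_BYTES) {
    await kv.put(cid, bytes)
  }
  return { bytes, hit: false }
}
```

With an in-memory stand-in for the store, the second request for the same CID is served without calling `fetchDag` again.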

Changes

  • Updated withContentClaimsDagula middleware to cache DAG PB content requests
  • New KV Store
    • DAGPB_CONTENT_CACHE
  • Caching rules
    • FF_DAGPB_CONTENT_CACHE_TTL_SECONDS: The number of seconds from now after which the key-value pair expires. The minimum value is 60 seconds. Any value less than 60 seconds will not be used. We will use a 30-day TTL by default for the Production environment.
    • FF_DAGPB_CONTENT_CACHE_MAX_SIZE_MB: The maximum size of the key-value pair in MB. The minimum value is 1MB. Any value less than 1MB will not be used. We will use a 2MB max file size by default.
    • FF_DAGPB_CONTENT_CACHE_ENABLED: The flag that enables the DAGPB content cache. The cache is disabled in prod by default.
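
As an illustration of the rules above, here is a minimal sketch of how the three variables could be read. `readDagPbCacheConfig` is a hypothetical helper, not a function from this PR. It assumes that an out-of-range or missing value falls back to the documented default (the description only says such values "will not be used"), that the enabled flag is the string 'true', and that 1MB is 1024 * 1024 bytes.

```javascript
// Hypothetical helper that applies the documented minimums and defaults.
const CFG_MIN_TTL_SECONDS = 60
const CFG_MIN_MAX_SIZE_MB = 1
const CFG_DEFAULT_TTL_SECONDS = 30 * 24 * 60 * 60 // 30 days
const CFG_DEFAULT_MAX_SIZE_MB = 2

function readDagPbCacheConfig (env) {
  // parseInt yields NaN for missing or non-numeric values; NaN fails the
  // >= comparisons below, so those cases also fall back to the defaults.
  const ttl = Number.parseInt(env.FF_DAGPB_CONTENT_CACHE_TTL_SECONDS, 10)
  const maxSizeMb = Number.parseInt(env.FF_DAGPB_CONTENT_CACHE_MAX_SIZE_MB, 10)
  return {
    // Assumption: the flag is passed as the string 'true'; anything else is off.
    enabled: env.FF_DAGPB_CONTENT_CACHE_ENABLED === 'true',
    ttlSeconds: ttl >= CFG_MIN_TTL_SECONDS ? ttl : CFG_DEFAULT_TTL_SECONDS,
    maxSizeBytes:
      (maxSizeMb >= CFG_MIN_MAX_SIZE_MB ? maxSizeMb : CFG_DEFAULT_MAX_SIZE_MB) * 1024 * 1024
  }
}
```

Under these assumptions an empty environment gives a disabled cache with a 30-day TTL and a 2MB size limit.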

Samples

2MB file - no cache (screenshot: 2mb-file-no-cache)

  • 5.2 seconds

2MB file - cached (screenshot: 2mb-file-cached)

  • 1.4 seconds

KV Limits

  • Reads: unlimited
  • Writes (different keys): unlimited
  • Writes (same key): 1 write per second (rate limited)
  • Storage/account & Storage/namespace: unlimited
  • key size: <= 512 bytes
  • value size: <= 25MiB
  • Minimum cache ttl: 60 seconds
  • Higher limit? -> https://forms.gle/ukpeZVLWLnKeixDu7
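
The limits above can be checked before attempting a write. This is a minimal sketch under stated assumptions: `checkKvWrite` is a hypothetical helper that is not part of this PR, and the key is assumed to be a string measured in UTF-8 bytes.

```javascript
// Hypothetical pre-write check against the KV limits listed above.
const KV_MAX_KEY_BYTES = 512
const KV_MAX_VALUE_BYTES = 25 * 1024 * 1024 // 25MiB
const KV_MIN_TTL_SECONDS = 60

function checkKvWrite (key, value, ttlSeconds) {
  // Keys are measured in bytes, so encode before counting.
  const keyBytes = new TextEncoder().encode(key).byteLength
  if (keyBytes > KV_MAX_KEY_BYTES) return { ok: false, reason: 'key too large' }
  if (value.byteLength > KV_MAX_VALUE_BYTES) return { ok: false, reason: 'value too large' }
  // A TTL is optional, but when present it must respect the minimum.
  if (ttlSeconds !== undefined && ttlSeconds < KV_MIN_TTL_SECONDS) {
    return { ok: false, reason: 'ttl below minimum' }
  }
  return { ok: true }
}

// Usage inside a Worker (not executed here), assuming `env.DAGPB_CONTENT_CACHE`
// is the KV binding and `bytes` is a Uint8Array:
//   if (checkKvWrite(key, bytes, ttl).ok) {
//     await env.DAGPB_CONTENT_CACHE.put(key, bytes, { expirationTtl: ttl })
//   }
```

The once-per-second limit on writes to the same key is not covered by this check and would have to be handled separately.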

resolves storacha/project-tracking#301

@fforbeck fforbeck self-assigned this Jan 23, 2025
@fforbeck fforbeck marked this pull request as ready for review January 23, 2025 19:27
@fforbeck fforbeck changed the title feat(dag-protobuf): cache dag pb content feat(dag-protobuf): cache dag pb directory structure and block indexes Jan 23, 2025
Member

@hannahhoward hannahhoward left a comment

LGTM, though I think we should set a longer TTL -- we can just drop the expiry and see how large the cache gets over time. The big expense is writes anyway.

Member

@alanshaw alanshaw left a comment

I'd set a TTL of a month or something.

Needs tests.

port: 8787,
inspectorPort: 9898,
log: new Log(LogLevel.INFO),
cache: false, // Disable Worker Global Cache to test cache middlewares
Member Author

If we don't disable this global cache in tests, requests won't reach the Content Claims Dagula middleware, where the KV cache lives.
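
To illustrate why an outer cache hides the inner one, here is a toy middleware chain. It is only a sketch: `withGlobalCache`, `withKvCacheProbe` and `countInnerCalls` are hypothetical stand-ins and do not reflect the actual Miniflare or gateway code.

```javascript
// Toy middleware chain: each middleware wraps the next handler.
// The outer cache answers repeated requests itself when enabled.
function withGlobalCache (enabled, cache) {
  return next => request => {
    if (enabled && cache.has(request)) return cache.get(request)
    const response = next(request)
    if (enabled) cache.set(request, response)
    return response
  }
}

// Stand-in for the inner middleware under test: it only counts how often
// a request actually reaches it.
function withKvCacheProbe (counter) {
  return next => request => {
    counter.calls++
    return next(request)
  }
}

// Sends the same request twice and reports how many times it reached
// the inner middleware.
function countInnerCalls (globalCacheEnabled) {
  const counter = { calls: 0 }
  const handler = withGlobalCache(globalCacheEnabled, new Map())(
    withKvCacheProbe(counter)(request => `body for ${request}`)
  )
  handler('/ipfs/bafy...cid')
  handler('/ipfs/bafy...cid')
  return counter.calls
}
```

In this toy setup the inner middleware sees only the first request when the outer cache is on, so its own cache-hit path would never be exercised by a repeated request.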

Member

@hannahhoward hannahhoward left a comment

LGTM on condition we set the prod cache to 30 days.

@fforbeck fforbeck dismissed alanshaw’s stale review January 28, 2025 12:44

changes implemented

@fforbeck fforbeck merged commit e367852 into main Jan 28, 2025
1 check passed
@fforbeck fforbeck deleted the feat/cache-dag-pb branch January 28, 2025 12:51
fforbeck pushed a commit that referenced this pull request Jan 28, 2025
🤖 I have created a release *beep* *boop*
---


## [2.25.0](v2.24.0...v2.25.0) (2025-01-28)


### Features

* **dag-protobuf:** cache dag pb directory structure and block indexes ([#147](#147)) ([e367852](e367852))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Cache small blocks on the gateway seperately from responses
3 participants