Introduce SnapshotRepository and object store integration #2310

pcholakov · 2024-11-18T10:56:27Z

This change introduces a SnapshotRepository responsible for uploading snapshots to a remote object store.

Sample usage

Configuration:

[worker.snapshots]
destination = "file:///Users/pavel/test-cluster-snapshots"

Currently only s3:// and file:// URLs are supported and work just as expected.

Snapshot creation:

> tree ~/test-cluster-snapshots/
/Users/pavel/test-cluster-snapshots/
├── 0
│   ├── latest.json
│   ├── lsn_85
│   │   ├── 000499.sst
│   │   ├── 000559.sst
│   │   ├── 000578.sst
│   │   └── metadata.json
│   └── lsn_88
│       ├── 000579.sst
│       ├── 000602.sst
│       └── metadata.json
├── 1
│   ├── latest.json
│   ├── lsn_94
│   │   ├── 000491.sst
│   │   ├── 000555.sst
│   │   ├── 000586.sst
│   │   └── metadata.json
│   └── lsn_96
│       ├── 000587.sst
│       ├── 000603.sst
│       └── metadata.json
└── 2
    ├── latest.json
    ├── lsn_71
    │   ├── 000475.sst
    │   ├── 000547.sst
    │   ├── 000594.sst
    │   └── metadata.json
    └── lsn_72
        ├── 000595.sst
        ├── 000604.sst
        └── metadata.json

10 directories, 24 files

> cat ~/test-cluster-snapshots/1/latest.json
{
  "version": "V1",
  "lsn": 96,
  "partition_id": 1,
  "node_name": "Pavels-MacBook-Pro.local",
  "created_at": "2024-11-21T19:17:15.755049000Z",
  "snapshot_id": "snap_17IyaexCTaiY7ZvMJcsPdg5",
  "min_applied_lsn": 96,
  "path": "lsn_96"
}

> cat ~/test-cluster-snapshots/1/lsn_96/metadata.json
{
  "version": "V1",
  "cluster_name": "localcluster",
  "partition_id": 1,
  "node_name": "Pavels-MacBook-Pro.local",
  "created_at": "2024-11-21T19:17:15.755049000Z",
  "snapshot_id": "snap_17IyaexCTaiY7ZvMJcsPdg5",
  "key_range": {
    "start": 768614336404564651,
    "end": 1537228672809129301
  },
  "min_applied_lsn": 96,
  "db_comparator_name": "leveldb.BytewiseComparator",
  "files": [
    {
      "column_family_name": "",
      "name": "/000603.sst",
      "directory": "/Users/pavel/restate/restate/restate-data/Pavels-MacBook-Pro.local/db-snapshots/1/snap_17IyaexCTaiY7ZvMJcsPdg5",
      "size": 1268,
      "level": 0,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 6063,
      "largest_seqno": 6064,
      "num_entries": 0,
      "num_deletions": 0
    },
    {
      "column_family_name": "",
      "name": "/000587.sst",
      "directory": "/Users/pavel/restate/restate/restate-data/Pavels-MacBook-Pro.local/db-snapshots/1/snap_17IyaexCTaiY7ZvMJcsPdg5",
      "size": 1143,
      "level": 6,
      "start_key": "64650000000000000001010453454c46",
      "end_key": "667300000000000000010000000000000002",
      "smallest_seqno": 0,
      "largest_seqno": 0,
      "num_entries": 0,
      "num_deletions": 0
    }
  ]
}

Future work:

Implement fetching and bootstrapping from snapshot
Implement parallel multi-part upload
Implement trim-gap handling using snapshots

Closes: #2197

github-actions · 2024-11-18T11:13:09Z

Test Results

7 files ±0 7 suites ±0 4m 31s ⏱️ +13s
47 tests ±0 46 ✅ ±0 1 💤 ±0 0 ❌ ±0
182 runs ±0 179 ✅ ±0 3 💤 ±0 0 ❌ ±0

Results for commit 24abfde. ± Comparison against base commit 7a62c27.

♻️ This comment has been updated with latest results.

muhamadazmy

Thank you @pcholakov for creating this PR. It looks good to me! I left 2 very minor comments

crates/worker/src/lib.rs

muhamadazmy · 2024-11-18T11:43:31Z

crates/worker/src/partition/snapshots/repository.rs

+                    .into_string()
+                    .map(|path| format!("file://{path}"))
+            })
+            .map_err(|e| anyhow!("Unable to convert path to string: {:?}", e))?;


Suggested change

.map_err(|e| anyhow!("Unable to convert path to string: {:?}", e))?;

.context("Unable to convert path to string")?;

This will still include the 'inner' error in the output string when printed.

That approach doesn't work here because OsString::into_string() returns Result<String, OsString>, which doesn't meet Anyhow's trait bounds :-)

crates/worker/src/partition/snapshots/repository.rs

tillrohrmann

Thanks for creating this PR @pcholakov. The changes look really good. The one question I have is whether there is a way to avoid materializing the tarball and re-reading into memory. It would be awesome if we can stream the tarballing into the object-store upload.

crates/types/Cargo.toml

crates/worker/src/partition/snapshots/repository.rs

tillrohrmann · 2024-11-18T16:00:27Z

crates/worker/src/partition/snapshots/repository.rs

+    /// Write a partition snapshot to the snapshot repository.
+    pub(crate) async fn put(
+        &self,
+        partition_id: PartitionId,


Isn't partition_id already part of PartitionSnapshotMetadata?

Removed :-)

crates/worker/src/partition/snapshots/repository.rs

tillrohrmann · 2024-11-18T16:12:19Z

crates/worker/src/partition/snapshots/repository.rs

+        // todo(pavel): don't buffer the entire snapshot in memory!
+        let payload = PutPayload::from(tokio::fs::read(tarball.path()).await?);


That would indeed be great. Especially once we have larger snapshots.

ObjecStore already supports multi part upload, you can use that to upload the tar in chunks instead.

Implemented in the latest revision! 🎉

tillrohrmann · 2024-11-18T16:13:28Z

crates/worker/src/partition/snapshots/repository.rs

+        // the latest snapshot is always first.
+        let inverted_sort_key = format!("{:016x}", u64::MAX - lsn.as_u64());
+
+        // The snapshot data / metadata key format is: [<base_prefix>/]<partition_id>/<sort_key>_<lsn>_<snapshot_id>.tar


What's the idea for distinguishing full from incremental snapshots in the future? Would the latter have a completely different path or contain a marker file that denotes them as incremental?

I'm about to introduce this shortly to this PR - the key idea is to upload the tar archives and metadata JSON files separately, so that interested nodes can easily query just the metadata. We can gradually introduce additional attributes to the metadata JSON schema to support referencing the constituent parts of an incremental snapshot. The snapshot format version field within the metadata blob will allow nodes to know how to interpret it - or fail loudly if the Restate server is an older version that doesn't understand it.

The paths will be something like:

[<prefix>/]metadata/<partition_id>/<sort_key>-<snapshot_id>-{lsn}.json

[<prefix>/]snapshot/<partition_id>/<sort_key>-<snapshot_id>-{lsn}.tar

I imagine that at some point we'll add incremental snapshots and the repository format will then look something along the lines of:

[<prefix>/]metadata/<partition_id>/<sort_key>-<snapshot_id>-{lsn}.json (V2)

[<prefix>/]files/<partition_id>/<snapshot_id>-{filename}.sst

In this world, there will no longer be 1:1 metadata-to-snapshot correspondence but rather a 1:n relationship. Additionally, we may want to write some sort of index metadata to make it cheaper to garbage collect disused SSTs - but I haven't thought too much about that yet.

crates/worker/src/partition/snapshots/repository.rs

crates/worker/Cargo.toml

pcholakov

Thanks @tillrohrmann and @muhamadazmy for your early input, it was really valuable! I've pushed a new revision but I still want to remove tar archiving before I mark it ready for review.

crates/worker/src/lib.rs

crates/worker/Cargo.toml

pcholakov · 2024-11-20T18:00:48Z

crates/worker/src/partition/snapshots/repository.rs

+                    .into_string()
+                    .map(|path| format!("file://{path}"))
+            })
+            .map_err(|e| anyhow!("Unable to convert path to string: {:?}", e))?;


That approach doesn't work here because OsString::into_string() returns Result<String, OsString>, which doesn't meet Anyhow's trait bounds :-)

crates/worker/src/partition/snapshots/repository.rs

pcholakov · 2024-11-20T20:00:44Z

crates/worker/src/partition/snapshots/repository.rs

+                    let mut tarball = tar::Builder::new(NamedTempFile::new_in(&staging_path)?);
+                    debug!(
+                        "Creating snapshot tarball of {:?} in: {:?}...",
+                        &staging_path,


I've renamed staging_path to local_snapshot_path for clarity - that's the raw RocksDB column family export directory with the SSTs plus our own metadata JSON blob. We then tar that directory up into an archive at the path snapshot_archive_path.

crates/worker/src/partition/snapshots/repository.rs

pcholakov · 2024-11-20T20:04:00Z

crates/worker/src/partition/snapshots/repository.rs

+        // todo(pavel): don't buffer the entire snapshot in memory!
+        let payload = PutPayload::from(tokio::fs::read(tarball.path()).await?);


Implemented in the latest revision! 🎉

crates/worker/src/partition/snapshots/repository.rs

Substantial changes since initial revision

tillrohrmann

Thanks for creating this PR @pcholakov. The changes look really nice. I left a few minor comments. The one question I had was whether concurrent modifications of a snapshot metadata.json or the latest.json can be a problem (e.g. if an old and new leader upload a snapshot at the same time)?

crates/core/src/worker_api/partition_processor_manager.rs

tillrohrmann · 2024-11-22T16:40:01Z

crates/worker/src/partition/snapshots/repository.rs

+    /// Restate cluster name which produced the snapshot.
+    pub lsn: Lsn,


Comment seems to be a bit off.

The entire field is redundant! (We have min_applied_lsn below.)

crates/worker/src/partition/snapshots/repository.rs

tillrohrmann · 2024-11-22T17:00:59Z

crates/worker/src/partition/snapshots/repository.rs

+        debug!(
+            %lsn,
+            "Publishing partition snapshot to: {}",
+            self.destination,
+        );


You can instrument put via #[instrument()] and include the lsn, snapshot id, etc.

tillrohrmann · 2024-11-22T17:06:15Z

crates/worker/src/partition/snapshots/repository.rs

+        let put_result = self
+            .object_store
+            .put(&metadata_key, metadata_json_payload)
+            .await?;


Is there a possibility for two processes taking a snapshot for the same lsn (e.g. an old leader and a new one) which aren't exactly the same because the effective lsn is different? If this is possible, is this a problem?

Definitely! This is partly why I'm still on the fence about the exact snapshot naming scheme. One simple solution is to use snapshot IDs to disambiguate snapshots for the same LSN as they must be (modulo ULID collision) unique across nodes. I'd combine that with conditional put (only succeed if file does not exist) and complain loudly if it ever fails.

tillrohrmann · 2024-11-22T17:07:21Z

crates/worker/src/partition/snapshots/repository.rs

+        let put_result = self
+            .object_store
+            .put(&latest_path, latest_json_payload)
+            .await?;


Same question here but for different lsns. How are we gonna us e the latest.json? I could imagine how a slow old leader completes a snapshot after a new snapshot has been completed.

I have an idea here that hadn't made it into the PR just yet: just download the previous pointer and check that we aren't moving backwards. This should be enough to prevent the worst case of some node going to sleep mid-snapshot and wreaking havoc.

Since I wrote that comment, we have been blessed with proper S3 conditional put, so I rewrote the update logic to perform a CAS 🎉 I'm not doing this preemptively since this path should be uncontended, but the check is there as a defensive measure against going backwards and overwriting something we didn't mean to.

Sounds good :-)

tillrohrmann · 2024-11-22T17:08:28Z

crates/worker/src/partition/snapshots/repository.rs

+        for file in &snapshot.files {
+            let filename = file.name.trim_start_matches("/");
+            let key = object_store::path::Path::from(format!(
+                "{}/{}",
+                snapshot_prefix.as_str(),
+                filename
+            ));
+            let put_result = put_snapshot_object(
+                local_snapshot_path.join(filename).as_path(),
+                &key,
+                &self.object_store,
+            )
+            .await?;
+            debug!(
+                etag = put_result.e_tag.unwrap_or_default(),
+                ?key,
+                "Put snapshot data file completed",
+            );
+        }


Uploading multiple files concurrently, will probably only cause higher and less predictable resource utilization. And we aren't in a rush, I guess.

Agreed! It was easier to be more predictable with a single upload stream. The impact I'm most concerned about is the memory overhead. S3 advises using fairly large chunks - order 100MB - for maximum throughput so maybe it's worth looking into memory mapped IO down the line.

crates/worker/src/partition/snapshots/repository.rs

pcholakov · 2024-11-26T21:41:55Z

@tillrohrmann if you could take another look please, that would be great! I think I've covered all the comments:

added a unique snapshot ID to the snapshot path to guarantee uniqueness
eliminated Error suffix from various error variants
reusing a BytesMut buffer to minimize allocation (still sequential multipart upload though; ran out of time - let's park that a a future improvement if desired)
now performing an S3 CAS on pointer bump 🎉
a partial multipart upload is cleaned up on error

The partition snapshot prefix looks like this in S3 with the latest changes:

crates/worker/src/partition/snapshots/repository.rs

tillrohrmann

Thanks a lot for updating the PR @pcholakov. It looks really good to me. I think we are very close to merging it.

The last remaining questions I had were around resource management in case of failures when uploading snapshots. In particular, who is cleaning up partial snapshot artifacts (ssts) and when are we deleting the local snapshot files.

When conditionally updating the latest.json should we retry in case there was a concurrent modification?

I was also wondering whether we shouldn't configure a local snapshot directory if no destination was specified. That way users can control the snapshotting by configuring a valid destination.

tillrohrmann · 2024-11-28T22:09:07Z

crates/types/src/config/worker.rs

+    #[serde(flatten, skip_serializing_if = "HashMap::is_empty")]
+    pub additional_options: HashMap<String, String>,


With serde(flatten) this will act a bit as catch all fields specified under SnapshotOptions in the toml that aren't specifically defined in SnapshotOptions?

Ah, totally didn't see that because I haven't tested this option at all! I'm not sure how to handle it with serde but I'll find a way to make it work.

Did you test it and it works?

We don't seem to use this field anymore.

I clearly did all the precursor work but then forgot to delete the field - sorry about that! This is now removed.

tillrohrmann · 2024-11-28T22:15:09Z

crates/worker/src/partition/snapshots/repository.rs

+            base_dir
+                .join("pp-snapshots")
+                .into_os_string()
+                .into_string()
+                .map(|path| format!("file://{path}"))
+                .map_err(|e| anyhow!("Unable to convert path to string: {:?}", e))?


If we by default snapshot to local disk which is not accessible by every PP process, then we have to make the trim logic conditional on this fact. It will also require us to implement a snapshot exchange mechanism if it is only local because once we trim and then start new PPs then we need such a mechanism. Instead I would suggest to only create the SnapshotRepository, if the destination is configured. And it is the responsibility of the user to ensure that destination is accessible by all nodes.

This is sensible - it became quite apparent when I started working on the auto-trim-by-archived-LSN PR earlier. Partly this was motivated by wanting the CreateSnapshot RPC to just work out of the box, but it's not worth the potential confusion. I still think I would prefer automated trimming to be opt-in regardless of the existence of a valid SnapshotRepository but we can have that discussion in a separate PR.

tillrohrmann · 2024-11-28T22:16:53Z

crates/worker/src/partition/snapshots/repository.rs

+        // locations just work. This makes object_store behave similarly to the Lambda invoker.
+        let object_store: Arc<dyn ObjectStore> = if destination.scheme() == "s3"
+            && destination.query().is_none()
+            && snapshots_options.additional_options.is_empty()


What if additional_options contains some other settings but not the credentials and is therefore not empty?

I can make that work; there are a handful of config keys (region + access key) that are in conflict with the AWS config provider. It's a reasonable expectation to merge the configs.

tillrohrmann · 2024-11-28T22:17:31Z

crates/worker/src/partition/snapshots/repository.rs

+        // SDK credentials provider so that the conventional environment variables and config
+        // locations just work. This makes object_store behave similarly to the Lambda invoker.
+        let object_store: Arc<dyn ObjectStore> = if destination.scheme() == "s3"
+            && destination.query().is_none()


Why isn't it a good idea to rely on default credentials if the destination contains a query part?

The query part is how object_store configuration typically works - you can pass API keys and other bits of config as URL parameters. I explicitly didn't want to deal with merging config from two completely different config providers, I'm not even sure it's possible to do it in a sane way. The logic is that if you wish to override the config, then you own setting up all of it. I think that's reasonable behavior except I see that I've completely neglected to mention that in the SnapshotsOptions docs - I'll fix that.

tillrohrmann · 2024-11-28T22:26:52Z

crates/worker/src/partition/snapshots/repository.rs

+                &key,
+                &self.object_store,
+            )
+            .await?;


What happens with already uploaded files if we fail at this point? Will this leave an incomplete snapshot behind?

Correct; my thinking was that we have to deal with pruning the repository separately anyway and I would handle it there - but I can make a best-effort attempt at cleanup on upload. In general, I've altogether skipped hardening the snapshot path with things like retries. I was planning to do that as a follow-up but can certainly address it now if the PR is not getting too big.

tillrohrmann · 2024-11-28T22:33:13Z

crates/worker/src/partition/snapshots/repository.rs

+                None
+            }
+            Err(e) => {
+                bail!("Failed to get latest snapshot pointer: {}", e);


Wondering whether a warn logging might be good enough. Technically we did complete the snapshot. It just might be unused because latest.json isn't updated. How would the caller handle an error compared to an Ok(())?

Alternatively, we might wanna clean up the snapshot because it wouldn't be used because latest.json hasn't been updated.

I probably would rather make a best-effort attempt at cleaning up partially uploaded keys, and return error. As far as response semantics, what I hope to achieve is that the caller has an unambiguous confirmation that a snapshot exists, at a given LSN (I know technically we don't yet return the LSN in CreateSnapshotResponse but still.) I think this will become important in the future where the cluster controller might want to orchestrate snapshot/restore sequences across nodes.

I have one more follow up to make here - currently we return an error if the LSN is unchanged from the latest archived-LSN snapshot in the repository. That should be a no-op and return success, with the ID of the existing latest snapshot, basically making it an idempotent no-op operation to call repeatedly, even if the log is not moving.

tillrohrmann · 2024-11-28T22:36:33Z

crates/worker/src/partition/snapshots/repository.rs

+        let put_result = self
+            .object_store
+            .put_opts(&latest_path, latest_json, conditions)
+            .await?;


What's the contract of the put method wrt to updating the latest.json because I've left a few comments regarding this. Is your intention that a successfully uploaded snapshot must update the latest.json and if it fails, then the whole put method failed?

If there was a concurrent update, shouldn't we retry until we know that there is a newer snapshot?

Up till your comment, I was thinking of the put contract as returning the status of whether the specific create-snapshot request succeeded, or not. But if you zoom out a bit, the caller really only cares that a snapshot exists, at some LSN, in the shared repository.

Partly also, in the current world, we don't expect any contention on latest.json so this is mainly just a paranoid defensive line against a node experiencing a really long pause between starting a snapshot, and trying to bump the latest pointer - long enough that another processor has become the leader, and taken over snapshotting.

I think a perfectly reasonable fallback here is that, if there is a concurrent update, we just read the latest value and return that to the caller. I'll update the code to behave like that.

crates/worker/src/partition/snapshots/snapshot_task.rs

tillrohrmann · 2024-11-28T22:41:40Z

crates/worker/src/partition/snapshots/repository.rs

+    debug!("Performing multipart upload for {key}");
+    let mut upload = object_store.put_multipart(key).await?;
+
+    let mut buf = BytesMut::new();


If you pass in this buffer into this method, then you can reuse it across uploading multiple files and don't have reallocate it for every file again.

Good call! Initially, I was thinking of doing multiple puts in parallel so deliberately did not reuse this.

Subsequently, I changed my thinking around concurrency: I think we should optimise the restore path for maximum throughput as we want a cold Partition Processor to get up to speed ASAP, but on the create snapshot path we should rather optimise for minimal disruption to the ongoing Restate request processing. Let me know if you disagree with the thinking here.

tillrohrmann · 2024-11-28T22:58:50Z

crates/worker/src/partition/snapshots/repository.rs

+        loop {
+            let mut len = 0;
+            buf.reserve(MULTIPART_UPLOAD_CHUNK_SIZE_BYTES);
+
+            // Ensure full buffer unless at EOF
+            while buf.len() < MULTIPART_UPLOAD_CHUNK_SIZE_BYTES {
+                len = snapshot.read_buf(&mut buf).await?;
+                if len == 0 {
+                    break;
+                }
+            }
+
+            if !buf.is_empty() {
+                upload
+                    .put_part(PutPayload::from_bytes(buf.split().freeze()))
+                    .await?;
+            }
+
+            if len == 0 {
+                break;
+            }
+        }


The logic looks sound to me :-)

Great! I had some extra logging to make sure we are definitely not allocating more into the buffer but it's a new API I haven't worked with before 😅

pcholakov · 2024-12-05T14:35:09Z

@tillrohrmann

Have to ask again about the CAS loop because I don't understand how it currently will work.

I was answering something slightly different, my apologies! I think this is the relevant answer: #2310 (comment) :-)

…tion

… conditional updates)

The SnapshotRepository retry policy is set to 60s total timeout.

…est S3 etag conditional update support

pcholakov

Ok, I really think this is everything covered now 😅

pcholakov · 2024-12-05T17:07:52Z

crates/worker/src/partition/snapshots/repository.rs

+        );
+        snapshot2.min_applied_lsn = snapshot1.min_applied_lsn.next();
+
+        repository.put(&snapshot2, source_dir).await?;


No words. 🤦‍♂️

I added coverage and made the LatestSnapshot struct construction a lot nicer in the process. Thank you for flagging this!

pcholakov · 2024-12-05T17:15:07Z

Cargo.toml

@@ -139,6 +140,7 @@ metrics-exporter-prometheus = { version = "0.15", default-features = false, feat
    "async-runtime",
 ] }
 moka = "0.12.5"
+object_store = { version = "0.11.1", features = ["aws"] }


Since we are not officially supporting anything other than S3, it was easier to just not compile the other providers. The file provider is always enabled, it seems.

pcholakov · 2024-12-05T17:15:19Z

benchmarks/Cargo.toml

@@ -23,7 +23,7 @@ arc-swap = { workspace = true }
 futures = { workspace = true }
 futures-util = { workspace = true }
 http = { workspace = true }
-pprof = { version = "0.13", features = ["criterion", "flamegraph"] }
+pprof = { version = "0.14", features = ["criterion", "flamegraph"] }


Unrelated RUSTSEC update.

Thanks for fixing this :-)

tillrohrmann

Great work @pcholakov. The changes look good to me. +1 for merging :-)

tillrohrmann · 2024-12-05T17:54:52Z

benchmarks/Cargo.toml

@@ -23,7 +23,7 @@ arc-swap = { workspace = true }
 futures = { workspace = true }
 futures-util = { workspace = true }
 http = { workspace = true }
-pprof = { version = "0.13", features = ["criterion", "flamegraph"] }
+pprof = { version = "0.14", features = ["criterion", "flamegraph"] }


Thanks for fixing this :-)

pcholakov requested review from tillrohrmann and muhamadazmy November 18, 2024 10:56

muhamadazmy previously approved these changes Nov 18, 2024

View reviewed changes

tillrohrmann reviewed Nov 18, 2024

View reviewed changes

muhamadazmy reviewed Nov 20, 2024

View reviewed changes

crates/worker/Cargo.toml Outdated Show resolved Hide resolved

pcholakov force-pushed the refactor/snapshots-to-ppm branch from 9f6d162 to d686e7e Compare November 20, 2024 16:57

pcholakov commented Nov 20, 2024

View reviewed changes

pcholakov force-pushed the feat/snapshot-upload branch from 768bddf to 56e659f Compare November 20, 2024 20:05

pcholakov commented Nov 20, 2024

View reviewed changes

crates/worker/src/partition/snapshots/repository.rs Outdated Show resolved Hide resolved

Base automatically changed from refactor/snapshots-to-ppm to main November 21, 2024 10:54

pcholakov force-pushed the feat/snapshot-upload branch from 56e659f to cee99e6 Compare November 21, 2024 19:18

pcholakov requested review from muhamadazmy and tillrohrmann November 21, 2024 19:22

pcholakov changed the title ~~Introduce SnapshotsRepository backed by object_store~~ Introduce SnapshotRepository and object store integration Nov 21, 2024

pcholakov marked this pull request as ready for review November 21, 2024 19:22

pcholakov force-pushed the feat/snapshot-upload branch from 76f4843 to 38268d6 Compare November 22, 2024 12:58

tillrohrmann reviewed Nov 22, 2024

View reviewed changes

crates/worker/src/partition/snapshots/repository.rs Outdated Show resolved Hide resolved

pcholakov force-pushed the feat/snapshot-upload branch 3 times, most recently from defc6ee to 7291ede Compare November 26, 2024 21:35

pcholakov commented Nov 26, 2024

View reviewed changes

crates/worker/src/partition/snapshots/repository.rs Show resolved Hide resolved

pcholakov requested a review from tillrohrmann November 26, 2024 21:38

pcholakov force-pushed the feat/snapshot-upload branch from 7291ede to ff6d9ce Compare November 27, 2024 12:43

pcholakov commented Nov 27, 2024

View reviewed changes

crates/worker/src/partition/snapshots/repository.rs Outdated Show resolved Hide resolved

pcholakov mentioned this pull request Nov 27, 2024

Implement partition store restore-from-snapshot #2353

Merged

tillrohrmann reviewed Nov 28, 2024

View reviewed changes

pcholakov added 21 commits December 5, 2024 18:06

Update paths to remove SK, simplify structure

ee6f413

Move metadata serialization to repository

98e72e2

Maintain a latest.json pointer object

afd6507

reformat

18df90f

Addressing various PR comments

ee9c22f

Update snapshot object store paths to include padded LSN and id

723ade7

Prevent latest snapshot pointer from moving backwards

56ade31

Reuse a BytesMut buffer for upload chunk data

03c1cfc

Add PUT conditions to make the latest.json pointer update a CAS opera…

5252df6

…tion

Fix conditional put for latest to use UpdateVersion

cfd8652

Enable S3 conditional updates with latest object_store

bc62ccc

Allow git source for arrow-rs (no release of object_store includes S3…

f1d1f8c

… conditional updates)

Add snapshot upload progress tracking and best-effort cleanup on error

4f27d19

Update CreateSnapshot RPC request timeout to 90s

235565c

The SnapshotRepository retry policy is set to 60s total timeout.

Reuse allocated upload buffer across files

063ba17

Address latest PR feedback

db9975e

Bump pprof (RUSTSEC-2024-0408)

44f2c01

Refactor S3/local file destination latest snapshot update test

6745ad9

Add comments to SnapshotRepository

1a25a1d

Revert to object_store 0.11 which does indeed work great with the lat…

9889aa5

…est S3 etag conditional update support

Validate object store uploads

aee1a12

pcholakov force-pushed the feat/snapshot-upload branch from 3a87e07 to aee1a12 Compare December 5, 2024 17:06

pcholakov commented Dec 5, 2024

View reviewed changes

pcholakov requested a review from tillrohrmann December 5, 2024 17:11

Self-review updates

24abfde

pcholakov commented Dec 5, 2024

View reviewed changes

tillrohrmann approved these changes Dec 5, 2024

View reviewed changes

pcholakov merged commit 1cdb6f8 into main Dec 6, 2024
13 checks passed

pcholakov deleted the feat/snapshot-upload branch December 6, 2024 08:49

	.map_err(\|e\| anyhow!("Unable to convert path to string: {:?}", e))?;
	.context("Unable to convert path to string")?;

		// todo(pavel): don't buffer the entire snapshot in memory!
		let payload = PutPayload::from(tokio::fs::read(tarball.path()).await?);

		/// Restate cluster name which produced the snapshot.
		pub lsn: Lsn,

		#[serde(flatten, skip_serializing_if = "HashMap::is_empty")]
		pub additional_options: HashMap<String, String>,

Introduce SnapshotRepository and object store integration #2310

Introduce SnapshotRepository and object store integration #2310

Conversation

pcholakov commented Nov 18, 2024 • edited Loading

github-actions bot commented Nov 18, 2024 • edited Loading

Test Results

muhamadazmy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tillrohrmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

muhamadazmy Nov 20, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tillrohrmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov Nov 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov commented Nov 26, 2024

tillrohrmann left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tillrohrmann Nov 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov commented Dec 5, 2024

pcholakov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tillrohrmann left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pcholakov commented Nov 18, 2024 •

edited

Loading

github-actions bot commented Nov 18, 2024 •

edited

Loading

muhamadazmy Nov 20, 2024 •

edited

Loading

pcholakov Nov 25, 2024 •

edited

Loading

pcholakov Nov 25, 2024 •

edited

Loading

tillrohrmann left a comment •

edited

Loading

tillrohrmann Nov 28, 2024 •

edited

Loading