Use Scylla API for restore #4192
Draft
Michal-Leszczynski wants to merge 25 commits into ml/scylla-api from ml/restore-scylla-api
* fix(backup_test): add missing 'Integration' suffix to tests
  Some tests were missing the Integration suffix in their names. As a result, they were not included in the 'make pkg-integration-test' command used when running tests on GitHub Actions.
* refactor(testutils): export CheckAnyConstraint
  It is also useful for backup svc tests.
* fix(backup_test): skip TestBackupSkipSchemaIntegration for older Scylla versions
This adds a /cloud/metadata API call to the agent, which should return cloud instance metadata such as instance_type and cloud_provider. Refs: #4130
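A minimal sketch of what such an agent endpoint could look like, assuming the JSON keys match the two field names mentioned above. The InstanceMetadata type, the metadataSource hook and cloudMetadataHandler are hypothetical names for illustration, not the agent's actual code.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// InstanceMetadata holds the two fields named in the commit message.
// ASSUMPTION: the JSON keys are guessed from the field names, not taken
// from the agent's real response schema.
type InstanceMetadata struct {
	CloudProvider string `json:"cloud_provider"`
	InstanceType  string `json:"instance_type"`
}

// metadataSource is a hypothetical hook that performs the cloud-specific
// lookup (for example, querying the provider's instance metadata service).
type metadataSource func() (InstanceMetadata, error)

// encodeMetadata serializes the metadata returned by the endpoint.
func encodeMetadata(m InstanceMetadata) ([]byte, error) {
	return json.Marshal(m)
}

// cloudMetadataHandler serves GET /cloud/metadata using the given source.
func cloudMetadataHandler(src metadataSource) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		m, err := src()
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		body, err := encodeMetadata(m)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(body)
	}
}
```

The handler would be registered with something like `mux.Handle("/cloud/metadata", cloudMetadataHandler(src))`; keeping the lookup behind a function hook lets each cloud provider plug in its own detection logic.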
This log does not contain any useful information, but it clogs the log files: the closest DC check runs on every fresh scyllaclient creation, which the config cache service performs every minute.
For Scylla to access object storage, it needs to be configured in the 'object_storage.yaml' config file.
A separate column for the Scylla task ID is needed because:
- it has a different type from the agent job ID,
- it makes it clear which API was used.
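A minimal sketch of the two-column idea, assuming the agent (Rclone) job ID is an integer and the Scylla task ID is a string. RunProgress and UsedScyllaAPI are hypothetical names; the real SM progress structures contain many more fields.

```go
package main

// RunProgress is a hypothetical, trimmed-down progress row.
// ASSUMPTION: agent job IDs are int64 and Scylla task IDs are strings,
// which is why they are kept in separate fields instead of one column.
type RunProgress struct {
	AgentJobID   int64  // set when the transfer went through Rclone in the agent
	ScyllaTaskID string // set when the Scylla backup/restore API was used
}

// UsedScyllaAPI reports which API handled the transfer, derived from
// which ID field is populated.
func (p RunProgress) UsedScyllaAPI() bool {
	return p.ScyllaTaskID != ""
}
```

Because each API fills a different field, a reader of the row can tell which API was used without any extra flag.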
Those methods consist of both:
- a direct Scylla backup API call,
- helper Scylla Task Manager API calls.
Michal-Leszczynski force-pushed the ml/restore-scylla-api branch 2 times, most recently from 0792ccb to 3529e52 (January 8, 2025 14:56)
When working with Rclone, SM specifies just the provider name, and Rclone (with the agent config) resolves it internally to the correct endpoint. This meant the user didn't need to specify the exact endpoint when running SM backup/restore tasks.

When working with Scylla, SM needs to specify the resolved host name on its own. This should be the same name as specified in 'object_storage.yaml' (see https://github.com/scylladb/scylladb/blob/92db2eca0b8ab0a4fa2571666a7fe2d2b07c697b/docs/dev/object_storage.md?plain=1#L29-L39).

In order to maximize compatibility and UX, we still want it to be possible to specify just the provider name when running backup/restore. In such a case, SM sends the provider name as the "endpoint" query param, which the agent resolves to the proper host name when forwarding the request to Scylla. Other "endpoint" values are not resolved.

Note that resolving the "endpoint" query param in the proxy is only a UX convenience, so it might not work correctly in all cases. To ensure correctness, the SM user should specify the "endpoint" directly so that no resolution is needed.
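A simplified sketch of the proxy-side rewrite described above: if the "endpoint" query param is just the provider name, replace it with a host name; otherwise leave it untouched. ASSUMPTION: here the S3 host is derived from a region using the public `s3.<region>.amazonaws.com` pattern, whereas the agent actually resolves it from its Rclone configuration. resolveEndpoint and rewriteQuery are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveEndpoint maps a bare provider name to a concrete host name.
// ASSUMPTION: only "s3" is handled, using the AWS regional naming scheme.
// Any other value (e.g. a custom MinIO host) is forwarded unchanged.
func resolveEndpoint(endpoint, s3Region string) string {
	if endpoint == "s3" && s3Region != "" {
		return fmt.Sprintf("s3.%s.amazonaws.com", s3Region)
	}
	return endpoint
}

// rewriteQuery applies resolveEndpoint to the "endpoint" param of a raw
// query string before the request is forwarded to Scylla.
func rewriteQuery(rawQuery, s3Region string) (string, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	if ep := q.Get("endpoint"); ep != "" {
		q.Set("endpoint", resolveEndpoint(ep, s3Region))
	}
	// Encode sorts parameters by key.
	return q.Encode(), nil
}
```

For example, `rewriteQuery("endpoint=s3&bucket=b", "us-east-1")` yields `bucket=b&endpoint=s3.us-east-1.amazonaws.com`, while an explicit endpoint such as `minio.local:9000` passes through as-is, which is why specifying it directly avoids relying on the resolution.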
The Scylla backup API can be used when:
- the node exposes the Scylla backup API,
- s3 is the provider in use,
- the backup won't create versioned files.
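The three conditions above can be expressed as a single predicate. This is a sketch: nodeInfo is a hypothetical type, and createsVersionedFiles is assumed to be computed elsewhere by the backup logic.

```go
package main

// nodeInfo is a hypothetical subset of what SM knows about a node.
type nodeInfo struct {
	SupportsScyllaBackupAPI bool // node exposes the Scylla backup API
}

// canUseScyllaBackupAPI returns true only when all three conditions hold;
// otherwise the backup falls back to the Rclone API in the agent.
// ASSUMPTION: createsVersionedFiles is determined by the caller.
func canUseScyllaBackupAPI(ni nodeInfo, provider string, createsVersionedFiles bool) bool {
	return ni.SupportsScyllaBackupAPI && provider == "s3" && !createsVersionedFiles
}
```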
Some tests used interceptors on given paths in order to wait for, block, or check certain API calls. Those interceptors were updated to also match the Scylla backup API paths.
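A sketch of an HTTP interceptor that matches both the Rclone upload path and the Scylla backup API path. The path strings are ASSUMPTIONS for illustration and may not match the exact paths used by SM tests; countingTransport is a hypothetical helper.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// uploadPaths lists the request paths the interceptor reacts to.
// ASSUMPTION: illustrative agent Rclone and Scylla backup API paths.
var uploadPaths = []string{
	"/agent/rclone/sync/copypaths",
	"/storage_service/backup",
}

// isUploadPath reports whether p exactly matches one of uploadPaths
// (URL.Path excludes the query string, so exact matching is enough).
func isUploadPath(p string) bool {
	for _, up := range uploadPaths {
		if p == up {
			return true
		}
	}
	return false
}

// countingTransport counts requests to upload paths, whichever API they
// belong to, then delegates to the wrapped RoundTripper.
type countingTransport struct {
	next http.RoundTripper
	hits atomic.Int64
}

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if isUploadPath(req.URL.Path) {
		t.hits.Add(1)
	}
	return t.next.RoundTrip(req)
}
```

Matching on a list of paths keeps a single interceptor valid regardless of which API a given node ends up using.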
Using the Scylla backup API does not result in changes to Rclone transfers, rate limiting, or CPU pinning, so it shouldn't be checked as part of the restore test.
This is a simple test that checks whether the correct API is used during backup.
Michal-Leszczynski force-pushed the ml/restore-scylla-api branch from 3529e52 to 3475e2f (January 9, 2025 15:50)
When a new restore task is executed, it should have its own task ID and run ID, but the cluster ID should remain the same. This commit fixes an old autofill typo, which was discovered because it affected the config cache service.
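A minimal sketch of the invariant described above, assuming a hypothetical RunKey struct with string IDs and an injected ID generator. It illustrates the intended behavior only; it does not reproduce the actual autofill code or the typo that was fixed.

```go
package main

// RunKey is a hypothetical identifier triple for a restore run.
type RunKey struct {
	ClusterID string
	TaskID    string
	RunID     string
}

// newRunKey derives identifiers for a freshly executed restore task:
// task and run get new IDs, while the cluster ID is carried over unchanged.
// newID is an injected generator (for example, a UUID constructor).
func newRunKey(prev RunKey, newID func() string) RunKey {
	return RunKey{
		ClusterID: prev.ClusterID, // must stay the same
		TaskID:    newID(),
		RunID:     newID(),
	}
}
```

Injecting the generator keeps the function deterministic in tests, and an accidental `ClusterID: newID()` would be caught by checking that the cluster ID is preserved.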
Otherwise, we panic inside the updateSingle method. This commit also adds a small test for this behavior.
This is required for testing the Scylla restore API, as it does not work with integer-based SSTables.
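A sketch of telling integer-based SSTable IDs apart from UUID-based ones by file name. ASSUMPTION: the name follows the `<version>-<generation>-<format>-<component>` layout (e.g. `me-1-big-Data.db`); older naming layouts are not handled. sstableIDIsInteger is a hypothetical helper.

```go
package main

import (
	"strconv"
	"strings"
)

// sstableIDIsInteger reports whether the SSTable generation in fileName is
// an integer (e.g. "me-1-big-Data.db") rather than UUID-based. The second
// result is false when the name does not follow the assumed layout.
func sstableIDIsInteger(fileName string) (isInt bool, ok bool) {
	parts := strings.SplitN(fileName, "-", 4)
	if len(parts) != 4 || parts[1] == "" {
		return false, false
	}
	_, err := strconv.ParseUint(parts[1], 10, 64)
	return err == nil, true
}
```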
A separate column for the Scylla task ID is needed because:
- it has a different type from the agent job ID,
- it makes it clear which API was used.
This commit extends the SSTable structure with its TOC component, which is needed when using the Scylla restore API. Moreover, it introduces batch types, which are also needed for deciding whether a given batch can be restored with the Scylla restore API or the Rclone API. It also makes sure that all SSTables within the same batch have the same batch type.
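A sketch of grouping SSTables by batch type so that each batch is homogeneous. The SSTable fields, the batchType key and the restore-API rule are ASSUMPTIONS: the rule excludes integer-based IDs (stated in this PR) and, hypothetically, versioned files.

```go
package main

// IDType distinguishes integer-based from UUID-based SSTable generations.
type IDType int

const (
	IDTypeInt IDType = iota
	IDTypeUUID
)

// SSTable is a hypothetical, trimmed-down SSTable description that also
// carries the name of its TOC component.
type SSTable struct {
	ID        string
	IDType    IDType
	TOC       string // e.g. "me-1-big-TOC.txt"
	Versioned bool
}

// batchType is a hypothetical key: SSTables may share a batch only if
// they agree on every field that affects the choice of restore API.
type batchType struct {
	IDType    IDType
	Versioned bool
}

// groupByBatchType splits SSTables so that each group has a single batch
// type, preserving input order within each group.
func groupByBatchType(ssts []SSTable) map[batchType][]SSTable {
	out := make(map[batchType][]SSTable)
	for _, s := range ssts {
		bt := batchType{IDType: s.IDType, Versioned: s.Versioned}
		out[bt] = append(out[bt], s)
	}
	return out
}

// canUseScyllaRestoreAPI is an ASSUMED rule: only UUID-based,
// non-versioned batches go through the Scylla restore API.
func (bt batchType) canUseScyllaRestoreAPI() bool {
	return bt.IDType == IDTypeUUID && !bt.Versioned
}
```

Deciding the API per batch type, rather than per SSTable, means a whole batch is sent to exactly one API.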
Michal-Leszczynski force-pushed the ml/restore-scylla-api branch from 3475e2f to c735fb2 (January 9, 2025 16:18)
WIP