clickhouse-backup: Add support for sharded backup #648
Conversation
Force-pushed from 1984421 to 82d747d
My bad for pushing this without actually checking things out - I've made some amendments and did some testing.
I hope you don't mind that I didn't add an integration test, as I noticed that the setup is single-replica (?)
This is a massive commit; I need time to figure out how exactly it works.
Please notify me if you don't have time to resolve conflicts and make the pull request changes.
@mskwon any updates?
May I please have a little more time? I've started the changes, though I'd like to add some different sharding modes, as I've realized that the consistency issue you pointed out could be pretty bad (for a hypothetical star schema, this could lead to the dimension and fact tables becoming inconsistent, especially as transactions are still experimental in ClickHouse IIRC). I'd like to make it possible to shard by database, or to have an option to keep certain tables backed up by the same replica.
@mskwon great, glad to hear it. To clarify what I meant: when executing a backup, back up data only on the first replica in the shard.
@mskwon any news about this PR?
Oh, we rolled back some zerolog-related commits in master.
It looks like you tried to rebase.
Would it be possible to rerun the CI? I'm seeing a data race in the testflows related to the server, which I'm guessing will be fixed by the zerolog changes.
I rolled back the zerolog migration (git reset --hard, git push --force origin master) and moved those commits to a separate branch and a separate WIP PR. Could you rebase your commits?
The pull request still shows 11 commits instead of 2.
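Since master itself was rewritten with a force push, a plain merge will keep resurrecting the old commits. A minimal sketch of the kind of rebase that cleans this up, assuming the Altinity repo is configured as the `upstream` remote and the branch name used below (both are assumptions, not from the thread):

```sh
# Fetch the rewritten upstream master ("upstream" remote name is an assumption).
git fetch upstream

# Replay only the two feature commits onto the new master, dropping the
# stale pre-reset commits that still show up in the PR.
git rebase --onto upstream/master HEAD~2

# Force-push the cleaned-up branch so the PR shows 2 commits again.
git push --force origin sharded-backup    # hypothetical branch name
```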
Apologies for the poor responsiveness here, I was taking some time off. It seems like the recent tests failed due to some S3 state issue.
So, will we continue work on this PR?
Hey, it looks like what might be happening is that the
Hmmm, I rebased this off of your fix for
Running the following locally seems to have TestFIPS passing.
I'm not sure what is going on with my local setup, but running the following produces an error.
Still trying to debug...
Hmmm, seeing a different error now with your fixes:
You need to create a GCS bucket in https://console.cloud.google.com and define the following 3 secrets in your fork:
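For reference, fork secrets can also be set from the command line with the GitHub CLI. A sketch, using the one secret name that appears later in this thread (`QA_GCS_OVER_S3_BUCKET`); the other two secret names were not captured here, so the placeholders below are hypothetical:

```sh
# Set the CI secrets on the fork with the GitHub CLI.
# QA_GCS_OVER_S3_BUCKET is named later in this thread; the credential
# secret names below are placeholders, not taken from the thread.
gh secret set QA_GCS_OVER_S3_BUCKET --body "my-gcs-bucket"
gh secret set QA_GCS_OVER_S3_ACCESS_KEY --body "<hmac-access-key>"  # hypothetical name
gh secret set QA_GCS_OVER_S3_SECRET_KEY --body "<hmac-secret-key>"  # hypothetical name
```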
Ah - my bad for not specifying, but this is for the GitHub Actions - it seems like QA_GCS_OVER_S3_BUCKET in the continuous integration .env might be hitting an issue.
Hmmm, seems like the S3 bucket error came up again in the CI on TestFIPS:
It's strange because the CI seems to be passing for your changes, but I don't think this change impacts any of the tests, unless there is some matrix of configurations being tested that turns on the sharded operation mode. Would the test infrastructure exercise sharded operation modes even without modification to the tests?
create
Also, please merge the latest master.
Which OS do you use? What does it show? Could you merge the latest master into your branch? Did you force push after
I'm using macOS.
The latest push should be on top of Altinity:master, at least according to this?
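One quick way to verify that claim locally - a sketch, assuming the Altinity repo is fetched as the `upstream` remote (the remote name is an assumption):

```sh
# Exit status 0 means upstream/master is an ancestor of the current branch,
# i.e. the branch really is on top of Altinity:master.
git fetch upstream
git merge-base --is-ancestor upstream/master HEAD && echo "on top of master"
```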
temporarily replace
Please merge the latest master changes (TestFIPS should be skipped for your fork) and properly resolve conflicts.
This change adds a new configuration 'general.sharded_operation' which shards tables for backup across replicas, allowing for a uniform backup and restore call to the server without consideration for table replication state. Fixes Altinity#639

clickhouse-backup/backup_shard: Use Array for active replicas. clickhouse-go v1 does not support ClickHouse Map types, so force the Map(String, UInt8) column replica_is_active to a string array for now.

clickhouse-backup/backuper: Skip shard assignment for skipped tables. Also add the new ShardBackupType "ShardBackupNone", which is assigned to skipped tables.

clickhouse-backup/backuper: Use b.GetTables for CreateBackup instead of b.ch.GetTables, and move b.populateBackupShardField to b.GetTables so as to populate the field for the server API.

backup: Address review changes for adding sharding support. Add different sharded operation modes to give users the ability to specify the granularity of sharding.
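To illustrate the replica_is_active workaround mentioned above: since clickhouse-go v1 cannot scan a Map(String, UInt8) column, the active-replica names can be flattened to a plain string array on the server side before the client ever sees them. A sketch of the idea using ClickHouse's mapFilter/mapKeys functions; the exact query used by the PR may differ:

```sh
# Flatten Map(String, UInt8) to Array(String) server-side so the v1 client
# only has to scan a string array. The exact query in the PR may differ.
clickhouse-client --query "
  SELECT database, table,
         mapKeys(mapFilter((replica, active) -> active = 1, replica_is_active)) AS active_replicas
  FROM system.replicas
"
```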
Pull Request Test Coverage Report for Build 5743982693
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
Thanks for contributing.
Added to changelog: Add `SHARDED_OPERATION_MODE` option to make it easy to create backups for a sharded cluster; available values: `none` (no sharding), `table` (table granularity), `database` (database granularity), `first-replica` (on the lexicographically sorted first active replica). Thanks @mskwon, fix [639](#639), fix [648](#648)
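For context, a sketch of how the new option would be used. clickhouse-backup reads its configuration from environment variables, and the mode values are the ones listed in the changelog entry above; the backup name here is arbitrary:

```sh
# Shard the backup at database granularity: each replica backs up a
# disjoint subset of databases. Other values: none, table, first-replica.
SHARDED_OPERATION_MODE=database clickhouse-backup create my_sharded_backup
```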
Thank you for your patience and the huge amount of help with this, Slach.