Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added flag to support sharded migrations via cli #913

Conversation

bharadwaj-aditya
Copy link
Collaborator

@bharadwaj-aditya bharadwaj-aditya commented Oct 7, 2024

Added support to perform sharded migrations from SMT CLI.
This defaults to not adding the migration_shard_id in the primary key.

Options to add in the primary key will be taken on as follow up work.

  • Tests pass
  • Appropriate changes to README are included in PR

@bharadwaj-aditya bharadwaj-aditya requested a review from a team as a code owner October 7, 2024 04:54
@bharadwaj-aditya bharadwaj-aditya requested review from manitgupta and Deep1998 and removed request for a team October 7, 2024 04:54
removed extra files

added spanner migration tool to gitignore
moving rules to internal

corrected tests
@pull-request-size pull-request-size bot added size/M and removed size/L labels Oct 9, 2024
@@ -59,18 +59,32 @@ func (sads *SchemaFromSourceImpl) schemaFromDatabase(migrationProjectId string,
var infoSchema common.InfoSchema
var err error
isSharded := false
fmt.Printf("fetching schema from database %+v\n", sourceProfile.Config.ShardConfigurationBulk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might print sensitive details to the log? Please check print statements added (in other parts of PR as well)

@@ -48,6 +48,8 @@ such as `avro` etc.
Please note that streaming migration is only supported for MySQL, Oracle and PostgreSQL databases currently.
Here is an example of a [streamingCfg JSON](./config-json.md#streamingcfg-for-non-sharded-minimal-downtime-migrations) and [how to use it in the CLI](./schema-and-data.md#examples).

* **`shardingConfig`**: Specifies the sharding config file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an example sharding config here.

Comment on lines +811 to +826
shardIdConfig, ok2 := params["shardIdConfig"]
if ok2 {
switch shardIdConfig {
case constants.SHARD_ID_PREFIX:
config.ShardIdConfig = shardIdConfig
case constants.SHARD_ID_SUFFIX:
config.ShardIdConfig = shardIdConfig
case constants.SHARD_ID_NONE:
config.ShardIdConfig = shardIdConfig
default:
return SourceProfile{}, fmt.Errorf("invalid shard id config passed")
}
} else {
//Default to setting shard id config to none
config.ShardIdConfig = constants.SHARD_ID_NONE
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n.NewSourceProfileConfig seems to be a misnomer now since it is supposed to return a SourceProfileConfig but the SourceProfileConfig is now actually a combination of shardIdConfig and ShardConfiguration* (bulk, dms or dataflow). The shardIdConfig only gets added in later..

We should ideally rename NewSourceProfileConfig to something like parseShardConfigurationFile or shift this logic inside it so that it still returns the completed SourceProfileConfig implementation.

@manitgupta
Copy link
Member

I am actually a bit confused on what this PR is trying to do. I think I initially misunderstood this as adding the ability to modify primary key, but it seems like it just allows user to do sharded migrations via CLI by passing in the cfg?

If so, this is already supported today via the config= flag.

https://screenshot.googleplex.com/5cD3UUPDhhX5nJU

@bharadwaj-aditya
Copy link
Collaborator Author

I am actually a bit confused on what this PR is trying to do. I think I initially misunderstood this as adding the ability to modify primary key, but it seems like it just allows user to do sharded migrations via CLI by passing in the cfg?

If so, this is already supported today via the config= flag.

https://screenshot.googleplex.com/5cD3UUPDhhX5nJU

It started with that idea, and pulled back to break into 2 PR's as the larger PR was getting too big owing to a bunch of refactoring.

But i agree, this PR doesnt seem super useful without the overall flow, so I'll probably close this and just raise the larger PR directly.

@bharadwaj-aditya bharadwaj-aditya deleted the sharding-cli branch October 10, 2024 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants