Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(1-1-restore): adds 1-1-restore cli and service stub #4221

Open
wants to merge 18 commits into
base: 1-1-restore-feature-branch
Choose a base branch
from

Conversation

VAveryanov8
Copy link
Collaborator

@VAveryanov8 VAveryanov8 commented Jan 20, 2025

This adds 1-1 restore cli and stubs.
For now cli just schedules the 1-1 restore task type and when task is executed, service logs "Not yet implemented".

1-1-restore is added as a subcommand to the restore cli, so it looks like below:

$ sctool.dev restore 1-1-restore -c my-cluster --source-cluster-id 0ee498b4-d6b6-4f92-9c19-3118b72d925f --nodes-mapping pkg/command/one2onerestore/testdata/nodesmapping_oneline.txt -L s3:test-bucket -T sm_20241122172732UTC

Note: I've temporarily used v3/pkg/managerclient from the va/fast-restore-part-1 branch as requried changed not in the master yet

P.S. Feel free to suggest another name for this cli and service 😄


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@VAveryanov8 VAveryanov8 force-pushed the va/fast-restore-part-2 branch from f07e746 to 94d7777 Compare January 21, 2025 10:09
@VAveryanov8 VAveryanov8 marked this pull request as ready for review January 21, 2025 14:40
@VAveryanov8 VAveryanov8 changed the title feat(fastrestore): adds fastrestore cli and service stub feat(fastrestore): adds 1-1-restore cli and service stub Jan 23, 2025
@VAveryanov8 VAveryanov8 changed the title feat(fastrestore): adds 1-1-restore cli and service stub feat(1-1-restore): adds 1-1-restore cli and service stub Jan 23, 2025
pkg/command/one2onerestore/cmd.go Outdated Show resolved Hide resolved
pkg/command/one2onerestore/cmd.go Show resolved Hide resolved
pkg/command/one2onerestore/nodesmapping_test.go Outdated Show resolved Hide resolved
pkg/command/one2onerestore/res.yaml Outdated Show resolved Hide resolved
pkg/command/one2onerestore/res.yaml Outdated Show resolved Hide resolved
SourceClusterID string `json:"source_cluster_id"`
SnapshotTag string `json:"snapshot_tag"`
NodesMapping []nodeMapping `json:"nodes_mapping"`
BatchSize int `json:"batch_size,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think batch size is needed, it's ok to call RClone to copy all SSTables of the given keyspace/table of a given node at once.
RClone tracks the progress on its own, SM just calls for the progress using jobID.

Parallel int `json:"parallel,omitempty"`
Transfers int `json:"transfers"`
RateLimit []DCLimit `json:"rate_limit,omitempty"`
AllowCompaction bool `json:"allow_compaction,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don;t need this parameter.
Compaction should be disabled to not waste CPU cycles and disk IO.
But it doesn't have an impact on the correctness of the restore.

@VAveryanov8 VAveryanov8 changed the base branch from master to va/fast-restore-part-1 January 29, 2025 14:16
This introduces new cli command for fast vnode restore procedure.

Fixes: #4200
This adds support for the fast restore task type to the task scheduler
This is the result of running `make docs`
This renames fastrestore to 1-1 restore and also makes it a subcommand
of restore cmd.
This describes what 'same topology' means. Also adds references to new
subcommands to the 'restore' cli page.
…s uuid

This removes 1-1-restore options that are not needed at the moment (we
can added them later if nedded).
Also changed source-cluster-id type to uuid.UUID
@VAveryanov8 VAveryanov8 force-pushed the va/fast-restore-part-2 branch from 37409c0 to 7e9f877 Compare January 29, 2025 14:35
@VAveryanov8
Copy link
Collaborator Author

@karol-kokoszka @Michal-Leszczynski
I've removed fields from Target that are not needed at the moment for 1-1-restore. (If it turns out that some of them are actually needed, I'll add them later.) and addressed your code review comments. Please review one more time

This dowgrades some of the deps that was updated by accident, so now it
looks more like the master branch.
Comment on lines 15 to 19
The `<dc>` parameter is optional. It allows you to specify the datacenter whose nodes will be used to restore the data
from this location in a multi-dc setting, it must match Scylla nodes datacenter.
By default, all live nodes are used to restore data from specified locations.

Note that specifying datacenters closest to backup locations might reduce download time of restored data.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it applies to the 1-1-restore.
Since we have node mapping, we shouldn't allow to specify the location DC.

Comment on lines 39 to 41
if _, err := SnapshotTagTime(t.SnapshotTag); err != nil {
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can use IsSnapshotTag instead.

if _, err := SnapshotTagTime(t.SnapshotTag); err != nil {
return err
}
if t.SourceClusterID.String() == "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can check if it's unset by t.SourceClusterID == uuid.Nil

"github.com/scylladb/scylla-manager/v3/pkg/util/uuid"
)

// Service for the 1-2-1 restore.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's keep the same naming convention (1-1-restore or one2onerestore).
I guess that we prefer the 1-1-restore and use it when dashes are not a problem.

Comment on lines 51 to 53
// One2OneRestore creates and initializes worker performing 1-2-1 vnode restore with given properties.
func (s *Service) One2OneRestore(ctx context.Context, clusterID, taskID, runID uuid.UUID, properties json.RawMessage) error {
s.logger.Info(ctx, "1-1 restore",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: align 1-1-restore naming convention (also in the log).

@@ -27,6 +27,7 @@ const (
UnknownTask TaskType = "unknown"
BackupTask TaskType = "backup"
RestoreTask TaskType = "restore"
One2OneRestoreTask TaskType = "1_1_restore"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: yet another version of writing the 1-1-restore.

This adds usage of IsSnapshotTag and uuid.Nil. Also aligns 1-1-restore
service naming convention across the code, comment and docs.
Base automatically changed from va/fast-restore-part-1 to 1-1-restore-feature-branch February 3, 2025 11:31
Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !
I have no major concerns.

healthSvc *healthcheck.Service
backupSvc *backup.Service
restoreSvc *restore.Service
one2OneRestoreSvc *one2onerestore.Service
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define the interface for the one2OneRestoreSvc.Service (like Servicer) and reference it here instead of the struct ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I can easily do this, but I'd like to clarify one detail in advance. I would like to define interface in scylla-manger/server.go package, but I see that in the SM code base it's often other way around (e.g ConfigCacher defined in configcache pkg). i think it's better to define interfaces on the consumer side https://go.dev/wiki/CodeReviewComments#interfaces

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the practice in go is to define the interface in the package that consumes the service, but here, the interface defines the API for the whole service which definition and implementation stays in the "one2onerestore" package.
So, for the clarity of what this package is expected to do, I prefer to have this interface that defines the API.

Returning interface from the constructor aligns with the intention to make the service’s API explicit and maintain a clear package boundary, even though it deviates a bit from the “accept interfaces, return concrete types” idiom.

Btw, packages like net/http http.Handler (defined in the same package).
Another example of the same approach (breaking the idiom), is in go-kit https://github.com/go-kit/examples/blob/master/addsvc/pkg/addservice/service.go

Copy link
Collaborator Author

@VAveryanov8 VAveryanov8 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the practice in go is to define the interface in the package that consumes the service, but here, the interface defines the API for the whole service which definition and implementation stays in the "one2onerestore" package.
So, for the clarity of what this package is expected to do, I prefer to have this interface that defines the API.

That make sense and it is a good intention, but why public methods of a Service struct doesn't define API and boundaries of a "one2onerestore" package?

In provided examples, they all have different pattern - interface is public, but implementation is private. In std lib it make sense - when you promise backward compatibility you don't want to expose some helper structs that can be changed in near future, e.g. http.FileServer. (and they use this pattern only for some helper functions).
In go-kit they use public interface and private struct for the services (business logic), because they have a lot of wrappers around Service, to hide details about various transports (e.g. http, grpc) that can be used, see https://github.com/go-kit/examples/blob/master/addsvc/pkg/addendpoint/set.go#L34

P.S. I'm not trying to argue for the sake of arguing or to resist making the change, I'm modestly trying to understand and learn why we want to do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants