Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Independent DPU Upgrade HLD #1906

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Conversation

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@KrisNey-MSFT
Copy link

Hi @hdwhdw - are you looking for a Reviewer for this PR?

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@hdwhdw hdwhdw marked this pull request as ready for review February 7, 2025 21:33
@hdwhdw
Copy link
Author

hdwhdw commented Feb 7, 2025

@KrisNey-MSFT yes please. Thank you :)

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

### 2. Scope

This document describes the high-level design of the sequence to independently upgrade a SmartSwitch DPU with minimal impact to other DPUs and the NPU, through GNOI API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly define dependencies and any ordering constraints to prevent unexpected failures.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Let me know if there are any other dependencies.

* 'System.SetPackage'
* 'OS.Activate'
* 'Containerz.Deploy'
* Rollback:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollback

What happens if rollback fails or the previous image is corrupted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a sentence for that. If Activate fail, we should SetPackage (install) the previous image and Activate again.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

3. DPU and NPU image compatibility: The upgrade process assumes that the DPU and NPU images are compatible with each other. It is up to the client to ensure the compatibility of the images.
4. Eliminating human intervention: The upgrade process may require human intervention to resolve issues that cannot be handled automatically, in particular, when both the upgrade process fails and the rollback process fails, the system may be left in an inconsistent state that requires manual intervention.

### 6. Architecture Degn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture Design

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch.

* 'Containerz.Deploy'
* Rollback:
* Rollback the new SONiC image on the DPU. Client issues 'OS.Activate' with the old SONiC image.
* Rollback the new offloaded container images on the NPU. Client issues 'Containerz.RemoveImage' with the old container images.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why rollback new offloaded img will issue RemoveImage with old img?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to undo the step. RemoveImage is optional and mainly to save spaces.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@hdwhdw hdwhdw changed the title DRAFT: Independent DPU Upgrade HLD Independent DPU Upgrade HLD Feb 13, 2025
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

No pipelines are associated with this pull request.

qiluo-msft pushed a commit to sonic-net/sonic-gnmi that referenced this pull request Feb 14, 2025
#### Why I did it
Supports system.SetPackage, see sonic-net/SONiC#1906

This PR also improve ease of use for client by not using (in this `System.SetPackage` specifically) the generic `-jsonin` flag where we pass a json string, in favor of a more readable and verifiable flags like `-package_filename` and `-package_url` .

#### How I did it
Client side code for system.SetPackage. 

#### How to verify it

On physical switches:
```
admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage
System SetPackage
Error validating flags:  missing -package_filename

admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage -package_filename=filename
System SetPackage
Error validating flags:  missing -package_version

admin@switch:~$ docker exec gnmi gnoi_client -target 127.0.0.1:50052 -logtostderr -insecure -module System -rpc SetPackage -package_filename=filename -package_version=abc -package_activate=true -package_url=abc.com
System SetPackage
Error receiving response:  rpc error: code = Unimplemented desc =
```

Server already have placeholder code so the unimplemented error is expected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants