From 5be9d4a713355972771bc7a1563af622e3bb3855 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Tue, 25 Jun 2024 16:49:19 -0700 Subject: [PATCH 01/12] add gitops feature spec Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 331 +++++++++++++++++++++++++++ 1 file changed, 331 insertions(+) create mode 100644 tools/2024-06-gitops-feature-spec.md diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md new file mode 100644 index 0000000..3cb3c3a --- /dev/null +++ b/tools/2024-06-gitops-feature-spec.md @@ -0,0 +1,331 @@ +# Radius Integration with GitOps + +* **Status**: Pending +* **Author**: Will Tsai (@willtsai) + +## Summary + +Continuous deployment of cloud native applications and infrastructure is challenging for a number of reasons. GitOps is a popular set of practices, implemented as popular tools, like Flux and ArgoCD, that mitigates these challenges for enterprise application teams that use git for source control and Kubernetes for orchestration of software containers. GitOps provides a developer-centric experience by abstracting and automating many of the tasks required for deploying and operating Kubernetes and its underly dependent infrastructure. The core concept of GitOps is to rely on a git repository that serves as a single source of truth: i.e. it contains current declarative descriptions of the required infrastructure for a given production environment. It also contains a description of the workflow required to prevent drift between the repo and the production environment. It's like having cruise control for managing applications in production. + +Enterprises that use GitOps and also want to use Radius do not currently have a clear path for how to use both technologies in an integrated and complementary way to enable a “better together” experience. + +### Top level goals + +- Enable enterprises to get the best of both GitOps and Radius for a better-together experience. +- Provide a consistent Radius + GitOps integration model for both existing and future GitOps platforms. + +### Non-goals (out of scope) + +- We are not trying to replace GitOps tools or platforms. We are trying to integrate Radius with existing GitOps tools and platforms. +- We are not trying to replace GitOps practices. We are trying to enable Radius to work well with GitOps practices. + +## Customer profile and challenges + + +### Customer persona(s) + +**Infrastructure operators / administrators** - In enterprise applications team that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. + +**Application developers** - Responsible for designing, developing and maintaining application code. Developers use GitOps by checking files into their git repositories to configure GitOps settings to auto deploy and monitor their application and dependent infrastructure. + +**Site Reliability Engineers** - They leverage GitOps by checking in configuration files into their git repositories that add information for the customization of the workload, i.e. replica pod number, strategic merge solutions. + +### Challenge(s) faced by the customer + +Before the introduction of GitOps, enterprise application teams struggled with complex deployment processes; environment/application configuration drift; as well as difficulty following best practices for security and disaster recovery. GitOps enables teams to define continuous deployment workflows and automation to simplify many of these tasks, which add significant value by increasing efficiency, security, creating better developer experiences, and allowing for faster deployments between code changes. + +Radius offers additional complementary value for enterprise application teams, including a rich, declarative application description which enables the Radius application graph. Radius also provides a clear separate of concerns via Recipes. However, without integration of Radius and GitOps, plus clear guidance on how to use the two together, enterprises already invested in GitOps don't know how to use both GitOps and Radius for a better together experience. + + + +## Key scenarios + + +### Scenario 1: As an operator I can install and configure Radius with my GitOps tool of choice + +My team uses GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in a git repository that serves as the source of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. + +### Scenario 2: As an operator or developer, I can deploy and manage Radius Environments, Recipes, applications, and resources via my GitOps toolset + +With Radius installed, as an Operator, I now define, deploy and manage a Radius environment and associate relevant Recipes to that environment. + +As a Developer, my Operations counterpart has provided a Radius Environment plus Recipes to enable my Radius application development. I have familiarized myself with Radius and have experimented with deploying my application to this environment using the Recipe. After my app is deployed, I can view its graph in the Radius dashboard. Everything is working well for me so far and I really appreciate these cool Radius features! + +### Scenario 3: As a Site Reliability Engineer (SRE), I can patch and edit the configuration in my Kubernetes cluster through my GitOps toolset. + +As an SRE, my Operations counterpart has provided a Kubernetes cluster, a Radius Environment running on that cluster and the infrastructure resources through Radius Recipes required for the Radius Applications that will run in that Radius Environment. I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. I can do this through my GitOps toolset. + +### Stretch Goal: As an Operator and as a Developer, I can visualize the changes tracked by my GitOps toolset on my Radius Dashboard +We still need to flesh this out but the basic idea is the Radius dashboard can leverage the Notification Controller in Flux and ArgoCD to display to users inbound and outbound information of the source changes and cluster changes affecting their application and resources. + +## Key dependencies and risks + + + + +Dependencies - Flux, ArgoCD, future GitOps tools/platforms. + +The primary risk, per the question below, is whether an abstracted, generalizable extensibility model for GitOps is feasible. + +## Key assumptions to test and questions to answer + + +Our goal is to deliver a generalized model such that Flux, ArgoCD plus future GitOps technologies can all be integrated in a consistent manner. How feasible is that? We'll need to learn more about the GitOps tool landscape to determine feasibility. + +We as the Radius team understand that while Flux and ArgoCD might currently be the most popular options utilized by customers and while the current scenarios might reference Flux or ArgoCD, the solution we will build will be abstract enough to support all other high priority GitOps toolsets. We currently understand the GitOps workflow if abstracted in implementation usually follows this format: +1. Install GitOps tool onto Kubernetes cluster. +2. Add git repository information to GitOps tool +3. Deploy application to Kubernetes cluster +4. Done + +> The implementation settled on is generic enough to support GitOps on a wide level not narrowed down to just Flux or ArgoCD. + +## Current state +TBD + +# Scenario 1: Install and configure Radius using GitOps + +## Target customers + +**Infrastructure operators / administrators** - In enterprise applications teams that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. + +## Details of customer problem + +As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in a git repository that serves as the source of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. + +## Desired customer experience outcome + +After this scenario is implemented, I can install and configure Radius with my GitOps tool of choice. I can include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. This ensures that I no longer have to manually manage the Radius instance I am providing for my developers and SREs to use, that there is an automated process to ensure that the deployed Radius instance matches the state specified I have specified in a repository. In other words, my experience with managing our Radius instance is consistent with how I normally use GitOps to define any other dependency in a Kubernetes environment. + +### Detailed Customer Experience + + +1. With my editor of choice (such as VSCode) open the GitOps (Flux or ArgoCD) Source File +1. Add installation of Radius as a Kubernetes cluster dependency for my desired Radius version. + 1. Similar the params in the Flux schema defined here: [Helm Charts | Flux (fluxcd.io)](https://fluxcd.io/flux/components/source/helmcharts/) +1. Commit the change to the git repo to submit my change and track it for future reference. +1. GitOps tool will detect this repo change and update the relevant Kubernetes clusters +1. I can validate the Radius installation through various GitOps or Kubernetes tools. For example, I can run `kubectl get namespace` and verify Radius appears as Kubernetes namespace. + 1. I will be able to validate Radius on my Flux dashboard or my rad CLI/GitOps CLI as well once I've created a Radius Environment or have Radius Applications running. + +> As an operator, I'm happy with this experience because it is completely consistent with how I normally use Flux or ArgoCD to define any other dependency in a Kubernetes environment. + +The syntax for adding Radius as a Kubernetes cluster dependency must be: +- Consistent with customer expectations for defining such a dependency +- Executable by Flux or ArgoCD (which based on current understanding should just work today). + +> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux, ArgoCD, or their GitOps tool of choice. + +## Key investments + +Based on current understanding, there should be no new features that need to be implemented in Radius for this scenario. The key investment is in testing the GitOps tooling to ensure that the syntax for adding Radius as a Kubernetes cluster dependency is consistent with customer expectations for defining such a dependency and is executable by Flux or ArgoCD. + +## Key dependencies and risks + + + + + +**Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius dependency specified in the GitOps source file and update the relevant Kubernetes clusters accordingly to install Radius. + +**Risk: ability for GitOps tools to install and deploy Radius** - The primary risk is whether Flux, ArgoCD, etc. can install and deploy Radius as a Kubernetes cluster dependency. We will need to test this to ensure that it works as expected. + +## Key assumptions to test and questions to answer + + + +**Assumption:** The syntax for adding Radius as a Kubernetes cluster dependency must be consistent with customer expectations for defining such a dependency and must be executable by Flux or ArgoCD. We will need to test this to ensure that it works as expected. + +# Scenario 2: Deploy and manage Radius Environments, Recipes, Apps, Resources using GitOps + +## Target customers + +**Infrastructure operators / administrators** - In enterprise applications team that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. + +**Application developers** - Responsible for designing, developing and maintaining application code. Developers use GitOps by checking files into their git repositories to configure GitOps settings to auto deploy and monitor their application and dependent infrastructure. + +## Existing customer problem + +Since my organization uses GitOps to manage Kubernetes clusters and applications that run in those clusters, I need to remain consistent in infrastructure and application deployment even when using Radius. This means that I need to be able to apply Radius-centric concepts like Environments, Recipes, Applications, and Resources in a way that is consistent with my existing GitOps workflows. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. + +## Desired customer experience outcome + +With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with how I normally use GitOps to manage Kubernetes clusters and applications. + +> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. + +### Detailed Customer Experience + + + +**Step 1: Define and deploy Radius Environment and Recipes** + +As an operator, I need to define and deploy a Radius Environment and associate relevant Recipes to that environment so that my development team can build and deploy their Radius applications. + +1. Using my editor of choice, such as VSCode, I open an existing Radius environment file, `env.bicep`, from the online Radius samples repo. +1. I use Radius documentation to determine specific edits to my `env.bicep` file such as Kubernetes cluster and the appropriate cloud provider registrations. + 1. I will create a Radius Environment resource and add all the information listed above. + 1. I will then edit my Radius Environment resource to contain the properties required to register my Radius Recipe such as template path, template kind, etc. +1. I then push my new `env.bicep` file to my git repository, as I would any other file. +1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` file, and deploy the Radius Environment as specified. +1. After Flux or ArgoCD initiates the Radius environment deployment, as I would normally do, I use Flux or ArgoCD to monitor deployment status and health status of the underlying Kubernetes cluster. + +> As an operator, while Radius presents some new concepts, those concepts fit cleanly within my existing GitOps workflows so Radius is relatively low overhead for me to adopt and to provide value to my development counterparts. + +**Step 2: Deploy and manage Radius Applications and Resources through GitOps toolset** + +Now, as a developer, I need to make some changes to my application code. Specifically, I need to change the Radius Resource for my frontend UI. My frontend resource is already running in my Radius application but I need update the code for parsing user input. First, I'll update the required file where this function lives then I'll update my Radius Resource as follows: + +1. With my editor of choice, such as VSCode, I open the existing `frontend` application file `helper.ts` and update my function `parseUsers` with the new required logic. +2. Then I push the changes to my git repository. +3. This triggers Flux/ArgoCD to detect the change and update the `frontend` resource with the new function. I use Flux/ArgoCD to monitor the health status of the Kubernetes cluster and immediately realize the pod associated with my `frontend` resource is failing. +4. I review my code change again and realize I created a bug in the function which is causing the resource to fail. +5. To resolve, I: + 1. Use the Flux/ArgoCD rollback commands to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers. + 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. +6. Flux/ArgoCD now deploys my bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. + +> As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. + +__Requirements resulting from these scenarios:__ +- Flux/ArgoCD must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly +- Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. +- Flux/ArgoCD must be able to detect Radius resource types and execute Flux/ArgoCD commands against those types, including rollback *list of any other required Flux commands*. +- Based on user feedback this could be just testing that commands such as dry run mechanisms work + +## Key investments + + +### Feature 1: GitOps tooling can detect and understand Radius definition files + +GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file as required to deploy the Radius app correctly. + +GitOps tools (beginning with Flux and ArgoCD) must be able to detect changes to Radius applications and other resource type definitions in the git repo and execute the appropriate deployment, etc. commands against those types, including rollback operations. + +### Feature 2: GitOps tooling can execute Radius commands + +Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to execute deployments for the Radius resources defined in the git repo. + +### Feature 3: GitOps tooling can rollback Radius resources + +GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback operations for Radius resources defined in the git repo. + +## Key dependencies and risks + + + + + +**Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius definition files specified in the git repo and update the relevant Kubernetes clusters accordingly to deploy and manage Radius resources. + +**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can deploy and manage Radius resources as specified in the git repo. We will need to test this to ensure that it works as expected. + +## Key assumptions to test and questions to answer + + +**Assumption:** GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the Bicep files as required to deploy the Radius app correctly. We will need to test this to ensure that it works as expected. + +# Scenario 3: Patch and edit the configuration in my Kubernetes cluster through my GitOps toolset + +## Target customers + +**Site Reliability Engineers (SREs)** - They leverage GitOps by checking in configuration files into their git repositories that add information for the customization of the workload, i.e. replica pod number, strategic merge solutions. + +## Existing customer problem + +As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset, but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts + +## Desired customer experience outcome + +As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1. To fix this as a SRE, I should have 2 options: + +**Option 1** + +1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through a YAML patch file that can contain the path to parameter annotations that effect the replica number of pods: + +Example: + +```yaml +apiVersion: kustomize... +kind: Component +resources: +- backend/mongoDB.yaml +patches: +- target: + group: core.opera.prod + version: 1 + kind: Application + name: backend + path: backend/replica-patch.yaml +``` + +2. This file points to replica-patch.yaml which contains parameters to change the replica number of pods to 10. +2. These 2 files are then committed to my configurational files to patch my Kubernetes cluster. + +> As an SRE, I'm very happy with this experience because I did not have to change anything in my normal workflow behavior and was able to leverage Flux and my normal YAML file workflow. + +**Option 2 Advanced SRE** + +1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through changing the parameter of replica sets that my MongoDB recipe takes in as defined in its Radius Environment resource in `env.bicep`: + +```bicep +resource env 'Applications.Core/environments@2023-10-01-preview' = { + name: 'prod' + properties: { + compute: { + kind: 'kubernetes' + resourceId: 'self' + namespace: 'default' + } + recipes: { + 'Applications.Datastores/mongoDB':{ + 'mongoDB-bicep': { + templateKind: 'bicep' + templatePath: 'https://ghcr.io/USERNAME/recipes/myrecipe:1.1.0' + // Optionally set parameters for all resources calling this Recipe + parameters: { + replicaSet: 10 + } + } + } + } + } +} +``` + +2. I can then commit this file to my repository and have Flux apply it to my Radius Environment. + +> As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux mechanisms which I’m familiar with to apply my patches to the Kubernetes cluster. + +_Requirements resulting from this scenario:_ +- Flux must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly +- Radius types and Bicep files must be able to be read by Flux which currently does not have that ability. + +## Key investments + + +### Feature 1 + + +### Feature 2 + + +### Feature 3 + + +## Key dependencies and risks + + + + + +## Key assumptions to test and questions to answer + + + +## Current state + \ No newline at end of file From ec9e3dcf64a761829df37e2149ded61e2db141fa Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Tue, 25 Jun 2024 17:06:47 -0700 Subject: [PATCH 02/12] spelling corrections Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- .github/config/en-custom.txt | 13 ++++++++++++- tools/2024-06-gitops-feature-spec.md | 11 +++++------ 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt index 2a061ec..5154294 100644 --- a/.github/config/en-custom.txt +++ b/.github/config/en-custom.txt @@ -662,4 +662,15 @@ envVariables EnvironmentVariables amongst ProviderSecret -SecretReference \ No newline at end of file +SecretReference +Tsai +willtsai +ArgoCD +GitOps +repos +SRE +SREs +toolset +toolsets +generalizable +fluxcd \ No newline at end of file diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 3cb3c3a..c1a540a 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -5,7 +5,7 @@ ## Summary -Continuous deployment of cloud native applications and infrastructure is challenging for a number of reasons. GitOps is a popular set of practices, implemented as popular tools, like Flux and ArgoCD, that mitigates these challenges for enterprise application teams that use git for source control and Kubernetes for orchestration of software containers. GitOps provides a developer-centric experience by abstracting and automating many of the tasks required for deploying and operating Kubernetes and its underly dependent infrastructure. The core concept of GitOps is to rely on a git repository that serves as a single source of truth: i.e. it contains current declarative descriptions of the required infrastructure for a given production environment. It also contains a description of the workflow required to prevent drift between the repo and the production environment. It's like having cruise control for managing applications in production. +Continuous deployment of cloud native applications and infrastructure is challenging for a number of reasons. GitOps is a popular set of practices, implemented as popular tools, like Flux and ArgoCD, that mitigates these challenges for enterprise application teams that use git for source control and Kubernetes for orchestration of software containers. GitOps provides a developer-centric experience by abstracting and automating many of the tasks required for deploying and operating Kubernetes and its underlying dependent infrastructure. The core concept of GitOps is to rely on a git repository that serves as a single source of truth: i.e. it contains current declarative descriptions of the required infrastructure for a given production environment. It also contains a description of the workflow required to prevent drift between the repo and the production environment. It's like having cruise control for managing applications in production. Enterprises that use GitOps and also want to use Radius do not currently have a clear path for how to use both technologies in an integrated and complementary way to enable a “better together” experience. @@ -308,14 +308,13 @@ _Requirements resulting from this scenario:_ ## Key investments -### Feature 1 - - -### Feature 2 +### Feature 1: GitOps tooling can detect and understand Radius definition files +GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file as required to deploy the Radius app correctly. -### Feature 3 +### Feature 2: GitOps tooling can execute Radius commands +Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to execute deployments for the Radius resources defined in the git repo. ## Key dependencies and risks From 45e63e68107cdeeb674fe3a7bf50397b7a01d1c1 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Fri, 5 Jul 2024 14:08:04 -0700 Subject: [PATCH 03/12] update specs for patch scenario Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index c1a540a..d95dfc2 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -237,7 +237,7 @@ GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback o ## Existing customer problem -As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset, but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts +As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/ or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. ## Desired customer experience outcome @@ -268,6 +268,8 @@ patches: > As an SRE, I'm very happy with this experience because I did not have to change anything in my normal workflow behavior and was able to leverage Flux and my normal YAML file workflow. +> This is the current state for GitOps and Kustomize, thus should not require any new features to be implemented in Radius. + **Option 2 Advanced SRE** 1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through changing the parameter of replica sets that my MongoDB recipe takes in as defined in its Radius Environment resource in `env.bicep`: @@ -299,11 +301,11 @@ resource env 'Applications.Core/environments@2023-10-01-preview' = { 2. I can then commit this file to my repository and have Flux apply it to my Radius Environment. -> As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux mechanisms which I’m familiar with to apply my patches to the Kubernetes cluster. +> As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux mechanisms which I'm familiar with to apply my patches to the Kubernetes cluster. _Requirements resulting from this scenario:_ -- Flux must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly -- Radius types and Bicep files must be able to be read by Flux which currently does not have that ability. +- Flux and ArgoCD must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly +- Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. ## Key investments @@ -316,15 +318,28 @@ GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file ch Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to execute deployments for the Radius resources defined in the git repo. +### Feature 3: The GitOps controllers must be aware of the patches applied to the Kubernetes cluster + +The GitOps controllers must be aware of the patches applied to the Kubernetes cluster and must be able to honor those patched configurations to avoid conflicts with the Radius resource definitions that have been checked in. This should be possible by leveraging the existing Kustomize mechanisms in Flux ([Kustomize controller](https://fluxcd.io/flux/components/kustomize/)) and ArgoCD ([Application controller](https://argo-cd.readthedocs.io/en/stable/developer-guide/architecture/components/#application-controller)). + ## Key dependencies and risks +**Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius definition files specified in the git repo and update the relevant Kubernetes clusters accordingly to deploy and manage Radius resources. + +**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can deploy and manage Radius resources as specified in the git repo. We will need to test this to ensure that it works as expected. + ## Key assumptions to test and questions to answer +**Assumption:** The Flux and ArgoCD controllers can be made aware of the originally checked in Radius resource definitions and can appropriately apply patches to the Kubernetes cluster without conflicts. We will need to test this to ensure that it works as expected. + +**Assumption:** Patch configurations can be built into the Radius resource defintion schemas and can be applied by Flux and ArgoCD controllers. We will need to test this to ensure that it works as expected. + ## Current state - \ No newline at end of file + +N/A \ No newline at end of file From 697e59874121fb85106fabb53978de51aae59e55 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Tue, 16 Jul 2024 11:05:20 -0700 Subject: [PATCH 04/12] updates from last meeting Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- .github/config/en-custom.txt | 4 +- tools/2024-06-gitops-feature-spec.md | 67 +++++++++++++++++----------- 2 files changed, 45 insertions(+), 26 deletions(-) diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt index 5154294..e5da36f 100644 --- a/.github/config/en-custom.txt +++ b/.github/config/en-custom.txt @@ -673,4 +673,6 @@ SREs toolset toolsets generalizable -fluxcd \ No newline at end of file +fluxcd +Kustomize +kustomize \ No newline at end of file diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index d95dfc2..97d2cb9 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -5,10 +5,12 @@ ## Summary -Continuous deployment of cloud native applications and infrastructure is challenging for a number of reasons. GitOps is a popular set of practices, implemented as popular tools, like Flux and ArgoCD, that mitigates these challenges for enterprise application teams that use git for source control and Kubernetes for orchestration of software containers. GitOps provides a developer-centric experience by abstracting and automating many of the tasks required for deploying and operating Kubernetes and its underlying dependent infrastructure. The core concept of GitOps is to rely on a git repository that serves as a single source of truth: i.e. it contains current declarative descriptions of the required infrastructure for a given production environment. It also contains a description of the workflow required to prevent drift between the repo and the production environment. It's like having cruise control for managing applications in production. +Continuous deployment of cloud native applications and infrastructure is challenging for a number of reasons. GitOps is a popular set of practices, implemented as popular tools, like Flux and ArgoCD, that mitigates these challenges for enterprise application teams that use git for source control and Kubernetes for orchestration of software containers. GitOps provides a developer-centric experience by abstracting and automating many of the tasks required for deploying and operating Kubernetes and its underlying dependent infrastructure. The core concept of GitOps is to rely on a git repository that serves as a single source of truth: i.e. it contains current declarative descriptions of the required infrastructure for a given production environment. It also contains a description of the workflow required to prevent drift between the repo and the production environment. It's like having cruise control for managing applications in production without needing to write custom scripts or maintain complex CD pipelines. Enterprises that use GitOps and also want to use Radius do not currently have a clear path for how to use both technologies in an integrated and complementary way to enable a “better together” experience. +>It's important to note that GitOps mechanisms don't manage the CI (e.g. image build and app containerization) but they do manage the CD (e.g. deployment of the app to the cluster) + ### Top level goals - Enable enterprises to get the best of both GitOps and Radius for a better-together experience. @@ -18,6 +20,7 @@ Enterprises that use GitOps and also want to use Radius do not currently have a - We are not trying to replace GitOps tools or platforms. We are trying to integrate Radius with existing GitOps tools and platforms. - We are not trying to replace GitOps practices. We are trying to enable Radius to work well with GitOps practices. +- We are not implementing dashboard visualizations for GitOps changes (i.e. as an Operator and as a Developer, I can visualize the changes tracked by my GitOps toolset on my Radius Dashboard) as a part of the initial scope of this project. We still need to flesh this out but the basic idea is the Radius dashboard can leverage the Notification Controller in Flux and ArgoCD to display to users inbound and outbound information of the source changes and cluster changes affecting their application and resources. In a future iteration of Radius GitOps integration, we will explore options and potential to leverage plugins in the Backstage ecosystem to provide this functionality. ## Customer profile and challenges @@ -32,10 +35,12 @@ Enterprises that use GitOps and also want to use Radius do not currently have a ### Challenge(s) faced by the customer -Before the introduction of GitOps, enterprise application teams struggled with complex deployment processes; environment/application configuration drift; as well as difficulty following best practices for security and disaster recovery. GitOps enables teams to define continuous deployment workflows and automation to simplify many of these tasks, which add significant value by increasing efficiency, security, creating better developer experiences, and allowing for faster deployments between code changes. +Before the introduction of GitOps, enterprise application teams struggled with complex deployment processes; environment/application configuration drift; as well as difficulty following best practices for security and disaster recovery. GitOps enables teams to define continuous deployment workflows and automation without having to write custom scripts or maintain complex CI/CD pipelines. GitOps tools simplify and perform many of these tasks, which add significant value by increasing efficiency, security, creating better developer experiences, and allowing for faster deployments between code changes. Radius offers additional complementary value for enterprise application teams, including a rich, declarative application description which enables the Radius application graph. Radius also provides a clear separate of concerns via Recipes. However, without integration of Radius and GitOps, plus clear guidance on how to use the two together, enterprises already invested in GitOps don't know how to use both GitOps and Radius for a better together experience. +Additionally, in its current state GitOps works well within the Kubernetes sphere of influence but can't be applied to things outside of Kubernetes. This is a limitation for enterprises that want to use GitOps to manage their entire application stack, including things like databases, storage, and other infrastructure. To get the full benefit of GitOps, a user needs to combine things that can be controlled with GitOps with things that can't. Radius + Recipes can help here. + ## Key scenarios @@ -45,9 +50,9 @@ Radius offers additional complementary value for enterprise application teams, i My team uses GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in a git repository that serves as the source of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. -### Scenario 2: As an operator or developer, I can deploy and manage Radius Environments, Recipes, applications, and resources via my GitOps toolset +### Scenario 2: As an operator or developer, I can deploy and manage Radius Environments, Recipes, applications, resources, credentials, etc. via my GitOps toolset -With Radius installed, as an Operator, I now define, deploy and manage a Radius environment and associate relevant Recipes to that environment. +With Radius installed, as an Operator, I now define, deploy and manage a Radius environment and associate relevant Recipes, Credentials, and other Radius objects to that environment. As a Developer, my Operations counterpart has provided a Radius Environment plus Recipes to enable my Radius application development. I have familiarized myself with Radius and have experimented with deploying my application to this environment using the Recipe. After my app is deployed, I can view its graph in the Radius dashboard. Everything is working well for me so far and I really appreciate these cool Radius features! @@ -55,9 +60,6 @@ As a Developer, my Operations counterpart has provided a Radius Environment plus As an SRE, my Operations counterpart has provided a Kubernetes cluster, a Radius Environment running on that cluster and the infrastructure resources through Radius Recipes required for the Radius Applications that will run in that Radius Environment. I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. I can do this through my GitOps toolset. -### Stretch Goal: As an Operator and as a Developer, I can visualize the changes tracked by my GitOps toolset on my Radius Dashboard -We still need to flesh this out but the basic idea is the Radius dashboard can leverage the Notification Controller in Flux and ArgoCD to display to users inbound and outbound information of the source changes and cluster changes affecting their application and resources. - ## Key dependencies and risks @@ -65,7 +67,7 @@ We still need to flesh this out but the basic idea is the Radius dashboard can l Dependencies - Flux, ArgoCD, future GitOps tools/platforms. -The primary risk, per the question below, is whether an abstracted, generalizable extensibility model for GitOps is feasible. +The primary risk, per the question below, is whether an abstracted, generalizable extensibility model for GitOps is feasible. Preliminary investigation tells us that it is feasible via a Kubernetes controller. ## Key assumptions to test and questions to answer @@ -81,7 +83,7 @@ We as the Radius team understand that while Flux and ArgoCD might currently be t > The implementation settled on is generic enough to support GitOps on a wide level not narrowed down to just Flux or ArgoCD. ## Current state -TBD +N/A # Scenario 1: Install and configure Radius using GitOps @@ -103,7 +105,7 @@ After this scenario is implemented, I can install and configure Radius with my G Step 2 … --> 1. With my editor of choice (such as VSCode) open the GitOps (Flux or ArgoCD) Source File -1. Add installation of Radius as a Kubernetes cluster dependency for my desired Radius version. +1. Add installation of Radius as a Kubernetes cluster dependency for my desired Radius version (committing a new version number in the repo triggers an upgrade of the Radius control plane). 1. Similar the params in the Flux schema defined here: [Helm Charts | Flux (fluxcd.io)](https://fluxcd.io/flux/components/source/helmcharts/) 1. Commit the change to the git repo to submit my change and track it for future reference. 1. GitOps tool will detect this repo change and update the relevant Kubernetes clusters @@ -120,7 +122,7 @@ The syntax for adding Radius as a Kubernetes cluster dependency must be: ## Key investments -Based on current understanding, there should be no new features that need to be implemented in Radius for this scenario. The key investment is in testing the GitOps tooling to ensure that the syntax for adding Radius as a Kubernetes cluster dependency is consistent with customer expectations for defining such a dependency and is executable by Flux or ArgoCD. +Based on current understanding, there should be no new features that need to be implemented in Radius for this scenario. The key investment is in testing the GitOps tooling to ensure that the syntax for adding Radius as a Kubernetes cluster dependency is consistent with customer expectations for defining such a dependency and is executable by Flux or ArgoCD. We need to validate that our Helm Charts are compatible with Flux and ArgoCD. ## Key dependencies and risks @@ -154,7 +156,7 @@ Since my organization uses GitOps to manage Kubernetes clusters and applications With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with how I normally use GitOps to manage Kubernetes clusters and applications. -> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. +> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. However, note that we're combining two different but existing user experiences (i.e. GitOps `git push` + Radius `app.bicep`). ### Detailed Customer Experience @@ -164,10 +166,10 @@ Step 2 **Step 1: Define and deploy Radius Environment and Recipes** -As an operator, I need to define and deploy a Radius Environment and associate relevant Recipes to that environment so that my development team can build and deploy their Radius applications. +As an operator, I need to define and deploy a Radius Environment and associate relevant Recipes, Credentials, etc. to that environment so that my development team can build and deploy their Radius applications. 1. Using my editor of choice, such as VSCode, I open an existing Radius environment file, `env.bicep`, from the online Radius samples repo. -1. I use Radius documentation to determine specific edits to my `env.bicep` file such as Kubernetes cluster and the appropriate cloud provider registrations. +1. I use Radius documentation to determine specific edits to my `env.bicep` file such as Kubernetes cluster and the appropriate cloud provider registrations and credentials. 1. I will create a Radius Environment resource and add all the information listed above. 1. I will then edit my Radius Environment resource to contain the properties required to register my Radius Recipe such as template path, template kind, etc. 1. I then push my new `env.bicep` file to my git repository, as I would any other file. @@ -181,18 +183,19 @@ As an operator, I need to define and deploy a Radius Environment and associate r Now, as a developer, I need to make some changes to my application code. Specifically, I need to change the Radius Resource for my frontend UI. My frontend resource is already running in my Radius application but I need update the code for parsing user input. First, I'll update the required file where this function lives then I'll update my Radius Resource as follows: 1. With my editor of choice, such as VSCode, I open the existing `frontend` application file `helper.ts` and update my function `parseUsers` with the new required logic. -2. Then I push the changes to my git repository. -3. This triggers Flux/ArgoCD to detect the change and update the `frontend` resource with the new function. I use Flux/ArgoCD to monitor the health status of the Kubernetes cluster and immediately realize the pod associated with my `frontend` resource is failing. +2. Then I push the changes to my git repository, which triggers my CI pipeline to rebuild and containerize my application. Note that the CI is outside of the scope of GitOps and this scenario. +3. This triggers Flux/ArgoCD to detect the updated application image and update the `frontend` resource with the new function. I use ArgoCD to monitor the health status of the Kubernetes cluster and immediately realize the pod associated with my `frontend` resource is failing. 4. I review my code change again and realize I created a bug in the function which is causing the resource to fail. 5. To resolve, I: - 1. Use the Flux/ArgoCD rollback commands to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers. + 1. Use the ArgoCD [rollback commands](https://argo-cd.readthedocs.io/en/release-2.7/user-guide/commands/argocd_app_rollback/) to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers OR I use `git revert` to revert my code change and `git push` the revert to my git repository which then gets detected by my CI to rebuild the image and then gets redeployed by Flux. 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. -6. Flux/ArgoCD now deploys my bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. +6. Flux/ArgoCD now redeploys my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. > As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. __Requirements resulting from these scenarios:__ - Flux/ArgoCD must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly +- Flux/ArgoCD must be able to detect changes to the `.bicepconfig` file and execute bicep config updates - Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. - Flux/ArgoCD must be able to detect Radius resource types and execute Flux/ArgoCD commands against those types, including rollback *list of any other required Flux commands*. - Based on user feedback this could be just testing that commands such as dry run mechanisms work @@ -202,7 +205,7 @@ __Requirements resulting from these scenarios:__ ### Feature 1: GitOps tooling can detect and understand Radius definition files -GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file as required to deploy the Radius app correctly. +GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file (or any `*.bicep` and `.bicepconfig` files) as required to deploy the Radius app correctly. GitOps tools (beginning with Flux and ArgoCD) must be able to detect changes to Radius applications and other resource type definitions in the git repo and execute the appropriate deployment, etc. commands against those types, including rollback operations. @@ -214,6 +217,10 @@ Once Radius definition file changes are detected and understood, GitOps tools (b GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback operations for Radius resources defined in the git repo. +#### Feature 4: GitOps tooling can detect and execute resource deletions + +If a resource in a `.bicep` Radius definition file is deleted and the change is pushed to the repo, the GitOps tools (Flux and ArgoCD) should detect the change and trigger Radius to delete that resource. Note that this is not a behavior of bicep today but it's important for this usecase. + ## Key dependencies and risks @@ -243,7 +250,7 @@ As an SRE, I am expected to make adjustments to the configuration of the Kuberne As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1. To fix this as a SRE, I should have 2 options: -**Option 1** +**Option 1: patch using current Kubernetes mechanisms** 1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through a YAML patch file that can contain the path to parameter annotations that effect the replica number of pods: @@ -264,13 +271,13 @@ patches: ``` 2. This file points to replica-patch.yaml which contains parameters to change the replica number of pods to 10. -2. These 2 files are then committed to my configurational files to patch my Kubernetes cluster. +2. These 2 files are then committed to my configuration files to patch my Kubernetes cluster. Note that this means SREs can continue to patch resources that may have been previously deployed by Radius, using their current Kubernetes patch mechanisms (e.g. Kustomize). > As an SRE, I'm very happy with this experience because I did not have to change anything in my normal workflow behavior and was able to leverage Flux and my normal YAML file workflow. > This is the current state for GitOps and Kustomize, thus should not require any new features to be implemented in Radius. -**Option 2 Advanced SRE** +**Option 2: patch via modification of Radius definitions** 1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through changing the parameter of replica sets that my MongoDB recipe takes in as defined in its Radius Environment resource in `env.bicep`: @@ -312,7 +319,7 @@ _Requirements resulting from this scenario:_ ### Feature 1: GitOps tooling can detect and understand Radius definition files -GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file as required to deploy the Radius app correctly. +GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` (or any `.bicep`) file as required to deploy the Radius app correctly. ### Feature 2: GitOps tooling can execute Radius commands @@ -338,8 +345,18 @@ The GitOps controllers must be aware of the patches applied to the Kubernetes cl **Assumption:** The Flux and ArgoCD controllers can be made aware of the originally checked in Radius resource definitions and can appropriately apply patches to the Kubernetes cluster without conflicts. We will need to test this to ensure that it works as expected. -**Assumption:** Patch configurations can be built into the Radius resource defintion schemas and can be applied by Flux and ArgoCD controllers. We will need to test this to ensure that it works as expected. +**Assumption:** Patch configurations can be built into the Radius resource definition schemas and can be applied by Flux and ArgoCD controllers. We will need to test this to ensure that it works as expected. ## Current state -N/A \ No newline at end of file +N/A + +## Notes + +### July 9, 2024 +- DONE: More specifics on the value of GitOps as well as limitations to Kubernetes +- DONE: Clarify that can use GitOps to manage other Environment objects outside of Recipes +- DONE: Move dashboard features to out of scope +- DONE: Add resource deletion feature to Scenario 2 +- DONE Clarify the scope of gitops: it doesn't manage the CI (e.g. image build and app containerization) but it does manage the CD (e.g. deployment of the app to the cluster) +- DONE: Track disaster recovery scenarios as a separate feature request: DONE - https://github.com/radius-project/radius/issues/7732 \ No newline at end of file From e8eab953199cc3bf5c379ce400c6129133c3b13d Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Thu, 18 Jul 2024 14:54:51 -0700 Subject: [PATCH 05/12] updates from last meeting Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 97d2cb9..7e27de9 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -93,7 +93,7 @@ N/A ## Details of customer problem -As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in a git repository that serves as the source of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. +As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in git repositories that serve as the source of truth for both the application code and infrastructure configuration. Most commonly, infrastructure configuration and application code are house in separate repos. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. ## Desired customer experience outcome @@ -173,7 +173,7 @@ As an operator, I need to define and deploy a Radius Environment and associate r 1. I will create a Radius Environment resource and add all the information listed above. 1. I will then edit my Radius Environment resource to contain the properties required to register my Radius Recipe such as template path, template kind, etc. 1. I then push my new `env.bicep` file to my git repository, as I would any other file. -1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` file, and deploy the Radius Environment as specified. +1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` file, and trigger Radius to deploy the Environment as specified. 1. After Flux or ArgoCD initiates the Radius environment deployment, as I would normally do, I use Flux or ArgoCD to monitor deployment status and health status of the underlying Kubernetes cluster. > As an operator, while Radius presents some new concepts, those concepts fit cleanly within my existing GitOps workflows so Radius is relatively low overhead for me to adopt and to provide value to my development counterparts. @@ -189,7 +189,7 @@ Now, as a developer, I need to make some changes to my application code. Specifi 5. To resolve, I: 1. Use the ArgoCD [rollback commands](https://argo-cd.readthedocs.io/en/release-2.7/user-guide/commands/argocd_app_rollback/) to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers OR I use `git revert` to revert my code change and `git push` the revert to my git repository which then gets detected by my CI to rebuild the image and then gets redeployed by Flux. 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. -6. Flux/ArgoCD now redeploys my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. +6. Flux/ArgoCD now triggers a redeployment of my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. > As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. @@ -353,6 +353,10 @@ N/A ## Notes +### July 16, 2024 +- DONE: address some nit comments regarding accuracy of the scenarios with regards to the role of Flux/GitOps in deployments. +- TODO: drill deeper into Scenario 3 to specify whether we should support "patching the inputs" or "patching the outputs" + ### July 9, 2024 - DONE: More specifics on the value of GitOps as well as limitations to Kubernetes - DONE: Clarify that can use GitOps to manage other Environment objects outside of Recipes From 187b6595d4f7281f5bb6e131458ba54d62eaa75a Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Thu, 18 Jul 2024 14:55:58 -0700 Subject: [PATCH 06/12] spelling overrides Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- .github/config/en-custom.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt index e5da36f..6a7e414 100644 --- a/.github/config/en-custom.txt +++ b/.github/config/en-custom.txt @@ -675,4 +675,6 @@ toolsets generalizable fluxcd Kustomize -kustomize \ No newline at end of file +kustomize +usecase +gitops \ No newline at end of file From 5c2a77f3e3fb1ce69825ce258d83f107813c2668 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Thu, 18 Jul 2024 14:57:48 -0700 Subject: [PATCH 07/12] small formatting update Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 7e27de9..353ffca 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -244,7 +244,7 @@ If a resource in a `.bicep` Radius definition file is deleted and the change is ## Existing customer problem -As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/ or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. +As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/) or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. ## Desired customer experience outcome From 0d8a70b0b2fa8d1e80cd6e5d35c502581a8cc826 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Fri, 19 Jul 2024 08:53:57 -0700 Subject: [PATCH 08/12] more content Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 103 ++++++++++++--------------- 1 file changed, 46 insertions(+), 57 deletions(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 353ffca..d9e4c64 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -244,74 +244,67 @@ If a resource in a `.bicep` Radius definition file is deleted and the change is ## Existing customer problem -As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/) or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. +As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/) or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. Additionally, access to modify the Radius Applications and Resources is not granted to me as an SRE, so I have to rely on my Operations counterpart to make these changes for me, which is a significant bottleneck in my workflow. ## Desired customer experience outcome -As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1. To fix this as a SRE, I should have 2 options: - -**Option 1: patch using current Kubernetes mechanisms** - -1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through a YAML patch file that can contain the path to parameter annotations that effect the replica number of pods: - -Example: - -```yaml -apiVersion: kustomize... -kind: Component -resources: -- backend/mongoDB.yaml -patches: -- target: - group: core.opera.prod - version: 1 - kind: Application - name: backend - path: backend/replica-patch.yaml -``` - -2. This file points to replica-patch.yaml which contains parameters to change the replica number of pods to 10. -2. These 2 files are then committed to my configuration files to patch my Kubernetes cluster. Note that this means SREs can continue to patch resources that may have been previously deployed by Radius, using their current Kubernetes patch mechanisms (e.g. Kustomize). +As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1 in the current [Bicep parameters](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/parameter-files?tabs=Bicep) file created by the operator or developer called `app.bicepparams`: -> As an SRE, I'm very happy with this experience because I did not have to change anything in my normal workflow behavior and was able to leverage Flux and my normal YAML file workflow. +````bicep +using './app.bicep' -> This is the current state for GitOps and Kustomize, thus should not require any new features to be implemented in Radius. +param replicaCount = 1 +```` -**Option 2: patch via modification of Radius definitions** - -1. Using my editor of choice, VSCode, I'll make changes to the replica number for the pods through changing the parameter of replica sets that my MongoDB recipe takes in as defined in its Radius Environment resource in `env.bicep`: +The above parameter is defined and used in the `app.bicep` file as follows: ```bicep -resource env 'Applications.Core/environments@2023-10-01-preview' = { - name: 'prod' - properties: { - compute: { - kind: 'kubernetes' - resourceId: 'self' - namespace: 'default' - } - recipes: { - 'Applications.Datastores/mongoDB':{ - 'mongoDB-bicep': { - templateKind: 'bicep' - templatePath: 'https://ghcr.io/USERNAME/recipes/myrecipe:1.1.0' - // Optionally set parameters for all resources calling this Recipe - parameters: { - replicaSet: 10 - } - } +import radius as radius + +@description('The app ID of your Radius Application. Set automatically by the rad CLI.') +param application string +param replicaCount int + +resource demo 'Applications.Core/containers@2023-10-01-preview' = { + name: 'demo' + properties: { + application: application + container: { + image: 'ghcr.io/radius-project/samples/demo:latest' + ports: { + web: { + containerPort: 3000 } } + extensions: [ + { + kind: 'manualScaling' + replicas: replicaCount + } + ] } + } } ``` -2. I can then commit this file to my repository and have Flux apply it to my Radius Environment. +To fix this as a SRE, I can: + +1. Open the `app.bicepparams` file in my editor of choice and change the `replicaCount` parameter to 10. + +````bicep +using './app.bicep' + +param replicaCount = 10 +```` + +2. Commit and push the change to the git repository. + +2. Flux/ArgoCD detects the change and triggers Radius to redeploy the `demo` container resources with the new replica count. > As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux mechanisms which I'm familiar with to apply my patches to the Kubernetes cluster. _Requirements resulting from this scenario:_ -- Flux and ArgoCD must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly +- Flux and ArgoCD must be able to detect the file change described above and must be able to read the `app.bicep` and `app.bicepparams` files as required to deploy the Radius app correctly - Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. ## Key investments @@ -319,15 +312,11 @@ _Requirements resulting from this scenario:_ ### Feature 1: GitOps tooling can detect and understand Radius definition files -GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` (or any `.bicep`) file as required to deploy the Radius app correctly. - -### Feature 2: GitOps tooling can execute Radius commands - -Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to execute deployments for the Radius resources defined in the git repo. +GitOps tools (beginning with Flux and ArgoCD) must be able to read and detect the `*.bicep` and `*.bicepparams` file changes in the git repo described above as required to trigger Radius to deploy resources correctly. -### Feature 3: The GitOps controllers must be aware of the patches applied to the Kubernetes cluster +### Feature 2: GitOps tooling can trigger Radius commands -The GitOps controllers must be aware of the patches applied to the Kubernetes cluster and must be able to honor those patched configurations to avoid conflicts with the Radius resource definitions that have been checked in. This should be possible by leveraging the existing Kustomize mechanisms in Flux ([Kustomize controller](https://fluxcd.io/flux/components/kustomize/)) and ArgoCD ([Application controller](https://argo-cd.readthedocs.io/en/stable/developer-guide/architecture/components/#application-controller)). +Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to trigger Radius to execute deployments for the Radius resources defined in the git repo. ## Key dependencies and risks From 6df58c367ee3341700503edd84174c8aa823174b Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Fri, 19 Jul 2024 17:12:56 -0700 Subject: [PATCH 09/12] clarify scenario 3 Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 85 +++++++++++++++++----------- 1 file changed, 52 insertions(+), 33 deletions(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index d9e4c64..8547083 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -48,17 +48,17 @@ Additionally, in its current state GitOps works well within the Kubernetes spher ### Scenario 1: As an operator I can install and configure Radius with my GitOps tool of choice -My team uses GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in a git repository that serves as the source of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. +My team uses GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in git repositories that serve as the sources of truth for both the application code and infrastructure configuration. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. ### Scenario 2: As an operator or developer, I can deploy and manage Radius Environments, Recipes, applications, resources, credentials, etc. via my GitOps toolset -With Radius installed, as an Operator, I now define, deploy and manage a Radius environment and associate relevant Recipes, Credentials, and other Radius objects to that environment. +With Radius installed, as an Operator, I now define, deploy and manage a Radius environment and associate relevant Recipes, Credentials, and other Radius objects to that environment. To accomplish this, I just need to push my Radius Environment, Recipe, and other files into my git repository. Flux or ArgoCD will detect the changes and trigger Radius to deploy the Environment as specified. -As a Developer, my Operations counterpart has provided a Radius Environment plus Recipes to enable my Radius application development. I have familiarized myself with Radius and have experimented with deploying my application to this environment using the Recipe. After my app is deployed, I can view its graph in the Radius dashboard. Everything is working well for me so far and I really appreciate these cool Radius features! +As a Developer, my Operations counterpart has provided a Radius Environment plus Recipes to enable my Radius application development. After I have familiarized myself with Radius, I push my application definition files into my git repository, where Flux or ArgoCD will detect the changes and trigger Radius to deploy the application to this environment using the Recipes. After my app is deployed, I can view its graph in the Radius dashboard. Everything is working well for me so far and I really appreciate these cool Radius features! -### Scenario 3: As a Site Reliability Engineer (SRE), I can patch and edit the configuration in my Kubernetes cluster through my GitOps toolset. +### Scenario 3: As a Site Reliability Engineer (SRE), I can edit and customize the configurations in the Kubernetes cluster through my GitOps toolset, even after the application has been previously deployed. -As an SRE, my Operations counterpart has provided a Kubernetes cluster, a Radius Environment running on that cluster and the infrastructure resources through Radius Recipes required for the Radius Applications that will run in that Radius Environment. I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. I can do this through my GitOps toolset. +As an SRE, my Operations counterpart has provided a Kubernetes cluster, a Radius Environment running on that cluster and the infrastructure resources through Radius Recipes required for the Radius Applications that will run in that Radius Environment. I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. I can do this through my GitOps toolset by updating the Bicep parameter files and pushing them into the git repository. Flux or ArgoCD will detect the changes and trigger Radius to redeploy the resources with the new configurations. ## Key dependencies and risks @@ -93,7 +93,7 @@ N/A ## Details of customer problem -As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in git repositories that serve as the source of truth for both the application code and infrastructure configuration. Most commonly, infrastructure configuration and application code are house in separate repos. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. +As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in git repositories that serve as the source of truth for both the application code and infrastructure configuration. Most commonly, infrastructure configuration and application code are housed in separate repos. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. ## Desired customer experience outcome @@ -132,7 +132,7 @@ Based on current understanding, there should be no new features that need to be **Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius dependency specified in the GitOps source file and update the relevant Kubernetes clusters accordingly to install Radius. -**Risk: ability for GitOps tools to install and deploy Radius** - The primary risk is whether Flux, ArgoCD, etc. can install and deploy Radius as a Kubernetes cluster dependency. We will need to test this to ensure that it works as expected. +**Risk: ability for GitOps tools to install and deploy Radius** - The primary risk is whether Flux, ArgoCD, etc. can install and deploy Radius as a Kubernetes cluster dependency. We will need to test this to ensure that it works as expected by validating that Radius Helm Charts are compatible with Flux and ArgoCD. ## Key assumptions to test and questions to answer @@ -154,9 +154,9 @@ Since my organization uses GitOps to manage Kubernetes clusters and applications ## Desired customer experience outcome -With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with how I normally use GitOps to manage Kubernetes clusters and applications. +With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, Credentials, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with how I normally use GitOps to manage Kubernetes clusters and applications. -> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. However, note that we're combining two different but existing user experiences (i.e. GitOps `git push` + Radius `app.bicep`). +> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. However, note that we're combining two different but existing user experiences (i.e. GitOps `git push` + Radius `*.bicep`). ### Detailed Customer Experience @@ -176,20 +176,24 @@ As an operator, I need to define and deploy a Radius Environment and associate r 1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` file, and trigger Radius to deploy the Environment as specified. 1. After Flux or ArgoCD initiates the Radius environment deployment, as I would normally do, I use Flux or ArgoCD to monitor deployment status and health status of the underlying Kubernetes cluster. -> As an operator, while Radius presents some new concepts, those concepts fit cleanly within my existing GitOps workflows so Radius is relatively low overhead for me to adopt and to provide value to my development counterparts. +> As an operator, while Radius presents some new concepts, those concepts fit cleanly within my existing GitOps workflows so Radius is relatively low overhead for me to adopt and to provide value to my development counterparts. + +> A similar process is followed for deploying Radius Recipes, Applications, or any other Resources. **Step 2: Deploy and manage Radius Applications and Resources through GitOps toolset** Now, as a developer, I need to make some changes to my application code. Specifically, I need to change the Radius Resource for my frontend UI. My frontend resource is already running in my Radius application but I need update the code for parsing user input. First, I'll update the required file where this function lives then I'll update my Radius Resource as follows: 1. With my editor of choice, such as VSCode, I open the existing `frontend` application file `helper.ts` and update my function `parseUsers` with the new required logic. -2. Then I push the changes to my git repository, which triggers my CI pipeline to rebuild and containerize my application. Note that the CI is outside of the scope of GitOps and this scenario. +2. Then I push the changes to my git repository, which triggers my CI pipeline to rebuild and containerize my application. *Note that the CI is outside of the scope of GitOps and this scenario.* 3. This triggers Flux/ArgoCD to detect the updated application image and update the `frontend` resource with the new function. I use ArgoCD to monitor the health status of the Kubernetes cluster and immediately realize the pod associated with my `frontend` resource is failing. 4. I review my code change again and realize I created a bug in the function which is causing the resource to fail. 5. To resolve, I: 1. Use the ArgoCD [rollback commands](https://argo-cd.readthedocs.io/en/release-2.7/user-guide/commands/argocd_app_rollback/) to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers OR I use `git revert` to revert my code change and `git push` the revert to my git repository which then gets detected by my CI to rebuild the image and then gets redeployed by Flux. 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. -6. Flux/ArgoCD now triggers a redeployment of my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted in a healthy state with my new code. +6. Flux/ArgoCD now triggers a redeployment of my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted with my new code, but my application is still failing due to connectivity issues with a dependency of the `frontend` service that was added as a result of my code changes above. I realize that I need to update the `frontend` resource to include a new environment variable that points to the new dependency. +7. In VSCode, I open the `frontend.bicep` Radius definition file and add the new environment variable to the `frontend` resource, then push the change to my git repository. +8. Flux/ArgoCD detects the change and triggers Radius to redploy the `frontend` resource with the updated environment variable. I can now see in Flux/ArgoCD that the pod has restarted and is now healthy. > As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. @@ -207,17 +211,17 @@ __Requirements resulting from these scenarios:__ GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file (or any `*.bicep` and `.bicepconfig` files) as required to deploy the Radius app correctly. -GitOps tools (beginning with Flux and ArgoCD) must be able to detect changes to Radius applications and other resource type definitions in the git repo and execute the appropriate deployment, etc. commands against those types, including rollback operations. +GitOps tools (beginning with Flux and ArgoCD) must be able to detect changes to Radius applications and other resource type definitions in the git repo and notify Radius of the changes. -### Feature 2: GitOps tooling can execute Radius commands +### Feature 2: GitOps tooling can trigger Radius to execute commands -Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to execute deployments for the Radius resources defined in the git repo. +Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to notify and trigger Radius to execute deployments for the Radius resources defined in the git repo. ### Feature 3: GitOps tooling can rollback Radius resources -GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback operations for Radius resources defined in the git repo. +GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback operations for Radius resources defined in the git repo. In other words, the user may use the GitOps rollback commands or `git revert` to revert a change to a Radius resource and Flux/ArgoCD will detect the change and trigger Radius to redeploy the resource with the previous configuration. -#### Feature 4: GitOps tooling can detect and execute resource deletions +### Feature 4: GitOps tooling can detect and execute resource deletions If a resource in a `.bicep` Radius definition file is deleted and the change is pushed to the repo, the GitOps tools (Flux and ArgoCD) should detect the change and trigger Radius to delete that resource. Note that this is not a behavior of bicep today but it's important for this usecase. @@ -229,12 +233,12 @@ If a resource in a `.bicep` Radius definition file is deleted and the change is **Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius definition files specified in the git repo and update the relevant Kubernetes clusters accordingly to deploy and manage Radius resources. -**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can deploy and manage Radius resources as specified in the git repo. We will need to test this to ensure that it works as expected. +**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can trigger Radius to deploy and manage resources as specified in the git repo. We will need to test this to ensure that it works as expected. ## Key assumptions to test and questions to answer -**Assumption:** GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the Bicep files as required to deploy the Radius app correctly. We will need to test this to ensure that it works as expected. +**Assumption:** GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the Bicep files as required to trigger Radius to deploy the resources correctly. We will need to test this to ensure that it works as expected. # Scenario 3: Patch and edit the configuration in my Kubernetes cluster through my GitOps toolset @@ -248,7 +252,7 @@ As an SRE, I am expected to make adjustments to the configuration of the Kuberne ## Desired customer experience outcome -As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1 in the current [Bicep parameters](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/parameter-files?tabs=Bicep) file created by the operator or developer called `app.bicepparams`: +As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1 in the current [Bicep parameters](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/parameter-files?tabs=Bicep) file created by the operator or developer called `app.bicepparams` in the parameters git repo: ````bicep using './app.bicep' @@ -256,7 +260,7 @@ using './app.bicep' param replicaCount = 1 ```` -The above parameter is defined and used in the `app.bicep` file as follows: +The above parameter is defined and used in the `app.bicep` file stored in a separate repo as follows: ```bicep import radius as radius @@ -297,26 +301,33 @@ using './app.bicep' param replicaCount = 10 ```` -2. Commit and push the change to the git repository. +2. Commit and push the change to the parameters git repository. + +2. Flux/ArgoCD detects the change and triggers Radius to redeploy the `demo` container resources with the new replica count. Radius is able to read the `app.bicepparams` file and apply the new replica count to the `demo` container resources, despite the `app.bicep` file being unmodified and also stored in a separate git repository. + +> As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux or ArgoCD mechanisms which I'm familiar with to apply my patches to the Kubernetes cluster. -2. Flux/ArgoCD detects the change and triggers Radius to redeploy the `demo` container resources with the new replica count. +> As an operator or developer I'm happy to keep the configurable parameters separate from the main resource definition files so that SREs can only be granted permissions to modify the parameters and not the main resource definition files. -> As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux mechanisms which I'm familiar with to apply my patches to the Kubernetes cluster. +> As an application engineering team, we are happy with this pattern as we are already following something similar in our Terraform modules today. _Requirements resulting from this scenario:_ -- Flux and ArgoCD must be able to detect the file change described above and must be able to read the `app.bicep` and `app.bicepparams` files as required to deploy the Radius app correctly -- Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. +- Flux and ArgoCD must be able to detect the file change described above and must be able to read the `app.bicep` and `app.bicepparams` files as required to trigger Radius to deploy resources correctly. ## Key investments -### Feature 1: GitOps tooling can detect and understand Radius definition files +### Feature 1: GitOps tooling can detect and understand Radius definition and parameter files -GitOps tools (beginning with Flux and ArgoCD) must be able to read and detect the `*.bicep` and `*.bicepparams` file changes in the git repo described above as required to trigger Radius to deploy resources correctly. +GitOps tools (beginning with Flux and ArgoCD) must be able to read and detect the `*.bicep` and `*.bicepparams` file changes in the git repos described above as required to trigger Radius to deploy resources correctly. ### Feature 2: GitOps tooling can trigger Radius commands -Once Radius definition file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to trigger Radius to execute deployments for the Radius resources defined in the git repo. +Once Radius definition or parameter file changes are detected and understood, GitOps tools (beginning with Flux and ArgoCD) must be able to trigger Radius to execute deployments for the Radius resources defined in the git repo. + +### Feature 3: Radius can deploy resources with new parameters + +Radius must be able to read the `*.bicepparams` file and apply the new parameters to the `*.bicep` file resources as required to deploy the resources correctly, even if the `*.bicep` file is stored in a separate git repository. Radius must be able to orchestrate the redeployment of the resources with the new parameters even though the main `*.bicep` resource definition file was not updated. ## Key dependencies and risks @@ -326,15 +337,21 @@ Once Radius definition file changes are detected and understood, GitOps tools (b **Dependency: Flux, ArgoCD, etc.** - These GitOps tools/platforms must be able to detect the Radius definition files specified in the git repo and update the relevant Kubernetes clusters accordingly to deploy and manage Radius resources. -**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can deploy and manage Radius resources as specified in the git repo. We will need to test this to ensure that it works as expected. +**Risk: ability for GitOps tools to deploy and manage Radius resources** - The primary risk is whether Flux, ArgoCD, etc. can trigger Radius to deploy and manage resources as specified in the git repo. We will need to test this to ensure that it works as expected. ## Key assumptions to test and questions to answer -**Assumption:** The Flux and ArgoCD controllers can be made aware of the originally checked in Radius resource definitions and can appropriately apply patches to the Kubernetes cluster without conflicts. We will need to test this to ensure that it works as expected. +**Assumption:** It is acceptable for SREs to have to learn some Bicep knowledge and Radius knowledge to apply patches to the Kubernetes cluster. We have validated this with potential users. + +**Assumption:** It is acceptable (or maybe even desirable) for developers or operators to predefine the parameters that SREs can modify in the application configuration. This means that SREs will not be able to modify all parameters in the application configuration, but only the ones that have been parameterized into a Bicep Parameters file. We have validated this with potential users. + +**Assumption:** The main resource definition files (e.g. `app.bicep`) and the parameter files (e.g. `app.bicepparams`) are stored in different git repositories (as is the case validated with users). SREs will only have access to the parameter files and not the main resource definition files, which means that if Flux/ArgoCD detects a change in the parameter file and triggers Radius to redeploy the resources, the resources will be redeployed with the new parameters but the main resource definition file will not be changed, which is acceptable to users. We need to validate that this kind of split repo setup is feasible with Radius and GitOps tools, and that Radius is able to deploy resources with the new parameters without receiving a change notification for the main resource definition file. + +>Note that each Bicep parameter file declares its association with a Bicep file via the `using` keyword (e.g. `using './app.bicep'`). This is how Radius may know which Bicep file to apply the parameters to and thus which resource to redeploy. -**Assumption:** Patch configurations can be built into the Radius resource definition schemas and can be applied by Flux and ArgoCD controllers. We will need to test this to ensure that it works as expected. +**Assumption:** Radius is compatible with the Bicep Parameters file format and features. This will be the case once the Radius has completed its merge with the main Bicep compiler. ## Current state @@ -344,7 +361,9 @@ N/A ### July 16, 2024 - DONE: address some nit comments regarding accuracy of the scenarios with regards to the role of Flux/GitOps in deployments. -- TODO: drill deeper into Scenario 3 to specify whether we should support "patching the inputs" or "patching the outputs" +- DONE: add an app deployment scenario to Scenario 2 +- DONE: drill deeper into Scenario 3 to specify whether we should support "patching the inputs" or "patching the outputs" + - We should support patching the inputs, i.e. the parameters that are used to deploy the resources. This is a pattern that users are already familiar with in their Terraform modules today. ### July 9, 2024 - DONE: More specifics on the value of GitOps as well as limitations to Kubernetes From 376197b35a65532fd7b86bc2bfbc07bd6414f202 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Fri, 19 Jul 2024 17:17:41 -0700 Subject: [PATCH 10/12] spelling corrections Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- .github/config/en-custom.txt | 3 ++- tools/2024-06-gitops-feature-spec.md | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt index 7c04467..c390268 100644 --- a/.github/config/en-custom.txt +++ b/.github/config/en-custom.txt @@ -701,4 +701,5 @@ fluxcd Kustomize kustomize usecase -gitops \ No newline at end of file +gitops +predefine \ No newline at end of file diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 8547083..9f035ae 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -193,7 +193,7 @@ Now, as a developer, I need to make some changes to my application code. Specifi 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. 6. Flux/ArgoCD now triggers a redeployment of my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted with my new code, but my application is still failing due to connectivity issues with a dependency of the `frontend` service that was added as a result of my code changes above. I realize that I need to update the `frontend` resource to include a new environment variable that points to the new dependency. 7. In VSCode, I open the `frontend.bicep` Radius definition file and add the new environment variable to the `frontend` resource, then push the change to my git repository. -8. Flux/ArgoCD detects the change and triggers Radius to redploy the `frontend` resource with the updated environment variable. I can now see in Flux/ArgoCD that the pod has restarted and is now healthy. +8. Flux/ArgoCD detects the change and triggers Radius to redeploy the `frontend` resource with the updated environment variable. I can now see in Flux/ArgoCD that the pod has restarted and is now healthy. > As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. From f79f90709adfec68d6ea28f16cdbe419bd3edad1 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Mon, 29 Jul 2024 17:14:45 -0700 Subject: [PATCH 11/12] submit for approval Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- tools/2024-06-gitops-feature-spec.md | 111 +++++++++++++++------------ 1 file changed, 60 insertions(+), 51 deletions(-) diff --git a/tools/2024-06-gitops-feature-spec.md b/tools/2024-06-gitops-feature-spec.md index 9f035ae..d2fbf28 100644 --- a/tools/2024-06-gitops-feature-spec.md +++ b/tools/2024-06-gitops-feature-spec.md @@ -1,6 +1,6 @@ # Radius Integration with GitOps -* **Status**: Pending +* **Status**: Approved * **Author**: Will Tsai (@willtsai) ## Summary @@ -22,26 +22,26 @@ Enterprises that use GitOps and also want to use Radius do not currently have a - We are not trying to replace GitOps practices. We are trying to enable Radius to work well with GitOps practices. - We are not implementing dashboard visualizations for GitOps changes (i.e. as an Operator and as a Developer, I can visualize the changes tracked by my GitOps toolset on my Radius Dashboard) as a part of the initial scope of this project. We still need to flesh this out but the basic idea is the Radius dashboard can leverage the Notification Controller in Flux and ArgoCD to display to users inbound and outbound information of the source changes and cluster changes affecting their application and resources. In a future iteration of Radius GitOps integration, we will explore options and potential to leverage plugins in the Backstage ecosystem to provide this functionality. -## Customer profile and challenges - +## user profile and challenges + -### Customer persona(s) - +### User persona(s) + **Infrastructure operators / administrators** - In enterprise applications team that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. **Application developers** - Responsible for designing, developing and maintaining application code. Developers use GitOps by checking files into their git repositories to configure GitOps settings to auto deploy and monitor their application and dependent infrastructure. -**Site Reliability Engineers** - They leverage GitOps by checking in configuration files into their git repositories that add information for the customization of the workload, i.e. replica pod number, strategic merge solutions. +**Site Reliability Engineers** - They leverage GitOps by checking in configuration files into their git repositories that add information for the customization of the workload, e.g. replica pod number. -### Challenge(s) faced by the customer - +### Challenge(s) faced by the user + Before the introduction of GitOps, enterprise application teams struggled with complex deployment processes; environment/application configuration drift; as well as difficulty following best practices for security and disaster recovery. GitOps enables teams to define continuous deployment workflows and automation without having to write custom scripts or maintain complex CI/CD pipelines. GitOps tools simplify and perform many of these tasks, which add significant value by increasing efficiency, security, creating better developer experiences, and allowing for faster deployments between code changes. Radius offers additional complementary value for enterprise application teams, including a rich, declarative application description which enables the Radius application graph. Radius also provides a clear separate of concerns via Recipes. However, without integration of Radius and GitOps, plus clear guidance on how to use the two together, enterprises already invested in GitOps don't know how to use both GitOps and Radius for a better together experience. Additionally, in its current state GitOps works well within the Kubernetes sphere of influence but can't be applied to things outside of Kubernetes. This is a limitation for enterprises that want to use GitOps to manage their entire application stack, including things like databases, storage, and other infrastructure. To get the full benefit of GitOps, a user needs to combine things that can be controlled with GitOps with things that can't. Radius + Recipes can help here. - + ## Key scenarios @@ -60,6 +60,8 @@ As a Developer, my Operations counterpart has provided a Radius Environment plus As an SRE, my Operations counterpart has provided a Kubernetes cluster, a Radius Environment running on that cluster and the infrastructure resources through Radius Recipes required for the Radius Applications that will run in that Radius Environment. I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. I can do this through my GitOps toolset by updating the Bicep parameter files and pushing them into the git repository. Flux or ArgoCD will detect the changes and trigger Radius to redeploy the resources with the new configurations. +>Note that this GitOps integration with Radius will not handle the publishing step of recipes, i.e. `rad bicep publish`. Instead, changes to the environment (e.g. Recipe image tag) would be built into the definition of `env.bicep` and thus applied by GitOps as the Recipe configurations in the Environment are updated and merged. + ## Key dependencies and risks @@ -71,10 +73,10 @@ The primary risk, per the question below, is whether an abstracted, generalizabl ## Key assumptions to test and questions to answer - + Our goal is to deliver a generalized model such that Flux, ArgoCD plus future GitOps technologies can all be integrated in a consistent manner. How feasible is that? We'll need to learn more about the GitOps tool landscape to determine feasibility. -We as the Radius team understand that while Flux and ArgoCD might currently be the most popular options utilized by customers and while the current scenarios might reference Flux or ArgoCD, the solution we will build will be abstract enough to support all other high priority GitOps toolsets. We currently understand the GitOps workflow if abstracted in implementation usually follows this format: +We as the Radius team understand that while Flux and ArgoCD might currently be the most popular options utilized by users and while the current scenarios might reference Flux or ArgoCD, the solution we will build will be abstract enough to support all other high priority GitOps toolsets. We currently understand the GitOps workflow if abstracted in implementation usually follows this format: 1. Install GitOps tool onto Kubernetes cluster. 2. Add git repository information to GitOps tool 3. Deploy application to Kubernetes cluster @@ -87,20 +89,18 @@ N/A # Scenario 1: Install and configure Radius using GitOps -## Target customers - +## Target users **Infrastructure operators / administrators** - In enterprise applications teams that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. -## Details of customer problem - +## Details of user problem As an infrastructure operator in an enterprise applications team that has to maintain complex Kubernetes clusters and the platforms that run on them, I need to ensure that my team can deploy and manage applications and infrastructure in a consistent and reliable way. Thus, we make use of GitOps to manage Kubernetes clusters and applications that run in those clusters. All relevant configuration information is stored in git repositories that serve as the source of truth for both the application code and infrastructure configuration. Most commonly, infrastructure configuration and application code are housed in separate repos. To use Radius, I need to include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. -## Desired customer experience outcome - +## Desired user experience outcome + After this scenario is implemented, I can install and configure Radius with my GitOps tool of choice. I can include Radius as a dependency for my Kubernetes cluster, just as I would any other dependency, via my GitOps tools. This ensures that I no longer have to manually manage the Radius instance I am providing for my developers and SREs to use, that there is an automated process to ensure that the deployed Radius instance matches the state specified I have specified in a repository. In other words, my experience with managing our Radius instance is consistent with how I normally use GitOps to define any other dependency in a Kubernetes environment. -### Detailed Customer Experience - +### Detailed User Experience + @@ -115,14 +115,14 @@ Step 2 > As an operator, I'm happy with this experience because it is completely consistent with how I normally use Flux or ArgoCD to define any other dependency in a Kubernetes environment. The syntax for adding Radius as a Kubernetes cluster dependency must be: -- Consistent with customer expectations for defining such a dependency +- Consistent with user expectations for defining such a dependency - Executable by Flux or ArgoCD (which based on current understanding should just work today). -> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux, ArgoCD, or their GitOps tool of choice. +> There is no new Radius specific user experience for this scenario. Users are only interacting with Flux, ArgoCD, or their GitOps tool of choice. ## Key investments -Based on current understanding, there should be no new features that need to be implemented in Radius for this scenario. The key investment is in testing the GitOps tooling to ensure that the syntax for adding Radius as a Kubernetes cluster dependency is consistent with customer expectations for defining such a dependency and is executable by Flux or ArgoCD. We need to validate that our Helm Charts are compatible with Flux and ArgoCD. +Based on current understanding, there should be no new features that need to be implemented in Radius for this scenario. The key investment is in testing the GitOps tooling to ensure that the syntax for adding Radius as a Kubernetes cluster dependency is consistent with user expectations for defining such a dependency and is executable by Flux or ArgoCD. We need to validate that our Helm Charts are compatible with Flux and ArgoCD. ## Key dependencies and risks @@ -136,30 +136,30 @@ Based on current understanding, there should be no new features that need to be ## Key assumptions to test and questions to answer - + -**Assumption:** The syntax for adding Radius as a Kubernetes cluster dependency must be consistent with customer expectations for defining such a dependency and must be executable by Flux or ArgoCD. We will need to test this to ensure that it works as expected. +**Assumption:** The syntax for adding Radius as a Kubernetes cluster dependency must be consistent with user expectations for defining such a dependency and must be executable by Flux or ArgoCD. We will need to test this to ensure that it works as expected. # Scenario 2: Deploy and manage Radius Environments, Recipes, Apps, Resources using GitOps -## Target customers - +## Target users + **Infrastructure operators / administrators** - In enterprise applications team that use GitOps, operators are responsible for configuring and managing the git repos for infrastructure and application configuration. They configure GitOps policies as well as the Kubernetes manifests, Helm charts, etc. that are applied to the application and the Kubernetes cluster. **Application developers** - Responsible for designing, developing and maintaining application code. Developers use GitOps by checking files into their git repositories to configure GitOps settings to auto deploy and monitor their application and dependent infrastructure. -## Existing customer problem - +## Existing user problem + Since my organization uses GitOps to manage Kubernetes clusters and applications that run in those clusters, I need to remain consistent in infrastructure and application deployment even when using Radius. This means that I need to be able to apply Radius-centric concepts like Environments, Recipes, Applications, and Resources in a way that is consistent with my existing GitOps workflows. Without clear guidance on how to do this, I am unsure how to use both GitOps and Radius for a better together experience. -## Desired customer experience outcome - -With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, Credentials, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with how I normally use GitOps to manage Kubernetes clusters and applications. +## Desired user experience outcome + +With the implementation of this scenario, I can deploy and manage Radius Environments, their associated Recipes, Applications, Credentials, and Resources via my existing GitOps (Flux and ArgoCD) toolsets. This means that I don't have to author pipelines or manually run Radius commands to deploy Radius-managed resources. Instead, I can leverage a git repo as the source of truth for my Radius-managed resources and use my existing GitOps tooling to deploy and manage those resources. This ensures that my experience with managing Radius resources is consistent with the patterns in how I normally use GitOps to manage Kubernetes clusters and applications. -> There is no new Radius specific user experience for this scenario. Customers are only interacting with Flux or ArgoCD. However, note that we're combining two different but existing user experiences (i.e. GitOps `git push` + Radius `*.bicep`). +> There is no new Radius specific user experience for this scenario. Users are only interacting with Flux or ArgoCD. The new experience overall is using Radius and Bicep with GitOps instead of YAML/KRM. -### Detailed Customer Experience - +### Detailed user Experience + @@ -169,11 +169,11 @@ Step 2 As an operator, I need to define and deploy a Radius Environment and associate relevant Recipes, Credentials, etc. to that environment so that my development team can build and deploy their Radius applications. 1. Using my editor of choice, such as VSCode, I open an existing Radius environment file, `env.bicep`, from the online Radius samples repo. -1. I use Radius documentation to determine specific edits to my `env.bicep` file such as Kubernetes cluster and the appropriate cloud provider registrations and credentials. +1. I use Radius documentation to determine specific edits to my `env.bicep` and `env.bicepparams` files such as Kubernetes cluster and the appropriate cloud provider registrations and credentials. 1. I will create a Radius Environment resource and add all the information listed above. 1. I will then edit my Radius Environment resource to contain the properties required to register my Radius Recipe such as template path, template kind, etc. -1. I then push my new `env.bicep` file to my git repository, as I would any other file. -1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` file, and trigger Radius to deploy the Environment as specified. +1. I then push my new `env.bicep` and `env.bicepparams` files to my git repository, as I would any other file. +1. This triggers Flux or ArgoCD to detect the change, read the new `env.bicep` and `env.bicepparams` files, and trigger Radius to deploy the Environment as specified. 1. After Flux or ArgoCD initiates the Radius environment deployment, as I would normally do, I use Flux or ArgoCD to monitor deployment status and health status of the underlying Kubernetes cluster. > As an operator, while Radius presents some new concepts, those concepts fit cleanly within my existing GitOps workflows so Radius is relatively low overhead for me to adopt and to provide value to my development counterparts. @@ -189,10 +189,10 @@ Now, as a developer, I need to make some changes to my application code. Specifi 3. This triggers Flux/ArgoCD to detect the updated application image and update the `frontend` resource with the new function. I use ArgoCD to monitor the health status of the Kubernetes cluster and immediately realize the pod associated with my `frontend` resource is failing. 4. I review my code change again and realize I created a bug in the function which is causing the resource to fail. 5. To resolve, I: - 1. Use the ArgoCD [rollback commands](https://argo-cd.readthedocs.io/en/release-2.7/user-guide/commands/argocd_app_rollback/) to restart my `frontend` resource pods with the previous code version so as not to disrupt my customers OR I use `git revert` to revert my code change and `git push` the revert to my git repository which then gets detected by my CI to rebuild the image and then gets redeployed by Flux. + 1. Use the ArgoCD [rollback commands](https://argo-cd.readthedocs.io/en/release-2.7/user-guide/commands/argocd_app_rollback/) to restart my `frontend` resource pods with the previous code version so as not to disrupt my users OR I use `git revert` to revert my code change and `git push` the revert to my git repository which then gets detected by my CI to rebuild the image and then gets redeployed by Flux. 1. Proceed with a new code change to my `helper.ts` that resolves the bug, then push this change to my repository. 6. Flux/ArgoCD now triggers a redeployment of my new application image containing the bug fix and I can see in Flux/ArgoCD that the pod has restarted with my new code, but my application is still failing due to connectivity issues with a dependency of the `frontend` service that was added as a result of my code changes above. I realize that I need to update the `frontend` resource to include a new environment variable that points to the new dependency. -7. In VSCode, I open the `frontend.bicep` Radius definition file and add the new environment variable to the `frontend` resource, then push the change to my git repository. +7. In VSCode, I open the `frontend.bicep` Radius definition and `frontend.bicepparams` files and add the new environment variable to the `frontend` resource, then push the change to my git repository. 8. Flux/ArgoCD detects the change and triggers Radius to redeploy the `frontend` resource with the updated environment variable. I can now see in Flux/ArgoCD that the pod has restarted and is now healthy. > As a Developer, I'm very happy with this experience because I get the cool new features of Radius (like self-serve infrastructure deployment via Recipes and the App Graph/Dashboard) in addition to the features I already know and love in Flux, like health monitoring and rollback. @@ -200,7 +200,7 @@ Now, as a developer, I need to make some changes to my application code. Specifi __Requirements resulting from these scenarios:__ - Flux/ArgoCD must be able to detect the file change described above and must be able to read the env.bicep file as required to deploy the Radius app correctly - Flux/ArgoCD must be able to detect changes to the `.bicepconfig` file and execute bicep config updates -- Radius types and Bicep files must be able to be read by Flux/ArgoCD which currently does not have that ability. +- Radius types, `bicep`, and `bicepparams` files must be able to be read by Flux/ArgoCD which currently does not have that ability. - Flux/ArgoCD must be able to detect Radius resource types and execute Flux/ArgoCD commands against those types, including rollback *list of any other required Flux commands*. - Based on user feedback this could be just testing that commands such as dry run mechanisms work @@ -209,7 +209,7 @@ __Requirements resulting from these scenarios:__ ### Feature 1: GitOps tooling can detect and understand Radius definition files -GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file (or any `*.bicep` and `.bicepconfig` files) as required to deploy the Radius app correctly. +GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the `env.bicep` file (or any `*.bicep`, `*.bicepparams`, `.bicepconfig` files) as required to deploy the Radius app correctly. GitOps tools (beginning with Flux and ArgoCD) must be able to detect changes to Radius applications and other resource type definitions in the git repo and notify Radius of the changes. @@ -223,7 +223,9 @@ GitOps tools (beginning with Flux and ArgoCD) must be able to execute rollback o ### Feature 4: GitOps tooling can detect and execute resource deletions -If a resource in a `.bicep` Radius definition file is deleted and the change is pushed to the repo, the GitOps tools (Flux and ArgoCD) should detect the change and trigger Radius to delete that resource. Note that this is not a behavior of bicep today but it's important for this usecase. +If a resource or config in a `.bicep`, `.bicepparams`, or `.bicepconfig` definition file is deleted and the change is pushed to the repo, the GitOps tools (Flux and ArgoCD) should detect the change and trigger Radius to delete that resource. Note that this is not a behavior of Bicep today but it's important for this usecase. + +>The technical design should account for the edge case where `.bicepparams` files are deleted but the `.bicep` files are not. The behavior for this case should follow the same conventions currently in place for Helm charts and other GitOps resources - more investigation during implementation time is needed to determine the behavior that aligns with conventions. Reference the concept of "declarative deletion" in the exploration. ## Key dependencies and risks @@ -237,21 +239,23 @@ If a resource in a `.bicep` Radius definition file is deleted and the change is ## Key assumptions to test and questions to answer - + **Assumption:** GitOps tools (beginning with Flux and ArgoCD) must be able to detect the file change in the git repo described above and must be able to read the Bicep files as required to trigger Radius to deploy the resources correctly. We will need to test this to ensure that it works as expected. +**Assumption:** CI for the application code and Recipes is outside of the scope of this scenario. We assume that the user has separate automation that handles the containerization of the application code and the publishing of the Recipes to a registry. + # Scenario 3: Patch and edit the configuration in my Kubernetes cluster through my GitOps toolset -## Target customers - +## Target users + **Site Reliability Engineers (SREs)** - They leverage GitOps by checking in configuration files into their git repositories that add information for the customization of the workload, i.e. replica pod number, strategic merge solutions. -## Existing customer problem - +## Existing user problem + As an SRE, I am expected to make adjustments to the configuration of the Kubernetes cluster and the Radius Environment as needed. Today, I adjust the cluster as needed through my GitOps toolset via patching solutions like [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/) in [Flux](https://fluxcd.io/flux/components/kustomize/) or [ArgoCD](https://argo-cd.readthedocs.io/en/stable/user-guide/kustomize/), but I'm unable to do so for all the Radius-specific resources since Radius types and Bicep files are not currently readable by Flux or ArgoCD. This means that I have to manually manage the Radius instance I am providing for my developers and operators to use, which is a significant overhead for me to adopt and to provide value to my development counterparts. Additionally, access to modify the Radius Applications and Resources is not granted to me as an SRE, so I have to rely on my Operations counterpart to make these changes for me, which is a significant bottleneck in my workflow. -## Desired customer experience outcome - +## Desired user experience outcome + As an example, the application team will require 10 replica pods for their prod environment however this variable is currently set to 1 in the current [Bicep parameters](https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/parameter-files?tabs=Bicep) file created by the operator or developer called `app.bicepparams` in the parameters git repo: ````bicep @@ -303,7 +307,7 @@ param replicaCount = 10 2. Commit and push the change to the parameters git repository. -2. Flux/ArgoCD detects the change and triggers Radius to redeploy the `demo` container resources with the new replica count. Radius is able to read the `app.bicepparams` file and apply the new replica count to the `demo` container resources, despite the `app.bicep` file being unmodified and also stored in a separate git repository. +2. Flux/ArgoCD detects the change and triggers Radius to redeploy the `demo` container resources with the new replica count. Radius is able to read the `app.bicepparams` file and apply the new replica count to the `demo` container resources, despite the `app.bicep` file being unmodified and also stored in a separate git repository. Note that these parameters can apply to fields within the [`PodSpec` configurations](https://docs.radapp.io/guides/author-apps/kubernetes/patch-podspec/) that Radius currently supports. > As an SRE, I'm very happy with this experience because I did not have to do many steps to apply my parameter patch and while I had to learn some Bicep knowledge and Radius knowledge, I was able to ultimately leverage Flux or ArgoCD mechanisms which I'm familiar with to apply my patches to the Kubernetes cluster. @@ -341,7 +345,7 @@ Radius must be able to read the `*.bicepparams` file and apply the new parameter ## Key assumptions to test and questions to answer - + **Assumption:** It is acceptable for SREs to have to learn some Bicep knowledge and Radius knowledge to apply patches to the Kubernetes cluster. We have validated this with potential users. @@ -359,6 +363,11 @@ N/A ## Notes +### July 29, 2024 +- DONE: Add assumption regarding CI responsibility for publishing and pushing Recipes to registries being outside of Radius-GitOps scope. +- DONE: Clarify that version numbers are explicitly managed by operators. +- DONE: Discuss and define the approach for handling declarative deletion in Bicep deployments where parameters are deleted. + ### July 16, 2024 - DONE: address some nit comments regarding accuracy of the scenarios with regards to the role of Flux/GitOps in deployments. - DONE: add an app deployment scenario to Scenario 2 From b762acdd82293a073639a099dc10fcd502eb8005 Mon Sep 17 00:00:00 2001 From: Will Tsai <28876888+willtsai@users.noreply.github.com> Date: Tue, 30 Jul 2024 09:13:33 -0700 Subject: [PATCH 12/12] spelling corrections Signed-off-by: Will Tsai <28876888+willtsai@users.noreply.github.com> --- .github/config/en-custom.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt index c390268..c7beb37 100644 --- a/.github/config/en-custom.txt +++ b/.github/config/en-custom.txt @@ -702,4 +702,5 @@ Kustomize kustomize usecase gitops -predefine \ No newline at end of file +predefine +KRM \ No newline at end of file