-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: ListNodeImageVersions + shared image gallery support #526
base: main
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 11371322239Details
💛 - Coveralls |
func (n NodeImageVersionsAPI) List(_ context.Context, _, _ string) (imagefamily.NodeImageVersionsResponse, error) { | ||
return imagefamily.NodeImageVersionsResponse{ | ||
Values: []imagefamily.NodeImageVersion{ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Codegen this data
pkg/cloudprovider/drift.go
Outdated
|
||
expectedImageID, err := c.imageProvider.GetImageID(ctx, communityImageName, publicGalleryURL, nodeClass.Spec.GetImageVersion()) | ||
expectedImageID, err := c.imageProvider.GetLatestImageID(ctx, imageStub, nodeClass.Spec.GetImageVersion()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note this will trigger drift when we switch it on going from community to sig
pkg/operator/options/options.go
Outdated
@@ -90,6 +94,8 @@ func (o *Options) AddFlags(fs *coreoptions.FlagSet) { | |||
fs.StringVar(&o.SubnetID, "vnet-subnet-id", env.WithDefaultString("VNET_SUBNET_ID", ""), "The default subnet ID to use for new nodes. This must be a valid ARM resource ID for subnet that does not overlap with the service CIDR or the pod CIDR") | |||
fs.Var(newNodeIdentitiesValue(env.WithDefaultString("NODE_IDENTITIES", ""), &o.NodeIdentities), "node-identities", "User assigned identities for nodes.") | |||
fs.StringVar(&o.NodeResourceGroup, "node-resource-group", env.WithDefaultString("AZURE_NODE_RESOURCE_GROUP", ""), "[REQUIRED] the resource group created and managed by AKS where the nodes live") | |||
fs.BoolVar(&o.ManagedKarpenter, "managed-karpenter", env.WithDefaultBool("MANAGED_KARPENTER", false), "Whether Karpenter is managed by AKS or not.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.BoolVar(&o.ManagedKarpenter, "managed-karpenter", env.WithDefaultBool("MANAGED_KARPENTER", false), "Whether Karpenter is managed by AKS or not.") | |
fs.BoolVar(&o.ManagedKarpenter, "managed-karpenter", env.WithDefaultBool("MANAGED_KARPENTER", false), "[MANAGED-FLAG] Whether Karpenter is managed by AKS or not.") |
@@ -90,6 +94,8 @@ func (o *Options) AddFlags(fs *coreoptions.FlagSet) { | |||
fs.StringVar(&o.SubnetID, "vnet-subnet-id", env.WithDefaultString("VNET_SUBNET_ID", ""), "The default subnet ID to use for new nodes. This must be a valid ARM resource ID for subnet that does not overlap with the service CIDR or the pod CIDR") | |||
fs.Var(newNodeIdentitiesValue(env.WithDefaultString("NODE_IDENTITIES", ""), &o.NodeIdentities), "node-identities", "User assigned identities for nodes.") | |||
fs.StringVar(&o.NodeResourceGroup, "node-resource-group", env.WithDefaultString("AZURE_NODE_RESOURCE_GROUP", ""), "[REQUIRED] the resource group created and managed by AKS where the nodes live") | |||
fs.BoolVar(&o.ManagedKarpenter, "managed-karpenter", env.WithDefaultBool("MANAGED_KARPENTER", false), "Whether Karpenter is managed by AKS or not.") | |||
fs.StringVar(&o.SIGSubscriptionID, "sig-subscription-id", env.WithDefaultString("SIG_SUBSCRIPTION_ID", ""), "The subscription ID of the shared image gallery.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs.StringVar(&o.SIGSubscriptionID, "sig-subscription-id", env.WithDefaultString("SIG_SUBSCRIPTION_ID", ""), "The subscription ID of the shared image gallery.") | |
fs.StringVar(&o.SIGSubscriptionID, "sig-subscription-id", env.WithDefaultString("SIG_SUBSCRIPTION_ID", ""), "[MANAGED-FLAG] The subscription ID of the shared image gallery we want to point to") |
pkg/operator/options/options.go
Outdated
@@ -74,6 +74,10 @@ type Options struct { | |||
SubnetID string // => VnetSubnetID to use (for nodes in Azure CNI Overlay and Azure CNI + pod subnet; for for nodes and pods in Azure CNI), unless overridden via AKSNodeClass | |||
setFlags map[string]bool | |||
|
|||
ManagedKarpenter bool // => ManagedKarpenter is true if Karpenter is managed by AKS, false if it is a self-hosted karpenter installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remind me: is there any way to prevent self-hosted users from setting this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, presumably they good set it if they wanted to, but after we put Agentbaker bootstrapping behind this flag bootstrapping won't work for them.
I guess we could also determine if we are using managed based on the kubeclients or build path(main_ccp.go vs main.go) but i thought we wanted to remove those and consolidate them eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be an enum?
Mode:
- Managed
- SelfHosted
With default SelfHosted?
@matthchr, feel like I remember best practice of using enums instead of bools?
pkg/providers/imagefamily/image.go
Outdated
} | ||
|
||
const ( | ||
kubernetesVersionCacheKey = "kubernetesVersion" | ||
|
||
// if imageVersion is equal to an "", then we will get the latest version | ||
autoUpgradeMode = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are looking to remove imageVersion
I think leaving this out for now would be best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, missed this is a const. Makes sense for clarity. However, should still be getting removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pease resolve merge conflicts (which should include removal of AKSNodeClass imageVersion)
pkg/providers/imagefamily/image.go
Outdated
// /subscriptions/{subscriptionID}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/galleries/{galleryName}/images/{imageDefinition}/versions/{imageVersion}. | ||
// If imageVersion is an empty string, it fetches the latest version of the image and caches it for imageExpirationInterval (3 days). | ||
func (p *Provider) getImageIDSIG(ctx context.Context, imgStub DefaultImageOutput, imageVersion string) (string, error) { | ||
key := fmt.Sprintf(sharedImageGalleryImageIDFormat, options.FromContext(ctx).SIGSubscriptionID, imgStub.GalleryResourceGroup, imgStub.GalleryName, imgStub.ImageDefinition, imageVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does subscription ID need to be part of the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could reconstruct the full id without the cache including subscription id. Especially since sub is a constant. Good catch.
pkg/providers/imagefamily/image.go
Outdated
if imageID, ok := p.imageCache.Get(key); ok { | ||
return imageID.(string), nil | ||
} | ||
if imageVersion == autoUpgradeMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward to simplified code w/o imageVersion
pkg/providers/imagefamily/image.go
Outdated
imageID, imageExpirationInterval) | ||
} | ||
// return the latest version of the image from the cache after we have caached all of the imageDefinitions | ||
if imageID, ok := p.imageCache.Get(key); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What guarantees this key will be available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We iterate through all the image definitions + image versions from ListNodeImageVersions call before this section and set them in the cache.
for _, version := range versions.Values {
imageID := fmt.Sprintf(sharedImageGalleryImageIDFormat, options.FromContext(ctx).SIGSubscriptionID, imgStub.GalleryResourceGroup, imgStub.GalleryName, version.SKU, version.Version)
cacheKey := fmt.Sprintf(sharedImageGalleryImageIDFormat, options.FromContext(ctx).SIGSubscriptionID, imgStub.GalleryResourceGroup, imgStub.GalleryName, version.SKU, autoUpgradeMode)
p.imageCache.Set(
cacheKey,
imageID, imageExpirationInterval)
}
@@ -137,11 +140,13 @@ func NewAZClient(ctx context.Context, cfg *auth.Config, env *azure.Environment) | |||
} | |||
klog.V(5).Infof("Created azure resource graph client %v, using a token credential", azureResourceGraphClient) | |||
|
|||
imageVersionsClient, err := armcomputev5.NewCommunityGalleryImageVersionsClient(cfg.SubscriptionID, cred, opts) | |||
communityImageVersionsClient, err := armcomputev5.NewCommunityGalleryImageVersionsClient(cfg.SubscriptionID, cred, opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This client may not be needed, based on SIG flag, right? Would be good to only instantiate clients that are needed.
pkg/operator/options/options.go
Outdated
@@ -90,6 +94,8 @@ func (o *Options) AddFlags(fs *coreoptions.FlagSet) { | |||
fs.StringVar(&o.SubnetID, "vnet-subnet-id", env.WithDefaultString("VNET_SUBNET_ID", ""), "The default subnet ID to use for new nodes. This must be a valid ARM resource ID for subnet that does not overlap with the service CIDR or the pod CIDR") | |||
fs.Var(newNodeIdentitiesValue(env.WithDefaultString("NODE_IDENTITIES", ""), &o.NodeIdentities), "node-identities", "User assigned identities for nodes.") | |||
fs.StringVar(&o.NodeResourceGroup, "node-resource-group", env.WithDefaultString("AZURE_NODE_RESOURCE_GROUP", ""), "[REQUIRED] the resource group created and managed by AKS where the nodes live") | |||
fs.BoolVar(&o.ManagedKarpenter, "use-sig", env.WithDefaultBool("USE_SIG", false), "If set to true karpenter will use the AKS managed shared image galleries and the node image versions api. If set to false karpenter will use community image galleries. Only a subset of image features will be available in the community image galleries and this flag is only for the managed node provisioning addon.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename the internal variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
pkg/providers/imagefamily/image.go
Outdated
// GetLatestImageID returns the latest image ID for the given image version and image specification | ||
func (p *Provider) GetLatestImageID(ctx context.Context, defaultImage DefaultImageOutput, imageVersion string) (string, error) { | ||
// Managed Karpenter will use the AKS Managed Shared Image Galleries | ||
if options.FromContext(ctx).ManagedKarpenter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can one cache here, instead of later?
…ng out unsupported galleries
321f082
to
cc07eb7
Compare
<TODO: Move from 04-02 preview to 09-02 preview> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing CI
Fixes #
Description
This pr introduces
Open Qs:
Are we ok with all Nodes with Community Image gallery node images being drifted over to use SIG in managed karpenter with this change?
How was this change tested?
Does this change impact docs?
Release Note