Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/aws s3 collect multiple profiles #174

Closed
wants to merge 6 commits into from

Conversation

Pokom
Copy link
Contributor

@Pokom Pokom commented May 16, 2024

Cherry picked a change from #169 to extend the configuration for
AWS to have profiles which represents the profiles you want to pill data
from.

Updated aws/s3 to become aware of multiple profiles and create a new
costexplorer client per profile when fetching billing data.
Updated s3.parseBillingData to return a slice of outputs so that we
can merge them with other profiles before parsing out billing data.

Pokom added 2 commits May 16, 2024 09:49
- fixes #173

Cherry picked a change from #169 to extend the configuration for
AWS to have profiles which represents the profiles you want to pill data
from.

Updated `aws/s3` to become aware of multiple profiles and create a new
costexplorer client per profile when fetching billing data.
Updated `s3.parseBillingData` to return a slice of outputs so that we
can merge them with other profiles before parsing out billing data.
@Pokom Pokom requested review from the-it and logyball May 16, 2024 13:59
Copy link

@inkel inkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass at review, I've found something that I'm not entirely sure I understand. Once this is addressed I'd gladly do another run.

pkg/aws/aws.go Outdated
Comment on lines 117 to 119
if config.Profile != "" {
options = append(options, awsconfig.WithSharedConfigProfile(profile))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what's going on here. You check here that the single -profile flag has some value, and if it does, then forget about that value and use the one from the iteration 🤔

Basically the -profile value never gets used, and in fact, I'd argue than adding this flag adds confusion to the user, as now they have -profile and -profiles. It seems to me that the single case is the same as -profiles with just one value.

What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty on point question. Mainly, naming is hard 😅

It looks like we're not using the -profile flag in dev or production today:

 spec:
  ...
      containers:
        - args:
            - --provider=aws
            - --aws.services=s3

I think the initial intent of the check you see is when doing local development to figure out which profile we should target. In dev/production that should be handled by:

options := []func(*awsconfig.LoadOptions) error{awsconfig.WithEC2IMDSRegion()}

Let me remove the -profile flag as it's not used anymore. I'll build a local image and deploy this into development and make sure it still works 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed here, it looks like this might've been a misstep. It now appears that we can't even use the flags separately, they must be used together, as specifying -aws.profile will not get used unless there's a profile from -aws.profiles as well, in which case it'll use the value(s) from aws.profiles, but unless you supply aws.profile, it'll never use any config values 😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if config.Profile != "" {
options = append(options, awsconfig.WithSharedConfigProfile(profile))
}
options = append(options, awsconfig.WithSharedConfigProfile(profile))

I'd suggest something like this, perhaps with a conditional flag to look if aws-profile is defined first rather than a loop through Profiles?

Or better yet, I'd just eliminate the aws.profiles flag entirely and allow aws.profile to be specified multiple times OR eliminate aws.profile and allow aws.profiles to be one or more comma-seperated items.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opted to stick with -aws.profiles as a flag, as I think it makes it clear that multiple values can be passed in. @inkel @logyball I've pushed a commit that removes the reference, and appreciate the thoughts/suggestions!

cloudcost-exporter when deployed infers the profile and region from this line:

				options := []func(*awsconfig.LoadOptions) error{awsconfig.WithEC2IMDSRegion()}

I need to be a bit mindful when rolling this change out. If the -profiles isn't set, I've added logic to return an error. So I'll need to likely couple rolling out the wave with adding the -aws.profiles flag to the environment.

Copy link
Contributor

@logyball logyball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as inkel, with a couple of suggestions how to resolve

pkg/aws/aws.go Outdated
Comment on lines 117 to 119
if config.Profile != "" {
options = append(options, awsconfig.WithSharedConfigProfile(profile))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed here, it looks like this might've been a misstep. It now appears that we can't even use the flags separately, they must be used together, as specifying -aws.profile will not get used unless there's a profile from -aws.profiles as well, in which case it'll use the value(s) from aws.profiles, but unless you supply aws.profile, it'll never use any config values 😕

pkg/aws/aws.go Outdated
Comment on lines 117 to 119
if config.Profile != "" {
options = append(options, awsconfig.WithSharedConfigProfile(profile))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if config.Profile != "" {
options = append(options, awsconfig.WithSharedConfigProfile(profile))
}
options = append(options, awsconfig.WithSharedConfigProfile(profile))

I'd suggest something like this, perhaps with a conditional flag to look if aws-profile is defined first rather than a loop through Profiles?

Or better yet, I'd just eliminate the aws.profiles flag entirely and allow aws.profile to be specified multiple times OR eliminate aws.profile and allow aws.profiles to be one or more comma-seperated items.

@Pokom Pokom requested a review from logyball May 16, 2024 18:09
@Pokom Pokom requested a review from inkel May 16, 2024 18:25
Copy link
Contributor

@logyball logyball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple more suggestions, already liking this direction!!

@@ -9,6 +9,7 @@ type Config struct {
ProjectID string
Providers struct {
AWS struct {
Profile string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing I'm confused about now is why even have this at all? It's not used anywhere as far as I can tell -

func selectProvider(cfg *config.Config) (provider.Provider, error) {
	switch cfg.Provider {
	case "aws":
		return aws.New(&aws.Config{
			Region:         cfg.Providers.AWS.Region,
			Profiles:       strings.Split(cfg.Providers.AWS.Profiles.String(), ","),
			ScrapeInterval: cfg.Collector.ScrapeInterval,
			Services:       strings.Split(cfg.Providers.AWS.Services.String(), ","),
		})

	case "gcp":
		return google.New(&google.Config{
			ProjectId:       cfg.ProjectID,
			Region:          cfg.Providers.GCP.Region,
			Projects:        cfg.Providers.GCP.Projects.String(),
			DefaultDiscount: cfg.Providers.GCP.DefaultGCSDiscount,
			ScrapeInterval:  cfg.Collector.ScrapeInterval,
			Services:        strings.Split(cfg.Providers.GCP.Services.String(), ","),
		})

	default:
		return nil, fmt.Errorf("unknown provider")
	}
}

@@ -28,7 +28,8 @@ import (

func providerFlags(fs *flag.FlagSet, cfg *config.Config) {
flag.StringVar(&cfg.Provider, "provider", "aws", "aws or gcp")
fs.Var(&cfg.Providers.AWS.Profiles, "aws.profile", "AWS profile(s).")
fs.StringVar(&cfg.Providers.AWS.Profile, "aws.profile", "", "AWS Profile to authenticate with.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto to comment above ^

This doesn't seem to be used at all

Comment on lines +105 to +106
if len(config.Profiles) == 0 {
return nil, fmt.Errorf("no profiles provided")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only problem is this doesn't quite work. If you split an empty string, it results in a length one array 😡

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we either check that:

  • cfg.Providers.AWS.Profiles != "" when defined
  • move the check to the loop:
case "S3":
			var clients []costexplorer.CostExplorer
			for _, profile := range config.Profiles {
				if profile == "" {
					return nil, fmt.Errorf("invalid profile provided")
				}
				// There are two scenarios:
				// 1. Running locally, the user must pass in a region and profile to use
				// 2. Running within an EC2 instance and the region and profile can be derived
				// I'm going to use the AWS SDK to handle this for me. If the user has provided a region and profile, it will use that.
				// If not, it will use the EC2 instance metadata service to determine the region and credentials.
				// This is the same logic that the AWS CLI uses, so it should be fine.
				options := []func(*awsconfig.LoadOptions) error{awsconfig.WithEC2IMDSRegion()}
				if config.Region != "" {
					options = append(options, awsconfig.WithRegion(config.Region))
				}
				options = append(options, awsconfig.WithSharedConfigProfile(profile))
				ac, err := awsconfig.LoadDefaultConfig(context.Background(), options...)
				if err != nil {
					return nil, err
				}
				client := awscostexplorer.NewFromConfig(ac)
				clients = append(clients, client)
			}

			collector, err := s3.New(config.ScrapeInterval, clients)
			if err != nil {
				return nil, fmt.Errorf("error creating s3 collector: %w", err)
			}
			collectors = append(collectors, collector)```

@@ -171,24 +171,29 @@ func (c *Collector) Register(registry provider.Registry) error {
return nil
}

// Collect is the function that will be called by the Prometheus client anytime a scrape is performed.
// CollectMetrics Collect is the function that will be called by the Prometheus client anytime a scrape is performed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I have not tested this at length, but anecdotally it has improved the performance of collecting multiple profiles:

// CollectMetrics Collect is the function that will be called by the Prometheus client anytime a scrape is performed.
func (c *Collector) CollectMetrics(ch chan<- prometheus.Metric) float64 {
	c.m.Lock()
	defer c.m.Unlock()
	now := time.Now()
	// :fire: Checking scrape interval is to _mitigate_ expensive API calls to the cost explorer API

	if c.billingData == nil || now.After(c.nextScrape) {
		eg := new(errgroup.Group)
		outputsLock := sync.Mutex{}
		var billingOutputs []*awscostexplorer.GetCostAndUsageOutput
		for _, client := range c.clients {
			eg.Go(func() error {
				endDate := time.Now().AddDate(0, 0, -1)
				// Current assumption is that we're going to pull 30 days worth of billing data
				startDate := endDate.AddDate(0, 0, -30)
				billingData, err := getBillingData(client, startDate, endDate, c.metrics)
				if err != nil {
					return err
				}
				outputsLock.Lock()
				billingOutputs = append(billingOutputs, billingData...)
				outputsLock.Unlock()
				return nil
			})
		}

		err := eg.Wait()
		if err != nil {
			log.Printf("Error getting billing data: %v\n", err)
			return 0
		}
		c.nextScrape = time.Now().Add(c.interval)
		c.metrics.NextScrapeGauge.Set(float64(c.nextScrape.Unix()))
		c.billingData = parseBillingData(billingOutputs)
	}

	exportMetrics(c.billingData, c.metrics)
	return 1.0
}

I can't speak to the validity of the results, but running it this way seemed to speed things up....

Copy link

@inkel inkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some pending comments from last week. I'll do another review later on today.

Comment on lines +12 to +14

"github.com/grafana/cloudcost-exporter/pkg/aws/costexplorer"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit GoIsm: remove the empty lines (not the one in line 9, tho).

options = append(options, awsconfig.WithRegion(config.Region))
var clients []costexplorer.CostExplorer
if len(config.Profiles) == 0 {
return nil, fmt.Errorf("no profiles provided")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion) We probably want to test this, and in order to make it simpler, I'd suggest adding a package variable var ErrNoProfiles = errors.New("no profiles provided") (and BTW, better to use errors.New instead of fmt.Errorf for simple strings). By adding this error then you can use errors.Is in the test.

@Pokom
Copy link
Contributor Author

Pokom commented May 20, 2024

Thanks @inkel and @logyball for the reviews! Unfortunately this approach isn't going to work the way I had thought 😞 I have a strong preference not to create and maintain secrets to access multiple profiles, and the recommended route is to use roles and the stscreds Assume Role method. Yet Another Cloudwatch Exporter uses this technique to access multiple profiles from the same workload, but I still don't understand the underlying mechanism well enough.

As a temporary workaround, I've deployed an instance of cloudwatch-exporter to EKS clusters in each profile needed, and will follow pattern for the short term.

@Pokom Pokom closed this May 20, 2024
@Pokom Pokom deleted the fix/aws-s3-collect-multiple-profiles branch June 18, 2024 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS: Collect metrics from multiple profiles
3 participants