Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud billing customer creation and reporting #5050

Merged
merged 30 commits into from
Jul 1, 2024
Merged

cloud billing customer creation and reporting #5050

merged 30 commits into from
Jul 1, 2024

Conversation

pjain1
Copy link
Member

@pjain1 pjain1 commented Jun 10, 2024

This includes - Creating a customer in external billing system on new org creation and assigning a billing plan to it. Plan metadata configured in the billing system should have quotas that will be fetched to be enforced as well reportable metric names. These metrics name will directly corresponds to the apis in the metrics project to fetch the corresponding usage. Usage reporter will use the apis to report usage to the billing system.

Next -

  • Enforce usage limits especially managed data bytes, will go into reconciler
  • Create metrics APIs in metrics project that will be used to fetch usage
  • APIs for UI to get list of public plans, current subscriptions for a customer and change plan for a customer
  • Rill cli cmds for the same
  • Handling of trail period and subscription end in web-admin, APIs for this, will need product input

@pjain1 pjain1 marked this pull request as draft June 10, 2024 20:42
admin/database/database.go Outdated Show resolved Hide resolved
admin/database/database.go Outdated Show resolved Hide resolved
@@ -238,6 +239,9 @@ type Organization struct {
QuotaSlotsTotal int `db:"quota_slots_total"`
QuotaSlotsPerDeployment int `db:"quota_slots_per_deployment"`
QuotaOutstandingInvites int `db:"quota_outstanding_invites"`
QuotaNumUsers int `db:"quota_num_users"`
QuotaManagedDataBytes int64 `db:"quota_managed_data_bytes"`
BillingCustomerID *string `db:"billing_customer_id"` // review: should this be a struct to store more metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with both approaches, but yeah if we need to add some billing provider-specific fields, it might be easier to just have a JSON payload called something like billing_properties.

Annotations map[string]string `db:"annotations"`
CreatedOn time.Time `db:"created_on"`
UpdatedOn time.Time `db:"updated_on"`
NextUsageReportingTime time.Time `db:"next_usage_reporting_time"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this UsageReportedOn? Then a null default value would be meaningful, and the caller can add the delta for next report time at query time. (Also using "on" for consistency with CreatedOn and UpdatedOn.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UsageReportedOn seems like the time on which last usage was reported but In the code this is used as the reporting start time for next reporting window. This window may be bigger than one delta as reporting may be delayed because of data unavailability or other number of reasons. Do you still think this makes sense, so I will make the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the time on which last usage was reported

Yes, so what I meant was that I think the code will be cleaner if it tracks the last reporting time. That leaves the logic of deciding a next time only up to the worker – whereas NextUsageReportingTime creates coupled logic between the creator of the project and the reporting worker.

admin/database/database.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/billing/biller.go Outdated Show resolved Hide resolved
@@ -0,0 +1,247 @@
package worker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth going over this file once more with an aim of simplifying the code where possible. There are some places with pretty deep nesting and if/else scenarios.

Also, have you considered syncing events in bulk for all orgs/projecst in one query (going over all events in the bucket) instead of querying and uploading per org/project? It seems the Orb APIs would support that and could reduce the latency and number of queries by a lot.

Copy link
Member Author

@pjain1 pjain1 Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah its pretty busy, not happy with it. Tried simplifying a bit by breaking logic into functions to handle org and then its projects. Also changed reporting logic to first collect usage for all projects in an org and reporting all at once. Two more places to look into -

  • Reporting usage to Orb for all orgs, not sure if we would hit issues with big payload
  • Usage availability is fetched for all projects of an org in one go from rill-cloud-metrics but fetching usage is still done per project per bucket per metric. Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.

What do you think, may be handling too many corner cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.

My hope would be that we could simplify usage reporting a lot by not trying to do such a granular breakdown. It should also lead to fewer, more efficient queries to the DB. Unless Orb charges a lot of money for redundant metrics, it seems simpler to just ship all non-zero billable metrics to them and have the logic of which metrics to use implemented there.

Of course we would probably need to batch the uploads to Orb to not upload huge payloads.

I'm imagining a query like this on our side (pseudo-code):

SELECT instance_id, <date trunc> as event_time, metric_name, <agg> as value
FROM metrics
WHERE event_time > <from time> AND event_time < <to time>
GROUP BY ALL
ORDER BY 

Then in the worker, it just does a lookup and caches each instance_id, enriches the events with info about the instance's org, and does a bulk upload to Orb of e.g. every 1k events.

@pjain1 pjain1 marked this pull request as ready for review June 20, 2024 11:01
admin/billing/biller.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
admin/database/database.go Outdated Show resolved Hide resolved
@@ -0,0 +1,247 @@
package worker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.

My hope would be that we could simplify usage reporting a lot by not trying to do such a granular breakdown. It should also lead to fewer, more efficient queries to the DB. Unless Orb charges a lot of money for redundant metrics, it seems simpler to just ship all non-zero billable metrics to them and have the logic of which metrics to use implemented there.

Of course we would probably need to batch the uploads to Orb to not upload huge payloads.

I'm imagining a query like this on our side (pseudo-code):

SELECT instance_id, <date trunc> as event_time, metric_name, <agg> as value
FROM metrics
WHERE event_time > <from time> AND event_time < <to time>
GROUP BY ALL
ORDER BY 

Then in the worker, it just does a lookup and caches each instance_id, enriches the events with info about the instance's org, and does a bulk upload to Orb of e.g. every 1k events.

admin/billing/biller.go Outdated Show resolved Hide resolved
admin/billing/biller.go Outdated Show resolved Hide resolved
admin/billing/biller.go Outdated Show resolved Hide resolved
admin/billing/orb.go Outdated Show resolved Hide resolved
Comment on lines +314 to +320
for i := 0; i < len(plans.Data); i++ {
billingPlan, err := getBillingPlanFromOrbPlan(&plans.Data[i])
if err != nil {
return nil, err
}
billingPlans = append(billingPlans, billingPlan)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe filter out those plans where Name == ""? In case we forget to set it in Orb, then could someone send a manual API call to upgrade to "" and then get access to some internal plan? Or is there something that protects against that?

Copy link
Member Author

@pjain1 pjain1 Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, did not thought about it but it cannot happen currently, in the API where GetPlanByName is used, a check exists to disallow empty string. Also there is option to add plan in the cli cmd using biller plan id as well (not sure if we need that), so filtering would cause that to fail.

But to be more defensive I can return ErrNotFound from GetPlanByName if name is empty ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good with the additional defensive option

admin/metrics/client.go Outdated Show resolved Hide resolved
admin/worker/billing_reporter.go Outdated Show resolved Hide resolved
admin/worker/billing_reporter.go Outdated Show resolved Hide resolved
admin/database/database.go Outdated Show resolved Hide resolved
admin/worker/billing_reporter.go Outdated Show resolved Hide resolved
proto/rill/admin/v1/api.proto Outdated Show resolved Hide resolved
Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also check out the merge conflicts

proto/rill/admin/v1/api.proto Outdated Show resolved Hide resolved
Comment on lines 61 to 64
// DeleteOrganizationSubscription deletes the given subscription for the organization
rpc DeleteOrganizationBillingSubscription(DeleteOrganizationBillingSubscriptionRequest) returns (DeleteOrganizationBillingSubscriptionResponse) {
option (google.api.http) = {delete: "/v1/organizations/{org_name}/billing/subscriptions/{subscription_id}"};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we already discussed this – but why is it we need the ability to delete a subscription? Won't it put the org into a weird state? If you want to stop a paid subscription, shouldn't you just call UpdateOrganizationBillingPlan to change to a free plan?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah my original intention was to allow for fix for issues that could arise from manual assignment, also don't see any free plan in the mocks, I just see Starter plan which has 30 days trial period and then its all paid. In that case as well if no subscprition then org should be hibernated, not sure but its a separate issue. Anyways removed.

Comment on lines 642 to 643
optional string plan_name = 2;
optional string biller_plan_id = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we decide on just one way of doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to using plan name only as its a more friendly option for cli but also enforces that all plans defined in Orb have an external plan id defined.

@pjain1 pjain1 merged commit 9c87336 into main Jul 1, 2024
7 checks passed
@pjain1 pjain1 deleted the cloud_billing branch July 1, 2024 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants