-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud billing customer creation and reporting #5050
Conversation
admin/database/database.go
Outdated
@@ -238,6 +239,9 @@ type Organization struct { | |||
QuotaSlotsTotal int `db:"quota_slots_total"` | |||
QuotaSlotsPerDeployment int `db:"quota_slots_per_deployment"` | |||
QuotaOutstandingInvites int `db:"quota_outstanding_invites"` | |||
QuotaNumUsers int `db:"quota_num_users"` | |||
QuotaManagedDataBytes int64 `db:"quota_managed_data_bytes"` | |||
BillingCustomerID *string `db:"billing_customer_id"` // review: should this be a struct to store more metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with both approaches, but yeah if we need to add some billing provider-specific fields, it might be easier to just have a JSON payload called something like billing_properties
.
admin/database/database.go
Outdated
Annotations map[string]string `db:"annotations"` | ||
CreatedOn time.Time `db:"created_on"` | ||
UpdatedOn time.Time `db:"updated_on"` | ||
NextUsageReportingTime time.Time `db:"next_usage_reporting_time"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this UsageReportedOn
? Then a null
default value would be meaningful, and the caller can add the delta for next report time at query time. (Also using "on" for consistency with CreatedOn
and UpdatedOn
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UsageReportedOn
seems like the time on which last usage was reported but In the code this is used as the reporting start time for next reporting window. This window may be bigger than one delta as reporting may be delayed because of data unavailability or other number of reasons. Do you still think this makes sense, so I will make the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like the time on which last usage was reported
Yes, so what I meant was that I think the code will be cleaner if it tracks the last reporting time. That leaves the logic of deciding a next time only up to the worker – whereas NextUsageReportingTime
creates coupled logic between the creator of the project and the reporting worker.
@@ -0,0 +1,247 @@ | |||
package worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth going over this file once more with an aim of simplifying the code where possible. There are some places with pretty deep nesting and if/else scenarios.
Also, have you considered syncing events in bulk for all orgs/projecst in one query (going over all events in the bucket) instead of querying and uploading per org/project? It seems the Orb APIs would support that and could reduce the latency and number of queries by a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah its pretty busy, not happy with it. Tried simplifying a bit by breaking logic into functions to handle org and then its projects. Also changed reporting logic to first collect usage for all projects in an org and reporting all at once. Two more places to look into -
- Reporting usage to Orb for all orgs, not sure if we would hit issues with big payload
- Usage availability is fetched for all projects of an org in one go from rill-cloud-metrics but fetching usage is still done per project per bucket per metric. Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.
What do you think, may be handling too many corner cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.
My hope would be that we could simplify usage reporting a lot by not trying to do such a granular breakdown. It should also lead to fewer, more efficient queries to the DB. Unless Orb charges a lot of money for redundant metrics, it seems simpler to just ship all non-zero billable metrics to them and have the logic of which metrics to use implemented there.
Of course we would probably need to batch the uploads to Orb to not upload huge payloads.
I'm imagining a query like this on our side (pseudo-code):
SELECT instance_id, <date trunc> as event_time, metric_name, <agg> as value
FROM metrics
WHERE event_time > <from time> AND event_time < <to time>
GROUP BY ALL
ORDER BY
Then in the worker, it just does a lookup and caches each instance_id
, enriches the events with info about the instance's org, and does a bulk upload to Orb of e.g. every 1k events.
@@ -0,0 +1,247 @@ | |||
package worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each project might have different buckets to report and metrics to report may change as per plan so creating a single api to fetch usage per plan can be hard to keep in sync.
My hope would be that we could simplify usage reporting a lot by not trying to do such a granular breakdown. It should also lead to fewer, more efficient queries to the DB. Unless Orb charges a lot of money for redundant metrics, it seems simpler to just ship all non-zero billable metrics to them and have the logic of which metrics to use implemented there.
Of course we would probably need to batch the uploads to Orb to not upload huge payloads.
I'm imagining a query like this on our side (pseudo-code):
SELECT instance_id, <date trunc> as event_time, metric_name, <agg> as value
FROM metrics
WHERE event_time > <from time> AND event_time < <to time>
GROUP BY ALL
ORDER BY
Then in the worker, it just does a lookup and caches each instance_id
, enriches the events with info about the instance's org, and does a bulk upload to Orb of e.g. every 1k events.
for i := 0; i < len(plans.Data); i++ { | ||
billingPlan, err := getBillingPlanFromOrbPlan(&plans.Data[i]) | ||
if err != nil { | ||
return nil, err | ||
} | ||
billingPlans = append(billingPlans, billingPlan) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe filter out those plans where Name == ""
? In case we forget to set it in Orb, then could someone send a manual API call to upgrade to ""
and then get access to some internal plan? Or is there something that protects against that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, did not thought about it but it cannot happen currently, in the API where GetPlanByName
is used, a check exists to disallow empty string. Also there is option to add plan in the cli cmd using biller plan id as well (not sure if we need that), so filtering would cause that to fail.
But to be more defensive I can return ErrNotFound
from GetPlanByName
if name is empty ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good with the additional defensive option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check out the merge conflicts
proto/rill/admin/v1/api.proto
Outdated
// DeleteOrganizationSubscription deletes the given subscription for the organization | ||
rpc DeleteOrganizationBillingSubscription(DeleteOrganizationBillingSubscriptionRequest) returns (DeleteOrganizationBillingSubscriptionResponse) { | ||
option (google.api.http) = {delete: "/v1/organizations/{org_name}/billing/subscriptions/{subscription_id}"}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we already discussed this – but why is it we need the ability to delete a subscription? Won't it put the org into a weird state? If you want to stop a paid subscription, shouldn't you just call UpdateOrganizationBillingPlan
to change to a free plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my original intention was to allow for fix for issues that could arise from manual assignment, also don't see any free plan in the mocks, I just see Starter plan which has 30 days trial period and then its all paid. In that case as well if no subscprition then org should be hibernated, not sure but its a separate issue. Anyways removed.
proto/rill/admin/v1/api.proto
Outdated
optional string plan_name = 2; | ||
optional string biller_plan_id = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we decide on just one way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to using plan name only as its a more friendly option for cli but also enforces that all plans defined in Orb have an external plan id defined.
This includes - Creating a customer in external billing system on new org creation and assigning a billing plan to it. Plan metadata configured in the billing system should have quotas that will be fetched to be enforced as well reportable metric names. These metrics name will directly corresponds to the apis in the metrics project to fetch the corresponding usage. Usage reporter will use the apis to report usage to the billing system.
Next -