This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

clear min and max on every ExportView #1182

Open · wants to merge 7 commits into master
Changes from 1 commit
1 change: 0 additions & 1 deletion — go.mod

@@ -8,7 +8,6 @@ require (
 	golang.org/x/net v0.0.0-20190620200207-3b0461eec859
 	golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd // indirect
 	golang.org/x/text v0.3.2 // indirect
-	google.golang.org/appengine v1.4.0 // indirect
 	google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb // indirect
 	google.golang.org/grpc v1.20.1
 )
1 change: 1 addition & 0 deletions — go.sum

@@ -67,6 +67,7 @@
 google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
 google.golang.org/grpc v1.20.1 h1:Hz2g2wirWK7H0qIIhGIqRGTuMwTE8HEKFnDZZ7lm9NU=
 google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/yaml.v2 v2.2.2 h1:ZCJp+EgiOT7lHqUV2J862kp8Qj64Jo6az82+3Td9dZw=
 gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
1 change: 1 addition & 0 deletions — stats/view/worker.go

@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) {
 		e.ExportView(viewData)
 	}
 	exportersMu.Unlock()
+	v.clearRows()
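The single added line, v.clearRows(), changes the reporting semantics: after each export the view starts over, so every interval reports only data recorded since the previous export instead of a running total. A minimal sketch of that effect, using hypothetical types rather than the actual opencensus-go internals:

```go
package main

import "fmt"

// view is a stand-in for the real viewInternal: a map from tag values to an
// aggregated count.
type view struct{ rows map[string]int64 }

func (v *view) record(tag string, val int64) { v.rows[tag] += val }

// exportAndClear mimics reportView followed by v.clearRows(): it hands the
// current rows to the exporter, then starts the next interval from empty.
func (v *view) exportAndClear() map[string]int64 {
	out := v.rows
	v.rows = map[string]int64{}
	return out
}

func main() {
	v := &view{rows: map[string]int64{}}
	v.record("GET /", 10)
	fmt.Println(v.exportAndClear()["GET /"]) // 10
	v.record("GET /", 10)
	// With clearing, the second interval reports 10, not a cumulative 20.
	fmt.Println(v.exportAndClear()["GET /"]) // 10
}
```

Whether this per-interval (delta) behavior is desirable is exactly what the review thread below debates.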
Contributor @rghetia commented:
This will affect sum/count and the bucket counts in Distribution. It will also affect the Count aggregation, which is expected to be cumulative.

I understand the min/max problem you are trying to solve. Resetting just min and max might be the solution.

Author @marwan-at-work replied:
@rghetia Thanks for the review :)

So I don't think it's just min/max. I'm not sure about sum/count in Distribution just yet, but the Count aggregation should probably also be reset, with the Exporter made responsible for the cumulative side of things.

The reason I say that is that if the server restarts for whatever reason, the count is completely gone. Furthermore, if the Count is displayed on a time-series graph, a lot of its values will be falsely duplicated.

For example, imagine a server receives 10 requests per second. After 10 seconds, the graph will say there were 100 requests in the last second (which is wrong: there were only 10), and if the server restarts you will see a sharp drop from 100 back to 10.

If we collect the "count per interval" instead, that should solve the issue.

I would also love to know more about which data should never be reset, even though a process restart "forces" a reset anyway.

Contributor @rghetia replied:
Distributions and Counters have a start time. The start time lets the backend detect a restart, and it also lets the backend calculate the rate correctly. Example:

T0: Application starts.
T1: First request served. Start time is set to T1.
T2: 100 requests served.
T3: 200 requests served. Export enabled.
T4: 300 requests served. First export happens: Count = 300, start time = T1, Rate = 300/(T4-T1).
T5: 400 requests served.
T6: Application restarts.
T7: First request served after restart. Start time = T7.
T8: 100 requests served. Export: Count = 100, start time = T7, Rate = 100/(T8-T7).

Resetting all metrics would be a behavior change and would have adverse consequences on some backends.

Author @marwan-at-work replied (Nov 7, 2019):
@rghetia awesome. From what I can see, though, there's no way for a backend to determine that:

CountData only has access to Value and has no notion of start time: https://github.com/census-instrumentation/opencensus-go/blob/master/stats/view/aggregation_data.go#L42

In your example, the Exporter does not have access to T1 (does it?). It can get access to T0 by just putting func init() { t1 = time.Now() }, I imagine, but that's not accurate.

Furthermore, dividing Count by (Tnow-T1) just produces an even distribution across time, not the real one. For example, if in the first minute we got 300 requests but in the next minute we got only 10, it will not show that accurately, no?

Finally, do you have an example of an exporter doing this time calculation?

Thanks again :)

Contributor @rghetia replied:
The startTime per view is stored here.

There are two interfaces through which exporters get the data:

  1. ExportView(vd *ViewData): the start time for ViewData is set here.
  2. ExportMetric(context.Context, []*metricdata.Metric): the startTime for a Metric is retrieved here and set here.

Regarding the even distribution: if the backend has the individual data points, it can correctly represent the rate for each time window. For example:

T0: Count = 300
T1: Count = 400
T2: Count = 410
Rate for the window T0-T1 = (400-300)/(T1-T0)
Rate for the window T1-T2 = (410-400)/(T2-T1)

ViewData and Metric both have an end time as well, so the backend can compute the rate accordingly.

I'll post a screenshot shortly.
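The windowed-rate calculation above can be sketched directly: diff successive cumulative counts and divide by the window length. The types below are hypothetical, for illustration only:

```go
package main

import "fmt"

// cum is one cumulative sample: (time in seconds, running count),
// as in the T0/T1/T2 example above.
type cum struct {
	t     float64
	count int64
}

// windowRates diffs successive cumulative samples to recover the rate in
// each window: rate_i = (count_i - count_{i-1}) / (t_i - t_{i-1}).
func windowRates(samples []cum) []float64 {
	rates := make([]float64, 0, len(samples)-1)
	for i := 1; i < len(samples); i++ {
		d := samples[i].count - samples[i-1].count
		rates = append(rates, float64(d)/(samples[i].t-samples[i-1].t))
	}
	return rates
}

func main() {
	// T0: 300, T1: 400, T2: 410, with 10-second windows.
	fmt.Println(windowRates([]cum{{0, 300}, {10, 400}, {20, 410}})) // [10 1]
}
```

This is why the backend does not need the SDK to reset counts: cumulative samples plus timestamps are enough to recover per-window rates.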

Contributor @rghetia replied:
I prefer

  "So in this case, min/max are restarted, count/sum/avg are diffed, and SumOfSquaredDev is an unsolvable bug. This means Exporters should probably be aware of those details to export the metrics accurately."

over

  "Would you prefer the above over a view.DisableMonotonicity(true) configuration?"

because the latter changes the meaning of the aggregations as currently defined. It would be sort of a hack.
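The preferred approach can be sketched as an exporter-side diff of cumulative snapshots. The distSnapshot type is hypothetical (it mirrors the fields discussed, not the actual opencensus-go DistributionData): count and sum are diffed, min/max are taken per interval, and a per-interval SumOfSquaredDev cannot be recovered from cumulative snapshots alone, which is the "unsolvable" part:

```go
package main

import "fmt"

// distSnapshot is a hypothetical cumulative distribution snapshot.
type distSnapshot struct {
	Count int64
	Sum   float64
	Min   float64 // per-interval, assuming min/max are reset on export
	Max   float64
}

// delta recovers per-interval Count and Sum by diffing two cumulative
// snapshots; Min/Max come straight from the current interval.
func delta(prev, cur distSnapshot) distSnapshot {
	return distSnapshot{
		Count: cur.Count - prev.Count,
		Sum:   cur.Sum - prev.Sum,
		Min:   cur.Min,
		Max:   cur.Max,
	}
}

func main() {
	prev := distSnapshot{Count: 300, Sum: 900}
	cur := distSnapshot{Count: 400, Sum: 1100, Min: 1, Max: 9}
	d := delta(prev, cur)
	fmt.Println(d.Count, d.Sum, d.Min, d.Max) // 100 200 1 9
}
```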

Author @marwan-at-work replied:
@rghetia Sounds good; feel free to review/merge this PR 👍

Author @marwan-at-work commented:
@rghetia there's a potential new issue: how can we ensure that ExportView does not get called if stats.Record() was never called during the last interval?

AFAIK, in the StatsD model you only report to the backend when there's data, but OpenCensus will keep calling ExportView indefinitely even if the code hasn't called stats.Record since the last ReportingPeriod...right?

Author @marwan-at-work replied (Nov 19, 2019):
In the original solution, clearRows made sure that there were no rows, and therefore no new data was reported.

I'm not sure how I can replicate that, or otherwise determine that no new data has arrived. If I take the diff of the current data minus the previous data, a value of zero can mean either that there is no new data, or that by coincidence the current data happens to equal the previous data.
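One way around that ambiguity (a sketch, not anything in opencensus-go: the tracked type and shouldExport are hypothetical) is to count Record calls separately from the aggregated value. A zero diff in the value is ambiguous; a zero diff in the record counter is not:

```go
package main

import "fmt"

// tracked pairs an aggregated value with a counter of Record calls.
type tracked struct {
	value   float64
	records int64 // incremented on every record, regardless of value
}

func (t *tracked) record(v float64) {
	t.value += v
	t.records++
}

// shouldExport reports whether anything was recorded since the snapshot
// prev was taken, even if the aggregated value is unchanged.
func shouldExport(prev, cur tracked) bool {
	return cur.records != prev.records
}

func main() {
	var t tracked
	t.record(5)
	t.record(-5) // value diff is 0, but data *was* recorded
	fmt.Println(t.value, shouldExport(tracked{}, t)) // 0 true
	fmt.Println(shouldExport(t, t))                  // false: nothing new
}
```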

@AlexandreYang commented (Feb 18, 2020):
  1. Create Delta types for both Count [and Sum?] and DistributionData, and implement them everywhere.

@marwan-at-work @rghetia I'm not sure whether that option ^ has been discarded or not. Wouldn't it solve all the issues mentioned, if implemented?
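A Delta aggregation type along those lines might look like the sketch below. This is a hypothetical API (DeltaCountData is not part of opencensus-go): the count is read-and-reset on every export, so exporters receive per-interval values and never need to diff, while the existing cumulative types stay untouched:

```go
package main

import "fmt"

// DeltaCountData is a hypothetical delta-semantics counterpart to CountData.
type DeltaCountData struct{ n int64 }

// Record counts one measurement; the measured value itself is ignored,
// matching count aggregation.
func (d *DeltaCountData) Record(_ float64) { d.n++ }

// Flush returns the count accumulated since the previous Flush and resets
// it, giving each export interval its own count.
func (d *DeltaCountData) Flush() int64 {
	n := d.n
	d.n = 0
	return n
}

func main() {
	var d DeltaCountData
	for i := 0; i < 3; i++ {
		d.Record(1)
	}
	fmt.Println(d.Flush()) // 3
	fmt.Println(d.Flush()) // 0: nothing recorded since the last flush
}
```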

 }

 func (w *worker) reportUsage(now time.Time) {