This repository has been archived by the owner on Jul 31, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 327
clear min and max on every ExportView #1182
Open
marwan-at-work
wants to merge
7
commits into
census-instrumentation:master
Choose a base branch
from
marwan-at-work:cr
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 1 commit
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
1380fae
clearRows on every ExportView
marwan-at-work 15a8df5
do not reset everything
marwan-at-work 2be56c0
reset on config
marwan-at-work 80c9ce6
Revert "reset on config"
marwan-at-work 878d62c
reset only min/max
marwan-at-work 7fae139
sort imports
marwan-at-work 92ea5dc
pr comments + add tests
marwan-at-work File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will affect sum/count and bucket count in Distribution. It will also affect count aggregation which is expected to be cumulative.
I understand the problem of min and max that you are trying to solve. Reseting just min and max might be the solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia Thanks for the review :)
So I don't think it's just the min/max. I'm not sure about the sum/count in Distribution just yet, but for the Count-Aggregation, it should probably also be reset and let the Exporter be responsible for the cumulativeness side of things.
The reason I say that is because if the server restarts for whatever reason, then the count is completely gone. Furthermore, if the Count is displayed on a timeseries graph, then a lot of these values will be falsely duplicated.
For example, imagine a server receives 10 requests per second..after 10 seconds, the graph will say there are 100 requests in the last second (which is wrong, there were only 10) and if the serve restarts you will see a sharp drop from 100 back to 10.
If we collect the "count per interval", then that should solve the issue.
Would also love to know more about which data should never be reset, even if the process restarts and the data gets "forced" to be reset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Distributions and Counters have Start-time. Start-time lets backend detect the restart.
It also lets it calculate the rate correctly.
Example:
Resetting all metrics would be a behavior change and would have an adverse consequence on some backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia awesome. However, from what I can see there's no way for a backend to determine that:
The CountData only has access to
Value
and no idea of start time: https://github.com/census-instrumentation/opencensus-go/blob/master/stats/view/aggregation_data.go#L42In your example, the Exporter does not have access to
T1
(does it?), it can have access toT0
by just puttingfunc init() { t1 = time.Now() }
I imagine but that's not accurate.Furthermore, dividing Count by (Tnow-T1) will just make an even distribution across time, and not the real distribution. For example, if the first minute we got 300 requests, but the next minute we got 10 requests, it will not show that accurately no?
Finally, do you have an example of an exporter doing this time calculation?
Thanks again :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startTime per view is stored here
There are two interface for exporters to get the data
starttime for ViewData is set here
startTime for Metric is retrieved here
and set here
Regarding even distribution - If backend has individual data points then it can correctly represent the rate for each time window. For example
T0: Count = 300
T1: Count = 400
T2: Count = 410
Rate for the window T1-T0 = (400-300)/(T1-T0)
Rate for the window T2-T1 = (410-400)/(T2-T1)
ViewData and Metric both have end time as well. So the backend can compute the rate accordingly.
I'll post a screenshot shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer
over
because, the later changes the meaning of aggregation as currently defined. It would be sort of a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia sounds good, then feel free to review/merge this pr 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rghetia there's a potential new issue: how can we ensure that ExportView does not get called if a
stats.Record()
was never called after the last interval?AFAIK, in the StatsD model, you only report to the backend when there's data. But OpenCensus will infinitely call ExportData even if the code hasn't called stats.Record since the last ReportingPeriod...right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original solution,
clearRows
made sure that there are no rows, and therefore no new data is reported.I'm not sure how I can replicate that or find a way to determine that no new data has been given. This is because if I take the diff from the current data minus the previous data, if the value is zero it can either mean that the there's no new data, or that by accident the current data is equal to the previous data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marwan-at-work @rghetia I'm not sure if that ^ option has been discarded or not. Wouldn't this solve all issues mentioned if implemented ?