This repository was archived by the owner on Mar 27, 2021. It is now read-only.
Problem
Aggregating percentiles is a hard problem. A percentile is the value below which a certain percentage of observed values fall. For example, the 99th percentile is the value that is greater than 99% of the observed values. Aggregating percentiles with an operation that manipulates the data (e.g. average) is mathematically incorrect, since we’d only be handling the percentiles and not the complete set of values behind the different time series.
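To make the problem concrete, here is a minimal sketch with made-up data. The latency values, the two instances, and the nearest-rank `percentile` helper are all assumptions chosen for illustration; they are not taken from any real system. It compares the average of two per-instance p99 values with the p99 computed over the combined raw values.

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the values are less than or equal to it (p is an int)."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, to avoid floating-point rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds from two instances of one service.
instance_a = [10] * 99 + [20]              # 100 requests, almost all fast
instance_b = [500] * 900 + [1000] * 100    # 1000 requests, much slower

p99_a = percentile(instance_a, 99)
p99_b = percentile(instance_b, 99)

# Averaging the two percentiles ignores how many requests each instance saw.
average_of_p99s = (p99_a + p99_b) / 2

# The real p99 needs the complete set of values from both instances.
true_p99 = percentile(instance_a + instance_b, 99)

print(p99_a, p99_b, average_of_p99s, true_p99)
```

With this data the averaged value is 505.0 ms while the p99 of the combined values is 1000 ms, because the slow instance handles ten times as many requests as the fast one.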
You can, however, aggregate with “group by” and min/max to group the information and simplify graphs and alerts. For instance, you could look at p75 latency using the max aggregation grouped by site. This gives you the maximum p75 latency across all instances in a given site. It doesn’t tell you how widespread the situation is, though, since the value could come from just one instance or from many: you only get the maximum across all the percentiles in each group.
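The following sketch shows what a max aggregation grouped by site does to per-instance p75 values, and why the result hides how many instances are affected. The site names, host names, and the `max_by` helper are hypothetical and only illustrate the idea; they are not this project's query API.

```python
from collections import defaultdict

# Hypothetical p75 latency (ms) reported by each instance, tagged with its site.
p75_by_instance = [
    {"site": "lon", "host": "lon-1", "p75": 120.0},
    {"site": "lon", "host": "lon-2", "p75": 118.0},
    {"site": "lon", "host": "lon-3", "p75": 950.0},
    {"site": "sto", "host": "sto-1", "p75": 940.0},
    {"site": "sto", "host": "sto-2", "p75": 955.0},
]

def max_by(series, key, field):
    """Group points by `key` and keep the maximum `field` value per group."""
    grouped = defaultdict(list)
    for point in series:
        grouped[point[key]].append(point[field])
    return {group: max(values) for group, values in grouped.items()}

max_p75_by_site = max_by(p75_by_instance, "site", "p75")
print(max_p75_by_site)
```

Both sites end up with a similar maximum (950.0 and 955.0), even though only one of three instances is slow in the first site while every instance is slow in the second.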
Suggestion
Having first-class support for aggregatable data structures other than doubles would go a long way toward solving this problem. Some type of native histogram support would be an ideal choice, assuming we can ingest histograms in a sane way.
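As a rough illustration of why histograms aggregate where percentiles do not, here is a minimal sketch of a fixed-bucket histogram. The bucket bounds, function names, and sample data are assumptions made for this example and do not describe an existing or proposed implementation. Histograms that share the same bounds can be merged by adding their counts, and a percentile can then be estimated from the merged result.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds (ms), shared by every instance.
BOUNDS = [10, 50, 100, 500, 1000, float("inf")]

def to_histogram(values):
    """Count how many values fall into each bucket (value <= upper bound)."""
    counts = [0] * len(BOUNDS)
    for value in values:
        counts[bisect_left(BOUNDS, value)] += 1
    return counts

def merge(a, b):
    """Merge two histograms with identical bounds by summing their counts."""
    return [x + y for x, y in zip(a, b)]

def percentile_upper_bound(counts, p):
    """Upper bound of the bucket that contains the p-th percentile."""
    total = sum(counts)
    rank = max(1, -(-p * total // 100))  # integer ceiling of p * total / 100
    seen = 0
    for bound, count in zip(BOUNDS, counts):
        seen += count
        if seen >= rank:
            return bound
    raise ValueError("empty histogram")

# Hypothetical latencies (ms) from two instances.
hist_a = to_histogram([10] * 99 + [20])
hist_b = to_histogram([500] * 900 + [1000] * 100)

merged = merge(hist_a, hist_b)
merged_p99 = percentile_upper_bound(merged, 99)
print(merged, merged_p99)
```

The estimate is only as precise as the bucket layout, but unlike an average of per-instance percentiles it reflects every observed value from every instance.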