One of the exciting aspects of aggregations is how easily they are converted into charts and graphs. For the rest of the chapter, we are going to focus on various analytics that we can wring out of our toy dataset. We will also demonstrate the different types of charts aggregations can power.
The histogram bucket is particularly useful. Histograms are essentially bar charts, and if you’ve ever built a report or analytics dashboard, it undoubtedly had a few bar charts in it.
The histogram works by specifying an interval. If we were building a histogram of sale prices, we might specify an interval of 20,000. This would create a new bucket every $20,000. Documents are then sorted into buckets.
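To make the bucketing step concrete, here is a minimal Python sketch of how a price could be assigned to a bucket. It assumes the bucket key is simply the price rounded down to a multiple of the interval; the `histogram_key` helper and the sample prices are hypothetical and are not taken from the toy dataset.

```python
import math
from collections import Counter

def histogram_key(value, interval):
    """Return the lower boundary of the bucket that `value` falls into."""
    return math.floor(value / interval) * interval

# Hypothetical sale prices, for illustration only.
prices = [10000, 12000, 15000, 20000, 25000, 30000, 80000]

# Count documents per bucket key, much as a histogram bucket would.
counts = Counter(histogram_key(p, 20000) for p in prices)

for key in sorted(counts):
    print(key, counts[key])
```

Note that a price of exactly 20000 lands in the bucket keyed 20000, not in the bucket keyed 0. Ranges with no prices simply do not appear in this sketch.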
For our dashboard, we want a bar chart of car sale prices, but we also want to know the top-selling make per price range. This is easily accomplished using a terms bucket nested inside the histogram:
GET /cars/transactions/_search?search_type=count
{
   "aggs":{
      "price":{
         "histogram":{
            "field":"price", (1)
            "interval":20000 (1)
         },
         "aggs":{
            "make":{
               "terms":{
                  "field":"make", (2)
                  "size":1
               }
            }
         }
      }
   }
}
(1) The histogram bucket requires two parameters: a numeric field, and an interval which defines the bucket size.
(2) A terms bucket is nested inside each price range, which will show us the top make per price range.
As you can see, our query is built around the "price" aggregation, which contains a histogram bucket. This bucket requires a numeric field to calculate buckets on, and an interval size. The interval defines how "wide" each bucket is. An interval of 20000 means we will have ranges [0-20000, 20000-40000, …].
Next, we define a nested bucket inside of the histogram. This is a terms bucket over the "make" field. There is also a new "size" parameter, which defines how many terms we want to generate. A size of one means we only want the top make for each price range (i.e., the make that has the highest doc count).
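The nesting described above can be imitated in plain Python: first group by price range, then count makes inside each group and keep only the most common ones. This is only a sketch of the idea, not of how Elasticsearch executes the query; the `top_make_per_range` helper and the sample sales are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (price, make) pairs, for illustration only.
sales = [
    (10000, "honda"),
    (12000, "honda"),
    (15000, "toyota"),
    (20000, "ford"),
    (25000, "ford"),
    (30000, "bmw"),
]

def top_make_per_range(sales, interval, size=1):
    """Group sales by price bucket, then keep the `size` most common makes."""
    makes_by_bucket = defaultdict(Counter)
    for price, make in sales:
        key = (price // interval) * interval  # lower boundary of the bucket
        makes_by_bucket[key][make] += 1
    return {
        key: makes.most_common(size)
        for key, makes in sorted(makes_by_bucket.items())
    }

result = top_make_per_range(sales, 20000)
print(result)  # {0: [('honda', 2)], 20000: [('ford', 2)]}
```

When two makes tie within a range, `Counter.most_common` keeps whichever was seen first. That tie-breaking is a property of this sketch and need not match what Elasticsearch returns.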
And here is the response (truncated):
{
...
   "aggregations": {
      "price": {
         "buckets": [
            {
               "key": 0,
               "doc_count": 3,
               "make": {
                  "buckets": [
                     {
                        "key": "honda",
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": 20000,
               "doc_count": 4,
               "make": {
                  "buckets": [
                     {
                        "key": "ford",
                        "doc_count": 2
                     }
                  ]
               }
            },
...
}
The response is fairly self-explanatory, but it should be noted that the histogram keys correspond to the lower boundary of the interval. The key 0 means 0-20,000, the key 20000 means 20,000-40,000, etc.
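Because each key is only the lower boundary, a charting layer has to rebuild the range label itself. The following Python sketch shows one way to turn the buckets into bar chart rows. It uses only the two buckets visible in the truncated response, and the `to_bars` helper and the text rendering are hypothetical choices made for illustration.

```python
# The two buckets shown in the truncated response; the elided parts
# ("...") are deliberately left out rather than guessed.
response = {
    "aggregations": {
        "price": {
            "buckets": [
                {"key": 0, "doc_count": 3,
                 "make": {"buckets": [{"key": "honda", "doc_count": 1}]}},
                {"key": 20000, "doc_count": 4,
                 "make": {"buckets": [{"key": "ford", "doc_count": 2}]}},
            ]
        }
    }
}

INTERVAL = 20000  # must match the "interval" used in the request

def to_bars(response, interval):
    """Turn histogram buckets into (label, count, top_make) tuples."""
    bars = []
    for bucket in response["aggregations"]["price"]["buckets"]:
        lower = bucket["key"]
        label = f"{lower}-{lower + interval}"
        top = bucket["make"]["buckets"]
        top_make = top[0]["key"] if top else None
        bars.append((label, bucket["doc_count"], top_make))
    return bars

bars = to_bars(response, INTERVAL)
for label, count, make in bars:
    # One "#" per document gives a rough text bar chart.
    print(f"{label:>12} {'#' * count} (top make: {make})")
```

The interval is passed in separately because the response carries only the keys, not the interval that produced them.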