Skip to content

Latest commit

 

History

History
173 lines (145 loc) · 6.06 KB

50_sorting_ordering.asciidoc

File metadata and controls

173 lines (145 loc) · 6.06 KB

Sorting multi-value buckets

Multi-value buckets — like the terms, histogram and date_histogram — dynamically produce many buckets. How does Elasticsearch decide what order these buckets are presented to the user?

By default, buckets are ordered by doc_count in descending order. This is a good default because often we want to find the documents that maximize some criteria: price, population, frequency.

But sometimes you’ll want to modify this sort order, and there are a few ways to do it depending on the bucket.

Intrinsic sorts

These sort modes are "intrinsic" to the bucket…​they operate on data that bucket generates such as doc_count. They share the same syntax but differ slightly depending on the bucket being used.

Let’s perform a terms aggregation but sort by doc_count ascending:

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "_count" : "asc" (1)
              }
            }
        }
    }
}
  1. Using the _count keyword, we can sort by doc_count ascending

We introduce a "order" object into the aggregation, which allows us to sort on one of several values:

  • _count: Sort by document count. Works with terms, histogram, date_histogram

  • _term: Sort by the string value of a term alphabetically. Works only with terms

  • _key: Sort by the numeric value of each bucket’s key (conceptually similar to _term). Works only with histogram and date_histogram

Sorting by a metric

Often, you’ll find yourself wanting to sort based on a metric’s calculated value. For our car sales analytics dashboard, we may want to build a bar chart of sales by car color, but order the bars by the average price ascending.

We can do this by adding a metric to our bucket, then referencing that metric from the "order" parameter:

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "avg_price" : "asc" (2)
              }
            },
            "aggs": {
                "avg_price": {
                    "avg": {"field": "price"} (1)
                }
            }
        }
    }
}
  1. The average price is calculated for each bucket

  2. Then the buckets are ordered by the calculated average in ascending order

This lets you over-ride the sort order with any metric, simply by referencing the name of the metric. Some metrics, however, emit multiple values. The extended_stats metric is a good example: it provides half a dozen individual metrics.

If you want to sort on a multi-value metric, you just need to use the dot-path to the metric of interest:

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "terms" : {
              "field" : "color",
              "order": {
                "stats.variance" : "asc" (1)
              }
            },
            "aggs": {
                "stats": {
                    "extended_stats": {"field": "price"}
                }
            }
        }
    }
}
  1. Using dot notation, we can sort on the metric we are interested in

In this example we are sorting on the variance of each bucket, so that colors with the least variance in price will appear before those that have more variance.

Sorting based on "deep" metrics

In the prior examples, the metric was a direct child of the bucket. An average price was calculated for each term. It is possible to sort on "deeper" metrics, which are grandchildren or great-grandchildren of the bucket…​with some limitations.

You can define a path to a deeper, nested metric using angle brackets (>), like so: my_bucket>another_bucket>metric

The caveat is that each nested bucket in the path must be a "single value" bucket. A filter bucket produces a single bucket: all documents which match the filtering criteria. Multi-valued buckets (such as terms) generate many dynamic buckets, which makes it impossible to specify a deterministic path.

Currently there are only two single-value buckets: filter and global. As a quick example, let’s build a histogram of car prices, but order the buckets by the variance in price of red and green (but not blue) cars in each price range.

GET /cars/transactions/_search?search_type=count
{
    "aggs" : {
        "colors" : {
            "histogram" : {
              "field" : "price",
              "interval": 20000,
              "order": {
                "red_green_cars>stats.variance" : "asc" (1)
              }
            },
            "aggs": {
                "red_green_cars": {
                    "filter": { "terms": {"color": ["red", "green"]}}, (2)
                    "aggs": {
                        "stats": {"extended_stats": {"field" : "price"}} (3)
                    }
                }
            }
        }
    }
}
  1. Sort the buckets generated by the histogram according to the variance of a nested metric

  2. Because we are using a single-value filter, we can use nested sorting

  3. Sort on the stats generated by this metric

In this example, you can see that we are accessing a nested metric. The stats metric is a child of red_green_cars, which is in turn a child of colors. To sort on that metric, we define the path as "red_green_cars>stats.variance". This is allowed because the filter bucket is a single-valued bucket.