Search E2E Testing
The search and relevancy ranking engine is a powerful and flexible Go-based query system designed to handle structured and unstructured data seamlessly. It provides advanced capabilities for filtering, sorting, and scoring documents while offering granular control over query result structures and analytics. The engine supports:
Structured Searches:
Boolean logic (AND, OR, NOR, NOT) for combining complex conditions.
Flexible query types like Term, Filter, Match, Range, and GeoSpatial operations to target exact, partial, or proximity-based matches.
Nested queries to process hierarchical or array-based data structures.
Full-Text Search:
Advanced capabilities for tokenized text matching, including fuzzy search powered by Levenshtein edit distance for approximate matches.
Efficient string normalization and stemming to unify queries and document content into shared linguistic roots for better recall.
Geospatial Queries:
Geolocation-based searches like GeoDistance, GeoPolygon, GeoLine, and GeoMultiPolygon, enabling location-specific filtering and analysis through latitude/longitude operations.
Dynamic Relevancy Scoring:
Factor Boosting: Dynamically boost document scores based on numeric field values. For example, prioritize popular ads by applying logarithmic scaling on views.
Decay Scoring: Penalize older or less relevant matches using mathematical decay formulas based on time or distance.
Boolean Boosting: Reward documents meeting specific criteria, such as listings posted by verified sellers or promoted items.
Real-Time and Analytical Aggregations:
Multi-level grouping and analytics using GroupBy, RangeBuckets, and DateHistogram.
Statistical metrics (sum, avg, median, percentiles, std_dev) for decision-making insights.
Nested sub-aggregations for hierarchical data analysis.
TopHits for retrieving the highest-ranked documents within group buckets.
Key Highlights:
Scalability: Tailored for handling large datasets with precision and speed.
Flexibility: Supports recursive query structures, nested evaluations, and dynamic templates for unique use cases.
Precision: Implements sophisticated data parsing techniques for strings, numbers, dates, and geo-coordinates.
Extensibility: Designed to be modular and adaptable for diverse search scenarios.
The result is a robust search engine that enables developers to build highly customized, efficient, and relevant query and ranking systems, delivering strong performance across varied use cases (e.g., marketplace searches, user-specific recommendations, geospatial filtering, or high-volume data aggregation).
The index is specified with the index field in a query.
Limitation: a query can only target indexes under the same owner.
The partition is specified with the composite field in a query. An index must be set up before search can be used.
@todo: Setting up an index
Every query is rooted in a top-level bool clause, and bool clauses can be nested inside other bool clauses.
{
"query": {
"bool": {
"all": [...]
}
}
}
Conditions
The following conditions can be mixed and matched in any combination:
- all
- one
- none
- not
All: All conditions in this array must be true for the Bool to be true (AND logic).
{
"query": {
"bool": {
"all": [...]
}
}
}
One: At least one condition in this array must be true for the Bool to be true (OR logic).
{
"query": {
"bool": {
"one": [...]
}
}
}
None: All conditions in this array must be false for the Bool to be true (NOR logic).
{
"query": {
"bool": {
"none": [...]
}
}
}
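The all / one / none semantics can be illustrated with a small recursive evaluator. This is a minimal Go sketch, not the engine's Bool.Evaluate. The Node and Bool types are hypothetical, and the lists inside one bool are assumed to be combined with AND. The not condition is left out because its behavior is still marked @todo.

```go
package main

import "fmt"

// Doc is a decoded JSON document.
type Doc = map[string]any

// Node is either a leaf predicate or a nested Bool clause (hypothetical types).
type Node struct {
	Leaf func(Doc) bool
	Bool *Bool
}

// Bool models the all / one / none lists of a bool clause.
type Bool struct {
	All  []Node // AND: every node must be true
	One  []Node // OR: at least one node must be true
	None []Node // NOR: every node must be false
}

func (n Node) eval(d Doc) bool {
	if n.Bool != nil {
		return n.Bool.Eval(d) // nested bool clause
	}
	return n.Leaf(d)
}

// Eval ANDs the three lists together; an empty list adds no constraint
// (an assumption of this sketch).
func (b *Bool) Eval(d Doc) bool {
	for _, n := range b.All {
		if !n.eval(d) {
			return false
		}
	}
	if len(b.One) > 0 {
		found := false
		for _, n := range b.One {
			if n.eval(d) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	for _, n := range b.None {
		if n.eval(d) {
			return false
		}
	}
	return true
}

func main() {
	active := Node{Leaf: func(d Doc) bool { return d["status"] == "active" }}
	promoted := Node{Leaf: func(d Doc) bool { return d["isPromoted"] == true }}
	pending := Node{Leaf: func(d Doc) bool { return d["status"] == "pending" }}

	// all: [active, {bool: {one: [promoted]}}], none: [pending]
	q := &Bool{
		All:  []Node{active, {Bool: &Bool{One: []Node{promoted}}}},
		None: []Node{pending},
	}
	fmt.Println(q.Eval(Doc{"status": "active", "isPromoted": true}))  // true
	fmt.Println(q.Eval(Doc{"status": "active", "isPromoted": false})) // false
}
```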
Not: @todo
{
"query": {
"bool": {
"not": [...]
}
}
}
The Match condition is used when you need to perform full or partial text searching on a field. Unlike a simple equality check, Match looks for patterns within the string. You'd use this to see if a document's title or description contains a certain keyword, starts with a specific prefix, or adheres to a complex regular expression. For example, finding all ads where the title contains the word "Luxury" or starts with "2024".
An advanced text analysis pipeline preprocesses both user search queries and document content to improve relevance and search recall.
Here is a breakdown of its key features and processing steps:
- The Analysis Pipeline (Analyze function)
The core function, Analyze, orchestrates a five-step process to transform raw text into a list of normalized, meaningful word stems (tokens):
Normalization (normalizeText): Removes possessive markers ('s, ') and converts hyphens (-) into spaces to ensure compound words are treated as separate tokens.
Tokenization (simpleTokenizer): Splits the cleaned text into individual words, using any sequence of non-word characters (including punctuation) as a delimiter.
Filtering (Lowercase & Stopwords): Converts all tokens to lowercase and removes common, low-value words defined in the englishStopwords list (e.g., "the," "a," "is").
Stemming (stemTokens): The core process that reduces words to their root or base form.
Cleanup (filterShortAndNumericTokens): Removes any resulting tokens that are one character long or consist purely of numbers, preventing accidental matching on artifacts of the analysis process.
- Advanced Stemmer (advancedRuleStem)
The stemmer implements a sophisticated rule-based approach, similar to the Porter stemming algorithm, to accurately find word roots:
Vowel-Consonant Context: It uses helper functions (isVowel, hasVowel) to ensure suffixes are only removed if a vowel remains in the resulting word stem. This prevents errors like reducing "bed" to "b."
Tense and Plural Removal (Step 1): Handles common English rules for plurals (-s, -ies, -sses) and tenses (-ed, -ing).
Double Consonant Handling: Normalizes words after tense removal (e.g., turns hopp into hop).
Complex Suffixes (Steps 2 & 3): Targets and removes longer grammatical suffixes (e.g., -ational, -tional, -alize, -able, -ence) to further unify related words.
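A simplified version of Step 1 might look like this, including the vowel check and the double-consonant handling. It loosely follows Porter steps 1a and 1b but omits several of Porter's rules:

- the "-eed" measure condition;
- the "-at/-bl/-iz" restorations;
- treating "y" as a vowel.

It is a sketch, not a reproduction of advancedRuleStem.

```go
package main

import (
	"fmt"
	"strings"
)

// isVowel reports whether word[i] is a, e, i, o or u.
// Simplification: Porter also treats 'y' after a consonant as a vowel.
func isVowel(word string, i int) bool {
	return strings.IndexByte("aeiou", word[i]) >= 0
}

// hasVowel reports whether stem contains at least one vowel.
func hasVowel(stem string) bool {
	for i := 0; i < len(stem); i++ {
		if isVowel(stem, i) {
			return true
		}
	}
	return false
}

// step1 removes plural and tense suffixes.
func step1(word string) string {
	// Plurals: -sses -> -ss, -ies -> -i, -s -> "" (but keep -ss).
	switch {
	case strings.HasSuffix(word, "sses"), strings.HasSuffix(word, "ies"):
		word = word[:len(word)-2]
	case strings.HasSuffix(word, "ss"):
		// e.g. "glass" stays as is
	case strings.HasSuffix(word, "s"):
		word = word[:len(word)-1]
	}
	// Porter handles -eed separately; this sketch leaves such words unchanged.
	if strings.HasSuffix(word, "eed") {
		return word
	}
	// Tenses: strip -ed / -ing only if the remaining stem still has a vowel.
	for _, suffix := range []string{"ed", "ing"} {
		if !strings.HasSuffix(word, suffix) {
			continue
		}
		stem := word[:len(word)-len(suffix)]
		if !hasVowel(stem) {
			return word // "bed" must not become "b"
		}
		// Double consonant: "hopp" -> "hop", but keep -ll, -ss and -zz.
		n := len(stem)
		if n >= 2 && stem[n-1] == stem[n-2] && !isVowel(stem, n-1) &&
			strings.IndexByte("lsz", stem[n-1]) < 0 {
			stem = stem[:n-1]
		}
		return stem
	}
	return word
}

func main() {
	for _, w := range []string{"caresses", "ponies", "cats", "hopping", "falling", "bed", "sing"} {
		fmt.Println(w, "->", step1(w))
	}
}
```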
In summary, this pipeline lets a search query like "relational database performance" match a document containing phrases like "related database operations". It does so by consistently reducing words such as "relational" and "related" to a shared linguistic root.
Example: a matchPhrase query on description with a slop of 3.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"one": [
{
"matchPhrase": {
"field": "description",
"value": "clean miles",
"slop": 3
}
}
]
}
}
}
The Filter condition is the workhorse for comparing a field's value against a static standard, primarily using relational operators. It is designed to work efficiently with numeric and temporal data by parsing the field value as a number. This is the condition you use for standard inequality checks like finding all items where the views count is > (greater than) 500 or the rating is <= (less than or equal to) 4.5.
The Term condition is the simplest and most fundamental filter, used for exact-value matching and set membership checks. It treats all field values and comparison values as atomic strings, which makes lookups fast. Use Term to find documents where the status = "active" or where the category is in a list of values like ["vehicle", "sporting"]. Term is also the condition that enables subqueries: it can check whether a local field value is in the set of results returned by an entirely separate query.
The Range condition provides a structured way to define boundaries for a continuous field, like price, date, or age. It lets you specify a clear, flexible window using four distinct parameters: gte (greater than or equal to), gt (greater than), lte (less than or equal to), and lt (less than). For example, use Range to find documents posted gte "2024-09-01" and lt "2024-10-01", defining a precise date window. The example below expresses its windows with from and to bounds instead.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"range": {
"field": "price",
"from": "0",
"to": "50000.00"
}
},
{
"range": {
"field": "datePosted",
"from": "2025-09-01",
"to": "2025-10-31"
}
}
]
},
"source": ["id", "title", "price", "datePosted"],
"sort": [
{ "field": "price", "order": "asc" },
{ "field": "datePosted", "order": "asc" }
],
"limit": 10,
"offset": 0
}
}
Matches documents with the field missing.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"missing": {
"field": "reviews"
}
}
]
}
}
}
Matches documents that have the field.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"exists": {
"field": "reviews"
}
}
]
}
}
}
The GeoDistance condition is specialized for spatial filtering. It requires a reference point (a fixed latitude and longitude) and a search radius, given as a numeric distance plus a unit (the example below uses a distance of 1 with the unit "km"). A document matches only if its geographical coordinates fall within that distance of the reference point. This is the tool for queries like "Show me all ads within 5 miles of my current location."
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"limit": 10,
"offset": 0,
"bool": {
"all": [
{
"geoDistance": {
"field": "coordinates",
"lat": 42.3314,
"lon": -83.0458,
"distance": 1,
"unit": "km"
}
}
]
}
}
}
Example Response:
[
{
"category": "vehicle",
"coordinates": {
"lat": 42.3314,
"lon": -83.0458
},
"datePosted": "2024-09-01T10:00:00Z",
"description": "Excellent condition, always garaged. Must see!",
"id": "c43f3b74-a0d9-11f0-b8b5-befd7c762a0a",
"isPromoted": true,
"location": "detroit",
"price": 35000,
"rating": 4.8,
"seller": {
"isVerified": true,
"joinDate": "2022-01-15T00:00:00Z"
},
"status": "active",
"title": "2020 Ford Mustang GT - Low Miles",
"userId": "44e8b438-a061-70ca-faa7-3d3e93c015cb",
"views": 1200
}
]
Search within a single geo polygon.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"all": [
{"geoPolygon": {
"field": "coordinates",
"polygon":
[
{ "lat": 40.730610, "lon": -73.935242 },
{ "lat": 40.740610, "lon": -73.935242 },
{ "lat": 40.740610, "lon": -73.925242 },
{ "lat": 40.730610, "lon": -73.925242 }
]
}
}]
}
}
}
Search within multiple geo polygons.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"all": [
{"geoMultiPolygon": {
"field": "coordinates",
"polygons": [
[
{ "lat": 40.730610, "lon": -73.935242 },
{ "lat": 40.740610, "lon": -73.935242 },
{ "lat": 40.740610, "lon": -73.925242 },
{ "lat": 40.730610, "lon": -73.925242 }
],
[
{ "lat": 42.360081, "lon": -71.058884 },
{ "lat": 42.370081, "lon": -71.058884 },
{ "lat": 42.370081, "lon": -71.048884 },
{ "lat": 42.360081, "lon": -71.048884 }
]
]
}
}]
}
}
}
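geoPolygon and geoMultiPolygon checks can be approximated with the even-odd ray-casting rule, treating latitude and longitude as planar coordinates. This sketch relies on that assumption, which is adequate for small polygons away from the poles and the antimeridian; it is not necessarily the engine's algorithm. inAnyPolygon models the multi-polygon case and assumes that a match in any one polygon is enough.

```go
package main

import "fmt"

// Point is a latitude/longitude pair, as in the "polygon" arrays above.
type Point struct{ Lat, Lon float64 }

// inPolygon applies the even-odd ray-casting rule with lon as x and lat as y.
func inPolygon(p Point, poly []Point) bool {
	inside := false
	for i, j := 0, len(poly)-1; i < len(poly); j, i = i, i+1 {
		a, b := poly[i], poly[j]
		// Does edge a-b cross the horizontal ray running east from p?
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) &&
			p.Lon < (b.Lon-a.Lon)*(p.Lat-a.Lat)/(b.Lat-a.Lat)+a.Lon {
			inside = !inside
		}
	}
	return inside
}

// inAnyPolygon models geoMultiPolygon: true if p lies in any of the polygons.
func inAnyPolygon(p Point, polys [][]Point) bool {
	for _, poly := range polys {
		if inPolygon(p, poly) {
			return true
		}
	}
	return false
}

func main() {
	square := []Point{
		{40.730610, -73.935242}, {40.740610, -73.935242},
		{40.740610, -73.925242}, {40.730610, -73.925242},
	}
	fmt.Println(inPolygon(Point{40.7356, -73.9302}, square)) // true: centre of the square
	fmt.Println(inPolygon(Point{42.3314, -83.0458}, square)) // false: Detroit
}
```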
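Search along a line: match documents within the given distance of the line defined by the points in line.
{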
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"all": [
{
"geoLine": {
"field": "coordinates",
"line": [
{ "lat": 37.7749, "lon": -122.4194 },
{ "lat": 36.7783, "lon": -119.4179 },
{ "lat": 34.0522, "lon": -118.2437 }
],
"distance": 50.0,
"unit": "km"
}
}
]
}
}
}
A Go template that evaluates to true or false can be used as a custom script for matching documents. The template's context (dot) is the current document object.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"template": {
"code": "{{if gt .price 38000.00}}\\\"true\\\"{{end}}"
}
}
]
}
}
}
The scoring system involves two main phases to determine the final document rank, or _score:
Base Score Calculation: The initial _score is generated during the query execution phase by the match conditional logic. This is where primary keyword relevance and any static boost factors defined within the match clause are applied. Scores are combined in the Bool.Evaluate function: AND (all) conditions sum their children's scores, while OR (one) conditions take the maximum.
Dynamic Score Modification (Function Score): After the base score is set, users can apply custom logic via scoreModifiers in the query. These functions allow for complex, metadata-driven adjustments, such as:
Factor Boosting: Multiplying the score based on a document's property (e.g., boosting by the log of views).
Decay: Reducing the score based on a field's distance from an ideal value (e.g., decaying the score of old listings based on datePosted).
Conditional Bonuses: Adding a fixed score if a boolean condition is met (e.g., adding 0.5 if the seller is verified).
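The base-score combination rule (sum for AND, max for OR) can be sketched as follows. The Result type is an assumption made for illustration, as is the choice to score a failed AND as an unmatched 0.

```go
package main

import "fmt"

// Result is a hypothetical per-clause outcome: whether it matched and its score.
type Result struct {
	Matched bool
	Score   float64
}

// combineAll applies the AND rule: every child must match and scores are summed.
func combineAll(rs []Result) Result {
	total := 0.0
	for _, r := range rs {
		if !r.Matched {
			return Result{}
		}
		total += r.Score
	}
	return Result{Matched: true, Score: total}
}

// combineOne applies the OR rule: at least one child must match; the best score wins.
func combineOne(rs []Result) Result {
	var best Result
	for _, r := range rs {
		if r.Matched && (!best.Matched || r.Score > best.Score) {
			best = r
		}
	}
	return best
}

func main() {
	title := Result{Matched: true, Score: 2.0}       // e.g. a boosted title match
	description := Result{Matched: true, Score: 0.5} // e.g. a weaker fuzzy description match
	fmt.Println(combineOne([]Result{title, description}).Score) // 2
	fmt.Println(combineAll([]Result{title, description}).Score) // 2.5
}
```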
Technical Foundation
This dynamic boosting is powered by Go template expressions. To make the system work reliably, custom functions were registered to overcome the limitations of the Go template parser:
Utility Functions: Helpers like toFloat64 and toTime were implemented and registered to correctly convert document field data into usable types for mathematical operations.
Arithmetic Helpers: Because templates lack native arithmetic operators, custom functions like add and div were created. All mathematical boosting formulas must use this functional notation instead of standard infix operators.
The final _score, which includes the cumulative effect of all relevance and boost factors, is used to sort the documents, ensuring the most relevant and valuable results are presented first.
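For example, the following query computes a base score from two fuzzy match clauses, with a boost of 2 on the title clause.
{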
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"one": [
{
"match": {
"field": "description",
"value": "mils",
"fuzziness": 1
}
},{
"match": {
"field": "title",
"value": "Pickup",
"fuzziness": 1,
"boost": 2
}
}
]
}
}
}
This function uses the document's views count to multiply the base relevance score, prioritizing more popular listings. We use log to prevent extremely high view counts from completely dominating the ranking.
Rule
- Factor: Multiply the base score by the natural logarithm of views.
- Effect: A document with 1,200 views gets a factor of ln(1200)≈7.09.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"one": [
{ "match": { "field": "title", "value": "Mustang GT" } }
]
},
"scoreModifiers": {
"combine": "multiply",
"functions": [
{
"type": "factor",
"field": "views",
"code": "{{ .views | toFloat64 | log }}",
"weight": 1.0
}
]
}
}
}
This function applies a decay to the score based on age, so the "2020 Ford Mustang GT" (posted 2024-09-01) ranks lower than a similar, newer listing. The decay uses the listing's age in hours.
Rule
- Decay: The factor is 1 / (1 + ageHours / 1440). A listing 720 hours (30 days) old keeps about two thirds of its score, and one 1,440 hours (60 days) old keeps half.
- Factor: The score is multiplied by this factor, which falls from 1.0 toward 0.0 as the listing ages.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"all": [
{ "match": { "field": "title", "value": "Mustang GT" } }
]
},
"scoreModifiers": {
"combine": "multiply",
"functions": [
{
"type": "decay",
"field": "datePosted",
"weight": 1.0,
"code": "{{ div 1.0 (add 1.0 (div (now.Sub (.datePosted | toTime)).Hours 1440.0)) }}"
}
]
}
}
}
This function adds a fixed bonus score if a specific condition is met, regardless of the base relevance score. This is often used for trusted sellers or promoted listings.
Rule
- Bonus: Add 0.5 to the base score if the seller is verified.
- Effect: Uses a simple conditional statement to output 0.5 or 0.0.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"bool": {
"all": [
{ "match": { "field": "title", "value": "Mustang GT" } }
]
},
"scoreModifiers": {
"combine": "sum",
"functions": [
{
"type": "factor",
"field": "seller.isVerified",
"weight": 1.0,
"code": "{{if .seller.isVerified}}0.5{{else}}0.0{{end}}"
}
]
}
}
}
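The text above does not spell out how combine interacts with the function outputs. One plausible reading, sketched here as an assumption, is that each output is scaled by its weight. The scaled value is then multiplied into the base score for "multiply", or added to it for "sum".

```go
package main

import (
	"fmt"
	"math"
)

// applyModifiers combines a base score with modifier outputs. The weight scaling
// and the multiply/sum semantics are assumptions, not documented engine behavior.
func applyModifiers(base float64, combine string, outputs, weights []float64) float64 {
	score := base
	for i, out := range outputs {
		v := out * weights[i]
		switch combine {
		case "multiply":
			score *= v
		case "sum":
			score += v
		}
	}
	return score
}

func main() {
	// Verified-seller bonus with "combine": "sum": a base score of 1.5 becomes 2.
	fmt.Println(applyModifiers(1.5, "sum", []float64{0.5}, []float64{1.0}))
	// Views factor with "combine": "multiply": the base score is scaled by ln(1200), about 7.09.
	fmt.Println(applyModifiers(1.0, "multiply", []float64{math.Log(1200)}, []float64{1.0}))
}
```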
A union query combines results across multiple partitions into a single result set.
The query below is a simple example that combines the same query three times, so the result set contains three duplicate records.
Query:
{ "union":
{"queries": [
{
"index": "8c11a17b-94be-11f0-a161-1ea64a6f54e7",
"composite": {
"senderId": "cfcd0b3f-b18a-4d36-928f-108264b45986",
"recipientId": "d44be993-bbeb-453f-996b-a0ca72768679"
},
"bool": {
"one": [
{
"bool": {
"all": [
{
"term": {
"field": "id",
"value": "52214085-9531-11f0-8542-fe0add70a39d",
"modifiers": {
"operation": 0
}
}
}
]
}
}
]
}
},
{
"index": "8c11a17b-94be-11f0-a161-1ea64a6f54e7",
"composite": {
"senderId": "cfcd0b3f-b18a-4d36-928f-108264b45986",
"recipientId": "d44be993-bbeb-453f-996b-a0ca72768679"
},
"bool": {
"one": [
{
"bool": {
"all": [
{
"term": {
"field": "id",
"value": "52214085-9531-11f0-8542-fe0add70a39d",
"modifiers": {
"operation": 0
}
}
}
]
}
}
]
}
},
{
"index": "8c11a17b-94be-11f0-a161-1ea64a6f54e7",
"composite": {
"senderId": "cfcd0b3f-b18a-4d36-928f-108264b45986",
"recipientId": "d44be993-bbeb-453f-996b-a0ca72768679"
},
"bool": {
"one": [
{
"bool": {
"all": [
{
"term": {
"field": "id",
"value": "52214085-9531-11f0-8542-fe0add70a39d",
"modifiers": {
"operation": 0
}
}
}
]
}
}
]
}
}
]
}
}
Nested queries can be used to query nested data such as objects and arrays. The nested condition is true for a document if any nested element matches.
{"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"nested": {
"path": "reviews",
"bool": {
"all": [
{
"filter": {
"field": "reviewRating",
"value": "3",
"modifiers": {
"operation": 0
}
}
}
]
}
}
}
]
}
}
}
example response:
[
{
"category": "Vehicle",
"coordinates": {
"lat": 42.3314,
"lon": -83.0458
},
"datePosted": "2024-09-01T10:00:00Z",
"description": "Excellent condition, always garaged. Must see!",
"id": "79378c64-a10a-11f0-b4bd-124dcbc6d096",
"isPromoted": true,
"location": "detroit",
"price": 35000,
"rating": 4.8,
"reviews": [
{
"feedback": "blah blah blah",
"reviewRating": 1
},
{
"feedback": "hello world",
"reviewRating": 3
},
{
"feedback": "ennie meenie miie moe",
"reviewRating": 2
}
],
"seller": {
"isVerified": true,
"joinDate": "2022-01-15T00:00:00Z"
},
"status": "active",
"title": "2020 Ford Mustang GT - Low Miles",
"userId": "44e8b438-a061-70ca-faa7-3d3e93c015cb",
"views": 1200
}
]
A subquery can be used with an IN or NOT IN operation. Use resultField to specify which field of the subquery's results is matched against the parent field.
{"query":
{
"index": "8c11a17b-94be-11f0-a161-1ea64a6f54e7",
"composite": {
"senderId": "cfcd0b3f-b18a-4d36-928f-108264b45986",
"recipientId": "d44be993-bbeb-453f-996b-a0ca72768679"
},
"bool": {
"one": [
{
"bool": {
"all": [
{
"term": {
"field": "id",
"modifiers": {
"operation": 9
},
"subquery": {
"resultField": "id",
"index": "8c11a17b-94be-11f0-a161-1ea64a6f54e7",
"composite": {
"senderId": "cfcd0b3f-b18a-4d36-928f-108264b45986",
"recipientId": "d44be993-bbeb-453f-996b-a0ca72768679"
},
"bool": {
"one": [
{
"bool": {
"all": [
{
"term": {
"field": "id",
"value": "52214085-9531-11f0-8542-fe0add70a39d",
"modifiers": {
"operation": 0
}
}
}
]
}
}
]
}
}
}
}
]
}
}
]
}
}
}
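Conceptually, a subquery used with IN or NOT IN collects resultField from the subquery's hits into a set, then tests the parent field against that set. The sketch below compares values as strings, in line with how Term treats values. The helper name is hypothetical, and it makes no assumption about which numeric operation code (such as 9) selects IN or NOT IN.

```go
package main

import "fmt"

type Doc = map[string]any

// inSubquery collects resultField values from the subquery hits into a set
// and checks the parent's field against it; negate turns IN into NOT IN.
func inSubquery(parent Doc, field string, subHits []Doc, resultField string, negate bool) bool {
	set := make(map[string]struct{}, len(subHits))
	for _, h := range subHits {
		if v, ok := h[resultField]; ok {
			set[fmt.Sprint(v)] = struct{}{} // compare as atomic strings
		}
	}
	found := false
	if v, ok := parent[field]; ok {
		_, found = set[fmt.Sprint(v)]
	}
	return found != negate
}

func main() {
	subHits := []Doc{{"id": "52214085-9531-11f0-8542-fe0add70a39d"}}
	parent := Doc{"id": "52214085-9531-11f0-8542-fe0add70a39d"}
	fmt.Println(inSubquery(parent, "id", subHits, "id", false)) // true  (IN)
	fmt.Println(inSubquery(parent, "id", subHits, "id", true))  // false (NOT IN)
}
```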
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"postFilter": {
"all": [
{
"filter": {
"field": "price",
"value": "25000"
}
}
]
}
}}
Faceting aggregations (facetingAggs) return per-value document counts alongside the search hits.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"facetingAggs": {
"categories": {
"type": "terms",
"groupBy": ["category"]
},
"promotion_status": {
"type": "terms",
"groupBy": ["isPromoted"]
}
}
}}
response data:
{
"statusCode": 200,
"totalHits": 4,
"isAggregation": false,
"hits": [
...
],
"facetingResults": {
"categories": [
{
"key": "vehicle",
"count": 4
}
],
"promotion_status": [
{
"key": "true",
"count": 4
}
]
}
}
You can optionally group by one or more fields and compute metrics per group. The response contains one bucket per group with the aggregated statistics.
{"query":
{
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"aggs": {
"status_breakdown": {
"groupBy": ["status"],
"metrics": [
{
"type": "avg",
"fields": {
"price": "averagePriceByStatus"
}
}
],
"aggs": {
"name": "promotion_breakdown",
"groupBy": ["isPromoted"],
"metrics": [
{
"type": "sum",
"fields": {
"views": "totalViewsForPromotionType"
}
},
{
"type": "max",
"fields": {
"price": "highestPricedItem"
}
}
]
}}}
}
}
Example Response
{
"name": "",
"buckets": [
{
"key": "active",
"count": 3,
"metrics": {
"averagePriceByStatus": 25433.333333333332
},
"buckets": [
{
"key": "true",
"count": 2,
"metrics": {
"highestPricedItem": 38500,
"totalViewsForPromotionType": 5700
}
},
{
"key": "false",
"count": 1,
"metrics": {
"highestPricedItem": 2800,
"totalViewsForPromotionType": 1200
}
}
]
},
{
"key": "pending",
"count": 1,
"metrics": {
"averagePriceByStatus": 9500
},
"buckets": [
{
"key": "false",
"count": 1,
"metrics": {
"highestPricedItem": 9500,
"totalViewsForPromotionType": 3100
}
}
]
}
],
"PipelineMetrics": {}
}
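The two-level breakdown above can be reproduced with simple nested grouping. The sample documents are chosen to match the example response; each listing's isPromoted value is inferred from the bucket totals. The helpers are hypothetical and assume every document has the numeric fields they read.

```go
package main

import (
	"fmt"
	"math"
)

type Doc = map[string]any

// groupBy partitions documents by the string form of a field's value.
func groupBy(docs []Doc, field string) map[string][]Doc {
	groups := map[string][]Doc{}
	for _, d := range docs {
		k := fmt.Sprint(d[field])
		groups[k] = append(groups[k], d)
	}
	return groups
}

func sum(docs []Doc, field string) float64 {
	total := 0.0
	for _, d := range docs {
		total += d[field].(float64)
	}
	return total
}

func avg(docs []Doc, field string) float64 { return sum(docs, field) / float64(len(docs)) }

func maxOf(docs []Doc, field string) float64 {
	m := math.Inf(-1)
	for _, d := range docs {
		if v := d[field].(float64); v > m {
			m = v
		}
	}
	return m
}

func main() {
	docs := []Doc{
		{"status": "active", "isPromoted": true, "price": 35000.0, "views": 1200.0},
		{"status": "active", "isPromoted": true, "price": 38500.0, "views": 4500.0},
		{"status": "active", "isPromoted": false, "price": 2800.0, "views": 1200.0},
		{"status": "pending", "isPromoted": false, "price": 9500.0, "views": 3100.0},
	}
	for status, byStatus := range groupBy(docs, "status") {
		fmt.Println(status, len(byStatus), "averagePriceByStatus:", avg(byStatus, "price"))
		for promoted, sub := range groupBy(byStatus, "isPromoted") {
			fmt.Println("  isPromoted", promoted, len(sub),
				"totalViewsForPromotionType:", sum(sub, "views"),
				"highestPricedItem:", maxOf(sub, "price"))
		}
	}
}
```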
You can define buckets by ranges; each document is placed in the bucket whose range contains its value.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"aggs": {
"name": "ProductAnalyticsByCategory",
"groupBy": ["category"],
"metrics": [
{
"type": "sum",
"fields": { "price": "totalSales" }
},
{
"type": "avg",
"fields": { "price": "averagePrice" }
},
{
"type": "median",
"fields": { "price": "priceMedian" }
},
{
"type": "percentile",
"percentile": 90.0,
"fields": { "price": "priceP90" }
},
{
"type": "std_dev",
"fields": { "price": "priceStdDev" }
}
],
"aggs": {
"name": "PriceDistribution",
"rangeBuckets": {
"price": [
{ "key": "lowCost", "from": 0.0, "to": 50.0 },
{ "key": "midRange", "from": 50.0, "to": 200.0 },
{ "key": "highEnd", "from": 200.0, "to": 10000.0 }
]
},
"metrics": [
{
"type": "count",
"fields": { "id": "itemCount" }
}
]
}
}
}
}
example response:
{
"name": "ProductAnalyticsByCategory",
"buckets": [
{
"key": "Vehicle",
"count": 3,
"metrics": {
"averagePrice": 16933.333333333332,
"priceMedian": 9500,
"priceP90": 32700,
"priceStdDev": 18975.334867488724,
"totalSales": 50800
},
"buckets": [
{
"key": "lowCost",
"count": 0,
"metrics": {
"itemCount": 0
}
},
{
"key": "midRange",
"count": 0,
"metrics": {
"itemCount": 0
}
},
{
"key": "highEnd",
"count": 2,
"metrics": {
"itemCount": 0
}
}
]
}
]
}
Documents can also be bucketed with a date histogram, here by month on datePosted.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"aggs": {
"name": "monthly_activity",
"dateHistogram": {
"field": "datePosted",
"interval": "month"
},
"metrics": [
{
"type": "avg",
"fields": {
"price": "average_monthly_price"
}
}
]
}
}
}
example response:
{
"name": "monthly_activity",
"buckets": [
{
"key": "2025-09",
"count": 2,
"metrics": {
"average_monthly_price": 24000
}
},
{
"key": "2025-10",
"count": 1,
"metrics": {
"average_monthly_price": 2800
}
}
]
}
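The cardinality metric counts the distinct values of a field. It can be combined with the other metrics, such as mean, median, mode, max, sum, and standard deviation.
Note: groupBy is not required; without it, metrics are computed over a single top-level bucket (with an empty key, as in the response below).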
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": []
},
"aggs": {
"name": "global_metrics",
"metrics": [
{
"type": "cardinality",
"fields": {
"status": "distinct_statuses"
}
},
{
"type": "cardinality",
"fields": {
"location": "distinct_locations"
}
},
{
"type": "sum",
"fields": {
"views": "total_views"
}
}
]
}
}
}
example response:
{
"name": "global_metrics",
"buckets": [
{
"key": "",
"count": 3,
"metrics": {
"distinct_locations": 1,
"distinct_statuses": 2,
"total_views": 8800
}
}
]
}
topHits returns the highest-ranked documents within each bucket, ordered by the given sort and limited to size documents.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": []
},
"aggs": {
"name": "ads_by_status",
"groupBy": ["status"],
"metrics": [
{
"type": "count",
"fields": {
"title": "ad_count"
}
}
],
"topHits": {
"size": 2,
"sort": [
{
"field": "views",
"order": "desc"
},
{
"field": "price",
"order": "asc"
}
],
"source": ["title", "price", "views"]
}
}
}
}
response example:
{
"name": "ads_by_status",
"buckets": [
{
"key": "active",
"count": 2,
"metrics": {
"ad_count": 0
},
"topHits": [
{
"price": 38500,
"title": "2024 Ford Bronco Sport",
"views": 4500
},
{
"price": 2800,
"title": "1998 Ford F-150 Pickup",
"views": 1200
}
]
},
{
"key": "pending",
"count": 1,
"metrics": {
"ad_count": 0
},
"topHits": [
{
"price": 9500,
"title": "2015 Chevy Malibu LT",
"views": 3100
}
]
}
]
}
Aggregate over documents nested in sub-arrays.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"aggs": {
"name": "reviews",
"path": "reviews",
"subAggs": {
"ratings_breakdown": {
"groupBy": ["reviewRating"],
"metrics": [
{
"name": "total_reviews",
"type": "count",
"field": "reviewRating"
}
],
"aggs": {
"top_reviews": {
"topHits": {
"size": 1,
"source": ["reviewRating", "feedback"]
}
}
}
}
}
}
}
}
example response:
{
"name": "reviews",
"buckets": [
{
"key": "ratings_breakdown",
"count": 2,
"buckets": [
{
"key": "5",
"count": 1
},
{
"key": "4",
"count": 1
}
]
}
]
}
Write your own metric scripts using Go template syntax.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"bool": {
"all": [
{
"term": {
"field": "status",
"value": "active"
}
}
]
},
"aggs": {
"name": "Category_Weighted_Value",
"groupBy": ["category"],
"metrics": [
{
"scripted": {
"name": "avg_weighted_value",
"reduceType": "avg",
"initialValue": 0.0,
"script": "{{ div (mul .price .rating) (sqrt .views) }}"
}
}
]
}
}
}
Metrics can also be computed over a nested array by setting path on the metric.
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "Vehicle"
},
"aggs": {
"groupBy": ["category"],
"metrics": [
{"type": "avg", "path": "reviews", "fields": {"reviewRating": "avg_review_rating"}}
]
}
}
}
response example:
{
"name": "",
"buckets": [
{
"key": "Vehicle",
"count": 4,
"metrics": {
"avg_review_rating": 2
}
}
]
}
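Intra-bucket pipeline metric: a bucket_script metric computes a new value from other metrics in the same bucket, which it references through bucketsPath.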
{
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"aggs": {
"name": "stats_by_category",
"groupBy": ["category"],
"metrics": [
{
"type": "sum",
"fields": {
"price": "total_price"
},
"resultName": "total_price"
},
{
"type": "sum",
"fields": {
"views": "total_views"
},
"resultName": "total_views"
},
{
"type": "bucket_script",
"resultName": "value_per_view",
"script": "{{ div .total_price .total_views }}",
"bucketsPath": {
"total_price": "total_price",
"total_views": "total_views"
}
}
]
}
}
}
response example:
{
"name": "stats_by_category",
"buckets": [
{
"key": "vehicle",
"count": 4,
"metrics": {
"total_price": 100000,
"total_views": 4800,
"value_per_view": 20.833333333333332
}
}
]
}
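Sibling pipeline metric: a stats_bucket aggregation computes statistics (avg, count, max, min, sum) for one metric across all buckets of another aggregation. The metric is referenced by a path such as group_by_category>views_per_dollar_ratio.
{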
"query": {
"index": "classified-ad-location-category",
"composite": {
"location": "detroit",
"category": "vehicle"
},
"aggs": {
"group_by_category": {
"groupBy": ["category"],
"metrics": [
{
"type": "avg",
"fields": { "price": "average_price"},
"resultName": "average_price"
},
{
"type": "sum",
"fields": { "views": "total_views" },
"resultName": "total_views"
},
{
"type": "bucket_script",
"bucketsPath": {
"views": "total_views",
"price": "average_price"
},
"script": "{{ div .views .price }}",
"resultName": "views_per_dollar_ratio"
}
]
},
"category_efficiency_stats": {
"type": "stats_bucket",
"path": "group_by_category>views_per_dollar_ratio"
}
}}
}
example response:
{
"statusCode": 200,
"totalHits": 0,
"isAggregation": true,
"aggregationResults": {
"group_by_category": {
"name": "",
"buckets": [
{
"key": "vehicle",
"count": 4,
"metrics": {
"average_price": 25000,
"total_views": 4800,
"views_per_dollar_ratio": 0.192
}
}
],
"pipelineMetrics": {
"category_efficiency_stats": {
"avg": 0.192,
"count": 1,
"max": 0.192,
"min": 0.192,
"sum": 0.192
}
}
}
}
}