Commit 5c3bff2

Spamercz and claude committed
docs: CHANGELOG + README + Query doc updates for v2

- New CHANGELOG.md documenting the full v2 surface: test infrastructure, the 4 bug fixes, ~150 new constructor arguments across existing classes, ~25 new query/aggregation/score-function/sort/option types, and BC-affecting rewrites (GeoDistance, Nested, WeightedAvg, TopHits, Filter agg, IpRange, Composite, Highlight, FilterCollection).
- README features list rewritten to reflect v2 coverage.
- doc/02-query-objects.md updated where breaking changes landed: GeoDistance now takes distance + validation_method + ignore_unmapped + boost; Nested takes score_mode/ignore_unmapped/inner_hits; Terms accepts TermsLookup. New sections for Knn, SparseVector, Semantic, TextExpansion, RuleQuery, WeightedTokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent df4e324 commit 5c3bff2

3 files changed

Lines changed: 291 additions & 12 deletions


CHANGELOG.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
# Changelog

## v2 — full DSL coverage

This branch brings every documented Elasticsearch DSL feature under typed PHP objects, fixes two queries that produced invalid DSL, and round-trips every feature against a real ES container via the new `AbstractElasticTestCase`.

### Test infrastructure

- New `tests/SpameriTests/ElasticQuery/AbstractElasticTestCase` base class with `createIndex($mapping)`, `indexDocument($body, $id, $refresh)`, `search($elasticQuery)`, `deleteIndex()`, and a generic `request($method, $path, $body)`. Tests that extend it shrink from ~70 lines of curl boilerplate to ~15.

### Bug fixes (BC-breaking)

| File | Previous | Fixed |
| --- | --- | --- |
| `Query/GeoDistance` | Emitted `{pin: {location: ...}}` (invalid DSL) and lacked the required `distance` argument. | Emits proper `geo_distance` envelope. Constructor now takes `distance` (required), plus `distance_type`, `validation_method`, `ignore_unmapped`, `boost`. |
| `Query/Nested` | Wrapped inner query in `[$queryArray]` (extra list level — rejected by ES). | Inner query is now an object. Added `score_mode`, `ignore_unmapped`, `inner_hits`. |
| `Query/PhrasePrefix` | `int $boost = 1` (inconsistent type). | `float $boost = 1.0`. |
| `Options/GeoDistanceSort` | `ignore_unmapped` hard-coded to `true`. | Constructor arg `bool $ignoreUnmapped = true`. |
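
Both DSL fixes come down to the shape of the request. The following minimal sketch uses plain PHP arrays, not the library's serializer, and the field names and values are illustrative. It shows the `geo_distance` envelope that Elasticsearch documents. It also shows why `[$queryArray]` broke `Nested`: `json_encode()` turns a PHP list into a JSON array, so `query` came out as `[{...}]` where ES expects a single object.

```php
<?php
// DSL shapes only (plain arrays, no library classes); field names and values
// are illustrative placeholders.

// geo_distance: `distance` is required and sits beside the geo-point field.
$geoDistance = [
    'geo_distance' => [
        'distance' => '50km',
        'distance_type' => 'arc',        // 'arc' (default) or 'plane'
        'validation_method' => 'STRICT', // 'STRICT' (default), 'COERCE' or 'IGNORE_MALFORMED'
        'ignore_unmapped' => false,
        'location' => ['lat' => 40.7128, 'lon' => -74.0060],
    ],
];

// nested: `query` must be one query object, not a list of them.
$inner = ['term' => ['comments.author' => 'john']];

$valid = ['nested' => ['path' => 'comments', 'query' => $inner, 'score_mode' => 'avg']];
$invalid = ['nested' => ['path' => 'comments', 'query' => [$inner]]]; // extra list level

echo json_encode($valid), PHP_EOL;
// {"nested":{"path":"comments","query":{"term":{"comments.author":"john"}},"score_mode":"avg"}}
echo json_encode($invalid), PHP_EOL;
// {"nested":{"path":"comments","query":[{"term":{"comments.author":"john"}}]}}
```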

### New query types

- **Knn** — vector similarity (field, queryVector, k, numCandidates, similarity, filter, boost).
- **SparseVector** — ELSER-style sparse vector query (inference_id+query or queryVector tokens).
- **TextExpansion** — legacy ELSER form (model_id, model_text).
- **Semantic** — queries a `semantic_text` field.
- **RuleQuery** — Search Application query rules over an organic query.
- **WeightedTokens** — token weights against a sparse_vector field.
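
For orientation, these are the request bodies that the Elasticsearch docs give for the vector queries, written as plain PHP arrays. This is a sketch of the target DSL, not the library's output. Field names, vectors and the `my-elser-endpoint` inference ID are placeholders. An associative PHP array encodes as a JSON object, which is what `sparse_vector.query_vector` needs.

```php
<?php
// Request-body shapes from the Elasticsearch query DSL docs, as plain arrays.
// Field names, vectors and 'my-elser-endpoint' are placeholders.

$knn = ['knn' => [
    'field' => 'vector',
    'query_vector' => [1.0, 2.0, 3.0],
    'k' => 5,                 // number of nearest neighbours
    'num_candidates' => 50,   // candidates considered per shard
    'similarity' => 0.7,      // optional minimum similarity
    'filter' => ['term' => ['status' => 'on']],
]];

// sparse_vector, form 1: an inference endpoint expands the query text.
$sparseViaInference = ['sparse_vector' => [
    'field' => 'tokens',
    'inference_id' => 'my-elser-endpoint',
    'query' => 'big cat',
]];

// sparse_vector, form 2: pre-computed token weights (a JSON object, not a list).
$sparseViaTokens = ['sparse_vector' => [
    'field' => 'tokens',
    'query_vector' => ['lion' => 0.5, 'tiger' => 0.7],
]];

// semantic: targets a field mapped as `semantic_text`.
$semantic = ['semantic' => ['field' => 'inference_field', 'query' => 'large cat']];

echo json_encode($sparseViaTokens), PHP_EOL;
// {"sparse_vector":{"field":"tokens","query_vector":{"lion":0.5,"tiger":0.7}}}
```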

### Existing queries — new constructor arguments

| Query | New args |
| --- | --- |
| `ElasticMatch` | `zero_terms_query`, `auto_generate_synonyms_phrase_query`, `lenient`, `prefix_length`, `max_expansions`, `fuzzy_transpositions`, `fuzzy_rewrite` |
| `MultiMatch` | `tie_breaker`, `slop`, `prefix_length`, `max_expansions`, `lenient`, `zero_terms_query`, `auto_generate_synonyms_phrase_query`, `fuzzy_transpositions`, `fuzzy_rewrite` |
| `MatchPhrase` | `zero_terms_query` |
| `PhrasePrefix` | `analyzer`, `max_expansions`, `zero_terms_query` |
| `MatchBoolPrefix` | `fuzziness`, `prefix_length`, `max_expansions`, `fuzzy_transpositions`, `fuzzy_rewrite` |
| `QueryString` | `analyze_wildcard`, `auto_generate_synonyms_phrase_query`, `enable_position_increments`, `fuzziness`, `fuzzy_max_expansions`, `fuzzy_prefix_length`, `fuzzy_transpositions`, `lenient`, `max_determinized_states`, `minimum_should_match`, `quote_analyzer`, `phrase_slop`, `quote_field_suffix`, `rewrite`, `time_zone`, `type`, `tie_breaker` |
| `SimpleQueryString` | `analyze_wildcard`, `auto_generate_synonyms_phrase_query`, `fuzzy_max_expansions`, `fuzzy_prefix_length`, `fuzzy_transpositions`, `lenient`, `minimum_should_match`, `quote_field_suffix` |
| `CombinedFields` | `auto_generate_synonyms_phrase_query` |
| `Term` | `case_insensitive` |
| `Terms` | accepts `TermsLookup` for cross-document terms resolution |
| `Range` | `gt`, `lt`, `format`, `relation` (new `Range\Relation` constants), `time_zone` |
| `Exists` | `boost` |
| `WildCard` | `case_insensitive`, `rewrite` |
| `Prefix` | `rewrite` |
| `Fuzzy` | `transpositions`, `rewrite` |
| `Regexp` | `rewrite` |
| `TermSet` | `boost` |
| `HasChild` | `inner_hits` |
| `HasParent` | `inner_hits` |
| `Nested` | `score_mode`, `ignore_unmapped`, `inner_hits` |
| `ParentId` | `boost` |
| `GeoBoundingBox` | `validation_method`, `ignore_unmapped`, `boost` |
| `GeoShape` | `indexed_shape` (new `IndexedShape` sub-object), `boost` |
| `Shape` | `indexed_shape`, `boost` |
| `MoreLikeThis` | `boost_terms`, `include`, `min_doc_freq`, `max_doc_freq`, `min_word_length`, `max_word_length`, `stop_words`, `analyzer`, `boost`, `fail_on_unsupported_field` |
| `Percolate` | `documents` (multi-doc), `name`, `routing`, `preference`, `version` |
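
The `Range` additions map onto the documented `range` options. A hypothetical `rangeQuery()` helper (not part of the library) sketches the body and checks `relation` against the three values that Elasticsearch accepts. Unset options are dropped, so only explicitly set keys are sent.

```php
<?php
// Hypothetical helper, not the library's Range class: builds a `range` clause
// and omits any option left as null.
function rangeQuery(
    string $field,
    array $bounds,            // e.g. ['gt' => ..., 'lt' => ...]
    ?string $format = null,
    ?string $timeZone = null,
    ?string $relation = null,
): array {
    // ES accepts INTERSECTS (default), CONTAINS and WITHIN for range fields.
    if ($relation !== null && !in_array($relation, ['INTERSECTS', 'CONTAINS', 'WITHIN'], true)) {
        throw new InvalidArgumentException("Unsupported relation: $relation");
    }

    $options = array_filter(
        ['format' => $format, 'time_zone' => $timeZone, 'relation' => $relation],
        static fn ($value) => $value !== null,
    );

    return ['range' => [$field => $bounds + $options]];
}

$query = rangeQuery('published', ['gt' => '2024-01-01', 'lt' => '2024-02-01'], format: 'yyyy-MM-dd', timeZone: '+01:00');
echo json_encode($query), PHP_EOL;
// {"range":{"published":{"gt":"2024-01-01","lt":"2024-02-01","format":"yyyy-MM-dd","time_zone":"+01:00"}}}
```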

### New sub-objects

- `Query/TermsLookup` — `index`, `id`, `path`, `routing`.
- `Query/Range/Relation` — constants: `INTERSECTS`, `CONTAINS`, `WITHIN`.
- `Query/InnerHits` — `name`, `from`, `size`, `sort`, `_source`, `highlight`, `explain`, `script_fields`, `docvalue_fields`, `version`, `seq_no_primary_term`, `stored_fields`, `track_scores`.
- `Query/IndexedShape` — `id`, `index`, `path`, `routing`.
- `Script` (top-level) — reusable script value object (`source`, `lang`, `params`).
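
A terms lookup replaces the value list with a pointer to a field in another document. The sketch below uses a hypothetical `termsLookup()` helper and the illustrative index, ID and path from the Query doc. `routing` is added only when set, because it is optional in the DSL.

```php
<?php
// Hypothetical helper sketching the documented `terms` lookup shape.
function termsLookup(string $field, string $index, string $id, string $path, ?string $routing = null): array
{
    $lookup = ['index' => $index, 'id' => $id, 'path' => $path];
    if ($routing !== null) {
        $lookup['routing'] = $routing; // only needed for custom-routed documents
    }

    // The field maps to the lookup object instead of a list of values.
    return ['terms' => [$field => $lookup]];
}

echo json_encode(termsLookup('user_id', 'users', '42', 'friends')), PHP_EOL;
// {"terms":{"user_id":{"index":"users","id":"42","path":"friends"}}}
```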

### Aggregations — new types

Bucket: `Filters` (named filters), `AutoDateHistogram`, `VariableWidthHistogram`, `CategorizeText` *(platinum license)*, `FrequentItemSets` *(platinum license)*, `IpPrefix`, `TimeSeries`.

Metric: `TopMetrics`, `GeoLine` *(gold license)*, `TTest`, `Rate`, `MatrixStats`.

Pipeline/sampler/ML: `RandomSampler`, `CumulativeCardinality`, `ExtendedStatsBucket`, `Inference`.

### Aggregations — new constructor arguments

| Agg | New args |
| --- | --- |
| `Min`/`Max`/`Avg`/`Sum`/`ValueCount`/`Stats` | `missing`, `script`, `format` |
| `ExtendedStats` | `missing`, `script`, `format` (kept `sigma`) |
| `Cardinality` | `script`, `missing`, `rehash` |
| `MedianAbsoluteDeviation`/`StringStats` | `missing`, `script` |
| `BoxPlot` | `missing`, `script`, `execution_hint` |
| `Percentiles` | `tdigest`, `hdr`, `missing`, `script` |
| `PercentileRanks` | `hdr`, `missing`, `script` |
| `WeightedAvg` | **rewritten** — takes typed `WeightedAvgValue` for value/weight (each with `field`/`script`/`missing`), plus `format` |
| `TopHits` | **rewritten** — `from`, `sort`, `_source`, `highlight`, `explain`, `script_fields`, `docvalue_fields`, `version`, `seq_no_primary_term`, `stored_fields`, `track_scores` |
| `Term` | `min_doc_count`, `shard_size`, `shard_min_doc_count`, `show_term_doc_count_error`, `script`, `collect_mode`, `execution_hint`, `value_type`, `format`; `include`/`exclude` accept arrays |
| `MultiTerms` | `order`, `min_doc_count`, `shard_size`, `shard_min_doc_count`, `collect_mode`, `format` |
| `RareTerms` | `include`, `exclude`, `missing` |
| `SignificantTerms` | `shard_size`, `shard_min_doc_count`, `execution_hint`, `background_filter`, `heuristic` (with `HEURISTIC_*` constants) |
| `SignificantText` | `shard_size`, `shard_min_doc_count`, `min_doc_count`, `background_filter`, `source_fields` |
| `Range` | `script`, `missing`, `format` |
| `DateRange` | `script`, `missing` |
| `Histogram` | `min_doc_count`, `extended_bounds` (new `Histogram\Bounds`), `hard_bounds`, `offset`, `order`, `script`, `missing`, `keyed`, `format` |
| `DateHistogram` | `extended_bounds`, `hard_bounds`, `keyed`, `order`, `script`, `missing` |
| `IpRange` | **rewritten** — new `IpRange\IpRangeValue` with `mask` (CIDR) support |
| `Filter` | **rewritten** — accepts any `LeafQueryInterface` directly |
| `Composite` | typed sources: `Composite\TermsSource`, `Composite\HistogramSource`, `Composite\DateHistogramSource`, `Composite\GeotileGridSource`, each with `order`/`missing_bucket` |
| `AdjacencyMatrix` | `separator`, accepts `LeafQueryInterface` for filters |
| `GeoDistance` (agg) | `keyed`, `script`, `missing` |
| `GeoHashGrid`/`GeoTileGrid` | `bounds` |
| `DiversifiedSampler` | `execution_hint`, `script` |
| `Missing` | `script` |
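
To make the `WeightedAvg` value/weight pair concrete, here is what the `weighted_avg` aggregation computes: `sum(value * weight) / sum(weight)`, with the `missing` substitute for each side applied. The `weightedAvg()` function and the `grade`/`weight` fields are hypothetical. This is a worked example of the formula, not a description of how ES executes it.

```php
<?php
// Request shape (field names hypothetical): each side may set `missing`.
$aggregation = ['weighted_avg' => [
    'value' => ['field' => 'grade', 'missing' => 20],
    'weight' => ['field' => 'weight', 'missing' => 1],
]];

// Worked version of the formula: sum(value * weight) / sum(weight).
function weightedAvg(array $docs, float $valueMissing, float $weightMissing): ?float
{
    $weightedSum = 0.0;
    $weightSum = 0.0;
    foreach ($docs as $doc) {
        $value = $doc['grade'] ?? $valueMissing;    // plays the role of `value.missing`
        $weight = $doc['weight'] ?? $weightMissing; // plays the role of `weight.missing`
        $weightedSum += $value * $weight;
        $weightSum += $weight;
    }

    return $weightSum > 0.0 ? $weightedSum / $weightSum : null;
}

$docs = [
    ['grade' => 100, 'weight' => 2],
    ['grade' => 50, 'weight' => 1],
    ['weight' => 1], // no grade, so the missing value 20 is used
];

echo weightedAvg($docs, 20.0, 1.0), PHP_EOL; // 67.5
```

With the sample documents, the result is (200 + 50 + 20) / 4 = 67.5.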

### Score functions

- New `FunctionScore/ScoreFunction/Decay/Gauss`, `Linear`, `Exp` with shared `AbstractDecay` parent (`field`, `origin`, `scale`, `offset`, `decay`, `multi_value_mode`).
- New `FunctionScore/ScoreFunction/ScriptScore` (function variant — distinct from the `Query/ScriptScore` leaf).
- `FunctionScore` gained `boost`, `boost_mode` (with `BOOST_MODE_*` constants), `max_boost`, `min_score`.
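
The three decay functions differ only in the shape of the curve. The sketch below transcribes the formulas from the Elasticsearch `function_score` documentation for a numeric field, where `x` is the document value. Every curve returns 1.0 within `offset` of `origin` and exactly `decay` at distance `offset + scale`. `multi_value_mode`, which chooses the value to use when a field holds several, is not modelled here.

```php
<?php
// Decay curves per the Elasticsearch function_score docs (numeric fields).
function gaussDecay(float $x, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $d = max(0.0, abs($x - $origin) - $offset);
    // sigma^2 is chosen so the curve equals $decay at distance $scale.
    $sigmaSq = -($scale ** 2) / (2.0 * log($decay));

    return exp(-($d ** 2) / (2.0 * $sigmaSq));
}

function expDecay(float $x, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $d = max(0.0, abs($x - $origin) - $offset);
    $lambda = log($decay) / $scale;

    return exp($lambda * $d);
}

function linearDecay(float $x, float $origin, float $scale, float $offset = 0.0, float $decay = 0.5): float
{
    $d = max(0.0, abs($x - $origin) - $offset);
    // The line reaches zero at distance $s = scale / (1 - decay).
    $s = $scale / (1.0 - $decay);

    return max(0.0, ($s - $d) / $s);
}

// origin 0, offset 5, scale 10: at x = 15 every curve returns the decay (0.5).
foreach (['gaussDecay', 'expDecay', 'linearDecay'] as $fn) {
    printf("%s: %.3f\n", $fn, $fn(15.0, 0.0, 10.0, 5.0, 0.5));
}
```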

### Sort

- `Sort` gains `mode`, `nested` (new `NestedSort`), `numeric_type`, `unmapped_type`, `format`.
- New `Options/ScriptSort` — script-based sort.
- New `Options/NestedSort` — path/filter/max_children for nested sorting (recursive).
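
The following sketch shows the sort clause shapes for the new nested and script sorts, following the Elasticsearch sort docs. Paths, fields and the script are illustrative. The inner `nested` object shows the recursive form used for multi-level nested fields.

```php
<?php
// Illustrative sort clauses; paths, fields and the script are placeholders.
$sort = [
    [
        'offer.variant.price' => [
            'order' => 'asc',
            'mode' => 'min',                 // pick the lowest matching value
            'nested' => [
                'path' => 'offer',
                'filter' => ['term' => ['offer.active' => true]],
                'nested' => [                // second nesting level
                    'path' => 'offer.variant',
                    'max_children' => 10,    // children considered per root doc
                ],
            ],
        ],
    ],
];

// Script-based sort uses the `_script` key instead of a field name.
$scriptSort = ['_script' => [
    'type' => 'number',
    'script' => [
        'lang' => 'painless',
        'source' => "doc['popularity'].value * params.factor",
        'params' => ['factor' => 1.1],
    ],
    'order' => 'desc',
]];

echo json_encode(['sort' => [...$sort, $scriptSort]], JSON_PRETTY_PRINT), PHP_EOL;
```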

### Highlight — rewritten

- `Highlight/HighlightField` — per-field config (type, number_of_fragments, fragment_size, all boundary_*, encoder, force_source, fragmenter, highlight_query, matched_fields, no_match_size, order, phrase_limit, require_field_match, tags_schema, pre_tags, post_tags).
- `Highlight/HighlightFieldCollection` — typed collection.
- `Highlight` accepts either `HighlightFieldCollection` or simple `array<string>` of field names (BC). Adds all global options.
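
One detail behind the BC path in `Highlight` is that Elasticsearch expects `fields` as an object keyed by field name, while PHP's `json_encode()` renders an empty array as `[]`. The hypothetical normalizer below, which is not the library's code, accepts both the list form and per-field config. It uses `stdClass` so that empty configs encode as `{}`.

```php
<?php
// Hypothetical normalizer, not the library's Highlight class. Accepts a list of
// field names, a map of field => config, or a mix of both.
function highlightBody(array $fields, array $globalOptions = []): array
{
    $map = [];
    foreach ($fields as $key => $config) {
        if (is_int($key)) {
            // List entry such as 'title': highlight with default settings.
            $map[$config] = new stdClass();
        } else {
            // Per-field config; an empty one still has to encode as {}.
            $map[$key] = $config === [] ? new stdClass() : $config;
        }
    }

    // Global options (pre_tags, post_tags, ...) sit beside `fields`.
    return ['highlight' => $globalOptions + ['fields' => $map]];
}

echo json_encode(highlightBody(['title', 'body'])), PHP_EOL;
// {"highlight":{"fields":{"title":{},"body":{}}}}

echo json_encode(
    highlightBody(['title', 'body' => ['fragment_size' => 150, 'number_of_fragments' => 3]], ['pre_tags' => ['<em>'], 'post_tags' => ['</em>']]),
    JSON_UNESCAPED_SLASHES,
), PHP_EOL;
```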

### Options — many new fields

| Field | Type |
| --- | --- |
| `_source` | new `Options\Source` (includes/excludes, or `false`) |
| `track_total_hits` | `bool\|int` |
| `track_scores` | `bool` |
| `explain` | `bool` |
| `terminate_after` | `int` |
| `timeout` | `string` |
| `search_after` | `array` |
| `pit` | new `Options\Pit` |
| `stored_fields` | `array` |
| `docvalue_fields` | `array` |
| `fields` | `array` |
| `script_fields` | `array` |
| `runtime_mappings` | `array` |
| `seq_no_primary_term` | `bool` |
| `indices_boost` | `array` |
| `collapse` | new `Options\Collapse` |
| `rescore` | `array<Options\Rescore>` |
| `suggesters` | `array<Suggest\SuggesterInterface>` |
| `profile` | `bool` |
| `stats` | `array<string>` |
| `ext` | `array` |

`ElasticQuery::toArray()` wires `collapse`, `rescore`, and `suggest` to the top-level request body.
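
`pit`, `search_after` and `track_total_hits` are typically used together for deep pagination. A hypothetical `nextPageBody()` helper sketches the request body according to the Elasticsearch point-in-time docs. The PIT ID, the `published` sort field and the sort values are placeholders. A request that carries `pit` goes to `/_search` without an index in the path.

```php
<?php
// Hypothetical helper sketching one page of a PIT + search_after walk.
function nextPageBody(array $baseBody, string $pitId, ?array $lastHitSort): array
{
    $body = $baseBody;
    $body['pit'] = ['id' => $pitId, 'keep_alive' => '1m'];
    // `_shard_doc` is the tiebreaker ES documents for PIT searches.
    $body['sort'] = [['published' => 'desc'], ['_shard_doc' => 'asc']];
    $body['track_total_hits'] = false; // skip hit counting while paging

    if ($lastHitSort !== null) {
        // The `sort` values of the last hit on the previous page.
        $body['search_after'] = $lastHitSort;
    }

    return $body;
}

$base = ['size' => 100, 'query' => ['match' => ['title' => 'elasticsearch']]];
$first = nextPageBody($base, 'PIT_ID', null);
$second = nextPageBody($base, 'PIT_ID', [1704067200000, 42]);

echo json_encode($second), PHP_EOL;
```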
152+
153+
### Filter container — bool expansion
154+
155+
`Filter/FilterCollection` previously exposed only `must()`. It now mirrors `Query/QueryCollection` with `must()`, `should()`, `mustNot()`, and `filter()` — the `bool` body emits all four arms.
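
With `should()` now available on `FilterCollection`, one bool-query rule becomes relevant. According to the Elasticsearch docs, `minimum_should_match` defaults to 1 only when there is a `should` clause and no `must` or `filter` clause. Otherwise, `should` clauses are optional. The sketch shows the four-arm body and a hypothetical helper that encodes that default. Note that `mustNot()` maps to `must_not`.

```php
<?php
// Four-arm bool body; clauses are illustrative.
$bool = ['bool' => [
    'must' => [['match' => ['title' => 'search']]],
    'should' => [['term' => ['tags' => 'featured']]],
    'must_not' => [['term' => ['status' => 'deleted']]],
    'filter' => [['range' => ['published' => ['gte' => '2024-01-01']]]],
]];

// Hypothetical helper: the default minimum_should_match per the ES bool docs.
function defaultMinimumShouldMatch(array $bool): int
{
    $hasShould = !empty($bool['should']);
    $hasMustOrFilter = !empty($bool['must']) || !empty($bool['filter']);

    // 1 = at least one should clause must match; 0 = should is optional.
    return $hasShould && !$hasMustOrFilter ? 1 : 0;
}

echo defaultMinimumShouldMatch($bool['bool']), PHP_EOL;                         // 0
echo defaultMinimumShouldMatch(['should' => $bool['bool']['should']]), PHP_EOL; // 1
```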
156+
157+
### Suggesters
158+
159+
- `Options/Suggest/SuggesterInterface`
160+
- `Options/Suggest/TermSuggester`
161+
- `Options/Suggest/PhraseSuggester`
162+
- `Options/Suggest/CompletionSuggester`
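
The following `suggest` section is illustrative and contains one suggester of each type, following the Elasticsearch suggester docs. Suggestion names, fields and text are placeholders. The completion field is assumed to be mapped with type `completion`.

```php
<?php
// Illustrative `suggest` section; names, fields and text are placeholders.
$suggest = [
    'spelling' => [                      // term suggester: per-token corrections
        'text' => 'serch engne',
        'term' => ['field' => 'title', 'suggest_mode' => 'popular'],
    ],
    'phrase-fix' => [                    // phrase suggester: whole-phrase corrections
        'text' => 'serch engne',
        'phrase' => ['field' => 'title.trigram', 'size' => 1, 'gram_size' => 3],
    ],
    'autocomplete' => [                  // completion suggester: prefix lookups
        'prefix' => 'ela',
        'completion' => ['field' => 'title_suggest', 'skip_duplicates' => true, 'size' => 5],
    ],
];

echo json_encode(['suggest' => $suggest], JSON_PRETTY_PRINT), PHP_EOL;
```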
163+
164+
### Response mapper
165+
166+
- `ResultMapper` now handles named buckets (string keys, e.g. from `Filters` agg) and composite-key buckets (array keys).
167+
- `Result/Aggregation/Bucket.from`/`to` accept `string` (e.g. for IP / date range buckets).
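
In raw responses, these bucket layouts look like this: keyed aggregations such as `filters` return `buckets` as an object keyed by name, composite buckets carry an object `key`, and IP range buckets carry `from`/`to` as strings. A hypothetical normalizer, not the library's `ResultMapper`, sketches how to handle all three. `json_decode(..., true)` turns numeric-string keys like `"404"` into integers, so list detection relies on the key sequence rather than the key type.

```php
<?php
// Hypothetical sketch, not the library's ResultMapper. Input is one
// aggregation from a response decoded with json_decode($json, true).
function normalizeBuckets(array $aggregation): array
{
    $buckets = $aggregation['buckets'];
    // Keyed responses (e.g. `filters`) are maps; everything else is a list.
    $isList = $buckets === [] || array_keys($buckets) === range(0, count($buckets) - 1);

    $result = [];
    foreach ($buckets as $name => $bucket) {
        $result[] = [
            // Named buckets: the map key, cast back to string ("404" decodes to int).
            // List buckets: `key`, which is an array for composite aggregations.
            'key' => $isList ? ($bucket['key'] ?? null) : (string) $name,
            'docCount' => $bucket['doc_count'],
            'from' => $bucket['from'] ?? null, // strings for ip_range buckets
            'to' => $bucket['to'] ?? null,
        ];
    }

    return $result;
}

$filters = json_decode('{"buckets":{"errors":{"doc_count":3},"404":{"doc_count":1}}}', true);
print_r(array_column(normalizeBuckets($filters), 'key')); // errors, 404
```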
168+
169+
### CI / tests
170+
171+
- 218 tests, 3 skipped on basic license (geo_line, categorize_text).
172+
- ES 9.2.2 container in CI; `make tests` passes end-to-end against it.
173+
- The two pre-existing buggy tests for `GeoDistance` and `Nested` (which asserted invalid output) are now corrected and re-run as integration tests against ES.

README.md

Lines changed: 10 additions & 7 deletions
@@ -4,13 +4,16 @@ A PHP library that converts Elasticsearch query DSL into strongly-typed PHP obje

 ## Features

-- **Type-safe queries** - Full-text, term-level, compound, geo, and nested queries
-- **Aggregations** - Metric (min, max, avg) and bucket (terms, histogram, range, filter) aggregations
-- **Response mapping** - Automatic mapping of Elasticsearch responses to typed objects
-- **Index mapping** - Define index settings, analyzers, tokenizers, and filters
-- **Function scoring** - Custom scoring with field value factors, weights, and random scores
-- **Highlighting** - Search result highlighting support
-- **Pagination & sorting** - Options for size, offset, scroll, and geo-distance sorting
+- **Type-safe queries** — full-text, term-level, compound, geo, nested, joining, vector (knn / sparse_vector / semantic), span queries, and rule queries
+- **Aggregations** — metric (min, max, avg, stats, weighted_avg, top_hits, top_metrics, t_test, geo_line, …), bucket (terms, histogram, date_histogram, range, filter, filters, composite with typed sources, ip_prefix, time_series, …), pipeline (cumulative_*, bucket_*, normalize, serial_diff, inference, …)
+- **Function scoring** — field value factor, weight, random, decay (gauss / linear / exp), script_score; score_mode + boost_mode
+- **Sort** — field, geo-distance, script-based, with nested sort (filter / max_children / recursive)
+- **Highlight** — per-field config (type, fragment_size, boundary scanner, encoder, fragmenter, highlight_query, matched_fields, no_match_size, order, phrase_limit, …)
+- **Search options** — `_source`, `track_total_hits`, `search_after`, `pit`, `collapse`, `rescore`, `suggest` (term / phrase / completion), `runtime_mappings`, `script_fields`, `docvalue_fields`, `stored_fields`, `terminate_after`, `timeout`, `profile`, `stats`, `ext`
+- **Response mapping** — automatic mapping of Elasticsearch responses (including composite/named buckets, IP/date range buckets) to typed objects
+- **Index mapping** — index settings, analyzers, tokenizers, filters
+
+See [CHANGELOG.md](CHANGELOG.md) for the full list of types and arguments added in v2.

 ## Requirements

doc/02-query-objects.md

Lines changed: 108 additions & 5 deletions
@@ -179,15 +179,27 @@ new \Spameri\ElasticQuery\Query\Term(
 ```

 ##### Terms Query
-Match any of multiple exact values.
+Match any of multiple exact values, or fetch values from another document.
 - Class: `\Spameri\ElasticQuery\Query\Terms`
 - [Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html)
 - [Implementation](https://github.com/Spameri/ElasticQuery/blob/master/src/Query/Terms.php)

 ```php
+// Inline values
 new \Spameri\ElasticQuery\Query\Terms(
     field: 'category',
-    values: ['books', 'movies', 'music'],
+    query: ['books', 'movies', 'music'],
+);
+
+// terms_lookup — values pulled from another document
+new \Spameri\ElasticQuery\Query\Terms(
+    field: 'user_id',
+    query: new \Spameri\ElasticQuery\Query\TermsLookup(
+        index: 'users',
+        id: '42',
+        path: 'friends',
+        routing: null,
+    ),
 );
 ```

@@ -418,12 +430,98 @@ Query nested objects with their own scope.
 - [Implementation](https://github.com/Spameri/ElasticQuery/blob/master/src/Query/Nested.php)

 ```php
-$nested = new \Spameri\ElasticQuery\Query\Nested(path: 'comments');
+$nested = new \Spameri\ElasticQuery\Query\Nested(
+    path: 'comments',
+    scoreMode: \Spameri\ElasticQuery\Query\Nested::SCORE_MODE_AVG, // optional
+    ignoreUnmapped: false, // optional
+    innerHits: new \Spameri\ElasticQuery\Query\InnerHits( // optional
+        name: 'matched_comments',
+        size: 5,
+    ),
+);
 $nested->getQuery()->must()->add(
     new \Spameri\ElasticQuery\Query\Term('comments.author', 'john')
 );
-$nested->getQuery()->must()->add(
-    new \Spameri\ElasticQuery\Query\Range('comments.date', gte: '2024-01-01')
+```
+
+##### Knn Query
+k-nearest neighbour vector similarity search.
+- Class: `\Spameri\ElasticQuery\Query\Knn`
+- [Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-knn-query.html)
+- [Implementation](https://github.com/Spameri/ElasticQuery/blob/master/src/Query/Knn.php)
+
+```php
+new \Spameri\ElasticQuery\Query\Knn(
+    field: 'vector',
+    queryVector: [1.0, 2.0, 3.0],
+    k: 5,
+    numCandidates: 50,
+    similarity: 0.7, // optional
+    filter: new \Spameri\ElasticQuery\Query\Term('status', 'on'), // optional
+    boost: 1.0,
+);
+```
+
+##### SparseVector Query
+Sparse vector / ELSER-style query.
+- Class: `\Spameri\ElasticQuery\Query\SparseVector`
+- [Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-sparse-vector-query.html)
+
+```php
+// Via inference endpoint
+new \Spameri\ElasticQuery\Query\SparseVector(
+    field: 'tokens',
+    inferenceId: '.elser_model_2',
+    query: 'big cat',
+);
+
+// Via pre-computed tokens
+new \Spameri\ElasticQuery\Query\SparseVector(
+    field: 'tokens',
+    queryVector: ['lion' => 0.5, 'tiger' => 0.7],
+);
+```
+
+##### Semantic Query
+Query a `semantic_text` field.
+- Class: `\Spameri\ElasticQuery\Query\Semantic`
+
+```php
+new \Spameri\ElasticQuery\Query\Semantic(field: 'inference_field', query: 'large cat');
+```
+
+##### TextExpansion Query
+Legacy ELSER (`model_id`/`model_text`).
+- Class: `\Spameri\ElasticQuery\Query\TextExpansion`
+
+```php
+new \Spameri\ElasticQuery\Query\TextExpansion(
+    field: 'tokens',
+    modelId: '.elser_model_2',
+    modelText: 'big cat',
+);
+```
+
+##### RuleQuery
+Apply Search Application query rules over an organic query.
+- Class: `\Spameri\ElasticQuery\Query\RuleQuery`
+
+```php
+new \Spameri\ElasticQuery\Query\RuleQuery(
+    organic: new \Spameri\ElasticQuery\Query\ElasticMatch('title', 'puggles'),
+    rulesetIds: ['my-ruleset'],
+    matchCriteria: ['query_string' => 'puggles'],
+);
+```
+
+##### WeightedTokens Query
+Token weights against a sparse_vector field.
+- Class: `\Spameri\ElasticQuery\Query\WeightedTokens`
+
+```php
+new \Spameri\ElasticQuery\Query\WeightedTokens(
+    field: 'tokens',
+    tokens: ['lion' => 0.5, 'tiger' => 0.7],
 );
 ```

@@ -438,6 +536,11 @@ new \Spameri\ElasticQuery\Query\GeoDistance(
     field: 'location',
     lat: 40.7128,
     lon: -74.0060,
+    distance: '50km',
+    distanceType: 'arc', // optional: 'arc' | 'plane'
+    validationMethod: 'STRICT', // optional: 'STRICT' | 'COERCE' | 'IGNORE_MALFORMED'
+    ignoreUnmapped: false, // optional
+    boost: 1.0,
 );
 ```
