Commit b529ee1

Merge pull request #1431 from redis/DOC-4544-php-vec-example
DOC-4544 added PHP vector query example
2 parents 284a3df + 70ee96e commit b529ee1

3 files changed

+274
-2
lines changed

content/develop/clients/php/connect.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ categories:
 description: Connect your PHP application to a Redis database
 linkTitle: Connect
 title: Connect to the server
-weight: 2
+weight: 10
 ---

 ## Basic connection

content/develop/clients/php/queryjson.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ categories:
 description: Learn how to use the Redis query engine with JSON
 linkTitle: Index and query JSON
 title: Example - Index and query JSON documents
-weight: 2
+weight: 20
 ---

 This example shows how to index and query Redis JSON data using `predis`.
Lines changed: 272 additions & 0 deletions
@@ -0,0 +1,272 @@
---
categories:
- docs
- develop
- stack
- oss
- rs
- rc
- oss
- kubernetes
- clients
description: Learn how to index and query vector embeddings with Redis
linkTitle: Index and query vectors
title: Index and query vectors
weight: 30
---

[Redis Query Engine]({{< relref "/develop/interact/search-and-query" >}})
lets you index vector fields in [hash]({{< relref "/develop/data-types/hashes" >}})
or [JSON]({{< relref "/develop/data-types/json" >}}) objects (see the
[Vectors]({{< relref "/develop/interact/search-and-query/advanced-concepts/vectors" >}})
reference page for more information).
Among other things, vector fields can store *text embeddings*, which are AI-generated vector
representations of the semantic information in pieces of text. The
[vector distance]({{< relref "/develop/interact/search-and-query/advanced-concepts/vectors#distance-metrics" >}})
between two embeddings indicates how similar they are semantically. By comparing
an embedding generated from some query text with the embeddings stored in hash
or JSON fields, Redis can retrieve documents that closely match the query in terms
of their meaning.

The example below uses the [Hugging Face](https://huggingface.co/) model
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
to generate the vector embeddings to store and index with Redis Query Engine.

## Initialize

You can use the [TransformersPHP](https://transformers.codewithkyrian.com/)
library to create the vector embeddings. Install the library with the following
command:

```bash
composer require codewithkyrian/transformers
```

## Import dependencies

Import the following classes and function in your source file:

```php
<?php

require 'vendor/autoload.php';

// TransformersPHP
use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Redis client and query engine classes.
use Predis\Client;
use Predis\Command\Argument\Search\CreateArguments;
use Predis\Command\Argument\Search\SearchArguments;
use Predis\Command\Argument\Search\SchemaFields\TextField;
use Predis\Command\Argument\Search\SchemaFields\TagField;
use Predis\Command\Argument\Search\SchemaFields\VectorField;
```

## Create a tokenizer instance

The code below shows how to use the
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
model to generate the embeddings. The vectors that represent the
embeddings have 384 dimensions, regardless of the length of the input
text. Here, the `pipeline()` call creates the `$extractor` function that
generates embeddings from text:

```php
$extractor = pipeline('embeddings', 'Xenova/all-MiniLM-L6-v2');
```

## Create the index

Connect to Redis and delete any index previously created with the
name `vector_idx`. (The
[`ftdropindex()`]({{< relref "/commands/ft.dropindex" >}})
call throws an exception if the index doesn't already exist, which is
why you need the `try...catch` block.)

```php
$client = new Predis\Client([
    'host' => 'localhost',
    'port' => 6379,
]);

try {
    $client->ftdropindex("vector_idx");
} catch (Exception $e) {
    // Ignored: the index didn't exist, so there is nothing to drop.
}
```

Next, create the index.
The schema in the example below includes three fields: the text content to index, a
[tag]({{< relref "/develop/interact/search-and-query/advanced-concepts/tags" >}})
field to represent the "genre" of the text, and the embedding vector generated from
the original text content. The `embedding` field specifies
[HNSW]({{< relref "/develop/interact/search-and-query/advanced-concepts/vectors#hnsw-index" >}})
indexing, the
[L2]({{< relref "/develop/interact/search-and-query/advanced-concepts/vectors#distance-metrics" >}})
vector distance metric, `Float32` values to represent the vector's components,
and 384 dimensions, as required by the `all-MiniLM-L6-v2` embedding model.

The `CreateArguments` parameter to [`ftcreate()`]({{< relref "/commands/ft.create" >}})
specifies hash objects for storage and a prefix `doc:` that identifies the hash objects
to index.

```php
$schema = [
    new TextField("content"),
    new TagField("genre"),
    new VectorField(
        "embedding",
        "HNSW",
        [
            "TYPE", "FLOAT32",
            "DIM", 384,
            "DISTANCE_METRIC", "L2"
        ]
    )
];

$client->ftcreate("vector_idx", $schema,
    (new CreateArguments())
        ->on('HASH')
        ->prefix(["doc:"])
);
```
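
Because the schema fixes `DIM` at 384, it can be useful to confirm that a vector has the expected number of components before you store it. The following is a minimal sketch of such a guard, not something the original example includes. The helper name `assertEmbeddingDimension()` is hypothetical (it is not part of Predis or TransformersPHP), and the zero-filled arrays are stand-ins for real embeddings.

```php
<?php

// Hypothetical helper (not part of Predis or TransformersPHP): checks that a
// vector has the number of components declared by the index's DIM attribute.
function assertEmbeddingDimension(array $vector, int $expectedDim = 384): void
{
    $actualDim = count($vector);

    if ($actualDim !== $expectedDim) {
        throw new LengthException(
            "Expected $expectedDim components but got $actualDim"
        );
    }
}

// Stand-in vectors filled with zeros; real ones come from the embedding model.
assertEmbeddingDimension(array_fill(0, 384, 0.0)); // passes silently

try {
    assertEmbeddingDimension(array_fill(0, 3, 0.0));
} catch (LengthException $e) {
    echo $e->getMessage(), PHP_EOL;
    // >>> Expected 384 components but got 3
}
```

A check like this fails loudly in your own code instead of leaving you to work out later why a document is missing from the search results.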

## Add data

You can now supply the data objects, which will be indexed automatically
when you add them with [`hmset()`]({{< relref "/commands/hset" >}}), as long as
you use the `doc:` prefix specified in the index definition.

Use the `$extractor()` function as shown below to create the embedding that
represents the `content` field. Note that `$extractor()` can generate multiple
embeddings from multiple string arguments at once, so it returns an array of
embedding vectors. Here, there is only one embedding in the returned array.
The `normalize:` and `pooling:` named parameters relate to details
of the embedding model (see the
[`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
page for more information).

To add an embedding as a field of a hash object, you must encode the
vector array as a binary string. The built-in
[`pack()`](https://www.php.net/manual/en/function.pack.php) function is a convenient
way to do this in PHP, using the `g*` format specifier to denote a packed
array of little-endian `float` values. Note that if you are using
[JSON]({{< relref "/develop/data-types/json" >}})
objects to store your documents instead of hashes, then you should store
the `float` array directly without first converting it to a binary
string.

```php
$content = "That is a very happy person";
$emb = $extractor($content, normalize: true, pooling: 'mean');

$client->hmset("doc:0", [
    "content" => $content,
    "genre" => "persons",
    "embedding" => pack('g*', ...$emb[0])
]);

$content = "That is a happy dog";
$emb = $extractor($content, normalize: true, pooling: 'mean');

$client->hmset("doc:1", [
    "content" => $content,
    "genre" => "pets",
    "embedding" => pack('g*', ...$emb[0])
]);

$content = "Today is a sunny day";
$emb = $extractor($content, normalize: true, pooling: 'mean');

$client->hmset("doc:2", [
    "content" => $content,
    "genre" => "weather",
    "embedding" => pack('g*', ...$emb[0])
]);
```
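
To see what the `pack('g*', ...)` encoding used above produces, the short sketch below packs a small made-up vector, checks its byte length, and decodes it again with `unpack()`. The four sample values are assumptions chosen because they are exactly representable as 32-bit floats, and the sketch assumes the usual 4-byte `float` size. It needs no Redis connection.

```php
<?php

// Made-up 4-component vector; each value is exactly representable as a
// 32-bit float, so the round trip below is lossless.
$vector = [0.5, -0.25, 1.0, 2.0];

// 'g' = float in little-endian byte order; '*' repeats for every argument.
$binary = pack('g*', ...$vector);

echo strlen($binary), PHP_EOL;
// >>> 16 (4 components x 4 bytes each)

// unpack() returns an array indexed from 1, so re-index it from 0.
$decoded = array_values(unpack('g*', $binary));
var_dump($decoded === $vector);
// >>> bool(true)

// By the same arithmetic, a 384-component vector occupies 1536 bytes.
$embeddingBytes = strlen(pack('g*', ...array_fill(0, 384, 0.0)));
echo $embeddingBytes, PHP_EOL;
// >>> 1536
```

Real embedding components generally lose a little precision when narrowed from PHP's 64-bit floats to 32 bits, so a round trip on real data is usually close rather than exact.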

## Run a query

After you have created the index and added the data, you are ready to run a query.
To do this, you must create another embedding vector from your chosen query
text. Redis calculates the vector distance between the query vector and each
embedding vector in the index as it runs the query. You can ask for the results
to be sorted in order of ascending distance.

The code below creates the query embedding using the `$extractor()` function, as with
the indexing, and passes it as a parameter when the query executes (see
[Vector search]({{< relref "/develop/interact/search-and-query/query/vector-search" >}})
for more information about using query parameters with embeddings).
The query is a
[K nearest neighbors (KNN)]({{< relref "/develop/interact/search-and-query/advanced-concepts/vectors#knn-vector-search" >}})
search that sorts the results in order of vector distance from the query vector.
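
The idea behind a KNN search can be sketched in plain PHP as a brute-force ranking: compute the distance from the query vector to every stored vector, sort in ascending order, and keep the first K. This is only an illustration of the concept using the textbook Euclidean distance. It is not how Redis or its HNSW index works internally, and the function name `nearestNeighbors()`, the keys, and the 2-component vectors are all made up for the example.

```php
<?php

// Illustrative brute-force KNN (hypothetical helper, not a Redis or Predis API).
// Returns the $k closest keys mapped to their Euclidean distance from $query.
function nearestNeighbors(array $query, array $stored, int $k): array
{
    $distances = [];

    foreach ($stored as $key => $vector) {
        $sum = 0.0;
        foreach ($vector as $i => $value) {
            $sum += ($value - $query[$i]) ** 2;
        }
        $distances[$key] = sqrt($sum);
    }

    asort($distances); // ascending order, keys preserved

    return array_slice($distances, 0, $k, true);
}

// Made-up 2-component vectors (real embeddings here have 384 components).
$stored = [
    'doc:a' => [6.0, 8.0],
    'doc:b' => [3.0, 4.0],
    'doc:c' => [0.0, 1.0],
];

$top = nearestNeighbors([0.0, 0.0], $stored, 2);

foreach ($top as $key => $distance) {
    printf("%s: %.1f\n", $key, $distance);
}
// >>> doc:c: 1.0
// >>> doc:b: 5.0
```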

The results are returned as an array with the number of results in the
first element. The remaining elements are alternating pairs with the
key of the returned document (for example, `doc:0`) first, followed by an array containing
the fields you requested (again as alternating key-value pairs).

```php
$queryText = "That is a happy person";
$queryEmb = $extractor($queryText, normalize: true, pooling: 'mean');

$result = $client->ftsearch(
    "vector_idx",
    '*=>[KNN 3 @embedding $vec AS vector_distance]',
    // Parentheses around `new` are required before PHP 8.4.
    (new SearchArguments())
        ->addReturn(1, "vector_distance")
        ->dialect("2")
        ->params([
            "vec", pack('g*', ...$queryEmb[0])
        ])
        ->sortBy("vector_distance")
);

$numResults = $result[0];
echo "Number of results: $numResults" . PHP_EOL;
// >>> Number of results: 3

for ($i = 1; $i < ($numResults * 2 + 1); $i += 2) {
    $key = $result[$i];
    echo "Key: $key" . PHP_EOL;
    $fields = $result[$i + 1];
    echo "Field: {$fields[0]}, Value: {$fields[1]}" . PHP_EOL;
}
// >>> Key: doc:0
// >>> Field: vector_distance, Value: 3.76152896881
// >>> Key: doc:1
// >>> Field: vector_distance, Value: 18.6544265747
// >>> Key: doc:2
// >>> Field: vector_distance, Value: 44.6189727783
```
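
If you would rather work with the results as an associative array, the flat reply layout described above can be reshaped with a small helper. The sketch below is one possible approach. The function name `parseSearchReply()` is hypothetical (it is not part of Predis), and the sample reply is hand-written to match the shape and values shown in this example, so it runs without a Redis connection.

```php
<?php

// Hypothetical helper (not part of Predis): converts the flat reply
// [count, key1, fields1, key2, fields2, ...] into an associative array.
function parseSearchReply(array $reply): array
{
    $docs = [];

    for ($i = 1; $i + 1 < count($reply); $i += 2) {
        $flatFields = $reply[$i + 1];
        $fields = [];

        // Each field list is itself a flat sequence of name/value pairs.
        for ($j = 0; $j + 1 < count($flatFields); $j += 2) {
            $fields[$flatFields[$j]] = $flatFields[$j + 1];
        }

        $docs[$reply[$i]] = $fields;
    }

    return ['total' => $reply[0], 'docs' => $docs];
}

// Hand-written sample reply in the shape described above.
$sample = [
    3,
    'doc:0', ['vector_distance', '3.76152896881'],
    'doc:1', ['vector_distance', '18.6544265747'],
    'doc:2', ['vector_distance', '44.6189727783'],
];

$parsed = parseSearchReply($sample);

foreach ($parsed['docs'] as $key => $fields) {
    echo "$key => {$fields['vector_distance']}", PHP_EOL;
}
// >>> doc:0 => 3.76152896881
// >>> doc:1 => 18.6544265747
// >>> doc:2 => 44.6189727783
```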

Assuming you have added the code from the steps above to your source file,
it is now ready to run, but note that it may take a while to complete when
you run it for the first time (which happens because TransformersPHP must download the
`all-MiniLM-L6-v2` model data before it can
generate the embeddings). When you run the code, it outputs the following result text:

```
Number of results: 3
Key: doc:0
Field: vector_distance, Value: 3.76152896881
Key: doc:1
Field: vector_distance, Value: 18.6544265747
Key: doc:2
Field: vector_distance, Value: 44.6189727783
```

Note that the results are ordered according to the value of the `vector_distance`
field, with the lowest distance indicating the greatest similarity to the query.
As you would expect, the text *"That is a very happy person"* (from the `doc:0`
document)
is the result judged to be most similar in meaning to the query text
*"That is a happy person"*.

## Learn more

See
[Vector search]({{< relref "/develop/interact/search-and-query/query/vector-search" >}})
for more information about the indexing options, distance metrics, and query format
for vectors.
