CSHARP-5763: Auto-embedding for vector search #1842

ajcvickers · 2026-01-06T14:04:38Z

Note that I have not been able to test this against a server that fully supports auto-embedding. The version I have access to throws for a variety of cases, has no available embedding models, and index creation never completes.

Copilot

Pull request overview

This PR adds support for auto-embedding in vector search indexes, allowing MongoDB Atlas to automatically create vector embeddings from text fields using specified embedding models (e.g., "voyage-4"). The implementation includes new API constructors, query vector handling for text input, and comprehensive test coverage.

Key changes:

New VectorEmbeddingModality enum to specify the type of data being embedded
QueryVector now supports text strings for auto-embedding indexes
CreateVectorSearchIndexModel has new constructors for auto-embedding indexes with support for compression profiles and explicit compression settings

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.


File	Description
VectorEmbeddingModality.cs	New enum defining modality types for auto-embedding (currently only Text)
QueryVector.cs	Added string constructor and implicit operator for text-based queries
PipelineStageDefinitionBuilder.cs	Modified to use "query" field for text vs "queryVector" for arrays
CreateVectorSearchIndexModel.cs	Added auto-embedding constructors, compression profile support, and updated rendering logic
VectorSearchTests.cs	Added test for auto-embedding vector search with new Movie class
AtlasSearchIndexManagmentTests.cs	Added comprehensive tests for auto-embedding index creation with various options and renamed existing tests for consistency



BorisDog · 2026-01-07T02:18:19Z

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs

+    /// indexes, this is only used when specifying explicit field compression using <see cref="Quantization"/>.
    /// </summary>
-    public int Dimensions { get; }
+    public int Dimensions { get; init; }


Should these properties be called CompressionDimensions and CompressionQuantization, similar to CompressionProfileName?
I think we need to make it clearer that these translate to the compression and compression.quantization fields.

Another thought: would it be overkill to wrap these in a Compression object with subclasses such as the following?

CompressionProfile : Compression { string ProfileName; }
CompressionQuantization : Compression { int Dimensions; Quantization Quantization; }
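To make the Compression suggestion concrete, here is a minimal C# sketch of such a hierarchy. It is hypothetical and not the driver's code: the class names follow the review comment, VectorQuantization is a local stand-in for the driver's enum (its members here are assumptions), and the validation rules are illustrative. The idea it shows is that a compression profile and explicit compression settings become mutually exclusive by construction, rather than by documentation.

```csharp
using System;

// Stand-in for the driver's quantization enum; these members are assumptions
// for this sketch (whether a binary_rescore value is needed is still open).
public enum VectorQuantization
{
    None,
    Scalar,
    Binary
}

// Hypothetical base type: a field would carry at most one Compression instance,
// so "profile" and "explicit settings" cannot both be set by accident.
public abstract class Compression
{
    // private protected: only types in this assembly can derive, keeping the set closed.
    private protected Compression()
    {
    }
}

// Compression chosen by naming a server-side profile.
public sealed class CompressionProfile : Compression
{
    public CompressionProfile(string profileName)
    {
        if (string.IsNullOrEmpty(profileName))
        {
            throw new ArgumentException("A profile name is required.", nameof(profileName));
        }

        ProfileName = profileName;
    }

    public string ProfileName { get; }
}

// Compression described explicitly as dimensions plus quantization.
public sealed class CompressionQuantization : Compression
{
    public CompressionQuantization(int dimensions, VectorQuantization quantization)
    {
        if (dimensions <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(dimensions), "Dimensions must be positive.");
        }

        Dimensions = dimensions;
        Quantization = quantization;
    }

    public int Dimensions { get; }
    public VectorQuantization Quantization { get; }
}
```

The trade-off is one extra type for users to learn, in exchange for removing the "at most one of" rule from the documentation.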

I'll follow up on this. I initially created these as separate things in the API, but then I realized when documenting them that they were the same, except one said "for auto-embedding" and the other "not for auto-embedding." As far as I can tell, Quantization and Dimensions have the same meaning either way, so duplicating these properties didn't seem helpful.

I can add to the documentation which BSON structure they end up in for each case.

Removing this code for now, since compression is not included in this release.

BorisDog · 2026-01-07T02:23:40Z

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs


    /// <summary>
-    /// Type of automatic vector quantization for your vectors.
+    /// Type of automatic vector quantization for your vectors. At most one of <see cref="Quantization"/> and


I hope that VectorQuantization is reusable across the fields.quantization and fields.compression.quantization fields.
At this point, it looks like we need to add a binary_rescore value to VectorQuantization?

Yes, this isn't clear, I will follow up on this.

Not doing this for now, since the current release doesn't include compression. We can revisit once that part is finalized.

BorisDog · 2026-01-07T02:29:25Z

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs

+        else
        {
-            vectorField.Add("quantization", BsonString.Create(Quantization.ToString()?.ToLower()));
+            vectorField.Add("type", "autoEmbed");


So if we are hiding type, do we want to consider having two separate models, CreateVectorSearchIndexModel and CreateVectorSearchAutoEmbedIndexModel?

First, I'm going to assume all binary breaking changes are off the table, since otherwise this API would not be the compromise it is in the first place. (If binary breaking changes are allowed, then a) we should redo the part of this API that is already checked in to clean it up, and b) we should probably introduce a new subclass for vector indexes, which can have one or two subclasses for the kinds of vector indexes.)

Given that, we can either create a new type, which means that:

Much of the existing configuration for vector indexes will be duplicated.

When we add something new to vector indexes in the future, then it will likely need to be duplicated as well.

Or we can keep the same type, which means that:

Some property combinations will (currently) not be valid.

It's potentially unclear which kind of vector index you are creating.

I opted for the second choice since, on balance, it seems a better long-term choice.

I believe that we didn't release typed index builders (CSHARP-5717) yet. So we got some flexibility here.

Replied to the wrong comment yesterday: I pushed a new design with subclasses for the different vector index types. We will have to move members into the base type, as they are also supported in auto-embed indexes.
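A rough C# sketch of the subclass design described above, showing shared members (name, field path, filter fields) living in the base type while each subclass renders only its own field entry. All class names are hypothetical stand-ins for CreateVectorSearchIndexModelBase and its subclasses, and dictionaries stand in for BsonDocument. Only "type": "autoEmbed" is taken from this PR; the "model" key in the auto-embed entry is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base type holding members supported by both kinds of vector index.
public abstract class VectorIndexModelSketch
{
    protected VectorIndexModelSketch(string name, string fieldPath, IEnumerable<string>? filterPaths)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
        FieldPath = fieldPath ?? throw new ArgumentNullException(nameof(fieldPath));
        FilterPaths = filterPaths?.ToList() ?? new List<string>();
    }

    public string Name { get; }
    public string FieldPath { get; }
    public IReadOnlyList<string> FilterPaths { get; }

    // Each subclass supplies the entry for the vector (or auto-embedded) field.
    protected abstract Dictionary<string, object> RenderVectorField();

    // Shared rendering: the vector field first, then one "filter" entry per filter path.
    public Dictionary<string, object> Render()
    {
        var fields = new List<Dictionary<string, object>> { RenderVectorField() };
        fields.AddRange(FilterPaths.Select(p => new Dictionary<string, object> { ["type"] = "filter", ["path"] = p }));
        return new Dictionary<string, object> { ["fields"] = fields };
    }
}

// Index over vectors that the application computes itself.
public sealed class EmbeddingVectorIndexModelSketch : VectorIndexModelSketch
{
    public EmbeddingVectorIndexModelSketch(string name, string fieldPath, int dimensions, string similarity, IEnumerable<string>? filterPaths = null)
        : base(name, fieldPath, filterPaths)
    {
        Dimensions = dimensions;
        Similarity = similarity;
    }

    public int Dimensions { get; }
    public string Similarity { get; }

    protected override Dictionary<string, object> RenderVectorField() => new()
    {
        ["type"] = "vector",
        ["path"] = FieldPath,
        ["numDimensions"] = Dimensions,
        ["similarity"] = Similarity,
    };
}

// Index where the server embeds a text field using the named model.
public sealed class AutoEmbedIndexModelSketch : VectorIndexModelSketch
{
    public AutoEmbedIndexModelSketch(string name, string fieldPath, string model, IEnumerable<string>? filterPaths = null)
        : base(name, fieldPath, filterPaths)
    {
        Model = model ?? throw new ArgumentNullException(nameof(model));
    }

    public string Model { get; }

    // "type": "autoEmbed" comes from the PR; the "model" key name is an assumption.
    protected override Dictionary<string, object> RenderVectorField() => new()
    {
        ["type"] = "autoEmbed",
        ["path"] = FieldPath,
        ["model"] = Model,
    };
}
```

Adding a new shared option then means touching only the base type, which addresses the duplication concern raised earlier in the thread.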


BorisDog · 2026-01-07T02:32:19Z

src/MongoDB.Driver/QueryVector.cs

+        {
+            Ensure.IsNotNull(bsonText, nameof(bsonText));
+
+            Vector = bsonText;


We can cover these changes in QueryVectorTests.

BorisDog · 2026-01-07T02:40:21Z

src/MongoDB.Driver/PipelineStageDefinitionBuilder.cs

                    var vectorSearchOperator = new BsonDocument
                    {
-                        { "queryVector", queryVector.Vector },
+                        { queryData is BsonString ? "query" : "queryVector", queryData },


Reading through the spec, it looks like model is not strictly required for now, but it might be in the future. Do we want to add it now or later?

It doesn't seem useful to me; it becomes a "quiz API". That is: I know that you need to pass this as the model, because that is how you created the index, but I'm going to ask you to tell me anyway; if you get it right, I will proceed, and if you get it wrong, I will stop. That is generally not a useful thing to do.

Followed up privately.

Added model to options.
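The rendering rule from the PipelineStageDefinitionBuilder diff above (text goes under "query", a numeric vector under "queryVector"), together with the optional model, can be sketched in self-contained C#. This is not the driver's code. QueryInput and VectorSearchStageSketch are hypothetical names, plain .NET types stand in for BsonValue and BsonDocument, and emitting the option as a top-level "model" field only when it is supplied is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for QueryVector: holds either text (auto-embedding) or a numeric vector.
public sealed class QueryInput
{
    private QueryInput(string? text, float[]? vector)
    {
        Text = text;
        Vector = vector;
    }

    public string? Text { get; }
    public float[]? Vector { get; }

    public static QueryInput FromText(string text) =>
        new QueryInput(text ?? throw new ArgumentNullException(nameof(text)), null);

    public static QueryInput FromVector(float[] vector) =>
        new QueryInput(null, vector ?? throw new ArgumentNullException(nameof(vector)));
}

public static class VectorSearchStageSketch
{
    // Builds the body of a $vectorSearch stage as a dictionary.
    public static Dictionary<string, object> Render(
        string indexName, string path, QueryInput input, int limit, int? numCandidates = null, string? model = null)
    {
        var stage = new Dictionary<string, object>
        {
            ["index"] = indexName,
            ["path"] = path,
        };

        // Text is sent as "query" so the server embeds it; numbers stay under "queryVector".
        if (input.Text != null)
        {
            stage["query"] = input.Text;
        }
        else
        {
            stage["queryVector"] = input.Vector!;
        }

        // Optional embedding model, only emitted when the caller supplies one.
        if (model != null)
        {
            stage["model"] = model;
        }

        if (numCandidates.HasValue)
        {
            stage["numCandidates"] = numCandidates.Value;
        }

        stage["limit"] = limit;
        return stage;
    }
}
```

Keeping the field-name decision inside the renderer means callers only choose between text and numbers; they never have to know which BSON field name the server expects.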


BorisDog · 2026-01-07T20:06:00Z

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs

+        else
        {
-            vectorField.Add("quantization", BsonString.Create(Quantization.ToString()?.ToLower()));
+            vectorField.Add("type", "autoEmbed");


I believe that we didn't release typed index builders (CSHARP-5717) yet. So we got some flexibility here.

BorisDog · 2026-01-07T20:06:21Z

src/MongoDB.Driver/PipelineStageDefinitionBuilder.cs

                    var vectorSearchOperator = new BsonDocument
                    {
-                        { "queryVector", queryVector.Vector },
+                        { queryData is BsonString ? "query" : "queryVector", queryData },


Followed up privately.


…elease.
- Changing QueryVector to take string instead of BsonString
- Added model to vector query options
- Fixed lookup for index status on community server
- Added more query tests
- Changed query tests to run on a small set of documents that can be queried in a reasonable amount of time.

BorisDog

Few minor comments

tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs

src/MongoDB.Driver/CreateVectorSearchIndexModelBase.cs

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs

BorisDog

There are some failing unit tests.

BorisDog · 2026-01-09T21:19:51Z

src/MongoDB.Driver/CreateAutoEmbeddingVectorSearchIndexModel.cs

+    }
+
+    /// <inheritdoc/>
+    internal override BsonDocument Render(RenderArgs<TDocument> renderArgs)


Suggestion for CreateSearchIndexModel:
Should we introduce
BsonDocument Render(RenderArgs<TDocument> renderArgs) => Definition
for more uniform handling in CreateCreateIndexesOperation?

Also, now that Render is internal, the exception in CreateSearchIndexModel.Definition is not accurate. Is it possible to override that property and throw there? I think making the property overridable would be an acceptable breaking change?
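One way to read the Render => Definition suggestion, sketched in C# under stated assumptions: every search index model exposes Render(), which by default returns the user-supplied Definition, while vector index subclasses override Render() and make Definition throw. The operation that creates indexes (CreateCreateIndexesOperation in the comment above) can then render every model the same way. All names here are hypothetical, dictionaries stand in for BsonDocument, and RenderArgs<TDocument> is omitted for brevity.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CreateSearchIndexModel.
public class SearchIndexModelSketch
{
    private readonly Dictionary<string, object>? _definition;

    // Used by callers that supply a raw definition document.
    public SearchIndexModelSketch(string name, Dictionary<string, object> definition)
    {
        Name = name;
        _definition = definition ?? throw new ArgumentNullException(nameof(definition));
    }

    // Used by subclasses that build their definition from typed properties.
    protected SearchIndexModelSketch(string name)
    {
        Name = name;
    }

    public string Name { get; }

    public virtual Dictionary<string, object> Definition =>
        _definition ?? throw new InvalidOperationException("No definition was supplied.");

    // Default: the rendered document is exactly the supplied definition.
    public virtual Dictionary<string, object> Render() => Definition;
}

// Hypothetical vector index model: its definition only exists after rendering.
public sealed class VectorSearchIndexModelSketch : SearchIndexModelSketch
{
    public VectorSearchIndexModelSketch(string name, string fieldPath) : base(name) => FieldPath = fieldPath;

    public string FieldPath { get; }

    // Overridden so the exception message is accurate for this kind of model.
    public override Dictionary<string, object> Definition =>
        throw new NotSupportedException("Vector index definitions are rendered from the model's properties.");

    public override Dictionary<string, object> Render() => new()
    {
        ["fields"] = new List<Dictionary<string, object>>
        {
            new Dictionary<string, object> { ["type"] = "vector", ["path"] = FieldPath },
        },
    };
}

public static class CreateIndexesSketch
{
    // Uniform handling: no type checks, every model renders itself.
    public static List<Dictionary<string, object>> RenderAll(IEnumerable<SearchIndexModelSketch> models) =>
        models.Select(m => new Dictionary<string, object> { ["name"] = m.Name, ["definition"] = m.Render() }).ToList();
}
```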

BorisDog · 2026-01-09T21:25:00Z

tests/MongoDB.Driver.Tests/Search/AutoEmbedVectorSearchTests.cs

+        ]);
+
+        _collection.SearchIndexes.CreateOne(new CreateAutoEmbeddingVectorSearchIndexModel<Movie>(
+            e => e.Plot, _autoEmbedIndexName, "voyage-4", filterFields: [e => e.Runtime, e => e.Year]));


Should we introduce a temporary expectation for a failure here?

CSHARP-5763: Auto-embedding for vector search

07b4a96


ajcvickers requested review from BorisDog and Copilot January 6, 2026 14:04

ajcvickers requested a review from a team as a code owner January 6, 2026 14:04

Copilot started reviewing on behalf of ajcvickers January 6, 2026 14:05

ajcvickers added the feature label (Adds new user-facing functionality.) Jan 6, 2026

Copilot AI reviewed Jan 6, 2026


Fixes for Copilot feedback.

653a923

BorisDog requested a review from adelinowona January 6, 2026 16:58

BorisDog requested changes Jan 7, 2026


Updates based on community server testing.

92a1f02

BorisDog requested changes Jan 7, 2026


ajcvickers added 3 commits January 8, 2026 15:14

- Cleanup and testing of negative cases

e2e42d8

Create subclasses for different kinds of vector index.

baf4731

ajcvickers requested a review from BorisDog January 8, 2026 16:21

BorisDog requested changes Jan 8, 2026


tests/MongoDB.Driver.Tests/Search/AtlasSearchIndexManagmentTests.cs (outdated, resolved)

src/MongoDB.Driver/CreateVectorSearchIndexModelBase.cs (outdated, resolved)

src/MongoDB.Driver/CreateVectorSearchIndexModel.cs (outdated, resolved)

Review updates.

343199a

ajcvickers requested a review from BorisDog January 9, 2026 09:47

Updated ordering in test baseline.

df5e499

BorisDog requested changes Jan 9, 2026


ajcvickers · Jan 7, 2026 (edited)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

BorisDog left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BorisDog left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

ajcvickers Jan 7, 2026 •

edited

Loading