Skip to content

Conversation

@ajcvickers
Copy link
Contributor

Note that I have not been able to test this against a server that fully supports auto-embedding. The version I have access to throws for a variety of cases, has no available embedding models, and index creation never completes.

Note that I have not been able to test this against a server that fully supports auto-embedding. The version I have access to throws for a variety of cases, has no available embedding models, and index creation never completes.
@ajcvickers ajcvickers requested review from BorisDog and Copilot January 6, 2026 14:04
@ajcvickers ajcvickers requested a review from a team as a code owner January 6, 2026 14:04
@ajcvickers ajcvickers added the feature Adds new user-facing functionality. label Jan 6, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for auto-embedding in vector search indexes, allowing MongoDB Atlas to automatically create vector embeddings from text fields using specified embedding models (e.g., "voyage-4"). The implementation includes new API constructors, query vector handling for text input, and comprehensive test coverage.

Key changes:

  • New VectorEmbeddingModality enum to specify the type of data being embedded
  • QueryVector now supports text strings for auto-embedding indexes
  • CreateVectorSearchIndexModel has new constructors for auto-embedding indexes with support for compression profiles and explicit compression settings

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
VectorEmbeddingModality.cs New enum defining modality types for auto-embedding (currently only Text)
QueryVector.cs Added string constructor and implicit operator for text-based queries
PipelineStageDefinitionBuilder.cs Modified to use "query" field for text vs "queryVector" for arrays
CreateVectorSearchIndexModel.cs Added auto-embedding constructors, compression profile support, and updated rendering logic
VectorSearchTests.cs Added test for auto-embedding vector search with new Movie class
AtlasSearchIndexManagmentTests.cs Added comprehensive tests for auto-embedding index creation with various options and renamed existing tests for consistency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@BorisDog BorisDog requested a review from adelinowona January 6, 2026 16:58
/// indexes, this is only used when specifying explicit field compression using <see cref="Quantization"/>.
/// </summary>
public int Dimensions { get; }
public int Dimensions { get; init; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these fields be called CompressionDimensions and CompressionQuantization similar to CompressionProfileName ?
I think we need to make it more clear that this translates to compression and compression.quantization fields.

Another thought: would wrapping this in Compression object with

CompressionProfile  :  Compression
{
     string  ProfileName;
}
CompressionQuantization : Compression
{
    int Dimensions;
    Quantization Quantization
}

be an overkill ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll follow up on this. I initially created these as separate things in the API, but then I realized when documenting them that they were the same, except one said "for auto-embedding" and the other "not for auto-embedding." As far as I can tell, Quantization and Dimensions have the same meaning either way, so duplicating these properties didn't seem helpful.

I can add to the documentation what structure they end up in the BSON in each case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this code for now, since compression is not included in this release.


/// <summary>
/// Type of automatic vector quantization for your vectors.
/// Type of automatic vector quantization for your vectors. At most one of <see cref="Quantization"/> and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that VectorQuantization is reusable across fields.quantization and fields.compression.quantization fields.
At this point looks like we need to add binary_rescore value to VectorQuantization?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this isn't clear, I will follow up on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not doing this for now, since current release doesn't include compression. We can re-visit once that part is finalized.

else
{
vectorField.Add("quantization", BsonString.Create(Quantization.ToString()?.ToLower()));
vectorField.Add("type", "autoEmbed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we are hiding type, do we want to consider having
CreateVectorSearchIndexModel
and
CreateVectorSearchAutoEmbedIndexModel
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I'm going to assume all binary breaking changes are off the table, since otherwise this API would not be the compromise it is in the first place. (If binary changes are allowed then a) we should re-do the part of this API that is already checked in to clean it up, and b_) we should probably introduce a new subclass for vector indexes, which can have one or two subclasses for the kinds of vector indexes.)

Given that, we can create a new type which means that:

  • Much of the existing configuration for vector indexes will be duplicated.
  • When we add something new to vector indexes in the future, then it will likely need to be duplicated as well.

Or we can keep the same time which means that:

  • Some property combinations will (currently) not be valid.
  • It's potentially unclear which kind of vector index you are creating.

I opted for the second choice since, on balance, it seems a better long-term choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we didn't release typed index builders (CSHARP-5717) yet. So we got some flexibility here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied to wrong comment yesterday: Pushed a new design with subclasses for different vector index types. We will have to pull members down into the base type as they are also supported in auto-embed indexes.

{
Ensure.IsNotNull(bsonText, nameof(bsonText));

Vector = bsonText;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can cover these changes in QueryVectorTests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

var vectorSearchOperator = new BsonDocument
{
{ "queryVector", queryVector.Vector },
{ queryData is BsonString ? "query" : "queryVector", queryData },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through the spec, looks like model is not strictly required as of now, but might be in the future. Do we want to add it later or now?

Copy link
Contributor Author

@ajcvickers ajcvickers Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem useful to me--it becomes a "quiz API". That is, I know that you need to pass this as the model, because that is how you created the index, but I'm going to ask you to tell me anyway, and if you get it right, I will proceed, while if you get it wrong, I will stop. That is generally not a useful thing to do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed up privately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added model to options.

else
{
vectorField.Add("quantization", BsonString.Create(Quantization.ToString()?.ToLower()));
vectorField.Add("type", "autoEmbed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we didn't release typed index builders (CSHARP-5717) yet. So we got some flexibility here.

var vectorSearchOperator = new BsonDocument
{
{ "queryVector", queryVector.Vector },
{ queryData is BsonString ? "query" : "queryVector", queryData },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed up privately.

…elease.

- Changing QueryVector to take string instead of BsonString
- Added model to vector query options
- Fixed lookup for index status on community server
- Added more query tests
- Changed query tests to run on small set of documents that can be queried in a reasonable amount of time.
@ajcvickers ajcvickers requested a review from BorisDog January 8, 2026 16:21
Copy link
Contributor

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments

@ajcvickers ajcvickers requested a review from BorisDog January 9, 2026 09:47
Copy link
Contributor

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some failing unittests.

}

/// <inheritdoc/>
internal override BsonDocument Render(RenderArgs<TDocument> renderArgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for CreateSearchIndexModel:
Should we introduce
BsonDocument Render(RenderArgs<TDocument> renderArgs) => Definition
for more uniform handling in CreateCreateIndexesOperation?

Also now that Render is internal, the exception in CreateSearchIndexModel.Definition is not accurate. Is it possible to override that property and throw there? I think making a property is acceptable as breaking change?

]);

_collection.SearchIndexes.CreateOne(new CreateAutoEmbeddingVectorSearchIndexModel<Movie>(
e => e.Plot, _autoEmbedIndexName, "voyage-4", filterFields: [e => e.Runtime, e => e.Year]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we introduce a temporary expectation for a failure here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Adds new user-facing functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants