Skip to content

Conversation

ajcvickers
Copy link
Contributor

@ajcvickers ajcvickers commented Sep 2, 2025

Again as discussed with Boris, this has been update to be a typed builder implementation for Atlas indexes.


Original:

As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.

@ajcvickers ajcvickers requested a review from BorisDog September 2, 2025 13:41
@ajcvickers ajcvickers requested a review from a team as a code owner September 2, 2025 13:41
As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.
@ajcvickers ajcvickers marked this pull request as draft September 5, 2025 15:14
@ajcvickers
Copy link
Contributor Author

@BorisDog Can you take a look at this and see if it is going in the right direction? If so, then I'll do the non-vector search indexes as well. (Put this on the wrong PR before.)

@ajcvickers ajcvickers changed the title CSHARP-5717: Introduce basic vector enum types CSHARP-5717: Typed builders for Atlas indexes Sep 8, 2025
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks fine. See comments.

public class CreateAtlasVectorIndexModel<TDocument> : CreateSearchIndexModel
{
private readonly RenderArgs<TDocument> _renderArgs
= new(BsonSerializer.LookupSerializer<TDocument>(), BsonSerializer.SerializerRegistry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentSerializer should come from the collection.DocumentSerializer and not from the registry.

That's why we have Render methods with RenderArgs so that the document serializer can be passed in later when it is known.

=> SearchIndexType.VectorSearch;

/// <inheritdoc/>
public override BsonDocument Definition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit odd to see a property getter doing so much work. Usually properties simply return an existing value.

But... making this a method would be a breaking change.

{
get
{
if (base.Definition != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't really happen right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean. The idea here is that the type is immutable, and we don't want to re-create the document every time the property is accessed--as you mentioned elsewhere, ideally this should not be a property. So, the first time this is called, the definition stored in the base is null. It is then built and cached so it doesn't have to be built again if the property is accessed again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't cache the result of Render because if Render is called again with different RenderArgs the result could be different.

I think you should throw an exception if the Document property is accessed on this subclass. Probably the very existence of this property is dubious.

{ "type", BsonString.Create("vector") },
{ "path", Field.Render(_renderArgs).FieldName },
{ "numDimensions", BsonInt32.Create(Dimensions) },
{ "similarity", BsonString.Create(similarityValue) },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call BsonString.Create or BsonInt32.Create.

There are implicit conversions to simplify code like this.

}
}

base.Definition = new BsonDocument { { "fields", BsonArray.Create(fieldDocuments) } };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have made the fieldDocuments a BsonArray in the first place. No need for the intermediate List.

var vectorField = new BsonDocument
{
{ "type", BsonString.Create("vector") },
{ "path", Field.Render(_renderArgs).FieldName },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really possible to use a cached renderArgs.

This implies that this functionality should really be in a Render method that TAKES a renderArgs.

How to do that given that this is a property is a conundrum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would someone have done this correctly (that is, create the document with the correct serializer) before my changes? What happens if they do it wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rstam I've been pondering this some more. Given that we are building metadata for the server here, rather than user documents, shouldn't we be using a standard form of serialization always? In other words, what is the scenario where this metadata should be built with custom serializers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually use serializers to build commands, just for data.

{
get
{
if (base.Definition != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't cache the result of Render because if Render is called again with different RenderArgs the result could be different.

I think you should throw an exception if the Document property is accessed on this subclass. Probably the very existence of this property is dubious.

var vectorField = new BsonDocument
{
{ "type", BsonString.Create("vector") },
{ "path", Field.Render(_renderArgs).FieldName },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't usually use serializers to build commands, just for data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants