-
Notifications
You must be signed in to change notification settings - Fork 1.3k
CSHARP-5717: Typed builders for Atlas indexes #1769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.
3e80e4e
to
5c2e17e
Compare
@BorisDog Can you take a look at this and see if it is going in the right direction? If so, then I'll do the non-vector search indexes as well. (Put this on the wrong PR before.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks fine. See comments.
public class CreateAtlasVectorIndexModel<TDocument> : CreateSearchIndexModel | ||
{ | ||
private readonly RenderArgs<TDocument> _renderArgs | ||
= new(BsonSerializer.LookupSerializer<TDocument>(), BsonSerializer.SerializerRegistry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentSerializer
should come from the collection.DocumentSerializer
and not from the registry.
That's why we have Render
methods with RenderArgs
so that the document serializer can be passed in later when it is known.
=> SearchIndexType.VectorSearch; | ||
|
||
/// <inheritdoc/> | ||
public override BsonDocument Definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit odd to see a property getter doing so much work. Usually properties simply return an existing value.
But... making this a method would be a breaking change.
{ | ||
get | ||
{ | ||
if (base.Definition != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't really happen right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean. The idea here is that the type is immutable, and we don't want to re-create the document every time the property is accessed--as you mentioned elsewhere, ideally this should not be a property. So, the first time this is called, the definition stored in the base is null. It is then built and cached so it doesn't have to be built again if the property is accessed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't cache the result of Render
because if Render
is called again with different RenderArgs
the result could be different.
I think you should throw an exception if the Document
property is accessed on this subclass. Probably the very existence of this property is dubious.
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, | ||
{ "numDimensions", BsonInt32.Create(Dimensions) }, | ||
{ "similarity", BsonString.Create(similarityValue) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call BsonString.Create
or BsonInt32.Create
.
There are implicit conversions to simplify code like this.
} | ||
} | ||
|
||
base.Definition = new BsonDocument { { "fields", BsonArray.Create(fieldDocuments) } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have made the fieldDocuments
a BsonArray
in the first place. No need for the intermediate List
.
var vectorField = new BsonDocument | ||
{ | ||
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really possible to use a cached renderArgs
.
This implies that this functionality should really be in a Render
method that TAKES a renderArgs
.
How to do that given that this is a property is a conundrum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would someone have done this correctly (that is, create the document with the correct serializer) before my changes? What happens if they do it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam I've been pondering this some more. Given that we are building metadata for the server here, rather than user documents, shouldn't we be using a standard form of serialization always? In other words, what is the scenario where this metadata should be built with custom serializers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't usually use serializers to build commands, just for data.
{ | ||
get | ||
{ | ||
if (base.Definition != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't cache the result of Render
because if Render
is called again with different RenderArgs
the result could be different.
I think you should throw an exception if the Document
property is accessed on this subclass. Probably the very existence of this property is dubious.
var vectorField = new BsonDocument | ||
{ | ||
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't usually use serializers to build commands, just for data.
Again as discussed with Boris, this has been update to be a typed builder implementation for Atlas indexes.
Original:
As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.