.Net: Add PostgresVectorStore Memory connector. #9324

Open
wants to merge 73 commits into base: main
Changes from 35 commits
Commits
8778d5f
Add PostgresVectorStore Memory connector.
Oct 18, 2024
ddad99a
Add UpsertBatch, GetBatch, and DeleteBatch
Oct 18, 2024
5447815
Remove unused CreateMapping
Oct 18, 2024
7533f8c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 18, 2024
9a4f836
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 21, 2024
68a000e
Add vector search to PostgresVectorStore
Oct 22, 2024
317f6af
create index on collection creation
Oct 23, 2024
f4f5ba2
Support Guid, test distance functions
Oct 23, 2024
2acf118
Format tests
Oct 23, 2024
5db2c59
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 23, 2024
f4b4dc5
Add service and kernel extensions
Oct 24, 2024
5c58400
Default to Euclidean distance if no distance function is specified
Oct 24, 2024
8ea21cd
Add Postgres sample to concepts
Oct 24, 2024
4dcd222
Add docs for setting configuration in samples\Concepts
Oct 24, 2024
74b3764
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
5c3e63f
Enforce dimension size in index creation
Oct 24, 2024
6d9f1fd
Create index for CreateTableIfNotExistsAsyc
Oct 24, 2024
b4266cc
Log warning when index not created due to dimensions
Oct 24, 2024
f86613a
Refactor and tests; make SqlBuilder internal
Oct 24, 2024
716b794
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 24, 2024
8d8283b
Remove old migration note
Oct 25, 2024
89027fc
Fix docstring
Oct 25, 2024
8f45d9c
Use parameter for tableName
Oct 25, 2024
48811bd
Fix support for DateTime, DateTimeOffset
Oct 25, 2024
1d6082d
Fix warnings in test
Oct 25, 2024
eb0a683
Remove kernel extensions, improve service extensions
Oct 25, 2024
a66d835
Make PostgresSqlCommandInfo internal
Oct 25, 2024
53f1009
Default to a Hnsw index
Oct 25, 2024
08ea55f
Default to cosine distance
Oct 25, 2024
319648b
Consistently use includeVectors
Oct 25, 2024
5b52bdc
Simplify AsyncEnumerable return
Oct 25, 2024
cd845ee
Pass properties instead of full definition
Oct 25, 2024
1d09a21
Throw instead of log for too high dimensionality
Oct 25, 2024
74e9757
Remove DefaultVectorSize
Oct 25, 2024
ad5628c
Remove unused using statements
Oct 25, 2024
dbf1aef
Remove VectorStore constructor that creates datsaource
Oct 25, 2024
a355bf7
Fix duplicate mapper call
Oct 25, 2024
e499a80
Fix docstring typo
Oct 25, 2024
c95e2b3
Comment clarifying that multiple keys should be previously validated
Oct 25, 2024
9d972b3
Refactor ExecuteNonQueryAsync calls to reduce code dupe
Oct 25, 2024
6eb3793
Forward Schema option.
Oct 25, 2024
ed59fed
Make PostgresVectorStoreDbClient internal
Oct 25, 2024
1749adb
Support more enumerable types
Oct 25, 2024
ea7b01c
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 25, 2024
86486d7
Refactor to support default + transactions
Oct 28, 2024
b9b4a44
Fix issue with converting readonly array on upsert
Oct 28, 2024
c53a8ee
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 28, 2024
97ef60a
Fix SLN merge error
Oct 28, 2024
81e1805
Improve error handling
Oct 28, 2024
a587260
Avoid CA1859 in test class
Oct 28, 2024
e8fe800
Account for ngpsql missing func in .net std 2.0
Oct 28, 2024
96c088e
Fix servicecollection tests
Oct 28, 2024
0fc76f6
Logic for dimension max moved and tested elsewhere
Oct 28, 2024
266310b
Remove unused using statement
Oct 28, 2024
08f110c
Remove logger from PostgresVectorStoreRecordCollection
Oct 29, 2024
26516c5
Merge branch 'main' into feature/postgres-vector-store-dotnet
lossyrob Oct 30, 2024
5b44a80
Use Flat instead of None index kind
Oct 30, 2024
b9b2487
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Oct 31, 2024
24577a0
Remove unnecessary overloads
Oct 31, 2024
60d6512
Change tests to be true to name
Oct 31, 2024
5a66a13
Remove reduntant key type based test
Oct 31, 2024
581b6ab
Remove unnecessary overloads
Oct 31, 2024
494a0d4
Better error handling for IAsyncEnumerable
Oct 31, 2024
5f19889
Default to Flat (no index) instead of Hnsw
Oct 31, 2024
62ac8eb
Add enumerable to record mapper test
Oct 31, 2024
364b592
Remove unused fixture properties
Oct 31, 2024
bf58cab
Test StoragePropertyName in sql builder tests
Oct 31, 2024
aa592de
Remove dynamic from integration test
Nov 1, 2024
9a3b216
Add test to read from manually inserted record
Nov 1, 2024
1ee09c1
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
b037075
Formatting, spelling
Nov 1, 2024
29d91ba
Merge remote-tracking branch 'upstream/main' into feature/postgres-ve…
Nov 1, 2024
c2937e0
Fix test.
Nov 1, 2024
3 changes: 3 additions & 0 deletions dotnet/SK-dotnet.sln
@@ -402,6 +402,8 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SemanticKernel.AotTests", "
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Connectors.Postgres.UnitTests", "src\Connectors\Connectors.Postgres.UnitTests\Connectors.Postgres.UnitTests.csproj", "{232E1153-6366-4175-A982-D66B30AAD610}"
EndProject
+Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "Process.Utilities.UnitTests", "src\Experimental\Process.Utilities.UnitTests\Process.Utilities.UnitTests.csproj", "{DAC54048-A39A-4739-8307-EA5A291F2EA0}"
+EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -1195,6 +1197,7 @@ Global
{6ECFDF04-2237-4A85-B114-DAA34923E9E6} = {5D4C0700-BBB5-418F-A7B2-F392B9A18263}
{39EAB599-742F-417D-AF80-95F90376BB18} = {831DDCA2-7D2C-4C31-80DB-6BDB3E1F7AE0}
{232E1153-6366-4175-A982-D66B30AAD610} = {0247C2C9-86C3-45BA-8873-28B0948EDC0C}
+{DAC54048-A39A-4739-8307-EA5A291F2EA0} = {0D8C6358-5DAA-4EA6-A924-C268A9A21BC9}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {FBDC56A3-86AD-4323-AA0F-201E59123B83}
@@ -6,6 +6,7 @@
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using Microsoft.SemanticKernel.Connectors.Postgres;
+using Npgsql;

namespace Memory;

@@ -44,7 +45,7 @@ public async Task ExampleWithDIAsync()

// Initialize the Postgres docker container via the fixtures and register the Postgres VectorStore.
await PostgresFixture.ManualInitializeAsync();
-kernelBuilder.AddPostgresVectorStore(ConnectionString);
+kernelBuilder.Services.AddPostgresVectorStore(ConnectionString);

// Register the test output helper common processor with the DI container.
kernelBuilder.Services.AddSingleton<ITestOutputHelper>(this.Output);
@@ -73,7 +74,10 @@ public async Task ExampleWithoutDIAsync()

// Initialize the Postgres docker container via the fixtures and construct the Postgres VectorStore.
await PostgresFixture.ManualInitializeAsync();
-var vectorStore = new PostgresVectorStore(ConnectionString);
+var dataSourceBuilder = new NpgsqlDataSourceBuilder(ConnectionString);
+dataSourceBuilder.UseVector();
+await using var dataSource = dataSourceBuilder.Build();
+var vectorStore = new PostgresVectorStore(dataSource);

// Create the common processor that works for any vector store.
var processor = new VectorStore_VectorSearch_MultiStore_Common(vectorStore, textEmbeddingGenerationService, this.Output);
@@ -43,9 +43,11 @@ internal interface IPostgresVectorStoreCollectionSqlBuilder
/// </summary>
/// <param name="schema">The schema of the table.</param>
/// <param name="tableName">The name of the table.</param>
-/// <param name="vectorProperty">The vector property to create an index for.</param>
+/// <param name="vectorColumnName">The name of the vector column.</param>
+/// <param name="indexKind">The kind of index to create.</param>
+/// <param name="distanceFunction">The distance function to use for the index.</param>
/// <returns>The built SQL command info.</returns>
-PostgresSqlCommandInfo BuildCreateVectorIndexCommand(string schema, string tableName, VectorStoreRecordVectorProperty vectorProperty);
+PostgresSqlCommandInfo BuildCreateVectorIndexCommand(string schema, string tableName, string vectorColumnName, string indexKind, string distanceFunction);

/// <summary>
/// Builds a SQL command to drop a table in the Postgres vector store.
@@ -127,8 +129,8 @@ internal interface IPostgresVectorStoreCollectionSqlBuilder
/// <param name="vectorValue">The vector to match.</param>
/// <param name="filter">The filter conditions for the query.</param>
/// <param name="skip">The number of records to skip.</param>
-/// <param name="withEmbeddings">Specifies whether to include embeddings in the result.</param>
+/// <param name="includeVectors">Specifies whether to include vectors in the result.</param>
/// <param name="limit">The maximum number of records to return.</param>
/// <returns>The built SQL command info.</returns>
-PostgresSqlCommandInfo BuildGetNearestMatchCommand(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, VectorStoreRecordVectorProperty vectorProperty, Vector vectorValue, VectorSearchFilter? filter, int? skip, bool withEmbeddings, int limit);
+PostgresSqlCommandInfo BuildGetNearestMatchCommand(string schema, string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, VectorStoreRecordVectorProperty vectorProperty, Vector vectorValue, VectorSearchFilter? filter, int? skip, bool includeVectors, int limit);
}
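
Note on `BuildGetNearestMatchCommand`: the diff only shows the interface, so the following is a minimal sketch of the kind of SQL such a builder might emit, not the PR's implementation. The class name, the string keys for the distance functions, and the simplified column-name parameters are all assumptions made for illustration. The `<=>` (cosine distance) and `<->` (Euclidean distance) operators are pgvector's, and `sk_pg_distance` is the `DistanceColumnName` constant from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the PR's SQL builder.
public static class NearestMatchSqlSketch
{
    // pgvector distance operators, keyed by illustrative distance-function names.
    private static readonly Dictionary<string, string> s_operators = new()
    {
        ["CosineDistance"] = "<=>",
        ["EuclideanDistance"] = "<->",
    };

    public static string Build(
        string schema,
        string tableName,
        IReadOnlyList<string> dataColumns,
        string vectorColumn,
        string distanceFunction,
        bool includeVectors,
        int? skip,
        int limit)
    {
        if (!s_operators.TryGetValue(distanceFunction, out var op))
        {
            throw new NotSupportedException($"Distance function '{distanceFunction}' is not handled by this sketch.");
        }

        // Quote identifiers. A real builder would also escape embedded quotes.
        var columns = dataColumns.Select(c => $"\"{c}\"").ToList();

        // The vector column is only selected when the caller asks for it.
        if (includeVectors)
        {
            columns.Add($"\"{vectorColumn}\"");
        }

        var columnList = string.Join(", ", columns);

        // $1 is the positional parameter that carries the query vector.
        return $"SELECT {columnList}, \"{vectorColumn}\" {op} $1 AS \"sk_pg_distance\" "
            + $"FROM \"{schema}\".\"{tableName}\" "
            + $"ORDER BY \"sk_pg_distance\" LIMIT {limit} OFFSET {skip ?? 0}";
    }
}
```

The distance is still computed when `includeVectors` is false; the flag only controls whether the raw vector column is returned to the caller.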
@@ -4,15 +4,21 @@
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;
+using Npgsql;
using Pgvector;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

/// <summary>
/// Internal interface for client managing postgres database operations.
/// </summary>
-public interface IPostgresVectorStoreDbClient
+internal interface IPostgresVectorStoreDbClient
{
+/// <summary>
+/// The <see cref="NpgsqlDataSource"/> used to connect to the database.
+/// </summary>
+public NpgsqlDataSource DataSource { get; }
+
/// <summary>
/// Check if a table exists.
/// </summary>
@@ -28,22 +34,14 @@ public interface IPostgresVectorStoreDbClient
/// <returns>A group of tables.</returns>
IAsyncEnumerable<string> GetTablesAsync(CancellationToken cancellationToken = default);
/// <summary>
-/// Create a table.
+/// Create a table. Also creates an index on vector columns if the table has vector properties defined.
/// </summary>
/// <param name="tableName">The name assigned to a table of entries.</param>
-/// <param name="recordDefinition">The record definition of the table.</param>
+/// <param name="properties">The properties of the record definition that define the table.</param>
/// <param name="ifNotExists">Specifies whether to include IF NOT EXISTS in the command.</param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
/// <returns></returns>
-Task CreateTableAsync(string tableName, VectorStoreRecordDefinition recordDefinition, bool ifNotExists = true, CancellationToken cancellationToken = default);
-
-/// <summary>
-/// Create a vector index.
-/// </summary>
-/// <param name="tableName">The name assigned to a table of entries.</param>
-/// <param name="vectorProperty">The vector property to create an index for.</param>
-/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests. The default is <see cref="CancellationToken.None"/>.</param>
-Task CreateVectorIndexAsync(string tableName, VectorStoreRecordVectorProperty vectorProperty, CancellationToken cancellationToken = default);
+Task CreateTableAsync(string tableName, IReadOnlyList<VectorStoreRecordProperty> properties, bool ifNotExists = true, CancellationToken cancellationToken = default);

/// <summary>
/// Drop a table.
@@ -1,6 +1,7 @@
// Copyright (c) Microsoft. All rights reserved.

using Microsoft.Extensions.VectorData;
+using Npgsql;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

@@ -14,10 +15,10 @@ public interface IPostgresVectorStoreRecordCollectionFactory
/// </summary>
/// <typeparam name="TKey">The data type of the record key.</typeparam>
/// <typeparam name="TRecord">The data model to use for adding, updating and retrieving data from storage.</typeparam>
-/// <param name="client">The Postgres client.</param>
+/// <param name="dataSource">The Postgres data source.</param>
/// <param name="name">The name of the collection to connect to.</param>
/// <param name="vectorStoreRecordDefinition">An optional record definition that defines the schema of the record type. If not present, attributes on <typeparamref name="TRecord"/> will be used.</param>
/// <returns>The new instance of <see cref="IVectorStoreRecordCollection{TKey, TRecord}"/>.</returns>
-IVectorStoreRecordCollection<TKey, TRecord> CreateVectorStoreRecordCollection<TKey, TRecord>(IPostgresVectorStoreDbClient client, string name, VectorStoreRecordDefinition? vectorStoreRecordDefinition)
+IVectorStoreRecordCollection<TKey, TRecord> CreateVectorStoreRecordCollection<TKey, TRecord>(NpgsqlDataSource dataSource, string name, VectorStoreRecordDefinition? vectorStoreRecordDefinition)
where TKey : notnull;
}
@@ -2,6 +2,7 @@

using System;
using System.Collections.Generic;
+using Microsoft.Extensions.VectorData;

namespace Microsoft.SemanticKernel.Connectors.Postgres;

@@ -35,20 +36,29 @@ internal static class PostgresConstants
typeof(decimal),
typeof(decimal?),
typeof(string),
typeof(DateTime),
typeof(DateTime?),
typeof(DateTimeOffset),
Review comment (Contributor):
PostgresVectorStoreRecordPropertyMapping does not have a DateTimeOffset mapping, but does have a DateTime mapping.
typeof(DateTimeOffset?),
typeof(Guid),
typeof(Guid?),
typeof(byte[]),
typeof(List<bool>),
typeof(List<short>),
typeof(List<int>),
typeof(List<long>),
typeof(List<float>),
typeof(List<double>),
typeof(List<decimal>),
typeof(List<string>),
typeof(List<DateTimeOffset>),
];

/// <summary>A <see cref="HashSet{T}"/> of types that enumerable data properties on the provided model may use as their element types.</summary>
public static readonly HashSet<Type> SupportedEnumerableDataElementTypes =
[
typeof(bool),
typeof(short),
typeof(int),
typeof(long),
typeof(float),
typeof(double),
typeof(decimal),
typeof(string),
typeof(DateTime),
typeof(DateTimeOffset),
typeof(Guid),
];

/// <summary>A <see cref="HashSet{T}"/> of types that vector properties on the provided model may have.</summary>
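
Note on the review comment above about the missing `DateTimeOffset` mapping: a mismatch between the supported-type set and the type-to-column mapping is easy to catch with a consistency check. The sketch below is an assumption-laden illustration, not code from the PR: the class name, the trimmed-down type set, and the column-type map are hypothetical stand-ins for `PostgresConstants.SupportedDataTypes` and `PostgresVectorStoreRecordPropertyMapping`. The mapping deliberately omits `DateTimeOffset` to reproduce the situation the reviewer describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the PR's constants and property mapping.
public static class TypeMappingCheckSketch
{
    // Types the connector claims to accept on data properties (shortened for the example).
    public static readonly HashSet<Type> SupportedDataTypes = new()
    {
        typeof(string),
        typeof(DateTime),
        typeof(DateTimeOffset),
        typeof(Guid),
    };

    // Illustrative .NET-to-Postgres column types; DateTimeOffset is left out on purpose.
    public static readonly Dictionary<Type, string> ColumnTypes = new()
    {
        [typeof(string)] = "TEXT",
        [typeof(DateTime)] = "TIMESTAMP",
        [typeof(Guid)] = "UUID",
    };

    // Returns every supported type that has no column-type mapping.
    public static IReadOnlyList<Type> FindUnmappedTypes() =>
        SupportedDataTypes.Where(t => !ColumnTypes.ContainsKey(t)).ToList();
}
```

A unit test that asserts `FindUnmappedTypes()` is empty would fail here until a `DateTimeOffset` entry (for example `TIMESTAMPTZ`) is added to the map.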
@@ -64,4 +74,15 @@ internal static class PostgresConstants
/// <summary>The name of the column that returns distance value in the database.</summary>
/// <remarks>It is used in the similarity search query. Must not conflict with model property.</remarks>
public const string DistanceColumnName = "sk_pg_distance";
+
+/// <summary>The default index kind.</summary>
+public const string DefaultIndexKind = IndexKind.Hnsw;
Review comment (Contributor):
I think it's also OK to use Flat here as the default. When I was suggesting using Hnsw earlier, I didn't realise that postgres can also work without, and that's a totally acceptable default.
+
+/// <summary>The default distance function.</summary>
+public const string DefaultDistanceFunction = DistanceFunction.CosineDistance;
+
+public static readonly Dictionary<string, int> IndexMaxDimensions = new()
+{
+{ IndexKind.Hnsw, 2000 },
+};
}
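
Note on `IndexMaxDimensions` and the Flat-versus-Hnsw discussion: the sketch below shows one way the index kind, distance function, and dimension limit could combine when building a `CREATE INDEX` statement. It is an illustration under stated assumptions, not the PR's `BuildCreateVectorIndexCommand`: the class name, the extra `dimensions` parameter, the index naming scheme, and the plain string keys ("Hnsw", "Flat", "CosineDistance", "EuclideanDistance") are hypothetical. The `hnsw` access method and the `vector_cosine_ops` / `vector_l2_ops` operator classes are pgvector's.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the PR's SQL builder.
public static class VectorIndexSqlSketch
{
    // Mirrors the IndexMaxDimensions constant in the diff.
    private static readonly Dictionary<string, int> s_indexMaxDimensions = new()
    {
        ["Hnsw"] = 2000,
    };

    // pgvector operator classes, keyed by illustrative distance-function names.
    private static readonly Dictionary<string, string> s_operatorClasses = new()
    {
        ["CosineDistance"] = "vector_cosine_ops",
        ["EuclideanDistance"] = "vector_l2_ops",
    };

    // Returns the CREATE INDEX statement, or null when no index should be created.
    public static string? Build(string schema, string tableName, string vectorColumnName, string indexKind, string distanceFunction, int dimensions)
    {
        // "Flat" means no index: queries fall back to an exact sequential scan.
        if (indexKind == "Flat")
        {
            return null;
        }

        if (!s_indexMaxDimensions.TryGetValue(indexKind, out var maxDimensions))
        {
            throw new NotSupportedException($"Index kind '{indexKind}' is not handled by this sketch.");
        }

        // Fail loudly instead of silently skipping the index.
        if (dimensions > maxDimensions)
        {
            throw new ArgumentOutOfRangeException(nameof(dimensions), dimensions, $"{indexKind} indexes support at most {maxDimensions} dimensions.");
        }

        if (!s_operatorClasses.TryGetValue(distanceFunction, out var operatorClass))
        {
            throw new NotSupportedException($"Distance function '{distanceFunction}' is not handled by this sketch.");
        }

        return $"CREATE INDEX IF NOT EXISTS \"{tableName}_{vectorColumnName}_index\" "
            + $"ON \"{schema}\".\"{tableName}\" USING hnsw (\"{vectorColumnName}\" {operatorClass})";
    }
}
```

Throwing on oversized vectors matches the direction of the commit "Throw instead of log for too high dimensionality" in the list above.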
@@ -14,15 +14,15 @@ internal sealed class PostgresGenericDataModelMapper<TKey> : IVectorStoreRecordM
/// <summary>
/// Initializes a new instance of the <see cref="PostgresGenericDataModelMapper{TKey}"/> class.
/// /// </summary>
-/// <param name="propertyReader">A <see cref="VectorStoreRecordDefinition"/> that defines the schema of the data in the database.</param>
+/// <param name="propertyReader"><see cref="VectorStoreRecordPropertyReader"/> with helpers for reading vector store model properties and their attributes.</param>
public PostgresGenericDataModelMapper(VectorStoreRecordPropertyReader propertyReader)
{
Verify.NotNull(propertyReader);

this._propertyReader = propertyReader;

// Validate property types.
-this._propertyReader.VerifyDataProperties(PostgresConstants.SupportedDataTypes, supportEnumerable: false);
+this._propertyReader.VerifyDataProperties(PostgresConstants.SupportedDataTypes, PostgresConstants.SupportedEnumerableDataElementTypes);
this._propertyReader.VerifyVectorProperties(PostgresConstants.SupportedVectorTypes);
}
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel)
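
Note on the `VerifyDataProperties` change: the call now takes a second set, `SupportedEnumerableDataElementTypes`, in place of `supportEnumerable: false`. The diff does not show how `VectorStoreRecordPropertyReader` uses it, so the following is a rough sketch of one plausible check: accept a property type if it is a supported scalar, or an array or `IEnumerable<T>` whose element type is in the second set. The class name, method names, and shortened type sets are assumptions for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check for illustration; not VectorStoreRecordPropertyReader.
public static class PropertyTypeCheckSketch
{
    // Shortened stand-ins for the two sets in PostgresConstants.
    private static readonly HashSet<Type> s_supportedDataTypes = new()
    {
        typeof(int), typeof(string), typeof(DateTime), typeof(Guid),
    };

    private static readonly HashSet<Type> s_supportedElementTypes = new()
    {
        typeof(int), typeof(string),
    };

    public static bool IsSupported(Type propertyType)
    {
        // Scalars are checked first, so string is never treated as IEnumerable<char>.
        if (s_supportedDataTypes.Contains(propertyType))
        {
            return true;
        }

        Type? elementType = GetEnumerableElementType(propertyType);
        return elementType is not null && s_supportedElementTypes.Contains(elementType);
    }

    private static Type? GetEnumerableElementType(Type type)
    {
        if (type.IsArray)
        {
            return type.GetElementType();
        }

        // Include the type itself so that a property declared as IEnumerable<T> is recognized.
        Type? enumerable = type.GetInterfaces()
            .Append(type)
            .FirstOrDefault(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>));

        return enumerable?.GetGenericArguments()[0];
    }
}
```

With the shortened sets above, `List<int>`, `string[]`, and `IEnumerable<string>` pass, while `List<double>` and `object` are rejected.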

This file was deleted.