feat: Adding multi modal support for PGVectorStore #207

dishaprakash · 2025-04-29T08:51:44Z

feat: Adding multi modal support for PGVectorStore

Add new image search APIs similarity_search_image() and asimilarity_search_image().
Add add_images() and aadd_images() methods to add images to the vector store.
Add tests.

averikitsch · 2025-04-29T22:45:00Z

langchain_postgres/v2/async_vectorstore.py

+        gcs_uri = re.match("gs://(.*?)/(.*)", uri)
+        if gcs_uri:
+            bucket_name, object_name = gcs_uri.groups()
+            storage_client = storage.Client()


We may want to wrap this in a try/except block to provide a clearer error. Or do you think the existing error is clear enough when users are not running in a Google Cloud environment and have not set up credentials?
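
To make the try/except suggestion concrete, here is a minimal sketch, not the PR's implementation. The helper names `parse_gcs_uri` and `get_storage_client` and the wording of the error messages are assumptions for illustration; `storage.Client` and `google.auth.exceptions.DefaultCredentialsError` are the real library names quoted in this thread. The Google imports are kept inside the function so the module can still be imported when the GCS package is not installed.

```python
import re
from typing import Any, Optional, Tuple


def parse_gcs_uri(uri: str) -> Optional[Tuple[str, str]]:
    """Split a gs://bucket/object URI; return None for anything else."""
    match = re.match(r"gs://(.*?)/(.*)", uri)
    if match is None:
        return None
    return match.group(1), match.group(2)


def get_storage_client() -> Any:
    """Hypothetical helper: create a GCS client with clearer setup errors."""
    try:
        # Imported lazily so the GCS dependency can remain optional.
        from google.auth.exceptions import DefaultCredentialsError
        from google.cloud import storage
    except ImportError as exc:
        raise ImportError(
            "Reading gs:// URIs requires the google-cloud-storage package."
        ) from exc
    try:
        return storage.Client()
    except DefaultCredentialsError as exc:
        # Keep the original exception chained so no detail is lost.
        raise RuntimeError(
            "Could not read a gs:// URI: no Application Default Credentials "
            "were found. Set up credentials or pass a non-GCS image URI."
        ) from exc
```

Whether the extra wrapper is worth it depends on how readable the library's own error already is for users outside Google Cloud.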

The other LangChain packages don't run integration tests against third-party providers. We could mock this test, or just test this functionality downstream in our package.

Currently, this is the error:
google.auth.exceptions.DefaultCredentialsError: Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.
I think this is pretty descriptive. Let me know what you think.

The options for the tests are:

We can try making the images publicly accessible in a GCP project, which reportedly means we would not need credentials to fetch them.

We could also store the image directly and skip testing the GCS pathway.

Not test add_images at all.

What do you suggest?

If the GCS storage call can be easily mocked, let's go ahead and do that. If it can't, let's keep the test but skip it.
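
For reference, a rough sketch of what mocking the GCS call could look like when the client is passed in rather than created inside the method. `download_gcs_bytes` is a hypothetical helper, and the `bucket(...).blob(...).download_as_bytes()` chain is an assumption about how the object would be fetched, since the diff above only shows the call to `storage.Client()`. The test uses only `unittest.mock`, so it needs neither credentials nor the GCS package.

```python
import re
from typing import Any
from unittest.mock import MagicMock


def download_gcs_bytes(uri: str, storage_client: Any) -> bytes:
    """Hypothetical helper: fetch the bytes behind a gs://bucket/object URI."""
    match = re.match(r"gs://(.*?)/(.*)", uri)
    if match is None:
        raise ValueError(f"Not a GCS URI: {uri!r}")
    bucket_name, object_name = match.groups()
    blob = storage_client.bucket(bucket_name).blob(object_name)
    return blob.download_as_bytes()


def test_download_gcs_bytes_with_mock() -> None:
    # MagicMock stands in for storage.Client(), so no network call is made.
    fake_client = MagicMock()
    fake_blob = fake_client.bucket.return_value.blob.return_value
    fake_blob.download_as_bytes.return_value = b"fake-image-bytes"

    result = download_gcs_bytes("gs://my-bucket/images/cat.png", fake_client)

    assert result == b"fake-image-bytes"
    fake_client.bucket.assert_called_once_with("my-bucket")
    fake_client.bucket.return_value.blob.assert_called_once_with("images/cat.png")
```

If the client has to be created inside the method instead, the same idea applies by patching the factory function with `unittest.mock.patch`.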

The mock solutions may need more debugging, so for now I've removed the GCS URI from the tests. The other images, which are created locally, are still covered by the test.
I will recreate the GCS path testing in our libraries.

pyproject.toml

langchain_postgres/v2/async_vectorstore.py

eyurtsev · 2025-05-10T01:31:02Z

langchain_postgres/v2/async_vectorstore.py

+
+        web_uri = re.match(r"^(https?://).*", uri)
+        if web_uri:
+            response = requests.get(uri, stream=True)


This is an SSRF vulnerability.

eyurtsev · 2025-05-10T01:31:56Z

langchain_postgres/v2/async_vectorstore.py

+
+    async def aadd_images(
+        self,
+        uris: list[str],


Accepting URIs without safeguards is an SSRF vulnerability.

My understanding is that SSRF attacks are generally dealt with by the application layer. Is that correct, or is it more of a framework responsibility?

The way I'd think about this is:

What is the likelihood that users would expose this method directly as a web endpoint without any input validation? (It's pretty high in this case.)

Can anything surprising be done with this endpoint? (Yes, a malicious user could ask for the contents of /etc/passwd or some other file on the server or have it make a request to an internal network address.)

Given that this code is supposed to be optimized for production, there really isn't a reason to access the local file system.
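
As an illustration of the kind of safeguard being discussed, here is a minimal sketch that validates a URI before anything is fetched. `validate_image_uri`, `ALLOWED_SCHEMES`, and the choice to allow only `https` are assumptions for this example, not part of the PR. It rejects local paths and `file://` URIs through the scheme allowlist, and rejects hosts that resolve to loopback, private, or link-local addresses.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumption: only https is needed; local paths and file:// are never fetched.
ALLOWED_SCHEMES = {"https"}


def _resolve_addresses(host: str) -> list:
    """Return every IP address the host maps to (IP literals skip DNS)."""
    try:
        return [ipaddress.ip_address(host)]
    except ValueError:
        infos = socket.getaddrinfo(host, None)
        # Strip any IPv6 zone id ("fe80::1%eth0") before parsing.
        return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def validate_image_uri(uri: str) -> str:
    """Raise ValueError unless the URI points at a public https host."""
    parsed = urlparse(uri)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Scheme {parsed.scheme!r} is not allowed")
    if not parsed.hostname:
        raise ValueError("URI has no host")
    for address in _resolve_addresses(parsed.hostname):
        # is_global is False for loopback, private, and link-local ranges.
        if not address.is_global:
            raise ValueError(f"Host resolves to a non-public address: {address}")
    return uri
```

A check like this is only a first layer: the address is resolved again when the request is actually made, and redirects can point somewhere else, so a production version would also need to pin the resolved address and disable or re-validate redirects.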

eyurtsev · 2025-05-10T01:33:55Z

langchain_postgres/v2/async_vectorstore.py

 import uuid
 from typing import Any, Callable, Iterable, Optional, Sequence

 import numpy as np
+import requests
+from google.cloud import storage  # type: ignore


The dependency should be optional, not required, so the import cannot appear at module level in the global namespace.

The actual implementation should be written against a key-value store interface, not specifically Google Cloud Storage. You can use the LangChain key-value store abstraction to support cloud storage.
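
One possible reading of the key-value suggestion, sketched under assumptions: the vector store would take image keys and a store object instead of URIs, and would never talk to a storage backend directly. `ByteStoreLike` below is a hypothetical stand-in that mirrors only the `mget` method of `langchain_core.stores.BaseStore`; `DictByteStore` and `load_image_bytes` are illustrative names that exist in neither this PR nor LangChain.

```python
from typing import Dict, List, Optional, Protocol, Sequence


class ByteStoreLike(Protocol):
    """Stand-in for a str -> bytes key-value store exposing mget()."""

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]: ...


class DictByteStore:
    """Toy in-memory store; a GCS- or S3-backed store would expose the same method."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = dict(data)

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        return [self._data.get(key) for key in keys]


def load_image_bytes(store: ByteStoreLike, keys: Sequence[str]) -> List[bytes]:
    """Fetch image bytes by key, failing loudly on keys the store lacks."""
    blobs = store.mget(keys)
    missing = [key for key, blob in zip(keys, blobs) if blob is None]
    if missing:
        raise KeyError(f"No image found for keys: {missing}")
    return [blob for blob in blobs if blob is not None]
```

With this shape, the choice of backend and its credentials stay with the caller who constructs the store, and the vector store code needs no cloud-specific import.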

I've removed the import from the global namespace.

I'm not sure how I should use the key-value store in this case. Could you please point me to the right usage?

langchain_postgres/v2/async_vectorstore.py

averikitsch · 2025-07-02T22:30:30Z

At this time, we are not going to move forward with this implementation.

dishaprakash added 6 commits April 29, 2025 08:50

feat: Adding multi modal support for PGVectorStore

43f89a5

New poetry lock

97b387c

reformat fixes

3dc9cc5

format

b5bb4ff

add request

f9f5337

poetry lock

ffe8c7a

averikitsch reviewed Apr 29, 2025

dishaprakash and others added 13 commits April 30, 2025 19:38

Make GCS an extra dependency

f45e4de

Trial mock gcs url test

273a57b

Linter fix

3dfbad6

download gcs for testing

aaa1514

download gcs for testing

9b41ade

Remove GCS test

cc26044

remove gcs download

d68c75c

Fix

9efdac8

Upgrade poetry version in github actions workflow

92663b9

Upgrade poetry version in github actions workflow

5c35c6f

Fix test

a5399a4

Fix test

477a038

Merge branch 'main' into upstream_add

39dc8f1

dishaprakash marked this pull request as ready for review May 9, 2025 16:43

dishaprakash requested review from averikitsch and eyurtsev May 9, 2025 16:43

eyurtsev reviewed May 10, 2025

dishaprakash and others added 2 commits May 17, 2025 00:52

review changes

7532fdf

Merge branch 'main' into upstream_add

0287543

dishaprakash requested a review from eyurtsev May 22, 2025 09:34

averikitsch closed this Jul 2, 2025

averikitsch Apr 29, 2025

averikitsch Apr 29, 2025

dishaprakash Apr 30, 2025

averikitsch May 6, 2025

dishaprakash May 8, 2025

eyurtsev May 10, 2025

eyurtsev May 10, 2025

dishaprakash May 17, 2025

eyurtsev Jun 9, 2025

eyurtsev May 10, 2025

dishaprakash May 17, 2025
