community: CrateDB: Vector Store #27710
LGTM
Thanks!
LGTM
Before, the adapter used CrateDB's built-in `_score` field for ranking. Now, it uses the dedicated `vector_similarity()` function to compute the similarity between two vectors.
We don't need anything on top of it, i.e. we don't need this function and should instead use the value from CrateDB as is. The similarity is already in the (0, 1] interval, and dividing by math.sqrt(2) won't normalize it but will return a wrong result: for example, 1 will become approximately 0.707.
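To illustrate the point above, here is a minimal sketch in plain Python. The function names `rescaled_score` and `passthrough_score` are hypothetical and are not taken from the adapter; the sketch only assumes, as stated in the comment, that the similarity value returned by the database already lies in the (0, 1] interval.

```python
import math


def rescaled_score(similarity: float) -> float:
    """Hypothetical extra step under discussion: divide by sqrt(2)."""
    return similarity / math.sqrt(2)


def passthrough_score(similarity: float) -> float:
    """Suggested alternative: use the database value unchanged."""
    return similarity


if __name__ == "__main__":
    # Assumed sample similarities, already within (0, 1].
    for similarity in (1.0, 0.5, 0.1):
        print(
            f"similarity={similarity:.3f} "
            f"rescaled={rescaled_score(similarity):.3f} "
            f"passthrough={passthrough_score(similarity):.3f}"
        )
```

With the extra division, a perfect match (1.0) can never score higher than about 0.707, so the upper part of the range becomes unreachable. Passing the value through keeps the original (0, 1] scale.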
Dear @eyurtsev, may I humbly ask you if you could afford a few cycles to review our patches? Thanks in advance! With kind regards,
Hey! This adds a net-new community integration or feature, which has been replaced by dedicated integration packages. I'll close this PR if it's ok with you, and would recommend reopening with just docs updates, as well as registering your package in Here's the guide, and if you have questions, feel free to leave them in the comments on those pages so others can see them! https://python.langchain.com/docs/contributing/how_to/integrations/ This will pair very nicely with the variety of integrations you're working on at the moment! Will leave this PR open to discuss and link back here when closing the other ones (to make sure we're discussing in one place)
Hi Erick, thanks for your reply. So, we will conceive and publish a dedicated Python package
I think it would be coherent to also close this PR, and then discuss on behalf of a separate dedicated issue to accompany the genesis of I will open the other issue when it is time to start the discussion, i.e. when we have something to show that starts working. Do you agree with this approach? With kind regards,
sounds like a great approach!
…"provider" documentation (#28877)

Hi Erick.

Coming back from a previous attempt, we now made a separate package for the CrateDB adapter, called `langchain-cratedb`, as advised. Other than registering the package within `libs/packages.yml`, this patch includes a minimal amount of documentation to accompany the advent of this new package. Let us know about any mistakes we made, or changes you would like to see.

Thanks, Andreas.

## About
- **Description:** Register a new database adapter package, `langchain-cratedb`, providing traditional vector store, document loader, and chat message history features for a start.
- **Addressed to:** @efriis, @eyurtsev
- **References:** GH-27710
- **Preview:** [Providers » More » CrateDB](https://langchain-git-fork-crate-workbench-register-la-4bf945-langchain.vercel.app/docs/integrations/providers/cratedb/)

## Status
- **PyPI:** https://pypi.org/project/langchain-cratedb/
- **GitHub:** https://github.com/crate/langchain-cratedb
- **Documentation (CrateDB):** https://cratedb.com/docs/guide/integrate/langchain/
- **Documentation (LangChain):** _This PR._

## Backlog?
Is this applicable for this kind of patch?
> - [ ] **Add tests and docs**: If you're adding a new integration, please include
>   1. a test for the integration, preferably unit tests that do not rely on network access,
>   2. an example notebook showing its use. It lives in `docs/docs/integrations` directory.

## Q&A
1. Notebooks that use the LangChain CrateDB adapter are currently at [CrateDB LangChain Examples](https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain), and the documentation refers to them. Because they are derived from very old blueprints coming from LangChain 0.0.x times, we guess they need a refresh before adding them to `docs/docs/integrations`. Is it applicable to merge this minimal package registration + documentation patch, which already includes valid code snippets in `cratedb.mdx`, and add corresponding notebooks on behalf of a subsequent patch later?
2. How would it work getting into the tabular list of _Integration Packages_ enumerated on the [documentation entrypoint page about Providers](https://python.langchain.com/docs/integrations/providers/)?

/cc Please also review, @ckurze, @wierdvanderhaar, @kneth, @simonprickett, if you can find the time. Thanks!
About
Status
Sandbox
A short walkthrough of how to run the software tests on your workstation.
Trivia
The CrateDB implementation is heavily based on PGVector's, with a few adjustments. Previous generalizations and improvements to PGVector have been submitted the other day already.
`bulk_save_objects` method to improve insert performance #16244