@jairad26 jairad26 commented Oct 16, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR adds support for using the embedding function defined in a collection's schema during _embed, so that schema-driven auto-embedding happens on both inserts and queries.
  • New functionality
    • ...
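
A rough illustration of the behavior described above, not the code from this PR: a minimal sketch that assumes the schema-defined function wins when present and that the collection's configured function is the fallback, as the automated summary further down states. `resolve_embedding_function`, `embed`, `length_ef` and `zero_ef` are hypothetical names made up for this example.

```python
from typing import Callable, List, Optional

Embedding = List[float]
EmbedFn = Callable[[List[str]], List[Embedding]]


def resolve_embedding_function(
    schema_ef: Optional[EmbedFn],
    fallback_ef: Optional[EmbedFn],
) -> EmbedFn:
    """Pick the schema-defined function if set, else the fallback (assumed order)."""
    if schema_ef is not None:
        return schema_ef
    if fallback_ef is not None:
        return fallback_ef
    raise ValueError("No embedding function available; pass embeddings explicitly.")


def embed(
    documents: List[str],
    schema_ef: Optional[EmbedFn] = None,
    fallback_ef: Optional[EmbedFn] = None,
) -> List[Embedding]:
    """Embed documents with whichever function the resolution step selects."""
    return resolve_embedding_function(schema_ef, fallback_ef)(documents)


# Toy stand-ins for real embedding functions (hypothetical, for illustration only).
def length_ef(docs: List[str]) -> List[Embedding]:
    return [[float(len(d))] for d in docs]


def zero_ef(docs: List[str]) -> List[Embedding]:
    return [[0.0] for _ in docs]
```

The same resolution step would apply to query texts as well as inserted documents, which is what makes the schema setting take effect on both paths.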

Test plan

How are these changes tested?
Added e2e schema tests in a stacked PR.

  • [x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

jairad26 commented Oct 16, 2025

@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 16, 2025
@jairad26 jairad26 marked this pull request as ready for review October 16, 2025 18:54
propel-code-bot bot commented Oct 16, 2025

Enhance Schema Embedding Logic and Sparse Embedding Function Registration

This pull request introduces support for leveraging the embedding function defined in the schema on a collection during the _embed process. It allows schema-level embedding functions to be used automatically for dense vector generation during inserts and queries. The update also improves how sparse embedding functions can utilize the document source key and adds a decorator to facilitate custom sparse embedding function registration.
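
To make the document-source-key behavior concrete, here is a minimal sketch of how a sparse embedding could be computed from either a metadata field or the document text. It is an assumption-laden illustration rather than the body of `_apply_sparse_embeddings_to_metadatas`: the sentinel value given to `DOCUMENT_KEY`, the `apply_sparse_embeddings` helper, its parameters, and the toy `token_length_sparse` function are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed sentinel meaning "read the document text, not a metadata field".
DOCUMENT_KEY = "#document"

SparseVector = Dict[int, float]
SparseEmbedFn = Callable[[List[str]], List[SparseVector]]


def apply_sparse_embeddings(
    metadatas: List[Dict[str, Any]],
    documents: Optional[List[str]],
    source_key: str,
    target_key: str,
    sparse_ef: SparseEmbedFn,
) -> List[Dict[str, Any]]:
    """Return copies of metadatas with a sparse vector stored under target_key."""
    texts: List[str] = []
    positions: List[int] = []
    for i, metadata in enumerate(metadatas):
        if source_key == DOCUMENT_KEY:
            # Source is the document itself, if documents were provided.
            value = documents[i] if documents is not None else None
        else:
            value = metadata.get(source_key)
        if isinstance(value, str):
            texts.append(value)
            positions.append(i)

    updated = [dict(m) for m in metadatas]  # leave the caller's dicts untouched
    if texts:
        for pos, vector in zip(positions, sparse_ef(texts)):
            updated[pos][target_key] = vector
    return updated


def token_length_sparse(texts: List[str]) -> List[SparseVector]:
    """Toy sparse function: index = token length, value = token count."""
    vectors: List[SparseVector] = []
    for text in texts:
        vector: SparseVector = {}
        for token in text.split():
            vector[len(token)] = vector.get(len(token), 0.0) + 1.0
        vectors.append(vector)
    return vectors
```

Records whose source value is missing or not a string are simply skipped in this sketch; the real method may handle that case differently.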

Key Changes

• Added support in CollectionCommon to use the schema's embedding function for dense embeddings, both for inserts and queries.
• Modified the _apply_sparse_embeddings_to_metadatas method in CollectionCommon to use the document value if the sparse embedding config is set to reference the document key.
• Added EMBEDDING_KEY and DOCUMENT_KEY handling in relevant sections of chromadb/api/models/CollectionCommon.py.
• Introduced register_sparse_embedding_function decorator to streamline registration of custom sparse embedding functions in chromadb/utils/embedding_functions/__init__.py.
• Minor signature update for sparse_known_embedding_functions with # type: ignore.
• Updated internal logic to prefer schema-defined embedding functions if set, falling back to client or configuration embedding functions.

Affected Areas

chromadb/api/models/CollectionCommon.py
chromadb/utils/embedding_functions/__init__.py

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from 357259b to e2d52d1 Compare October 16, 2025 18:56
@jairad26 jairad26 changed the base branch from main to graphite-base/5617 October 16, 2025 20:12
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from e2d52d1 to bd0e70d Compare October 16, 2025 20:12
@jairad26 jairad26 changed the base branch from graphite-base/5617 to jai/hosted-splade-ef October 16, 2025 20:12
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from bd0e70d to 7697fa1 Compare October 16, 2025 20:33
@jairad26 jairad26 force-pushed the jai/hosted-splade-ef branch from 20bcadf to 6515339 Compare October 16, 2025 20:33
@jairad26 jairad26 requested a review from sanketkedia October 16, 2025 23:03
@jairad26 jairad26 changed the base branch from jai/hosted-splade-ef to graphite-base/5617 October 16, 2025 23:35
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from 7697fa1 to c92cdf5 Compare October 16, 2025 23:35
@jairad26 jairad26 changed the base branch from graphite-base/5617 to main October 16, 2025 23:35
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from c92cdf5 to a02dcea Compare October 17, 2025 00:40
@jairad26 jairad26 enabled auto-merge (squash) October 17, 2025 01:13
@jairad26 jairad26 merged commit 37d20ea into main Oct 17, 2025
115 of 118 checks passed