Skip to content

Conversation

jairad26
Copy link
Contributor

@jairad26 jairad26 commented Oct 15, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR adds e2e testing for schema, including metadata key discovery, embedding via schema, using sparse, dense vecs & efs in queries, metadata filtering with disabled indexes fails, etc.
  • New functionality
    • ...

Test plan

How are these changes tested?

  • [ x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor Author

jairad26 commented Oct 15, 2025

@jairad26 jairad26 changed the base branch from main to graphite-base/5616 October 16, 2025 00:02
@jairad26 jairad26 force-pushed the jai/schema-e2e-tests branch from bbfbed8 to 5c434c5 Compare October 16, 2025 00:02
@jairad26 jairad26 changed the base branch from graphite-base/5616 to jai/update-embed-logic October 16, 2025 00:03
@blacksmith-sh blacksmith-sh bot deleted a comment from jairad26 Oct 16, 2025
@jairad26 jairad26 force-pushed the jai/schema-e2e-tests branch 2 times, most recently from bc6c705 to 93a1561 Compare October 16, 2025 18:56
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from 357259b to e2d52d1 Compare October 16, 2025 18:56
@jairad26 jairad26 marked this pull request as ready for review October 16, 2025 18:57
@propel-code-bot
Copy link
Contributor

propel-code-bot bot commented Oct 16, 2025

Add Comprehensive End-to-End Schema Tests

This PR introduces a suite of end-to-end (e2e) tests for schema-related functionality in ChromaDB, focusing on the full lifecycle and behavior of schema-driven metadata keys, embeddings (dense and sparse), index configuration, discoverability, and schema persistence. It provides broad coverage for schema features including default and custom override keys, index enablement/disablement, error handling for invalid filtering, key discovery on compaction, embedding source key mechanics, forked collection schema isolation, and ensures compatibility across client instances and collection recreation. The tests leverage various embedding functions, enforce correct error propagation, and validate index precedence (overrides vs. discoverables vs. defaults).

Key Changes

• Introduced extensive e2e tests validating schema persistence, index configuration, and lifecycle for collections in chromadb/test/api/test_schema_e2e.py.
• Added positive and negative test cases for index enabling/disabling, filter applicability, error types on disallowed filters, and index restoration after deletion.
• Tested custom, default, and dynamically-discovered schema keys, and ensured proper schema behavior during collection deletion and recreation.
• Added and registered deterministic test embedding functions for both dense (SimpleEmbeddingFunction) and sparse (DeterministicSparseEmbeddingFunction) scenarios.
• Validated schema and index persistence across client restarts, including embedding function configurations.
• Tested the schema inheritance and isolation for forked collections in a clustered environment.
• Checked precedence between explicit key overrides, discovered schema keys, and default keys, asserting query/filter correctness.
• Ensured strictness of index/embedding configuration enforcement, including correct handling of conflicting types and invalid arguments.

Affected Areas

chromadb/test/api/test_schema_e2e.py

This summary was automatically generated by @propel-code-bot

@jairad26 jairad26 force-pushed the jai/schema-e2e-tests branch from 8bd6af6 to be61947 Compare October 17, 2025 00:40
@jairad26 jairad26 force-pushed the jai/update-embed-logic branch from c92cdf5 to a02dcea Compare October 17, 2025 00:40
Comment on lines 96 to +99
persisted_schema = collection.schema
assert persisted_schema is not None

vector_index = persisted_schema.keys["#embedding"].float_list.vector_index
print(persisted_schema.serialize_to_json())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Debug print statement should be removed before merging. This print(persisted_schema.serialize_to_json()) statement appears to be leftover debugging code that will clutter test output in production.

Suggested change
persisted_schema = collection.schema
assert persisted_schema is not None
vector_index = persisted_schema.keys["#embedding"].float_list.vector_index
print(persisted_schema.serialize_to_json())
persisted_schema = collection.schema
assert persisted_schema is not None
embedding_override = persisted_schema.keys["#embedding"].float_list

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**BestPractice**]

Debug print statement should be removed before merging. This `print(persisted_schema.serialize_to_json())` statement appears to be leftover debugging code that will clutter test output in production.

```suggestion
    persisted_schema = collection.schema
    assert persisted_schema is not None

    embedding_override = persisted_schema.keys["#embedding"].float_list
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: chromadb/test/api/test_schema_e2e.py
Line: 99

@jairad26 jairad26 changed the base branch from jai/update-embed-logic to graphite-base/5616 October 17, 2025 03:00
@jairad26 jairad26 force-pushed the jai/schema-e2e-tests branch from be61947 to 9322a2c Compare October 17, 2025 03:00
@jairad26 jairad26 changed the base branch from graphite-base/5616 to main October 17, 2025 03:00
@jairad26 jairad26 merged commit 8e1d146 into main Oct 17, 2025
67 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants