Skip to content

Conversation

sanketkedia
Copy link
Contributor

@sanketkedia sanketkedia commented Oct 16, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Renames key_overrides to keys
    • Removes $ and # from special key names, value type names and index type names
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None

Observability plan

None

Documentation Changes

None

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@sanketkedia sanketkedia marked this pull request as ready for review October 16, 2025 00:24
@sanketkedia sanketkedia requested a review from jairad26 October 16, 2025 00:24
Copy link
Contributor

propel-code-bot bot commented Oct 16, 2025

Schema Field Renaming: Remove Special Characters and Standardize Serialization Across Languages

This PR performs a cross-language refactor to rename schema-related fields and constants by removing special characters (such as $ and #) from value type, index type, and key names throughout the Python, Rust, and TypeScript codebases. The key_overrides field has been renamed to keys in all relevant schema structures. All serialization/deserialization mechanisms, API surfaces, and test assertions are updated to use the new simple field names, aiming for consistency and improved compatibility between client, server, and persistence layers.

Key Changes

• Renamed key_overrides to keys throughout backend (Rust), API (Python), and client (TypeScript).
• Replaced special characters ($, #) in all schema value type and index type field and constant names across Python, Rust, and TypeScript.
• Updated constants (e.g., STRING_VALUE_NAME, VECTOR_INDEX_NAME, etc.) to use names without special prefixes.
• Refactored all serialization and deserialization logic to use new field names and constants (e.g., string instead of #string, vector_index instead of $vector_index).
• Synchronized schema field naming conventions between Python, Rust (serde), and TypeScript OpenAPI generation.
• Modified affected tests to expect new field names in JSON schema representations.
• Applied changes to all schema-consuming and producing code in Rust (API structs, serde annotations, unit tests), Python (API types, collection models, tests), and TypeScript (OpenAPI-generated types).
• Addressed code in data loaders, validation utilities, and orchestration modules to reference new schema keys where appropriate.

Affected Areas

chromadb/api/types.py - schema constants and serialization
chromadb/api/models/CollectionCommon.py - schema handling
chromadb/test/api/test_schema_e2e.py - schema persistence/compat tests
rust/types/src/collection_schema.rs - Rust schema types and serde layer
rust/types/src/validators.rs - schema key checks in validation
rust/worker/src/execution/orchestration/sparse_knn.rs - key lookup for sparse index
clients/new-js/packages/chromadb/src/api/types.gen.ts - OpenAPI-generated schema types

This summary was automatically generated by @propel-code-bot

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize, ToSchema, Default)]
pub struct ValueTypes {
#[serde(rename = "#string", skip_serializing_if = "Option::is_none")]
#[serde(rename = "string", skip_serializing_if = "Option::is_none")] // STRING_VALUE_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

Serialization compatibility issue: The serde rename attributes have been changed but there's no backward compatibility for deserialization. Code attempting to deserialize old JSON with #string, #float_list, etc. will fail.

#[serde(rename = "string", skip_serializing_if = "Option::is_none")] // Was "#string"

This creates a breaking change where old serialized data cannot be read. Consider using serde aliases to maintain backward compatibility:

#[serde(rename = "string", alias = "#string", skip_serializing_if = "Option::is_none")]

Validation: ✅ This approach is correct and follows serde best practices. The alias attribute allows deserialization from multiple field names while rename controls the serialization output. Multiple aliases can be specified by repeating the attribute: #[serde(alias = "old_name1", alias = "old_name2")].

Context for Agents
[**BestPractice**]

**Serialization compatibility issue**: The serde rename attributes have been changed but there's no backward compatibility for deserialization. Code attempting to deserialize old JSON with `#string`, `#float_list`, etc. will fail.

```rust
#[serde(rename = "string", skip_serializing_if = "Option::is_none")] // Was "#string"
```

This creates a breaking change where old serialized data cannot be read. Consider using serde aliases to maintain backward compatibility:

```rust
#[serde(rename = "string", alias = "#string", skip_serializing_if = "Option::is_none")]
```

**Validation**: ✅ This approach is correct and follows serde best practices. The `alias` attribute allows deserialization from multiple field names while `rename` controls the serialization output. Multiple aliases can be specified by repeating the attribute: `#[serde(alias = "old_name1", alias = "old_name2")]`.

File: rust/types/src/collection_schema.rs
Line: 137

pub defaults: ValueTypes,
/// Key-specific index overrides
pub key_overrides: HashMap<String, ValueTypes>,
pub keys: HashMap<String, ValueTypes>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Data migration issue: The keys field rename from key_overrides without serde alias will cause deserialization failures for existing data. Any persisted schemas using the old key_overrides field name will fail to deserialize.

Add backward compatibility:

#[serde(rename = "keys", alias = "key_overrides")]
pub keys: HashMap<String, ValueTypes>,

Validation: ✅ This solution is correct. Serde's alias attribute is specifically designed for this use case - it allows deserialization from both the old field name (key_overrides) and the new field name (keys), while serialization will always use the new name specified in rename. This ensures backward compatibility during data migration without breaking existing serialized data.

Context for Agents
[**CriticalError**]

**Data migration issue**: The `keys` field rename from `key_overrides` without serde alias will cause deserialization failures for existing data. Any persisted schemas using the old `key_overrides` field name will fail to deserialize.

Add backward compatibility:
```rust
#[serde(rename = "keys", alias = "key_overrides")]
pub keys: HashMap<String, ValueTypes>,
```

**Validation**: ✅ This solution is correct. Serde's `alias` attribute is specifically designed for this use case - it allows deserialization from both the old field name (`key_overrides`) and the new field name (`keys`), while serialization will always use the new name specified in `rename`. This ensures backward compatibility during data migration without breaking existing serialized data.

File: rust/types/src/collection_schema.rs
Line: 98

@blacksmith-sh blacksmith-sh bot deleted a comment from sanketkedia Oct 16, 2025
@blacksmith-sh blacksmith-sh bot deleted a comment from sanketkedia Oct 16, 2025
@sanketkedia sanketkedia force-pushed the 10-15-_enh_rename_schema_fields branch from 78946c0 to 48287a9 Compare October 16, 2025 22:52
Comment on lines +83 to +84
pub const DOCUMENT_KEY: &str = "#document";
pub const EMBEDDING_KEY: &str = "#embedding";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: do we have this somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard-coded in some places for search and rank. That's a bigger change to consolidate it under this. Created a note

@blacksmith-sh blacksmith-sh bot deleted a comment from sanketkedia Oct 16, 2025
@jairad26 jairad26 merged commit 724a114 into main Oct 17, 2025
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants