Skip to content

Conversation

tanujnay112
Copy link
Contributor

@tanujnay112 tanujnay112 commented Oct 2, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Added HTTP routes for create_task/remove_task
  • New functionality
    • ...

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs section?_

Copy link
Contributor Author

tanujnay112 commented Oct 2, 2025

Copy link

github-actions bot commented Oct 2, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@tanujnay112 tanujnay112 changed the base branch from task_api_db_schema to graphite-base/5547 October 2, 2025 07:47
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from a1843a1 to 7719cef Compare October 6, 2025 20:11
@tanujnay112 tanujnay112 changed the title Add REST API endpoints for task management [ENH]: Add REST API endpoints for task management Oct 6, 2025
@tanujnay112 tanujnay112 marked this pull request as ready for review October 6, 2025 20:12
Copy link
Contributor

propel-code-bot bot commented Oct 6, 2025

Add REST API Endpoints and Data Model for Task Management

This PR introduces a major feature for task management within the system, adding REST/gRPC APIs and core data models for creation, retrieval, and deletion of tasks associated with collections. It includes the initial database schema, backend Go models and data access logic, protobuf definition and gRPC implementations, Rust types, test coverage, and synchronization primitives for operator definitions between Go and Rust. The API supports atomic task creation, task lookup by name, soft deletion (with optional cascading deletion of output collections), and exposes a new operators listing interface for clients to query built-in operator support.

The changes reflect a full vertical slice: DB migrations for new tables (operators, tasks, task_templates), Go ORM and data access logic, coordinator service endpoints, gRPC bindings, protobuf/IDL expansion, Rust-side type auto-generation and exposure, and associated tests and mocks. This enables further expansion for asynchronous data transformation pipelines and sets up standardized operator ID infrastructure (with cross-language constant management for synchronization between Go and Rust).

Key Changes

• Introduced operators, tasks, and task_templates tables via migration (20251001073000.sql)
• Added Go ORM models: Task, Operator, their respective interfaces, and mock scaffolding
• Implemented CRUD handlers and gRPC endpoints for task lifecycle: CreateTask, GetTaskByName, DeleteTask, GetOperators
• Extended protobuf API (coordinator.proto) with task and operator messages and service methods
• Created and registered new DAO types for TaskDb and OperatorDb in Go, with soft-delete and unique constraint logic
• Added Rust Task/TaskUuid types and generated operators constants (operators_generated.rs)
• Implemented Rust-side task API stubs and added integration tests verifying cross-language constant/database sync
• Updated test infrastructure: DAO, mocks, and end-to-end cross-language/integration tests for operators and tasks
• Auto-generation tooling for Rust operator constants, Go-Rust synchronization, and developer documentation

Affected Areas

go/pkg/sysdb/metastore/db/dao (data access and schema)
go/pkg/sysdb/metastore/db/dbmodel (ORM types and interfaces)
go/pkg/sysdb/metastore/db/dbcore (migration/bootstrap logic)
go/pkg/sysdb/coordinator (task service logic, API handlers)
go/pkg/sysdb/grpc (gRPC server handlers)
idl/chromadb/proto/coordinator.proto (protobuf/gRPC API)
rust/types/ (Rust types, operator constant tooling, task struct)
rust/sysdb/src/sysdb.rs (Rust gRPC stubs and task handling)
rust/frontend/src/impls/service_based_frontend.rs (Rust integration tests for op sync)
• Migration artifacts, documentation (README_OPERATORS.md)

This summary was automatically generated by @propel-code-bot

@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from 7719cef to de66f29 Compare October 6, 2025 20:19
@tanujnay112 tanujnay112 changed the base branch from graphite-base/5547 to task_api_db_schema October 6, 2025 20:19
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 6, 2025
@tanujnay112 tanujnay112 changed the base branch from task_api_db_schema to graphite-base/5547 October 6, 2025 21:17
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from de66f29 to 8f92bb4 Compare October 6, 2025 21:28
@tanujnay112 tanujnay112 changed the base branch from graphite-base/5547 to task_api_db_schema October 6, 2025 21:28
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 6, 2025
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from 8f92bb4 to aaf5680 Compare October 6, 2025 21:35
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 6, 2025
name: str,
operator_id: str,
output_collection: str,
params: Optional[str] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this optional str? What format does it take?

@@ -0,0 +1,84 @@
#!/usr/bin/env python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example could probably change prints to asserts and be a good intg test.

@tanujnay112 tanujnay112 changed the base branch from task_api_db_schema to graphite-base/5547 October 6, 2025 22:18
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from aaf5680 to 0c0345a Compare October 7, 2025 10:26
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from 0c0345a to 72036f3 Compare October 7, 2025 10:29
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from 72036f3 to a7e9cff Compare October 7, 2025 10:32
@tanujnay112 tanujnay112 changed the base branch from graphite-base/5547 to task_api_db_schema October 7, 2025 10:32
Comment on lines 514 to 516
... name="count_docs",
... operator_id="record_counter",
... output_collection="doc_counts",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Documentation]

The parameter names in the example code of the create_task docstring do not match the method's signature. Specifically, name should be task_name and output_collection should be output_collection_name. Aligning the example with the actual parameters will prevent confusion for users.

Context for Agents
[**Documentation**]

The parameter names in the example code of the `create_task` docstring do not match the method's signature. Specifically, `name` should be `task_name` and `output_collection` should be `output_collection_name`. Aligning the example with the actual parameters will prevent confusion for users.

File: chromadb/api/models/Collection.py
Line: 516

@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from a7e9cff to ebc1594 Compare October 7, 2025 17:36
@tanujnay112 tanujnay112 changed the base branch from task_api_db_schema to graphite-base/5547 October 7, 2025 20:24
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from ebc1594 to e8daafd Compare October 7, 2025 20:24
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 7, 2025
@graphite-app graphite-app bot changed the base branch from graphite-base/5547 to main October 7, 2025 20:24
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from e8daafd to e204fbf Compare October 7, 2025 20:24
Comment on lines 1822 to 1829
CollectionUuid(uuid::Uuid::parse_str(&input_collection_id).map_err(|e| {
RemoveTaskError::Internal(Box::new(chroma_error::TonicError(
tonic::Status::invalid_argument(format!(
"Invalid input_collection_id format: {}",
e
)),
)))
})?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BestPractice]

An invalid input_collection_id format is a client error and should result in a 400-level error (InvalidArgument), not a 500-level Internal error. The current implementation wraps the parsing error as RemoveTaskError::Internal, which will produce a 500 status code.

This manual parsing is necessary because RemoveTaskRequest::input_collection_id is a String. For consistency with CreateTaskRequest (which uses CollectionUuid), consider changing the type of input_collection_id to CollectionUuid in rust/types/src/api_types.rs. This would allow serde to handle UUID validation automatically and return a 400 error for invalid formats, simplifying this code and providing a more accurate error response.

Context for Agents
[**BestPractice**]

An invalid `input_collection_id` format is a client error and should result in a 400-level error (`InvalidArgument`), not a 500-level `Internal` error. The current implementation wraps the parsing error as `RemoveTaskError::Internal`, which will produce a 500 status code.

This manual parsing is necessary because `RemoveTaskRequest::input_collection_id` is a `String`. For consistency with `CreateTaskRequest` (which uses `CollectionUuid`), consider changing the type of `input_collection_id` to `CollectionUuid` in `rust/types/src/api_types.rs`. This would allow `serde` to handle UUID validation automatically and return a 400 error for invalid formats, simplifying this code and providing a more accurate error response.

File: rust/frontend/src/impls/service_based_frontend.rs
Line: 1829

@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from e204fbf to f1b4368 Compare October 7, 2025 20:37
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 7, 2025
Comment on lines 736 to 745
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
return cast(bool, resp_json["success"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Missing error handling: Similar to create_task, the _make_request call can fail but there's no error handling. Network failures, timeouts, or server errors will propagate as unhandled exceptions.

Suggested Change
Suggested change
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
return cast(bool, resp_json["success"])
try:
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
return cast(bool, resp_json["success"])
except Exception as e:
# Handle request errors appropriately
raise

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**CriticalError**]

Missing error handling: Similar to create_task, the `_make_request` call can fail but there's no error handling. Network failures, timeouts, or server errors will propagate as unhandled exceptions.

<details>
<summary>Suggested Change</summary>

```suggestion
try:
    resp_json = self._make_request(
        "post",
        f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
        json={
            "input_collection_id": str(input_collection_id),
            "task_name": task_name,
            "delete_output": delete_output,
        },
    )
    return cast(bool, resp_json["success"])
except Exception as e:
    # Handle request errors appropriately
    raise
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: chromadb/api/fastapi.py
Line: 745

Comment on lines 744 to 745
)
return cast(bool, resp_json["success"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

Potential KeyError: The response parsing assumes resp_json["success"] exists without validation. If the server returns an error response or malformed JSON, this will raise a KeyError.

Suggested Change
Suggested change
)
return cast(bool, resp_json["success"])
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
if "success" not in resp_json:
raise ValueError("Invalid response format from server")
return cast(bool, resp_json["success"])

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**CriticalError**]

Potential KeyError: The response parsing assumes `resp_json["success"]` exists without validation. If the server returns an error response or malformed JSON, this will raise a KeyError.

<details>
<summary>Suggested Change</summary>

```suggestion
resp_json = self._make_request(
    "post",
    f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
    json={
        "input_collection_id": str(input_collection_id),
        "task_name": task_name,
        "delete_output": delete_output,
    },
)
if "success" not in resp_json:
    raise ValueError("Invalid response format from server")
return cast(bool, resp_json["success"])
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

</details>

File: chromadb/api/fastapi.py
Line: 745

raise NotImplementedError(
"Seach is not implemented for SegmentAPI"
)
raise NotImplementedError("Seach is not implemented for SegmentAPI")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NitPick]

Typo in error message: "Seach" should be "Search" in the NotImplementedError message.

Suggested change
raise NotImplementedError("Seach is not implemented for SegmentAPI")
raise NotImplementedError("Search is not implemented for SegmentAPI")

Committable suggestion

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Context for Agents
[**NitPick**]

Typo in error message: "Seach" should be "Search" in the NotImplementedError message.

```suggestion
raise NotImplementedError("Search is not implemented for SegmentAPI")
```

⚡ **Committable suggestion**

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

File: chromadb/api/segment.py
Line: 430

## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
    - Added db models and SysDB routes to Create, Delete and Get Tasks.
- New functionality
    - ^^^
`CreateTask` here does not kick off the requisite backfill task runs in this change. That is left as followup work.
## Test plan

Tests have been added in `go/pkg/sysdb/metastore/db/dao/task_test.go`.

- [ ] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

## Migration plan

_Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?_

## Observability plan

_What is the plan to instrument and monitor this change?_

## Documentation Changes

_Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the_ [_docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_
@tanujnay112 tanujnay112 force-pushed the task_api_rest_endpoints branch from f1b4368 to 8afc2b1 Compare October 8, 2025 09:27
@blacksmith-sh blacksmith-sh bot deleted a comment from tanujnay112 Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants