-
Notifications
You must be signed in to change notification settings - Fork 1.9k
[ENH]: Add REST API endpoints for task management #5547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
Reviewer ChecklistPlease leverage this checklist to ensure your code review is thorough before approving Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
|
a1843a1
to
7719cef
Compare
6abeaa6
to
3b40333
Compare
Add REST API Endpoints and Data Model for Task Management This PR introduces a major feature for task management within the system, adding REST/gRPC APIs and core data models for creation, retrieval, and deletion of tasks associated with collections. It includes the initial database schema, backend Go models and data access logic, protobuf definition and gRPC implementations, Rust types, test coverage, and synchronization primitives for operator definitions between Go and Rust. The API supports atomic task creation, task lookup by name, soft deletion (with optional cascading deletion of output collections), and exposes a new operators listing interface for clients to query built-in operator support. The changes reflect a full vertical slice: DB migrations for new tables ( Key Changes• Introduced Affected Areas• This summary was automatically generated by @propel-code-bot |
7719cef
to
de66f29
Compare
de66f29
to
8f92bb4
Compare
3b40333
to
067ee2a
Compare
067ee2a
to
e7ce88d
Compare
8f92bb4
to
aaf5680
Compare
chromadb/api/models/Collection.py
Outdated
name: str, | ||
operator_id: str, | ||
output_collection: str, | ||
params: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this optional str? What format does it take?
examples/task_api_example.py
Outdated
@@ -0,0 +1,84 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example could probably change prints to asserts and be a good intg test.
aaf5680
to
0c0345a
Compare
e7ce88d
to
ad3767e
Compare
0c0345a
to
72036f3
Compare
ad3767e
to
3baa37f
Compare
72036f3
to
a7e9cff
Compare
chromadb/api/models/Collection.py
Outdated
... name="count_docs", | ||
... operator_id="record_counter", | ||
... output_collection="doc_counts", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Documentation]
The parameter names in the example code of the create_task
docstring do not match the method's signature. Specifically, name
should be task_name
and output_collection
should be output_collection_name
. Aligning the example with the actual parameters will prevent confusion for users.
Context for Agents
[**Documentation**]
The parameter names in the example code of the `create_task` docstring do not match the method's signature. Specifically, `name` should be `task_name` and `output_collection` should be `output_collection_name`. Aligning the example with the actual parameters will prevent confusion for users.
File: chromadb/api/models/Collection.py
Line: 516
a7e9cff
to
ebc1594
Compare
3baa37f
to
bfad21e
Compare
bfad21e
to
53edf57
Compare
ebc1594
to
e8daafd
Compare
e8daafd
to
e204fbf
Compare
CollectionUuid(uuid::Uuid::parse_str(&input_collection_id).map_err(|e| { | ||
RemoveTaskError::Internal(Box::new(chroma_error::TonicError( | ||
tonic::Status::invalid_argument(format!( | ||
"Invalid input_collection_id format: {}", | ||
e | ||
)), | ||
))) | ||
})?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[BestPractice]
An invalid input_collection_id
format is a client error and should result in a 400-level error (InvalidArgument
), not a 500-level Internal
error. The current implementation wraps the parsing error as RemoveTaskError::Internal
, which will produce a 500 status code.
This manual parsing is necessary because RemoveTaskRequest::input_collection_id
is a String
. For consistency with CreateTaskRequest
(which uses CollectionUuid
), consider changing the type of input_collection_id
to CollectionUuid
in rust/types/src/api_types.rs
. This would allow serde
to handle UUID validation automatically and return a 400 error for invalid formats, simplifying this code and providing a more accurate error response.
Context for Agents
[**BestPractice**]
An invalid `input_collection_id` format is a client error and should result in a 400-level error (`InvalidArgument`), not a 500-level `Internal` error. The current implementation wraps the parsing error as `RemoveTaskError::Internal`, which will produce a 500 status code.
This manual parsing is necessary because `RemoveTaskRequest::input_collection_id` is a `String`. For consistency with `CreateTaskRequest` (which uses `CollectionUuid`), consider changing the type of `input_collection_id` to `CollectionUuid` in `rust/types/src/api_types.rs`. This would allow `serde` to handle UUID validation automatically and return a 400 error for invalid formats, simplifying this code and providing a more accurate error response.
File: rust/frontend/src/impls/service_based_frontend.rs
Line: 1829
e204fbf
to
f1b4368
Compare
chromadb/api/fastapi.py
Outdated
resp_json = self._make_request( | ||
"post", | ||
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete", | ||
json={ | ||
"input_collection_id": str(input_collection_id), | ||
"task_name": task_name, | ||
"delete_output": delete_output, | ||
}, | ||
) | ||
return cast(bool, resp_json["success"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[CriticalError]
Missing error handling: Similar to create_task, the _make_request
call can fail but there's no error handling. Network failures, timeouts, or server errors will propagate as unhandled exceptions.
Suggested Change
resp_json = self._make_request( | |
"post", | |
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete", | |
json={ | |
"input_collection_id": str(input_collection_id), | |
"task_name": task_name, | |
"delete_output": delete_output, | |
}, | |
) | |
return cast(bool, resp_json["success"]) | |
try: | |
resp_json = self._make_request( | |
"post", | |
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete", | |
json={ | |
"input_collection_id": str(input_collection_id), | |
"task_name": task_name, | |
"delete_output": delete_output, | |
}, | |
) | |
return cast(bool, resp_json["success"]) | |
except Exception as e: | |
# Handle request errors appropriately | |
raise |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**CriticalError**]
Missing error handling: Similar to create_task, the `_make_request` call can fail but there's no error handling. Network failures, timeouts, or server errors will propagate as unhandled exceptions.
<details>
<summary>Suggested Change</summary>
```suggestion
try:
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
return cast(bool, resp_json["success"])
except Exception as e:
# Handle request errors appropriately
raise
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: chromadb/api/fastapi.py
Line: 745
chromadb/api/fastapi.py
Outdated
) | ||
return cast(bool, resp_json["success"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[CriticalError]
Potential KeyError: The response parsing assumes resp_json["success"]
exists without validation. If the server returns an error response or malformed JSON, this will raise a KeyError.
Suggested Change
) | |
return cast(bool, resp_json["success"]) | |
resp_json = self._make_request( | |
"post", | |
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete", | |
json={ | |
"input_collection_id": str(input_collection_id), | |
"task_name": task_name, | |
"delete_output": delete_output, | |
}, | |
) | |
if "success" not in resp_json: | |
raise ValueError("Invalid response format from server") | |
return cast(bool, resp_json["success"]) |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**CriticalError**]
Potential KeyError: The response parsing assumes `resp_json["success"]` exists without validation. If the server returns an error response or malformed JSON, this will raise a KeyError.
<details>
<summary>Suggested Change</summary>
```suggestion
resp_json = self._make_request(
"post",
f"/tenants/{tenant}/databases/{database}/collections/{input_collection_id}/tasks/delete",
json={
"input_collection_id": str(input_collection_id),
"task_name": task_name,
"delete_output": delete_output,
},
)
if "success" not in resp_json:
raise ValueError("Invalid response format from server")
return cast(bool, resp_json["success"])
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
</details>
File: chromadb/api/fastapi.py
Line: 745
chromadb/api/segment.py
Outdated
raise NotImplementedError( | ||
"Seach is not implemented for SegmentAPI" | ||
) | ||
raise NotImplementedError("Seach is not implemented for SegmentAPI") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[NitPick]
Typo in error message: "Seach" should be "Search" in the NotImplementedError message.
raise NotImplementedError("Seach is not implemented for SegmentAPI") | |
raise NotImplementedError("Search is not implemented for SegmentAPI") |
⚡ Committable suggestion
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
Context for Agents
[**NitPick**]
Typo in error message: "Seach" should be "Search" in the NotImplementedError message.
```suggestion
raise NotImplementedError("Search is not implemented for SegmentAPI")
```
⚡ **Committable suggestion**
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
File: chromadb/api/segment.py
Line: 430
## Description of changes _Summarize the changes made by this PR._ - Improvements & Bug fixes - Added db models and SysDB routes to Create, Delete and Get Tasks. - New functionality - ^^^ `CreateTask` here does not kick off the requisite backfill task runs in this change. That is left as followup work. ## Test plan Tests have been added in `go/pkg/sysdb/metastore/db/dao/task_test.go`. - [ ] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust ## Migration plan _Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?_ ## Observability plan _What is the plan to instrument and monitor this change?_ ## Documentation Changes _Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the_ [_docs section](https://github.com/chroma-core/chroma/tree/main/docs/docs.trychroma.com)?_
f1b4368
to
8afc2b1
Compare
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest
for python,yarn test
for js,cargo test
for rustMigration plan
Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?
Observability plan
What is the plan to instrument and monitor this change?
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the _docs section?_