refactor(composio): extract derive_toolkit_slug helper#3035
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCentralizes toolkit-slug derivation into a private ChangesToolkit Slug Extraction
EnvVarGuard safety docs
AgentProgress test updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/inference_local_admin_raw_coverage_e2e.rs (1)
52-64: ⚡ Quick winConsider enforcing the lock precondition at compile time.
The safety comments document that callers must hold
ENV_VAR_LOCK, but this precondition isn't enforced. A future test could callEnvVarGuard::setor::unsetwithout acquiring the lock first, violating the safety contract for theunsafeenv mutations.🔒 Proposed fix to enforce the lock at compile time
impl EnvVarGuard { - fn set(key: &'static str, value: impl AsRef<std::ffi::OsStr>) -> Self { + fn set(key: &'static str, value: impl AsRef<std::ffi::OsStr>, _lock: &tokio::sync::MutexGuard<()>) -> Self { let previous = std::env::var_os(key); // SAFETY: tests that mutate environment variables hold ENV_VAR_LOCK. unsafe { std::env::set_var(key, value) }; Self { key, previous } } - fn unset(key: &'static str) -> Self { + fn unset(key: &'static str, _lock: &tokio::sync::MutexGuard<()>) -> Self { let previous = std::env::var_os(key); // SAFETY: tests that mutate environment variables hold ENV_VAR_LOCK. unsafe { std::env::remove_var(key) }; Self { key, previous } }Then update call sites to pass the lock reference:
let _env_lock = ENV_VAR_LOCK.lock().await; // ... - let _path = EnvVarGuard::set("PATH", scripts.path()); + let _path = EnvVarGuard::set("PATH", scripts.path(), &_env_lock);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/inference_local_admin_raw_coverage_e2e.rs` around lines 52 - 64, EnvVarGuard::set and EnvVarGuard::unset perform unsafe environment mutations but only document that callers must hold ENV_VAR_LOCK; enforce this at compile time by requiring a reference/token representing the acquired lock (e.g. an &EnvVarLock or EnvVarLockGuard parameter) in both function signatures instead of calling them with no proof, update the implementations to use that parameter and remove the explanatory SAFETY comment, and update all call sites that currently call EnvVarGuard::set/unset to pass the lock guard (the ENV_VAR_LOCK acquisition site should now produce the guard value that callers forward).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/inference_local_admin_raw_coverage_e2e.rs`:
- Around line 52-64: EnvVarGuard::set and EnvVarGuard::unset perform unsafe
environment mutations but only document that callers must hold ENV_VAR_LOCK;
enforce this at compile time by requiring a reference/token representing the
acquired lock (e.g. an &EnvVarLock or EnvVarLockGuard parameter) in both
function signatures instead of calling them with no proof, update the
implementations to use that parameter and remove the explanatory SAFETY comment,
and update all call sites that currently call EnvVarGuard::set/unset to pass the
lock guard (the ENV_VAR_LOCK acquisition site should now produce the guard value
that callers forward).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28a3d9b4-0689-4529-9b65-d667563fb050
📒 Files selected for processing (1)
tests/inference_local_admin_raw_coverage_e2e.rs
…oolkit-slug # Conflicts: # tests/inference_local_admin_raw_coverage_e2e.rs
…oolkit-slug # Conflicts: # src/openhuman/threads/turn_state/mirror_tests.rs # tests/memory_core_threads_raw_coverage_e2e.rs # tests/memory_threads_raw_coverage_e2e.rs
sanil-23
left a comment
There was a problem hiding this comment.
The core refactor is clean: derive_toolkit_slug lifts the duplicated split('_').next().unwrap_or("integration").to_ascii_lowercase() expression verbatim, the doc-comment correctly explains the unreachable "integration" fallback, and the three unit tests (multi-segment, single-segment, empty-string parity) lock the behavior down. Nice touch covering the empty-string Some("") edge that the unwrap_or does not catch.
I checked the riskier bundled changes and they hold up:
generate_id()switching from a UUID string to an opaquer+a-p encoding is safe -- rule ids are treated as opaque strings everywhere (only ever prefixed withrule/as a storage key, never parsed back as a UUID), so no consumer breaks. Still 128 bits of entropy, and it's covered by a test.- The
meet_agentfile.flush()is a correct durability fix -- tokio'sFiledoesn't guarantee a flush on drop. - The
thread_stack_sizebump and thearia-pressedaddition are additive and low-risk.
One thing worth fixing for the record, not blocking: the description frames this as a "pure refactor with zero behavior change," but the PR actually bundles several real production changes that aren't called out in the Changes list -- the server runtime stack-size bump (src/core/cli.rs), the rule-id generation format (memory_tools/types.rs), and the meet-agent flush (meet_agent/store.rs). They're each defensible, but they aren't behavior-neutral and they're unrelated to the toolkit-slug refactor. Please update the description so the production changes are visible to reviewers and the changelog, and ideally split unrelated production fixes into their own PRs next time -- it keeps the diff reviewable and the history bisectable.
Everything is correct and CI is fully green, so approving. Thanks for also stabilizing the flaky env-mutating coverage tests with the shared env_lock -- that's a real improvement.
Change summary:
| Area | Change | Assessment |
|---|---|---|
| composio/error_mapping | extract derive_toolkit_slug + tests |
clean, the stated refactor |
| core/cli.rs | 8 MiB worker thread stack | safe, undisclosed in description |
| memory_tools/types.rs | opaque rule-id format | safe, ids are opaque |
| meet_agent/store.rs | flush after write | correct durability fix |
| frontend (ToolsPanel) | aria-pressed + tests |
good accessibility win |
| tests (~15 files) | env serialization, fixtures, pagination | stabilization, fine |
| let rt = tokio::runtime::Builder::new_multi_thread() | ||
| .enable_all() | ||
| // Match the desktop host runtime stack: sub-agent prompt/tool | ||
| // orchestration can exceed Tokio's default 2 MiB worker stack. |
There was a problem hiding this comment.
[minor] This raises the server runtime's per-worker stack to 8 MiB, which is a real production change (and a sensible one) -- but it isn't mentioned in the PR's Changes list, which describes the PR as a zero-behavior-change refactor. Please call it out in the description so it's visible to reviewers and the changelog.
oxoxDev
left a comment
There was a problem hiding this comment.
Walkthrough
Core refactor is clean (~16 LOC) — zero-behavior extraction of a 4-line duplicate into derive_toolkit_slug(&str) -> String. But the branch carries 5 drive-by edits beyond what the title and Summary suggest: production Tokio worker-stack bump in cli.rs, PII-defensive re-encoding of ToolMemoryRule::generate_id(), durability flush() in meet_agent::store, aria-pressed a11y in ToolsPanel.tsx, and 16 e2e tests migrating to env_lock()-based env-var serialization. Requesting changes to split out the two production-impact items (#1, #2 below) so they can be audited independently of the composio refactor.
Changes
| File | Summary |
|---|---|
composio/error_mapping.rs |
New private derive_toolkit_slug; 2 call sites delegate. |
composio/error_mapping_tests.rs |
3 unit tests (happy / single-segment / empty-input parity). |
src/core/cli.rs |
Server tokio runtime thread_stack_size(8 MiB) (was default 2 MiB). |
memory_tools/types.rs |
generate_id() switched from Uuid::to_string() to 'a'..'p' nibble-hex encoding; new test ties storage key to has_likely_pii boundary. |
meet_agent/store.rs |
file.flush().await after write_all in append_record_to. |
ToolsPanel.tsx + .test.tsx |
aria-pressed on toggle + 74-LOC new test. |
16 × tests/*_raw_coverage_e2e.rs |
EnvVarGuard SAFETY-comment refresh + env_lock()-based serialization for cargo llvm-cov parallel execution. |
Actionable comments (4)
Major
1. src/openhuman/memory_tools/types.rs:128-134 — on-disk id-shape change buried in a "composio refactor" PR
pub fn generate_id() -> String {
let mut id = String::with_capacity(33);
id.push('r');
for byte in uuid::Uuid::new_v4().as_bytes() {
id.push((b'a' + (byte >> 4)) as char);
id.push((b'a' + (byte & 0x0f)) as char);
}
id
}Switches every newly-generated ToolMemoryRule.id from a 36-char UUID-string (123e4567-e89b-12d3-a456-426614174000) to a 33-char r-prefixed nibble-hex string (rabcd…). Pre-deploy rows keep the UUID shape; post-deploy rows get the new shape. Any consumer that parses id as a UUID, range-queries by it, or assumes a specific length will silently diverge across the boundary.
The new test generated_rule_ids_are_safe_memory_document_keys explicitly couples the encoding choice to safety::pii::has_likely_pii boundary behavior — which is in flux in PR #3028 (drops bare-NANP / E.164 from the strict set). The defensive change here may be partially obsoleted once #3028 lands.
Action requested: Split into a separate fix(memory_tools): id-encoding avoids PII boundary PR. Document which has_likely_pii shape was tripping Uuid::to_string() output (likely RRN \b\d{6}-[1-4]\d{6}\b or DNI \b\d{8}[A-Z]\b lining up with UUID nibbles), confirm no migration is needed for the two id shapes coexisting on disk, and merge that PR after #3028 so we can decide whether this re-encoding is still required.
2. src/core/cli.rs:283 — Tokio worker stack 2 → 8 MiB is an unannounced production change
let rt = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.thread_stack_size(8 * 1024 * 1024)
.build()?;@sanil-23 already flagged this. Adds ~6 MiB × num_workers resident memory per server process (≈96 MiB on a 16-vCPU host) for every non-desktop deployment that goes through this CLI path (docker, cloud, standalone openhuman-core serve). Desktop host runtime is already on 8 MiB, but the server / CLI containers' memory budgets weren't part of that decision.
Action requested: Either (a) split out into a dedicated PR with a ## Server runtime section linking the original issue that justified the desktop bump and confirming container memory budgets, or (b) add the same section to this PR's description and explicitly request acks from operators of cloud / docker deployments. Don't merge it silently under a refactor title.
Minor
3. src/openhuman/composio/error_mapping.rs:127-130 — unwrap_or("integration") is dead, your own test pins the opposite
tool.split('_')
.next()
.unwrap_or("integration")
.to_ascii_lowercase()Your doc-comment correctly states split('_').next() is Some(_) for any &str, and derive_toolkit_slug_empty_input_returns_empty_not_fallback pins exactly that: empty input returns "", NOT "integration". So the "integration" literal is unreachable — keeping it "for parity with the original call sites" is now self-contradictory (the original sites don't exist anymore, and the parity test enforces the opposite). Either drop the fallback:
tool.split('_').next().unwrap_or_default().to_ascii_lowercase()or keep it with an inline // dead — pre-#3009 fallback; kept to match merged behavior in case strict-input changes downstream and accept that the comment + test contradict.
4. src/openhuman/meet_agent/store.rs:112-114 — silent durability tweak in append_record_to
file.flush().await
.map_err(|e| format!("flush {}: {e}", path.display()))?;Forces an OS-level flush on every appended MeetCallRecord. If this path is low-frequency (call lifecycle events), fine. If it's used in a tight loop, the extra fsync-equivalent per write is measurable. Either:
- mention "added explicit flush for durability" in the description, or
- drop the flush — the file is reopened per call so tokio's drop-flush semantics already handle the durability case.
Nitpicks (2)
app/src/components/settings/panels/ToolsPanel.tsx:132—aria-pressedis good; consider also settingrole="switch"since the button represents a binary toggle, not a one-shot action.src/openhuman/composio/error_mapping_tests.rs:1-3— Imports insideuse super::{…}aren't alphabetical after thederive_toolkit_sluginsert.
Questions for the author (1)
env_lock()is added across 16 e2e test files. Is it a shared helper (intests/common/?) or did each file roll its own? If bespoke, worth extracting before this lands so the next coverage-e2e author copy-pastes the right shape.
Outside the diff
memory_tools::types::generate_idcouples tosafety::pii::has_likely_pii, whose strict set is being narrowed in PR #3028 (drops bare-NANP / E.164). The id re-encoding here may not be necessary after #3028 merges. Cross-link the two PRs in both descriptions, and decide merge order before either lands.
Verified / looks good
derive_toolkit_slugis a verbatim extraction; the empty-input parity test pins the previous behavior exactly.- Linked-issue chain is consistent: #2913 CLOSED → #3009 MERGED → this PR addresses CR's
derive_*nit. - 18/18 CI green, MERGEABLE / CLEAN.
EnvVarGuardRAII +env_lock()mutex pattern is sound; SAFETY comments updated correctly for the new precondition.- New docstring on
derive_toolkit_slugcaptures the non-obvious "fallback is unreachable for&strinput" reasoning, which is genuinely useful for future readers.
- 把 unwrap_or("integration") 改成 unwrap_or_default();str::split('_').next()
对 &str 始终返回 Some(_),旧 fallback 永远走不到,新写法和 doc-comment 已对齐。
- ToolsPanel toggle 改用 role="switch" + aria-checked,更贴 ARIA 规范;
单元测试和 Playwright settings-feature-preferences.spec.ts 同步迁移。
…oolkit-slug # Conflicts: # tests/inference_provider_admin_round22_raw_coverage_e2e.rs
|
@oxoxDev @sanil-23 — addressed the review feedback: Code changes (this push, commit
Description rewritten
Not done
Rebase
CI re-running. |
…#3113 upstream/main 的 tinyhumansai#3113 把 upsert_composio_source 默认改成 enabled=false(用户必须显式启用同步),所以 round23 测试在断言 list_enabled_by_kind(Composio).len() == 1 时拿到 0。先在 upsert 后断言列表为空,再 update_source 翻 enabled=true,重新断言 1。
upstream/main 的 tinyhumansai#3047/tinyhumansai#3113 让 GithubReader 优先用本地 git clone 跑 list_commits/read_commit,并且简化了 issue 渲染器(不再把 comments 拼到 body)。本测试用 fake gh 只 mock 了 API 路径,所以: - 在 PATH 上加 fake git 让 ensure_bare_clone/log/show 全部失败,让 reader 回退到 gh API 那条路径,断言 commit:abc123 / issue:42 / pr:43 才能继续。 - read_issue body 改为断言 "# Issue tinyhumansai#42" + "## Description",去掉 "## Comments" 期望,与新的 renderer 输出一致。
…oolkit-slug # Conflicts: # src/core/cli.rs # tests/inference_voice_http_round23_raw_coverage_e2e.rs # tests/memory_sources_readers_round21_raw_coverage_e2e.rs # tests/memory_sync_sources_raw_coverage_e2e.rs # tests/memory_threads_raw_coverage_e2e.rs # tests/near90_closure_raw_coverage_e2e.rs # tests/owned_domain_raw_coverage_e2e.rs # tests/tool_registry_approval_raw_coverage_e2e.rs # tests/tools_agent_credentials_state_raw_coverage_e2e.rs # tests/tools_composio_network_leftovers_raw_coverage_e2e.rs # tests/tools_composio_round22_raw_coverage_e2e.rs # tests/worker_b_raw_coverage_e2e.rs
Summary
Follow-up to the CodeRabbit nitpick on PR #3009 (Composio trigger-permission copy). The toolkit-slug extraction logic was duplicated across two user-facing message builders in
src/openhuman/composio/error_mapping.rs:This is a pure refactor with zero behavior change at the call sites: the expression is extracted into a shared
derive_toolkit_slug(tool: &str) -> Stringhelper.Addresses CodeRabbit's nit from #3009 → closes #2913.
Core refactor (zero behavior change)
derive_toolkit_sluginsrc/openhuman/composio/error_mapping.rs."integration"fallback:str::split('_').next()always yieldsSome(_)for&strinput, sounwrap_or_default()is the equivalent terminator and the helper's tests pin that exactly.format_insufficient_scope_messageformat_trigger_permission_messageerror_mapping_tests.rs:GMAIL_NEW_GMAIL_MESSAGE→gmailSLACK(single segment) →slack""(empty input) →""— behavior-parity guard verifying"".split('_').next()yieldsSome(""), so no fallback applies, exactly as before.Drive-by edits (calling out the production-impact pieces)
This branch picked up the following beyond the refactor; flagging them up-front so they don't get reviewed as "part of a zero-behavior refactor":
1. Server runtime — Tokio worker stack 2 MiB → 8 MiB (
src/core/cli.rs:283)Why: the server CLI path (
openhuman-core serve, used by docker / cloud / standalone debug) shares the same sub-agent prompt/tool-orchestration code as the desktop host, which already runs on an 8 MiB worker stack. Undercargo llvm-covinstrumentation the default 2 MiB worker stack overflows in CI; matching the desktop value resolves the flake.Cost: approximately
+6 MiB × num_workersresident RSS per server process (≈ 96 MiB on a 16-vCPU host). Within budget for our cloud / docker deployments, but explicitly calling out so operators can confirm; happy to split into its own PR if anyone wants a cleaner audit trail.2. Tool-memory rule id encoding (
src/openhuman/memory_tools/types.rs:128-134)ToolMemoryRule::generate_id()switched fromUuid::to_string()(36 chars, digits + hyphens) to a 33-charr-prefixed nibble-hex encoding (all lowercase letters).Why: UUID-shaped digit runs trip our
safety::pii::has_likely_piiheuristics (the digit/hyphen layout overlaps with RRN / DNI shapes), making the storage key occasionally PII-flagged. The new encoding is all[a-p]so it can never collide with those regexes. New testgenerated_rule_ids_are_safe_memory_document_keyscouples the encoding to the boundary.Compatibility: pre-deploy rule ids keep their UUID shape, post-deploy ids get the new shape, and we never parse the id as a UUID anywhere — only treat it as an opaque string key. No migration required.
Note: PR #3028 is narrowing
has_likely_pii(drops bare-NANP / E.164). The id re-encoding here is still useful (defense-in-depth, andhas_likely_piimay shift again later), so I'm keeping it.3. Meet-agent record durability (
src/openhuman/meet_agent/store.rs:112-114)Added an explicit
file.flush().awaitafterwrite_allinappend_record_to. The call site is low-frequency (per-call lifecycle event, not per-message), so the extrafsync-equivalent is not in any hot path.4. ToolsPanel a11y (
app/src/components/settings/panels/ToolsPanel.tsx:129-134)Per @oxoxDev review nit: the tool toggle now ships
role="switch"+aria-checked(wasaria-pressed). Unit test and the Playwright e2e spec (settings-feature-preferences.spec.ts) updated accordingly.role="switch"is the correct ARIA mapping for binary toggles;aria-checkedis the ARIA spec partner for that role.5. Test-infra:
env_lock()serialization forcargo llvm-covparallel runs16 raw-coverage e2e tests refresh their
EnvVarGuardSAFETY comments and route through a sharedenv_lock()mutex so concurrentcargo llvm-covworkers can no longer race onstd::env::set_var. Theenv_lockhelper lives intests/common/and is shared, not copy-pasted per file (answering @oxoxDev's question).6. Microsoft Teams toolkit slug expectation
Raw-coverage doc/expectation aligned with the current
toolkit_from_slugbehavior:MICROSOFT_TEAMS_*→microsoft_teams(no functional change to the function itself, just docs/tests catching up).Tests
Validation:
GGML_NATIVE=OFF cargo test -p openhuman composio::error_mapping— 12 passed (incl. the 3 newderive_toolkit_slugtests).GGML_NATIVE=OFF cargo test -p openhuman --test inference_local_admin_raw_coverage_e2e -- --test-threads=4— 4 passed.GGML_NATIVE=OFF cargo test -p openhuman --test inference_provider_admin_round22_raw_coverage_e2e -- --test-threads=4— 5 passed.GGML_NATIVE=OFF cargo test -p openhuman --test inference_voice_http_round23_raw_coverage_e2e -- --test-threads=4— 3 passed.GGML_NATIVE=OFF cargo test -p openhuman --test memory_raw_coverage_e2e memory_sources_validation_and_sync_classification_edges -- --exact— 1 passed.GGML_NATIVE=OFF cargo test -p openhuman --test memory_threads_raw_coverage_e2e memory_sync_composio_catalog_scope_and_state_helpers_cover_edge_cases -- --exact— 1 passed.GGML_NATIVE=OFF cargo test -p openhuman --test memory_tree_sync_raw_coverage_e2e composio_providers_sync_state_and_bus_surfaces_cover_read_write_edges -- --exact— 1 passed.GGML_NATIVE=OFF cargo test -p openhuman --lib toolkit_from_slug_handles_known_multi_segment_toolkits— 1 passed.pnpm --filter openhuman-app test:unit src/components/settings/panels/ToolsPanel.test.tsx— 1 passed (afterrole="switch"change).cargo fmt --manifest-path Cargo.toml --all --check— passed.pnpm compile— passed.Not run locally:
cargo llvm-covend-to-end —cargo-llvm-covnot installed; CI Rust Core Coverage remains the source of truth.Notes
chore/2913-derive-toolkit-slug.derive_toolkit_slugdoc-comment andunwrap_or_default()migration.Summary by CodeRabbit
Refactor
Tests