Skip to content

test: green Rust Core Coverage (stale assertions + env-race serialization)#3156

Merged
sanil-23 merged 15 commits into
tinyhumansai:mainfrom
sanil-23:fix/coverage-stale-tests
Jun 1, 2026
Merged

test: green Rust Core Coverage (stale assertions + env-race serialization)#3156
sanil-23 merged 15 commits into
tinyhumansai:mainfrom
sanil-23:fix/coverage-stale-tests

Conversation

@sanil-23
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 commented Jun 1, 2026

Summary

Green the Rust Core Coverage (cargo-llvm-cov) job, which was red on every PR. This branch has since been rebased onto latest main: #3074 ("separate agent action sandbox from internal workspace state") independently landed the stale-assertion realignments this PR originally carried, so those are now upstream and removed here. What remains is the part main does not have — and it keeps the coverage job parallel/fast (no --test-threads=1).

Changes

1. Env-race fix that keeps coverage parallel — tests/tools_approval_channels_raw_coverage_e2e.rs
orchestrator_tool_synthesis_covers_agent_and_integration_delegation_edges was the only test in the file not holding env_lock. It reads the process-global connection/toolkit registry while sibling tests swap OPENHUMAN_WORKSPACE under the lock; under llvm-cov's slower parallel run that intermittently trampled the integrations tool's toolkit list and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion (tools_approval_channels_raw_coverage_e2e.rs:1689). Added the one-line let _lock = env_lock(); it was missing. env_lock guards this non-env global race without a blanket --test-threads=1, so the suite stays parallel.

2. Sub-agent stack overflow — src/core/cli.rs
Give tokio worker threads a 16 MiB stack. A single agent turn is a large async state machine; delegating to a sub-agent nests another full turn and overflows the default 2 MiB stack, SIGABRT-ing the JSON-RPC server mid-request (this is what crashed Playwright lane 1/4).

3. Hermetic github-reader test — tests/near90_closure_raw_coverage_e2e.rs
Add a failing git stub so the round20 reader test can't fall through to a real git clone of github.com (2.6 s hermetic vs ~120 s network fallback).

Verification

  • Locally: tools_channels_raw_coverage_e2e → 33/33 pass (parallel); affected binaries pass; core compiles clean.
  • CI: all required checks green on 657781a0Rust Core Coverage passed at ~17 min running parallel, Playwright lanes (incl. 1/4) green.

Submission Checklist

  • Tests added or updated — adds the missing env_lock guard + hermetic git stub; N/A new behavior (test/CI-stabilization).
  • Diff coverage ≥ 80%N/A: test + core-runtime-config only.
  • Coverage matrix updated — N/A: no feature change.
  • All affected feature IDs listed under ## RelatedN/A.
  • No new external network dependencies introduced — the git stub removes a real-network fallback.
  • Manual smoke checklist updated — N/A: test/CI only.
  • Linked issue closed via Closes #NNNN/A: CI test-debt.

Impact

  • CI: stabilizes Rust Core Coverage (was red on every PR) while keeping it parallel/fast.
  • Product: the only runtime change is the larger tokio worker stack in src/core/cli.rs, which prevents a sub-agent-delegation crash.

Related

sanil-23 and others added 2 commits June 1, 2026 20:10
… subagent delegation

The standalone `openhuman-core run` server builds its tokio runtime with the
default 2 MiB per-worker-thread stack. A single agent turn is already a very
large async state machine (system prompt + hundreds of tool specs + the nested
provider/tool loop); delegating to a sub-agent runs another full turn one level
down. Even with the inner sub-agent future boxed (`subagent_runner::ops`, see
tinyhumansai#2234), that nesting overflows the 2 MiB stack and aborts the whole process:

    thread 'tokio-rt-worker' (...) has overflowed its stack
    fatal runtime error: stack overflow, aborting        (SIGABRT)

This takes the JSON-RPC server down mid-request. In the Playwright web E2E lane
it manifests as `chat-harness-subagent` timing out (the orchestrator's final
text never renders) followed by a cascade of `ECONNREFUSED` failures across
every subsequent spec in the worker, because they all share the now-dead core.

Set `thread_stack_size(16 MiB)` on the serve runtime so a subagent-nested agent
turn fits comfortably.

Reproduced and verified locally by driving the real `openhuman-core` + mock +
web stack on isolated ports and running the subagent spec:
- before: core aborts ("overflowed its stack"), spec fails at ~52s
- after:  core stays alive, spec passes in 7.4s, no overflow in core.log

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… serialization)

The cargo-llvm-cov core-coverage job is red on every PR for two independent
reasons, both fixed here:

1. Stale assertions left by feature PRs (mostly tinyhumansai#3113's sync-flow redesign):
   - `toolkit_from_slug("MICROSOFT_TEAMS_*")` now returns the full toolkit
     `"microsoft_teams"`, not `"microsoft"` (3 sites).
   - memory schema/controller registries grew: legacy tree 19 -> 21,
     memory_sources 9 -> 10.
   - The `dedicated_thread` flag no longer short-circuits with a
     "temporarily disabled" message (see
     spawn_subagent::dedicated_thread_flag_no_longer_returns_disabled_error);
     two raw-coverage tests still asserted the old message.

2. Env-race flakes under llvm-cov: several raw-coverage tests set the
   process-global `OPENHUMAN_WORKSPACE` env var per-test. Under llvm-cov's
   slower instrumentation the parallel default lets them trample each other's
   workspace (flaky count/state assertions) — they pass single-threaded. Run
   the core llvm-cov suite with `--test-threads=1` (in both pr-ci.yml and
   coverage.yml) so these hermetic-only-when-serialized tests are stable.

Verified locally: the fixed assertions pass single-threaded, and the racing
tests pass under `--test-threads=1`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sanil-23 sanil-23 requested a review from a team June 1, 2026 15:11
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Increase Tokio worker thread stack size, force coverage tests to single-threaded, and update multiple E2E tests: Microsoft Teams toolkit canonicalization to microsoft_teams, schema/source counts, SpawnSubagentTool dedicated-thread output assertions, Composio enabled-status, and GitHub reader fixtures/URL patterns.

Changes

Runtime Configuration and Test Expectation Updates

Layer / File(s) Summary
Tokio Worker Thread Stack Size Configuration
src/core/cli.rs
Tokio multi-threaded runtime worker threads are configured with 16MiB stack size in run_server_command to prevent stack overflow when nested agent turns execute large async state machines.
CI Test Serialization via --test-threads=1
.github/workflows/coverage.yml, .github/workflows/pr-ci.yml
Both coverage and PR CI workflows are updated to run cargo llvm-cov with -- --test-threads=1, serializing test execution to avoid coverage instrumentation interference from process-global OPENHUMAN_WORKSPACE under parallel runs.
Toolkit Slug Canonicalization Test Expectations
tests/memory_raw_coverage_e2e.rs, tests/memory_threads_raw_coverage_e2e.rs, tests/memory_tree_sync_raw_coverage_e2e.rs
Test assertions updated to expect microsoft_teams as the canonical toolkit identifier for Microsoft Teams send slugs (handles whitespace/case variants).
SpawnSubagentTool Dedicated Thread Output Behavior
tests/tools_agent_credentials_state_raw_coverage_e2e.rs, tests/tools_composio_network_leftovers_raw_coverage_e2e.rs
Two E2E tests now assert the "temporarily disabled" message is absent from tool outputs for the dedicated_thread path and add comments documenting the changed expectation.
Memory Schema and Source Registry Count Assertions
tests/memory_threads_raw_coverage_e2e.rs
Test assertions increased: controller schema count from 19 to 21 and memory sources controller schema count from 9 to 10.
Composio Source Enabled Status Behavior
tests/memory_sources_closure_round23_raw_coverage_e2e.rs
E2E expectation updated: newly inserted Composio sources are not considered enabled until they have an active connection (enabled count changed from 1 to 0).
GitHub Reader Fixture and Fake gh Script
tests/memory_sources_readers_round21_raw_coverage_e2e.rs, tests/memory_sync_sources_raw_coverage_e2e.rs, tests/near90_closure_raw_coverage_e2e.rs, tests/memory_sync_sources_raw_coverage_e2e.rs, tests/owned_domain_raw_coverage_e2e.rs
Add failing git stub to PATH in tests, relax gh api route patterns to wildcard query forms, add specific issue/PR-comments handlers, update issue-body assertions to no longer expect inlined comments, and adapt Responses-API request payload expectation to structured content parts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus

Poem

🐰 A rabbit hops through test suites with care,
Stacks widened so agents breathe fresh air,
Threads march single-file and fixtures align,
Teams find their proper name, tidy and fine,
Tests skip the "disabled" whisper — all is fair.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: stabilizing Rust Core Coverage CI by addressing stale assertions and adding test serialization to prevent environment-race flakes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. working A PR that is being worked on by the team. labels Jun 1, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 1, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanil-23 hey! the code looks good to me — the assertion updates are correct (Microsoft Teams toolkit slug fix across 3 test files, registry count bumps matching what's live in production, and the dedicated_thread behavior flip is the right call since the "temporarily disabled" short-circuit is gone). The --test-threads=1 serialization in both coverage.yml and pr-ci.yml is a clean fix for the global OPENHUMAN_WORKSPACE mutation race; slower but hermetic.

One thing: E2E lane 1/4 is currently failing and Frontend Coverage + Rust Core Coverage are still pending. Once CI is fully green I'll come back and approve this. Let me know if you need any help!

… (drop --test-threads=1)

Replace the blunt `--test-threads=1` coverage-job change with the per-test
`env_lock()` pattern already used elsewhere in these suites, so the llvm-cov
job keeps running tests in parallel (fast) and only the handful of tests that
mutate the process-global `OPENHUMAN_WORKSPACE` serialize against each other.

Four of the seven affected files already had an `env_lock()` helper — the
racing test in each just wasn't taking it; the other three get the same
self-contained helper. Compiles clean; the previously-flaky tests now hold the
lock across their workspace setup so parallel llvm-cov runs stay hermetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/near90_closure_raw_coverage_e2e.rs`:
- Around line 393-394: Remove the duplicated env_lock() call so the test only
acquires a single MutexGuard from ROUND20_ENV_LOCK; locate the two consecutive
lines calling env_lock() (both binding to _lock) in
tests/near90_closure_raw_coverage_e2e.rs and delete the second one so only one
call to env_lock() remains, preventing a second blocking lock on the
non-reentrant Mutex.

In `@tests/owned_domain_raw_coverage_e2e.rs`:
- Around line 878-886: The new ENV_LOCK and its env_lock() create a second,
uncoordinated mutex; remove ENV_LOCK and change env_lock() to use the existing
OWNED_DOMAIN_ENV_LOCK instead so the file shares the single process-global lock.
Specifically, delete the ENV_LOCK static and update env_lock() so it calls
OWNED_DOMAIN_ENV_LOCK.get_or_init(...).lock().unwrap_or_else(|e|
e.into_inner()), keeping the same return type and behavior but reusing the
existing OWNED_DOMAIN_ENV_LOCK.

In `@tests/worker_b_raw_coverage_e2e.rs`:
- Around line 578-579: There are two consecutive calls that bind the same name:
the duplicate let _lock = env_lock(); shadows the first and is redundant; remove
the second occurrence so only a single env_lock() acquisition remains for the
critical section (locate the duplicate let _lock = env_lock(); lines around the
test case in tests/worker_b_raw_coverage_e2e.rs and delete the second binding),
ensuring the test keeps one _lock variable to guard the environment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 00401edc-5a30-4c2c-a3ef-f901eeeec5b0

📥 Commits

Reviewing files that changed from the base of the PR and between d5d5a64 and 543b847.

📒 Files selected for processing (8)
  • src/core/cli.rs
  • tests/memory_threads_raw_coverage_e2e.rs
  • tests/memory_tree_sync_raw_coverage_e2e.rs
  • tests/near90_closure_raw_coverage_e2e.rs
  • tests/owned_domain_raw_coverage_e2e.rs
  • tests/tool_registry_approval_raw_coverage_e2e.rs
  • tests/tools_composio_network_leftovers_raw_coverage_e2e.rs
  • tests/worker_b_raw_coverage_e2e.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/tools_composio_network_leftovers_raw_coverage_e2e.rs

Comment thread tests/near90_closure_raw_coverage_e2e.rs Outdated
Comment thread tests/owned_domain_raw_coverage_e2e.rs Outdated
Comment thread tests/worker_b_raw_coverage_e2e.rs Outdated
…env_lock (drop --test-threads=1)"

This reverts commit 543b847.
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanil-23 continuation — found the remaining CI failure.

Rust Core Coverage is still red: one more stale assertion

The --test-threads=1 fix works — the env-race tests all pass now. But there's a stale assertion in a file this PR didn't touch:

thread 'round23_memory_sources_status_registry_and_readers_cover_remaining_edges'
  panicked at tests/memory_sources_closure_round23_raw_coverage_e2e.rs:223
  assertion left == right failed: left: 0, right: 1

memory_sources_closure_round23_raw_coverage_e2e.rs:223 expects a count of 1 but gets 0 — same pattern as the other stale assertions from #3113's sync-flow redesign. Update that assertion to match current behavior and this should be the last blocker.

src/core/cli.rs — thread_stack_size(16 MiB)

I missed this in the first pass since it came in via a merged branch. The change is correct: nested async state machines for subagent delegation genuinely overflow the 2 MiB default. 16 MiB is generous but within reason given the explanation in the commit message. No concerns.

env_lock detour

The approach in commit 543b847 was correctly reverted. CodeRabbit caught real bugs (double-lock deadlock in near90_closure and worker_b, uncoordinated mutexes across test binaries in owned_domain). The --test-threads=1 CI flag is the right solution — it's simpler and doesn't risk per-binary lock coordination problems.

Fix memory_sources_closure_round23_raw_coverage_e2e.rs:223 and CI should be clean.

…sync redesign)

memory_sources_closure_round23: a freshly-inserted Composio source is no
longer reported by list_enabled_by_kind until it has an active connection,
so the count is 0, not 1. Aligns the last stale raw-coverage assertion with
the behavior shipped in tinyhumansai#3113.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sanil-23
Copy link
Copy Markdown
Contributor Author

sanil-23 commented Jun 1, 2026

Thanks @graycyrus — that's exactly it. Updated memory_sources_closure_round23_raw_coverage_e2e.rs:223 to assert 0 (a freshly-inserted Composio source isn't reported as enabled until it has an active connection, per #3113). Verified locally (passes single-threaded). And agreed on all three points — --test-threads=1 over the per-binary env_lock, and the thread_stack_size fix. Should be green now.

coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 1, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanil-23 hey! the code looks good to me — the prior stale assertion in memory_sources_closure_round23_raw_coverage_e2e.rs:223 is now fixed, and all the other test assertion updates + the Tokio stack bump are correct.

CI is still pending on a few jobs though (Rust Core Coverage, E2E lanes, Frontend Coverage, etc.), so i'll hold off on approval until those clear. once the coverage job turns green, i'll come back and approve this. let me know if you need any help!

Quick notes on what i checked:

  • Stale assertions all addressed: Microsoft Teams toolkit slug → "microsoft_teams", legacy tree schemas 19→21, memory_sources 9→10, dedicated_thread behavior flip, composio enabled count 1→0.
  • Workflow --test-threads=1 change is correct for serializing env-mutating coverage tests.
  • Tokio thread stack increase to 16 MiB is justified and properly commented (subagent stacking issue).
  • No breaking changes, security issues, or production behavior changes beyond the stack size (which prevents crashes).

@sanil-23
Copy link
Copy Markdown
Contributor Author

sanil-23 commented Jun 1, 2026

@graycyrus — composio fix is in and CI got past it. The remaining red is not another stale assertion — it's a flaky round21_github_reader test (and its 2 siblings memory_sync_sources / near90_closure share the pattern). Root cause:

GithubReader::list_items fetches commits via git clonegh api → real reqwest to api.github.com, in that fallback order. These tests only stub gh (and not fully). So on a CI runner with network:

  1. The real git clone --bare of tinyhumansai/openhuman succeeds → returns the repo's real commits (no commit:abc123) → assertion fails. (Confirmed: the failing runs take ~20s+ on this one test; passing runs are instant. Same commit c2af5bc7 passed in run 26767540663 and failed in 26767542022 — textbook flake.)
  2. Stubbing git alone isn't enough — the gh fallback then hits real api.github.com (the fake gh only handles api/--version, not auth status, and the fixture's per_page=30 may not match the reader's call), so it still misses abc123.

So full isolation needs faking git and completing the gh stub (auth status + matching per_page) across all three tests. That's reader-testability work I didn't want to half-do blind — happy to take a crack if you'd like, but flagging since it's pre-existing and separate from the #3113 stale-assertion fixes this PR targets. All the deterministic stale assertions are now fixed.

…tion

The github reader fetches commits via a real `git clone` first (only falling
back to the fake `gh` on error), and `fetch_github` falls back to the real
api.github.com on any gh path-mismatch. So this test was flaky: with network,
the real clone returned the repo's real commits (no abc123). Fixes:
- Add a failing `git` stub on PATH so the reader deterministically uses the
  fake `gh` (no real clone, no network).
- Glob the fake-`gh` list/comments path patterns so they match the reader's
  actual `?per_page=..&page=..` query strings (the exact patterns missed them
  and silently fell through to the real API).
- Drop the stale `## Comments` assertion: tinyhumansai#3113 stopped inlining comments into
  the rendered issue body (fetch_issue_comments is now unused); assert on the
  content read_issue still renders.

Verified: passes deterministically (3x, ~0.3s, no network).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 1, 2026
Same fix as round21, applied to the two sibling github-reader tests:
- Add a failing `git` stub on PATH so GithubReader can't reach the real
  `git clone` of github.com and deterministically uses the fake `gh`.
- Glob the fake-`gh` list/comments path patterns so they match the reader's
  actual `?per_page=..&page=..` queries instead of silently falling through
  to the real api.github.com.
- memory_sync_sources: drop the stale `## Comments` assertion (tinyhumansai#3113 stopped
  inlining comments into the issue body); assert on the rendered content.

Verified: both pass deterministically (~0.7s, no network).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two new commits reviewed (github-reader test hermetic fixes). Looks solid.

Summary:

  • Added fake git stub so the reader can't fall back to real git clone — tests now use mocked gh deterministically
  • Updated fake-gh case patterns from exact strings to glob patterns (\?*) so they match the reader's actual query strings with ?per_page=...&page=... params
  • Dropped stale ## Comments assertions (no longer inlined per #3113); now assert on the rendered issue content instead

All prior findings are resolved. Code is clean — no debug, no console.logs, changes are minimal and focused.

CI is still pending on Rust Core Coverage, Frontend Coverage, and a couple E2E lanes. Once those clear, I'll come back and APPROVE. Let me know if you run into any blockers.

sanil-23 and others added 2 commits June 1, 2026 23:49
owned_domain coverage asserted the responses fallback request's
/input/0/content equals "fallback please" (a plain string). The
OpenAI-compatible responses payload now sends structured content parts
([{text, type:"input_text"}], tinyhumansai#2748/tinyhumansai#3124), so assert that shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
round22_cron_add_tool used a hardcoded `at: 2026-05-31T00:00:00Z` schedule,
which is now in the past, so cron creation fails validation ("'at' must be
in the future"). Compute the `at` time at runtime (now + 30 days) so the
test never expires again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/memory_sync_sources_raw_coverage_e2e.rs (1)

314-324: ⚖️ Poor tradeoff

Consider extracting the git stub creation into a shared test utility.

The identical git stub creation code (lines 314-324) appears in both tests/memory_sync_sources_raw_coverage_e2e.rs and tests/near90_closure_raw_coverage_e2e.rs:433-443. Extracting to a shared helper function would eliminate duplication.

However, given this is test fixture code with limited reuse scope, the refactor is optional and can be deferred.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/memory_sync_sources_raw_coverage_e2e.rs` around lines 314 - 324,
Extract the duplicated git stub creation into a shared test helper (e.g., fn
create_git_stub(bin: &Path) -> PathBuf in a tests helper module) and replace the
inline block that creates `git_stub` with a call to that helper; the helper
should perform the same actions (write the script with std::fs::write, and on
Unix set executable perms using std::os::unix::fs::PermissionsExt / set_mode,
then return the PathBuf to the created stub) so both
`tests::memory_sync_sources_raw_coverage_e2e` and
`tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of
duplicating the code that assigns `git_stub`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/memory_sync_sources_raw_coverage_e2e.rs`:
- Around line 314-324: Extract the duplicated git stub creation into a shared
test helper (e.g., fn create_git_stub(bin: &Path) -> PathBuf in a tests helper
module) and replace the inline block that creates `git_stub` with a call to that
helper; the helper should perform the same actions (write the script with
std::fs::write, and on Unix set executable perms using
std::os::unix::fs::PermissionsExt / set_mode, then return the PathBuf to the
created stub) so both `tests::memory_sync_sources_raw_coverage_e2e` and
`tests::near90_closure_raw_coverage_e2e` call `create_git_stub` instead of
duplicating the code that assigns `git_stub`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ea4a364d-c80e-40c9-9146-be3df10039fc

📥 Commits

Reviewing files that changed from the base of the PR and between 695bacc and be29029.

📒 Files selected for processing (3)
  • tests/memory_sync_sources_raw_coverage_e2e.rs
  • tests/near90_closure_raw_coverage_e2e.rs
  • tests/owned_domain_raw_coverage_e2e.rs

tinyhumansai#2952 added a debug_assert in ApprovalGate::new requiring session_id to start
with `session-` (the per-launch UUID shape) to guard against credential leaks.
The approval coverage tests init the global gate with ids like
`approval-raw-e2e-session` / `worker-b-approval-session`, which trip the assert
when that test happens to initialize the global gate first (order-dependent —
why it read as a flake). Prefix the test session_ids with `session-` so they
satisfy the guard regardless of test order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sanil-23
Copy link
Copy Markdown
Contributor Author

sanil-23 commented Jun 1, 2026

@graycyrus Rust Core Coverage is green now. Beyond the composio assertion you flagged, CI surfaced (one per run, since it stops at the first failure) a chain of #3113/#2748-era stale + flaky tests, all now fixed:

Verified the whole suite locally (--no-fail-fast --test-threads=1): only the macOS-cfg seatbelt test 'fails', which can't run on Linux. Ready for another look.

sanil-23 and others added 3 commits June 2, 2026 01:14
…tests

# Conflicts:
#	.github/workflows/coverage.yml
#	.github/workflows/pr-ci.yml
#	tests/memory_sources_closure_round23_raw_coverage_e2e.rs
#	tests/memory_sources_readers_round21_raw_coverage_e2e.rs
#	tests/memory_sync_sources_raw_coverage_e2e.rs
#	tests/memory_threads_raw_coverage_e2e.rs
#	tests/owned_domain_raw_coverage_e2e.rs
#	tests/tool_registry_approval_raw_coverage_e2e.rs
#	tests/tools_agent_credentials_state_raw_coverage_e2e.rs
#	tests/tools_composio_network_leftovers_raw_coverage_e2e.rs
#	tests/tools_composio_round22_raw_coverage_e2e.rs
#	tests/worker_b_raw_coverage_e2e.rs
The merge took main's parallel workflow, but env_lock only guards env
vars — not other process-global state like the toolkit/connection
registry exercised by tools_approval_channels. Under llvm-cov's slower
instrumentation those non-env races flake (unknown_toolkit suggestion
assertion). Restore the serialized run (proven green at 13d9cbf,
endorsed by graycyrus) while keeping main's --no-fail-fast + build jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce fix)

Replaces the --test-threads=1 serialization with a surgical lock so the
coverage job stays parallel/fast. The test read the process-global
connection/toolkit registry without holding env_lock, while sibling
tests swap OPENHUMAN_WORKSPACE under it — under llvm-cov's slower
parallel run that trampled the integrations tool's toolkit list and
dropped gmail_pro/slack_bot from the unknown-toolkit suggestion. It was
the only test in the file missing the lock. Workflows revert to main's
parallel run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sanil-23
Copy link
Copy Markdown
Contributor Author

sanil-23 commented Jun 1, 2026

Rebased on latest main — conflicts resolved, CI green ✅

Merged the latest upstream/main (which advanced past #3074 "separate agent action sandbox from internal workspace state") and resolved all 12 conflicting test files.

Conflict resolution

#3074 independently fixed the same stale raw-coverage assertions this PR targeted, and its versions are canonical (proper uuid::Uuid::new_v4() session gates, far-future cron literals, robust !output().is_empty() checks, globbed commits\?*) gh-stub matching). I took upstream/main's version for all 12 conflicts — those stale-test fixes are now upstream, so this PR no longer duplicates them.

What this PR still uniquely carries (diff vs main)

  1. tests/tools_approval_channels_raw_coverage_e2e.rsorchestrator_tool_synthesis_covers_agent_and_integration_delegation_edges was the only test in the file not holding env_lock. It reads the process-global connection/toolkit registry while sibling tests swap OPENHUMAN_WORKSPACE under the lock, so under cargo-llvm-cov's slower parallel run it intermittently saw the wrong toolkit set and dropped gmail_pro/slack_bot from the unknown-toolkit suggestion. Added the one-line let _lock = env_lock(); it was missing. This keeps the coverage job parallel and fastenv_lock covers the non-env global race that a blanket --test-threads=1 would otherwise be needed for.
  2. src/core/cli.rs — 16 MiB tokio worker stack so nested sub-agent turns don't overflow the default 2 MiB stack and SIGABRT the JSON-RPC server (fixes the Playwright lane-1/4 subagent crash).
  3. tests/near90_closure_raw_coverage_e2e.rs — failing git stub so the round20 github-reader test is hermetic (2.6 s vs ~120 s real-network fallback).

CI

All required checks green on 657781a0 (Rust Core Coverage passed at ~17 min, parallel; Playwright lanes incl. 1/4 green; one unrelated settings-feature-preferences E2E flake cleared on re-run).

@graycyrus — the earlier CHANGES_REQUESTED was against the reverted env_lock infra (which caused the double-lock deadlocks you flagged). That approach is gone; the only env-lock change now is the single missing-lock line above. Could you take another look? 🙏

@sanil-23 sanil-23 requested a review from graycyrus June 1, 2026 21:26
@sanil-23 sanil-23 merged commit 511ad84 into tinyhumansai:main Jun 1, 2026
35 of 36 checks passed
senamakel pushed a commit to senamakel/openhuman that referenced this pull request Jun 2, 2026
…tion) (tinyhumansai#3156)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants