feat(memory_sources): add Memory Sources domain — user-configurable data connectors#2893
Conversation
📝 WalkthroughWalkthroughAdds a Memory Sources domain (types, readers, registry, RPC/schemas, sync/status), frontend registry and add-source dialog, TypeScript client and tests, i18n keys, Composio reconciliation on startup/post-OAuth, and Rust e2e tests exercising CRUD, listing, reads, ingestion, and Composio registry flows. ChangesMemory Sources Feature
Sequence Diagram(s)sequenceDiagram
participant User
participant Frontend
participant TSClient as memorySourcesService
participant CoreRpc
participant Registry
participant Reader
participant DB as MemTree
User->>Frontend: click "Add" / "Sync"
Frontend->>TSClient: addMemorySource() / syncMemorySource()
TSClient->>CoreRpc: openhuman.memory_sources_add / openhuman.memory_sources_sync
CoreRpc->>Registry: add_source() / sync_source()
Registry-->>CoreRpc: source entry / ok
CoreRpc->>Reader: reader_for(kind).list_items() / read_item()
Reader-->>DB: ingest_document(...) (via sync)
DB-->>CoreRpc: ingestion results
CoreRpc-->>TSClient: RpcOutcome
TSClient-->>Frontend: success / status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
|
There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/openhuman/memory_sync/composio/bus.rs (1)
558-563:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAuto-registration is skipped for toolkits without a provider.
The early
returnin theget_providermiss path exits beforeupsert_composio_source(...), so many valid connections never appear inmemory_sources.💡 Proposed fix
- let Some(provider) = get_provider(&toolkit) else { + let provider = get_provider(&toolkit); + if provider.is_none() { tracing::debug!( toolkit = %toolkit, "[composio:bus] no provider registered for toolkit; cache refreshed, no provider hook to dispatch" ); - return; - }; + } - if let Err(e) = provider.on_connection_created(&ctx).await { - tracing::warn!( - toolkit = %toolkit, - connection_id = %connection_id, - error = %e, - "[composio:bus] provider on_connection_created failed" - ); - } else { - // Successful connection-created sync — record the - // timestamp so the periodic scheduler doesn't - // immediately re-fire for this connection. - super::periodic::record_sync_success(&toolkit, &connection_id); + if let Some(provider) = provider { + if let Err(e) = provider.on_connection_created(&ctx).await { + tracing::warn!( + toolkit = %toolkit, + connection_id = %connection_id, + error = %e, + "[composio:bus] provider on_connection_created failed" + ); + } else { + super::periodic::record_sync_success(&toolkit, &connection_id); + } }Also applies to: 580-596
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sync/composio/bus.rs` around lines 558 - 563, The early return in the composio bus path when get_provider(&toolkit) returns None prevents calling upsert_composio_source and thus skips auto-registration of valid connections; modify the branch so that when get_provider(&toolkit) is None you still call upsert_composio_source(...) to insert/update the connection into memory_sources (preserve the existing tracing::debug log), and then continue without attempting to dispatch provider hooks. Apply the same change to the analogous block around the 580-596 region so both miss-paths auto-register via upsert_composio_source while avoiding the provider dispatch.
🧹 Nitpick comments (5)
src/openhuman/memory_sources/readers/composio.rs (1)
46-65: ⚡ Quick winAdd a structured debug log in
read_itemfor flow parity.
read_itemis a new/changed flow but currently has no diagnostic logging, whilelist_itemsalready does. Adding one namespaced debug event here improves traceability when source sync behavior is investigated.Proposed patch
async fn read_item( &self, source: &MemorySourceEntry, item_id: &str, _config: &Config, ) -> Result<SourceContent, String> { let toolkit = source.toolkit.as_deref().unwrap_or("unknown"); + tracing::debug!( + toolkit = %toolkit, + item_id = %item_id, + connection_id = ?source.connection_id, + "[memory_sources:composio] read_item" + ); Ok(SourceContent { id: item_id.to_string(), title: format!("{toolkit} sync data"),As per coding guidelines: "Use
log/tracingatdebug/tracelevels for verbose diagnostics on new/changed flows in Rust; never log secrets or full PII—redact sensitive data."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sources/readers/composio.rs` around lines 46 - 65, Add a namespaced debug log at the start of the async fn read_item to mirror list_items' diagnostics: use the tracing (or log) crate at debug or trace level, include non-sensitive identifiers like toolkit and item_id, and redact or omit sensitive fields such as full connection_id (e.g., log a truncated or hashed connection_id). Target the read_item function and ensure the log message clearly states it's entering read_item for the given toolkit and item_id to aid traceability.src/openhuman/memory_sources/readers/web_page.rs (1)
92-97: ⚖️ Poor tradeoffTitle extraction is fragile.
The
extract_titlefunction uses simple string searching without handling HTML entities or attributes on the<title>tag. For example:
<title lang="en">Page</title>will fail (finds<titlebut the>search includes the attribute)- HTML entities like
<in the title won't be decoded.Consider using a proper HTML parsing library (like
scrapermentioned in the comments) or add attribute handling.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sources/readers/web_page.rs` around lines 92 - 97, The extract_title function is too fragile: update extract_title to use a robust HTML parser (e.g., the scraper crate) instead of manual string searches so it correctly handles attributes on <title> (e.g., <title lang="en">) and decodes HTML entities; locate the extract_title function in web_page.rs and replace the manual find-based logic with parsing that selects the <title> element (or falls back safely) and returns its text content with HTML entities decoded.src/openhuman/memory_sources/types.rs (2)
97-99: ⚡ Quick winConsider path validation for security.
The
Folderkind requires apathfield but does not validate it here. Ensure the folder reader (e.g.,readers/folder.rs) prevents path traversal attacks by canonicalizing paths and checking they don't escape allowed boundaries.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sources/types.rs` around lines 97 - 99, The Folder arm (SourceKind::Folder) only requires a path via require_field(&self.path, "path")? but lacks validation; update the folder reader (readers/folder.rs) to canonicalize the provided path (e.g., std::fs::canonicalize) and verify it is inside an allowed base directory (using Path::starts_with or strip_prefix to ensure it does not escape the boundary) and reject or return an error if canonicalization fails or the path is outside the allowed root to prevent path traversal.
100-111: ⚡ Quick winConsider URL validation for external source kinds.
Source kinds that use
url(GithubRepo, TwitterQuery, RssFeed, WebPage) do not validate URL format at the type level. Confirm that readers validate URLs before making network requests to prevent SSRF or invalid URL errors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/memory_sources/types.rs` around lines 100 - 111, The SourceKind branches that call require_field(&self.url, "url") (GithubRepo, RssFeed, WebPage) need to also validate that self.url is a well-formed, safe URL before any network use: inside the same validation function where require_field is used (the impl that validates SourceKind / the struct containing url), parse the value with a strict URL parser (e.g., url::Url::parse) and ensure the scheme is http or https (reject other schemes and invalid parses), returning a clear validation error; do not change TwitterQuery handling (it uses query, not url) but add a unit test for each invalid/unsupported URL case and for a valid http/https URL to ensure the new checks work.app/src/components/settings/panels/__tests__/MemoryDataPanel.test.tsx (1)
95-103: ⚡ Quick winRestore this error-path test instead of skipping it.
Skipping this case removes coverage for a user-facing resilience requirement (preset controls should remain accessible during sync errors). Please re-enable it with deterministic stubbing of the current memory-sources status path.
As per coding guidelines: “keep tests deterministic with no real network calls or time-sensitive flakes” and “prefer behavior over implementation in unit tests”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@app/src/components/settings/panels/__tests__/MemoryDataPanel.test.tsx` around lines 95 - 103, Unskip the test (change it.skip to it) and make the sync-connections/status path deterministic by stubbing the network/hook that returns memory-sources status to produce the error case ("network timeout") while keeping other dependencies normal (keep resolveConfigWith('balanced') and renderWithProviders(<MemoryDataPanel />) as-is); implement this by mocking the module or function that MemoryDataPanel uses to fetch sync connections (e.g., the memory-sources status fetch hook or API client) with jest.mock/jest.spyOn or MSW to return a rejected promise or error response, then assert the UI shows the error and that preset buttons remain accessible. Ensure the mock only affects the sync connections status endpoint and does not perform real network calls so the test is deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/intelligence/AddMemorySourceDialog.tsx`:
- Around line 319-320: In AddMemorySourceDialog, replace all hard-coded
user-visible placeholders (e.g. "/Users/you/notes", "**/*.md", URL examples,
"article", "from:user AI safety", etc.) with i18n lookups using the useT() hook
(e.g. t('addMemorySource.placeholderPath')) and pass those t(...) results into
the JSX placeholder props; add corresponding keys to the locale resource files
and ensure useT() is imported and invoked in the AddMemorySourceDialog component
so inputs like the path/file/glob/type/search placeholders call t('...') instead
of embedding literals.
- Around line 61-75: The effect currently calls setLoadingConnections(true)
synchronously in the effect body; move that state update into the async IIFE
(make it the first statement inside the async function) so state changes happen
asynchronously, and keep the existing calls to listConnections(),
setConnections(resp.connections),
setError(t('memorySources.composioListFailed')), and
setLoadingConnections(false) inside that async block; additionally, add a
mounted flag (e.g., let mounted = true; and check mounted before calling
setConnections/setError/setLoadingConnections, and set mounted = false in a
returned cleanup) to avoid updating state after unmount — target the useEffect
that references kind, listConnections, setLoadingConnections, setConnections,
and setError.
In `@app/src/components/intelligence/MemorySourcesRegistry.tsx`:
- Around line 313-321: The relative-time string literals in the time formatting
block (e.g., 'just now', `${seconds}s ago`, `${minutes}m ago`, `${hours}h ago`,
`${days}d ago`) must be replaced with i18n strings via the useT() hook used in
this component (MemorySourcesRegistry). Import/use useT() (or the existing t
instance) and replace each literal with t('...') keys (for example:
t('relative.justNow'), t('relative.secondsAgo', {count: seconds}),
t('relative.minutesAgo', {count: minutes}), etc.), formatting the numeric
interpolation through the t call; ensure you update the keys consistently and
use the same unique function/block (the relative-time formatter) so all
UI-visible strings come from i18n rather than hard-coded literals.
- Around line 120-130: Replace hard-coded toast strings in MemorySourcesRegistry
with localized strings via the useT() hook: import/use useT() in the component
(e.g. const t = useT()), then change the success title/message and error title
to use t(...) with interpolation of source.label (for example
t('syncing_with_label', { label: source.label }) and
t('progress_will_appear_shortly')) and t('sync_failed_with_label', { label:
source.label }); keep the error.message as the toast message value but ensure
the toast title and static message call t(...) instead of literal text; update
the onToast calls that reference onToast, refresh, and source.label accordingly.
In `@app/src/services/memorySourcesService.ts`:
- Around line 162-169: SOURCE_KIND_LABELS currently contains hard-coded English
UI strings; replace these values with i18n keys (e.g.,
'memorySources.kind.folder') or move the mapping into the component layer so
labels are resolved with useT(); specifically update the SOURCE_KIND_LABELS
constant in memorySourcesService (or remove it) and ensure any component that
reads it calls useT() from I18nContext to translate the keys before rendering
(use the same keys shown in the comment like 'memorySources.kind.folder',
'memorySources.kind.github_repo', etc.).
In `@src/openhuman/memory_sources/readers/folder.rs`:
- Around line 96-112: In read_item, the file is read without enforcing
MAX_FILE_SIZE (unlike list_items), allowing large reads; fix by checking the
file size via tokio::fs::metadata or std::fs::metadata on file_path (after
canonicalization) and returning an error if metadata.len() > MAX_FILE_SIZE
before calling tokio::fs::read_to_string; reference the read_item function,
MAX_FILE_SIZE constant, file_path and canonical_file/canonical_base checks and
ensure the error message matches existing style (e.g., "file too large" or
similar).
In `@src/openhuman/memory_sources/readers/github.rs`:
- Around line 23-30: The parse_github_url function currently infers owner/repo
from trailing segments and accepts extra path parts; change it to strictly
validate the URL shape: parse the input with Url (or equivalent), ensure scheme
is http or https and the host is "github.com" (allow optional "www"), split the
path into segments and require exactly two non-empty segments (owner and repo,
trimming a trailing ".git" on the repo), and return Err if the URL contains
extra segments like "tree" or "blob" or has fewer/more than two segments; update
the error message to clearly state invalid GitHub repo URL when rejecting.
- Around line 94-100: The GitHub reader currently creates the reqwest client
with reqwest::Client::new() and calls .send().await without any timeout, so wrap
the request with a timeout: either build the client with
reqwest::Client::builder().timeout(Duration::from_secs(10)).build()? and replace
reqwest::Client::new() or call .timeout(Duration::from_secs(10)) on the
RequestBuilder before .send(); add use std::time::Duration and pick a sensible
timeout (e.g., 10s) so the .send().await for the client/request will error
instead of blocking indefinitely.
In `@src/openhuman/memory_sources/readers/rss.rs`:
- Around line 35-38: The logs are printing the full url variable (using %url)
which can leak query params/credentials; update the logging in the RSS reader
calls that include "%url" (the log that currently prints url alongside max_items
and the later log at lines ~57-60) to instead log a redacted host-only context:
parse the url variable (e.g., via url::Url) and emit only the scheme+host (or
host and port) or a redacted placeholder, then use that redacted value in the
existing log messages (preserving the same log text like "[memory_sources:rss]
listing items") so sensitive query strings/credentials are never logged.
- Around line 89-95: The fetch_url function creates a reqwest client with
Client::new() and lacks a deadline; change client creation in fetch_url to use
Client::builder() with an explicit timeout (e.g., Duration::from_secs(10)) or
set a per-request timeout via the RequestBuilder::timeout call on the get()
request so slow feeds don't hang; ensure you add the std::time::Duration import
and update the client/request usage in fetch_url accordingly.
In `@src/openhuman/memory_sources/readers/twitter.rs`:
- Around line 32-36: The current reader accepts Some("") or whitespace-only
values because it only checks for None; after retrieving the query string (the
expression using source.query.as_deref() -> query) trim the string and reject
empty results: replace the simple ok_or(...) with logic that obtains the &str,
trims it, and returns an error if the trimmed string is empty (so the reader
returns the same "twitter source requires a query" error for None or
whitespace-only input). Keep the rest of the function (including _since_days /
DEFAULT_SINCE_DAYS) unchanged.
In `@src/openhuman/memory_sources/readers/web_page.rs`:
- Around line 59-75: Validate and constrain fetched URLs and response size:
parse and validate the url variable (use url::Url::parse) and return an error
unless the scheme is "http" or "https"; after sending the request with
reqwest::Client (the client/get(&url)/send() flow) check
resp.headers().get(CONTENT_LENGTH) and fail if content length exceeds a
configured max, and when reading the body avoid resp.text().await for unbounded
reads—read via a streaming API (bytes_stream or resp.chunk()/resp.take) and
enforce a maximum byte limit while accumulating into body so giant responses are
rejected; keep the existing error messages but include early rejections for
invalid schemes and oversized responses.
In `@src/openhuman/memory_sources/reconcile.rs`:
- Around line 80-83: short_id currently slices by bytes which can break on
UTF-8; change it to slice on a char boundary: compute the number of chars to
skip (e.g., id.chars().count().saturating_sub(8)), get the byte offset of that
char with id.char_indices().nth(skip).map(|(idx,_)| idx) and then return
&id[offset..] (or id if nth returns None). Update the short_id function to use
this char-aware approach so slicing is UTF-8 safe.
In `@src/openhuman/memory_sources/status.rs`:
- Around line 63-80: The query_row call in source_status that binds to (synced,
pending, last_ts) currently uses .unwrap_or((0, 0, None)), which swallows real
SQL errors and returns misleading zeros; change the code to handle the Result
from conn.query_row explicitly (via match/try/? or map_err) so SQL errors are
propagated or logged instead of masked — update the query_row usage that
constructs (synced, pending, last_ts) in function source_status and apply the
same fix to the similar block around lines 91-95, returning or bubbling up the
error (or converting it to an appropriate Result) rather than defaulting to
(0,0,None).
In `@src/openhuman/memory_sources/sync.rs`:
- Around line 24-80: Add debug-level tracing to the sync flow to improve
correlation and visibility: in sync_source, emit a tracing::debug when
queueing/spawning the task that includes source_id and kind; inside the spawned
task log a debug before dispatching the match (include which arm will run, i.e.,
sync_composio or sync_via_reader or Twitter fallback); inside the
sync_via_reader path add a debug with the number of readers/urls being processed
(reader list size) and any per-reader dispatch info; and finally emit a debug
after outcome OK with the ingested item count and after Err with the error (in
addition to existing emit_sync_stage), always attaching structured fields
source_id and kind; use tracing::debug and keep messages concise and
development-oriented while leaving emit_sync_stage logic unchanged.
In `@src/openhuman/memory_sync/composio/mod.rs`:
- Around line 60-76: The current branch returns early whenever registry_sources
is non-empty, even if filtering yields zero valid SyncTarget entries; change the
logic in the block that references registry_sources, get_composio_sync_provider,
and SyncTarget to first build a Vec<SyncTarget> by mapping/filtering
registry_sources (collect into filtered_targets), then if filtered_targets is
non-empty return Ok(filtered_targets), otherwise do not short-circuit and allow
the fallback path to run; update the tracing::debug to report
filtered_targets.len() (or only log when returning) so behavior is preserved and
empty filtered results fall through to the active Composio connections fallback.
In `@tests/memory_sources_e2e.rs`:
- Around line 485-489: The live GitHub E2E test
memory_sources_github_repo_activity_flow is currently always run and causes CI
flakiness; either mark the test ignored by default by adding the #[ignore]
attribute above the async fn memory_sources_github_repo_activity_flow or gate it
behind an explicit environment flag (e.g., check std::env::var("RUN_GITHUB_E2E")
at the start of memory_sources_github_repo_activity_flow and early-return or
call tokio::spawn/skip when not set), and update the test doc comment to note
how to run it locally; apply the same change to the other tests in the same
block (lines covering the github flow).
---
Outside diff comments:
In `@src/openhuman/memory_sync/composio/bus.rs`:
- Around line 558-563: The early return in the composio bus path when
get_provider(&toolkit) returns None prevents calling upsert_composio_source and
thus skips auto-registration of valid connections; modify the branch so that
when get_provider(&toolkit) is None you still call upsert_composio_source(...)
to insert/update the connection into memory_sources (preserve the existing
tracing::debug log), and then continue without attempting to dispatch provider
hooks. Apply the same change to the analogous block around the 580-596 region so
both miss-paths auto-register via upsert_composio_source while avoiding the
provider dispatch.
---
Nitpick comments:
In `@app/src/components/settings/panels/__tests__/MemoryDataPanel.test.tsx`:
- Around line 95-103: Unskip the test (change it.skip to it) and make the
sync-connections/status path deterministic by stubbing the network/hook that
returns memory-sources status to produce the error case ("network timeout")
while keeping other dependencies normal (keep resolveConfigWith('balanced') and
renderWithProviders(<MemoryDataPanel />) as-is); implement this by mocking the
module or function that MemoryDataPanel uses to fetch sync connections (e.g.,
the memory-sources status fetch hook or API client) with jest.mock/jest.spyOn or
MSW to return a rejected promise or error response, then assert the UI shows the
error and that preset buttons remain accessible. Ensure the mock only affects
the sync connections status endpoint and does not perform real network calls so
the test is deterministic.
In `@src/openhuman/memory_sources/readers/composio.rs`:
- Around line 46-65: Add a namespaced debug log at the start of the async fn
read_item to mirror list_items' diagnostics: use the tracing (or log) crate at
debug or trace level, include non-sensitive identifiers like toolkit and
item_id, and redact or omit sensitive fields such as full connection_id (e.g.,
log a truncated or hashed connection_id). Target the read_item function and
ensure the log message clearly states it's entering read_item for the given
toolkit and item_id to aid traceability.
In `@src/openhuman/memory_sources/readers/web_page.rs`:
- Around line 92-97: The extract_title function is too fragile: update
extract_title to use a robust HTML parser (e.g., the scraper crate) instead of
manual string searches so it correctly handles attributes on <title> (e.g.,
<title lang="en">) and decodes HTML entities; locate the extract_title function
in web_page.rs and replace the manual find-based logic with parsing that selects
the <title> element (or falls back safely) and returns its text content with
HTML entities decoded.
In `@src/openhuman/memory_sources/types.rs`:
- Around line 97-99: The Folder arm (SourceKind::Folder) only requires a path
via require_field(&self.path, "path")? but lacks validation; update the folder
reader (readers/folder.rs) to canonicalize the provided path (e.g.,
std::fs::canonicalize) and verify it is inside an allowed base directory (using
Path::starts_with or strip_prefix to ensure it does not escape the boundary) and
reject or return an error if canonicalization fails or the path is outside the
allowed root to prevent path traversal.
- Around line 100-111: The SourceKind branches that call
require_field(&self.url, "url") (GithubRepo, RssFeed, WebPage) need to also
validate that self.url is a well-formed, safe URL before any network use: inside
the same validation function where require_field is used (the impl that
validates SourceKind / the struct containing url), parse the value with a strict
URL parser (e.g., url::Url::parse) and ensure the scheme is http or https
(reject other schemes and invalid parses), returning a clear validation error;
do not change TwitterQuery handling (it uses query, not url) but add a unit test
for each invalid/unsupported URL case and for a valid http/https URL to ensure
the new checks work.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8ef641b7-c820-4ee0-aa3c-4941f152ec2b
📒 Files selected for processing (49)
app/src/components/intelligence/AddMemorySourceDialog.tsxapp/src/components/intelligence/MemorySources.tsxapp/src/components/intelligence/MemorySourcesRegistry.tsxapp/src/components/intelligence/MemorySyncConnections.test.tsxapp/src/components/intelligence/MemorySyncConnections.tsxapp/src/components/intelligence/MemoryWorkspace.tsxapp/src/components/intelligence/__tests__/MemoryWorkspace.test.tsxapp/src/components/settings/panels/__tests__/MemoryDataPanel.test.tsxapp/src/lib/i18n/chunks/ar-3.tsapp/src/lib/i18n/chunks/bn-3.tsapp/src/lib/i18n/chunks/de-3.tsapp/src/lib/i18n/chunks/en-3.tsapp/src/lib/i18n/chunks/es-3.tsapp/src/lib/i18n/chunks/fr-3.tsapp/src/lib/i18n/chunks/hi-3.tsapp/src/lib/i18n/chunks/id-3.tsapp/src/lib/i18n/chunks/it-3.tsapp/src/lib/i18n/chunks/ko-3.tsapp/src/lib/i18n/chunks/pl-3.tsapp/src/lib/i18n/chunks/pt-3.tsapp/src/lib/i18n/chunks/ru-3.tsapp/src/lib/i18n/chunks/zh-CN-3.tsapp/src/lib/i18n/en.tsapp/src/services/memorySourcesService.test.tsapp/src/services/memorySourcesService.tsapp/src/services/memorySyncService.test.tsapp/src/services/memorySyncService.tssrc/core/all.rssrc/core/jsonrpc.rssrc/openhuman/config/schema/types.rssrc/openhuman/memory_sources/mod.rssrc/openhuman/memory_sources/readers/composio.rssrc/openhuman/memory_sources/readers/folder.rssrc/openhuman/memory_sources/readers/github.rssrc/openhuman/memory_sources/readers/mod.rssrc/openhuman/memory_sources/readers/rss.rssrc/openhuman/memory_sources/readers/twitter.rssrc/openhuman/memory_sources/readers/web_page.rssrc/openhuman/memory_sources/reconcile.rssrc/openhuman/memory_sources/registry.rssrc/openhuman/memory_sources/rpc.rssrc/openhuman/memory_sources/schemas.rssrc/openhuman/memory_sources/status.rssrc/openhuman/memory_sources/sync.rssrc/openhuman/memory_sources/types.rssrc/openhuman/memory_sync/composio/bus.rssrc/openhuman/memory_sync/composio/mod.rssrc/openhuman/mod.rstests/memory_sources_e2e.rs
💤 Files with no reviewable changes (5)
- app/src/components/intelligence/MemorySources.tsx
- app/src/services/memorySyncService.test.ts
- app/src/components/intelligence/MemorySyncConnections.test.tsx
- app/src/services/memorySyncService.ts
- app/src/components/intelligence/MemorySyncConnections.tsx
graycyrus
left a comment
There was a problem hiding this comment.
Good chunk of work — the architecture is solid and the reader abstraction is clean. CodeRabbit covered the low-hanging stuff (timeouts, i18n gaps, SSRF scheme check, SQL error swallowing). Two things I want fixed before merge that weren't covered.
Blocking
TOCTOU in folder reader
read_item resolves a canonical path for the traversal check, then reads from the original file_path. Between canonicalize and read_to_string, a symlink in the source directory can be swapped to point outside the base, bypassing the guard. One-liner fix: use canonical_file for the read.
// current
let body = tokio::fs::read_to_string(&file_path).await ...
// fix
let body = tokio::fs::read_to_string(&canonical_file).await ...The PR description explicitly calls out the traversal prevention as a security property — the implementation doesn't fully deliver on that claim.
No persistent sync error state
SourceStatus has chunks_synced, chunks_pending, last_chunk_at_ms, freshness — but no last_error or sync_failed field. After a sync failure, the UI shows Idle with unchanged chunk counts, which is indistinguishable from "never synced". The MemorySyncStageChanged(Failed) event is the only signal, and it's lost on reconnect or the next 5s poll cycle.
Users who add a GitHub source with bad credentials, or an RSS feed that's down, get no persistent feedback about why nothing is updating. The Sync button just goes from spinner back to idle. This is a core promise of the feature — "live chunk count / freshness display per row" — and it silently breaks under error conditions.
Add a last_sync_error: Option<String> to SourceStatus (or a sync_state enum). Persist it somewhere the status query can return it: a lightweight in-memory map keyed by source_id updated by the spawned sync task would do, or write it to the DB. Without this, the feature ships broken for any source that can fail.
Non-blocking
gh_available() blocks the async executor
github.rs calls std::process::Command::new("gh").arg("--version").status() synchronously from an async context. It blocks the current tokio worker thread for the subprocess lifetime on every list_items/read_item call. Wrap in tokio::task::spawn_blocking or switch to tokio::process::Command.
Label format inconsistency on auto-register
When a Composio connection is first created (bus event in bus.rs), it's registered with "{toolkit} connection" — lowercase, no short ID. The reconcile path in reconcile.rs generates "{Title_case(toolkit)} · {short_id(connection_id)}". Users who connect a new integration post-launch get a different label format than existing integrations that went through reconcile. Pick one format and use it in both places.
|
|
||
| fn folder_source(path: &str) -> MemorySourceEntry { | ||
| MemorySourceEntry { | ||
| id: "src_folder".into(), |
There was a problem hiding this comment.
[major] TOCTOU: traversal guard uses canonical_file but the read here uses the original non-canonical file_path. A symlink in the source directory can be swapped between canonicalize and this call, redirecting the read outside the base.
Fix:
let body = tokio::fs::read_to_string(&canonical_file)
.await
.map_err(|e| format!("failed to read {}: {e}", canonical_file.display()))?;| pub struct SourceStatus { | ||
| pub source_id: String, | ||
| pub chunks_synced: u64, | ||
| pub chunks_pending: u64, |
There was a problem hiding this comment.
[major] SourceStatus has no field for sync errors. After a failed sync the UI shows Idle / unchanged chunk count — identical to "never synced". The MemorySyncStageChanged(Failed) event is transient; it's lost before the next 5s poll.
Add last_sync_error: Option<String> here and wire it from the spawned task in sync.rs. An in-memory DashMap<source_id, last_error> populated by sync_source and read by source_status would be the lightest-weight fix without a schema change.
| .status() | ||
| .map(|s| s.success()) | ||
| .unwrap_or(false) | ||
| } |
There was a problem hiding this comment.
[minor] std::process::Command::status() is synchronous and blocks the tokio worker thread on every list_items/read_item call. Use tokio::process::Command or wrap in spawn_blocking:
async fn gh_available() -> bool {
tokio::process::Command::new("gh")
.arg("--version")
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.status()
.await
.map(|s| s.success())
.unwrap_or(false)
}| &toolkit, | ||
| &connection_id, | ||
| &label, | ||
| ) |
There was a problem hiding this comment.
[minor] Label format here is "{toolkit} connection" (lowercase, no short ID). reconcile.rs uses "{Title_case(toolkit)} · {short_id(connection_id)}".
Connections created before the new build and connections created after it get different label formats. Use the reconcile format (title_case + short_id) here too for consistency.
graycyrus
left a comment
There was a problem hiding this comment.
CodeRabbit-style Review — PR #2893 (Memory Sources)
Overall: Well-structured domain that correctly follows project conventions. 2 blockers, 3 major issues, 2 refactor suggestions, 6 nitpicks.
Verified / Looks Good
- Path traversal guard in
folder.rs(canonicalize + starts_with) is correct in principle - GitHub URL parser correctly rejects deep links, bare org paths, non-GitHub hostnames
MemorySourceEntry.validate()enforces per-kind required fields at add/update- SSRF guard in
web_page.rsrejectsfile://,data://and non-HTTP schemes - Freshness label thresholds tested with fixed timestamps — no time flakes
upsert_composio_sourcematches byconnection_id, not id/toolkit — correct- All 13 locale chunk files include the 56 new keys
sync_via_readerprogress fires every 5 items — good UI feedback without floodingcargo checkandcargo fmt --checkpass clean- Typecheck failures (
recharts,rehype-katex) are pre-existing on main
Questions
- Was the removal of
VaultPanelandObsidianVaultSectionfromMemoryWorkspace.tsxintentional? (See blocker comment below) github.rslist_itemsfetches commits, issues, PRs in 3 serial API calls — wastokio::join!considered for concurrency?
There was a problem hiding this comment.
🚫 Blocker: TOCTOU race — read uses original path, not canonicalized
The traversal guard canonicalizes file_path at line ~103, but read_to_string at line ~121 uses the original file_path. Between the guard passing and the read, a symlink swap could redirect the read outside the canonical base.
Fix: read from canonical_file, not file_path:
// before
let body = tokio::fs::read_to_string(&file_path)
.await
.map_err(|e| format!("failed to read {}: {e}", file_path.display()))?;
// after — use the canonicalized path
let body = tokio::fs::read_to_string(&canonical_file)
.await
.map_err(|e| format!("failed to read {}: {e}", canonical_file.display()))?;There was a problem hiding this comment.
🚫 Blocker: VaultPanel and ObsidianVaultSection silently removed
This PR removes <VaultPanel onToast={onToast} /> and <ObsidianVaultSection contentRootAbs={...} /> from MemoryWorkspace. These are the Obsidian vault viewer and "View vault in Obsidian" link — they're unrelated to memory sources and were user-visible features.
The PR description makes no mention of intentionally removing them. If accidental, please restore:
<VaultPanel onToast={onToast} />
{graph && (
<ObsidianVaultSection contentRootAbs={graph.content_root_abs} onToast={onToast} />
)}If intentional, it should be a separate commit with justification.
There was a problem hiding this comment.
⚠️ Major: RSS fetch_url has no body size cap — potential OOM
The web page reader correctly caps at 10 MiB (MAX_BODY_BYTES), but fetch_url here does resp.text().await with no size guard. A hostile or misconfigured feed could buffer hundreds of MBs into memory.
Fix:
const MAX_FEED_BYTES: u64 = 5 * 1024 * 1024; // 5 MiB
let bytes = resp
.bytes()
.await
.map_err(|e| format!("failed to read feed body: {e}"))?;
if bytes.len() as u64 > MAX_FEED_BYTES {
return Err(format!(
"feed body exceeds {MAX_FEED_BYTES}-byte limit ({} bytes)",
bytes.len()
));
}
let text = String::from_utf8_lossy(&bytes).into_owned();There was a problem hiding this comment.
⚠️ Major: gh CLI invoked without timeout — can block sync task indefinitely
gh_json calls tokio::process::Command::new("gh").output().await with no timeout. If gh hangs (auth flow, network stall, rate-limit backoff), the sync task parks forever and MemorySyncStage::Completed never fires.
Fix:
async fn gh_json(args: &[&str]) -> Result<String, String> {
let output = tokio::time::timeout(
std::time::Duration::from_secs(30),
tokio::process::Command::new("gh").args(args).output(),
)
.await
.map_err(|_| "gh command timed out after 30s".to_string())?
.map_err(|e| format!("gh command failed: {e}"))?;
// ...
}There was a problem hiding this comment.
⚠️ Major: tokio::spawn result dropped — panics in sync tasks are invisible
The spawned background task's JoinHandle is dropped immediately. Any panic inside it produces no log, and at shutdown the runtime may cancel an in-flight ingest mid-way.
Fix: Track the handle or attach a panic handler:
let source_id_for_log = source_id.clone();
let handle = tokio::spawn(async move {
// ... sync body ...
});
tokio::spawn(async move {
if let Err(panic) = handle.await {
tracing::error!(
source_id = %source_id_for_log,
"[memory_sources:sync] sync task panicked: {:?}", panic
);
}
});There was a problem hiding this comment.
💡 Suggestion: Composio chunk count is per-toolkit, not per-connection
For Composio sources, source_id_prefix returns "{toolkit}:%" (e.g. gmail:%). If a user has two Gmail connections, both registry entries report the same aggregate chunk count — misleading in the UI.
Long-term fix: include connection_id in Composio chunk source IDs. Short-term: add a comment documenting the known limitation so future readers aren't confused.
There was a problem hiding this comment.
🔧 Nitpicks (low priority)
default_true()duplicate — defined again here; already exists intypes.rs. Deduplicate to a shared location.folder.rs—base_path.trim_end_matches('/')won't fire on Windows-style\paths. Low risk (macOS/Linux primary).rss.rs— RSS vs Atom detection by substring<feedis fragile if the string appears in description before the root element. Astarts_withcheck on first non-whitespace tag would be more robust.AddMemorySourceDialog.tsx— Back button uses raw←character instead of SVG icon. Minor a11y/RTL concern.MemorySourcesRegistry.tsx—SOURCE_KIND_ICONS[source.kind] ?? '📄'fallback is dead code (type isRecord<SourceKind, string>).reconcile.rs—title_caseproduces"Clickup"not"ClickUp". Cosmetic but may look odd to users.
…s, and RPC surface New domain `src/openhuman/memory_sources/` that defines user-configurable data connectors feeding memory. Sources are persisted as flat entries in `config.toml` under `[[memory_sources]]`. - Types: SourceKind (Composio/Folder/GithubRepo/TwitterQuery/RssFeed/WebPage), MemorySourceEntry with flat kind-specific fields, validation per kind - Registry: CRUD operations reading/writing Config.memory_sources - Readers: SourceReader trait with implementations for each kind - Composio: delegates to existing sync layer - Folder: walkdir + glob matching with path traversal prevention - GitHub: public repo tree listing via API, raw content fetch - Twitter: scaffold with clear "not yet configured" error - RSS: lightweight XML parser for RSS 2.0 and Atom feeds - WebPage: HTTP fetch with optional CSS selector extraction - RPC surface: list/get/add/update/remove + list_items/read_item - Controller wiring in core/all.rs 24 unit tests covering types, validation, readers, and schemas.
- Auto-register memory_sources entry when a Composio connection is created (bus.rs subscriber, non-fatal on failure) - Add reconcile module that syncs existing Composio connections into the registry on startup - Wire list_sync_targets() to prefer memory_sources registry (enabled Composio entries) over scanning all connections. Falls back to legacy full-scan when no registry entries exist.
Tests boot a real axum JSON-RPC server against an isolated workspace and exercise: add folder source → list → list_items → read_item → ingest document into memory tree → verify chunks indexed → update → disable → remove. Also tests validation rejection of bad input. Fix: declare kind-specific fields (toolkit, path, glob, url, branch, paths, query, since_days, max_items, selector) in the add/update controller schemas so the JSON-RPC param validator accepts them.
… E2E tests GitHub reader now pulls commits, issues, and PRs (not source code) from a repository. Uses `gh` CLI when available for authenticated access, falls back to unauthenticated GitHub REST API for public repos. Each item is formatted as structured markdown with metadata (author, date, state, labels, comments) suitable for memory tree ingestion. New E2E tests (JSON-RPC against a real axum server): - memory_sources_github_repo_activity_flow: add repo → list commits/ issues/PRs → read commit → ingest into memory tree → verify chunks - memory_sources_composio_registry_flow: add gmail + slack composio sources → CRUD lifecycle → list_items/read_item → disable → remove Also fixes controller schemas to declare all kind-specific fields so the JSON-RPC param validator accepts them on add/update.
New components: - memorySourcesService: typed client for openhuman.memory_sources_* RPCs (list/get/add/update/remove/list_items/read_item) - AddMemorySourceDialog: two-step picker (kind → form) for adding folder/github_repo/rss_feed/web_page/twitter_query sources - MemorySourcesRegistry: list panel with toggle/remove controls and "Add Source" button, integrated into MemoryWorkspace below the existing composio-driven MemorySources panel Composio-backed sources are filtered out of the registry view since they're already managed by the existing MemorySources component (which joins composio.list_connections with sync status). i18n: 24 new memorySources.* keys added to en.ts and all 14 locale chunk files (English placeholders for non-English locales). Tests: 6 unit tests for the service layer covering envelope unwrapping, flat-field add params, partial updates, and constant exposure.
…icker - Remove the old Composio-driven MemorySources panel; the registry-based panel is now the single source of truth for "what feeds memory" - Rename header to "Memory Sources" (was "Custom Sources") - Include Composio in the source kind picker — Add Source dialog now has 6 kinds: Integration, Folder, GitHub, RSS, Web, Twitter - For Composio kind, fetch active connections via composio.list_connections and let the user pick from a dropdown (instead of typing toolkit + id) - Composio source rows get a Sync button that calls composio_sync - Folder kind now has a "Browse…" button using <input type="file" webkitdirectory> for native folder picking in CEF - Remove the Knowledge Vaults (Experimental) panel — superseded by the Folder source kind in Memory Sources - Remove ObsidianVaultSection from the actions row
Backend (src/openhuman/memory_sources/):
- sync.rs: sync_source(source_id) dispatches by kind
- Composio → memory_sync::composio::run_connection_sync
- Folder/GitHub/RSS/WebPage → reader.list_items + reader.read_item
+ ingest_document (chunks tagged mem_src:{source_id}:{item_id})
- Twitter → returns "not configured" error
- Emits MemorySyncStageChanged events (requested/fetching/stored/
ingesting/completed/failed) keyed by source_id so the UI can
stream per-source progress
- status.rs: per-source chunk counts + freshness label via SQL
query against mem_tree_chunks (matched by source_id prefix —
toolkit-based for composio, mem_src:{id}:% for everything else)
- New RPCs: memory_sources_sync, memory_sources_status_list
registered in the controller registry
Frontend:
- memorySourcesService.ts: syncMemorySource + memorySourcesStatusList
+ SourceStatus type
- MemorySourcesRegistry.tsx: Sync button on every row (not just
composio), polls status every 5s, shows chunk count / freshness
pill / last-sync timestamp per row, dispatches the unified
memory_sources_sync RPC
Removed (consolidated into memory_sources):
- app/src/components/intelligence/MemorySources.tsx (old composio
panel) + its test
- app/src/components/intelligence/MemorySyncConnections.tsx +
test (already dead)
- app/src/services/memorySyncService.ts + test (replaced by
memorySourcesService.statusList)
Tests:
- 27 Rust unit tests (added freshness + prefix dispatch)
- 6 frontend service tests still pass
…ions Default behavior: every active Composio connection whose toolkit has a memory-sync provider (gmail, slack, github, notion, linear, clickup) now shows up as a memory source automatically — users don't have to manually add their integrations. How it works: - New `composio::scan_active_sync_targets()` always hits Composio directly (vs. `list_sync_targets` which short-circuits through the registry once it's populated) - `memory_sources::reconcile::ensure_composio_sources()` calls it and upserts one MemorySourceEntry per (toolkit, connection_id), labelled "Toolkit · <last-8-of-connection-id>" so multiple Gmail accounts stay distinguishable - Wired at startup in core::jsonrpc bootstrap (best-effort spawn) - Also called lazily on every memory_sources_list RPC so freshly- connected integrations appear without waiting for a restart
…r input ESLint react/no-unknown-property rejects 'directory' (deprecated alias). Modern Chromium/CEF accepts the standard 'webkitdirectory' alone.
Rust: - folder reader: enforce MAX_FILE_SIZE on read_item to match list_items - github reader: harden parse_github_url to reject /tree/<branch> etc and non-GitHub hosts; add 20s reqwest timeout - rss reader: 20s reqwest timeout; redact URLs in debug logs (host only) - twitter reader: trim + reject empty queries at the boundary - web_page reader: SSRF guard (only http/https), 10 MiB body cap, 20s timeout - reconcile: UTF-8 safe short_id (char-boundary slicing) - status: surface SQL errors instead of swallowing them as zero-rows - sync: debug tracing at queueing/dispatch/list-size/completion with source correlation - composio: fallback to active-connection scan when registry yields zero valid targets (not just when registry is empty) - e2e: gate live GitHub network test behind OPENHUMAN_E2E_NETWORK=1 Frontend i18n (all per-component user-visible strings): - 9 placeholder strings moved to memorySources.*Placeholder keys - 6 source kind labels moved to memorySources.kind.* keys (constant becomes SOURCE_KIND_LABEL_KEYS, resolved via t() at render sites) - Sync toast title/message + 5 relative-time labels (just now, s/m/h/d ago) moved to time.*Suffix and memorySources.sync.* keys - AddMemorySourceDialog Composio-load effect refactored to avoid setState directly in the effect body (react-hooks/set-state-in-effect) All 14 locale chunk-3 files updated to keep i18n:check parity. 27 → 31 Rust unit tests; 6 frontend service tests still pass.
…eout, panic handler - folder reader: read from `canonical_file` instead of original `file_path` to close the TOCTOU window between canonicalize() and the actual read - rss reader: add MAX_FEED_BYTES (5 MiB) cap in fetch_url() — checked from Content-Length header before buffering, then from actual byte count - github reader: wrap gh CLI .output().await in tokio::time::timeout(30s) to prevent hung sync tasks on unresponsive gh auth/server - sync: wrap inner tokio::spawn in an outer spawn that awaits the JoinHandle and logs tracing::error! on panic so task panics are never silently dropped - MemoryWorkspace: restore accidentally removed VaultPanel and ObsidianVaultSection imports + render sites; update module-level JSDoc to describe the new MemorySourcesRegistry component tree
d1c9acd to
4f4eb47
Compare
…deleted modules MemorySources.tsx and memorySyncService.ts were deleted as part of the memory_sources domain consolidation, but the test file that imported from them was not removed. This caused TypeScript compile failures in CI.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/components/intelligence/__tests__/MemoryWorkspace.test.tsx`:
- Around line 21-23: The global mock for
listMemorySources/memorySourcesStatusList is returning [] and causing tests that
expect source rows/buttons to fail; remove the hardcoded mockResolvedValue([])
from the top-level vi.mock and instead set realistic defaults inside the test
suite's beforeEach (call listMemorySources.mockResolvedValue(<defaultSources>)
and memorySourcesStatusList.mockResolvedValue(<defaultStatuses>)) so individual
tests can override those via listMemorySources.mockResolvedValueOnce(...) as
needed; update references to the mocked functions (listMemorySources and
memorySourcesStatusList) in beforeEach and in tests that need different data so
the MemoryWorkspace tests no longer assume an empty source list.
In `@app/src/components/intelligence/MemorySourcesRegistry.tsx`:
- Around line 43-44: The component currently uses a single state variable
syncingId (useState<string | null>) which cannot track multiple concurrent
syncs; change this to track multiple in-flight syncs (e.g.,
useState<Set<string>> or a string[]/Record) and update all places that read or
set syncingId (e.g., wherever setSyncingId is called—likely the sync handler
like handleSync/onSyncClick and the cleanup after dispatch completes) to add the
id when a sync starts and remove it when it finishes or errors; then update the
row rendering logic to check membership (e.g., syncingIds.has(id) or
syncingIds.includes(id)) instead of comparing to a single value so multiple rows
can show locked/spinning state simultaneously.
- Around line 48-58: The refresh implementation currently uses
listMemorySources() and memorySourcesStatusList() with catch blocks that return
empty arrays, which causes transient API failures to wipe UI state; change
refresh to call both APIs with Promise.allSettled (or handle each promise
separately), and only call setSources(list) and setStatuses(stats) when the
corresponding promise fulfilled; if a promise is rejected, retain the existing
sources/statuses (do not replace with []), and log the error via console.warn or
the existing logger; reference listMemorySources, memorySourcesStatusList,
setSources, setStatuses, and the refresh function in MemorySourcesRegistry.tsx
when applying this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5f608ece-1584-49dc-a050-f5335dd897d0
📒 Files selected for processing (28)
app/src/components/intelligence/AddMemorySourceDialog.tsxapp/src/components/intelligence/MemorySources.tsxapp/src/components/intelligence/MemorySourcesRegistry.tsxapp/src/components/intelligence/MemorySyncConnections.test.tsxapp/src/components/intelligence/MemorySyncConnections.tsxapp/src/components/intelligence/MemoryWorkspace.tsxapp/src/components/intelligence/__tests__/MemoryWorkspace.test.tsxapp/src/components/settings/panels/__tests__/MemoryDataPanel.test.tsxapp/src/lib/i18n/chunks/ar-3.tsapp/src/lib/i18n/chunks/bn-3.tsapp/src/lib/i18n/chunks/de-3.tsapp/src/lib/i18n/chunks/en-3.tsapp/src/lib/i18n/chunks/es-3.tsapp/src/lib/i18n/chunks/fr-3.tsapp/src/lib/i18n/chunks/hi-3.tsapp/src/lib/i18n/chunks/id-3.tsapp/src/lib/i18n/chunks/it-3.tsapp/src/lib/i18n/chunks/ko-3.tsapp/src/lib/i18n/chunks/pl-3.tsapp/src/lib/i18n/chunks/pt-3.tsapp/src/lib/i18n/chunks/ru-3.tsapp/src/lib/i18n/chunks/zh-CN-3.tsapp/src/lib/i18n/en.tsapp/src/services/memorySourcesService.test.tsapp/src/services/memorySourcesService.tsapp/src/services/memorySyncService.test.tsapp/src/services/memorySyncService.tssrc/core/all.rs
💤 Files with no reviewable changes (23)
- app/src/lib/i18n/chunks/pl-3.ts
- app/src/lib/i18n/chunks/pt-3.ts
- app/src/lib/i18n/chunks/it-3.ts
- app/src/lib/i18n/en.ts
- app/src/lib/i18n/chunks/en-3.ts
- app/src/components/intelligence/MemorySources.tsx
- app/src/lib/i18n/chunks/ru-3.ts
- app/src/services/memorySyncService.ts
- app/src/lib/i18n/chunks/de-3.ts
- app/src/lib/i18n/chunks/zh-CN-3.ts
- app/src/lib/i18n/chunks/ko-3.ts
- app/src/lib/i18n/chunks/ar-3.ts
- app/src/lib/i18n/chunks/hi-3.ts
- app/src/services/memorySourcesService.test.ts
- app/src/services/memorySourcesService.ts
- app/src/lib/i18n/chunks/bn-3.ts
- app/src/lib/i18n/chunks/es-3.ts
- app/src/components/intelligence/MemorySyncConnections.test.tsx
- app/src/components/intelligence/MemorySyncConnections.tsx
- src/core/all.rs
- app/src/lib/i18n/chunks/id-3.ts
- app/src/lib/i18n/chunks/fr-3.ts
- app/src/services/memorySyncService.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- app/src/components/settings/panels/tests/MemoryDataPanel.test.tsx
- app/src/components/intelligence/AddMemorySourceDialog.tsx
| const [syncingId, setSyncingId] = useState<string | null>(null); | ||
|
|
There was a problem hiding this comment.
Single syncingId cannot safely model multiple in-flight sync requests.
If users trigger sync on different rows quickly, only one row stays locked/spinning and earlier rows can be re-clicked, risking duplicate non-idempotent sync dispatches.
Suggested fix
- const [syncingId, setSyncingId] = useState<string | null>(null);
+ const [syncingIds, setSyncingIds] = useState<Set<string>>(new Set());
const handleSync = useCallback(
async (source: MemorySourceEntry) => {
- setSyncingId(source.id);
+ setSyncingIds(prev => new Set(prev).add(source.id));
try {
await syncMemorySource(source.id);
onToast?.({
type: 'success',
title: `${t('memorySources.sync.successTitle')} ${source.label}`,
message: t('memorySources.sync.successMessage'),
});
void refresh();
} catch (err) {
onToast?.({
type: 'error',
title: `${t('memorySources.sync.failedTitle')} ${source.label}`,
message: err instanceof Error ? err.message : String(err),
});
} finally {
- setSyncingId(prev => (prev === source.id ? null : prev));
+ setSyncingIds(prev => {
+ const next = new Set(prev);
+ next.delete(source.id);
+ return next;
+ });
}
},
[onToast, refresh, t]
);And when rendering rows:
- isSyncing={syncingId === source.id}
+ isSyncing={syncingIds.has(source.id)}Also applies to: 115-134
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/components/intelligence/MemorySourcesRegistry.tsx` around lines 43 -
44, The component currently uses a single state variable syncingId
(useState<string | null>) which cannot track multiple concurrent syncs; change
this to track multiple in-flight syncs (e.g., useState<Set<string>> or a
string[]/Record) and update all places that read or set syncingId (e.g.,
wherever setSyncingId is called—likely the sync handler like
handleSync/onSyncClick and the cleanup after dispatch completes) to add the id
when a sync starts and remove it when it finishes or errors; then update the row
rendering logic to check membership (e.g., syncingIds.has(id) or
syncingIds.includes(id)) instead of comparing to a single value so multiple rows
can show locked/spinning state simultaneously.
| listMemorySources().catch(err => { | ||
| console.warn('[ui-flow][memory-sources] list failed', err); | ||
| return [] as MemorySourceEntry[]; | ||
| }), | ||
| memorySourcesStatusList().catch(err => { | ||
| console.warn('[ui-flow][memory-sources] status_list failed', err); | ||
| return [] as SourceStatus[]; | ||
| }), | ||
| ]); | ||
| setSources(list); | ||
| setStatuses(stats); |
There was a problem hiding this comment.
Transient poll failures should not clear the entire registry UI state.
On any temporary API error, refresh() currently replaces sources/statuses with empty arrays, causing visible data loss/flicker and disabling valid actions.
Suggested fix
- const [list, stats] = await Promise.all([
- listMemorySources().catch(err => {
- console.warn('[ui-flow][memory-sources] list failed', err);
- return [] as MemorySourceEntry[];
- }),
- memorySourcesStatusList().catch(err => {
- console.warn('[ui-flow][memory-sources] status_list failed', err);
- return [] as SourceStatus[];
- }),
- ]);
- setSources(list);
- setStatuses(stats);
+ const [listResult, statusResult] = await Promise.allSettled([
+ listMemorySources(),
+ memorySourcesStatusList(),
+ ]);
+ if (listResult.status === 'fulfilled') setSources(listResult.value);
+ if (statusResult.status === 'fulfilled') setStatuses(statusResult.value);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@app/src/components/intelligence/MemorySourcesRegistry.tsx` around lines 48 -
58, The refresh implementation currently uses listMemorySources() and
memorySourcesStatusList() with catch blocks that return empty arrays, which
causes transient API failures to wipe UI state; change refresh to call both APIs
with Promise.allSettled (or handle each promise separately), and only call
setSources(list) and setStatuses(stats) when the corresponding promise
fulfilled; if a promise is rejected, retain the existing sources/statuses (do
not replace with []), and log the error via console.warn or the existing logger;
reference listMemorySources, memorySourcesStatusList, setSources, setStatuses,
and the refresh function in MemorySourcesRegistry.tsx when applying this change.
|
Great progress — the TOCTOU fix in folder.rs and label formatting are solid. Three things still need addressing before I can sign off:
Let me know once those are squared away. |
…nd update tests
- Add data-testid="memory-source-row-{kind}" to SourceRow <li> so tests can
assert that rows for each source kind are rendered.
- Add data-testid="memory-source-sync-{toolkit ?? kind}" to the Sync button so
per-source sync buttons are addressable from tests.
- Update MemoryWorkspace.test.tsx: switch mocks from listConnections/syncConnection
(deleted composioApi) to listMemorySources/syncMemorySource (memorySourcesService);
rename "per-connection Sync button" → "per-source Sync button"; use
queryAllByTestId for the multi-composio-row assertion.
graycyrus
left a comment
There was a problem hiding this comment.
Progress update — you fixed the TOCTOU in folder.rs and the label format inconsistency, thanks. However, two blocking issues from the prior review remain unresolved:
-
No persistent sync error state —
SourceStatusstruct is still unchanged. After a failed sync, the UI showsIdlewith unchanged chunk counts, indistinguishable from "never synced". Users get no feedback about why nothing is updating (e.g., bad credentials, unreachable feed). This is a core UX promise of the feature — addlast_sync_error: Option<String>toSourceStatusor implement per-source error tracking. Required before merge. -
gh_available()still blocks the async executor — still usingstd::process::Command::status()synchronously from async context, blocking a tokio thread per call. Wrap intokio::task::spawn_blockingor switch totokio::process::Command. Non-blocking severity but should be fixed before merge.
Also, CI is failing — PR Submission Checklist and Rust Core Coverage jobs are red. Fix those first, then the above, and I'll review again.
|
Hey @senamakel — just checking in. I see CI is still settling (Rust Core Coverage + Windows/macOS E2E tests are pending), and the two blockers from my prior review haven't been addressed yet:
I'm holding my approval until:
Once you push those fixes, just let me know and I'll do a final review pass. Thanks! |
Summary
memory_sourcesdomain — a unified registry of data connectors that feed memory: Composio integrations, local folders, GitHub repos (commits/issues/PRs), RSS feeds, web pages, Twitter queriesmemory_sources_sync) dispatches by kind: Composio → existing engine, others → reader-based ingest into the memory treememory_sources_status_list) returns chunks ingested, freshness label, last-sync timestamp; UI polls every 5sProblem
Memory ingestion was previously a black box of "sync everything Composio is connected to." Users couldn't:
The composio sync layer also had no consistent identity at the user-facing layer — "Gmail" rolled up multiple accounts and there was no API for the UI to enumerate or manage sources.
Solution
Backend:
src/openhuman/memory_sources/types.rs—SourceKindenum (composio/folder/github_repo/twitter_query/rss_feed/web_page),MemorySourceEntrywith kind-specific fields flat on the structregistry.rs— CRUD overConfig.memory_sources(TOML-persisted list)readers/{composio,folder,github,rss,twitter,web_page}.rs—SourceReadertrait withlist_items+read_itemper kind. GitHub reader pulls project activity (commits/issues/PRs viaghCLI when available, falls back to public REST API), not source code.sync.rs—sync_source(id)dispatches by kind. Composio →memory_sync::composio::run_connection_sync. Reader-backed kinds → walk items and ingest each viaingest_document(chunks taggedmem_src:{source_id}:{item_id}). EmitsMemorySyncStageChangedevents keyed by source id.status.rs— Per-source SQL query againstmem_tree_chunksfor chunk counts + freshness; toolkit-prefix matching for composio,mem_src:{id}:%for everything else.reconcile.rs— Auto-seeds the registry with active Composio connections that have a sync provider; called at startup and lazily on everymemory_sources_listRPC.Frontend:
memorySourcesService.ts— typed client for all 9 RPCs (list/get/add/update/remove/list_items/read_item/sync/status_list)AddMemorySourceDialog.tsx— two-step picker (kind → form). Folder kind has a native folder picker via<input type=\"file\" webkitdirectory>. Composio kind fetches active connections viacomposio_list_connectionsand presents them as a dropdown.MemorySourcesRegistry.tsx— single unified panel: Sync button on every row, polls status every 5s, shows chunk count / freshness pill / last-sync timestamp per rowMemorySources.tsx(composio-only panel),MemorySyncConnections.tsx,memorySyncService.ts— all consolidated under the new domainSubmission Checklist
Impact
Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
Validation Run
Validation Blocked
Behavior Changes
Parity Contract
Duplicate / Superseded PR Handling
Summary by CodeRabbit
New Features
Tests
Documentation