fix(webui): improve objects list performance for branches with uncommitted deletes#10241
caoergou wants to merge 3 commits into treeverse:master
Conversation
fix(webui): improve objects list performance for branches with uncommitted deletes

When displaying objects on a branch, use the committed-only ref (branch@) instead of the full branch ref to list objects. This avoids scanning the staging area twice: once for objects.list and once for refs.changes. For branches with many uncommitted deletes, this significantly improves page load performance. The '@' modifier tells lakeFS to only return committed data, skipping the staging area scan entirely.

Fixes treeverse#10008
```js
// Use committed-only ref (branch@) for branches to avoid scanning staging area twice
// This significantly improves performance when there are many uncommitted deletes
const listRef = reference.type === RefTypeBranch ? reference.id + '@' : reference.id;
return objects.list(repo.id, listRef, path, after, config.pre_sign_support_ui);
```
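The conditional above can be exercised in isolation. A minimal self-contained sketch, where `RefTypeBranch` and the `Reference` shape are assumptions standing in for the webui's real definitions:

```typescript
// Assumed stand-ins for the webui's real constant and type (not the actual code).
const RefTypeBranch = 'branch';
interface Reference { id: string; type: string }

// Append '@' so lakeFS returns only committed data for branch refs,
// skipping the staging-area scan; other ref types pass through unchanged.
function committedOnlyRef(reference: Reference): string {
  return reference.type === RefTypeBranch ? reference.id + '@' : reference.id;
}

console.log(committedOnlyRef({ id: 'main', type: 'branch' })); // 'main@'
console.log(committedOnlyRef({ id: 'v1.0', type: 'tag' }));    // 'v1.0'
```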
Using branch@ for objects.list means uncommitted additions no longer appear in the objects list.
mergeResults only injects missing removed entries and common_prefix entries into the merged view (line 53 of mergeResults.ts) — it doesn't inject added entries that are absent from the committed listing.
Repro: upload a new file to a branch without committing → it won't show in the objects tab.
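The failure mode can be modeled with a hypothetical, heavily simplified version of the merge (not the real `mergeResults.ts`): only `removed` and `common_prefix` changes are injected into the committed listing, so an uncommitted `added` entry never appears.

```typescript
// Hypothetical, simplified model of the merge described above.
type Change = { path: string; type: 'added' | 'removed'; path_type: 'object' | 'common_prefix' };
type Entry = { path: string; removed?: boolean };

function mergeResultsSketch(committed: Entry[], changes: Change[]): Entry[] {
  // Only 'removed' and common_prefix changes are injected back; 'added' is ignored.
  const missingItems: Entry[] = changes
    .filter((c) => c.type === 'removed' || c.path_type === 'common_prefix')
    .filter((c) => !committed.some((e) => e.path === c.path))
    .map((c) => ({ path: c.path, removed: c.type === 'removed' }));
  return [...committed, ...missingItems].sort((a, b) => a.path.localeCompare(b.path));
}

// Repro from the comment: a staged upload never reaches the merged view.
const committed: Entry[] = [{ path: 'lakes.parquet' }];
const changes: Change[] = [{ path: 'new-upload.txt', type: 'added', path_type: 'object' }];
console.log(mergeResultsSketch(committed, changes).map((e) => e.path)); // [ 'lakes.parquet' ]
```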
…listing

When using committed-only ref (branch@) for objects.list to improve performance, uncommitted added files are not returned by the API. This fix updates mergeResults to also include 'added' type changes in the missingItems filter, ensuring newly uploaded files are shown in the objects list.

Fixes the issue reported by @UdiBen: uncommitted additions no longer appear in the objects list when using branch@ for performance.
Hi @UdiBen, thank you for the detailed feedback! I understand the issue now.

Problem: using `branch@` for `objects.list` means uncommitted additions are not returned by the API, so they disappear from the objects list.

Fix: I've updated `mergeResults` to also include `added` changes in the `missingItems` filter.

Change:

```js
// Before
.filter((change) => change.type === 'removed' || change.path_type === 'common_prefix')
// After
.filter((change) => change.type === 'removed' || change.type === 'added' || change.path_type === 'common_prefix')
```

Please let me know if this addresses your concern or if there are other issues to address.
@caoergou the fix looks good!
When using committed-only ref (branch@) for objects.list, newly uploaded (staged) files may be alphabetically after the last committed file and were incorrectly excluded from the objects list.

Root cause: the lastResultPath filter in mergeResults applied to 'added' items, but staged files can sort after committed files (e.g. 'test-upload.txt' after 'lakes.parquet'). changesData is already paginated via the 'after' cursor, so 'added' items do not need this extra restriction.

Fix: exempt 'added' items from the lastResultPath restriction. The filter now only applies to 'removed' entries and common_prefix entries. Also apply prettier formatting to objects.jsx.
Hi @UdiBen, I've addressed both failing checks:

1. Prettier formatting ✅ — ran prettier on `objects.jsx`.
2. E2E test failures ✅ — Root cause: when using `branch@`, newly uploaded (staged) files can sort alphabetically after the last committed file, and the `lastResultPath` filter

```js
.filter((change) => lastResultPath && change.path <= lastResultPath)
```

was excluding them entirely, causing the E2E tests to fail.

Fix: exempt `added` items from the `lastResultPath` restriction:

```js
// Before
.filter((change) => lastResultPath && change.path <= lastResultPath)
// After
.filter((change) => change.type === 'added' || (lastResultPath && change.path <= lastResultPath))
```

All 23 existing unit tests pass.
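The page-boundary behavior can be checked standalone. A hypothetical reduction of the filter, using the example paths from the commit message ('test-upload.txt' sorts after 'lakes.parquet'):

```typescript
// Hypothetical reduction of the lastResultPath page-boundary filter.
type Change = { path: string; type: 'added' | 'removed' };

// Before: keep only changes at or before the page's last committed path.
const withinPage = (changes: Change[], lastResultPath: string) =>
  changes.filter((c) => lastResultPath && c.path <= lastResultPath);

// After: 'added' items are exempt from the page-boundary restriction.
const withExemption = (changes: Change[], lastResultPath: string) =>
  changes.filter((c) => c.type === 'added' || (lastResultPath && c.path <= lastResultPath));

// A staged file sorting after the last committed entry on the page.
const changes: Change[] = [{ path: 'test-upload.txt', type: 'added' }];
console.log(withinPage(changes, 'lakes.parquet').length);    // 0 -- staged file dropped
console.log(withExemption(changes, 'lakes.parquet').length); // 1 -- staged file kept
```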
@caoergou My concern is with the approach of splitting the listing into two independently-paginated sources (committed objects via branch@ + changes via diff) and merging them client-side. These two streams have different page boundaries, and mergeResults only sees one page at a time with no cross-page state. This makes it fundamentally hard to get right. For example, here's a test case that is failing with the fix:

My suggestion is to add a few tests that cover the altered functionality:

Once we are covered we will have better confidence in merging.
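The cross-page hazard described above can be illustrated with a toy simulation (all paths hypothetical): two streams with different page boundaries, merged one page at a time with no cross-page state, stop producing a globally sorted listing.

```typescript
// Toy illustration of two independently paginated streams (hypothetical data).
// Committed listing page 1 ends at 'c.txt'; page 2 continues at 'd.txt'.
// The diff stream returns a staged addition 'z.txt' on its very first page.
const committedPages: string[][] = [['a.txt', 'b.txt', 'c.txt'], ['d.txt', 'x.txt']];
const changesPages: { path: string; type: 'added' }[][] = [[{ path: 'z.txt', type: 'added' }]];

// Per-page merge with no cross-page state: 'z.txt' lands on page 1...
const mergedPage1 = [...committedPages[0], ...changesPages[0].map((c) => c.path)].sort();
console.log(mergedPage1); // [ 'a.txt', 'b.txt', 'c.txt', 'z.txt' ]

// ...yet page 2 of the committed stream starts at 'd.txt', which sorts
// before 'z.txt', so the overall listing is no longer globally ordered.
console.log(committedPages[1][0] < mergedPage1[mergedPage1.length - 1]); // true
```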
@UdiBen After deeper investigation, I need to revise my earlier proposal. My "skip tombstones" suggestion was incorrect: tombstones (removed entries) are essential for the merged listing to reflect deletions, so they cannot simply be dropped. The real bottleneck is in the client-side merge approach itself.

Why client-side merge doesn't work well:

Potential backend optimization directions (need further investigation):

However, these are significant architectural changes that require discussion. My recommendation:

I'm happy to help investigate the backend optimization options if that's the desired direction. What do you think?
Problem
When displaying objects on a branch in the WebUI, the page loads slowly when there are many uncommitted deletes. This is because both `objects.list(branch)` and `refs.changes(branch)` scan the same staging area entries.

Solution
Use the committed-only ref (`branch@`) when listing objects for branches. The `@` modifier tells lakeFS to only return committed data, skipping the staging area scan entirely.

Changes
- Updated `objects.jsx` to use `reference.id + '@'` for branch references when listing objects
- Added `reference.type` to the dependency array

Performance Impact
For branches with many uncommitted changes:
Testing
Fixes #10008