fix(webui): improve objects list performance for branches with uncommitted deletes #10241

Closed
caoergou wants to merge 3 commits into treeverse:master from caoergou:feat-10008-webui-performance

Conversation


@caoergou caoergou commented Mar 7, 2026

Problem

When displaying objects on a branch in the WebUI, the page loads slowly when there are many uncommitted deletes. This is because both objects.list(branch) and refs.changes(branch) scan the same staging area entries.

Solution

Use the committed-only ref (branch@) when listing objects for branches. The @ modifier tells lakeFS to only return committed data, skipping the staging area scan entirely.

Changes

  • Modified objects.jsx to use reference.id + '@' for branch references when listing objects
  • Added reference.type to the dependency array
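
The change described above can be sketched as follows. This is a minimal illustration, not the actual objects.jsx hook: the names `reference.id`, `reference.type`, and `RefTypeBranch` come from the PR discussion, while the wrapper function `listRefFor` and the constant's value are assumed for the example.

```typescript
// Hypothetical helper illustrating the PR's change: for branches, append
// '@' to get the committed-only ref; commits and tags pass through as-is.
const RefTypeBranch = "branch"; // assumed constant value

interface Reference {
  id: string;
  type: string;
}

function listRefFor(reference: Reference): string {
  // '@' tells lakeFS to return only committed data, skipping the staging scan
  return reference.type === RefTypeBranch ? reference.id + "@" : reference.id;
}
```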

Performance Impact

For branches with many uncommitted changes:

  • Before: Scans staging area twice (objects.list + refs.changes)
  • After: Scans staging area once (only refs.changes)

Testing

  • ESLint passes
  • All existing tests pass (72 tests)

Fixes #10008

fix(webui): improve objects list performance for branches with uncommitted deletes

When displaying objects on a branch, use the committed-only ref (branch@)
instead of the full branch ref to list objects. This avoids scanning the
staging area twice - once for objects.list and once for refs.changes.

For branches with many uncommitted deletes, this significantly improves
page load performance.

The '@' modifier tells lakeFS to only return committed data, skipping
the staging area scan entirely.

Fixes treeverse#10008
@github-actions github-actions Bot added the area/UI Improvements or additions to UI label Mar 7, 2026
@arielshaqed arielshaqed requested review from a team and UdiBen March 7, 2026 15:35

@UdiBen UdiBen left a comment

@caoergou Thanks for the contribution!
I added one comment - LMK if it's unclear

// Use committed-only ref (branch@) for branches to avoid scanning staging area twice
// This significantly improves performance when there are many uncommitted deletes
const listRef = reference.type === RefTypeBranch ? reference.id + '@' : reference.id;
return objects.list(repo.id, listRef, path, after, config.pre_sign_support_ui);

@UdiBen UdiBen Mar 11, 2026

Using branch@ for objects.list means uncommitted additions no longer appear in the objects list.
mergeResults only injects missing removed entries and common_prefix entries into the merged view (line 53 of mergeResults.ts) — it doesn't inject added entries that are absent from the committed listing.

Repro: upload a new file to a branch without committing → it won't show in the objects tab.
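
The failure mode the reviewer describes can be reduced to a small sketch. This is a simplified stand-in for the pre-fix filter, assuming flattened shapes for `Entry` and `Change`; the real mergeResults.ts contains more logic than shown here.

```typescript
// Simplified sketch of the pre-fix injection filter in mergeResults:
// only 'removed' changes and common_prefix entries are injected, so an
// uncommitted 'added' change never reaches the merged view.
interface Entry { path: string; path_type: string; }
interface Change { path: string; type: string; path_type: string; }

function missingItems(results: Entry[], changes: Change[]): Change[] {
  const seen = new Set(results.map((r) => r.path));
  return changes
    .filter((c) => c.type === "removed" || c.path_type === "common_prefix")
    .filter((c) => !seen.has(c.path));
}

// A staged upload is an 'added' change absent from the committed listing:
const committed: Entry[] = [{ path: "lakes.parquet", path_type: "object" }];
const staged: Change[] = [
  { path: "new-upload.txt", type: "added", path_type: "object" },
];
// missingItems(committed, staged) → [] — the uploaded file disappears
```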

…listing

When using committed-only ref (branch@) for objects.list to improve
performance, uncommitted added files are not returned by the API.
This fix updates mergeResults to also include 'added' type changes
in the missingItems filter, ensuring newly uploaded files are shown
in the objects list.

Fixes the issue reported by @UdiBen: uncommitted additions no longer
appear in the objects list when using branch@ for performance.
@caoergou

Hi @UdiBen,

Thank you for the detailed feedback! I understand the issue now.

Problem: Using branch@ for objects.list means uncommitted additions are not returned, and mergeResults only injects removed entries and common_prefix entries into the merged view.

Fix: I've updated mergeResults.ts to also include added type changes in the missingItems filter. This ensures that newly uploaded (but uncommitted) files are added to the objects list when using committed-only ref.

Change:

// Before
.filter((change) => change.type === 'removed' || change.path_type === 'common_prefix')

// After
.filter((change) => change.type === 'removed' || change.type === 'added' || change.path_type === 'common_prefix')

This approach:

  1. Keeps the performance improvement (using branch@ to avoid scanning staging area twice)
  2. Ensures uncommitted added files are shown in the objects list
  3. Maintains the existing behavior for removed and changed entries

Please let me know if this addresses your concern or if there are other issues to address.
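
Applied to a sample diff, the updated filter behaves as follows. This is a self-contained sketch using assumed sample data and flattened `Change` shapes, not the real mergeResults.ts module.

```typescript
// Sketch of the post-fix filter: 'added' changes now pass alongside
// 'removed' entries and common prefixes, so staged uploads get injected.
interface Change { path: string; type: string; path_type: string; }

const changes: Change[] = [
  { path: "deleted.csv", type: "removed", path_type: "object" },
  { path: "new-upload.txt", type: "added", path_type: "object" },
  { path: "edited.json", type: "changed", path_type: "object" },
];

const injected = changes.filter(
  (c) =>
    c.type === "removed" || c.type === "added" || c.path_type === "common_prefix"
);
// injected paths: ["deleted.csv", "new-upload.txt"] — 'changed' is still skipped
```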


UdiBen commented Mar 12, 2026

@caoergou the fix looks good!
2 failing checks:

  • prettier fails (you can run npm run format from webui)
  • E2E test fails. If you need assistance with the fix or running E2E locally LMK

When using committed-only ref (branch@) for objects.list, newly uploaded
(staged) files may be alphabetically after the last committed file and
were incorrectly excluded from the objects list.

Root cause: the lastResultPath filter in mergeResults applied to 'added'
items, but staged files can sort after committed files (e.g. 'test-upload.txt'
after 'lakes.parquet'). changesData is already paginated via the 'after'
cursor, so 'added' items do not need this extra restriction.

Fix: exempt 'added' items from the lastResultPath restriction. The filter
now only applies to 'removed' entries and common_prefix entries.

Also apply prettier formatting to objects.jsx.
@caoergou

Hi @UdiBen,

I've addressed both failing checks:

1. Prettier formatting ✅

Ran npm run format from webui/ — reformatted mergeResults.ts and objects.jsx.

2. E2E test failures ✅

Root cause: The lastResultPath filter in mergeResults.ts was incorrectly applied to added items (staged files not yet committed).

When using branch@ for objects.list, staged files don't appear in the committed listing. They are supposed to be injected via mergeResults from the diff. However, staged files may sort after the last committed file — for example, test-upload.txt sorts after lakes.parquet from the sample data, so the filter:

.filter((change) => lastResultPath && change.path <= lastResultPath)

was excluding them entirely, causing the E2E tests to fail.

Fix: Exempt added items from the lastResultPath restriction. The changesData is already paginated using the same after cursor as objects.list, so added items won't overflow page boundaries.

// Before
.filter((change) => lastResultPath && change.path <= lastResultPath)

// After
.filter((change) => change.type === 'added' || (lastResultPath && change.path <= lastResultPath))

All 23 existing unit tests pass.
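
The exemption can be sketched in isolation. This is a simplified stand-in for the real filter in mergeResults.ts, with an assumed helper name (`withinPage`) and sample data; the example paths mirror the ones from the E2E failure.

```typescript
// Sketch of the lastResultPath exemption: staged files can sort after the
// last committed path on the page, so 'added' items bypass the cutoff
// while 'removed' entries remain bounded by it.
interface Change { path: string; type: string; path_type: string; }

function withinPage(changes: Change[], lastResultPath: string): Change[] {
  return changes.filter(
    (c) => c.type === "added" || (Boolean(lastResultPath) && c.path <= lastResultPath)
  );
}

const changes: Change[] = [
  { path: "a-removed.txt", type: "removed", path_type: "object" },
  // sorts after the cutoff, previously dropped:
  { path: "test-upload.txt", type: "added", path_type: "object" },
];
// With cutoff "lakes.parquet": the removed entry stays (path <= cutoff)
// and the added entry is exempt instead of being excluded.
```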

@caoergou caoergou requested a review from UdiBen March 20, 2026 03:13

UdiBen commented Mar 25, 2026

@caoergou
Thanks for tackling this, I really appreciate the effort - the performance issue with many uncommitted deletes is real and worth solving.

My concern is with the approach of splitting the listing into two independently-paginated sources (committed objects via branch@ + changes via diff) and merging them client-side. These two streams have different page boundaries, and mergeResults only sees one page at a time with no cross-page state. This makes it fundamentally hard to get right.

For example, here's a test case that is failing with the fix:

  describe('pagination: added entries must not duplicate across pages', () => {
      it('does not inject the same added entry on two consecutive pages', () => {
          // Simulates two consecutive pages of committed-only listing (branch@).
          // An added file "data/file-49999-added" appears in changesData for BOTH pages
          // because the changes cursor overlaps at page boundaries.
          // mergeResults must not return it on both pages — that causes duplicates in the UI.

          // --- Page N: committed objects end at "data/file-49958" ---
          const pageNResults: Entry[] = [
              { path: 'data/file-49869', path_type: 'object' },
              { path: 'data/file-49900', path_type: 'object' },
              { path: 'data/file-49958', path_type: 'object' },
          ];
          const pageNChanges: ChangesData = {
              results: [
                  { path: 'data/file-49869', type: 'removed', path_type: 'object' },
                  { path: 'data/file-49900', type: 'removed', path_type: 'object' },
                  { path: 'data/file-49958', type: 'removed', path_type: 'object' },
                  { path: 'data/file-49999-added', type: 'added', path_type: 'object' },
              ],
          };

          const mergedPageN = mergeResults(pageNResults, pageNChanges, false);

          // --- Page N+1: committed objects start after "data/file-49958" ---
          const pageN1Results: Entry[] = [
              { path: 'data/file-49959', path_type: 'object' },
              { path: 'data/file-50000', path_type: 'object' },
              { path: 'data/file-5047', path_type: 'object' },
          ];
          const pageN1Changes: ChangesData = {
              results: [
                  { path: 'data/file-49959', type: 'removed', path_type: 'object' },
                  { path: 'data/file-49999-added', type: 'added', path_type: 'object' },
                  { path: 'data/file-50000', type: 'removed', path_type: 'object' },
                  { path: 'data/file-5047', type: 'removed', path_type: 'object' },
              ],
          };

          const mergedPageN1 = mergeResults(pageN1Results, pageN1Changes, false);

          // Count how many times the added file appears across both pages
          const pageNCount = mergedPageN.filter((r) => r.path === 'data/file-49999-added').length;
          const pageN1Count = mergedPageN1.filter((r) => r.path === 'data/file-49999-added').length;
          const totalAppearances = pageNCount + pageN1Count;

          // The file must appear AT MOST once across both pages (0 is acceptable if the
          // implementation doesn't inject added entries; >1 means duplication bug)
          expect(totalAppearances).toBeLessThanOrEqual(1);
      });
  });

My suggestion is to add a few tests that cover the altered functionality:

  • mergeResults with added entry injection (added file between two committed files on same page / added file on last page after all committed objects etc.)
  • Pagination edge cases (Single committed file + many added files / No committed objects but added files exist etc.)

Once those are covered we'll have better confidence in merging.

@caoergou

@UdiBen After deeper investigation, I need to revise my earlier proposal.

My "skip tombstones" suggestion was incorrect. Tombstones (removed entries) are essential for FilterTombstoneIterator to correctly exclude deleted files from the merged listing. Without them, committed-but-deleted files would incorrectly appear.

The real bottleneck is in listStagingAreaWithoutCompaction():

  • It scans the entire staging area regardless of pagination
  • Even if the API only returns 100 results, all staging entries are processed
  • Pagination is applied after the merge, not during staging scan

Why client-side merge doesn't work well:

  • Even if we only inject added entries, pagination boundaries between objects.list and refs.changes are misaligned
  • This causes duplicates or missing entries across pages

Potential backend optimization directions (need further investigation):

  1. Push down pagination cursor to staging scan (only scan path > after)
  2. Lazy merging similar to merge-sort iteration
  3. Optimize staging storage for range queries

However, these are significant architectural changes that require discussion.
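
Direction 2 above ("lazy merging") can be illustrated conceptually. The lakeFS backend is Go, so this TypeScript generator is only a sketch of the idea, not lakeFS code: merge two sorted listings and stop as soon as a page fills, so the staging side is never consumed past the page boundary.

```typescript
// Conceptual sketch of lazy merge-sort iteration over two sorted listings.
// In a real backend the 'staged' side would be a cursor that stops being
// read once the page is full, avoiding a full staging-area scan.
function* lazyMerge(
  committed: string[],
  staged: string[],
  pageSize: number
): Generator<string> {
  let i = 0, j = 0, emitted = 0;
  while (emitted < pageSize && (i < committed.length || j < staged.length)) {
    if (j >= staged.length || (i < committed.length && committed[i] <= staged[j])) {
      const p = committed[i++];
      if (j < staged.length && staged[j] === p) j++; // same path on both sides: emit once
      yield p;
    } else {
      yield staged[j++]; // staged-only path (e.g. an uncommitted add)
    }
    emitted++;
  }
}
```

The early exit on `emitted < pageSize` is the key property: pagination is enforced during the merge rather than after it.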

My recommendation:

  • This PR addresses a real pain point, but the client-side merge approach has fundamental limitations
  • A proper solution likely requires backend optimization
  • I suggest we pause this PR and open a discussion issue to gather more input from maintainers

I'm happy to help investigate the backend optimization options if that's the desired direction.

What do you think?


Labels

area/UI Improvements or additions to UI contributor


Development

Successfully merging this pull request may close these issues.

webui can cause lakeFS to go over uncommitted deletes twice

3 participants