Skip to content

Conversation

@MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Nov 7, 2025

Summary

Reduce the memory footprint of NotebookIndex from O(lines) to O(cells).

The NotebookIndex stores a mapping from rows in the concatenated notebook document
to the relative row numbers within each cell and from absolute line number to the corresponding cell.

The way this is implemented today is by having a vector with one entry for every absolute row number
where the value is the index of the cell. While this allows O(1) lookup, it does require an entry for every single line.

This PR rewrites our representation (and uses one Vec instead of two) to only store the
start line number per cell and use a binary search to find to which cell a line number belongs.
This makes lookups slightly slower (from O(1) to O(log(n)) but reduces memory usage
from O(rows) to O(cells).

I noticed this improvement when working on ty's notebook support but decided to extract it into its own PR to make reviewing easier.

Test Plan

Existing tests

@MichaReiser MichaReiser added the internal An internal refactor or improvement label Nov 7, 2025
@MichaReiser MichaReiser force-pushed the micha/refactor-notebook-index branch from c7340bd to fcd67f2 Compare November 7, 2025 16:31
@MichaReiser MichaReiser changed the title Reduce memory footprint of notebooks Reduce notebook memory footprint Nov 7, 2025
@MichaReiser MichaReiser marked this pull request as ready for review November 7, 2025 16:31
@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@MichaReiser MichaReiser merged commit 36cce34 into main Nov 11, 2025
38 checks passed
@MichaReiser MichaReiser deleted the micha/refactor-notebook-index branch November 11, 2025 09:43
.map(|(row, cell)| (OneIndexed::from_zero_indexed(row), *cell))
/// Returns an iterator over the starting rows of each cell (1-based).
///
/// This yields one entry per Python cell (skipping over Makrdown cell).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This yields one entry per Python cell (skipping over Makrdown cell).
/// This yields one entry per Python cell (skipping over Markdown cell).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I pushed the fix to #21175

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants