
Make Rust htmldiff optional with pure Python fallback #34

Open
jpvelez wants to merge 1 commit into trevorcampbell:main from switchbox-data:optional-rust-htmldiff

jpvelez commented Mar 14, 2026

Problem

The Rust/PyO3 extension module (htmldiff) requires a Rust toolchain and maturin to build from source. There are no precompiled wheels on PyPI for common platforms:

  • Apple Silicon (darwin-arm64)
  • Python >= 3.13

This means pip install website_diff has to build the extension from source, which fails for many users unless they have a Rust toolchain installed. Additionally, maturin is listed as a runtime dependency even though it's only needed at build time.

What this PR does

Adds a pure Python fallback so website_diff works out of the box on any platform, while still using the Rust extension when it's available.

Specifically:

  1. website_diff/_htmldiff_py.py — Pure Python reimplementation of the HTML tokenizer (html.rs) and Wu-Manber-Myers O(NP) diff algorithm (wu.rs, builder.rs). ~180 lines.
  2. website_diff/htmldiff.py — Wrapper module that tries to import the Rust extension first, falls back to Python:
    try:
        from website_diff._htmldiff_rs import _htmldiff
    except ImportError:
        from website_diff._htmldiff_py import _htmldiff
  3. Rust module renamed from website_diff.htmldiff to website_diff._htmldiff_rs (in pyproject.toml and src/lib.rs).
  4. maturin removed from runtime dependencies (it's already in [build-system] requires).
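
For readers unfamiliar with the O(NP) algorithm of Wu, Manber, Myers and Miller, here is a rough sketch of a tokenize-then-diff pipeline in Python. It is not the code in _htmldiff_py.py. The tokenize regex (tags, whitespace runs, words), the onp_diff name, and its ("equal" / "insert" / "delete", token) output format are assumptions made for illustration. The HTML-building step that builder.rs performs is left out.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical tokenizer: tags, runs of whitespace, words, and any stray "<".
# Every character lands in exactly one token, so "".join(tokens) == html.
_TOKEN_RE = re.compile(r"<[^>]*>|\s+|[^<\s]+|<")


def tokenize(html: str) -> List[str]:
    return _TOKEN_RE.findall(html)


Op = Tuple[str, str]  # ("equal" | "insert" | "delete", token)


def onp_diff(a: Sequence[str], b: Sequence[str]) -> List[Op]:
    """Insert/delete diff of a -> b using the O(NP) algorithm of Wu et al. (1990)."""
    swapped = len(a) > len(b)
    if swapped:  # the algorithm assumes len(a) <= len(b)
        a, b = b, a
    m, n = len(a), len(b)
    delta = n - m
    offset = m + 1                   # diagonal k = y - x is stored at index k + offset
    fp = [-1] * (m + n + 3)          # furthest y reached on each diagonal
    path = [-1] * (m + n + 3)        # index into `points` of that furthest endpoint
    points: List[Tuple[int, int, int]] = []  # (x, y, index of previous endpoint)

    def snake(k: int) -> None:
        # Take one insert step from diagonal k-1 or one delete step from k+1,
        # whichever reaches further, then slide along matching tokens.
        if fp[k - 1 + offset] + 1 > fp[k + 1 + offset]:
            y, prev = fp[k - 1 + offset] + 1, path[k - 1 + offset]
        else:
            y, prev = fp[k + 1 + offset], path[k + 1 + offset]
        x = y - k
        while x < m and y < n and a[x] == b[y]:
            x += 1
            y += 1
        fp[k + offset] = y
        path[k + offset] = len(points)
        points.append((x, y, prev))

    # Each round allows one more deletion (p); the final distance is delta + 2 * p.
    p = -1
    while fp[delta + offset] != n:
        p += 1
        for k in range(-p, delta):
            snake(k)
        for k in range(delta + p, delta, -1):
            snake(k)
        snake(delta)

    # Follow the endpoint chain back to the origin, then replay it forwards.
    chain = []
    r = path[delta + offset]
    while r != -1:
        x, y, r = points[r]
        chain.append((x, y))
    chain.reverse()

    ops: List[Op] = []
    px = py = 0
    for x, y in chain:
        while px < x or py < y:
            if y - x > py - px:      # moved to a higher diagonal: b[py] inserted
                ops.append(("insert", b[py]))
                py += 1
            elif y - x < py - px:    # moved to a lower diagonal: a[px] deleted
                ops.append(("delete", a[px]))
                px += 1
            else:                    # same diagonal: tokens match
                ops.append(("equal", a[px]))
                px += 1
                py += 1
    if swapped:  # undo the swap: inserts and deletes trade places
        flip = {"insert": "delete", "delete": "insert", "equal": "equal"}
        ops = [(flip[tag], tok) for tag, tok in ops]
    return ops
```

For example, onp_diff(tokenize(old_html), tokenize(new_html)) on two versions of a table cell reports the unchanged tags and words as "equal" and a changed dollar value as one delete plus one insert. Recording the path costs memory proportional to the number of snakes explored, and that number stays small when the two pages differ in only a few tokens.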

No changes to page.py or any other consumer — import website_diff.htmldiff as hd continues to work unchanged.
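
If tests or benchmarks need to know which backend htmldiff.py actually loaded, the try/except in item 2 could be generalized into a small helper. This is a hypothetical sketch: import_first and the BACKEND name in the trailing comment are not part of the PR.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def import_first(*names: str) -> Tuple[str, ModuleType]:
    """Import and return (name, module) for the first importable name in `names`."""
    last_error: Optional[ImportError] = None
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            last_error = exc
    raise ImportError(f"none of {names!r} could be imported") from last_error


# Hypothetical use inside website_diff/htmldiff.py:
#   BACKEND, _impl = import_first("website_diff._htmldiff_rs",
#                                 "website_diff._htmldiff_py")
#   _htmldiff = _impl._htmldiff   # BACKEND records which one was loaded
```

Consumers that only do import website_diff.htmldiff as hd would be unaffected either way. The helper just makes the chosen backend inspectable.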

Correctness

The pure Python implementation produces byte-identical output to the Rust version. Tested on:

  • Simple HTML fragments with text changes
  • A real 235 KB Quarto report HTML file with dollar-value changes in tables and prose
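
A byte-identity check like this can be automated by running both backends over the same inputs. This sketch assumes that each backend exposes a callable taking the old and new HTML strings and returning the diffed HTML as a str. The PR does not spell out _htmldiff's arguments, so that signature, and the find_mismatches name, are assumptions.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed backend signature: (old_html, new_html) -> diffed HTML string.
DiffFn = Callable[[str, str], str]


def find_mismatches(a: DiffFn, b: DiffFn, cases: Iterable[Tuple[str, str]]) -> List[int]:
    """Return the indices of (old, new) cases where the two backends' outputs differ."""
    mismatches = []
    for i, (old, new) in enumerate(cases):
        # Equal Python strings encode to identical UTF-8 bytes.
        if a(old, new) != b(old, new):
            mismatches.append(i)
    return mismatches


# Hypothetical use, with both modules importable:
#   from website_diff import _htmldiff_py, _htmldiff_rs
#   assert find_mismatches(_htmldiff_rs._htmldiff, _htmldiff_py._htmldiff, cases) == []
```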

Performance

Benchmarked on a 235 KB Quarto-rendered HTML report (macOS, Apple Silicon):

Implementation    Time
Rust (PyO3)       2 ms
Pure Python       46 ms
Slowdown          ~20x

The 44 ms difference is negligible in practice. HTML text diffing accounts for <1% of website_diff's total runtime — image diffing (Pillow), SVG pre-rendering (cairosvg), and Plotly/Altair conversion (selenium/vl-convert) dominate execution time by orders of magnitude.
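
Timings like the ones in the table are easiest to reproduce with a best-of-N timer. The hypothetical helper below uses the standard-library timeit module. It is not the script used for the benchmark above. In practice fn would wrap a call to either backend on the same pair of HTML files.

```python
import timeit
from typing import Callable


def best_time_ms(fn: Callable[[], object], repeat: int = 20) -> float:
    """Best-of-`repeat` wall-clock time of a single fn() call, in milliseconds."""
    # timeit disables garbage collection during each trial; taking the minimum
    # further filters out noise from other processes and OS scheduling.
    return min(timeit.repeat(fn, number=1, repeat=repeat)) * 1000.0


# Hypothetical use, with old_html/new_html read from two report versions:
#   rs_ms = best_time_ms(lambda: _htmldiff_rs._htmldiff(old_html, new_html))
#   py_ms = best_time_ms(lambda: _htmldiff_py._htmldiff(old_html, new_html))
#   print(f"slowdown {py_ms / rs_ms:.1f}x, extra {py_ms - rs_ms:.1f} ms")
```

Dividing the two timings gives the slowdown. With the table's numbers that is 46 / 2 = 23, which the table rounds to ~20x, and the difference is the 44 ms quoted above.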

Why not just remove Rust?

This PR makes Rust optional rather than removing it, so existing users with Rust toolchains get the same behavior. But the Rust component is simple enough (~200 lines across 4 files) that removing it entirely would also be reasonable — the pure Python version is a faithful, tested drop-in.

The Rust/PyO3 extension module requires a Rust toolchain and maturin to
build from source, and has no precompiled wheels for platforms like Apple
Silicon (darwin-arm64) or Python >= 3.13.

This commit adds a pure Python fallback that produces byte-identical
output and is used automatically when the compiled extension is
unavailable.

Changes:
- Add website_diff/_htmldiff_py.py (pure Python implementation)
- Add website_diff/htmldiff.py (wrapper: tries Rust, falls back to Python)
- Rename Rust module from website_diff.htmldiff to website_diff._htmldiff_rs
- Remove maturin from runtime dependencies (it's build-time only)

Made-with: Cursor