Make Rust htmldiff optional with pure Python fallback #34
Open
jpvelez wants to merge 1 commit into trevorcampbell:main
The Rust/PyO3 extension module requires a Rust toolchain and maturin to build from source, and has no precompiled wheels for platforms like Apple Silicon (darwin-arm64) or Python >= 3.13. This commit adds a pure Python fallback that produces byte-identical output and is used automatically when the compiled extension is unavailable.

Changes:
- Add website_diff/_htmldiff_py.py (pure Python implementation)
- Add website_diff/htmldiff.py (wrapper: tries Rust, falls back to Python)
- Rename Rust module from website_diff.htmldiff to website_diff._htmldiff_rs
- Remove maturin from runtime dependencies (it's build-time only)

Made-with: Cursor
Problem
The Rust/PyO3 extension module (htmldiff) requires a Rust toolchain and maturin to build from source. There are no precompiled wheels on PyPI for common platforms such as Apple Silicon (darwin-arm64) or Python >= 3.13.
This means pip install website_diff fails for many users unless they have Rust installed. Additionally, maturin is listed as a runtime dependency even though it's only needed at build time.
What this PR does
Adds a pure Python fallback so website_diff works out of the box on any platform, while still using the Rust extension when it's available.
Specifically:
- website_diff/_htmldiff_py.py: pure Python reimplementation of the HTML tokenizer (html.rs) and the Wu-Manber-Myers O(NP) diff algorithm (wu.rs, builder.rs). ~180 lines.
- website_diff/htmldiff.py: wrapper module that tries to import the Rust extension first and falls back to Python.
- Rust module renamed: website_diff.htmldiff → website_diff._htmldiff_rs (in pyproject.toml and src/lib.rs).
- maturin removed from runtime dependencies (it's already in [build-system] requires).
No changes to page.py or any other consumer: import website_diff.htmldiff as hd continues to work unchanged.
Correctness
The pure Python implementation produces byte-identical output to the Rust version. Tested on:
Performance
Benchmarked on a 235 KB Quarto-rendered HTML report (macOS, Apple Silicon):
The 44 ms difference is negligible in practice. HTML text diffing accounts for <1% of website_diff's total runtime; image diffing (Pillow), SVG pre-rendering (cairosvg), and Plotly/Altair conversion (selenium/vl-convert) dominate execution time by orders of magnitude.
Why not just remove Rust?
This PR makes Rust optional rather than removing it, so existing users with Rust toolchains get the same behavior. But the Rust component is simple enough (~200 lines across 4 files) that removing it entirely would also be reasonable — the pure Python version is a faithful, tested drop-in.
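For reviewers who want to see what "optional" means mechanically, here is a minimal sketch of the try-compiled-then-pure-Python import pattern. It is not the actual contents of website_diff/htmldiff.py. The helper name load_first_available is hypothetical, and the demo uses a deliberately missing module name plus the standard library's difflib as stand-ins so that it runs anywhere. In this PR's layout the candidates would presumably be website_diff._htmldiff_rs followed by website_diff._htmldiff_py.

```python
import importlib


def load_first_available(candidates):
    """Import and return (module, name) for the first importable candidate.

    Hypothetical helper for illustration only: candidates are tried in
    order, so the preferred (compiled) backend should be listed first.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name), name
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{name}: {exc}")
    raise ImportError("no backend available: " + "; ".join(errors))


# Stand-in demo: the first name does not exist, so the loader falls
# through to the second one, which ships with every Python install.
backend, chosen = load_first_available(["_no_such_compiled_ext_", "difflib"])
print(f"using backend: {chosen}")
```

A wrapper module built this way can re-export the chosen backend's functions under a stable name, which is why consumers such as page.py would not need to change.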