Harfrust for Text Shaping #400

taj-p · 2025-08-19T03:51:10Z

Migrate text shaping from Swash to Harfrust

This PR migrates Parley’s shaping engine from Swash to Harfrust, while continuing to use Swash for text analysis and some example rendering.

The goal is to bring Parley’s shaping behavior to parity with HarfBuzz (via Harfrust), improving correctness across complex scripts (for example, see the updated RTL Arabic snapshots).

The changes are largely internal and, besides .synthesis() I believe, should involve no external API changes.

What changed

Shaping pipeline
- Replaced Swash shaping with Harfrust.
- Continued using Swash for text analysis (segmentation, clustering, word boundaries).

Layout and run storage
- Internal run data stores Harfrust results and variation coordinates.
- Parley now handles variation coordinates for variable fonts.
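
For readers unfamiliar with variation coordinates, here is a minimal sketch of what "normalized, in axis order" means. The `Axis` type and both functions are hypothetical and exist only for illustration; they are not Parley or Harfrust APIs, and the sketch ignores any `avar` remapping a real font may apply.

```rust
/// Axis bounds in user space, as a font's `fvar` table would describe them.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy)]
struct Axis {
    min: f32,
    default: f32,
    max: f32,
}

/// Maps a user-space axis value to the normalized range [-1.0, 1.0].
/// Assumes `min <= default <= max`; `avar` remapping is not applied.
fn normalize(axis: Axis, value: f32) -> f32 {
    let v = value.clamp(axis.min, axis.max);
    if v < axis.default {
        (v - axis.default) / (axis.default - axis.min)
    } else if v > axis.default {
        (v - axis.default) / (axis.max - axis.default)
    } else {
        0.0
    }
}

/// Produces one normalized coordinate per axis, in the font's axis order.
/// Axes the caller did not set stay at 0.0 (the default position).
fn normalized_coords(axes: &[(&str, Axis)], settings: &[(&str, f32)]) -> Vec<f32> {
    axes.iter()
        .map(|(tag, axis)| {
            settings
                .iter()
                .find(|(t, _)| t == tag)
                .map(|(_, v)| normalize(*axis, *v))
                .unwrap_or(0.0)
        })
        .collect()
}
```

The point of iterating over `axes` rather than `settings` is that the output order follows the font, not the order in which the caller listed the variations.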

Follow-ups

Next, we're going to look at using ICU4X for text analysis.

cc @conor-93

taj-p · 2025-08-19T03:52:57Z

parley/src/shape.rs

+
+        // Create harfrust shaper
+        // TODO: cache this upstream?
+        let shaper_data = harfrust::ShaperData::new(&font_ref);


I'm not sure what the intention behind ShaperData and Shaper is. Should we be caching these per font upstream somewhere or allocating them as needed here?

See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData, ShaperInstance and ShapePlan should be cached. Exactly how best to do this I'm not sure.

That's a brilliant reference. Thank you. I'll spend some time later today thinking about how we might better cache these structures.

Internally, swash uses an LRU cache for this kind of data. I think we should do the same here but I'm happy to defer that to a follow up.

Note, we just added a ShapePlanKey type to HarfRust that is meant to assist with caching shape plans.

Ah, perfect. Yes, that makes sense. If possible, could I please look at moving these to an LRU cache separately to this PR?

I'd like to add benchmarks to Parley to test different caching approaches during development, but I think this PR is already quite large.

Absolutely. Let's do caching in a separate PR.
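
As a rough idea of the shape such a follow-up could take, here is a minimal LRU sketch using only the standard library. `LruCache` and its method are hypothetical names, not Swash's or HarfRust's implementation; in practice the key would be something like a font identifier (or a ShapePlanKey) and the value the cached shaper structure.

```rust
/// A tiny least-recently-used cache backed by a `Vec`.
/// Entries are ordered from most recently used (front) to least (back).
/// Hypothetical type for illustration; linear scans are only reasonable
/// for small capacities.
struct LruCache<K, V> {
    capacity: usize,
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> LruCache<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            entries: Vec::with_capacity(capacity),
        }
    }

    /// Returns the cached value for `key`, building it with `make` on a miss.
    fn get_or_insert_with(&mut self, key: K, make: impl FnOnce() -> V) -> &V {
        if let Some(pos) = self.entries.iter().position(|(k, _)| *k == key) {
            // Hit: move the entry to the front so it is evicted last.
            let entry = self.entries.remove(pos);
            self.entries.insert(0, entry);
        } else {
            // Miss: evict the least recently used entry if the cache is full.
            if self.entries.len() == self.capacity {
                self.entries.pop();
            }
            self.entries.insert(0, (key, make()));
        }
        &self.entries[0].1
    }
}
```

Whether a `Vec` scan or a hash map with an intrusive list is the better fit is exactly the kind of question the proposed benchmarks would answer.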

taj-p · 2025-08-19T07:19:52Z

parley/src/layout/data.rs

+            .unwrap_or(2048);
+
+        let (underline_offset, underline_size) = {
+            let post = font.post().unwrap(); // TODO: Handle invalid font?


How do we want to handle invalid fonts or these missing tables?

HarfBuzz has hb_ot_metrics_get_position_with_fallback() that uses fallback approximations for many of these metrics. I don’t think HarfRust has any of this, but maybe you can check HarfBuzz code and use similar fallback.

Thank you very much for this comment. I ended up using HarfBuzz defaults in 06a9650
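
For illustration, the general pattern of falling back instead of unwrapping looks roughly like this. The `Underline` type and function are hypothetical, and the 1/18-em ratio is a placeholder assumption rather than a value copied from HarfBuzz or from 06a9650.

```rust
/// Underline metrics in font units. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Underline {
    offset: f32,
    size: f32,
}

/// Returns the font's own underline metrics when its table provides them,
/// otherwise an approximation derived from units-per-em.
/// The 1/18-em ratio below is an illustrative placeholder.
fn underline_or_fallback(from_post: Option<Underline>, units_per_em: u16) -> Underline {
    from_post.unwrap_or_else(|| {
        let size = f32::from(units_per_em) / 18.0;
        Underline { offset: -size, size }
    })
}
```

The caller passes `None` when the table is missing or fails to parse, so an invalid font degrades to approximate decoration metrics instead of panicking.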

taj-p · 2025-08-19T07:33:13Z

parley/src/layout/data.rs

+impl FontMetrics {
+    fn from(font: &skrifa::FontRef<'_>) -> Self {
+        use skrifa::raw::{TableProvider, tables::os2::SelectionFlags};
+        // NOTE: This _does not_ copy harfrust's metrics behaviour (https://github.com/harfbuzz/harfrust/blob/a38025fb336230b492366740c86021bb406bcd0d/src/hb/glyph_metrics.rs#L55-L60).


Instead, this copies the behaviour from Swash (choosing typographic metrics only if the relevant fs selection bit is set). I was thinking we could punt that behavioural change to another discussion/PR since it will result in significant changes to visual output?
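
A minimal sketch of that selection rule, for context. USE_TYPO_METRICS is bit 7 of the OS/2 fsSelection field; the `VerticalMetrics` type, the function name, and the choice of hhea values as the alternative are assumptions for illustration, and Swash's actual fallback chain may have more steps.

```rust
/// Bit 7 of the OS/2 `fsSelection` field: USE_TYPO_METRICS.
const USE_TYPO_METRICS: u16 = 1 << 7;

/// Hypothetical container for one source of vertical metrics.
#[derive(Debug, PartialEq, Clone, Copy)]
struct VerticalMetrics {
    ascent: i16,
    descent: i16,
    line_gap: i16,
}

/// Picks the OS/2 typographic metrics only when the font opts in via
/// `fsSelection`; otherwise uses the `hhea` values (an assumption here).
fn select_metrics(
    fs_selection: u16,
    typo: VerticalMetrics,
    hhea: VerticalMetrics,
) -> VerticalMetrics {
    if fs_selection & USE_TYPO_METRICS != 0 {
        typo
    } else {
        hhea
    }
}
```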

I'd recommend just using skrifa for this. In particular, it handles things like metrics variations using the MVAR table and hides a lot of the nastiness. Take a look at https://docs.rs/skrifa/latest/skrifa/metrics/struct.Metrics.html

So much better!! Used Skrifa in 06a9650

FYI: Skrifa calculates strikeout thickness differently to swash.

Swash uses post.underline_size()

https://github.com/dfrg/swash/blob/356c725e8e5800a0fecceddc047cbaacd73cb622/src/metrics.rs#L185

Skrifa uses the os2 table:

https://github.com/googlefonts/fontations/blob/830f97905764e0c285c39e6360819770ab19ec80/skrifa/src/metrics.rs#L168

The difference has meant that some snapshots needed to be updated.

When in doubt, it's usually safe to assume that skrifa is more correct :)

taj-p · 2025-08-19T07:52:46Z

parley/tests/snapshots/base_level_alignment_rtl-start.png

I confirmed with an Arabic speaker that the new snapshot is an improvement on the deleted one. The deleted snapshot was apparently almost illegible.

nicoburns

A bit of a surface-level review, but I've left some notes.

nicoburns · 2025-08-19T11:14:15Z

parley/src/swash_convert.rs

@@ -66,3 +58,163 @@ const SCRIPT_TAGS: [[u8; 4]; 157] = [
    *b"Tirh", *b"Ugar", *b"Vaii", *b"Wara", *b"Wcho", *b"Xpeo", *b"Xsux", *b"Yezi", *b"Yiii",
    *b"Zanb", *b"Zinh", *b"Zyyy", *b"Zzzz",
 ];
+
+const HARFRUST_SCRIPT_TAGS: [harfrust::Script; 157] = [


https://docs.rs/harfrust/latest/harfrust/struct.Script.html#method.from_iso15924_tag may be relevant here.

Originally, I opted not to use this method because a single lookup over an array is faster than this implementation, but I think you're right. We should go the simpler route for now and add in this complexity if we find it worthwhile. Since this will be re-evaluated in the near future (with the adoption of ICU4X for text analysis), I also think simpler is better for now.

Implemented in c9cc74d
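
To make the trade-off concrete, here is a sketch of the simpler route: keep a single table of four-byte ISO 15924 tags and convert at lookup time, rather than maintaining a second parallel array of prebuilt script values. The three-entry table and both function names are hypothetical; the real code would hand the tag to Script::from_iso15924_tag instead of returning a raw `u32`.

```rust
/// A shortened, hypothetical stand-in for the full script tag table.
const SCRIPT_TAGS: [[u8; 4]; 3] = [*b"Arab", *b"Latn", *b"Zyyy"];

/// Packs a four-byte tag into a big-endian `u32`, the usual OpenType
/// tag representation.
fn tag_to_u32(tag: [u8; 4]) -> u32 {
    u32::from_be_bytes(tag)
}

/// Looks up the tag for a script index, returning `None` when the index
/// is out of range instead of panicking.
fn script_tag_for_index(index: usize) -> Option<u32> {
    SCRIPT_TAGS.get(index).copied().map(tag_to_u32)
}
```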

nicoburns · 2025-08-19T11:17:48Z

parley/src/shape.rs

+
+        // Create harfrust shaper
+        // TODO: cache this upstream?
+        let shaper_data = harfrust::ShaperData::new(&font_ref);


See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData, ShaperInstance and ShapePlan should be cached. Exactly how best to do this I'm not sure.

nicoburns · 2025-08-19T11:26:03Z

parley/src/shape.rs

+        // Use the entire segment text including newlines
+        for (i, ch) in segment_text.chars().enumerate() {
+            // Ensure that each cluster's index matches the index into `infos`. This is required
+            // for efficient cluster lookup within `data.rs`.
+            buffer.add(ch, i as u32);


I think this can just be buffer.push_str(segment_text). But perhaps this API will be needed for text that's pre-segmented with icu4x?

This is a good suggestion. It's what I initially tried, but I found the .add approach to produce a simpler system.

Using .push_str produces different cluster indices than the existing implementation. I've updated the docs in ef0bb64 and 04e76a8 to help explain why we want to enumerate the chars instead of using push_str. Essentially, adding clusters to the buffer this way means that the cluster indices map directly to the corresponding character information from swash in infos.

.push_str could theoretically work, but it would require an additional mapping from cluster index to infos index within push_run of data.rs. Right now, push_run is pretty complex, and when I inspected push_str, it didn't seem much cheaper than iterative .add, so I'd prefer to keep that complexity out of this initial PR.

Hmm.... interesting. I see that push_str is defined as:

    fn push_str(&mut self, text: &str) {
        if !self.ensure(self.len + text.chars().count()) {
            return;
        }
        for (i, c) in text.char_indices() {
            self.info[self.len] = GlyphInfo {
                glyph_id: c as u32,
                cluster: i as u32,
                ..GlyphInfo::default()
            };
            self.len += 1;
        }
    }

The key difference here seems to be that it's calling char_indices() rather than .chars().enumerate(), so I think you end up with the "cluster" being the byte offset of the char rather than the index of the char.
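
A small standard-library sketch makes the difference visible. The two helper names are hypothetical; they only collect the cluster values each iteration style would assign.

```rust
/// Cluster values when each char is numbered by its position in the
/// sequence of chars (what `.chars().enumerate()` yields).
fn char_index_clusters(text: &str) -> Vec<u32> {
    text.chars().enumerate().map(|(i, _)| i as u32).collect()
}

/// Cluster values when each char is numbered by its UTF-8 byte offset
/// (what `.char_indices()` yields).
fn byte_offset_clusters(text: &str) -> Vec<u32> {
    text.char_indices().map(|(i, _)| i as u32).collect()
}
```

For pure ASCII text the two agree, which is why the mismatch only shows up once multi-byte characters (for example Arabic) are involved: for "a\u{e9}b" the first gives 0, 1, 2 while the second gives 0, 1, 3.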

Might be nice if we could get a method added to harfrust that does .chars().enumerate() whilst also doing the single allocation for the required space in the buffer?

As of HarfRust 0.1.2, UnicodeBuffer now has a reserve method.

Nice! Updated to 0.1.2 and used reserve in 61fccc4
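
Roughly, the resulting pattern is "reserve once, then add each char with its char index". The sketch below uses a hypothetical `MockBuffer` as a stand-in; it is not HarfRust's UnicodeBuffer, and the real reserve signature may differ.

```rust
/// Hypothetical stand-in for a shaping input buffer.
#[derive(Default)]
struct MockBuffer {
    items: Vec<(char, u32)>,
}

impl MockBuffer {
    /// Reserves room for at least `additional` more chars.
    fn reserve(&mut self, additional: usize) {
        self.items.reserve(additional);
    }

    /// Appends one char with an explicit cluster value.
    fn add(&mut self, ch: char, cluster: u32) {
        self.items.push((ch, cluster));
    }
}

/// Fills the buffer so that each cluster equals the char's index.
fn fill(buffer: &mut MockBuffer, text: &str) {
    // The byte length is an upper bound on the char count, so a single
    // reservation suffices without a separate counting pass.
    buffer.reserve(text.len());
    for (i, ch) in text.chars().enumerate() {
        buffer.add(ch, i as u32);
    }
}
```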

nicoburns · 2025-08-20T21:45:37Z

parley/src/layout/data.rs

+    glyphs: &mut Vec<Glyph>,
+    scale_factor: f32,
+    glyph_infos: &[harfrust::GlyphInfo],
+    glyph_positions: &[harfrust::GlyphPosition],
+    char_infos: &[(swash::text::cluster::CharInfo, u16)],
+    char_indices_iter: I,
+) -> f32 {
+    let mut char_indices_iter = char_indices_iter.peekable();
+    let mut cluster_start_char = char_indices_iter.next().unwrap();


It would be useful to document the parameters of this function. In particular:

Which parameters are inputs and which are outputs

How the mutable parameters get mutated

Good idea 🙏 . Updated in a0920d1

dfrg

It’s very exciting to finally have robust shaping in parley! LGTM

xorgy · 2025-08-21T13:31:09Z

I'm giving this a try with some samples. :+ )

xorgy · 2025-08-21T13:55:53Z

Works great with my programs using Parley, and immediately does something that has been missing for me (automatic whole number fractions with U+2044 Fraction Slash e.g. ‘37⁄64’), flawlessly. :+ )
It is a little bit slower, but not so much slower than Swash that it noticeably changes the interaction with the overall programs (and the way I am calling Parley in these programs is not particularly efficient).

dfrg · 2025-08-21T14:30:20Z

Glad to see your fractions are working now :)

Caching the shaper data structures should improve the performance significantly and I suspect the end result will be much faster than Swash.

tomcur · 2025-08-21T15:20:57Z

parley/src/layout/data.rs

+        // HarfBuzz returns glyphs in visual order, so we need to process them as such while
+        // maintaining logical ordering of clusters.


Should this say HarfRust instead?

Suggested change

-        // HarfBuzz returns glyphs in visual order, so we need to process them as such while
-        // maintaining logical ordering of clusters.
+        // HarfRust returns glyphs in visual order, so we need to process them as such while
+        // maintaining logical ordering of clusters.
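
For context on what that comment describes, here is a minimal sketch of walking visually ordered glyphs while emitting clusters in logical order. It assumes that in a right-to-left run the shaper's cluster values decrease along the visual order; `ShapedGlyph` and the function are hypothetical and much simpler than the real processing in data.rs.

```rust
/// Hypothetical shaped glyph: an id plus the cluster it belongs to.
#[derive(Debug, PartialEq, Clone, Copy)]
struct ShapedGlyph {
    glyph_id: u32,
    cluster: u32,
}

/// Groups glyphs by cluster and returns the groups in logical
/// (ascending cluster) order. For RTL input the visual sequence is
/// walked backwards, so glyphs inside a cluster come out reversed too.
fn clusters_in_logical_order(glyphs: &[ShapedGlyph], is_rtl: bool) -> Vec<(u32, Vec<u32>)> {
    let mut out: Vec<(u32, Vec<u32>)> = Vec::new();
    let n = glyphs.len();
    for i in 0..n {
        let g = if is_rtl { glyphs[n - 1 - i] } else { glyphs[i] };
        match out.last_mut() {
            // Same cluster as the previous glyph: extend the current group.
            Some(last) if last.0 == g.cluster => last.1.push(g.glyph_id),
            // New cluster: start a new group.
            _ => out.push((g.cluster, vec![g.glyph_id])),
        }
    }
    out
}
```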

xorgy · 2025-08-21T15:22:49Z

Very pleased. :+ )

harfcap.webm

nicoburns · 2025-08-21T20:48:25Z

The CI job seemed to time out for some reason. I'm going to retry it.

StewartCanva and others added 13 commits July 16, 2025 17:06

test changes

b6b0d0a

Compiling and running but not working

8cad993

linebreaking different but actually linebreaking

21383f7

passing font variations

f9f1aab

push variations, but hardcoded order

e7d0667

push actual axis normalized and in correct order

beafd28

script conversion

3b17133

working tree commits

2eee448

some cleanup

912865c

more cleanup + todos

7ebe0fb

more todo

0736ae7

Squashed commits of harfrust-migration2

56df45e

Merge remote-tracking branch 'origin/main' into tajp/harfrust-migration3

d061792

taj-p commented Aug 19, 2025

.

a97cfff

taj-p force-pushed the tajp/harfrust-migration3 branch from 21d7d77 to a97cfff Compare August 19, 2025 04:00

Clippy

33d74be

taj-p force-pushed the tajp/harfrust-migration3 branch from 9dc5e50 to 38c593b Compare August 19, 2025 06:51

.

e478d1d

taj-p force-pushed the tajp/harfrust-migration3 branch from 38c593b to e478d1d Compare August 19, 2025 06:54

taj-p marked this pull request as ready for review August 19, 2025 07:12

taj-p requested a review from dfrg August 19, 2025 07:17

taj-p commented Aug 19, 2025

nicoburns reviewed Aug 19, 2025

taj-p added 4 commits August 20, 2025 05:07

Don't collect variations

2f118f5

Fix documentation

ef0bb64

Use optional unicode buffer

c1ad1a9

Use Feature::new

317882c

nicoburns reviewed Aug 20, 2025

nicoburns reviewed Aug 20, 2025

taj-p added 8 commits August 21, 2025 08:25

Move scratch clusters to LayoutDataContext

efb6e2d

Use harfrust coords

b8c1b51

Use Skrifa for font metrics

06a9650

Use clusters reverse

39b3e72

Update snapshots

f456338

Process clusters documentation

a0920d1

cargo fmt

a63ae9e

.

4600ff6

taj-p commented Aug 21, 2025

Update README.md

2643008

taj-p commented Aug 21, 2025

Update README.md

0a6da9b

taj-p requested a review from dfrg August 21, 2025 07:35

dfrg approved these changes Aug 21, 2025

tomcur reviewed Aug 21, 2025

HarfBuzz -> HarfRust

4495cbd

taj-p enabled auto-merge August 21, 2025 18:33

taj-p added this pull request to the merge queue Aug 21, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025

nicoburns added this pull request to the merge queue Aug 21, 2025

Merged via the queue into main with commit 36c92c4 Aug 21, 2025
24 checks passed

nicoburns deleted the tajp/harfrust-migration3 branch August 21, 2025 20:54
