Skip to content

Harfrust for Text Shaping #400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Aug 21, 2025
Merged

Harfrust for Text Shaping #400

merged 35 commits into from
Aug 21, 2025

Conversation

taj-p
Copy link
Contributor

@taj-p taj-p commented Aug 19, 2025

Migrate text shaping from Swash to Harfrust

This PR migrates Parley’s shaping engine from Swash to Harfrust, while continuing to use Swash for text analysis and some example rendering.

The goal is to align Parley’s shaping behavior with HarfBuzz parity via Harfrust to improve correctness across complex scripts (for example, see the improved and updated RTL Arabic snapshots).

The changes are largely internal and should have no (besides .synthesis(), I believe) external API changes.

What changed

  • Shaping pipeline

    • Replaced Swash shaping with Harfrust
    • Continued using Swash for text analysis (segmentation, clustering, word boundaries).
  • Layout and run storage

    • Internal run data stores Harfrust results and variation coordinates.
    • Parley handling of variation coordinates for variable fonts.

Follow-ups

  • We're going to next look at using ICU4X for text analysis.

cc @conor-93


// Create harfrust shaper
// TODO: cache this upstream?
let shaper_data = harfrust::ShaperData::new(&font_ref);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the intention behind ShaperData and Shaper is. Should we be caching these per font upstream somewhere or allocating them as needed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData, ShaperInstance and ShapePlan should be cached. Exactly how best to do this I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a brilliant reference. Thank you. I'll spend some time later today thinking about how we might better cache these structures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, swash uses an LRU cache for this kind of data. I think we should do the same here but I'm happy to defer that to a follow up.

Note, we just added a ShapePlanKey type to HarfRust that is meant to assist with caching shape plans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, perfect. Yes, that makes sense. If possible, could I please look at moving these to an LRU cache separately to this PR?

I'd like to add benchmarks to Parley to test different caching approaches during development, but I think this PR is already quite large.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Let's do caching in a separate PR.

@taj-p taj-p force-pushed the tajp/harfrust-migration3 branch from 21d7d77 to a97cfff Compare August 19, 2025 04:00
@taj-p taj-p force-pushed the tajp/harfrust-migration3 branch from 9dc5e50 to 38c593b Compare August 19, 2025 06:51
@taj-p taj-p force-pushed the tajp/harfrust-migration3 branch from 38c593b to e478d1d Compare August 19, 2025 06:54
@taj-p taj-p marked this pull request as ready for review August 19, 2025 07:12
@taj-p taj-p requested a review from dfrg August 19, 2025 07:17
.unwrap_or(2048);

let (underline_offset, underline_size) = {
let post = font.post().unwrap(); // TODO: Handle invalid font?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we want to handle invalid fonts or these missing tables?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HarfBuzz has hb_ot_metrics_get_position_with_fallback() that uses fallback approximations for many of these metrics. I don’t think HarfRust has any of this, but maybe you can check HarfBuzz code and use similar fallback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for this comment. I ended up using HarfBuzz defaults in 06a9650

impl FontMetrics {
fn from(font: &skrifa::FontRef<'_>) -> Self {
use skrifa::raw::{TableProvider, tables::os2::SelectionFlags};
// NOTE: This _does not_ copy harfrust's metrics behaviour (https://github.com/harfbuzz/harfrust/blob/a38025fb336230b492366740c86021bb406bcd0d/src/hb/glyph_metrics.rs#L55-L60).
Copy link
Contributor Author

@taj-p taj-p Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, this copies the behaviour from Swash (choosing typographic metrics only if the relevant fs selection bit is set). I was thinking we could punt that behavioural change to another discussion/PR since it will result in significant changes to visual output?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend just using skrifa for this. In particular, it handles things like metrics variations using the MVAR table and hides a lot of the nastiness. Take a look at https://docs.rs/skrifa/latest/skrifa/metrics/struct.Metrics.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much better!! Used Skrifa in 06a9650

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Skrifa calculates strikeout thickness differently to swash.

Swash uses post.underline_size()

https://github.com/dfrg/swash/blob/356c725e8e5800a0fecceddc047cbaacd73cb622/src/metrics.rs#L185

Skrifa uses the os2 table:

https://github.com/googlefonts/fontations/blob/830f97905764e0c285c39e6360819770ab19ec80/skrifa/src/metrics.rs#L168

The difference has meant that some snapshots needed to be updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When in doubt, it's usually safe to assume that skrifa is more correct :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed with an Arabic speaker that the new snapshot is an improvement on the deleted. The deleted snapshot was apparently almost illegible.

Copy link
Contributor

@nicoburns nicoburns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of a surface-level review, but I've left some notes.

@@ -66,3 +58,163 @@ const SCRIPT_TAGS: [[u8; 4]; 157] = [
*b"Tirh", *b"Ugar", *b"Vaii", *b"Wara", *b"Wcho", *b"Xpeo", *b"Xsux", *b"Yezi", *b"Yiii",
*b"Zanb", *b"Zinh", *b"Zyyy", *b"Zzzz",
];

const HARFRUST_SCRIPT_TAGS: [harfrust::Script; 157] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@taj-p taj-p Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I opted not to use this method because a single lookup over an array is faster than this implementation, but I think you're right. We should go the simpler route for now and add in this complexity if we find it worthwhile. Since this will be re-evaluated in the near future (with the adoption of ICU4X for text analysis), I also think simpler is better for now.

Implemented in c9cc74d


// Create harfrust shaper
// TODO: cache this upstream?
let shaper_data = harfrust::ShaperData::new(&font_ref);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData, ShaperInstance and ShapePlan should be cached. Exactly how best to do this I'm not sure.

Comment on lines 311 to 315
// Use the entire segment text including newlines
for (i, ch) in segment_text.chars().enumerate() {
// Ensure that each cluster's index matches the index into `infos`. This is required
// for efficient cluster lookup within `data.rs`.
buffer.add(ch, i as u32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can just be buffer.push_str(segment_text). But perhaps this API willl be needed for text that's pre-segmented with icu4x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good suggestion. It's what I initially tried, but I found the .add approach to produce a simpler system.

Using .push_str produces different cluster indices than the existing implementation. I've updated the docs in ef0bb64 and 04e76a8 to help explain why we want to enumerate the chars instead of use push_str. Essentially, adding clusters like this to the buffer means that the cluster indices directly map to the corresponding character information from swash in infos.

.push_str could theoretically work, but it would require an additional mapping from cluster index to infos index within push_run of data.rs. Right now, push_run is pretty complex and when I inspected push_run, it doesn't seem much cheaper than iterative .add, so I'd prefer keep that complexity out of this initial PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... interesting. I see that push_str is defined as:

    fn push_str(&mut self, text: &str) {
        if !self.ensure(self.len + text.chars().count()) {
            return;
        }

        for (i, c) in text.char_indices() {
            self.info[self.len] = GlyphInfo {
                glyph_id: c as u32,
                cluster: i as u32,
                ..GlyphInfo::default()
            };
            self.len += 1;
        }
    }

The key difference here seems to be that it's calling char_indices() rather than .chars().enumerate() so I think you end up with a the "cluster" being the byte offset of the char rather than the index of the char.

Might be nice if we could get a method added to harfrust that does .chars().enumerate() whilst also also doing the single allocation for the required space in the buffer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of HarfRust 0.1.2, UnicodeBuffer now has a reserve method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of HarfRust 0.1.2, UnicodeBuffer now has a reserve method.

Nice! Updated to 0.1.2 and used reserve in 61fccc4

Comment on lines +620 to +628
glyphs: &mut Vec<Glyph>,
scale_factor: f32,
glyph_infos: &[harfrust::GlyphInfo],
glyph_positions: &[harfrust::GlyphPosition],
char_infos: &[(swash::text::cluster::CharInfo, u16)],
char_indices_iter: I,
) -> f32 {
let mut char_indices_iter = char_indices_iter.peekable();
let mut cluster_start_char = char_indices_iter.next().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to document the parameters of this function. In particular:

  • Which parameters are inputs and which are outputs
  • How the mutable parameters get mutated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea 🙏 . Updated in a0920d1

@taj-p taj-p requested a review from dfrg August 21, 2025 07:35
Copy link
Collaborator

@dfrg dfrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s very exciting to finally have robust shaping in parley! LGTM

@xorgy
Copy link
Member

xorgy commented Aug 21, 2025

I'm giving this a try with some samples. :+ )

@xorgy
Copy link
Member

xorgy commented Aug 21, 2025

Works great with my programs using Parley, and immediately does something that has been missing for me (automatic whole number fractions with U+2044 Fraction Slash e.g. ‘37⁄64’), flawlessly. :+ )
It is a little bit slower, but not so much slower than Swash that it noticeably changes the interaction with the overall programs (and the way I am calling Parley in these programs is not particularly efficient).

@dfrg
Copy link
Collaborator

dfrg commented Aug 21, 2025

Glad to see your fractions are working now :)

Caching the shaper data structures should improve the performance significantly and I suspect the end result will be much faster than Swash.

Comment on lines 427 to 428
// HarfBuzz returns glyphs in visual order, so we need to process them as such while
// maintaining logical ordering of clusters.
Copy link
Member

@tomcur tomcur Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say HarfRust instead?

Suggested change
// HarfBuzz returns glyphs in visual order, so we need to process them as such while
// maintaining logical ordering of clusters.
// HarfRust returns glyphs in visual order, so we need to process them as such while
// maintaining logical ordering of clusters.

@xorgy
Copy link
Member

xorgy commented Aug 21, 2025

Very pleased. :+ )

harfcap.webm

@taj-p taj-p enabled auto-merge August 21, 2025 18:33
@taj-p taj-p added this pull request to the merge queue Aug 21, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2025
@nicoburns nicoburns added this pull request to the merge queue Aug 21, 2025
@nicoburns
Copy link
Contributor

CI job seemed to timeout for some reason. I'm going to retry it.

Merged via the queue into main with commit 36c92c4 Aug 21, 2025
24 checks passed
@nicoburns nicoburns deleted the tajp/harfrust-migration3 branch August 21, 2025 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants