-
Notifications
You must be signed in to change notification settings - Fork 44
Harfrust for Text Shaping #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
// Create harfrust shaper | ||
// TODO: cache this upstream? | ||
let shaper_data = harfrust::ShaperData::new(&font_ref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what the intention behind ShaperData
and Shaper
is. Should we be caching these per font upstream somewhere or allocating them as needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData
, ShaperInstance
and ShapePlan
should be cached. Exactly how best to do this I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a brilliant reference. Thank you. I'll spend some time later today thinking about how we might better cache these structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally, swash uses an LRU cache for this kind of data. I think we should do the same here but I'm happy to defer that to a follow up.
Note, we just added a ShapePlanKey
type to HarfRust that is meant to assist with caching shape plans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, perfect. Yes, that makes sense. If possible, could I please look at moving these to an LRU cache separately to this PR?
I'd like to add benchmarks to Parley to test different caching approaches during development, but I think this PR is already quite large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Let's do caching in a separate PR.
21d7d77
to
a97cfff
Compare
9dc5e50
to
38c593b
Compare
38c593b
to
e478d1d
Compare
parley/src/layout/data.rs
Outdated
.unwrap_or(2048); | ||
|
||
let (underline_offset, underline_size) = { | ||
let post = font.post().unwrap(); // TODO: Handle invalid font? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to handle invalid fonts or these missing tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HarfBuzz has hb_ot_metrics_get_position_with_fallback()
that uses fallback approximations for many of these metrics. I don’t think HarfRust has any of this, but maybe you can check HarfBuzz code and use similar fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for this comment. I ended up using HarfBuzz defaults in 06a9650
parley/src/layout/data.rs
Outdated
impl FontMetrics { | ||
fn from(font: &skrifa::FontRef<'_>) -> Self { | ||
use skrifa::raw::{TableProvider, tables::os2::SelectionFlags}; | ||
// NOTE: This _does not_ copy harfrust's metrics behaviour (https://github.com/harfbuzz/harfrust/blob/a38025fb336230b492366740c86021bb406bcd0d/src/hb/glyph_metrics.rs#L55-L60). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, this copies the behaviour from Swash (choosing typographic metrics only if the relevant fs selection bit is set). I was thinking we could punt that behavioural change to another discussion/PR since it will result in significant changes to visual output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend just using skrifa for this. In particular, it handles things like metrics variations using the MVAR table and hides a lot of the nastiness. Take a look at https://docs.rs/skrifa/latest/skrifa/metrics/struct.Metrics.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much better!! Used Skrifa in 06a9650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Skrifa calculates strikeout thickness differently to swash.
Swash uses post.underline_size()
https://github.com/dfrg/swash/blob/356c725e8e5800a0fecceddc047cbaacd73cb622/src/metrics.rs#L185
Skrifa uses the os2 table:
The difference has meant that some snapshots needed to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When in doubt, it's usually safe to assume that skrifa is more correct :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirmed with an Arabic speaker that the new snapshot is an improvement on the deleted. The deleted snapshot was apparently almost illegible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a surface-level review, but I've left some notes.
parley/src/swash_convert.rs
Outdated
@@ -66,3 +58,163 @@ const SCRIPT_TAGS: [[u8; 4]; 157] = [ | |||
*b"Tirh", *b"Ugar", *b"Vaii", *b"Wara", *b"Wcho", *b"Xpeo", *b"Xsux", *b"Yezi", *b"Yiii", | |||
*b"Zanb", *b"Zinh", *b"Zyyy", *b"Zzzz", | |||
]; | |||
|
|||
const HARFRUST_SCRIPT_TAGS: [harfrust::Script; 157] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.rs/harfrust/latest/harfrust/struct.Script.html#method.from_iso15924_tag may be relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I opted not to use this method because a single lookup over an array is faster than this implementation, but I think you're right. We should go the simpler route for now and add in this complexity if we find it worthwhile. Since this will be re-evaluated in the near future (with the adoption of ICU4X for text analysis), I also think simpler is better for now.
Implemented in c9cc74d
|
||
// Create harfrust shaper | ||
// TODO: cache this upstream? | ||
let shaper_data = harfrust::ShaperData::new(&font_ref); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: harfbuzz/harfrust#57. I believe that ideally all of ShaperData
, ShaperInstance
and ShapePlan
should be cached. Exactly how best to do this I'm not sure.
parley/src/shape.rs
Outdated
// Use the entire segment text including newlines | ||
for (i, ch) in segment_text.chars().enumerate() { | ||
// Ensure that each cluster's index matches the index into `infos`. This is required | ||
// for efficient cluster lookup within `data.rs`. | ||
buffer.add(ch, i as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can just be buffer.push_str(segment_text)
. But perhaps this API willl be needed for text that's pre-segmented with icu4x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good suggestion. It's what I initially tried, but I found the .add
approach to produce a simpler system.
Using .push_str
produces different cluster indices than the existing implementation. I've updated the docs in ef0bb64 and 04e76a8 to help explain why we want to enumerate the chars instead of use push_str
. Essentially, adding clusters like this to the buffer means that the cluster indices directly map to the corresponding character information from swash in infos
.
.push_str
could theoretically work, but it would require an additional mapping from cluster index to infos
index within push_run
of data.rs
. Right now, push_run
is pretty complex and when I inspected push_run
, it doesn't seem much cheaper than iterative .add
, so I'd prefer keep that complexity out of this initial PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... interesting. I see that push_str
is defined as:
fn push_str(&mut self, text: &str) {
if !self.ensure(self.len + text.chars().count()) {
return;
}
for (i, c) in text.char_indices() {
self.info[self.len] = GlyphInfo {
glyph_id: c as u32,
cluster: i as u32,
..GlyphInfo::default()
};
self.len += 1;
}
}
The key difference here seems to be that it's calling char_indices()
rather than .chars().enumerate()
so I think you end up with a the "cluster" being the byte offset of the char rather than the index of the char.
Might be nice if we could get a method added to harfrust
that does .chars().enumerate()
whilst also also doing the single allocation for the required space in the buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of HarfRust 0.1.2, UnicodeBuffer now has a reserve method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of HarfRust 0.1.2, UnicodeBuffer now has a reserve method.
Nice! Updated to 0.1.2 and used reserve in 61fccc4
glyphs: &mut Vec<Glyph>, | ||
scale_factor: f32, | ||
glyph_infos: &[harfrust::GlyphInfo], | ||
glyph_positions: &[harfrust::GlyphPosition], | ||
char_infos: &[(swash::text::cluster::CharInfo, u16)], | ||
char_indices_iter: I, | ||
) -> f32 { | ||
let mut char_indices_iter = char_indices_iter.peekable(); | ||
let mut cluster_start_char = char_indices_iter.next().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to document the parameters of this function. In particular:
- Which parameters are inputs and which are outputs
- How the mutable parameters get mutated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea 🙏 . Updated in a0920d1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s very exciting to finally have robust shaping in parley! LGTM
I'm giving this a try with some samples. :+ ) |
Works great with my programs using Parley, and immediately does something that has been missing for me (automatic whole number fractions with U+2044 Fraction Slash e.g. ‘37⁄64’), flawlessly. :+ ) |
Glad to see your fractions are working now :) Caching the shaper data structures should improve the performance significantly and I suspect the end result will be much faster than Swash. |
parley/src/layout/data.rs
Outdated
// HarfBuzz returns glyphs in visual order, so we need to process them as such while | ||
// maintaining logical ordering of clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say HarfRust
instead?
// HarfBuzz returns glyphs in visual order, so we need to process them as such while | |
// maintaining logical ordering of clusters. | |
// HarfRust returns glyphs in visual order, so we need to process them as such while | |
// maintaining logical ordering of clusters. |
Very pleased. :+ ) harfcap.webm |
CI job seemed to timeout for some reason. I'm going to retry it. |
Migrate text shaping from Swash to Harfrust
This PR migrates Parley’s shaping engine from Swash to Harfrust, while continuing to use Swash for text analysis and some example rendering.
The goal is to align Parley’s shaping behavior with HarfBuzz parity via Harfrust to improve correctness across complex scripts (for example, see the improved and updated RTL Arabic snapshots).
The changes are largely internal and should have no (besides
.synthesis()
, I believe) external API changes.What changed
Shaping pipeline
Layout and run storage
Follow-ups
cc @conor-93