
Improve forward compatibility with a notion of minimal modifier set #51

MDLC01 (Collaborator)

Description

Motivation

Currently, when selecting a variant from a set of modifiers, the first variant in the list that contains all the given modifiers and the fewest additional modifiers is chosen.[1][2] This means that referring to a symbol by a name that is not fully qualified might break when Codex is updated. For example, consider the following symbol:

arrow
  .l ←
  .r →
  .r.bar ↦

arrow.bar resolves to arrow.r.bar, which is ↦. Now, suppose a new version of Codex changes the symbol to the following:

arrow
  .l ←
  .l.bar ↤
  .r →
  .r.bar ↦

Now, arrow.bar will resolve to arrow.l.bar, which is ↤.
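The current rule can be modeled in a few lines. The following is a minimal Rust sketch, not the actual Typst or Codex code: `mods` and `resolve_current` are hypothetical helpers, and a symbol is reduced to a list of (dotted modifiers, character) pairs in declaration order. It reproduces the breakage above.

```rust
use std::collections::BTreeSet;

/// Parses a dotted modifier list such as "r.bar" into a set of modifiers.
fn mods(dotted: &str) -> BTreeSet<String> {
    dotted.split('.').filter(|m| !m.is_empty()).map(String::from).collect()
}

/// Simplified model of the current rule: among the variants whose modifiers
/// include every requested modifier, pick the one with the fewest modifiers
/// overall; on a tie, the variant declared first wins.
fn resolve_current(variants: &[(&str, char)], requested: &str) -> Option<char> {
    let requested = mods(requested);
    let mut best: Option<(usize, char)> = None;
    for &(dotted, value) in variants {
        let full = mods(dotted);
        if !requested.is_subset(&full) {
            continue;
        }
        // Strict `<` keeps the earliest of several equally short matches.
        if best.map_or(true, |(len, _)| full.len() < len) {
            best = Some((full.len(), value));
        }
    }
    best.map(|(_, value)| value)
}

fn main() {
    let old = [("l", '←'), ("r", '→'), ("r.bar", '↦')];
    let new = [("l", '←'), ("l.bar", '↤'), ("r", '→'), ("r.bar", '↦')];
    assert_eq!(resolve_current(&old, "bar"), Some('↦'));
    // Inserting `l.bar` before `r.bar` silently changes what `arrow.bar` means.
    assert_eq!(resolve_current(&new, "bar"), Some('↤'));
}
```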

Essentially, this means that adding new variants in the middle of the variant list can cause unexpected breakage. As of now, there is no policy on what constitutes a breaking change when it comes to fallbacks.

Proposed solution

The idea behind this solution is to make explicit which parts of the fully qualified form (i.e., which modifiers) can be omitted.

As before, each variant has a set of modifiers. Hereafter, we refer to this set of modifiers as the fully qualified form, denoted $M_\text{full}$. Additionally, a variant can define a minimal modifier set, denoted $M_\text{min}$, which is a subset of the fully qualified form (i.e., $M_\text{min} \subseteq M_\text{full}$). When selecting a variant for a set of modifiers $M$, the same process as before is applied, with the additional constraint that the selected variant's minimal modifier set is included in $M$ (i.e., $M_\text{min} \subseteq M$).

The current behavior corresponds to having $M_\text{min} = \emptyset$ for all variants. With this proposal, the default would become $M_\text{min} = M_\text{full}$, with many manual overrides to preserve backward compatibility and allow more leniency.
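A minimal sketch of the proposed selection, under the same simplifying assumptions as before (the `Variant`, `strict`, `lenient` and `resolve` names are hypothetical). Here `arrow.r.bar` gets a manual override $M_\text{min} = \{\text{bar}\}$, while the newly added `arrow.l.bar` keeps the proposed default $M_\text{min} = M_\text{full}$.

```rust
use std::collections::BTreeSet;

fn mods(dotted: &str) -> BTreeSet<String> {
    dotted.split('.').filter(|m| !m.is_empty()).map(String::from).collect()
}

/// A variant with its fully qualified form (M_full) and its minimal modifier
/// set (M_min). This sketch assumes, without checking, that M_min ⊆ M_full.
struct Variant {
    full: BTreeSet<String>,
    min: BTreeSet<String>,
    value: char,
}

impl Variant {
    /// The proposed default: M_min = M_full, so nothing may be omitted.
    fn strict(full: &str, value: char) -> Self {
        Variant { full: mods(full), min: mods(full), value }
    }

    /// A manual override that lets some modifiers be omitted.
    fn lenient(full: &str, min: &str, value: char) -> Self {
        Variant { full: mods(full), min: mods(min), value }
    }
}

/// The same selection as before (fewest modifiers, first on ties), restricted
/// to variants satisfying M_min ⊆ M ⊆ M_full.
fn resolve(variants: &[Variant], requested: &str) -> Option<char> {
    let m = mods(requested);
    variants
        .iter()
        .filter(|v| v.min.is_subset(&m) && m.is_subset(&v.full))
        .min_by_key(|v| v.full.len()) // returns the first of equal minima
        .map(|v| v.value)
}

fn main() {
    let arrows = [
        Variant::strict("l", '←'),
        Variant::strict("l.bar", '↤'), // newly added variant
        Variant::strict("r", '→'),
        Variant::lenient("r.bar", "bar", '↦'),
    ];
    assert_eq!(resolve(&arrows, "bar"), Some('↦'));
    assert_eq!(resolve(&arrows, "l.bar"), Some('↤'));
    assert_eq!(resolve(&arrows, "r"), Some('→'));
}
```

Because `l.bar` cannot be reached without `l`, adding it no longer changes what `arrow.bar` resolves to.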

This improves forward compatibility by explicitly specifying which fallbacks can be relied on and which cannot. Ideally, there could even be an automated way of detecting breakages. This is currently not feasible, because most fallbacks are implicit and not guaranteed to be future-proof.[3]

Other benefits

As well as improving forward compatibility, this proposal can improve the documentation. Indeed, the current documentation[4][5] only presents fully qualified names. For some common symbols, this can be problematic. For example, sym.errorbar.square.stroked can be accessed simply as sym.errorbar, but the documentation does not reflect that. With this proposal, the documentation can present $M_\text{min}$ as the main symbol name and $M_\text{full}$ as the fully qualified name whenever $M_\text{min} \neq M_\text{full}$.

As mentioned previously, this proposal would make it possible to detect breakages automatically, because it clarifies which non-fully qualified variants are legal and clearly defined, and which are not.

Footnotes

  1. This is actually not defined in Codex, but in Typst. Making the variant selection part of Codex is the topic of Resolve modifiers #30.

  2. https://github.com/typst/typst/blob/d199546f9fe92b2d380dc337298fdca3e6fca8c8/crates/typst-library/src/foundations/symbol.rs#L387-L420

  3. For example, sym.angle.top currently resolves to sym.angle.spheric.top (⦡), but this is more a side effect of the fact that there is no bare sym.angle.top symbol than a conscious decision, and shouldn't be relied upon.

  4. https://typst.app/docs/reference/symbols/sym/

  5. https://typst.app/docs/reference/symbols/emoji/

Activity

Labels meta (Discussion about the structure of this repo) and proposal (This may still need discussion) added on Feb 19, 2025
T0mstone (Collaborator) commented on Feb 20, 2025

Syntax idea: .modifier? for a non-required modifier.
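To illustrate how that syntax would split a variant's modifiers, here is a hypothetical Rust parser (`parse_variant` is not part of Codex). It assumes it receives only the modifier part of the name, without the base symbol.

```rust
use std::collections::BTreeSet;

/// Splits the modifier part of a variant name (e.g. "r?.bar") into required
/// and optional modifiers; a trailing `?` marks a modifier as optional.
fn parse_variant(dotted: &str) -> (BTreeSet<String>, BTreeSet<String>) {
    let mut required = BTreeSet::new();
    let mut optional = BTreeSet::new();
    for part in dotted.split('.').filter(|p| !p.is_empty()) {
        match part.strip_suffix('?') {
            Some(name) => {
                optional.insert(name.to_string());
            }
            None => {
                required.insert(part.to_string());
            }
        }
    }
    (required, optional)
}

fn main() {
    // arrow.r?.bar: `bar` must be written, `r` may be left out.
    let (required, optional) = parse_variant("r?.bar");
    assert_eq!(required, BTreeSet::from(["bar".to_string()]));
    assert_eq!(optional, BTreeSet::from(["r".to_string()]));
}
```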

knuesel (Collaborator) commented on Feb 21, 2025

I think there are two problems with the current behavior:

  1. The backward-compatibility issue described above

  2. Bad readability of source code: currently it's hard to interpret Typst code such as $ arrow.bar $ without executing the compiler. You basically have to run a whole algorithm in your head:

    1. Consider all variants of arrow that include bar
    2. Keep only those with the minimal number of other modifiers
    3. Pick the first one according to the order in which they are declared in Typst

The above proposal improves on problem 1 (by putting some restrictions on the valid ways of inputting a variant, we make breakage less likely), but the fundamental issue remains...

To really fix the issue, we should require that variants cannot have conflicting definitions: this means that a given set of modifiers cannot match two variants. For example (using @T0mstone's notation), if we have arrow.r?.bar, we could later add arrow.l.bar but not arrow.l?.bar.

Intuitive specification

The whole behavior can be specified intuitively with "aliases":

  1. When defining a variant, modifiers marked with ? are optional, so defining s.x?.y?.z corresponds to four aliases: s.x.y.z, s.x.z, s.y.z and s.z.
  2. The order of modifiers doesn't matter so s.x.z is the same as s.z.x.
  3. Different variants cannot share an alias.

(These aliases are only used for resolving a variant. It's still a single variant, displayed as a single entry on the symbol page, but the entry would show s.x?.y?.z to document which modifiers can be omitted.)
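The alias rules above can be checked mechanically. This sketch (with hypothetical `aliases` and `dotted` helpers) enumerates the aliases of a variant as its required modifiers plus every subset of its optional ones; storing each alias as a set makes modifier order irrelevant, as rule 2 requires.

```rust
use std::collections::BTreeSet;

type Mods = BTreeSet<String>;

/// Enumerates every alias of a variant: the required modifiers plus any
/// subset of the optional ones.
fn aliases(required: &[&str], optional: &[&str]) -> BTreeSet<Mods> {
    let mut result = BTreeSet::new();
    // Each bit of `mask` decides whether one optional modifier is included.
    for mask in 0u32..(1 << optional.len()) {
        let mut alias: Mods = required.iter().map(|m| m.to_string()).collect();
        for (i, m) in optional.iter().enumerate() {
            if mask & (1 << i) != 0 {
                alias.insert(m.to_string());
            }
        }
        result.insert(alias);
    }
    result
}

/// Renders an alias as a dotted name, e.g. {"x", "z"} becomes "s.x.z".
fn dotted(base: &str, alias: &Mods) -> String {
    std::iter::once(base.to_string())
        .chain(alias.iter().cloned())
        .collect::<Vec<_>>()
        .join(".")
}

fn main() {
    // s.x?.y?.z: `z` is required, `x` and `y` are optional.
    let names: BTreeSet<String> = aliases(&["z"], &["x", "y"])
        .iter()
        .map(|a| dotted("s", a))
        .collect();
    let expected: BTreeSet<String> = ["s.x.y.z", "s.x.z", "s.y.z", "s.z"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    assert_eq!(names, expected);
}
```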

I think this solves both problems:

  1. backward-compatibility: users can refer to variants only through valid aliases, and when we define a new variant it cannot share an alias with an existing variant.

  2. Readability: if the code says s.x.y and I know a variant that matches this set of modifiers, I know it's the right one. No need to check what other variants exist in case there would be another match.

It also preserves nice properties: modifiers are commutative, users can "build" their symbol by trying modifiers, and they can leave out optional modifiers.

Formal specification

  1. A variant $V$ specifies a set of required modifiers $V_\text{req}$ and a set of optional modifiers $V_\text{opt}$. The set of all valid ways of referring to $V$ is

$$R_V = \{\, M \;:\; V_\text{req} \subseteq M \subseteq V_\text{req} \cup V_\text{opt} \,\}$$

  2. It is not allowed for two variants $V_1$ and $V_2$ to share a valid reference:

$$V_1 \neq V_2 \implies R_{V_1} \cap R_{V_2} = \emptyset.$$

To resolve a set of modifiers $M$, we take the first and only $V$ such that $M \in R_V$.
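The formal rule translates directly into a membership test. In this sketch (hypothetical `Variant`, `accepts` and `resolve` names), resolution panics if more than one variant matches, which can only happen if rule 2 has been violated. The symbol is the arrow example with arrow.r?.bar and arrow.l.bar.

```rust
use std::collections::BTreeSet;

type Mods = BTreeSet<String>;

fn mods(dotted: &str) -> Mods {
    dotted.split('.').filter(|m| !m.is_empty()).map(String::from).collect()
}

/// A variant V with required modifiers V_req and optional modifiers V_opt.
struct Variant {
    req: Mods,
    opt: Mods,
    value: char,
}

impl Variant {
    fn new(req: &str, opt: &str, value: char) -> Self {
        Variant { req: mods(req), opt: mods(opt), value }
    }

    /// M ∈ R_V  ⟺  V_req ⊆ M ⊆ V_req ∪ V_opt.
    fn accepts(&self, m: &Mods) -> bool {
        self.req.is_subset(m)
            && m.iter().all(|x| self.req.contains(x) || self.opt.contains(x))
    }
}

/// Returns the value of the only variant V with M ∈ R_V, if there is one.
/// Panics if several variants match, i.e. if rule 2 was violated.
fn resolve(variants: &[Variant], requested: &str) -> Option<char> {
    let m = mods(requested);
    let mut matches = variants.iter().filter(|v| v.accepts(&m));
    let found = matches.next();
    assert!(matches.next().is_none(), "two variants share the reference {m:?}");
    found.map(|v| v.value)
}

fn main() {
    let arrows = [
        Variant::new("l", "", '←'),
        Variant::new("r", "", '→'),
        Variant::new("bar", "r", '↦'),  // arrow.r?.bar
        Variant::new("l.bar", "", '↤'), // arrow.l.bar, allowed alongside it
    ];
    assert_eq!(resolve(&arrows, "bar"), Some('↦'));
    assert_eq!(resolve(&arrows, "bar.r"), Some('↦')); // order does not matter
    assert_eq!(resolve(&arrows, "l.bar"), Some('↤'));
    assert_eq!(resolve(&arrows, "r.l"), None);
}
```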

MDLC01 (Collaborator, Author) commented on Feb 21, 2025

As I originally wrote on Discord, from the user's perspective, your idea is really just a rephrasing of mine, with $M_\text{min} = V_\text{req}$ and $M_\text{full} = V_\text{req} \cup V_\text{opt}$. Nothing prevents us from adding your second constraint to my proposal. In fact, I think we should if we end up implementing it.

Moreover, I think this can be expressed in a simplified way to the user: each variant has required and optional modifiers, which makes it possible to allow using non-fully qualified names when it makes sense; however, no two variants can share the same set of required modifiers, in order to prevent ambiguity. There might be some approximations, but this is what most users need to know to understand the variant selection system.

In the end, I think what we are discussing here is essentially an implementation detail which would not be observable to the end user.[1]

Footnotes

  1. The only observable difference that was noted was in the symbol list, but in your proposal, aliases are "still a single variant, displayed as a single entry on the symbol page", so the end result would be the same.

knuesel (Collaborator) commented on Feb 21, 2025

Yes, I'm just adding this constraint and proposing another formulation. But the constraint is a bit tricky: "no two variants can share the same set of required modifiers" is not sufficient. For example, s.x?.y and s.x.y.z? have different sets of required modifiers, but they should still not be allowed together, since both would match s.x.y.
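Because each $R_V$ is an interval of the subset lattice, running from $V_\text{req}$ to $V_\text{req} \cup V_\text{opt}$, two variants conflict exactly when the union of their required sets is contained in the intersection of their full sets; that union is then a reference they share. The sketch below (hypothetical `parse` and `shared_reference` helpers) applies this test to the example above.

```rust
use std::collections::BTreeSet;

type Mods = BTreeSet<String>;

fn mods(dotted: &str) -> Mods {
    dotted.split('.').filter(|m| !m.is_empty()).map(String::from).collect()
}

/// Parses e.g. "x?.y" into (required, full) = ({y}, {x, y}).
fn parse(dotted: &str) -> (Mods, Mods) {
    let mut required = Mods::new();
    let mut full = Mods::new();
    for part in dotted.split('.').filter(|p| !p.is_empty()) {
        match part.strip_suffix('?') {
            Some(name) => {
                full.insert(name.to_string());
            }
            None => {
                required.insert(part.to_string());
                full.insert(part.to_string());
            }
        }
    }
    (required, full)
}

/// Intervals [a.0, a.1] and [b.0, b.1] intersect iff the union of the lower
/// bounds lies inside the intersection of the upper bounds; that union is
/// then the smallest reference both variants accept.
fn shared_reference(a: &(Mods, Mods), b: &(Mods, Mods)) -> Option<Mods> {
    let lower: Mods = a.0.union(&b.0).cloned().collect();
    let upper: Mods = a.1.intersection(&b.1).cloned().collect();
    lower.is_subset(&upper).then_some(lower)
}

fn main() {
    // s.x?.y and s.x.y.z? have different required sets ({y} vs {x, y})...
    let first = parse("x?.y");
    let second = parse("x.y.z?");
    assert_ne!(first.0, second.0);
    // ...but both accept s.x.y, so they cannot coexist.
    assert_eq!(shared_reference(&first, &second), Some(mods("x.y")));
}
```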

(I think the alias formulation expresses the constraint correctly and in a way that's more concrete for users, but it's just one possible formulation.)

MDLC01 (Collaborator, Author) commented on Feb 21, 2025

"no two variants can share the same set of required modifiers" is not sufficient.

This is what I meant by "There might be some approximations, but this is what most users need to know to understand the variant selection system." Even if the phrasing is not complete (as in, correct but missing some information), we just need users to understand the general idea, and they can work out the rest by experimenting.

T0mstone (Collaborator) commented on Jun 17, 2025

The discussion in #89 brought my attention back to this. I think it's a very good idea, especially because of how it lets us enforce forward-compatibility. (Edit: Yes, it says this in the issue title. My bad lol, I didn't re-read that and only remembered the "minimal modifier set" part)

With #46 now merged, I think now would be a good time for someone (possibly me) to write a candidate PR to revive the discussion around this.

I'm not really sure whether the change would be breaking or not.
In the current system, every modifier is optional and the first of the shortest best matches is taken:

codex/src/shared.rs

Lines 183 to 229 in a5428cb

    #[test]
    fn best_match() {
        // 1. more modifiers in common with self
        assert_eq!(
            ModifierSet::from_raw_dotted("a.b").best_match_in(
                [
                    (ModifierSet::from_raw_dotted("a.c"), 1),
                    (ModifierSet::from_raw_dotted("a.b"), 2),
                ]
                .into_iter()
            ),
            Some(2)
        );
        // 2. fewer modifiers in general
        assert_eq!(
            ModifierSet::from_raw_dotted("a").best_match_in(
                [
                    (ModifierSet::from_raw_dotted("a"), 1),
                    (ModifierSet::from_raw_dotted("a.b"), 2),
                ]
                .into_iter()
            ),
            Some(1)
        );
        // the first rule takes priority over the second
        assert_eq!(
            ModifierSet::from_raw_dotted("a.b").best_match_in(
                [
                    (ModifierSet::from_raw_dotted("a"), 1),
                    (ModifierSet::from_raw_dotted("a.b"), 2),
                ]
                .into_iter()
            ),
            Some(2)
        );
        // among multiple best matches, the first one is returned
        assert_eq!(
            ModifierSet::default().best_match_in(
                [
                    (ModifierSet::from_raw_dotted("a"), 1),
                    (ModifierSet::from_raw_dotted("b"), 2)
                ]
                .into_iter()
            ),
            Some(1)
        );
    }

Strictly speaking, this is entirely incompatible with the requirement of unique aliases[1], so the change would be breaking, but maybe there is a way to choose the non-required modifiers so that the result lines up exactly with the current system? This is something that a PR could iron out.

Footnotes

  1. As the author and main proponent of Symbol Aliases #27, I'd like us to choose a different word here. Maybe "abbreviation"?

MDLC01 (Collaborator, Author) commented on Jun 22, 2025

What do you mean by "unique aliases"?

T0mstone (Collaborator) commented on Jun 23, 2025

I mean @knuesel's last rule.

A pull request that will close this issue was linked on Jul 4, 2025.

Metadata

Assignees: No one assigned
Labels: meta (Discussion about the structure of this repo), proposal (This may still need discussion)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Participants: @knuesel, @T0mstone, @MDLC01

Issue #51 · typst/codex