Handle old forms of HTML for math/TeX #1004

Open
gnprice opened this issue Oct 18, 2024 · 0 comments
Labels
a-content Parsing and rendering Zulip HTML content, notably message contents

@gnprice
Member

gnprice commented Oct 18, 2024

When a Zulip message's content contains math expressed in TeX / LaTeX, we parse that far enough to show something useful; that was #359.

It looks like our parsing doesn't currently handle the way that Zulip arranged these HTML elements a few years ago, though. The #917 / #190 survey of content features not yet implemented in this app found several kinds of KaTeX-related elements that we don't currently parse; the latest example was this message from 2022-03-30, just after the server-5 release.

The relevant summary output from the #917 script looks like:

  • <span class="katex-display">
    Oldest message: 202662; newest message: 1355972 (19 messages)
  • <p>
    Oldest message: 176412; newest message: 908053 (65 messages)
  • <span class="katex">
    Oldest message: 308073; newest message: 426840 (3 messages)

(The <p> description isn't very specific, but all 10 of its most recent examples begin with <p><span class="katex-display">.)
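For comparing reruns of the script, the summary can be read mechanically. The following is a minimal sketch, assuming the output has exactly the text shape quoted above (a bullet line naming the element, then an "Oldest message ... newest message ..." line). The names `parse_summary`, `remaining_math_groups`, and `MATH_GROUPS` are hypothetical and not part of the #917 script.

```python
import re

# Matches the statistics line under each bullet in the quoted summary.
_STATS = re.compile(
    r"Oldest message: (\d+); newest message: (\d+) \((\d+) messages?\)"
)

# The three groups this issue is about (copied from the summary above).
MATH_GROUPS = {'<span class="katex-display">', "<p>", '<span class="katex">'}


def parse_summary(text):
    """Map each element description to (oldest_id, newest_id, count)."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("•"):
            # A bullet line names the element for the stats line that follows.
            current = line.lstrip("•").strip()
        elif current is not None:
            match = _STATS.fullmatch(line)
            if match:
                groups[current] = tuple(int(g) for g in match.groups())
                current = None
    return groups


def remaining_math_groups(text):
    """Return which of the three groups still show up in a summary."""
    return MATH_GROUPS & set(parse_summary(text))


SUMMARY = """\
  • <span class="katex-display">
    Oldest message: 202662; newest message: 1355972 (19 messages)
  • <p>
    Oldest message: 176412; newest message: 908053 (65 messages)
  • <span class="katex">
    Oldest message: 308073; newest message: 426840 (3 messages)
"""

print(parse_summary(SUMMARY))
print(remaining_math_groups(SUMMARY))
```

Once the parsing is fixed, `remaining_math_groups` on the new summary should return an empty set.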

The full list of affected message IDs in the public history of chat.zulip.org is:

  • <span class="katex-display">: 1355972, 839155, 615277, 615279, 615288, 615289, 615290, 615291, 615292, 615293, 615294, 615295, 615299, 615300, 615301, 615302, 215775, 223277, 202662
  • <p>: 908053, 875669, 848652, 852955, 838329, 841233, 841235, 844939, 815780, 817208, 798513, 798549, 796253, 772484, 779661, 770472, 750006, 750010, 750011, 750013, 742408, 713902, 667886, 659094, 660230, 649081, 636927, 632189, 615276, 615283, 615285, 605460, 606931, 564826, 564827, 560121, 564176, 564188, 564225, 515347, 510803, 503162, 472443, 437329, 396045, 397486, 397487, 393929, 355267, 338001, 306411, 308530, 293709, 296076, 258091, 258619, 222489, 224107, 204863, 205218, 194241, 182555, 176412, 176418, 176421
  • <span class="katex">: 426840, 308073, 308405

To resolve this issue, we'll want to look at those messages and identify how they differ from what we currently parse, and then adjust the parsing to cover them.
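As a rough illustration of one direction the adjustment could take, here is a sketch in Python (not the app's own code) that pulls the TeX source out of KaTeX HTML without depending on how the wrapper elements are nested. It assumes the old forms still carry an `<annotation encoding="application/x-tex">` element the way current KaTeX output does; this issue doesn't confirm that, so it needs checking against the real messages. The sample HTML is a simplified, hypothetical stand-in, and `TexSourceExtractor` and `extract_tex_sources` are made-up names.

```python
from html.parser import HTMLParser


class TexSourceExtractor(HTMLParser):
    """Collects the text of every <annotation encoding="application/x-tex">."""

    def __init__(self):
        super().__init__()
        self.sources = []     # TeX sources found so far
        self._buffer = None   # text pieces while inside a matching annotation

    def handle_starttag(self, tag, attrs):
        if tag == "annotation" and dict(attrs).get("encoding") == "application/x-tex":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "annotation" and self._buffer is not None:
            self.sources.append("".join(self._buffer).strip())
            self._buffer = None


def extract_tex_sources(html):
    """Return the TeX sources in document order, ignoring wrapper structure."""
    parser = TexSourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical, heavily simplified example of a display-math message.
SAMPLE = (
    '<p><span class="katex-display"><span class="katex">'
    '<span class="katex-mathml"><math><semantics><mrow><mi>x</mi></mrow>'
    '<annotation encoding="application/x-tex">x^2</annotation>'
    "</semantics></math></span></span></span></p>"
)

print(extract_tex_sources(SAMPLE))
```

This only recovers the source text; telling inline math from display math would still need the surrounding `katex` / `katex-display` classes.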

The main part of the work in fixing this will be verifying that the fix is correct. That means

  • visiting several of the example messages, including both the oldest and newest from each of the three groups;
  • posting screenshots of what the app shows for each of those messages;
  • adding test cases based on those messages (but simplified down as much as possible) to exercise all the new logic;
  • rerunning the script from #917 (Systematically survey message content for unimplemented features), and verifying that none of these three groups appears in the output anymore.
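To pick out the messages to visit first, the oldest and newest of each group can be read off the ID lists above. A minimal sketch, assuming that a smaller message ID means an older message (which is how the summary output treats them); `AFFECTED` and `oldest_and_newest` are hypothetical names:

```python
# Message IDs copied from the lists in this issue, keyed by element description.
AFFECTED = {
    '<span class="katex-display">': [
        1355972, 839155, 615277, 615279, 615288, 615289, 615290, 615291,
        615292, 615293, 615294, 615295, 615299, 615300, 615301, 615302,
        215775, 223277, 202662,
    ],
    "<p>": [
        908053, 875669, 848652, 852955, 838329, 841233, 841235, 844939,
        815780, 817208, 798513, 798549, 796253, 772484, 779661, 770472,
        750006, 750010, 750011, 750013, 742408, 713902, 667886, 659094,
        660230, 649081, 636927, 632189, 615276, 615283, 615285, 605460,
        606931, 564826, 564827, 560121, 564176, 564188, 564225, 515347,
        510803, 503162, 472443, 437329, 396045, 397486, 397487, 393929,
        355267, 338001, 306411, 308530, 293709, 296076, 258091, 258619,
        222489, 224107, 204863, 205218, 194241, 182555, 176412, 176418,
        176421,
    ],
    '<span class="katex">': [426840, 308073, 308405],
}


def oldest_and_newest(ids):
    """Return (oldest_id, newest_id), treating smaller IDs as older."""
    return min(ids), max(ids)


for description, ids in AFFECTED.items():
    oldest, newest = oldest_and_newest(ids)
    print(f"{description}: oldest {oldest}, newest {newest} ({len(ids)} messages)")
```

The results agree with the oldest/newest IDs and message counts in the summary output.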

This is a low-priority issue because I believe current versions of the Zulip server don't generate these elements for new messages, and haven't done so for a couple of years.

Related issues

When we eventually handle #46, we'll want it to cover these old-style messages too. But in the meantime we just want them to be covered by the UI we added in #359.

@gnprice gnprice added the a-content Parsing and rendering Zulip HTML content, notably message contents label Oct 18, 2024
@gnprice gnprice added this to the Post-launch milestone Oct 18, 2024