
Select Extension priority #3590

Open
calculuschild opened this issue Jan 15, 2025 · 2 comments · May be fixed by #3594
calculuschild (Contributor) commented Jan 15, 2025

Describe the feature
Custom extensions always take priority over default tokens. There should be a way to specify where in the parsing sequence an extension runs, so that certain default tokens don't get overridden.

Why is this feature necessary?
Sometimes there are ambiguous cases in Markdown where the result depends on which token gets parsed first. For example, tables without a starting pipe:

<div> header
|:---:|
cell

This is not parsed as a table, because HTML is parsed first. However, if someone writes a custom table extension, suddenly tables take priority and this is rendered as a table. The user would want the extension to run after the HTML tokenizer but before the default table tokenizer.

Describe alternatives you've considered
It is possible to work around this by pre-parsing the string inside the extension tokenizer and checking whether any higher-priority tokens appear. However, this adds extra steps and can result in the same line being parsed multiple times.
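A minimal sketch of that workaround might look like the following. The regexes and token shapes here are simplified illustrations, not marked's real grammar, and `customTableTokenizer` is a hypothetical name:

```javascript
// Hypothetical workaround sketch: a custom block tokenizer that first
// scans for a higher-priority default pattern (block-level HTML here)
// and declines the match if one is present.
const blockHtmlStart = /^<[a-zA-Z][^\n>]*>/;
const tableDelimiterRow = /^\|:?-+:?\|$/;

function customTableTokenizer(src) {
  // Pre-parse step: if the HTML tokenizer would claim this source,
  // return null so the default ordering still wins. The cost is that
  // the same lines get scanned again by the real HTML tokenizer.
  if (blockHtmlStart.test(src)) return null;

  const lines = src.split('\n');
  if (lines.length >= 2 && tableDelimiterRow.test(lines[1])) {
    return { type: 'customTable', raw: src };
  }
  return null;
}
```

With this guard, `<div> header` followed by a delimiter row is declined by the extension, while a headerless table without HTML is still claimed, at the price of scanning the source twice.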

I'm picturing something like breaking the lexer.blockTokens() and lexer.inlineTokens() functions into an array of tokenizers called in sequence. A user writing an extension could then inject their tokenizer at a chosen position in this array (added to the front by default). Though I imagine calling each tokenizer from an array would cause some slowdown.
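The idea above could be sketched roughly like this. All names and token shapes (`MiniLexer`, the `before` option, the regexes) are illustrative assumptions, not marked's actual API:

```javascript
// Hypothetical sketch: the lexer walks an ordered array of block
// tokenizers, and an extension can be spliced in before a named default
// tokenizer instead of always going first.
class MiniLexer {
  constructor() {
    // Default order mirrors the block pass: HTML is tried before tables.
    this.blockTokenizers = [
      { name: 'html',  tokenize: src => /^</.test(src) ? { type: 'html', raw: src } : null },
      { name: 'table', tokenize: src => /\n\|:?-+:?\|/.test(src) ? { type: 'table', raw: src } : null },
    ];
  }

  // Register an extension tokenizer; `before` names the default
  // tokenizer it should run ahead of (front of the list by default).
  use(ext, { before } = {}) {
    const i = before ? this.blockTokenizers.findIndex(t => t.name === before) : 0;
    this.blockTokenizers.splice(i === -1 ? 0 : i, 0, ext);
  }

  // Return the first token any tokenizer claims for this source.
  lex(src) {
    for (const t of this.blockTokenizers) {
      const token = t.tokenize(src);
      if (token) return token;
    }
    return { type: 'paragraph', raw: src };
  }
}

const lexer = new MiniLexer();
lexer.use(
  { name: 'myTable', tokenize: src => /\n\|:?-+:?\|/.test(src) ? { type: 'myTable', raw: src } : null },
  { before: 'table' } // after 'html', before the default table tokenizer
);
```

With this ordering, the `<div> header` example is still claimed by the HTML tokenizer, while a headerless table goes to the extension instead of the default table tokenizer.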

In fact, I think that was the original approach to the Extensions feature when I was building it, some kind of list of tokenizers, but the slowdown was too much. Now that it's been out for a while, I'm running up against this limitation more and more.

UziTech (Member) commented Jan 16, 2025

We could try doing something like that now that most of the lexer's work has been moved to the tokenizers. I would still want to make sure it isn't slowed down for anyone not using extensions.

One other way this could be handled now is with the provideLexer hook: we could provide a different lexer that has more functionality but is slower.

UziTech linked a pull request on Jan 18, 2025 that may close this issue.
UziTech (Member) commented Jan 18, 2025

#3594 is one way I can see this being done. Initial checks suggest it doesn't slow things down too much, but I think it can be improved a lot.
