
Select Extension priority #3590

Open
calculuschild opened this issue Jan 15, 2025 · 2 comments · May be fixed by #3594
calculuschild (Contributor) commented Jan 15, 2025

Describe the feature
Custom extensions always take priority over default tokens. There should be a way to specify where in the parsing sequence an extension runs, so that certain default tokens don't get overridden.

Why is this feature necessary?
Sometimes there are ambiguous cases in Markdown where the result depends on which token gets parsed first. For example, tables without a starting pipe:

<div> header
|:---:|
cell

This is not parsed as a table, because HTML is parsed first. However, if someone writes a custom table extension, suddenly tables take priority and this is rendered as a table. The user would want the extension to run after the HTML tokenizer but before the default table tokenizer.

Describe alternatives you've considered
It is possible to work around this by pre-parsing the string inside the extension tokenizer and checking whether any higher-priority tokens appear. However, this adds extra steps and can result in the same line being parsed multiple times.
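A minimal sketch of that workaround might look like the following. The regexes and token shapes here are simplified illustrations, not marked's real grammar, and `customTableTokenizer` is a hypothetical name:

```javascript
// Hypothetical workaround sketch: a custom block tokenizer that first
// scans for a higher-priority default pattern (block-level HTML here)
// and declines the match if one is present.
const blockHtmlStart = /^<[a-zA-Z][^\n>]*>/;
const tableDelimiterRow = /^\|:?-+:?\|$/;

function customTableTokenizer(src) {
  // Pre-parse step: if the HTML tokenizer would claim this source,
  // return null so the default ordering still wins. The cost is that
  // the same lines get scanned again by the real HTML tokenizer.
  if (blockHtmlStart.test(src)) return null;

  const lines = src.split('\n');
  if (lines.length >= 2 && tableDelimiterRow.test(lines[1])) {
    return { type: 'customTable', raw: src };
  }
  return null;
}
```

With this guard, `<div> header` followed by a delimiter row is declined by the extension, while a headerless table without HTML is still claimed, at the price of scanning the source twice.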

I'm picturing something like breaking the lexer.blockTokens() and lexer.inlineTokens() functions into an array of tokenizers called in sequence. A user writing an extension could then inject their tokenizer at a chosen position in this array (added to the front by default). Though I imagine calling each tokenizer from an array would cause some slowdown.
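The idea above could be sketched roughly like this. All names and token shapes (`MiniLexer`, the `before` option, the regexes) are illustrative assumptions, not marked's actual API:

```javascript
// Hypothetical sketch: the lexer walks an ordered array of block
// tokenizers, and an extension can be spliced in before a named default
// tokenizer instead of always going first.
class MiniLexer {
  constructor() {
    // Default order mirrors the block pass: HTML is tried before tables.
    this.blockTokenizers = [
      { name: 'html',  tokenize: src => /^</.test(src) ? { type: 'html', raw: src } : null },
      { name: 'table', tokenize: src => /\n\|:?-+:?\|/.test(src) ? { type: 'table', raw: src } : null },
    ];
  }

  // Register an extension tokenizer; `before` names the default
  // tokenizer it should run ahead of (front of the list by default).
  use(ext, { before } = {}) {
    const i = before ? this.blockTokenizers.findIndex(t => t.name === before) : 0;
    this.blockTokenizers.splice(i === -1 ? 0 : i, 0, ext);
  }

  // Return the first token any tokenizer claims for this source.
  lex(src) {
    for (const t of this.blockTokenizers) {
      const token = t.tokenize(src);
      if (token) return token;
    }
    return { type: 'paragraph', raw: src };
  }
}

const lexer = new MiniLexer();
lexer.use(
  { name: 'myTable', tokenize: src => /\n\|:?-+:?\|/.test(src) ? { type: 'myTable', raw: src } : null },
  { before: 'table' } // after 'html', before the default table tokenizer
);
```

With this ordering, the `<div> header` example is still claimed by the HTML tokenizer, while a headerless table goes to the extension instead of the default table tokenizer.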

In fact, I think that was the original approach to the Extensions feature when I was building it, some kind of list of tokenizers, but the slowdown was too much. Now that it's been out for a while, I'm running up against this limitation more and more.

UziTech (Member) commented Jan 16, 2025

We could try doing something like that now that most of the lexer's work has been moved to the tokenizers. I would still want to make sure it isn't slowed down for anyone not using extensions.

One other way this could be handled now is with the provideLexer hook: we could provide a different lexer that has more functionality but is slower.

UziTech linked a pull request on Jan 18, 2025 that may close this issue.
UziTech (Member) commented Jan 18, 2025

#3594 is one way I can see this being done. Initial checks suggest it doesn't slow things down too much, but I think it can be improved a lot.
