Harper's parsing infrastructure is admittedly poorly documented at the moment, so I'll try to explain it enough to answer your question here. Expect a proper guide on it in the future.
> So, I have been experimenting with Harper lately. For example, if you pass Markdown content, it is parsed and converted into an AST. Do we need such parsing?
To answer your question directly: yes, and it takes negligible time. The Markdown library we use is very fast (it may well be the fastest CommonMark implementation out there), so parsing accounts for a trivial share of our execution time while significantly improving Harper's internal document model.
Your implementation, while interesting, is not spec-compliant, and recompiling and running that many regular expressions every time is quite slow. I intend to support MDX properly in the future. In the meantime, you can probably get significantly better results by applying the same regex stripping inside the Markdown parser (whose code you can find here). A cheap solution would be to make a copy of that file and paste your stripping logic into it.
If you would like to parse MDX properly (which would give Harper the best internal document model and therefore significantly better linting), you just have to implement the Parser trait. That can be done by wrapping another existing parser, including one generated by Tree-sitter.
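A rough sketch of the wrapping idea in Rust, Harper's implementation language. Everything here is hypothetical: `Span`, `Token`, and `Parser` are simplified stand-ins defined locally (harper-core's real types carry more information and their exact signatures may differ), and `WordParser` and `MdxMasking` are invented names. The wrapper blanks JSX tags and `{...}` expressions with spaces, then delegates to the inner parser, so the returned spans still index the original source.

```rust
// Hypothetical, simplified stand-ins for Harper's types; the real
// `Parser` trait and `Token` type in harper-core differ in detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub span: Span,
}

pub trait Parser {
    fn parse(&self, source: &[char]) -> Vec<Token>;
}

/// Hypothetical inner parser: emits one token per run of alphabetic chars.
pub struct WordParser;

impl Parser for WordParser {
    fn parse(&self, source: &[char]) -> Vec<Token> {
        let mut tokens = Vec::new();
        let mut start: Option<usize> = None;
        for (i, c) in source.iter().enumerate() {
            match (c.is_alphabetic(), start) {
                // A word begins.
                (true, None) => start = Some(i),
                // A word ends at the first non-alphabetic char.
                (false, Some(s)) => {
                    tokens.push(Token { span: Span { start: s, end: i } });
                    start = None;
                }
                _ => {}
            }
        }
        // Close a word that runs to the end of the input.
        if let Some(s) = start {
            tokens.push(Token { span: Span { start: s, end: source.len() } });
        }
        tokens
    }
}

/// Hypothetical MDX wrapper: masks `<...>` tags and `{...}` expressions
/// with spaces (keeping newlines), then hands the masked text to `inner`.
/// The masked buffer has the same length, so spans index the original.
pub struct MdxMasking<P: Parser> {
    pub inner: P,
}

impl<P: Parser> Parser for MdxMasking<P> {
    fn parse(&self, source: &[char]) -> Vec<Token> {
        let mut masked = source.to_vec();
        // The delimiter that will close the construct we are inside, if any.
        let mut closer: Option<char> = None;
        for c in masked.iter_mut() {
            match closer {
                None if *c == '<' => {
                    closer = Some('>');
                    *c = ' ';
                }
                None if *c == '{' => {
                    closer = Some('}');
                    *c = ' ';
                }
                None => {}
                Some(end) => {
                    if *c == end {
                        closer = None;
                    }
                    if *c != '\n' {
                        *c = ' ';
                    }
                }
            }
        }
        self.inner.parse(&masked)
    }
}
```

For example, parsing `Hello <Badge color="red" /> world {props.name}` with `MdxMasking { inner: WordParser }` yields tokens for `Hello` and `world` only. The masking is naive (a stray `<` in prose hides text up to the next `>`, and nested braces are not tracked), which is why a real MDX grammar, such as one generated by Tree-sitter, would be the better long-term inner parser.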
P.S. I'm so glad you're using Harper for your project. I'm honored. We've got significant JS API improvements on the way, so stay tuned!
So, I have been experimenting with Harper lately. For example, if you pass Markdown content, it is parsed and converted into an AST. Do we need such parsing?
Alternatively, I wrote a function to clean the markup by replacing it with spaces, and then run the result as plain text. Here is my version: https://github.com/websiddu/harper/blob/master/harper-wasm/src/lib.rs#L21
This implementation is currently live on https://stubby.io/
I'm really not sure whether this is more efficient than building a full syntax tree and then getting word positions from it. Just sharing an idea, since I thought this would simplify a lot of the code.
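Replacing markup with spaces, rather than deleting it, is what makes this approach work: every character stays at its original index, so a lint span computed on the cleaned text points at the same place in the Markdown source. The linked version uses regular expressions. As a hypothetical illustration only, here is a std-only Rust sketch (`blank_markdown` is an invented name, not the function in the linked file) that handles ATX heading markers, emphasis and code markers, and inline links. It is nowhere near CommonMark-compliant.

```rust
/// Hypothetical sketch: replace common Markdown markup with spaces so the
/// text can be linted as plain text. Character indices are preserved
/// (byte offsets may not be, if a blanked URL contained non-ASCII chars).
pub fn blank_markdown(source: &str) -> String {
    let chars: Vec<char> = source.chars().collect();
    let mut out = chars.clone();
    let mut at_line_start = true;
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if at_line_start && c == '#' {
            // Blank the run of `#` that opens an ATX heading.
            while i < chars.len() && chars[i] == '#' {
                out[i] = ' ';
                i += 1;
            }
            at_line_start = false;
            continue;
        }
        if c == ']' && chars.get(i + 1) == Some(&'(') {
            // Blank `](destination)`, keeping only the link text visible.
            let mut j = i;
            while j < chars.len() && chars[j] != ')' && chars[j] != '\n' {
                out[j] = ' ';
                j += 1;
            }
            if j < chars.len() && chars[j] == ')' {
                out[j] = ' ';
                j += 1;
            }
            // If we stopped at a newline, it is processed on the next pass.
            i = j;
            at_line_start = false;
            continue;
        }
        if matches!(c, '*' | '_' | '`' | '[') {
            // Emphasis, code, and link-opening markers become spaces.
            out[i] = ' ';
        }
        at_line_start = c == '\n';
        i += 1;
    }
    out.into_iter().collect()
}
```

For instance, `# Title` becomes `  Title`, and `[link](https://example.com)` becomes ` link` followed by spaces, so a word found at character index n in the output sits at index n in the input. The trade-off is spec compliance: ad hoc stripping misses constructs a real parser handles, and blanking `_` even splits `snake_case` into two words.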