Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: make cmark functions callable #28

Open
dmurdoch opened this issue Dec 12, 2023 · 11 comments · May be fixed by #29
Open

Feature request: make cmark functions callable #28

dmurdoch opened this issue Dec 12, 2023 · 11 comments · May be fixed by #29

Comments

@dmurdoch
Copy link

I'd like to insert a filter into the parsing process: parse the markdown input, execute my filter, render to output format. I think this would be relatively straightforward if I was writing the filter in C.

Doing this would require that the cmark header files be made available, and entry points be registered using R_RegisterCCallable, and probably some more things I don't know about.

@jeroen
Copy link
Member

jeroen commented Dec 12, 2023

About where in the commonmark C code would the filter have to be invoked?

@dmurdoch
Copy link
Author

If it was being called from package code, it would go here: https://github.com/dmurdoch/commonmark/blob/58d435515ea2dc1da08abe186e27b4d461d639f6/src/wrapper.c#L88, just after parsing, before rendering. When I wrote the request, I was thinking my code would duplicate most of R_render_markdown, but it would be even better if that function had another argument to provide a filter, and called it if one was provided.

I don't think an R function would work here; any filter would need access to functions like cmark_iter_new, cmark_iter_get_node, etc.

@dmurdoch
Copy link
Author

dmurdoch commented Dec 12, 2023

I've just committed a first version of a small demo to my fork of commonmark on branch filter: install_github("dmurdoch/commonmark@filter"). This doesn't register any of the cmark functions so it wouldn't be usable from another package, but this code uses it with code added to the package on that branch:

library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]

tex <- markdown_latex(md, filter=c("commonmark", "latex_sourcepos"))

cat(tex)

Currently the filter just prints the TEXT nodes, but I'll be adding in sourcepos info.

@dmurdoch
Copy link
Author

I've updated the filter to insert macros indicating the source location. The script from the last comment now prints something like this:

\section{\sourcepos{1:3}CHANGES IN knitr VERSION 1.46}

\subsection{\sourcepos{3:4}NEW FEATURES}

\begin{itemize}
\item \sourcepos{5:3}Added a new chunk option \texttt{tab.cap} 
...

The feature request is that commonmark should allow me to put something like this in my own package.

@jeroen
Copy link
Member

jeroen commented Dec 13, 2023

If it is really just about supporting sourcepos for latex, maybe your solution is a bit overly complex.

I tried something simpler based on onjecting latex comments after each newline: 2f032ad

Can you test if this is something that could be workable?

install_github("jeroen/commonmark@latex_sourcepos")
library(commonmark)
md <- readLines("https://raw.githubusercontent.com/yihui/knitr/master/NEWS.md")[1:10]
markdown_latex(md, sourcepos = TRUE)

@dmurdoch
Copy link
Author

I'll give it a try. At first glance, it looks as though that would solve my immediate problem.

An advantage of the approach I proposed is that it could be used for other problems, similar to the way filters are used in Pandoc to allow simple Commonmark extensions to be implemented. For example, when I wrote RmdConcord the hope was that it would replace pdf_document, which uses Pandoc markdown. I didn't try to implement all of the extensions in Pandoc markdown, but I did add one that allows \pagebreak or \newpage to be interpreted as page breaks. This page: https://pandoc.org/lua-filters.html gives some other examples.

@jeroen
Copy link
Member

jeroen commented Dec 13, 2023

I see that it is powerful but I'm a little worried about the can of worms that we open if we want to support callable C filters.
Given that this is a widely used package and I have less maintenance time than I used to, I would prefer to keep things simple is possible.

@dmurdoch
Copy link
Author

Sure, that makes sense. I haven't had a chance to check your earlier sourcepos patch yet, but I should get to that today.

@dmurdoch
Copy link
Author

That patch works very well for me. I was worried it could affect the LaTeX (adding the space, commenting out the newline) but it is very easy for me to remove the comments after processing them, so LaTeX will be fine.

@jeroen
Copy link
Member

jeroen commented Dec 14, 2023

OK great. If you want we can still change the format to have more spaces or different braces, I just through a comment would be the easiest way to add metadata without affecting the content.

Also does the sourcepos comment appear everywhere in the document where you need it? I think it is inserted before every blank line now, but we could try to inject it in more places if needed.

@jeroen jeroen linked a pull request Dec 14, 2023 that will close this issue
@dmurdoch
Copy link
Author

The format is fine. For my purposes (forward and reverse linking from an Rmd file to a .pdf file) all I'm using is the starting line number. If it's off by a bit it doesn't really matter.

Adding it at the end of every line would be helpful if people write really long paragraphs, or cases where knitr has inserted or removed a lot of lines.

As I said, my code removes the comment before sending the result to LaTeX, so it shouldn't cause parsing issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants