Ideas around pandoc #11

Open
maelle opened this issue Oct 4, 2018 · 8 comments
Labels
enhancement ✨ New feature or request

Comments

@maelle
Member

maelle commented Oct 4, 2018

Ideas from @baptiste on Twitter

"Just wondering, though: have you considered embedding this step as part of a pandoc filter toolchain, as an alternative to the pandocfilters package? It would allow processing the AST in R with the full power of xml2 etc., while the xml_md return step would not be necessary.

And in fact, I believe that with such a toolchain an alternative to knitr would process chunks once they've been parsed into XML, and update the AST with the results. (I suggested this a while back, but as Yihui said, this wasn't possible with pandoc back when knitr started.)

... One advantage would be more robust handling of inline code, which is currently extracted by regexes in knitr. Having the full structured AST before running code chunks also allows greater flexibility for pre- and post-processing with custom markup, etc.

This knitr alternative may help with the elusive un-knit function to merge changes made to the output: since chunks and inline code are tagged as such in the input AST, they can be filtered out when diffing the output AST against its commented version containing the tracked changes."

@maelle
Member Author

maelle commented Oct 4, 2018

R package pandocfilters https://cran.r-project.org/web/packages/pandocfilters/index.html 👀

@maelle
Member Author

maelle commented Oct 4, 2018

@baptiste I'm not sure I understand how one would go from XML to md? Via pandoc?

Are you interested in helping write a minimal working example?

@maelle maelle added the enhancement ✨ New feature or request label Oct 4, 2018
@noamross

noamross commented Oct 4, 2018

Pandoc represents its AST in internal structures, which can be manipulated via Haskell or Lua. It makes the tree available to other programs as JSON, so to do this you'd want to either convert the JSON to an R list (as the R package does), convert it to XML, or work with it via jq or some other JSON processor.
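To make the "convert it to XML" option concrete, here is a minimal Python sketch (not something proposed in the thread) that maps pandoc's JSON AST onto an `xml.etree.ElementTree` tree, so nodes can be picked out with the small XPath subset that ElementTree supports. The hard-coded AST stands in for what `pandoc -t json` would print for a short document; its `pandoc-api-version` and attribute values are illustrative, and the tag scheme (`item` for anything that is not a pandoc node) is an arbitrary choice.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value, tag="item"):
    """Recursively map a fragment of pandoc's JSON AST onto an XML element."""
    if isinstance(value, dict) and "t" in value:
        # A pandoc node such as {"t": "Str", "c": "Hi"}: its type becomes the tag.
        elem = ET.Element(value["t"])
        content = value.get("c")
    else:
        # Lists, attribute triples, strings and plain records get a generic tag.
        elem = ET.Element(tag)
        content = value
    if isinstance(content, dict) and "t" in content:
        elem.append(to_xml(content))
    elif isinstance(content, dict):
        # Plain record (e.g. the top-level document): one child element per key.
        for key, sub in content.items():
            elem.append(to_xml(sub, key))
    elif isinstance(content, list):
        for child in content:
            elem.append(to_xml(child))
    elif content is not None:
        elem.text = str(content)
    return elem


# Hand-written stand-in for `pandoc -t json` output (version number is illustrative).
ast = json.loads("""
{"pandoc-api-version": [1, 23, 1], "meta": {}, "blocks": [
  {"t": "Para", "c": [{"t": "Str", "c": "Mean:"}, {"t": "Space"},
                      {"t": "Code", "c": [["", [], []], "r mean(1:3)"]}]},
  {"t": "CodeBlock", "c": [["", ["r"], []], "mean(1:3)"]}
]}
""")

root = to_xml(ast, "Pandoc")
# The last child of a Code/CodeBlock element holds the code text.
inline = [code[-1].text for code in root.iterfind(".//Para/Code")]
chunks = [block[-1].text for block in root.iterfind(".//CodeBlock")]
print(inline)  # ['r mean(1:3)']
print(chunks)  # ['mean(1:3)']
```

An R version would presumably pair a JSON parser with xml2, which baptiste mentions above; the Python version only shows the shape of the mapping.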

Looks like there's a Haskell example here: https://github.com/cdupont/R-pandoc

@maelle
Member Author

maelle commented Oct 4, 2018

So you wouldn't convert the (R)md to XML first?

@noamross

noamross commented Oct 4, 2018

It comes down to a couple of things: first, whether you want pandoc extensions in your markdown, and second, whether Rmd markup, which has some syntax that isn't exactly markdown, survives the conversion. It seems that, when using pandoc, Rmd chunk headers and inline code survive, with the header and the initial r just prepended to the code block; I'm not sure about cmark. After that, it's a matter of which format is most amenable to modifying: JSON, an R list, or XML. XML via XPath is really powerful, but you might prefer the others.

@baptiste

baptiste commented Oct 4, 2018

@maelle I'm keen, but I broke my right arm last weekend, so typing is a bit of a struggle.

@baptiste

baptiste commented Oct 4, 2018

I think a first step would be to make a minimally interesting dummy Rmd example and run it through

  • knitr
  • cmark
  • pandoc

so that we have specific ASTs to inspect in the form of an R list, JSON, and XML, and can fully compare their features.

The next step would be to mimic the knitting step by isolating from the input AST those code bits that need to be run (lots of details to consider here, but knitr has it well figured out).
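A minimal Python sketch of that isolation step, under two assumptions not taken from the thread: R chunks show up as `CodeBlock` nodes carrying an `r` class, and inline R code shows up as `Code` nodes whose text starts with `r ` (roughly knitr's `r expr` convention, matched here on parsed nodes rather than by a regex over raw text). Each hit is returned with its path in the AST so that a result can later be written back to the same spot; the function name `find_r_code` is hypothetical.

```python
def find_r_code(node, path=()):
    """Yield (path, kind, code) for R chunks and inline R code in a pandoc JSON AST.

    `path` is the sequence of dict keys / list indices leading to the node,
    so a later step can write results back to exactly the same place.
    """
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "CodeBlock":
            (_id, classes, _kv), text = node["c"]
            if "r" in classes:  # assumption: R chunks carry an "r" class
                yield path, "chunk", text
        elif kind == "Code":
            _attr, text = node["c"]
            if text.startswith("r "):  # assumption: knitr-style inline `r expr`
                yield path, "inline", text[2:]
        for key, value in node.items():
            yield from find_r_code(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_r_code(item, path + (index,))


# Hand-written stand-in for a parsed Rmd document.
ast = {
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Mean:"}, {"t": "Space"},
                            {"t": "Code", "c": [["", [], []], "r mean(1:3)"]}]},
        {"t": "CodeBlock", "c": [["", ["r"], []], "mean(1:3)"]},
    ],
}

for path, kind, code in find_r_code(ast):
    print(kind, path, code)
# inline ('blocks', 0, 'c', 2) mean(1:3)
# chunk ('blocks', 1) mean(1:3)
```

Chunk options in a `{r, ...}` header would need extra parsing that this sketch leaves out.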

The last step is merging the produced output back into the AST. From there, I think pandoc is the most natural tool, as it supports many output formats.
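The merge step could look roughly like the following Python sketch. It assumes the results have already been computed and are keyed by each node's path in the AST (a tuple of dict keys and list indices); the values below are what R returns for `mean(1:3)`. `splice_results` is a hypothetical helper: it swaps inline code for a `Str` node holding the value and wraps each chunk in a `Div` together with an `output`-classed `CodeBlock`, so the paths of sibling nodes stay valid.

```python
import copy


def splice_results(ast, results):
    """Return a copy of `ast` with R code replaced by (or paired with) its output.

    `results` maps a node path (tuple of dict keys / list indices) to the
    text that evaluating the code at that node produced.
    """
    out = copy.deepcopy(ast)  # leave the input AST untouched
    for path, result in results.items():
        parent = out
        for key in path[:-1]:
            parent = parent[key]
        node = parent[path[-1]]
        if node["t"] == "Code":
            # Inline code: substitute the value (a fuller version would split
            # multi-word results into Str/Space nodes).
            parent[path[-1]] = {"t": "Str", "c": result}
        elif node["t"] == "CodeBlock":
            # Chunk: keep the source and add the output as a second block,
            # wrapped in a Div so the indices of sibling blocks do not shift.
            output = {"t": "CodeBlock", "c": [["", ["output"], []], result]}
            parent[path[-1]] = {"t": "Div", "c": [["", [], []], [node, output]]}
    return out


ast = {
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Mean:"}, {"t": "Space"},
                            {"t": "Code", "c": [["", [], []], "r mean(1:3)"]}]},
        {"t": "CodeBlock", "c": [["", ["r"], []], "mean(1:3)"]},
    ],
}
results = {("blocks", 0, "c", 2): "2", ("blocks", 1): "[1] 2"}

knitted = splice_results(ast, results)
print(knitted["blocks"][0]["c"][2])  # {'t': 'Str', 'c': '2'}
```

The modified tree could then be serialized with `json.dumps` and handed back to pandoc (for example `pandoc -f json -t docx`), provided its `pandoc-api-version` matches what the installed pandoc expects.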

The idea of merging "track-changes" made to an output manuscript would be a variation on this, where, in merging changes into the AST, one would also look at a diff of the text nodes.
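One way to picture the text-node diff is the Python sketch below, which compares only the `Str` nodes of a knitted AST and of an edited copy, skipping `Code`/`CodeBlock` nodes and any `Span`/`Div` carrying an `output` class. That class is a hypothetical marker for generated results, standing in for the input-AST tagging mentioned earlier in the thread; a real implementation would also need to map each change back to a position in the source.

```python
import difflib

GENERATED_CLASS = "output"  # hypothetical class marking knitted results


def is_generated(node):
    """True for code nodes and for Span/Div wrappers marked as generated output."""
    kind = node.get("t")
    if kind in ("Code", "CodeBlock"):
        return True
    if kind in ("Span", "Div"):
        _id, classes, _kv = node["c"][0]
        return GENERATED_CLASS in classes
    return False


def prose_words(node):
    """Collect Str text from a pandoc JSON AST, skipping generated content."""
    if isinstance(node, dict):
        if "t" in node and is_generated(node):
            return []
        if node.get("t") == "Str":
            return [node["c"]]
        return [w for value in node.values() for w in prose_words(value)]
    if isinstance(node, list):
        return [w for item in node for w in prose_words(item)]
    return []


def text_changes(original, edited):
    """Word-level deletions ("- ") and additions ("+ ") between two ASTs' prose."""
    diff = difflib.ndiff(prose_words(original), prose_words(edited))
    return [line for line in diff if line.startswith(("- ", "+ "))]


def s(text):
    return {"t": "Str", "c": text}


SPACE = {"t": "Space"}
result = {"t": "Span", "c": [["", ["output"], []], [s("2")]]}

knitted = {"blocks": [{"t": "Para", "c": [s("The"), SPACE, s("mean"), SPACE, s("is"), SPACE, result]}]}
edited = {"blocks": [{"t": "Para", "c": [s("The"), SPACE, s("average"), SPACE, s("is"), SPACE, result]}]}

print(prose_words(knitted))            # ['The', 'mean', 'is']
print(text_changes(knitted, edited))   # ['- mean', '+ average']
```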

@maelle
Member Author

maelle commented Oct 10, 2018

@baptiste I am very sorry that you broke your right arm 😱

I haven't had a chance to look at this yet but hope to do it soon.

3 participants