Laci Szakács edited this page Feb 22, 2015 · 3 revisions

Turning R objects into Pandoc's markdown

Summary: Extend the pander package with new methods that render R objects in markdown, and refactor pandoc.table.

Description: pander contains helper functions and a generic S3 method to render various types of R objects into markdown, which can then be converted to HTML, PDF, docx, odt and other document formats with pandoc. The package can be used as a standalone tool for literate programming, building on the traditions of brew, or inside e.g. knitr and other markdown-based tools to render nice tables and other textual forms. To see the currently supported classes, run:

> methods(pander)
 [1] pander.anova*           pander.aov*             pander.aovlist*        
 [4] pander.call*            pander.cast_df*         pander.character*      
 [7] pander.clogit*          pander.coxph*           pander.CrossTable*     
[10] pander.data.frame*      pander.Date*            pander.default*        
[13] pander.density*         pander.describe*        pander.evals*          
[16] pander.factor*          pander.formula*         pander.ftable*         
[19] pander.function*        pander.glm*             pander.htest*          
[22] pander.image*           pander.list*            pander.lm*             
[25] pander.lme*             pander.logical*         pander.matrix*         
[28] pander.microbenchmark*  pander.mtable*          pander.NULL*           
[31] pander.numeric*         pander.option           pander.POSIXct*        
[34] pander.POSIXlt*         pander.prcomp*          pander.rapport*        
[37] pander.return           pander.rlm*             pander.sessionInfo*    
[40] pander.smooth.spline*   pander.stat.table*      pander.summary.aov*    
[43] pander.summary.aovlist* pander.summary.glm*     pander.summary.lm*     
[46] pander.summary.prcomp*  pander.survdiff*        pander.survfit*        
[49] pander.table*           pander.ts*              pander.zoo*            

   Non-visible functions are asterisked

In short, pander is like summary, but returns markdown. It can already deal with a bunch of R objects, but falls back to the list method for "unknown" classes. It would be great if the general pander method supported more R classes with elegant markdown tables, as currently a lot of hacks are required to build nicely formatted tables of e.g. models inside knitr documents.
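As a small illustration of the "summary, but markdown" idea, the calls below render a data frame and a fitted linear model. This is only a sketch, assuming pander is installed; the exact table layout depends on the package version and the active global options.

```r
library(pander)

# A data frame is rendered as a markdown table
pander(head(mtcars[, 1:4], 3))

# A fitted model is dispatched to the lm method from the listing above
fit <- lm(mpg ~ wt, data = mtcars)
pander(fit)
```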

Similar packages such as xtable provide an alternative by already supporting more R classes, but xtable can only return HTML or LaTeX syntax, which is not suitable for general markdown documents meant to be converted to various file formats. Another similar solution is bundled with knitr, but kable is a very simple table generator by design, so pander can be a great add-on. In most cases, implementing the pander method of an R class simply means writing a wrapper around summary or checking the examples of xtable, stargazer or broom, but some classes (like CrossTable) require special treatment due to the limitations of markdown. For example, Pandoc's markdown does not support cells spanning multiple rows or columns.
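A new method is usually a thin wrapper that reshapes the object into a data frame and hands it to pandoc.table. The sketch below assumes a made-up class, fit_summary, with a made-up constructor; both are hypothetical and exist only to show the shape of such a method, not how any existing pander method is written.

```r
library(pander)

# Hypothetical class used only for illustration
new_fit_summary <- function(estimate, se) {
  structure(list(estimate = estimate, se = se), class = "fit_summary")
}

# S3 method: reshape the object into a data frame, then delegate to pandoc.table
pander.fit_summary <- function(x, caption = "Fit summary", ...) {
  tab <- data.frame(
    Estimate     = x$estimate,
    `Std. Error` = x$se,
    check.names  = FALSE
  )
  pandoc.table(tab, caption = caption, ...)
}

# The generic now dispatches on the new class
pander(new_fit_summary(c(a = 1.5, b = 2), c(a = 0.1, b = 0.2)))
```

Extra arguments are passed through to pandoc.table, so per-call settings such as style keep working for the new class.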

pander already supports a bunch of options to globally fine-tune the appearance of tables and even plots. One can e.g. specify the maximum width of cells or of the whole table, so that pander automatically splits wide cells or tables across multiple lines if the chosen markdown table style supports it. pander can also return vectors in a rather elegant way, by adding commas between the elements and "and" in front of the last element. Other unique features are highlighting some parts of a table, adding significance stars, and fine-tuning all base graphics, lattice and ggplot2 plots through global options so that they share a very similar look and feel: users can define the general style of their plots once and then create charts with any popular plotting library. Please find more details on GH.
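A few of these features can be tried as follows. This is a sketch assuming a reasonably recent pander release; the width of 60 characters and the threshold of 20 are arbitrary values chosen for the example.

```r
library(pander)

# Read a global option, then change it (restored at the end)
old_width <- panderOptions("table.split.table")
panderOptions("table.split.table", 60)

# A table wider than the limit is split into several narrower tables
pandoc.table(mtcars[1:3, ], style = "multiline")

# The same setting can be overridden for a single call
pandoc.table(mtcars[1:3, 1:4], style = "grid", split.table = Inf)

# Vectors are collapsed into an inline enumeration
sentence <- p(c("fee", "fi", "foo"))
cat(sentence, "\n")

# Highlight the cells of the next table that exceed 20
tab <- mtcars[1:3, 1:4]
emphasize.strong.cells(which(tab > 20, arr.ind = TRUE))
pandoc.table(tab)

# Restore the previous global setting
panderOptions("table.split.table", old_width)
```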

Another way to improve the usability of pander would be better error handling, as sometimes (especially with nested R code chunks) it is really hard to debug a literate programming document without a deep understanding of pander internals. Logging chunk-based events (processing time, resulting object, whether the cache was used) would also be highly desirable for debugging. The latter is partially done in the object returned by Pandoc.brew, and some related features are already implemented in the log branch.
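To make the logging idea concrete, here is a minimal base R sketch of a chunk runner that records the source, result, error message, warnings and elapsed time of one chunk. The function run_chunk and the field names are hypothetical; this is not how pander or Pandoc.brew evaluate chunks internally, and caching is left out.

```r
# Hypothetical helper: evaluate one chunk of R code and return a log record
run_chunk <- function(code, env = new.env(parent = globalenv())) {
  err <- NULL
  warns <- character()
  started <- proc.time()[["elapsed"]]

  result <- withCallingHandlers(
    tryCatch(
      eval(parse(text = code), envir = env),
      error = function(e) {
        # Keep the message instead of aborting the whole document
        err <<- conditionMessage(e)
        NULL
      }
    ),
    warning = function(w) {
      # Collect the warning, then continue evaluating the chunk
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )

  list(
    src      = code,
    result   = result,
    error    = err,
    warnings = warns,
    elapsed  = proc.time()[["elapsed"]] - started
  )
}

# A failing chunk yields a record rather than stopping the report
str(run_chunk("stop('boom')"))
```

Returning a plain list keeps the record easy to inspect with str() or to write to a log file after each chunk.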

Markdown is increasingly popular in the R community, so this general "R to markdown" tool could help a bunch of useRs produce elegant and platform-independent textual reports.

Potential tasks in a nutshell:

  • create new pander methods for not yet supported R classes
  • implement new global options for tables and plots
  • refactor pandoc.table to handle the variety of R objects (named vectors, tables, 2D tables, 3D crosstables and ftable objects) by transforming those to a standard format first instead of the currently active continuous checks and workarounds
  • refactor Pandoc.brew (forked from brew)
  • improve error handling and logging facilities
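The pandoc.table refactoring task in the list above amounts to normalizing every supported input into one shape before any formatting happens. The base R sketch below uses a hypothetical helper, as_standard_table, that coerces named vectors, 1D and 2D tables, higher-dimensional tables and ftable objects into a character matrix. It only illustrates the idea under those assumptions and is not the design used in pander.

```r
# Hypothetical helper: coerce the supported inputs to a character matrix
as_standard_table <- function(x) {
  if (inherits(x, "ftable")) {
    # format() on an ftable returns a character matrix including header cells
    return(format(x, quote = FALSE))
  }
  if (is.table(x) && length(dim(x)) > 2) {
    # Flatten higher-dimensional tables first, then reuse the ftable branch
    return(as_standard_table(ftable(x)))
  }
  if (is.data.frame(x)) {
    x <- as.matrix(x)
  }
  if (length(dim(x)) < 2) {
    # Atomic vectors and 1D tables become a single-row matrix
    x <- matrix(as.vector(x), nrow = 1, dimnames = list(NULL, names(x)))
  }
  matrix(as.character(x), nrow = nrow(x), dimnames = dimnames(x))
}

as_standard_table(c(a = 1, b = 2))                             # named vector
as_standard_table(table(mtcars$am, mtcars$gear))               # 2D table
as_standard_table(table(mtcars$am, mtcars$gear, mtcars$cyl))   # 3D table
```

With a single internal format, the table renderer would only need to handle character matrices, and each new input type would need just one extra branch in the normalization step.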

Skills required: literate programming experience, so decent knowledge of markdown and R is needed. In more detail:

  • Pandoc's markdown syntax and the pandoc command line,
  • previous experience with brew or pander packages,
  • at least basic git knowledge (e.g. branching) and experience with GitHub.

Test: Fork the package on GH and create a pull request implementing a method for tables::tabular. A quick and dirty solution:

> library(tables)
> pander(as.matrix(tabular(as.factor(am) ~ (mpg+hp+qsec) * (mean+median), data = mtcars)))

------------- ----- ------ ----- ------ ----- ------
               mpg          hp          qsec        

as.factor(am) mean  median mean  median mean  median

      0       17.15  17.3  160.3  175   18.18 17.82 

      1       24.39  22.8  126.8  109   17.36 17.02 
------------- ----- ------ ----- ------ ----- ------

Please note that pander.tabular should work with any number of variables, even with complex table layouts.

Mentor: Gergely Daróczi (daroczig {at} rapporter {dot} net), with László Szakács (cocinerox {at} gmail {dot} com) as backup mentor

Disclaimer: this proposal was partially covered in 2013 and 2014. Looking forward to working with such talented students again this year.
