How to handle empty lines in md #8

maelle · 2018-09-04T14:03:11Z

The README.md is full of empty lines, I guess because knitr renders it from the Rmd. On GitHub it renders well, but when I try to parse it I cannot recover the structure that's in the .Rmd: the table is either split into separate blocks or, if I remove the empty lines, glued to the rest of the README.

rmd <- "https://raw.githubusercontent.com/ropensci/drake/master/README.Rmd"

md <- "https://raw.githubusercontent.com/ropensci/drake/master/README.md"


library("magrittr")
rmd %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = TRUE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <thematic_break/>
#>  [2] <heading level="2">\n  <text>output:</text>\n  <softbreak/>\n  <tex ...
#>  [3] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [4] <code_block info="{r knitrsetup, echo = FALSE}">knitr::opts_chunk$s ...
#>  [5] <code_block info="{r mainexample, echo = FALSE}">suppressMessages(s ...
#>  [6] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io ...
#>  [7] <html_block>&lt;table class="table"&gt;&lt;thead&gt;&lt;tr class="h ...
#>  [8] <heading level="1">\n  <text>The drake R package </text>\n  <html_i ...
#>  [9] <paragraph>\n  <code>drake</code>\n  <text> — or, Data Frames in R  ...
#> [10] <heading level="1">\n  <text>What gets done stays done.</text>\n</h ...
#> [11] <paragraph>\n  <text>Too many data science projects follow a </text ...
#> [12] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [13] <paragraph>\n  <text>It is hard to avoid restarting from scratch.</ ...
#> [14] <html_block>&lt;center&gt;\n&lt;a href="https://twitter.com/fossilo ...
#> [15] <paragraph>\n  <text>With </text>\n  <code>drake</code>\n  <text>,  ...
#> [16] <list type="ordered" start="1" delim="period" tight="true">\n  <ite ...
#> [17] <heading level="1">\n  <text>How it works</text>\n</heading>
#> [18] <paragraph>\n  <text>To set up a project, load your packages,</text ...
#> [19] <code_block info="{r mainpackages}">library(drake)\nlibrary(dplyr)\ ...
#> [20] <paragraph>\n  <text>load your custom functions,</text>\n</paragraph>
#> ...

md %>%
  readLines() %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#>  [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please  ...
#>  [2] <html_block>&lt;center&gt;\n</html_block>
#>  [3] <html_block>&lt;img src="https://ropensci.github.io/drake/images/in ...
#>  [4] <html_block>&lt;/center&gt;\n</html_block>
#>  [5] <html_block>&lt;table class="table"&gt;\n</html_block>
#>  [6] <html_block>&lt;thead&gt;\n</html_block>
#>  [7] <html_block>&lt;tr class="header"&gt;\n</html_block>
#>  [8] <html_block>&lt;th align="left"&gt;\n</html_block>
#>  [9] <paragraph>\n  <text>Release</text>\n</paragraph>
#> [10] <html_block>&lt;/th&gt;\n</html_block>
#> [11] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [12] <paragraph>\n  <text>Usage</text>\n</paragraph>
#> [13] <html_block>&lt;/th&gt;\n</html_block>
#> [14] <html_block>&lt;th align="left"&gt;\n</html_block>
#> [15] <paragraph>\n  <text>Development</text>\n</paragraph>
#> [16] <html_block>&lt;/th&gt;\n</html_block>
#> [17] <html_block>&lt;/tr&gt;\n</html_block>
#> [18] <html_block>&lt;/thead&gt;\n</html_block>
#> [19] <html_block>&lt;tbody&gt;\n</html_block>
#> [20] <html_block>&lt;tr class="odd"&gt;\n</html_block>
#> ...

md %>%
  readLines() %>%
  .[. != ""] %>%
  commonmark::markdown_xml(extensions = FALSE) %>%
  xml2::read_xml()
#> {xml_document}
#> <document xmlns="http://commonmark.org/xml/1.0">
#> [1] <html_block>&lt;!-- README.md is generated from README.Rmd. Please e ...
#> [2] <html_block>&lt;center&gt;\n&lt;img src="https://ropensci.github.io/ ...

Created on 2018-09-04 by the reprex package (v0.2.0).
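
A smaller example may make the two behaviours above easier to compare. As far as the CommonMark spec describes it, an HTML block opened by a tag such as `<table>` ends at the next blank line, which would explain both outputs. The sketch below assumes the same `commonmark` and `xml2` packages as the reprex; `block_names()` is a hypothetical helper and the table lines are made up for illustration.

```r
# Hypothetical helper: parse Markdown lines and return the names of the
# top-level blocks that commonmark produces.
block_names <- function(lines) {
  xml <- commonmark::markdown_xml(lines, extensions = FALSE)
  xml2::xml_name(xml2::xml_children(xml2::read_xml(xml)))
}

# A made-up table with a blank line after every tag, as in the knitted README.md.
with_blanks <- c(
  "<table>", "", "<tr>", "", "<td>", "", "Release", "",
  "</td>", "", "</tr>", "", "</table>"
)

# The same table with the blank lines removed.
without_blanks <- with_blanks[with_blanks != ""]

# Each tag becomes its own html_block and the cell text becomes a paragraph.
block_names(with_blanks)

# Everything collapses into a single html_block.
block_names(without_blanks)
```

If this reading is right, removing every blank line in the file also removes the one that would have closed the table, so the rest of the README gets absorbed into the same block.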

maelle · 2018-09-04T14:06:34Z

For context, I'm trying to parse the READMEs that GitHub considers to be the preferred README (https://developer.github.com/v3/repos/contents/#get-the-readme). I must be missing something: surely if GitHub can render this table, there is a way for me to parse the Markdown file correctly. 🤔

maelle · 2018-09-05T08:47:29Z

Possibly related: commonmark/commonmark-spec#490

maelle · 2018-09-05T09:13:16Z

For my very specific use case I'll use a regex to extract the HTML of the first table, but that seems suboptimal of course!
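
A minimal sketch of what such a regex workaround could look like, not necessarily the code that was actually used. `first_table_html()` is a hypothetical helper, the input lines are made up, and the non-greedy pattern assumes the first table contains no nested `<table>`.

```r
# Hypothetical helper: return the HTML of the first <table>...</table> found
# in a character vector of Markdown lines, or NA if there is none.
first_table_html <- function(lines) {
  text <- paste(lines, collapse = "\n")
  # (?s) lets "." match newlines; ".*?" stops at the first closing tag.
  hit <- regmatches(text, regexpr("(?s)<table\\b.*?</table>", text, perl = TRUE))
  if (length(hit) == 0) {
    return(NA_character_)
  }
  # Collapse blank lines so the table is one contiguous chunk of HTML.
  gsub("\n\\s*\n", "\n", hit, perl = TRUE)
}

# Made-up input imitating a knitted README.md.
lines <- c(
  "# Title", "", "<table>", "", "<tr>", "", "<th>", "", "Release", "",
  "</th>", "", "</tr>", "", "</table>", "", "Some text after the table."
)

html <- first_table_html(lines)

# The extracted HTML can then be handed to an HTML parser instead of commonmark.
headers <- xml2::xml_text(
  xml2::xml_find_all(xml2::read_html(html), ".//th"),
  trim = TRUE
)
```

Extracting the table first and parsing it as HTML sidesteps the blank-line problem for this one table, but it ignores everything else in the README.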

maelle mentioned this issue Sep 4, 2018

Parsing actual READMEs with html tables ropensci/codemetar#183

Closed
