yne/html2json

Convert any XML/HTML to JsonML using yxml

BUILD

make html2json

USAGE

cat test/basic.html | ./html2json | jq '.[1].lang'
"en"
# send json to a frontend (example: GTK)
curl https://news.ycombinator.com/rss | ./html2json | ./json2gtk

FORMAT

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Basic Example</title>
    <link rel="stylesheet" />
  </head>
  <body id="home">
    <input type="text"/>
    <p>content</p>
  </body>
</html>
// doctype is omitted
["html",{"lang":"en"},[
    ["head", {}, [
        ["meta", {"charset": "utf-8"} ],
        ["title", {}, ["Basic Example"] ],
        ["link", {"rel": "stylesheet"} ]
    ]],
    ["body", {"id": "home"}, [
        ["input", {"type": "text"}],
        ["p", {}, ["content"]]
    ]]
]]
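
The node shape above (tag name, attribute object, then an optional array of children, with text stored as plain strings) can be sketched as a small recursive emitter. This is only an illustration of the output format: `struct node`, `struct attr`, `put_string` and `put_node` are hypothetical names, and html2json itself may stream its output without ever building a tree.

```c
#include <stdio.h>

/* Hypothetical in-memory tree, for illustration only. */
struct attr { const char *name, *value; };
struct node {
    const char *tag;             /* NULL marks a text node */
    const char *text;            /* used only when tag is NULL */
    const struct attr *attrs;
    size_t nattrs;
    const struct node *kids;
    size_t nkids;
};

/* Write s as a JSON string, escaping quotes, backslashes and control bytes. */
static void put_string(FILE *out, const char *s)
{
    fputc('"', out);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '"' || c == '\\') {
            fputc('\\', out);
            fputc(c, out);
        } else if (c < 0x20) {
            fprintf(out, "\\u%04x", c);
        } else {
            fputc(c, out);
        }
    }
    fputc('"', out);
}

/* Emit ["tag",{attrs}] or ["tag",{attrs},[children]]; text nodes are bare strings. */
static void put_node(FILE *out, const struct node *n)
{
    size_t i;
    if (!n->tag) {
        put_string(out, n->text);
        return;
    }
    fputc('[', out);
    put_string(out, n->tag);
    fputs(",{", out);
    for (i = 0; i < n->nattrs; i++) {
        if (i) fputc(',', out);
        put_string(out, n->attrs[i].name);
        fputc(':', out);
        put_string(out, n->attrs[i].value);
    }
    fputc('}', out);
    if (n->nkids) {              /* childless nodes omit the third slot */
        fputs(",[", out);
        for (i = 0; i < n->nkids; i++) {
            if (i) fputc(',', out);
            put_node(out, &n->kids[i]);
        }
        fputc(']', out);
    }
    fputc(']', out);
}
```

Calling `put_node(stdout, &root)` on a tree built from the HTML above would print the same structure as the JSON shown, without the pretty-printing whitespace.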

HTML5 support (WIP)

XHTML and HTML5 support was added to yxml by:

  • migrating yxml_ret_t to a bitfield enum so that multiple states can be returned at once (example: parsing > in <p hidden> returns ATTREND|ELEMSTART)
  • accepting lowercase <!doctype
  • reading <script> and <style> content as raw data until the matching closing tag is found
  • accepting unquoted attribute values (<form method=GET>)
  • accepting value-less attributes (<p hidden id=p>)
  • handling void elements as self-closed (<img> internally generates <img></img>), and therefore also ignoring the end tag of void elements (ex: </img>)
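
Two of the changes listed above, the bitfield return value and the void-element lookup, can be sketched in a few lines of C. The `TOK_*` names and `is_void_element` are hypothetical and chosen for illustration; the actual `yxml_ret_t` values in this fork may differ. The list of void elements is taken from the HTML standard, and the lookup assumes the tag name has already been lowercased.

```c
#include <string.h>

/* Hypothetical flag names: one bit per event so several can be OR-ed together. */
typedef enum {
    TOK_NONE      = 0,
    TOK_ELEMSTART = 1 << 0,
    TOK_ELEMEND   = 1 << 1,
    TOK_ATTRSTART = 1 << 2,
    TOK_ATTRVAL   = 1 << 3,
    TOK_ATTREND   = 1 << 4,
    TOK_CONTENT   = 1 << 5
} tok_flags;

/* Void elements as listed in the HTML standard. */
static const char *const void_elements[] = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr"
};

/* Return 1 if name (assumed lowercase) is a void element, 0 otherwise. */
static int is_void_element(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof void_elements / sizeof void_elements[0]; i++)
        if (strcmp(name, void_elements[i]) == 0)
            return 1;
    return 0;
}

/* Return 1 if every bit of flag is set in ret. */
static int has_flag(int ret, tok_flags flag)
{
    return (ret & flag) == (int)flag;
}
```

A caller would then test each bit independently, for example `if (has_flag(r, TOK_ATTREND)) { ... }` followed by `if (has_flag(r, TOK_ELEMSTART)) { ... }`, instead of switching on a single value.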