JackMordaunt/odin-html

html

This package offers a simple HTML parser, motivated by a desire to query the DOM and extract information from it.

The current parser is NOT spec compliant, and is not guaranteed to work on all HTML input. This may change.

usage

package main

import html "../"
import "core:fmt"

main :: proc() {
	doc := html.parse("<html><ul><li>one</li><li>two</li><li>three</li></ul></html>")
	defer html.document_delete(doc)

	iter := html.node_iterator_from_document(doc)

	for node in html.node_iterator_depth_first(&iter) {
		fmt.println(html.node_to_string(node))
	}
}

All strings on a Node are slices into the original input string, so the input must stay alive for as long as the document is in use. The dynamic arrays for the attributes and children can be deleted with html.document_delete.
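Because node strings only borrow from the input, the order of cleanup matters when the input is heap-allocated. The following is a minimal sketch of that, using only the calls shown in the usage example above. It assumes that html.parse accepts a `string`; the cloned input simply stands in for data obtained at runtime.

```odin
package main

import html "../"
import "core:fmt"
import "core:strings"

main :: proc() {
	// Heap-allocated copy of the markup, standing in for input obtained at runtime.
	input := strings.clone("<html><ul><li>one</li><li>two</li></ul></html>")
	// Deferred first, so it runs last: the input outlives the document.
	defer delete(input)

	// Assumption: html.parse takes a string, as in the usage example.
	doc := html.parse(input)
	// Deferred second, so it runs first: frees the attribute and children arrays.
	defer html.document_delete(doc)

	iter := html.node_iterator_from_document(doc)
	for node in html.node_iterator_depth_first(&iter) {
		// Node strings still point into `input`, which is alive here.
		fmt.println(html.node_to_string(node))
	}
}
```

Odin runs deferred statements in reverse order at scope exit, so the document is deleted before the input it borrows from.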

roadmap

  • record parse errors
  • spec compliance
    • respect content model: e.g. special handling for <script>, <pre>, etc.
  • stream in source data with a reader
  • support unicode input instead of just ascii

potholes

Special tags like <script> are not handled specially. The inner HTML of such a tag is supposed to be treated as raw text, but this version of the parser will parse script content as HTML.
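As an illustration, here is a small sketch of input that runs into this pothole, again using only the calls from the usage example. The `<b>hi</b>` inside the script is part of a JavaScript string literal, yet per the description above it is expected to come back from the iterator as markup rather than as raw script text. The exact output is not shown because it depends on the parser version.

```odin
package main

import html "../"
import "core:fmt"

main :: proc() {
	// Raw string literal (backticks) so the inner double quotes need no escaping.
	src := `<html><script>document.write("<b>hi</b>")</script></html>`

	doc := html.parse(src)
	defer html.document_delete(doc)

	iter := html.node_iterator_from_document(doc)
	for node in html.node_iterator_depth_first(&iter) {
		// Inspect what the parser made of the script content.
		fmt.println(html.node_to_string(node))
	}
}
```

Until the content model is respected, treat anything found beneath a <script> node as unreliable.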

About

HTML Parsing library in Odin.
