holloway/xml-zero.js

xml-zero.js

Friendly and forgiving HTML5/XML5 parser that supports React JSX, and uses zero-copy techniques to allow parsing large files efficiently.

Most markup parsers convert a string of markup into a nested map (hash/dict) of keys and values, each allocated as a separate value in memory. As a result, a 10 MB XML file may balloon to 100 MB of memory.

A different technique is to retain the original string and generate an index of offsets into it. Because these offsets are just numbers, they can be packed more efficiently (a tutorial on zero-copy approaches).
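The offset-index idea can be illustrated with a small sketch. This is not the xml-zero-lexer implementation or its API: `indexTagNames` and `tagNameAt` are hypothetical names invented for this example, and the scanner is deliberately naive (it ignores quoting, comments, and attributes). It only shows that tag names can be recorded as pairs of numbers in a typed array and sliced out of the original string on demand.

```javascript
// Hypothetical sketch of an offset index (not the library's real code).
// Records [start, end) offsets of opening tag names instead of copying them.
function indexTagNames(source) {
  const offsets = [];
  let i = 0;
  while (i < source.length) {
    const lt = source.indexOf("<", i);
    if (lt === -1) break;
    const start = lt + 1;
    const first = source[start];
    // Skip closing tags, comments/doctypes, and processing instructions.
    if (first === "/" || first === "!" || first === "?") {
      i = start;
      continue;
    }
    // The tag name runs until whitespace, "/" or ">".
    let end = start;
    while (end < source.length && !/[\s\/>]/.test(source[end])) end++;
    if (end > start) offsets.push(start, end);
    i = end;
  }
  // Pack the numbers into a typed array: 4 bytes per offset.
  return Uint32Array.from(offsets);
}

// Materialise the n-th tag name only when it is actually needed.
function tagNameAt(source, index, n) {
  return source.slice(index[2 * n], index[2 * n + 1]);
}
```

With this layout each indexed tag costs two 32-bit integers, and no substring is created until `tagNameAt` is called.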

This software is in beta and doesn't yet work.

Features

  • Fault tolerant like HTML5/XML5.
    • Valueless attributes, as in HTML5/XML5, e.g. <input multiple type=file>
    • Attribute values may be quoted or not (e.g. <tag "some key"=false/>)
    • React JSX in attributes and in text (not executed, of course, but parsed as distinct node types).
    • Multiple root nodes. Doesn't care about well-formedness. GIGO.
  • Minimising memory use through Zero-Copy techniques.
  • Tiny, no dependencies, and can run in Web Workers (e.g. doesn't use DOM APIs).
  • Safer by removing SGML cruft.
    No support for external DTD resolution or nested entity expansion. Only the default XML entities, numeric character references (NCRs), and HTML5 named entities are supported.
  • Lots of tests.
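As an illustration of the "Safer by removing SGML cruft" item above, here is a minimal sketch of entity decoding that handles only the five default XML entities and numeric character references, in a single pass with no nested expansion. `decodeEntities` and `XML_ENTITIES` are hypothetical names, not part of any xml-zero package, and the HTML5 named-entity table is left out for brevity.

```javascript
// Hypothetical sketch (not the library's code): decode a fixed, safe set of entities.
const XML_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

function decodeEntities(text) {
  // One pass over the input; replacement output is never re-scanned,
  // so "&amp;lt;" becomes "&lt;" and is not expanded again.
  return text.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    (match, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched rather than throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Unknown names are kept verbatim; nothing is resolved from a DTD.
      return Object.prototype.hasOwnProperty.call(XML_ENTITIES, body)
        ? XML_ENTITIES[body]
        : match;
    }
  );
}
```

Because the replacement text is never re-scanned and no entity can be defined by the document itself, recursive expansion attacks of the "billion laughs" kind have nothing to work with in this sketch.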

Out of scope

  • Complete W3C DOM (at least for now) although we will follow their API naming conventions where reasonable.
  • HTML5 implied tags (e.g. won't automatically create tags such as <html>, <head>, or <tbody>).

Install

npm install xml-zero-lexer

npm install xml-zero-beautify

npm install whats-the-damage

(More packages to come; I'm making it modular.)

Progress

  • Lexer (2.6KB no dependencies, minified and gzipped)
  • Beautifier (4KB all dependencies, minified and gzipped)
  • What's The Damage benchmarker that measures time/memory/CPU of scripts
  • A W3C DOM-like API
  • Editable XML (by way of making new strings and leaving the original untouched, so it's still immutable)
