Lack of documentation. #110

Open
ruipgil opened this issue Jul 22, 2014 · 17 comments

Comments

@ruipgil

ruipgil commented Jul 22, 2014

There's a lack of documentation; even the README example is outdated or not explained properly.

@aredridel
Owner

Yes indeed! Needs some TLC.

Any aspects you want to see first?

@ruipgil
Author

ruipgil commented Jul 22, 2014

Since the project is widely used with JSDOM, an annotated JSDOM example should always be kept up to date.
Also, you could use tests as examples, to make sure everything keeps working: more of a usage test than a unit test. With tests like that, you'd only need to point people to the example's source code.

@eGavr

eGavr commented Aug 6, 2014

Can you give a real-world example of using your tool in Node.js without jQuery?

@aredridel
Owner

The v1.0.1 README now has an example.

@eGavr

eGavr commented Aug 9, 2014

Thank you, but it doesn't look like a simple example. Why is it so difficult? That's a lot of code for such a simple task...

Can I do something like this:

var parser = require('parse5');
var html = '<p>blah</p>';

console.log(parser.parse(html));

and have console.log print the full DOM tree?

@danyaPostfactum
Collaborator

HTML5 does not include a DOM implementation, so you have to provide one.
If you just need a DOM tree:

var HTML5 = require('html5');
var jsdom = require('jsdom');

var DOMImplementation = jsdom.level(3).DOMImplementation;
var parser = new HTML5.DOMParser(new DOMImplementation());

var document = parser.parse('<p>I am a very small HTML document</p>');

console.log(document.getElementsByTagName("p")[0].textContent);
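To look at the whole tree that `parser.parse()` returns (what was asked for above), one option is to walk it using the standard DOM properties `nodeType`, `nodeName`, `nodeValue` and `childNodes`. The `outline` helper below is hypothetical and not part of html5 or jsdom; it is a minimal sketch that assumes the returned `document` exposes those standard properties.

```javascript
// Hypothetical helper (not part of html5 or jsdom): returns an indented
// outline of a DOM tree, one line per node, using only standard DOM
// properties (nodeType, nodeName, nodeValue, childNodes).
var TEXT_NODE = 3;

function outline(node, depth) {
  depth = depth || 0;
  var indent = new Array(depth + 1).join('  '); // two spaces per level
  var lines = [node.nodeType === TEXT_NODE
    ? indent + JSON.stringify(node.nodeValue) // quote text so whitespace is visible
    : indent + node.nodeName];
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    lines = lines.concat(outline(children[i], depth + 1));
  }
  return lines;
}

// Usage with the document from the example above:
// console.log(outline(document).join('\n'));
```

Only the generic node interface is used, so the same helper should work for any DOM implementation handed to the parser.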

Also, take a look at SAXParser:

var HTML5 = require('html5');

var parser = new HTML5.SAXParser();

parser.contentHandler = {
    startDocument: function() {},
    endDocument: function() {},
    startElement: function(uri, localName, qName, atts) {
        console.log('Start of <' + localName + '> element');
    },
    endElement: function(uri, localName, qName) {
        console.log('End of <' + localName + '> element');
    },
    characters: function(ch, start, length) {
        console.log('Characters: ' + ch);
    }
};

parser.parse('<p>I am a very small HTML document</p>');

Output:

Start of <html> element
Start of <head> element
End of <head> element
Start of <body> element
Start of <p> element
Characters: I am a very small HTML document
End of <p> element
End of <body> element
End of <html> element

@eGavr

eGavr commented Aug 9, 2014

Great! I think SAXParser is what I need!

BUT!

<p>I am a very small HTML document</p>

Where are the html, head, and body elements in the input?

Can I get information about exactly what is in the input?

@danyaPostfactum
Collaborator

Where are the html, head, and body elements in the input?

The parser creates all these elements according to the HTML spec (browsers do the same).
You can use the fragment parsing algorithm instead:

parser.parseFragment('<p>I am a very small HTML document</p>', 'body');

Fragment parsing was broken. I've just fixed it, so you need to pull the latest change (I'm still not sure I fixed the bug properly).

Can I get information about exactly what is in the input?

No, you receive repaired, well-formed output. The parser may create, ignore, or reparent elements, among other things, according to the HTML5 parsing specification.

@eGavr

eGavr commented Aug 9, 2014

var HTML5 = require('html5');

var parser = new HTML5.SAXParser();

parser.contentHandler = {
    startDocument: function() {console.log('!!!!')},
    endDocument: function() {console.log('????')},
    startElement: function(uri, localName, qName, atts) {
        console.log("qNAme == ", qName)
        console.log(atts)
        console.log('Start of <' + localName + '> element');
    },
    endElement: function(uri, localName, qName) {
        console.log('End of <' + localName + '> element');
    },
    characters: function(ch, start, length) {
        console.log('Characters: ' + ch);
    }
};

parser.parseFragment('<p>I am a very small HTML document</p>', 'body');

This code doesn't work! Sorry :) There's probably a silly mistake I haven't noticed. Can you help me?

@eGavr

eGavr commented Aug 9, 2014

Can you give more information about contentHandler?
So far I know these callbacks: startDocument, endDocument, startElement, endElement, characters.

Is there anything else?

@danyaPostfactum
Collaborator

This code works. You should pull this fix: 4ff67be
It is not yet available via npm.

Is there anything else?

No. There is also a lexicalHandler, which can handle comments, doctypes, and CDATA sections, but that feature is not implemented yet (it would be very easy to add).

@eGavr

eGavr commented Aug 9, 2014

Are you going to do this? :)
And can you give an approximate release date for these changes?

I mean, it would be great if you could combine contentHandler and lexicalHandler into a single handler!

That way, everybody would be able to build a DOM tree from HTML code however they want!

@aredridel
Owner

Start an issue for 'em -- this one's about docs! -- and we'll go from there.

@aredridel
Owner

And that fix has shipped in v1.0.3.

@danyaPostfactum
Collaborator

A lexical handler can now be defined:

parser.lexicalHandler = {
    comment: function(data) {
        console.log('Comment: ' + data);
    },
    startDTD: function(name, publicIdentifier, systemIdentifier) {
        console.log('Doctype: ' + name);
    },
    endDTD: function() {}
};

contentHandler is required, while lexicalHandler is optional.

http://www.saxproject.org/apidoc/org/xml/sax/ContentHandler.html
http://www.saxproject.org/apidoc/org/xml/sax/ext/LexicalHandler.html
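The single combined handler requested earlier in the thread can be approximated by giving the same object to both properties. This is a sketch, not confirmed html5 behavior: it assumes the parser only calls methods on whatever objects are assigned to `contentHandler` and `lexicalHandler`. The no-op `startCDATA`, `endCDATA`, `startEntity` and `endEntity` stubs come from the SAX LexicalHandler interface linked above and are included only in case the parser calls them; `createCombinedHandler` and `useCombinedHandler` are hypothetical names.

```javascript
// Hypothetical: one object implementing both SAX handler interfaces.
// Every callback appends a rough textual form of the event to `out`
// (attributes are ignored and nothing is escaped).
function createCombinedHandler(out) {
  return {
    // ContentHandler callbacks (required)
    startDocument: function () {},
    endDocument: function () {},
    startElement: function (uri, localName, qName, atts) {
      out.push('<' + localName + '>');
    },
    endElement: function (uri, localName, qName) {
      out.push('</' + localName + '>');
    },
    characters: function (ch, start, length) {
      out.push(ch);
    },
    // LexicalHandler callbacks (optional)
    comment: function (data) {
      out.push('<!--' + data + '-->');
    },
    startDTD: function (name, publicIdentifier, systemIdentifier) {
      out.push('<!DOCTYPE ' + name + '>');
    },
    endDTD: function () {},
    // Remaining SAX LexicalHandler methods as no-op stubs (assumption)
    startCDATA: function () {},
    endCDATA: function () {},
    startEntity: function (name) {},
    endEntity: function (name) {}
  };
}

// Assigns the same object to both handler properties of a parser.
function useCombinedHandler(parser, out) {
  var handler = createCombinedHandler(out);
  parser.contentHandler = handler;
  parser.lexicalHandler = handler;
  return handler;
}
```

With the library, this would be called as `useCombinedHandler(new HTML5.SAXParser(), out)` before `parse()`. Keeping two properties but one object leaves lexical handling optional, as described above.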

everybody would be able to build a DOM tree from HTML code however they want!

Right. With SAXParser they can.
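As a sketch of that, the contentHandler below builds a tree of plain JavaScript objects instead of DOM nodes. `createTreeBuilder` is a hypothetical helper; it assumes `ch` already holds the text (as in the console output earlier in this thread) and stores `atts` untouched, since its exact shape isn't described here.

```javascript
// Hypothetical helper: builds a tree of plain objects from SAX events.
// Element nodes look like { name, atts, children }; text nodes are strings.
function createTreeBuilder() {
  var root = { name: '#document', atts: null, children: [] };
  var stack = [root]; // open elements; the last one receives new children

  function current() {
    return stack[stack.length - 1];
  }

  var handler = {
    startDocument: function () {},
    endDocument: function () {},
    startElement: function (uri, localName, qName, atts) {
      var node = { name: localName, atts: atts, children: [] };
      current().children.push(node);
      stack.push(node);
    },
    endElement: function (uri, localName, qName) {
      stack.pop();
    },
    characters: function (ch, start, length) {
      var children = current().children;
      var last = children.length - 1;
      // Merge with a preceding text chunk in case text arrives in pieces
      if (last >= 0 && typeof children[last] === 'string') {
        children[last] += ch;
      } else {
        children.push(ch);
      }
    }
  };

  return { root: root, handler: handler };
}

// Usage with the library:
// var builder = createTreeBuilder();
// parser.contentHandler = builder.handler;
// parser.parse('<p>I am a very small HTML document</p>');
// console.log(JSON.stringify(builder.root, null, 2));
```

A stack of open elements is enough here because the parser delivers repaired, well-formed output, so every startElement is matched by an endElement.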

@eGavr

eGavr commented Aug 10, 2014

Is it in v1.0.3?

@aredridel
Owner

Yep.
