
Improve handler #111

Open
eGavr opened this issue Aug 9, 2014 · 15 comments

Comments

@eGavr

eGavr commented Aug 9, 2014

Besides, what about this situation:

<tag></tag>

and

<tag/>

?

It seems the contentHandler treats them in exactly the same way! Yes, they are identical for a browser, but from a parsing point of view they are not identical, are they?

@aredridel
Owner

They are identical: the HTML5 parser only concerns itself with parsing in order to construct a DOM.

@eGavr
Author

eGavr commented Aug 10, 2014

Are you going to fix this situation?

@aredridel
Owner

Does it need to be fixed? What's the use-case?

@eGavr
Author

eGavr commented Aug 10, 2014

Yes!
For example, when I want to transform my DOM tree back to HTML!

In the cases I've shown above, SAXParser treats them in the same way!
That seems a little unfair on the part of your SAXParser.

@danyaPostfactum
Collaborator

SAXParser reports element starts and element ends, not start tags and end tags. That's all.

> Yes, they are identical for a browser, but from a parsing point of view they are not identical

If you really need low-level parsing info, you can use Tokenizer.

> For example, when I want to transform my DOM tree back to HTML!

There is a limited set of void elements, so it is easy to serialize: void elements get only a start tag, and every other element gets an end tag. See http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments
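A minimal sketch of that idea, assuming a hypothetical event sink with `openElement`, `text`, and `closeElement` methods (illustrative names only, not html5's actual contentHandler interface). Void elements get a start tag only; every other element gets a matching end tag. Because `<tag></tag>` and `<tag/>` produce the same open/close events, both serialize to `<tag></tag>`, which is exactly the behaviour discussed above. Raw-text elements such as `script` and `style`, comments, and SVG/MathML content are left out for brevity.

```javascript
// Void elements from the WHATWG serialization section linked above (2014 list).
const VOID_ELEMENTS = new Set([
  'area', 'base', 'basefont', 'bgsound', 'br', 'col', 'embed', 'frame', 'hr',
  'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param', 'source',
  'track', 'wbr',
]);

// Attribute values: escape "&", U+00A0 (no-break space), and the double quote.
function escapeAttribute(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/\u00a0/g, '&nbsp;')
    .replace(/"/g, '&quot;');
}

// Text nodes: escape "&", U+00A0, "<", and ">".
function escapeText(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/\u00a0/g, '&nbsp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical event sink; the method names are illustrative only.
class MiniHtmlSerializer {
  constructor() {
    this.parts = [];
  }

  // attributes: array of [name, value] pairs
  openElement(name, attributes = []) {
    let tag = '<' + name;
    for (const [attrName, value] of attributes) {
      tag += ' ' + attrName + '="' + escapeAttribute(value) + '"';
    }
    this.parts.push(tag + '>');
  }

  text(data) {
    this.parts.push(escapeText(data));
  }

  // Void elements never get an end tag, however the source was written.
  closeElement(name) {
    if (!VOID_ELEMENTS.has(name)) {
      this.parts.push('</' + name + '>');
    }
  }

  toString() {
    return this.parts.join('');
  }
}

// Usage: the events a parser would report for <p class="x">a &lt; b<br/></p>
const out = new MiniHtmlSerializer();
out.openElement('p', [['class', 'x']]);
out.text('a < b');
out.openElement('br');
out.closeElement('br');
out.closeElement('p');
console.log(out.toString()); // <p class="x">a &lt; b<br></p>
```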

@danyaPostfactum
Collaborator

Here is an example of producing HTML from SAX events:
https://gist.github.com/danyaPostfactum/ee94c3bf88b99fb94c4b
Usage:

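```javascript
// SAXParser comes from the html5 package; HtmlSerializer comes from the gist above.
var SAXParser = require('html5').SAXParser;
var HtmlSerializer = require('./HtmlSerializer').HtmlSerializer;

// The serialized HTML is written to out.html.
var outStream = require('fs').createWriteStream("out.html");

var parser = new SAXParser();
var serializer = new HtmlSerializer(outStream);

// The serializer receives both content events (elements, text)
// and lexical events (such as comments).
parser.contentHandler = parser.lexicalHandler = serializer;

parser.parse('...');
```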

@eGavr
Author

eGavr commented Aug 11, 2014

But how can I tell whether a tag is self-closing?

@danyaPostfactum
Collaborator

Just check whether its name is one of the void elements: area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, menuitem, meta, param, source, track, or wbr.

@eGavr
Author

eGavr commented Aug 11, 2014

But what if someone is a bad person and wants to parse invalid input?

<bra/>, for example?

@eGavr
Author

eGavr commented Aug 11, 2014

Thank you for the list of self-closing tags!

@danyaPostfactum
Collaborator

> <bra/>, for example?

According to the spec, it will be interpreted as <bra>. You can check this in your browser.
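Under the WHATWG tree-construction rules, a trailing `/` on a start tag is only acknowledged for void elements (and for SVG/MathML elements, which this sketch ignores). On any other HTML element it is a parse error and the element is inserted as if the slash were absent, so `<bra/>` opens an element that stays open. The hypothetical helper below models just that one rule for HTML-namespace elements; it is not part of html5's API.

```javascript
// Void elements, per the list quoted in this thread.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'basefont', 'bgsound', 'br', 'col', 'embed', 'frame', 'hr',
  'img', 'input', 'keygen', 'link', 'menuitem', 'meta', 'param', 'source',
  'track', 'wbr',
]);

// Hypothetical helper: how the tree builder treats an HTML start tag.
// Foreign (SVG/MathML) content is out of scope here.
function interpretStartTag(name, selfClosing) {
  const isVoid = VOID_ELEMENTS.has(name.toLowerCase());
  return {
    // Void elements never have children, so they are closed immediately.
    closedImmediately: isVoid,
    // "/>" on a non-void element is not acknowledged: parse error,
    // and the element stays open as if written <name>.
    parseError: selfClosing && !isVoid,
  };
}

console.log(JSON.stringify(interpretStartTag('bra', true))); // {"closedImmediately":false,"parseError":true}
console.log(JSON.stringify(interpretStartTag('br', true)));  // {"closedImmediately":true,"parseError":false}
```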

> Thank you for the list of self-closing tags!

See http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#serialising-html-fragments

@eGavr
Author

eGavr commented Aug 11, 2014

But what if I try to parse this:

<br></br>

For a browser it will be <br>, but what will I get after serialization?

In both cases I will get the same output, but the two inputs were not the same.

Maybe you could add a parameter to one of the contentHandler's methods that is true if the tag is self-closing?
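As far as this thread shows, SAXParser exposes no such flag. If the goal is only to know whether the source used `/>` syntax, a rough pre-scan of the raw markup can report it. The sketch below is hypothetical and regex-based, not an HTML tokenizer: it does not understand comments, `<script>`/`<style>` contents, or a `>` inside a quoted attribute value, and it misreports an unquoted attribute value ending in `/` (e.g. `<a href=/x/>`). For anything serious, working at the tokenizer level is the better route.

```javascript
// Hypothetical, rough scanner for start tags in raw HTML source.
// It only approximates the HTML tokenizer (see the caveats above).
function findStartTags(source) {
  const tags = [];
  // "<" followed by a letter starts a start tag; capture the name and the rest up to ">".
  const startTag = /<([a-zA-Z][^\s\/>]*)([^>]*)>/g;
  let match;
  while ((match = startTag.exec(source)) !== null) {
    tags.push({
      name: match[1].toLowerCase(),
      // True when the tag was written with a trailing "/>".
      selfClosing: match[2].trimEnd().endsWith('/'),
      offset: match.index,
    });
  }
  return tags;
}

console.log(findStartTags('<tag></tag>')[0].selfClosing); // false
console.log(findStartTags('<tag/>')[0].selfClosing);      // true
console.log(findStartTags('<br></br>').length);           // 1 ("</br>" is an end tag, not a start tag)
```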

@danyaPostfactum
Collaborator

The parser will ignore the </br> tag.

> In both cases I will get the same output, but the two inputs were not the same.

Yes, invalid markup will be repaired; I already mentioned that. Even valid input markup may not match the serialized output. Could you explain how you want to use the parser? You probably need another tool.

@eGavr
Author

eGavr commented Aug 11, 2014

For example, I want to check the validity of the input or, as in my case, compare two HTML documents!
I need to check the HTML exactly as it is written!

@danyaPostfactum
Collaborator

> For example, I want to check the validity of the input

This parser is used in http://ace.c9.io/build/kitchen-sink.html (select HTML mode) for syntax checking.

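```javascript
// Called for each parse error encountered while parsing.
parser.errorHandler = {
    error: function(message, location, code) {
        // Parse error: report or collect `message` here.
    }
};
```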

> I need to check the HTML exactly as it is written!

I'm not sure what you mean. I guess you would have to write your own parser.


3 participants