Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax <select> parser #10557

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

josepharhar
Copy link
Contributor

@josepharhar josepharhar commented Aug 13, 2024

This patch makes the parser allow additional tags in <select> besides <option>, <optgroup>, and <hr>, mostly by removing the "in select" and "in select in table" parser modes.

In order to replicate the behavior where opening a <select> tag within another open <select> tag inserts a </select> close tag, a traversal through the stack of open elements was added which I borrowed from the <button> part of the parser.

This patch also changes the processing model to make <select> look through all its descendants in the DOM tree for <option> elements, rather than just children and optgroup children which conform to the content model. This is needed for compat reasons because there are websites which put other tags in between their <select> and <option>s which would break with this parser change unless we also update this processing model. More context here and here.

Fixes #10310

(See WHATWG Working Mode: Changes for more details.)


/form-elements.html ( diff )
/index.html ( diff )
/infrastructure.html ( diff )
/parsing.html ( diff )

@annevk
Copy link
Member

annevk commented Aug 14, 2024

Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.

@josepharhar
Copy link
Contributor Author

Why does this also make a bunch of changes to the processing model of select elements? If that needs to be part of this change that should really be better motivated in the commit message.

The changes in the processing model do support the new proposed content model for select, but they also mitigate compat issues for websites which are already putting tags in between <select> and <option> in their HTML.

For example, without these changes to the processing model, the following <select> would not register as having any options at all:

<select>
  <div>
    <option>...</option>
    ...
  </div>
</select>

In my compat analysis, I found a lot of websites which are doing this, so in order to ship the parser changes separately from customizable select in chrome, I plan to ship the parser changes and these processing model changes together, because otherwise there would be too much breakage.

I'm happy to put them in a separate PR if you want, or keep them here and update the commit message (sorry for not putting it in there). Which would you prefer?

@annevk
Copy link
Member

annevk commented Aug 16, 2024

Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.

@zcorpan
Copy link
Member

zcorpan commented Aug 19, 2024

This doesn't define optional tags for </option> and </optgroup> correctly.

The definition for "have a particular element in select scope" may be needed for that, but should be changed to be similar to "have a particular element in button scope" (but for select).

In particular, allow these without a parse error:

<select>
<optgroup>
<option>
<optgroup>
</select>

The second <optgroup> should pop the option and optgroup

<select>
<option><p>
<option>
</select>

This should generate implied end tags and pop the option.

<select>
<optgroup>
<hr>
<option>
<hr>
</select>

The hrs should pop the optgroup and option in a select.

See how the parser deals with <ruby><rtc><rt>, I think that can be used as a model for select.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Show resolved Hide resolved
source Show resolved Hide resolved
@josepharhar
Copy link
Contributor Author

Thanks! Given that rationale I think it's good to couple the changes, but that should be in the commit message as well.

Done.

This doesn't define optional tags for </option> and </optgroup> correctly.

@zcorpan thanks for the feedback! I did some experimenting and added some logic to the start tags for option, optgroup, and hr. What do you think?

@zcorpan
Copy link
Member

zcorpan commented Aug 22, 2024

  • Need to check that a select element is in scope so that parsing of option/optgroup tags outside of select doesn't change. Example <option><p>1</option>2<option>3
  • Need to add select to the "in scope" list so that <div><select></div><option> doesn't close the select.
  • Should report parse errors when unexpected elements are popped. Compare with what the spec does for rt/rtc in ruby. Example <select><option><span>1</option>2<option>3

@josepharhar
Copy link
Contributor Author

  • Need to check that a select element is in scope so that parsing of option/optgroup tags outside of select doesn't change. Example <option><p>1</option>2<option>3

I thought that this is covered by "While the stack of open elements has an option element in select scope". What exactly should I change?

  • Need to add select to the "in scope" list so that <div><select></div><option> doesn't close the select.

Done

  • Should report parse errors when unexpected elements are popped. Compare with what the spec does for rt/rtc in ruby. Example <select><option><span>1</option>2<option>3

I'm guessing this is from "If the current node is not now a rtc element or a ruby element, this is a parse error," right?

Should I add "If the current node is not now an option element, this is a parse error" after "While the stack of open elements has an option element in select scope, pop an element from the stack of open elements"?

@zcorpan
Copy link
Member

zcorpan commented Aug 23, 2024

The "in select scope" I think should be removed altogether since it assumes the stack will not have other elements when in a select, which is no longer the case. Use the normal "in scope" instead.

High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling <option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until the option or optgroup has been popped, then insert the new element.

I can look into this more next week and suggest more specific changes.

@josepharhar
Copy link
Contributor Author

High-level of what I think should happen: when parsing option or optgroup start tag: check that select is in scope, check that option or optgroup is in scope, generate implied end tags (except for optgroup when handling <option>), check the current node or the stack of open elements again, report a parse error if appropriate, then pop elements off the stack until the option or optgroup has been popped, then insert the new element.

Thanks! I gave this a try

@josepharhar
Copy link
Contributor Author

@zcorpan how does the latest text look?

Copy link
Member

@zcorpan zcorpan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the option/optgroup cases are right after these changes.

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
source Outdated
Comment on lines 129474 to 129485
<ol>
<li><p><span>Generate implied end tags</span> except for <code>option</code> and
<code>optgroup</code> elements.</p></li>

<li><p>If the <span>current node</span> is not now an <code>option</code> or
<code>optgroup</code> element, this is a <span>parse error</span>.</p></li>

<li><p>While the <span>stack of open elements</span> <span data-x="has an element in scope">has
an <code>option</code> element in scope</span> or <span data-x="has an element in scope">has an
<code>optgroup</code> element in scope</span>, pop an element from the <span>stack of open
elements</span>.</p></li>
</ol>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should report a parse error if there's an element between the optgroup and option in the stack:

<select>
 <optgroup>
  <div>
   <option>
 <optgroup>

This can be done by popping the option element if that is the current node (after generating implied end tags except for option and optgroup), then generate implied end tags except for optgroup elements, parse error if the current node is not an optgroup element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. how does it look?

source Outdated Show resolved Hide resolved
@josepharhar
Copy link
Contributor Author

This doesn't define optional tags for </option> and </optgroup> correctly.

I was talking to @mfreed7 about the changes we've made in the PR so far, and I feel like I couldn't provide a good explanation for why we are making the parser not support cases like these:

  • <hr> inside an <option>
  • nested <optgroup>s

Is it just compat reasons? Is there a good justification?

This patch makes the parser allow additional tags in <select> besides
<option>, <optgroup>, and <hr>, mostly by removing the "in select" and
"in select in table" parser modes.

In order to replicate the behavior where opening a <select> tag within
another open <select> tag inserts a </select> close tag, a traversal
through the stack of open elements was added which I borrowed from the
<button> part of the parser.

This will need test changes to be implemented in html5lib.

Fixes whatwg#10310
@zcorpan
Copy link
Member

zcorpan commented Sep 11, 2024

It would be a breaking change from what is conforming HTML today, and break compat for sites that omit </option> and </optgroup> tags. It's the same as why we can't allow certain elements in p.

@josepharhar
Copy link
Contributor Author

Generating all implied end tags with no exceptions in the optgroup case disallows nested optgroups, but I guess I'm ok with that since we can make additional changes to allow it later.

I'll apply the changes in here so far to the chromium implementation and start working on html5lib tests so we can get this merged.

@zcorpan
Copy link
Member

zcorpan commented Sep 18, 2024

Great.

hr no longer closing option and optgroup is a behavior change from what the parser currently does, which I'm somewhat uncomfortable with. It makes currently valid HTML invalid. @hsivonen @annevk WDYT?

@annevk
Copy link
Member

annevk commented Sep 19, 2024

I'm not a big fan of breaking conforming HTML. Especially as there doesn't appear to be a compelling reason from reading the thread.

I recommend not using "grandfathering" by the way, it has a rather bad history: https://en.wikipedia.org/wiki/Grandfather_clause

Copy link
Member

@hsivonen hsivonen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should retain existing behavior for hr both for imaginable Web compat and for the principle of avoiding breaking existing conforming HTML.

I'm marking this as Request changes for that reason, but I admit that I'm not confident from looking at the spec diff alone that I can understand all other implications of this spec change, so there may well be other issues.

@josepharhar
Copy link
Contributor Author

Ok, I pushed a change to disallow hr in option. How does it look now?

@josepharhar
Copy link
Contributor Author

@zcorpan @hsivonen are there any other changes you think should be made?

@josepharhar
Copy link
Contributor Author

I just re-added some cases to option and optgroup parsing to continue supporting auto-closing tags outside of a select tag to support cases like this, which I found in a test case here:

const select = document.createElement('select');
select.innerHTML = '<option>one<option>two';

source Outdated Show resolved Hide resolved
source Outdated Show resolved Hide resolved
@@ -129438,6 +129444,8 @@ document.body.appendChild(text);

<li><p>If the <span>current node</span> is an <code>optgroup</code> element, then pop that node
from the <span>stack of open elements</span>.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "Otherwise" steps should only pop an option element if it's the current node to match the current spec.

     <li><p>If the <span>current node</span> is an <code>option</code> element, then pop the
     <span>current node</span> off the <span>stack of open elements</span>.</p></li>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I just realized that I messed up this part and copied it from the end tag steps instead of the start tag steps, as well as the formatting step for the option start tag. I just rewrote this part here, how does it look? 33370b7

Copy link
Member

@dbaron dbaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments I just noticed:

source Show resolved Hide resolved
source Show resolved Hide resolved
Comment on lines +129305 to +129309
<li><p><span>Generate implied end tags</span> except for <code>optgroup</code>
elements.</p></li>

<li><p>If the <span>stack of open elements</span> <span data-x="has an element in scope">has an
<code>option</code> element in scope</span>, then this is a <span>parse error</span>.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"hr" should be changed to match what "optgroup" does (when select is in scope)

If the stack of open elements has a select element in scope , then:

Generate implied end tags .

If the stack of open elements has an option element in scope or has an optgroup element in scope , then this is a parse error .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...to more closely match the current spec. However, if there's a desire to change this for #9247 then the current text (which is like "option") is better. cc @annevk

@@ -129413,8 +129413,6 @@ document.body.appendChild(text);
<ol>
<li><p>If the <span>current node</span> is an <code>option</code> element, then pop the
<span>current node</span> off the <span>stack of open elements</span>.</p></li>

<li><p><span>Reconstruct the active formatting elements</span>, if any.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I re-added it to the "otherwise" case.

Do you think we should also run this in the not-"otherwise" case when we are in a select tag?

If we parse this html:

<select><div><i></div><option>option

We will get this resulting innerHTML if we reconstruct in both cases:

<select><div><i></i></div><i><option>option</option></i></select><i>

Otherwise if we only reconstruct in the "otherwise" case, we get this:

<select><div><i></i></div><option><i>option</i></option></select><i>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing the same thing in both cases for consistency sounds somewhat appealing to me

source Outdated Show resolved Hide resolved
source Outdated

<li><p><span>Reconstruct the active formatting elements</span>, if any.</p></li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-added, but leaving this open to make sure we match the same behavior for option and optgroup based on the outcome of the other thread

Comment on lines +54153 to +54154
<p>For each <var>ancestor</var> of <var>insertedOption</var>'s <span
data-x="ancestor">ancestors</span>:</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to define ordering here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that would be good to clarify. I've seen "in tree order" appended to loops like this before, but I'm not sure that's appropriate here - I want to start at the parent of insertedOption, and then keep going up from there. I'm worried that saying "in tree order" could imply the opposite.

Should I say "starting from the parent of insertedOption"?

josepharhar added a commit to josepharhar/html5lib-tests that referenced this pull request Oct 11, 2024
This PR updates the tree-construction dat files for the HTML change
which will allow additional tags within <select>:
whatwg/html#10557
@josepharhar
Copy link
Contributor Author

I created an html5lib PR to update tests: html5lib/html5lib-tests#178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

HTML parser changes for customizable <select>
7 participants