Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambiguate XPaths for children of BODY with id, class, or role attributes #1797

Draft
wants to merge 8 commits into
base: trunk
Choose a base branch
from

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 13, 2025

This is a follow-up to #1790 which fixed #1787.

It is now possible for a generated XPath may not uniquely identify an element in a page. Consider the following example:

  1. A theme uses a DIV for header and footer, <div id="header" role="banner"> and <div id="footer" role="contentinfo">, which are direct children of the BODY.
  2. There is an element of the same tag name in the exact same relative position in both of these DIV:
<html>
  <body>
    <div id="header" role="banner">
      <img src="header-image.jpg">
    </div>
    <div id="footer" role="contentinfo">
      <img src="footer-logo.png">
    </div>
  </body>
</html>

In this case, both of the IMG tags here will have the same XPath: /HTML/BODY/DIV/*[1][self::IMG].

I did a search for <div[^>]*role=["']?banner in WPDirectory and the only example I could find with a header and footer being direct children of BODY is the Prose theme (2,000 active installs and not updated in 12 years): https://themes.trac.wordpress.org/browser/prose/1.1/header.php

Granted, this search assumes that a theme is using the right ARIA roles. It wouldn't account for someone using <div id="header">.

As noted in #1790 (comment):

we could instead look at other ways to disambiguate children of the BODY, such as to capture the id, class, and role attributes to include with the XPath for that level alone.

This is experimented on in this PR, by adding the id, class, and role attributes to the XPaths generated for children of the BODY element. So in the example above, instead of both elements having the same XPath of /HTML/BODY/DIV/*[1][self::IMG] they would then be distinct:

  1. /HTML/BODY/DIV[@id='header'][@role='banner']/*[1][self::IMG]
  2. /HTML/BODY/DIV[@id='footer'][@role='contentinfo']/*[1][self::IMG]

The key changes in this PR are found in plugins/optimization-detective/class-od-html-tag-processor.php. All the other changes are adjustments to the tests.

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) labels Jan 13, 2025
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 65.85366% with 14 lines in your changes missing coverage. Please review.

Project coverage is 57.39%. Comparing base (6ca5c4b) to head (385d3bf).

Files with missing lines Patch % Lines
...mization-detective/class-od-html-tag-processor.php 65.85% 14 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            trunk    #1797   +/-   ##
=======================================
  Coverage   57.38%   57.39%           
=======================================
  Files          84       84           
  Lines        6517     6549   +32     
=======================================
+ Hits         3740     3759   +19     
- Misses       2777     2790   +13     
Flag Coverage Δ
multisite 57.39% <65.85%> (+<0.01%) ⬆️
single 34.34% <0.00%> (-0.17%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@westonruter
Copy link
Member Author

westonruter commented Jan 13, 2025

Oh, I just discovered that the Image block with a lightbox is involved here. With the changes in this PR applied, WordPress outputs a div.wp-lightbox-overlay.zoom element at wp_footer via block_core_image_print_lightbox_overlay():

<div
        class="wp-lightbox-overlay zoom"
        data-wp-interactive="core/image"
        data-wp-context='{}'
        data-wp-bind--role="state.roleAttribute"
        data-wp-bind--aria-label="state.currentImage.ariaLabel"
        data-wp-bind--aria-modal="state.ariaModal"
        data-wp-class--active="state.overlayEnabled"
        data-wp-class--show-closing-animation="state.showClosingAnimation"
        data-wp-watch="callbacks.setOverlayFocus"
        data-wp-on--keydown="actions.handleKeydown"
        data-wp-on-async--touchstart="actions.handleTouchStart"
        data-wp-on--touchmove="actions.handleTouchMove"
        data-wp-on-async--touchend="actions.handleTouchEnd"
        data-wp-on-async--click="actions.hideLightbox"
        data-wp-on-async-window--resize="callbacks.setOverlayStyles"
        data-wp-on-async-window--scroll="actions.handleScroll"
        tabindex="-1"
>
    <button type="button" aria-label="Close" style="fill: var(--wp--preset--color--contrast)" class="close-button">
        <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" width="20" height="20" aria-hidden="true" focusable="false"><path d="m13.06 12 6.47-6.47-1.06-1.06L12 10.94 5.53 4.47 4.47 5.53 10.94 12l-6.47 6.47 1.06 1.06L12 13.06l6.47 6.47 1.06-1.06L13.06 12Z"></path></svg>
    </button>
    <div class="lightbox-image-container">
        <figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
            <img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class=&#039;wp-lightbox-overlay zoom&#039;]/*[2][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.currentImage.currentSrc">
        </figure>
    </div>
    <div class="lightbox-image-container">
        <figure data-wp-bind--class="state.currentImage.figureClassNames" data-wp-bind--style="state.figureStyles">
            <img class="od-missing-dimensions" data-od-added-class data-od-xpath="/HTML/BODY/DIV[@class=&#039;wp-lightbox-overlay zoom&#039;]/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]" data-wp-bind--alt="state.currentImage.alt" data-wp-bind--class="state.currentImage.imgClassNames" data-wp-bind--style="state.imgStyles" data-wp-bind--src="state.enlargedSrc">
        </figure>
    </div>
    <div class="scrim" style="background-color: var(--wp--preset--color--base)" aria-hidden="true"></div>
    <style data-wp-text="state.overlayStyles"></style>
</div>

Note the XPath for the IMG in the overlay:

/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

Without the disambiguation, this would be:

/HTML/BODY/DIV/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

This could be confused with an Image block appearing somewhere especially when DIV is used to wrap the header, main content, footer, or sidebar.

Important

If this change is made to add the disambiguation predicates to /HTML/BODY/DIV this will result in invalidating all existing XPaths for currently stored URL Metrics.

To ease this migration, we may want to add an od_is_xpath_equal( $xpath1, $xpath2 ) helper function which would check to see if a provided XPath contains the old node index up to the children of the BODY. If so, the comparison can fall back to stripping out both the node indices as well as any attribute predicates. The logic added to the OD_Element constructor to convert the XPath could be moved to this new helper function:

// Convert old XPath scheme.
$xpath = preg_replace(
'#^/\*\[1\]\[self::HTML\]/\*\[2\]\[self::BODY\]/\*\[\d+\]\[self::([a-zA-Z0-9:_-]+)\]#',
'/HTML/BODY/$1',
$this->data['xpath']
);
if ( is_string( $xpath ) && '' !== $xpath ) {
$this->data['xpath'] = $xpath;
}

Then anywhere that we compare XPaths we should use this helper function as opposed to using ===.

Update: This is not so straightforward. For example, \OD_URL_Metric_Group_Collection::is_element_positioned_in_any_initial_viewport() is provided an XPath as the argument. The XPath is then fetched from a hash lookup of the mappings returned by \OD_URL_Metric_Group_Collection::get_all_elements_positioned_in_any_initial_viewport().

This is also the case for code that uses \OD_URL_Metric_Group_Collection::get_xpath_elements_map() or \OD_URL_Metric_Group_Collection::get_all_element_max_intersection_ratios(), and so on.

Up until now, the XPath has been very constrained in its format, so it has been a reliably fixed string for comparisons. Changing the XPath construction has impacts that need to be carefully evaluated.

@felixarntz
Copy link
Member

@westonruter I'm not sure I fully understand.

Oh, I just discovered that the Image block with a lightbox is involved here. With the changes in this PR applied, WordPress outputs a div.wp-lightbox-overlay.zoom element at wp_footer via block_core_image_print_lightbox_overlay():

Are you saying this is only inserted with this change? Why that? OD shouldn't alter which DOM elements are inserted by any features right?

Note the XPath for the IMG in the overlay:

/HTML/BODY/DIV[@class='wp-lightbox-overlay zoom']/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

Without the disambiguation, this would be:

/HTML/BODY/DIV/*[3][self::DIV]/*[1][self::FIGURE]/*[1][self::IMG]

This could be confused with an Image block appearing somewhere especially when DIV is used to wrap the header, main content, footer, or sidebar.

So are you saying without this PR merged, there would be a risk of a duplicate? If so, it would be a good indicator that the risk we assessed as small on the other PR isn't actually that small and we should cater for it.

Important

If this change is made to add the disambiguation predicates to /HTML/BODY/DIV this will result in invalidating all existing XPaths for currently stored URL Metrics.

Why that? Would it invalidate even URL metrics that have an index (how they're currently stored, without the recent trunk changes)? As far as I understand, the URL metrics with an index don't have that problem, so I'd argue the disambiguation only applies to the URL metrics without indices on the body children.

@westonruter
Copy link
Member Author

Are you saying this is only inserted with this change? Why that? OD shouldn't alter which DOM elements are inserted by any features right?

No, I'm saying that this was already being added to the page by Gutenberg. I wasn't aware that Gutenberg was outputting lightbox elements at wp_footer, so it breaks the premise I had that all content is inside of DIV.wp-site-blocks or DIV#page.

So are you saying without this PR merged, there would be a risk of a duplicate? If so, it would be a good indicator that the risk we assessed as small on the other PR isn't actually that small and we should cater for it.

Correct.

Why that? Would it invalidate even URL metrics that have an index (how they're currently stored, without the recent trunk changes)? As far as I understand, the URL metrics with an index don't have that problem, so I'd argue the disambiguation only applies to the URL metrics without indices on the body children.

What I mean is that URL Metrics currently stored use the node index, for example /*[1][self::HTML]/*[2][self::BODY]/*[3][self::DIV]/*[2][self::IMG]. Prior to this PR, with #1790, such old XPaths were migrated to the new schema via OD_Element so this would instead be /HTML/BODY/DIV/*[2][self::IMG]. However, if we start introducing element attributes to disambiguate /HTML/BODY/DIV then we won't be able to so easily migrate since we won't know what the attributes are on /HTML/BODY/DIV. Like, we wouldn't know how to convert /*[1][self::HTML]/*[2][self::BODY]/*[3][self::DIV] to /HTML/BODY/DIV[@id='page']. That is, unless we capture the attributes of this element as we iterate over the document and then use them just in time.

@felixarntz
Copy link
Member

@westonruter Makes sense. So maybe we need to leave old data intact for now then and only migrate when we have all the context we need (i.e. in a request to the relevant URL). Since #1790 is not in any release yet, at least we don't have to worry about migrating from that. But we probably should fix it before this upcoming release.

This complicates things of course, but given we already found an exception in Core itself, this is probably going to happen more. Any plugin may add some div in wp_footer, e.g. for modals etc.

@westonruter
Copy link
Member Author

Makes sense. So maybe we need to leave old data intact for now then and only migrate when we have all the context we need (i.e. in a request to the relevant URL). Since #1790 is not in any release yet, at least we don't have to worry about migrating from that. But we probably should fix it before this upcoming release.

@felixarntz Well, I mean, "migration" will happen eventually as URL Metrics turn stale and get purged from the site. It would be ideal if a change to the XPath structure didn't result in this, however. It would be as if someone installed the plugin anew without any data collected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature
Projects
None yet
2 participants