Update inaccurate sizes query for 2024 #108

joemcgill · 2024-03-29T19:57:20Z

This is an update to the previous query created in 2022 to evaluate the impact of inaccurate image sizes attributes on WordPress sites using HTTPArchive data.

The main changes from the original query are:

Updates the query to use the new httparchive.all.pages table.
Reports percentages at every 10th percentile rather than only 10, 25, 50, 75, and 90.

See: https://github.com/GoogleChromeLabs/wpp-research/blob/main/sql/2022/12/inaccurate-sizes-attribute-impact.sql

Query results

percentile	client	sizesAbsoluteError	sizesRelativeError	idealSizesSelectedResourceEstimatedPixels	actualSizesEstimatedWastedLoadedPixels	relativeSizesEstimatedWastedLoadedPixels	idealSizesSelectedResourceEstimatedBytes	actualSizesEstimatedWastedLoadedBytes	relativeSizesEstimatedWastedLoadedBytes
10	desktop	0	0.00%	22500	0	0.00%	3217.294704	0	0.00%
20	desktop	0	0.00%	38700	0	0.00%	6348.266602	0	0.00%
30	desktop	30	2.56%	60000	0	0.00%	10174	0	0.00%
40	desktop	92	21.15%	81510	0	0.00%	15106.58125	0	0.00%
50	desktop	168	46.75%	90000	0	0.00%	22073.49726	0	0.00%
60	desktop	287	76.47%	150000	959	0.83%	33262	283.5713028	0.83%
70	desktop	403	117.39%	240000	76464	77.78%	51765	11312.8272	77.78%
80	desktop	600	190.91%	360000	258560	221.92%	84253	37755.31432	221.92%
90	desktop	934	344.44%	559872	633600	611.11%	166977.4965	114393.7302	611.11%
10	mobile	0	0.00%	32400	0	0.00%	5159.87395	0	0.00%
20	mobile	13	0.00%	62500	0	0.00%	10203.16195	0	0.00%
30	mobile	31	8.43%	90000	0	0.00%	16965.07937	0	0.00%
40	mobile	50	12.50%	147456	0	0.00%	26480.16	0	0.00%
50	mobile	72	19.21%	230400	0	0.00%	40172.8267	0	0.00%
60	mobile	136	29.50%	320000	0	0.00%	59071.69489	0	0.00%
70	mobile	192	66.67%	421888	27900	13.78%	85978.125	3595.061728	13.78%
80	mobile	244	115.22%	589824	227500	124.84%	130197.1576	26660.31658	124.84%
90	mobile	360	168.66%	786432	504000	440.83%	242668	77222.4375	440.83%

joemcgill · 2024-03-29T19:59:59Z

@felixarntz I've taken a first pass at trying to update the previous query from #19 here. I assume that creating a new query in the 2024 folder is preferred to directly editing the previous query. I've not run this query directly, but have validated that it will run. I'm looking forward to your feedback.

Also, the metrics that are being processed from the payload are coming from this custom-metric definition for responsive images: https://github.com/HTTPArchive/custom-metrics/blob/main/dist/responsive_images.js

felixarntz

@joemcgill I left some suggestions regarding the query. While I realize much of it is based on the original query from 2022, I think it's worth simplifying and optimizing the query to focus on the data we currently care about (which FWIW also makes the query cheaper and faster to execute).

At the moment it includes some data points that aren't really important for the optimization of the sizes attribute.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

joemcgill · 2024-04-02T01:40:28Z

@felixarntz I've updated the query in 932f5f1 to make use of the custom_metrics column (I have no way of verifying that the structure is right due to quota limits) and return only data columns that I think we'll find useful:

percentile
client
sizesAbsoluteError
sizesRelativeError
idealSizesSelectedResourceEstimatedPixels
actualSizesEstimatedWastedLoadedPixels
idealSizesSelectedResourceEstimatedBytes
actualSizesEstimatedWastedLoadedBytes

It may also be useful to get a count of pages contained in each group. What do you think?

felixarntz

A bit more technical feedback on the query.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz · 2024-04-05T18:42:57Z

sql/2024/04/inaccurate-sizes-attribute-impact.sql

+      imgData.idealSizesSelectedResourceEstimatedPixels
+      imgData.actualSizesEstimatedWastedLoadedPixels,
+      imgData.idealSizesSelectedResourceEstimatedBytes
+      imgData.actualSizesEstimatedWastedLoadedBytes,


Replying to your feedback in #108 (comment), I do think we should also calculate the relative (%) values here, since aggregating those 4 fields individually per percentile doesn't carry a lot of meaningful information. The pixel and byte numbers are entirely dependent on the size of the respective images, and because we aggregate them individually per percentile at the end of the query, we don't get to see any relationship between them. Aggregating individually only tells us how many pixels/bytes are wasted between the smallest and largest pictures. I think relative values would be more helpful for this, because of course larger images lead to larger waste.

So I would suggest returning the following here:

sizesRelativeWastedLoadedPixels: actualSizesEstimatedWastedLoadedPixels / idealSizesSelectedResourceEstimatedPixels

sizesRelativeWastedLoadedBytes: actualSizesEstimatedWastedLoadedBytes / idealSizesSelectedResourceEstimatedBytes

And then get percentiles for those two.

FWIW, the same applies to sizesAbsoluteError and sizesRelativeError. The latter is probably more helpful in measuring eventual success, as absolute numbers are skewed by larger images and larger viewports. We may want to return all of the data (both absolute and relative), but the two would serve different purposes. To me the relative data appears more useful.

I guess I was assuming each quantile would be relative to the others, so we could easily calculate the % based on these numbers if we decided we wanted to. Perhaps I'm misunderstanding how these APPROX_QUANTILES values relate to each other.

I'm happy to add back relative values to the query, but am unsure how to best do so without being able to do some trial and error on the query itself. Would it be something like this?

CREATE TEMPORARY FUNCTION GET_IMG_SIZES_ACCURACY(custom_metrics STRING) RETURNS
  ARRAY<STRUCT<hasSrcset BOOL,
  hasSizes BOOL,
  sizesAbsoluteError INT64,
  sizesRelativeError FLOAT64,
  idealSizesSelectedResourceEstimatedPixels INT64,
  actualSizesEstimatedWastedLoadedPixels INT64,
  relativeSizesEstimatedWastedLoadedPixels FLOAT64,
  idealSizesSelectedResourceEstimatedBytes FLOAT64,
  actualSizesEstimatedWastedLoadedBytes FLOAT64,
  relativeSizesEstimatedWastedLoadedBytes FLOAT64>>
AS (
  ARRAY(
    SELECT AS STRUCT
      CAST(JSON_EXTRACT_SCALAR(image, '$.hasSrcset') AS BOOL) AS hasSrcset,
      CAST(JSON_EXTRACT_SCALAR(image, '$.hasSizes') AS BOOL) AS hasSizes,
      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesAbsoluteError') AS INT64) AS sizesAbsoluteError,
      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesRelativeError') AS FLOAT64) AS sizesRelativeError,
      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64) AS idealSizesSelectedResourceEstimatedPixels,
      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64) AS actualSizesEstimatedWastedLoadedPixels,
      SAFE_DIVIDE(
        CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64),
        CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64)
      ) AS relativeSizesEstimatedWastedLoadedPixels,
      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64) AS idealSizesSelectedResourceEstimatedBytes,
      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64) AS actualSizesEstimatedWastedLoadedBytes,
      SAFE_DIVIDE(
        CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64),
        CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64)
      ) AS relativeSizesEstimatedWastedLoadedBytes,
    FROM
      UNNEST(JSON_EXTRACT_ARRAY(custom_metrics, '$.responsive_images.responsive-images')) AS image
  )
);

To make ☝🏻 easier to test, I went ahead and pushed 1054716 as a first attempt at this.

This looks exactly right. Let me try to run the query to verify.

I guess I was assuming each quantile would be relative to the others, so we could easily calculate the % based on these numbers if we decided we wanted to. Perhaps I'm misunderstanding how these APPROX_QUANTILES values relate to each other.

The quantiles are only based on the values, but not relatively. So if you have 10 values, with 8 values being 1 and 2 values being 10, almost all quantiles would show the value 1. Of the percentiles displayed here only the 90th percentile would not show 1 but 10.
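
The 10-value example above can be checked with a small sketch. It is written in Python rather than BigQuery SQL, and it uses an exact nearest-rank percentile as a stand-in for APPROX_QUANTILES. That is an assumption: the real function is approximate and may behave slightly differently on large inputs. The helper name nearest_rank_percentile is hypothetical.

```python
def nearest_rank_percentile(values, percentile):
    """Return the smallest value with at least `percentile` percent of values at or below it."""
    ordered = sorted(values)
    # Ceiling division in integer arithmetic avoids float rounding surprises.
    rank = -(-percentile * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


# Eight images with an error of 1 and two images with an error of 10.
errors = [1] * 8 + [10] * 2

for p in range(10, 100, 10):
    print(p, nearest_rank_percentile(errors, p))
```

Under this definition the 10th through 80th percentiles all print 1, and only the 90th prints 10, even though 20% of the values are ten times larger than the rest.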

So my feedback to include relative values here is not related to the quantiles, but to the fact that the absolute sizes error depends heavily on the size of the images. From an error perspective though, we shouldn't give larger images more impact in our data assessment.

For example, if for an ideal image of 2000px or 1MB an image of 4000px or 2MB is loaded, absolutely speaking, this is a lot worse than for an ideal image of 200px or 100kB loading an image of 400px or 200kB. But relatively, it's the same: The actually loaded image is 100% larger (or 200% as large) compared to the ideal image.
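
To make the arithmetic in that example concrete, here is a small Python sketch. It assumes that waste is simply the loaded size minus the ideal size; the function name waste is hypothetical and not part of the query.

```python
def waste(ideal, loaded):
    """Return (absolute waste, waste relative to the ideal size)."""
    absolute = loaded - ideal
    return absolute, absolute / ideal


large = waste(ideal=2000, loaded=4000)  # pixel widths from the example
small = waste(ideal=200, loaded=400)

print(large)  # (2000, 1.0)
print(small)  # (200, 1.0)
```

The absolute waste differs by a factor of ten, while the relative waste is 1.0 (100%) in both cases.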

I see what you're saying. Showing relative waste gives us a different view of the effect that incorrect sizes attributes have. I'd anticipate that any additional constraint on max sizes values will likely have a greater impact on larger images, since those are the ones that are more likely to be shown when a smaller source is selected, but having both values will make it easier to confirm.

After further review, I have one point of feedback regarding:

SAFE_DIVIDE(
  CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64),
  CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64)
) AS relativeSizesEstimatedWastedLoadedPixels,
...
SAFE_DIVIDE(
  CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64),
  CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64)
) AS relativeSizesEstimatedWastedLoadedBytes,

I'm unsure the order of this division is right. I don't necessarily have the answer myself, and I'm still trying to decipher what the two values respectively are. To look at some real data, I ran the "sub query" part of this (i.e. without aggregating into percentiles) for a specific URL, in this case https://wordpress.org/. I've put the query results into this spreadsheet for us to look at. This basically has the data for every single image on that page.

What I see is that the actualSizes... value is 0 in many cases. I guess that means the sizes attribute leads to the ideal srcset being selected? Maybe we should check the actual page to verify whether that makes sense based on our understanding.

In my understanding so far, the actualSizes... is the amount of pixels/bytes that is wasted, and the idealSizes... is the amount of pixels/bytes based on the ideal available srcset. If that's the case, it would mean the sum of the two values is the amount of pixels/bytes actually loaded. Based on that, to get the relative wasted loaded pixels/bytes, we need to make the division the other way around (actualSizes... / idealSizes...). If the loaded image is a little too large, that would give us a small percentage, while an image twice as large would give us 100%, and one three times as large would give us 200%.

My understanding is that in both the cases of estimated wasted pixels and wasted bytes, idealSizesSelectedResource... represents the resource (from srcset) that would have been selected if the sizes value was accurate. Similarly, actualSizesEstimatedWasted... represents the difference between the ideal size and the size of the selected source. So the places where you're seeing 0 reported for actualSizesEstimatedWasted... means that even though the sizes attribute was inaccurate, there wasn't a better source available in the srcset.

That said, I think you're probably right that we would want to divide the estimated wasted bytes by the ideal selected to get a relative value of the wasted value.
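
A quick way to see why the order matters is to mimic SAFE_DIVIDE in Python and try both orders, including on a row where nothing is wasted. This is only a sketch: safe_divide is a hypothetical stand-in that returns None where BigQuery's SAFE_DIVIDE returns NULL for a zero divisor, and the sample numbers are made up.

```python
def safe_divide(numerator, denominator):
    """Mimic BigQuery SAFE_DIVIDE: None instead of a division-by-zero error."""
    if denominator == 0:
        return None
    return numerator / denominator


# Made-up rows: (idealSizes... pixels, actualSizes...Wasted... pixels).
rows = [(90000, 0), (90000, 45000), (90000, 90000)]

for ideal, wasted in rows:
    wasted_over_ideal = safe_divide(wasted, ideal)  # relative waste
    ideal_over_wasted = safe_divide(ideal, wasted)  # order from the first attempt
    print(wasted_over_ideal, ideal_over_wasted)
```

Dividing wasted by ideal yields 0.0, 0.5 and 1.0 for these rows, which reads naturally as 0%, 50% and 100% waste. The reverse order yields None for every image without waste and a number that shrinks as waste grows.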

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz · 2024-04-05T19:19:38Z

@joemcgill

It may also be useful to get a count of pages contained in each group.

Can you clarify what you mean by that? Do you mean the number of pages that fall into each percentile? IMO this would be difficult to intertwine with this query. FWIW we usually need multiple queries to answer all the questions we have. When it comes to measuring the opportunity in number of pages or WordPress sites, I think a separate query would be more helpful, maybe in a separate PR. We could write a query that groups the images by page and then gets e.g. the median wasted pixels or bytes, absolute or relative. And then overall get a distribution of that data, which would give us data like "x% of WordPress sites have a median sizes error of y% or worse".
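
As a rough illustration of that two-step idea (per-page median first, then a distribution across pages), here is a Python sketch. It is not the proposed query: the page URLs and relative waste values are made up, and share_at_or_above is a hypothetical helper.

```python
from statistics import median

# Made-up relative waste values (wasted / ideal) per image, keyed by page URL.
images_by_page = {
    "https://example.com/": [0.0, 0.5, 0.5],
    "https://example.org/": [0.25, 1.0, 2.0],
    "https://example.net/": [0.0, 1.5, 3.0],
    "https://example.com/blog/": [0.0, 0.0, 0.0],
}

# Step 1: one number per page, the median relative waste of its images.
page_medians = {page: median(values) for page, values in images_by_page.items()}


def share_at_or_above(threshold):
    """Fraction of pages whose median relative waste is at least `threshold`."""
    hits = sum(1 for value in page_medians.values() if value >= threshold)
    return hits / len(page_medians)


# Step 2: "x% of pages have a median waste of y% or worse".
for threshold in (0.5, 1.0):
    print(f"{share_at_or_above(threshold):.0%} of pages waste {threshold:.0%} or more")
```

Aggregating per page first means a single page with hundreds of images counts once, which is what a statement about "x% of sites" needs.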

felixarntz

@joemcgill One additional technical recommendation at a higher level: While we probably know JS better than BigQuery SQL, using JS in BigQuery is problematic for two reasons:

The queries are a lot slower as the BigQuery parsers and runners cannot optimize JS. It's like a black box to them.
For the same reason, error reporting is poor, which makes errors in the JS hard to deal with. For instance, BigQuery doesn't know whether the data types are correct, so failures are only found upon running the query, and you don't get a better message than that the JS failed, without any details.

We can replace the entire getImgSizesAccuracy(custom_metrics) function with native BigQuery SQL code to make the queries faster and improve DX, as any mistake you make would be reported in the BigQuery console (e.g. in the Google Cloud Project) as you're writing the query, not only when you run it.

I have rewritten the function, in the way it currently is, here. You can replace this 1:1 (only update the function name elsewhere, as I've made it follow the SQL best practice of the capitalized name):

CREATE TEMPORARY FUNCTION GET_IMG_SIZES_ACCURACY(custom_metrics STRING) RETURNS
  ARRAY<STRUCT<sizesAbsoluteError INT64,
  sizesRelativeError FLOAT64,
  idealSizesSelectedResourceEstimatedPixels INT64,
  actualSizesEstimatedWastedLoadedPixels INT64,
  idealSizesSelectedResourceEstimatedBytes FLOAT64,
  actualSizesEstimatedWastedLoadedBytes FLOAT64>>
AS (
  ARRAY(
    SELECT AS STRUCT
      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesAbsoluteError') AS INT64) AS sizesAbsoluteError,
      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesRelativeError') AS FLOAT64) AS sizesRelativeError,
      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64) AS idealSizesSelectedResourceEstimatedPixels,
      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64) AS actualSizesEstimatedWastedLoadedPixels,
      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64) AS idealSizesSelectedResourceEstimatedBytes,
      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64) AS actualSizesEstimatedWastedLoadedBytes
    FROM
      UNNEST(JSON_EXTRACT_ARRAY(custom_metrics, '$.responsive_images.responsive-images')) AS image
  )
);

I'd recommend that you do that, and afterwards incorporate any applicable suggestions from my previous review, which should be straightforward with this "template".

To explain the code a little:

JSON_EXTRACT_ARRAY() extracts JSON data into an array. Every single array entry is a JSON string itself, which is needed because BigQuery can't just "guess" the type of data in it.
We UNNEST() so that you effectively query the images almost like a table of image objects.
Since in our case we know the inner JSON of each item contains an object, we then call JSON_EXTRACT_SCALAR() on every field of the inner item that we need.
Last but not least we have to then cast them to the correct data type, as JSON_EXTRACT_SCALAR() returns everything as a string.
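
For readers more at home outside SQL, the same four steps can be mirrored in Python. This is a sketch under assumptions: the payload below is a hypothetical, heavily trimmed stand-in for the real custom_metrics value, its numbers are made up, and get_img_sizes_accuracy is only an illustration of the shape of the SQL function.

```python
import json

# Hypothetical, heavily trimmed custom_metrics payload for one page.
custom_metrics = json.dumps({
    "responsive_images": {
        "responsive-images": [
            {"sizesAbsoluteError": 168, "sizesRelativeError": 0.47,
             "idealSizesSelectedResourceEstimatedPixels": 90000,
             "actualSizesEstimatedWastedLoadedPixels": 0},
            {"sizesAbsoluteError": 403, "sizesRelativeError": 1.17,
             "idealSizesSelectedResourceEstimatedPixels": 240000,
             "actualSizesEstimatedWastedLoadedPixels": 76464},
        ]
    }
})


def get_img_sizes_accuracy(custom_metrics):
    """Return one typed dict per image, like the ARRAY<STRUCT<...>> in the SQL."""
    # JSON_EXTRACT_ARRAY + UNNEST: walk the array of image objects.
    images = json.loads(custom_metrics)["responsive_images"]["responsive-images"]
    return [
        {
            # JSON_EXTRACT_SCALAR + CAST: pick a field and force its type.
            "sizesAbsoluteError": int(image["sizesAbsoluteError"]),
            "sizesRelativeError": float(image["sizesRelativeError"]),
            "idealSizesSelectedResourceEstimatedPixels": int(
                image["idealSizesSelectedResourceEstimatedPixels"]),
            "actualSizesEstimatedWastedLoadedPixels": int(
                image["actualSizesEstimatedWastedLoadedPixels"]),
        }
        for image in images
    ]


for row in get_img_sizes_accuracy(custom_metrics):
    print(row)
```

One difference worth keeping in mind: Python's JSON parser already returns numbers, whereas JSON_EXTRACT_SCALAR() returns strings, so in the SQL version the CAST is required rather than a safeguard.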

joemcgill · 2024-04-10T18:37:59Z

Thanks @felixarntz. I think I've addressed most of the feedback except for adding back relative values, which I've asked about here. It's definitely challenging to get this right without being able to run these queries to see how the data is returned. I appreciate your help with this.

joemcgill · 2024-04-10T18:42:20Z

One other question that I had with the updated SQL is that JSON_EXTRACT_SCALAR and JSON_EXTRACT_ARRAY are both listed in the docs as legacy functions that are no longer recommended. Should we try to use JSON_VALUE and JSON_VALUE_ARRAY instead?

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz · 2024-04-10T21:02:07Z

sql/2024/04/inaccurate-sizes-attribute-impact.sql

+      imgData.idealSizesSelectedResourceEstimatedPixels
+      imgData.actualSizesEstimatedWastedLoadedPixels,
+      imgData.idealSizesSelectedResourceEstimatedBytes
+      imgData.actualSizesEstimatedWastedLoadedBytes,


This looks exactly right. Let me try to run the query to verify.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz · 2024-04-10T21:43:27Z

One other question that I had with the updated SQL is that JSON_EXTRACT_SCALAR and JSON_EXTRACT_ARRAY are both listed in the docs as legacy functions that are no longer recommended. Should we try to use JSON_VALUE and JSON_VALUE_ARRAY instead?

Potentially, though I don't personally feel strongly about that. I have to admit I haven't used the recommended functions myself, as I only found out a few days ago that the JSON_EXTRACT functions are no longer recommended. I'm not sure how straightforward the change would be (is it just a 1:1 replacement or does it work differently?), so maybe we stick with what I'm more familiar with for now? Once the query looks good to go in this way, we could try to update to the recommended functions and re-run to verify we get the same results.

joemcgill

Thanks @felixarntz. I've replied to both of your comments inline.

joemcgill · 2024-04-10T22:07:19Z

sql/2024/04/inaccurate-sizes-attribute-impact.sql

+      imgData.idealSizesSelectedResourceEstimatedPixels
+      imgData.actualSizesEstimatedWastedLoadedPixels,
+      imgData.idealSizesSelectedResourceEstimatedBytes
+      imgData.actualSizesEstimatedWastedLoadedBytes,


My understanding is that in both the cases of estimated wasted pixels and wasted bytes, idealSizesSelectedResource... represents the resource (from srcset) that would have been selected if the sizes value was accurate. Similarly, actualSizesEstimatedWasted... represents the difference between the ideal size and the size of the selected source. So the places where you're seeing 0 reported for actualSizesEstimatedWasted... means that even though the sizes attribute was inaccurate, there wasn't a better source available in the srcset.

That said, I think you're probably right that we would want to divide the estimated wasted bytes by the ideal selected to get a relative value of the wasted value.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz

@joemcgill This looks great, just one tiny problem. I ran the query and will put the results into the PR description now.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

felixarntz · 2024-04-11T20:51:30Z

sql/2024/04/inaccurate-sizes-attribute-impact.sql

+  percentile,
+  client


It might be a good idea to reverse this. I'm not sure whether in this particular situation it's useful to have the mobile and desktop results always next to each other, but usually we look at them as two independent lenses on the same data, so client is typically best placed first in ORDER BY.

joemcgill

I've made the changes you've recommended. Thanks for testing this out.

sql/2024/04/inaccurate-sizes-attribute-impact.sql

adamsilverstein

Nice!

westonruter · 2024-04-16T17:44:49Z

For reference, the diff between the old query and the new one:

--- old.sql	2024-04-16 10:41:31.794750809 -0700
+++ new.sql	2024-04-16 10:41:24.940750600 -0700
@@ -1,6 +1,6 @@
 # HTTP Archive query to measure impact of inaccurate sizes attributes per <img> for WordPress sites.
 #
-# WPP Research, Copyright 2022 Google LLC
+# WPP Research, Copyright 2024 Google LLC
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -13,81 +13,90 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+#
+# See https://github.com/GoogleChromeLabs/wpp-research/pull/108
+
+DECLARE DATE_TO_QUERY DATE DEFAULT '2024-03-01';
 
-# See query results here: https://github.com/GoogleChromeLabs/wpp-research/pull/19
-CREATE TEMPORARY FUNCTION
-  getSrcsetSizesAccuracy(payload STRING)
-  RETURNS ARRAY<STRUCT<sizesAbsoluteError INT64,
+CREATE TEMPORARY FUNCTION GET_IMG_SIZES_ACCURACY(custom_metrics STRING) RETURNS
+  ARRAY<STRUCT<hasSrcset BOOL,
+  hasSizes BOOL,
+  sizesAbsoluteError FLOAT64,
   sizesRelativeError FLOAT64,
-  wDescriptorAbsoluteError INT64,
-  wDescriptorRelativeError FLOAT64,
+  idealSizesSelectedResourceEstimatedPixels INT64,
   actualSizesEstimatedWastedLoadedPixels INT64,
+  relativeSizesEstimatedWastedLoadedPixels FLOAT64,
+  idealSizesSelectedResourceEstimatedBytes FLOAT64,
   actualSizesEstimatedWastedLoadedBytes FLOAT64,
-  wastedLoadedPercent FLOAT64>>
-  LANGUAGE js AS '''
-try {
-  var $ = JSON.parse(payload);
-  var responsiveImages = JSON.parse($._responsive_images);
-  responsiveImages = responsiveImages['responsive-images'];
-  return responsiveImages.map(({
-    sizesAbsoluteError,
-    sizesRelativeError,
-    wDescriptorAbsoluteError,
-    wDescriptorRelativeError,
-    idealSizesSelectedResourceEstimatedPixels,
-    actualSizesEstimatedWastedLoadedPixels,
-    actualSizesEstimatedWastedLoadedBytes
-  }) => {
-    let wastedLoadedPercent;
-    if ( idealSizesSelectedResourceEstimatedPixels > 0 ) {
-      wastedLoadedPercent = actualSizesEstimatedWastedLoadedPixels / idealSizesSelectedResourceEstimatedPixels;
-    } else {
-      wastedLoadedPercent = null;
-    }
-    return {
-      sizesAbsoluteError,
-      sizesRelativeError,
-      wDescriptorAbsoluteError,
-      wDescriptorRelativeError,
-      actualSizesEstimatedWastedLoadedPixels,
-      actualSizesEstimatedWastedLoadedBytes,
-      wastedLoadedPercent
-    };
-  }
+  relativeSizesEstimatedWastedLoadedBytes FLOAT64>>
+AS (
+  ARRAY(
+    SELECT AS STRUCT
+      CAST(JSON_EXTRACT_SCALAR(image, '$.hasSrcset') AS BOOL) AS hasSrcset,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.hasSizes') AS BOOL) AS hasSizes,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesAbsoluteError') AS FLOAT64) AS sizesAbsoluteError,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.sizesRelativeError') AS FLOAT64) AS sizesRelativeError,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64) AS idealSizesSelectedResourceEstimatedPixels,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64) AS actualSizesEstimatedWastedLoadedPixels,
+      SAFE_DIVIDE(
+        CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedPixels') AS INT64),
+        CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedPixels') AS INT64)
+      ) AS relativeSizesEstimatedWastedLoadedPixels,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64) AS idealSizesSelectedResourceEstimatedBytes,
+      CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64) AS actualSizesEstimatedWastedLoadedBytes,
+      SAFE_DIVIDE(
+        CAST(JSON_EXTRACT_SCALAR(image, '$.actualSizesEstimatedWastedLoadedBytes') AS FLOAT64),
+        CAST(JSON_EXTRACT_SCALAR(image, '$.idealSizesSelectedResourceEstimatedBytes') AS FLOAT64)
+      ) AS relativeSizesEstimatedWastedLoadedBytes,
+    FROM
+      UNNEST(JSON_EXTRACT_ARRAY(custom_metrics, '$.responsive_images.responsive-images')) AS image
+  )
+);
+
+CREATE TEMPORARY FUNCTION IS_CMS(technologies ARRAY<STRUCT<technology STRING, categories ARRAY<STRING>, info ARRAY<STRING>>>, cms STRING, version STRING) RETURNS BOOL AS (
+  EXISTS(
+    SELECT * FROM UNNEST(technologies) AS technology, UNNEST(technology.info) AS info
+    WHERE technology.technology = cms
+    AND (
+      version = ""
+      OR ENDS_WITH(version, ".x") AND (STARTS_WITH(info, RTRIM(version, "x")) OR info = RTRIM(version, ".x"))
+      OR info = version
+    )
+  )
 );
-} catch (e) {
-  return [];
-}
-''';
+
+WITH wordpressSizesData AS (
 SELECT
-  percentile,
   client,
-  APPROX_QUANTILES(image.sizesAbsoluteError, 1000)[OFFSET(percentile * 10)] AS sizesAbsoluteError,
-  APPROX_QUANTILES(image.sizesRelativeError, 1000)[OFFSET(percentile * 10)] AS sizesRelativeError,
-  APPROX_QUANTILES(image.wDescriptorAbsoluteError, 1000)[OFFSET(percentile * 10)] AS wDescriptorAbsoluteError,
-  APPROX_QUANTILES(image.wDescriptorRelativeError, 1000)[OFFSET(percentile * 10)] AS wDescriptorRelativeError,
-  APPROX_QUANTILES(image.actualSizesEstimatedWastedLoadedPixels, 1000)[OFFSET(percentile * 10)] AS actualSizesEstimatedWastedLoadedPixels,
-  APPROX_QUANTILES(image.actualSizesEstimatedWastedLoadedBytes, 1000)[OFFSET(percentile * 10)] AS actualSizesEstimatedWastedLoadedBytes,
-  APPROX_QUANTILES(image.wastedLoadedPercent, 1000)[OFFSET(percentile * 10)] AS wastedLoadedPercent
-FROM (
-  SELECT
-    tpages._TABLE_SUFFIX AS client,
     image
   FROM
-    `httparchive.pages.2022_10_01_*` AS tpages,
-    UNNEST(getSrcsetSizesAccuracy(payload)) AS image
-  JOIN
-    `httparchive.technologies.2022_10_01_*` AS tech
-  ON
-    tech.url = tpages.url
+    `httparchive.all.pages`,
+    UNNEST(GET_IMG_SIZES_ACCURACY(custom_metrics)) AS image
   WHERE
-    tpages._TABLE_SUFFIX = tech._TABLE_SUFFIX
-    AND app = 'WordPress'
-    AND category = 'CMS' ),
-  UNNEST([10, 25, 50, 75, 90]) AS percentile
+    date = DATE_TO_QUERY
+    AND IS_CMS(technologies, 'WordPress', '')
+    AND is_root_page = TRUE
+    AND image.hasSrcset = TRUE
+    AND image.hasSizes = TRUE
+)
+
+SELECT
+  percentile,
+  client,
+  APPROX_QUANTILES(image.sizesAbsoluteError, 100)[OFFSET(percentile)] AS sizesAbsoluteError,
+  APPROX_QUANTILES(image.sizesRelativeError, 100)[OFFSET(percentile)] AS sizesRelativeError,
+  APPROX_QUANTILES(image.idealSizesSelectedResourceEstimatedPixels, 100)[OFFSET(percentile)] AS idealSizesSelectedResourceEstimatedPixels,
+  APPROX_QUANTILES(image.actualSizesEstimatedWastedLoadedPixels, 100)[OFFSET(percentile)] AS actualSizesEstimatedWastedLoadedPixels,
+  APPROX_QUANTILES(image.relativeSizesEstimatedWastedLoadedPixels, 100)[OFFSET(percentile)] AS relativeSizesEstimatedWastedLoadedPixels,
+  APPROX_QUANTILES(image.idealSizesSelectedResourceEstimatedBytes, 100)[OFFSET(percentile)] AS idealSizesSelectedResourceEstimatedBytes,
+  APPROX_QUANTILES(image.actualSizesEstimatedWastedLoadedBytes, 100)[OFFSET(percentile)] AS actualSizesEstimatedWastedLoadedBytes,
+  APPROX_QUANTILES(image.relativeSizesEstimatedWastedLoadedBytes, 100)[OFFSET(percentile)] AS relativeSizesEstimatedWastedLoadedBytes,
+FROM
+  wordpressSizesData,
+  UNNEST([10, 20, 30, 40, 50, 60, 70, 80, 90]) AS percentile
 GROUP BY
   percentile,
   client
 ORDER BY
-  percentile,
-  client
+  client,
+  percentile

joemcgill requested a review from felixarntz March 29, 2024 19:57

felixarntz reviewed Apr 1, 2024

Update query to use custom_metrics and only used data

932f5f1

joemcgill requested a review from felixarntz April 2, 2024 01:40

felixarntz reviewed Apr 5, 2024

joemcgill added 2 commits April 10, 2024 12:48

Update query to use SQL instead of JS

7f0b113

Add hasSrcset and hasSizes values

87964b0

joemcgill added 2 commits April 10, 2024 13:48

Try adding relative wasted values to GET_IMG_SIZES_ACCURACY

1054716

Add var for date to query

1971cbc

felixarntz reviewed Apr 10, 2024

Reverse relative waste calculation

85cca98

Co-authored-by: Felix Arntz <[email protected]>

joemcgill commented Apr 10, 2024

felixarntz reviewed Apr 11, 2024

joemcgill and others added 2 commits April 11, 2024 16:02

Update sizesAbsoluteError to FLOAT64

d252916

Values can occasionally be a decimal Co-authored-by: Felix Arntz <[email protected]>

Add PR reference

2dc8070

Co-authored-by: Felix Arntz <[email protected]>

joemcgill commented Apr 11, 2024

joemcgill and others added 2 commits April 11, 2024 16:04

Change the order of ORDER BY clauses

57796c3

Update changelog

dcde854

joemcgill requested a review from adamsilverstein April 11, 2024 21:09

Merge branch 'main' into update/innacurate-sizes-attribute-query

387cb06

joemcgill requested a review from westonruter April 15, 2024 15:11

adamsilverstein approved these changes Apr 15, 2024

westonruter approved these changes Apr 16, 2024

westonruter merged commit de12374 into main Apr 16, 2024
3 checks passed

joemcgill mentioned this pull request Apr 29, 2024

Collect and measure impact of image sizes improvements WordPress/performance#1186

Open

3 tasks

felixarntz mentioned this pull request Oct 3, 2024

Implement SQL query to compare impact of using auto-sizes plugin for sizes=auto functionality #162

Merged
