Commit

Fix incorrect "total" numbers in Security chapter (2024, 2022, ?) (#3912)

* Update iframe_attributes_usage description

* Fix total_iframes in iframe_attributes_usage.sql

* Fix total pages in meta_csp_disallowed_directives.sql

* Clarify in 3 queries that the total is not global

* Note clarification

* Update contributor details

* Add comments to 2022, 2021, 2020 queries

* Fix linting issues

* Adapt text with updated query results

* Query for 2020 and 2021 (using crawl.pages)

* Fix linting

* Adapt articles of 2022, 2021 and 2020

* Apply number fixes

* Apply fixes to translated chapters

---------

Co-authored-by: Gertjan Franken <[email protected]>
JannisBush and GJFR authored Dec 29, 2024
1 parent aaa187a commit 4504d2d
Showing 21 changed files with 164 additions and 41 deletions.
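
The counting problem this commit addresses can be illustrated outside BigQuery. The sketch below is a minimal Python model, not the project's code. It assumes, per the note added to the 2020, 2021 and 2022 queries, that the `iframe-allow-sandbox` metric lists only iframes that carry an `allow` or `sandbox` attribute, and it mimics `LEFT JOIN UNNEST`, which yields a single NULL row for a page whose array is empty. The page data and all function names are hypothetical.

```python
# Hypothetical pages: num_iframes counts every iframe on the page, while
# iframe_attrs holds only the iframes that have an allow or sandbox attribute.
PAGES = [
    {"num_iframes": 4, "iframe_attrs": [{"allow": "camera", "sandbox": None}]},
    {"num_iframes": 3, "iframe_attrs": []},
    {"num_iframes": 3, "iframe_attrs": [{"allow": None, "sandbox": ""}]},
]


def left_join_unnest(pages):
    """Mimic LEFT JOIN UNNEST: one row per array element,
    or a single all-NULL row when the array is empty."""
    rows = []
    for page in pages:
        if page["iframe_attrs"]:
            rows.extend(page["iframe_attrs"])
        else:
            rows.append({"allow": None, "sandbox": None})
    return rows


def pct_allow_old(pages):
    """Old denominator: COUNT(0) over the joined rows."""
    rows = left_join_unnest(pages)
    return sum(r["allow"] is not None for r in rows) / len(rows)


def pct_allow_fixed(pages):
    """Fixed denominator: the sum of num_iframes over all pages."""
    rows = left_join_unnest(pages)
    total_iframes = sum(p["num_iframes"] for p in pages)
    return sum(r["allow"] is not None for r in rows) / total_iframes


print(pct_allow_old(PAGES), pct_allow_fixed(PAGES))
```

With this toy data the old denominator is 3 joined rows, while the pages contain 10 iframes in total, so the old percentage overstates `allow` usage. The numerator is the same in both functions; only the denominator differs.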
2 changes: 1 addition & 1 deletion sql/2020/security/iframe_attributes_usage.sql
@@ -2,7 +2,7 @@
# usage of allow and sandbox attribute of iframe elements, per page and over all iframe elements
SELECT
client,
-COUNT(0) AS total_iframes,
+COUNT(0) AS total_iframes, # Note: These are not the total number of iframes but only the number of iframes with allow/sandbox + 1 for each website without such iframes
COUNTIF(allow IS NOT NULL) AS freq_allow,
COUNTIF(allow IS NOT NULL) / COUNT(0) AS pct_allow_frames,
COUNTIF(sandbox IS NOT NULL) AS freq_sandbox,
2 changes: 1 addition & 1 deletion sql/2021/security/iframe_attributes_usage.sql
@@ -2,7 +2,7 @@
# usage of allow and sandbox attribute of iframe elements, per page and over all iframe elements
SELECT
client,
-COUNT(0) AS total_iframes,
+COUNT(0) AS total_iframes, # Note: These are not the total number of iframes but only the number of iframes with allow/sandbox + 1 for each website without such iframes
COUNTIF(allow IS NOT NULL) AS freq_allow,
COUNTIF(allow IS NOT NULL) / COUNT(0) AS pct_allow_frames,
COUNTIF(sandbox IS NOT NULL) AS freq_sandbox,
2 changes: 1 addition & 1 deletion sql/2022/security/iframe_attributes_usage.sql
@@ -2,7 +2,7 @@
# usage of allow and sandbox attribute of iframe elements, per page and over all iframe elements
SELECT
client,
-COUNT(0) AS total_iframes,
+COUNT(0) AS total_iframes, # Note: These are not the total number of iframes but only the number of iframes with allow/sandbox + 1 for each website without such iframes
COUNTIF(allow IS NOT NULL) AS freq_allow,
COUNTIF(allow IS NOT NULL) / COUNT(0) AS pct_allow_frames,
COUNTIF(sandbox IS NOT NULL) AS freq_sandbox,
2 changes: 1 addition & 1 deletion sql/2024/security/coep_header_prevalence.sql
@@ -1,7 +1,7 @@
#standardSQL
# Section: Attack Preventions - Preventing attacks using Cross-Origin policies
# Question: Which are the most common COEP values?
-# Note: Considers headers of main document responses
+# Note: Considers headers of main document responses only
SELECT
client,
coep_header,
2 changes: 1 addition & 1 deletion sql/2024/security/coop_header_prevalence.sql
@@ -1,7 +1,7 @@
#standardSQL
# Section: Attack Preventions - Preventing attacks using Cross-Origin policies
# Question: Which are the most common COOP values?
-# Note: Considers headers of main document responses
+# Note: Considers headers of main document responses only
SELECT
client,
coop_header,
5 changes: 3 additions & 2 deletions sql/2024/security/csp_number_of_allowed_hosts.sql
@@ -1,15 +1,16 @@
#standardSQL
# Section: Attack Preventions - Preventing attacks using CSP
# Question: CSP on home pages: number of unique headers, header length and number of allowed HTTP(S) hosts in all directives
+# Note: for CSP we checked whether the header value is NULL (empty?) (99.65% of CSP headers are not NULL on desktop), we did not do this for other headers?
CREATE TEMP FUNCTION getNumUniqueHosts(str STRING) AS (
(SELECT COUNT(DISTINCT x) FROM UNNEST(REGEXP_EXTRACT_ALL(str, r'(?i)(https*://[^\s;]+)[\s;]')) AS x)
);

SELECT
client,
percentile,
-COUNT(0) AS total_requests,
-COUNTIF(csp_header IS NOT NULL) AS total_csp_headers,
+COUNT(0) AS total_csp_headers,
+COUNTIF(csp_header IS NOT NULL) AS total_non_null_csp_headers,
COUNTIF(csp_header IS NOT NULL) / COUNT(0) AS pct_csp_headers,
COUNT(DISTINCT csp_header) AS num_unique_csp_headers,
APPROX_QUANTILES(LENGTH(csp_header), 1000 IGNORE NULLS)[OFFSET(percentile * 10)] AS csp_header_length,
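
The behaviour of the `getNumUniqueHosts` helper in this query can be explored with a small Python port. This is a sketch under assumptions: it reuses the same pattern with Python's `re` module, which treats `\s` slightly differently from BigQuery's RE2 engine for non-ASCII whitespace, and the function name and sample header values are hypothetical.

```python
import re

# Same pattern as in the SQL function: an http(s) URL up to the next
# whitespace or semicolon, which must be present right after the URL.
HOST_PATTERN = re.compile(r"(?i)(https*://[^\s;]+)[\s;]")


def get_num_unique_hosts(csp: str) -> int:
    """Count distinct captured hosts, like COUNT(DISTINCT x) over REGEXP_EXTRACT_ALL."""
    return len(set(HOST_PATTERN.findall(csp)))


# Two distinct hosts; the repeated host is counted once.
print(get_num_unique_hosts(
    "default-src https://a.example https://b.example; img-src https://a.example;"
))

# A host at the very end of the value, with no trailing space or semicolon,
# is not captured by this pattern.
print(get_num_unique_hosts("script-src https://a.example"))
```

Because the pattern requires a trailing whitespace character or semicolon, a host that ends the header value without a terminator is not counted in this port. Whether that matters for the published numbers depends on how the headers are terminated in the dataset, which the sketch does not examine.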
4 changes: 2 additions & 2 deletions sql/2024/security/csp_script_source_list_keywords.sql
@@ -3,7 +3,7 @@
# Question: usage of default/script-src, and within the directive usage of strict-dynamic, nonce values, unsafe-inline and unsafe-eval
SELECT
client,
-total_pages,
+total_pages_with_csp,
freq_csp,
freq_default_script_src,
SAFE_DIVIDE(freq_default_script_src, freq_csp) AS pct_default_script_src_over_csp,
@@ -22,7 +22,7 @@ SELECT
FROM (
SELECT
client,
-COUNT(0) AS total_pages,
+COUNT(0) AS total_pages_with_csp,
COUNTIF(csp_header IS NOT NULL) AS freq_csp,
COUNTIF(REGEXP_CONTAINS(csp_header, '(?i)(default|script)-src')) AS freq_default_script_src,
COUNTIF(REGEXP_CONTAINS(csp_header, '(?i)(default|script)-src[^;]+strict-dynamic')) AS freq_strict_dynamic,
8 changes: 4 additions & 4 deletions sql/2024/security/iframe_attribute_popular_hosts.sql
@@ -12,9 +12,9 @@ SELECT
client,
policy_type,
hostname,
-total_iframes,
+total_iframes_with_allow_or_sandbox,
COUNTIF(has_policy) AS freq,
-COUNTIF(has_policy) / total_iframes AS pct
+COUNTIF(has_policy) / total_iframes_with_allow_or_sandbox AS pct
FROM (
SELECT
client,
@@ -37,7 +37,7 @@ FROM (
JOIN (
SELECT
client,
-SUM(ARRAY_LENGTH(JSON_EXTRACT_ARRAY(JSON_EXTRACT_SCALAR(payload, '$._security'), '$.iframe-allow-sandbox'))) AS total_iframes
+SUM(ARRAY_LENGTH(JSON_EXTRACT_ARRAY(JSON_EXTRACT_SCALAR(payload, '$._security'), '$.iframe-allow-sandbox'))) AS total_iframes_with_allow_or_sandbox
FROM
`httparchive.all.pages`
WHERE
@@ -49,7 +49,7 @@ USING
(client)
GROUP BY
client,
-total_iframes,
+total_iframes_with_allow_or_sandbox,
policy_type,
hostname
HAVING
26 changes: 20 additions & 6 deletions sql/2024/security/iframe_attributes_usage.sql
@@ -1,16 +1,29 @@
#standardSQL
# Section: Content Inclusion - Iframe Sandbox/Permissions Policy
-# Question: How often are the allow and sandbox attributes used on iframes? Both per page and over all iframe elements
+# Question: How often are the allow and sandbox attributes used on iframes? Both per page (used in at least one iframe on a page) and over all iframe elements
+WITH total_iframe_count AS (
+SELECT
+client,
+date,
+SUM(SAFE_CAST(JSON_EXTRACT(custom_metrics, '$.num_iframes') AS INT64)) AS total_iframes
+FROM
+`httparchive.all.pages`
+WHERE
+(date = '2022-06-01' OR date = '2023-06-01' OR date = '2023-12-01' OR date = '2024-03-01' OR date = '2024-04-01' OR date = '2024-05-01' OR date = '2024-06-01') AND
+is_root_page
+GROUP BY client, date
+)
+
SELECT
client,
date,
-COUNT(0) AS total_iframes,
+total_iframes,
COUNTIF(allow IS NOT NULL) AS freq_allow,
-COUNTIF(allow IS NOT NULL) / COUNT(0) AS pct_allow_frames,
+COUNTIF(allow IS NOT NULL) / total_iframes AS pct_allow_frames,
COUNTIF(sandbox IS NOT NULL) AS freq_sandbox,
-COUNTIF(sandbox IS NOT NULL) / COUNT(0) AS pct_sandbox_frames,
+COUNTIF(sandbox IS NOT NULL) / total_iframes AS pct_sandbox_frames,
COUNTIF(allow IS NOT NULL AND sandbox IS NOT NULL) AS freq_both_frames,
-COUNTIF(allow IS NOT NULL AND sandbox IS NOT NULL) / COUNT(0) AS pct_both_frames,
+COUNTIF(allow IS NOT NULL AND sandbox IS NOT NULL) / total_iframes AS pct_both_frames,
COUNT(DISTINCT url) AS total_urls,
COUNT(DISTINCT IF(allow IS NOT NULL, url, NULL)) AS allow_freq_urls,
COUNT(DISTINCT IF(allow IS NOT NULL, url, NULL)) / COUNT(DISTINCT url) AS allow_pct_urls,
@@ -36,8 +49,9 @@ FROM (
is_root_page
)
LEFT JOIN UNNEST(iframeAttrs) AS iframeAttr
-)
+) JOIN total_iframe_count USING (client, date)
GROUP BY
+total_iframes,
client,
date
ORDER BY
58 changes: 58 additions & 0 deletions sql/2024/security/iframe_attributes_usage_fix.sql
@@ -0,0 +1,58 @@
#standardSQL
# Section: Content Inclusion - Iframe Sandbox/Permissions Policy
# Question: How often are the allow and sandbox attributes used on iframes? Both per page (used in at least one iframe on a page) and over all iframe elements
WITH total_iframe_count AS (
SELECT
client,
date,
SUM(SAFE.INT64(custom_metrics.other.num_iframes)) AS total_iframes
FROM
`httparchive.crawl.pages`
WHERE
(date = '2020-08-01' OR date = '2021-07-01' OR date = '2022-06-01') AND
is_root_page
GROUP BY client, date
)

SELECT
client,
date,
total_iframes,
COUNTIF(allow IS NOT NULL) AS freq_allow,
COUNTIF(allow IS NOT NULL) / total_iframes AS pct_allow_frames,
COUNTIF(sandbox IS NOT NULL) AS freq_sandbox,
COUNTIF(sandbox IS NOT NULL) / total_iframes AS pct_sandbox_frames,
COUNTIF(allow IS NOT NULL AND sandbox IS NOT NULL) AS freq_both_frames,
COUNTIF(allow IS NOT NULL AND sandbox IS NOT NULL) / total_iframes AS pct_both_frames,
COUNT(DISTINCT url) AS total_urls,
COUNT(DISTINCT IF(allow IS NOT NULL, url, NULL)) AS allow_freq_urls,
COUNT(DISTINCT IF(allow IS NOT NULL, url, NULL)) / COUNT(DISTINCT url) AS allow_pct_urls,
COUNT(DISTINCT IF(sandbox IS NOT NULL, url, NULL)) AS sandbox_freq_urls,
COUNT(DISTINCT IF(sandbox IS NOT NULL, url, NULL)) / COUNT(DISTINCT url) AS sandbox_pct_urls
FROM (
SELECT
client,
date,
url,
SAFE.STRING(iframeAttr.allow) AS allow,
SAFE.STRING(iframeAttr.sandbox) AS sandbox
FROM (
SELECT
client,
date,
page AS url,
JSON_EXTRACT_ARRAY(custom_metrics.security.`iframe-allow-sandbox`) AS iframeAttrs
FROM
`httparchive.crawl.pages`
WHERE
(date = '2020-08-01' OR date = '2021-07-01' OR date = '2022-06-01') AND
is_root_page
) LEFT JOIN UNNEST(iframeAttrs) AS iframeAttr
) JOIN total_iframe_count USING (client, date)
GROUP BY
total_iframes,
client,
date
ORDER BY
date,
client
21 changes: 19 additions & 2 deletions sql/2024/security/meta_csp_disallowed_directives.sql
@@ -2,9 +2,24 @@
# Section: Security misconfigurations - CSP directives that are ignored in <meta>
# Question: How many pages use invalid CSP directives in <meta>?
# Note: uses the old payload._almanac metric location instead of custom_metrics.almanac (also the meta-nodes metric is in the generic almanac.js custom metric)
+WITH totals AS (
+SELECT
+client,
+COUNT(0) AS total_pages
+FROM
+`httparchive.all.requests`
+WHERE
+date = '2024-06-01' AND
+is_root_page
+GROUP BY
+client
+)
+
+
SELECT
client,
-COUNT(DISTINCT page) AS total_pages,
+total_pages,
+COUNT(DISTINCT page) AS total_pages_with_csp_meta,
COUNT(CASE WHEN REGEXP_CONTAINS(LOWER(JSON_VALUE(meta_node, '$.content')), r'(?i)frame-ancestors') THEN page END) AS count_frame_ancestors,
COUNT(CASE WHEN REGEXP_CONTAINS(LOWER(JSON_VALUE(meta_node, '$.content')), r'(?i)frame-ancestors') THEN page END) / COUNT(DISTINCT page) AS pct_frame_ancestors,
COUNT(CASE WHEN REGEXP_CONTAINS(LOWER(JSON_VALUE(meta_node, '$.content')), r'(?i)sandbox( allow-[a-z]+)*;') THEN page END) AS count_sandbox,
@@ -22,7 +37,9 @@ FROM (
),
UNNEST(JSON_QUERY_ARRAY(metrics, '$.meta-nodes.nodes')) meta_node,
UNNEST(['Content-Security-Policy']) AS policy
+JOIN totals USING (client)
WHERE
LOWER(JSON_VALUE(meta_node, '$.http-equiv')) = 'content-security-policy' OR LOWER(JSON_VALUE(meta_node, '$.name')) = 'content-security-policy'
GROUP BY
-client
+client,
+total_pages
3 changes: 2 additions & 1 deletion src/config/contributors.json
@@ -2124,14 +2124,15 @@
"JannisBush": {
"avatar_url": "33023300",
"github": "JannisBush",
+"linkedin": "jannis-rautenstrauch",
"name": "Jannis Rautenstrauch",
"teams": {
"2024": [
"analysts"
]
},
"twitter": "jannis_r",
-"website": "https://cispa.de/en/people/c01jara"
+"website": "https://jannisbush.github.io/"
},
"jaredcwhite": {
"avatar_url": "658496",
8 changes: 6 additions & 2 deletions src/content/en/2020/security.md
@@ -621,15 +621,19 @@ In a similar fashion, by defining the `allow` attribute on `<iframe>` elements,
<figcaption>{{ figure_link(caption="Prevalence of Feature Policy directives on frames.", sheets_gid="547110187", sql_file="iframe_allow_directives.sql") }}</figcaption>
</figure>

-The `Feature-Policy` response header has a fairly low adoption rate, at 0.60% of the desktop pages and 0.51% of mobile pages. On the other hand, Feature Policy was enabled on 19.5% of the 8 million frames that were found on the desktop pages. On mobile pages, 16.4% of the 9.2 million frames contained the `allow` attribute.
+The `Feature-Policy` response header has a fairly low adoption rate, at 0.60% of the desktop pages and 0.51% of mobile pages. On the other hand, Feature Policy was enabled on 11.8% of the 13.2 million frames that were found on the desktop pages. On mobile pages, 10.8% of the 13.9 million frames contained the `allow` attribute.

+<p class="note">An earlier version of this chapter reported incorrect values for the total number of frames and the percentage of frames with the `allow` attribute. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

Based on the most commonly used directives in the Feature Policy on iframes, we can see that these are mainly used to control how the frames play videos. For instance the most prevalent directive, `encrypted-media`, is used to control access to the Encrypted Media Extensions API, which is required to play DRM-protected videos. The most common iframe origins with a Feature Policy were `https://www.facebook.com` and `https://www.youtube.com` (49.87% and 26.18% of the frames with a Feature Policy on desktop pages respectively).

### Iframe sandbox

By including an untrusted third-party in an iframe, that third-party can try to launch a number of attacks on the page. For instance, it could navigate the top page to a phishing page, launch pop-ups with fake anti-virus advertisements, etc.

-The `sandbox` attribute on iframes can be used to restrict the capabilities, and therefore also the opportunities for launching attacks, of the embedded web page. As embedding third-party content such as advertisements or videos is common practice on the web, it is not surprising that many of these are restricted via the `sandbox` attribute: 30.29% of the iframes on desktop pages have a `sandbox` attribute while on mobile pages this is 33.16%.
+The `sandbox` attribute on iframes can be used to restrict the capabilities, and therefore also the opportunities for launching attacks, of the embedded web page. As embedding third-party content such as advertisements or videos is common practice on the web, it is not surprising that many of these are restricted via the `sandbox` attribute: 18.3% of the iframes on desktop pages have a `sandbox` attribute while on mobile pages this is 21.9%.

+<p class="note">An earlier version of this chapter reported incorrect values for the percentage of frames with the `sandbox` attribute. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

<figure>
<table>
8 changes: 6 additions & 2 deletions src/content/en/2021/security.md
@@ -641,15 +641,19 @@ We see 1.3% of websites on the mobile using the `Permissions-Policy` already. A
<figcaption>{{ figure_link(caption="Prevalence of `allow` directives on frames.", sheets_gid="623004240", sql_file="iframe_allow_directives.sql") }}</figcaption>
</figure>

-One can also use the `allow` attribute in `<iframe>` elements to enable or disable features allowed to be used in the embedded frame. 28.4% of 10.8 million frames in mobile contained the `allow` attribute to enable permission or feature policies.
+One can also use the `allow` attribute in `<iframe>` elements to enable or disable features allowed to be used in the embedded frame. 18.3% of 16.8 million frames in mobile contained the `allow` attribute to enable permission or feature policies.

+<p class="note">An earlier version of this chapter reported incorrect values for the total number of frames and the percentage of frames with the `allow` attribute. These errors have now been corrected. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

As in previous years, the most used directives in `allow` attributes on iframes are still related to controls for embedded videos and media. The most used directive continues to be `encrypted-media` which is used to control access to the Encrypted Media Extensions API.

### Iframe sandbox

An untrusted third-party in an iframe could launch a number of attacks on the page. For instance, it could navigate the top page to a phishing page, launch popups with fake anti-virus advertisements and other cross-frame scripting attacks.

-The `sandbox` attribute on iframes applies restrictions to the content, and therefore reduces the opportunities for launching attacks from the embedded web page. The value of the attribute can either be empty to apply all restrictions (the embedded page cannot execute any JavaScript code, no forms can be submitted, and no popups can be created, to name a few restrictions), or space-separated tokens to lift particular restrictions. As embedding third-party content such as advertisements or videos via iframes is common practice on the web, it is not surprising that many of these are restricted via the `sandbox` attribute: 32.6% of the iframes on desktop pages have a `sandbox` attribute while on mobile pages this is 32.6%.
+The `sandbox` attribute on iframes applies restrictions to the content, and therefore reduces the opportunities for launching attacks from the embedded web page. The value of the attribute can either be empty to apply all restrictions (the embedded page cannot execute any JavaScript code, no forms can be submitted, and no popups can be created, to name a few restrictions), or space-separated tokens to lift particular restrictions. As embedding third-party content such as advertisements or videos via iframes is common practice on the web, it is not surprising that many of these are restricted via the `sandbox` attribute: 19.7% of the iframes on desktop pages have a `sandbox` attribute while on mobile pages this is 21.0%.

+<p class="note">An earlier version of this chapter reported incorrect values for the percentage of frames with the `sandbox` attribute. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

{{ figure_markup(
image="security-prevalence-of-sandbox-directives-on-frames.png",
8 changes: 6 additions & 2 deletions src/content/en/2022/security.md
@@ -632,7 +632,9 @@ Besides being used as an HTTP header, this feature can be used within `<iframe>`
<iframe src="https://example.com" allow="geolocation 'src' https://example.com'; camera *"></iframe>
```

-18.9% of 11.5 million frames in mobile contained the `allow` attribute to enable permission or feature policies.
+12.6% of 17.4 million frames in mobile contained the `allow` attribute to enable permission or feature policies.

+<p class="note">An earlier version of this chapter reported incorrect values for the total number of frames and the percentage of frames with the `allow` attribute. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

The following is a list of the top 10 `allow` directives that were detected in frames:

@@ -737,7 +739,9 @@ To mitigate these concerns the HTML specification (version 5) introduced the `sa

The above chart of the 2022 data shows that more than 99% of websites with a `sandbox` attribute enable the `allow-scripts` and `allow-same-origin` permissions.

-Of desktop websites that embed an iframe, 35.2% also include the `sandbox` attribute.
+For all iframes found on desktop websites, 21.08% include the `sandbox` attribute.

+<p class="note">An earlier version of this chapter reported the incorrect percentage of frames with the `sandbox` attribute. More information can be found in this <a hreflang="en" href="https://github.com/HTTPArchive/almanac.httparchive.org/pull/3912">GitHub PR</a>.</p>

We find that `Content-Security-Policy` headers which include a `sandbox` directive are at a mere 0.3% usage for mobile (desktop is similar at 0.4%) which may speak to the fact that this attribute is only applied on a per-case basis for the practice of embedding iframe content within pages, rather than ahead-of-time planning through a content security policy definition.
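
As a reader's sanity check on the corrected `allow` figures in the 2020, 2021 and 2022 chapters, the old and new (percentage, total frames) pairs can be multiplied out. If only the denominator changed, both pairs should imply roughly the same number of frames with the attribute. The sketch below uses the rounded figures quoted in the chapter text, so agreement can only be approximate; the helper names are hypothetical and this is not part of the commit.

```python
# (chapter, client, old_pct, old_total_millions, new_pct, new_total_millions)
# All values are the rounded figures quoted in the chapter text.
FIGURES = [
    ("2020", "desktop", 19.5, 8.0, 11.8, 13.2),
    ("2020", "mobile", 16.4, 9.2, 10.8, 13.9),
    ("2021", "mobile", 28.4, 10.8, 18.3, 16.8),
    ("2022", "mobile", 18.9, 11.5, 12.6, 17.4),
]


def implied_frames(pct, total_millions):
    """Frames (in millions) with the attribute implied by a percentage and a total."""
    return pct / 100 * total_millions


def relative_gap(old_pct, old_total, new_pct, new_total):
    """Relative difference between the frame counts implied by the old and new figures."""
    old = implied_frames(old_pct, old_total)
    new = implied_frames(new_pct, new_total)
    return abs(old - new) / old


for chapter, client, *numbers in FIGURES:
    print(f"{chapter} {client}: relative gap {relative_gap(*numbers):.2%}")
```

For all four pairs the implied counts differ by less than one percent, which is consistent with the numerators staying the same and only the total number of frames being corrected.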
