Add content scripts section in specification #542

Open: 7 commits to merge into base: main

@oliverdunk (Member) commented Feb 9, 2024

Adds a first draft of information on content scripts in the specification.

There are still some updates needed, in particular around the algorithm for deciding when to inject a script, but I wanted to open something to get some early feedback.




Used to match frames with an opaque or otherwise missing origin. The origin to match against is determined in the following order of priority:

1. If the frame has an [=opaque origin=], such as with a [=blob URLs=], use the non-opaque origin.
Member Author

@Rob--W @rdcronin Would you be able to take a look at this one and confirm if it is accurate? This was my best understanding based on bugs and documentation in the code.

Contributor

I think I'd rephrase this.

(The issue with the current language is that:
a] it doesn't specify what the "non-opaque" origin is or where it comes from
b] it doesn't always use the origin of the parent; it uses the initiator (or "precursor"))

If the URL of a document has a specified scheme**, the user agent will fall back to the origin of the initiator instead. This is commonly, but not always, the parent or embedding frame.

** In Chrome, these schemes are data:, about:, filesystem:, and blob:. Is that the same in other browsers?
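
As a rough illustration of the rephrased rule, the TypeScript sketch below falls back to the initiator's origin when the document URL uses one of the schemes listed for Chrome. The names `FALLBACK_SCHEMES`, `schemeOf` and `urlOrInitiatorOrigin`, and the way the initiator origin is passed in, are assumptions made for this example and are not taken from any browser's implementation.

```typescript
// Schemes for which Chrome is described above as falling back to the initiator.
// Whether other browsers use the same list is an open question in this thread.
const FALLBACK_SCHEMES: readonly string[] = ["data", "about", "filesystem", "blob"];

// Hypothetical helper: returns the scheme of a URL string in lower case, without the colon.
function schemeOf(documentUrl: string): string {
  const colon = documentUrl.indexOf(":");
  return colon === -1 ? "" : documentUrl.slice(0, colon).toLowerCase();
}

// Hypothetical helper: picks the value that match patterns are compared against.
// `initiatorOrigin` is null when no non-opaque initiator is known.
function urlOrInitiatorOrigin(documentUrl: string, initiatorOrigin: string | null): string | null {
  if (FALLBACK_SCHEMES.indexOf(schemeOf(documentUrl)) === -1) {
    return documentUrl; // ordinary URL: match it directly
  }
  return initiatorOrigin; // may be null, in which case nothing can match
}
```

Passing the initiator in explicitly keeps the sketch independent of how a user agent tracks it, which the comment above notes is commonly, but not always, the parent or embedding frame.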

Member

Tagging @Rob--W and @xeenon to request feedback.

Member

@oliverdunk The semantics have been discussed extensively on Chromium's issue tracker, where Devlin and I discussed the API design. If you're interested, the start of the discussion is at https://issues.chromium.org/issues/40443085#comment48. The design that is close to what we have now was sketched in https://issues.chromium.org/issues/40443085#comment61, with the final name (match_origin_as_fallback) at https://issues.chromium.org/issues/40443085#comment67. Devlin summarized the discussion at https://issues.chromium.org/issues/40443085#comment71.

Upon reviewing the proposed texts here, I think that there is some confusion on terminology. The current text mentions blob URLs as an opaque origin, but that is not the case.

Relevant to content script matching is the URL of the document (which can have an origin component) and the origin of the document (as a security principal). There may not always be an obvious relation between the two:

  • A URL may have a visible origin part in it, as with http(s) and also blob: and (Chrome-only) filesystem: (e.g. blob:https://example.com/UUID).
  • A URL may not have a visible origin in it (about:blank and about:srcdoc), but the document can still have a non-opaque origin: commonly the opener of the frame or window is another http(s) URL. There can even be a chain of any number of about:blank/srcdoc documents where the first was initially opened by an http(s) origin.
  • The security principal of a document can be an opaque origin, even if the URL of that document looks like it has a non-opaque origin. This happens with <iframe sandbox> or the sandbox directive in the Content-Security-Policy. A content script can use window.origin to see whether the origin is opaque, as it would serialize to "null".
    • In case of opaque origins, there is almost always a non-opaque initiator that opened the frame. The term "precursor origin" is used for it. Example: <iframe sandbox="allow-scripts" src="https://example.com">
  • The exception is when the initiator of the navigation does not have a non-opaque origin. For example, when the user navigates to a data:-URL or to about:blank. Since data:-URL
    • Chrome currently does not run content scripts in these documents.
    • Firefox currently allows content scripts with matches for all URLs AND match_about_blank: true to run scripts in top-level about:blank. This is not documented anywhere though.
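
The distinction drawn above between a URL's visible origin and the document's actual origin can be sketched in TypeScript. The function below only inspects the URL string, so it deliberately says nothing about the security principal: a sandboxed frame would still report a visible origin here even though `window.origin` would serialize to "null". The name `visibleOrigin` and the string-based parsing are assumptions for illustration only.

```typescript
// Hypothetical helper: returns the origin that is visible in a URL string, or
// null when the URL carries none (for example about:blank, about:srcdoc, data:).
// This looks at the URL only; it does not know the document's real origin.
function visibleOrigin(documentUrl: string): string | null {
  // blob: and filesystem: URLs wrap an inner URL, e.g. blob:https://example.com/UUID
  const inner = documentUrl.replace(/^(blob|filesystem):/i, "");
  const match = /^(https?:\/\/[^\/?#]+)/i.exec(inner);
  const found = match ? match[1] : undefined;
  return found !== undefined ? found.toLowerCase() : null;
}
```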

Member Author

Thanks both, writing the description for these two keys has taken by far the most time in this PR. I've given it another attempt and would appreciate any feedback.

As a general note, concepts like the precursor origin and security principal don't appear to be defined in any other specifications. It seems like they are more informal terms used often in implementations and by implementors. With that in mind, I've tried to describe them as best as possible without talking about them by name.

A few additional notes:

  • I've added an informal note to match_about_blank describing the Firefox behavior for top-level about:blank pages.
  • I've added a note that the path must be a wildcard if match_origin_as_fallback is set. This is the behavior today in Chrome. Interestingly, we don't have any restrictions on include_globs or exclude_globs. This feels like an omission to me and I wonder if we should specify something.
  • In Chrome, sandboxing doesn't seem to be relevant. We always apply these fallbacks, even if the parent is inaccessible to the child frame. With that in mind, I haven't mentioned it here.

Clearly there's a lot of detail here so please let me know if I've missed anything or it could be clearer.
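
The wildcard-path restriction mentioned in the notes above could be checked along these lines. This is a minimal TypeScript sketch: the interface, both function names, and the simplified parsing of match patterns are assumptions made for the example, and it does not reproduce Chrome's actual validation.

```typescript
// Minimal shape of a content script entry, limited to the keys used here.
interface ContentScriptEntry {
  matches: string[];
  match_origin_as_fallback?: boolean;
}

// Hypothetical helper: returns the path part of a match pattern such as
// "https://example.com/*", or null when the pattern does not have a
// "scheme://host/path" shape (for example "<all_urls>").
function pathOfMatchPattern(pattern: string): string | null {
  const schemeEnd = pattern.indexOf("://");
  if (schemeEnd === -1) return null;
  const pathStart = pattern.indexOf("/", schemeEnd + 3);
  return pathStart === -1 ? null : pattern.slice(pathStart);
}

// Returns the patterns that would violate the "path must be a wildcard" rule.
// The rule only applies when match_origin_as_fallback is set.
function patternsWithNonWildcardPath(entry: ContentScriptEntry): string[] {
  if (!entry.match_origin_as_fallback) return [];
  return entry.matches.filter((pattern) => {
    const path = pathOfMatchPattern(pattern);
    return path !== null && path !== "/*";
  });
}
```

The sketch intentionally ignores `include_globs` and `exclude_globs`, mirroring the observation above that no restriction currently applies to them.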

@rdcronin (Contributor) left a comment

Thanks, Oliver! I had a chance to take a quick pass on this one.


#### Key `match_about_blank`

If this is `true`, the content script will also be injected into an additional user agent specified set of pages used to represent empty frames. This will only happen if the content script matches the page that embedded the frame. Defaults to `false`.
Contributor

Do we (browsers) have different criteria for match_about_blank?

Member

The description here is too vague. match_about_blank was designed for about:blank and about:srcdoc.

If you're looking for clarity, see https://stackoverflow.com/questions/41408936/can-anyone-explain-that-what-is-the-use-of-match-about-blank-in-chrome-extensi, where I previously posted an answer that describes why match_about_blank exists and what it does.

Other documentation:

Member Author

@Rob--W, could you take another look? I've made some tweaks although it's unclear to me what was too vague.


@jpmedley left a comment

Sorry for the extra review. I didn't remember that I had looked at this already until I got to my first comment halfway through it.

@Rob--W (Member) commented Apr 26, 2024

Although not obvious from Github's UI, I just posted additional context on match_about_blank and match_origin_as_fallback:

@oliverdunk (Member Author) left a comment

Thanks all! I've just done another pass on this, would appreciate any additional feedback.


### Key `match_about_blank`

If this is true, use the URL of the parent frame when matching a child frame whose document URL has the `about` [=scheme=]. See also [[#determine-the-url-for-content-script-matching]]. Defaults to `false`.
Member

Chrome currently matches all documents with an about scheme as described in the current copy (source link care of @oliverdunk), but Firefox appears to explicitly check against about:blank and about:srcdoc (source). @xeenon, how does Safari handle this key?

Any changes here should also be reflected in the "Determine the URL for content script matching" algorithm section below.
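
The difference described above can be made concrete with two small predicates. This TypeScript sketch simplifies both behaviors to plain string checks and is based only on the description in this comment, not on either browser's source; the function names and the `about:example` URL used to show the difference are made up for the example.

```typescript
// Chrome, as described above: any document URL with the about scheme qualifies.
function qualifiesByAboutScheme(documentUrl: string): boolean {
  return documentUrl.slice(0, 6).toLowerCase() === "about:";
}

// Firefox, as described above: only about:blank and about:srcdoc qualify.
function qualifiesByExplicitList(documentUrl: string): boolean {
  const normalized = documentUrl.toLowerCase();
  return normalized === "about:blank" || normalized === "about:srcdoc";
}
```

Both predicates agree on `about:blank` and `about:srcdoc`; they only diverge for other `about:` URLs, which is the case the specification text would need to settle.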


If this is true, use the URL of the parent frame when matching a child frame whose document URL has the `about` [=scheme=]. See also [[#determine-the-url-for-content-script-matching]]. Defaults to `false`.

Note: In Firefox, setting `match_about_blank` to `true` also allows injection into top-level `about:blank` pages.
Member

We should add a non-normative label to this block.


### Key `run_at`

Specifies when the content script should be injected. Valid values are `document_start`, `document_end` and `document_idle`.
Member

Consider adding a WebIDL definition for these values.
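
A WebIDL enum would be the natural form in the specification itself. As a stand-in, the TypeScript sketch below shows the same closed set of values as a string literal union with a type guard; the names `RunAt`, `RUN_AT_VALUES` and `isRunAt` are illustrative and not part of the draft.

```typescript
// The three values listed for `run_at` in the draft text.
type RunAt = "document_start" | "document_end" | "document_idle";

const RUN_AT_VALUES: readonly RunAt[] = ["document_start", "document_end", "document_idle"];

// Type guard that a manifest parser could use to reject unknown values.
function isRunAt(value: unknown): value is RunAt {
  return typeof value === "string" && (RUN_AT_VALUES as readonly string[]).indexOf(value) !== -1;
}
```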


### Key `world`

The [=world=] any JavaScript scripts should be injected into. Defaults to `ISOLATED`. Valid values are `MAIN` and `ISOLATED`.
Member

Consider adding a WebIDL definition for these values.
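
The same idea applies to `world`, with the addition of the default stated in the draft. In this TypeScript sketch, `ExecutionWorld`, `WorldEntry` and `effectiveWorld` are hypothetical names chosen for the example.

```typescript
// The two values listed for `world` in the draft text.
type ExecutionWorld = "ISOLATED" | "MAIN";

// Only the key relevant to this example.
interface WorldEntry {
  world?: ExecutionWorld;
}

// Applies the documented default: ISOLATED when the key is omitted.
function effectiveWorld(entry: WorldEntry): ExecutionWorld {
  return entry.world ?? "ISOLATED";
}
```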


### Key `include_globs`

A list of [=globs=] that a page should match. A page matches if the URL matches both the [[#key-matches]] field and the [[#key-include_globs]] field.
Member

We should probably revise "a page should match" to a more generic concept like "potential injection context." Words are hard. @oliverdunk suggested maybe "document"?
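
For readers unfamiliar with globs, here is a minimal TypeScript sketch of glob matching. It assumes `*` matches any run of characters and `?` matches exactly one character, with everything else treated literally; the draft's [=globs=] definition is not quoted in this thread, so treat those semantics and the function names as assumptions.

```typescript
// Converts a glob to a RegExp under the assumed semantics:
// "*" matches any run of characters, "?" matches exactly one character.
function globToRegExp(glob: string): RegExp {
  // Escape every regular-expression metacharacter except "*" and "?".
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + pattern + "$");
}

// True when at least one glob in the list matches the whole URL string.
function matchesAnyGlob(candidateUrl: string, globs: string[]): boolean {
  return globs.some((glob) => globToRegExp(glob).test(candidateUrl));
}
```

Under the draft wording, a result from `matchesAnyGlob` would be combined with the `matches` check using a logical AND.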

1. Let |url| be the document's URL.
1. If the document is within a child frame:
    1. If the [=scheme=] of the document's URL is `about`, and `match_about_blank` or `match_origin_as_fallback` is set to true:
        1. Set |url| to a URL based on the origin of the parent frame.
Member

The phrase "URL based on the origin of the parent frame" feels a little … awkward? Definitely not required, but maybe we should have a definition or abstract algorithm that describes this and link out to it from here.
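
To show where the vague phrase bites, here is a TypeScript sketch of the quoted steps. It has to pick some meaning for "a URL based on the origin of the parent frame", and it assumes the origin followed by "/", which is exactly the kind of choice a linked definition would pin down. `FrameInfo`, `FallbackKeys` and `determineUrlForMatching` are hypothetical names.

```typescript
interface FrameInfo {
  documentUrl: string;
  // Origin of the parent frame, or null for a top-level document.
  parentOrigin: string | null;
}

interface FallbackKeys {
  match_about_blank?: boolean;
  match_origin_as_fallback?: boolean;
}

// Follows the quoted steps: start from the document URL and substitute a
// parent-derived URL only for about: documents inside a child frame.
function determineUrlForMatching(frame: FrameInfo, keys: FallbackKeys): string {
  let url = frame.documentUrl;
  const parentOrigin = frame.parentOrigin;
  if (parentOrigin !== null) {
    const isAbout = url.slice(0, 6).toLowerCase() === "about:";
    const fallbackEnabled = keys.match_about_blank === true || keys.match_origin_as_fallback === true;
    if (isAbout && fallbackEnabled) {
      // Assumption: "a URL based on the origin" means the origin plus "/".
      url = parentOrigin + "/";
    }
  }
  return url;
}
```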

1. Let |url| be the result of running [[#determine-the-url-for-content-script-matching]].
1. If the extension does not have access to the origin, return.
1. If |url| is not matched by a match pattern in `matches`, return.
1. If `include_globs` is present and |url| is not matched by any pattern, return.
Member

I think this might be a bit easier for the reader to parse.

Suggested change
- 1. If `include_globs` is present and |url| is not matched by any pattern, return.
+ 1. If `include_globs` is present and |url| is not matched by any glob pattern, return.

Comment on lines +290 to +292
1. If |url| is not matched by a match pattern in `matches`, return.
1. If `include_globs` is present and |url| is not matched by any pattern, return.
1. If |url| matches an entry in `exclude_matches` or `exclude_globs`, return.
Member

No change needed.

On my first pass through this I assumed that we could simplify the language here by combining steps 2 and 3 (similar to how step 4 is written), but @oliverdunk pointed out that include_globs behaves as I expected. He also pointed out that the user scripts behavior of includeGlobs intentionally diverges from content scripts – the proposal notes that globs are:

  // Implemented as disjunction: runs in documents whose URL matches
  // "matches" or "includeGlobs", and not "excludeMatches" nor "excludeGlobs".
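
The divergence can be sketched in TypeScript by feeding the same match results to two predicates. The content script version follows the steps quoted above (conjunction of `matches` and `include_globs`), and the user script version follows the quoted proposal comment (disjunction). The `MatchResults` shape and both function names are assumptions made for this example.

```typescript
// Pre-computed results of matching one URL against one script registration.
interface MatchResults {
  matches: boolean; // URL matched a pattern in `matches`
  includeGlobs: boolean | null; // null when the include globs key is absent
  excluded: boolean; // URL matched an exclude pattern or exclude glob
}

// Content scripts, per the quoted steps: `matches` AND (if present) `include_globs`.
function contentScriptApplies(results: MatchResults): boolean {
  if (!results.matches) return false;
  if (results.includeGlobs === false) return false;
  return !results.excluded;
}

// User scripts, per the quoted proposal comment: `matches` OR `includeGlobs`.
function userScriptApplies(results: MatchResults): boolean {
  const included = results.matches || results.includeGlobs === true;
  return included && !results.excluded;
}
```

The two functions only disagree when exactly one of `matches` and the include globs succeeds, which is why the steps cannot be merged without changing behavior.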

5 participants