
HTTP Streaming, Cross-Origin Resource Sharing (CORS), and the AWS CloudFront Cache

Michael B. Klein edited this page Feb 7, 2018 · 1 revision

Background

At Northwestern University Library, we have experienced a number of issues getting content to stream over HTTP from our AWS-hosted Audiovisual Repository (AVR) to Chrome and Firefox.

Cross-Origin Resource Sharing (CORS)

When a page served from one origin (the combination of scheme, host, and port) requests a resource, such as a stream, from a different origin, the request is a cross-origin request. The W3C published a Recommendation describing CORS requests, client and server behavior, and recommended browser behavior (the same requirements now live in the WHATWG Fetch Standard).
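To make "origin" concrete, here is a minimal Python sketch of the same-origin comparison: two URLs share an origin only if scheme, host, and port all match, with a missing port treated as the scheme's default. The function names and the example hostnames (avr.example.edu, streams.example.net) are hypothetical, and real browsers handle more cases (IP literals, internationalized hostnames, opaque origins) than this sketch does.

```python
from urllib.parse import urlsplit

# Default ports per scheme; an explicit default port (https://host:443)
# is treated the same as no port at all.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) tuple that makes up a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme, -1)
    return scheme, host, port


def is_cross_origin(page_url: str, resource_url: str) -> bool:
    """True when the resource lives on a different origin than the page."""
    return origin_of(page_url) != origin_of(resource_url)
```

For example, is_cross_origin("https://avr.example.edu/items/1", "https://streams.example.net/hls/1.m3u8") is True, so a player page on the first host fetching a playlist from the second would be making a cross-origin request.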

When a browser makes a cross-origin request for a resource, it adds an Origin header identifying the origin of the page requesting the stream. If that origin is authorized to play the stream, the server includes an Access-Control-Allow-Origin header in its response. If that header is missing or doesn’t match the requesting origin, the browser won’t play back the stream even though it successfully retrieved it. This is important: CORS governs client-side decisions about what content a page may execute or play back, not whether a resource can be retrieved in the first place.
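The browser-side decision can be sketched as a single check on headers the browser has already received. This is a simplified model that assumes a request sent without credentials (no cookies or HTTP authentication), in which case the Fetch Standard accepts either * or an exact match of the request's Origin; the function name is hypothetical and the credentialed case is not covered.

```python
def cors_check(request_origin: str, response_headers: dict[str, str]) -> bool:
    """Simplified CORS check for a response to a request sent without credentials.

    The response has already been downloaded; this only decides whether the
    page that asked for it is allowed to use it.
    """
    # HTTP header names are case-insensitive.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allow = headers.get("access-control-allow-origin")
    if allow is None:
        return False  # no header: the browser withholds the response from the page
    allow = allow.strip()
    return allow == "*" or allow == request_origin
```

The function takes the headers of a response that has already been fetched, which is the point made above: the check controls whether the page may use the bytes, not whether they are retrieved.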

AWS’ CORS request handling happens in S3, not in CloudFront. CloudFront passes the request to S3 – along with the Origin header, if present – and caches the result. But the decision about whether to include the Access-Control-Allow-Origin header in the response is S3’s to make.

Example Configuration

NUL’s AVR uses the following CORS configuration on the S3 bucket hosting its derivatives:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>*.northwestern.edu</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
        <AllowedHeader>Access-Control-Allow-Origin</AllowedHeader>
    </CORSRule>
</CORSConfiguration>
```

This configuration does not affect any client’s authorization to retrieve content from the bucket. (That’s handled by CloudFront via request signing.) All it does is ensure that any request carrying an Origin: https://some.server.at.northwestern.edu header receives a response that includes an Access-Control-Allow-Origin: https://some.server.at.northwestern.edu header, and that any request that does not specify a northwestern.edu origin gets no Access-Control-Allow-Origin response header at all.
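The behavior of the rule above can be approximated in a few lines. This sketch assumes that AllowedOrigin is matched against the full Origin value (scheme included) and that the single permitted * wildcard matches any run of characters; it models only the Access-Control-Allow-Origin header and is not S3's actual implementation. The function names are hypothetical.

```python
from typing import Optional


def origin_matches(pattern: str, origin: str) -> bool:
    """Match an AllowedOrigin pattern that contains at most one '*' wildcard."""
    if pattern.count("*") > 1:
        raise ValueError("at most one '*' wildcard is allowed")
    if "*" not in pattern:
        return pattern == origin
    prefix, suffix = pattern.split("*")
    # The wildcard may match any run of characters between prefix and suffix.
    return (
        len(origin) >= len(prefix) + len(suffix)
        and origin.startswith(prefix)
        and origin.endswith(suffix)
    )


def cors_headers_for(origin: Optional[str], allowed_origins: list[str]) -> dict[str, str]:
    """CORS headers to add to a GET response (only Access-Control-Allow-Origin is modeled)."""
    if origin is None:
        return {}  # not a CORS request, so nothing is added
    if any(origin_matches(pattern, origin) for pattern in allowed_origins):
        return {"Access-Control-Allow-Origin": origin}  # echo the caller's origin
    return {}
```

With allowed_origins=["*.northwestern.edu"], an Origin of https://some.server.at.northwestern.edu is echoed back, while a request with no Origin, or with an origin outside northwestern.edu, gets no CORS header.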

Caching

This is where things get dicey. Even with the correct CORS configuration on the S3 bucket, we found that some content simply failed to stream (or load) on Chrome and/or Firefox. The effect was inconsistent between browsers, different versions of the same browser, and even among different items being streamed to the same browser within the same session.

By default, CloudFront caches responses based on the URL alone. If the first request for a resource arrives without the proper Origin header, the response has no Access-Control-Allow-Origin header, and CloudFront caches that response and serves it for every later request to the same URL, whether or not those requests carry the proper Origin header. A client can therefore send a correct Origin header and still receive a response without a correct, meaningful Access-Control-Allow-Origin header, depending on what is already in the CloudFront cache.
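The cache misfire is easy to reproduce with a toy model. In this hypothetical Python sketch, fake_s3 stands in for the bucket's CORS behavior and UrlKeyedCache stands in for an edge cache keyed on the URL alone; neither is an AWS API, and the playlist path is made up.

```python
from typing import Callable, Optional

Headers = dict[str, str]  # response headers only; bodies are omitted


def fake_s3(url: str, origin: Optional[str]) -> Headers:
    """Hypothetical stand-in for the bucket: echoes *.northwestern.edu origins."""
    headers = {"Content-Type": "application/vnd.apple.mpegurl"}
    if origin is not None and origin.endswith(".northwestern.edu"):
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class UrlKeyedCache:
    """Toy edge cache whose cache key is the URL alone."""

    def __init__(self, backend: Callable[[str, Optional[str]], Headers]) -> None:
        self.backend = backend
        self.store: dict[str, Headers] = {}

    def get(self, url: str, origin: Optional[str] = None) -> Headers:
        if url not in self.store:
            # Miss: forward the request, including its Origin, and keep the answer.
            self.store[url] = self.backend(url, origin)
        # Hit: every later caller gets whatever the first caller caused to be cached.
        return self.store[url]
```

Calling cache.get("/hls/item1.m3u8") with no origin and then cache.get("/hls/item1.m3u8", origin="https://some.server.at.northwestern.edu") returns the first, header-less response both times, so the second (correct) CORS request fails in the browser.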

The solution is to change CloudFront’s caching strategy. Instead of using only the URL as the cache key, CloudFront can use the URL plus the values of selected request headers. Turning on (“whitelisting”) the Origin header as part of the cache key guards against the cache misfire above, at the expense of some caching efficiency: CloudFront now has to cache a separate copy of the same content at the same URL for every distinct Origin value it receives.
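The effect of adding Origin to the cache key can be shown with a toy model. In this hypothetical sketch, fake_s3 stands in for the bucket's CORS behavior and OriginKeyedCache for an edge cache whose key is the pair (URL, Origin value); neither is a CloudFront API.

```python
from typing import Callable, Optional

Headers = dict[str, str]  # response headers only; bodies are omitted
CacheKey = tuple[str, Optional[str]]  # (URL, Origin header value or None)


def fake_s3(url: str, origin: Optional[str]) -> Headers:
    """Hypothetical stand-in for the bucket: echoes *.northwestern.edu origins."""
    headers = {"Content-Type": "application/vnd.apple.mpegurl"}
    if origin is not None and origin.endswith(".northwestern.edu"):
        headers["Access-Control-Allow-Origin"] = origin
    return headers


class OriginKeyedCache:
    """Toy edge cache whose key is the URL plus the Origin header value."""

    def __init__(self, backend: Callable[[str, Optional[str]], Headers]) -> None:
        self.backend = backend
        self.store: dict[CacheKey, Headers] = {}

    def get(self, url: str, origin: Optional[str] = None) -> Headers:
        key = (url, origin)  # a bare request can no longer shadow a CORS request
        if key not in self.store:
            self.store[key] = self.backend(url, origin)
        return self.store[key]
```

A bare request followed by a CORS request for the same playlist now produces two cache entries, and only the second carries Access-Control-Allow-Origin. The extra entry per distinct Origin is the efficiency cost described above.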

Browser Differences

The reason Chrome and Firefox were affected (in different ways) while Safari was not has to do with how each browser issues requests. Firefox seems to make a bare (Origin-less) request for every .m3u8 playlist file before making a proper CORS request, which can poison the cache right off the bat. Chrome does something similar. Testing several browsers at once, or switching back and forth between them, can therefore produce unpredictable results, because whichever request reaches CloudFront first determines what gets cached.

Safari, on the other hand, sends an OPTIONS request before its GET request in order to determine whether a resource is CORS-authorized. Since CloudFront (by default) never caches responses to OPTIONS requests, Safari’s CORS decision making is free from local or CloudFront cache interference.
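Why preflight responses escape the cache can be modeled in a few lines. This hypothetical sketch assumes an edge cache that, like CloudFront's default behavior, caches only GET and HEAD responses and forwards everything else to the origin; fake_s3 is a simplified stand-in that answers requests from an allowed origin with Access-Control-Allow-Origin (plus Access-Control-Allow-Methods for OPTIONS). It only illustrates that an OPTIONS response is computed fresh from each request's own Origin, even after a bare GET has populated the cache.

```python
from typing import Optional

# CloudFront's default cached methods; caching OPTIONS is an opt-in setting.
CACHED_METHODS = {"GET", "HEAD"}


def fake_s3(method: str, url: str, origin: Optional[str]) -> dict[str, str]:
    """Simplified, hypothetical stand-in for the bucket's CORS handling."""
    headers: dict[str, str] = {}
    if origin is not None and origin.endswith(".northwestern.edu"):
        headers["Access-Control-Allow-Origin"] = origin
        if method == "OPTIONS":
            headers["Access-Control-Allow-Methods"] = "GET"
    return headers


class EdgeCache:
    """URL-keyed toy cache that stores only responses to CACHED_METHODS."""

    def __init__(self) -> None:
        self.store: dict[tuple[str, str], dict[str, str]] = {}
        self.origin_fetches = 0  # number of requests that reached fake_s3

    def request(self, method: str, url: str, origin: Optional[str] = None) -> dict[str, str]:
        if method not in CACHED_METHODS:
            # Not cacheable: forwarded with this request's own Origin every time.
            self.origin_fetches += 1
            return fake_s3(method, url, origin)
        key = (method, url)  # Origin is deliberately not part of the key
        if key not in self.store:
            self.origin_fetches += 1
            self.store[key] = fake_s3(method, url, origin)
        return self.store[key]
```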

Considerations for Other Servers

This issue can crop up outside the context of AWS/CloudFront as well. If a caching proxy (e.g., Squid) sits in front of the HTTP streaming server, it also needs to be configured either to make Origin-based caching decisions or to rewrite the Access-Control-Allow-Origin header on the fly.
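For the second option, a proxy that recomputes Access-Control-Allow-Origin on the way out could apply something like the following to each response after its cache lookup. This is a language-neutral Python sketch, not Squid configuration; the allow-list suffix and the function name are hypothetical. It also adds Vary: Origin, the standard HTTP way of telling any cache further downstream that the response depends on the Origin request header.

```python
from typing import Optional

ALLOWED_SUFFIX = ".northwestern.edu"  # hypothetical allow-list


def rewrite_cors(cached_headers: dict[str, str], request_origin: Optional[str]) -> dict[str, str]:
    """Recompute Access-Control-Allow-Origin for this request, ignoring what was cached."""
    out: dict[str, str] = {}
    vary_values: list[str] = []
    for name, value in cached_headers.items():
        lname = name.lower()  # header names are case-insensitive
        if lname == "access-control-allow-origin":
            continue  # drop whatever the first requester caused to be cached
        if lname == "vary":
            vary_values.extend(v.strip() for v in value.split(",") if v.strip())
            continue
        out[name] = value
    if request_origin is not None and request_origin.endswith(ALLOWED_SUFFIX):
        out["Access-Control-Allow-Origin"] = request_origin
    # The response now depends on Origin, so say so to any downstream cache.
    if not any(v.lower() == "origin" for v in vary_values):
        vary_values.append("Origin")
    out["Vary"] = ", ".join(vary_values)
    return out
```

The rewrite runs on every response, including cache hits, so the cached copy can stay keyed on the URL alone while each client still receives a header that matches its own Origin.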