SapiStreamEmitter that does not consider Content-Range #22

Open
Stadly opened this issue Sep 13, 2021 · 5 comments · May be fixed by #23 or #28

Comments

@Stadly

Stadly commented Sep 13, 2021

Feature Request

| Q | A |
|---|---|
| New Feature | yes |
| RFC | no |
| BC Break | no |

Summary

SapiStreamEmitter::emit considers the header Content-Range and emits only the relevant range when only a single range is requested and the range unit is bytes. This seems very nice and convenient at first glance.

If the application wants to support multi-range requests or range units other than bytes, however, populating the response with the correct content range(s) must be done by the application, and SapiStreamEmitter will emit the entire provided response body.

This seems very asymmetrical to me. Who should actually be in charge of ensuring that the response body is correct? The one who populates the response body or the one who emits it?

My application would have to populate the final response body in the case of multi-range or non-byte units, but populate the entire file into the response body in the case of a single byte range, and then let SapiStreamEmitter handle the range logic. I would like the option to let SapiStreamEmitter just output the whole response body that I have provided, regardless of the Content-Range header.

I actually don't think the Content-Range functionality belongs in the emitter, so I would prefer to just remove it. But such a change in functionality might warrant a new major version. Alternatively, the behavior could be controlled by a constructor argument, or there could be a separate class.

If the Content-Range functionality does belong in the emitter, why isn't it also part of SapiEmitter? In my opinion, exchanging one for the other should not lead to different output.

@boesing
Member

boesing commented Sep 13, 2021

Sounds interesting, but I don't have a use case for this, and I don't fully understand the standard for implementing multi-range responses. Because of that, I am unsure whether we would add this ourselves or whether I would work on it.

Since the HttpHandlerRunner consumes an EmitterInterface, such an implementation would be possible from a technical perspective. Pull requests are very welcome, so if you have a working implementation, we are happy to review it.

Maybe there are other @laminas/technical-steering-committee members who have deeper knowledge of these specific details (multi-range, non-bytes, ...).


I guess this provides some technical details: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests

@weierophinney
Member

The idea behind the SapiStreamEmitter is that it triggers only if the response has provided a Content-Range header. If it has, and the response body is seekable, it will emit a set amount of content starting at the offset, as provided in that header.

As to why it's implemented the way it is: it's primarily to work with the way PSR-7 defines the content body, which is as a stream. The default implementation we provide in Diactoros is backed by a PHP stream, which can either be an in-memory resource (e.g., php://memory or php://temp) or an actual filesystem resource (e.g., the return value of an fopen() call). This latter case is particularly interesting, as it allows you to do chunked downloads of large files in a way that is resource-efficient; because it is only a file handle, it's doing seek operations so the entire file doesn't need to be in memory at any given time.

Because of the way the StreamInterface implementation is written in PSR-7, you could even write a generator-backed implementation that loops over results from some operation and spits out a structured set — this could be quite useful with API pagination!

The point is that your middleware and handlers can determine what the user has requested via the headers, and offload how that range is returned to the stream implementation itself, versus doing that heavy lifting directly.

If you don't like this particular approach, omit the SapiStreamEmitter from the EmitterStack. We included it as a default as it makes returning chunked responses for large resources (e.g., downloadable files) trivial to implement for end-users.

@Stadly
Author

Stadly commented Sep 14, 2021

@boesing

> Sounds interesting, but I don't have a use case for this, and I don't fully understand the standard for implementing multi-range responses. Because of that, I am unsure whether we would add this ourselves or whether I would work on it.

I came across the issue while creating the file serving library FileWaiter. FileWaiter considers multiple headers (among them Range) in order to populate a response with the correct headers and body for a request. I tried using SapiEmitter for emitting the body, but ran into memory issues. I then tried using SapiStreamEmitter, but realized that the emitted body was then incorrect for single range requests. This is because FileWaiter ensures that only the requested bytes are part of the response body, but then SapiStreamEmitter only emits a subset of those bytes again, based on the Content-Range header.

Example of request and response for single byte range, where the entire file contents are abcdefghijklmnopqrstuvwxyz.

Request

GET /url/to/file HTTP/1.1
Range: bytes=5-15

Response

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 11
Content-Range: bytes 5-15/26

fghijklmnop

When emitting this response, SapiStreamEmitter will only emit klmnop as the body, since the first 5 bytes are omitted. SapiEmitter will emit the whole fghijklmnop.
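The double slicing can be reproduced with a small simulation. This is a sketch in Python rather than the library's PHP. `emit_range` is a hypothetical stand-in, not the real emitter: it assumes the emitter seeks to the first byte position named in Content-Range and then reads at most `last - first + 1` bytes, which matches the behavior described above.

```python
import io
import re


def emit_range(body: io.BytesIO, content_range: str) -> bytes:
    """Hypothetical stand-in for a range-aware emitter (assumed behavior)."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", content_range)
    if match is None:
        # No usable single byte range: emit the whole body.
        body.seek(0)
        return body.read()
    first, last = int(match.group(1)), int(match.group(2))
    body.seek(first)                    # skip `first` bytes of the body
    return body.read(last - first + 1)  # read at most the range length


# The application has already cut bytes 5-15 out of the 26-byte file.
already_sliced = b"abcdefghijklmnopqrstuvwxyz"[5:16]
emitted = emit_range(io.BytesIO(already_sliced), "bytes 5-15/26")
print(already_sliced)
print(emitted)
```

Passing the full 26-byte file as the body instead gives the expected `fghijklmnop`, which is the usage the emitter appears to be designed around.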

Example of request and response for multiple byte ranges (the 4th byte, bytes 11 to 21, the last 5 bytes, and byte 19 to the end), where the entire file contents are abcdefghijklmnopqrstuvwxyz.

Request

GET /url/to/file HTTP/1.1
Range: bytes=3-3, 10-20, -5, 18-

Response

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 227
Content-Type: multipart/byteranges; boundary=BOUNDARY

--BOUNDARY
Content-Range: bytes 3-3/26

d
--BOUNDARY
Content-Range: bytes 10-20/26

klmnopqrstu
--BOUNDARY
Content-Range: bytes 21-25/26

vwxyz
--BOUNDARY
Content-Range: bytes 18-25/26

stuvwxyz
--BOUNDARY--

When emitting this response, both SapiStreamEmitter and SapiEmitter will emit the whole response body, since there is no Content-Range header (it is embedded into the body for multi-range requests).
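To make the mapping from the Range header to the per-part Content-Range values concrete, here is a minimal Python sketch. `resolve_byte_ranges` is a hypothetical helper, not part of FileWaiter or any Laminas package. It handles only the `bytes` unit and skips ranges that cannot be satisfied.

```python
from typing import List, Tuple


def resolve_byte_ranges(header: str, size: int) -> List[Tuple[int, int]]:
    """Resolve a Range header value into inclusive (first, last) positions."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    resolved = []
    for part in spec.split(","):
        start, sep, end = part.strip().partition("-")
        if not sep:
            raise ValueError(f"malformed range: {part!r}")
        if start == "":
            # Suffix range such as "-5": the last N bytes of the file.
            first, last = max(size - int(end), 0), size - 1
        else:
            first = int(start)
            # An open-ended range such as "18-" runs to the last byte.
            last = min(int(end), size - 1) if end else size - 1
        if first <= last:
            resolved.append((first, last))
    return resolved


data = b"abcdefghijklmnopqrstuvwxyz"
ranges = resolve_byte_ranges("bytes=3-3, 10-20, -5, 18-", len(data))
parts = [data[first:last + 1] for first, last in ranges]
print(ranges)
print(parts)
```

The resolved positions correspond to the four Content-Range lines inside the multipart body above.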

@Stadly
Author

Stadly commented Sep 14, 2021

@weierophinney

> The idea behind the SapiStreamEmitter is that it triggers only if the response has provided a Content-Range header. If it has, and the response body is seekable, it will emit a set amount of content starting at the offset, as provided in that header.

This is very nice, aside from the fact that it will only give the expected output if all the conditions are met: single range, byte unit, seekable. If any of the conditions are not met, the entire body will be output.

Therefore, it makes more sense to populate the correct response body when generating the response than when emitting it. Then it is clear who is responsible for what, and the responsibilities will always be the same, rather than depending on some condition (single range, byte unit, seekable).

> This latter case is particularly interesting, as it allows you to do chunked downloads of large files in a way that is resource-efficient; because it is only a file handle, it's doing seek operations so the entire file doesn't need to be in memory at any given time.

This can also be done when populating only a range of a file into a response body, for instance using LimitStream - there is no need to do it during emit to get the performance benefit.
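A minimal sketch of the idea, in Python rather than PHP: the range is applied when the body is built, and the data is still read in small chunks from a seekable handle. `iter_range` is a hypothetical helper for illustration and does not mirror LimitStream's actual API.

```python
import io
from typing import BinaryIO, Iterator


def iter_range(stream: BinaryIO, first: int, last: int,
               chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield bytes first..last (inclusive) without loading the whole stream."""
    stream.seek(first)
    remaining = last - first + 1
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # the stream ended before the requested range did
        remaining -= len(chunk)
        yield chunk


source = io.BytesIO(b"abcdefghijklmnopqrstuvwxyz")
chunks = list(iter_range(source, 5, 15, chunk_size=4))
print(chunks)
```

An emitter that simply writes out each chunk it is given would then produce the correct body without needing to know anything about Content-Range.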

> If you don't like this particular approach, omit the SapiStreamEmitter from the EmitterStack. We included it as a default as it makes returning chunked responses for large resources (e.g., downloadable files) trivial to implement for end-users.

The problem then is that you end up with SapiEmitter, which may consume huge amounts of memory because the entire response body is converted to a string before output.

@weierophinney
Member

> The problem then is that you end up with SapiEmitter, which may consume huge amounts of memory because the entire response body is converted to a string before output.

  • So propose an alternative emitter via a pull request, OR
  • Create an emitter for your specific needs, AND
  • Set that emitter as first in the EmitterStack, or as the specific EmitterInterface used in your application.

We're well aware that not all emitters will work for all use cases, which is why the EmitterInterface and EmitterStack exist. The implementations we are providing are for the most general use cases.
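The stack pattern referred to here can be sketched as follows. This is a Python model of the idea only, not the Laminas classes: the names are hypothetical, a response is a plain dict, and the sketch assumes the most recently pushed emitter is tried first, so check the actual ordering of your stack before relying on it.

```python
from typing import Callable, List

# An emitter returns True when it handled the response and False to let
# the next emitter in the stack try.
Emitter = Callable[[dict], bool]
log: List[str] = []


class EmitterStack:
    def __init__(self) -> None:
        self._emitters: List[Emitter] = []

    def push(self, emitter: Emitter) -> None:
        self._emitters.append(emitter)

    def emit(self, response: dict) -> bool:
        # Assumption: the most recently pushed emitter is tried first.
        for emitter in reversed(self._emitters):
            if emitter(response):
                return True
        return False


def range_emitter(response: dict) -> bool:
    # Only claims responses that carry a Content-Range header.
    if "Content-Range" not in response["headers"]:
        return False
    log.append("range_emitter")
    return True


def verbatim_emitter(response: dict) -> bool:
    # Claims every response and would write the body exactly as provided.
    log.append("verbatim_emitter")
    return True


stack = EmitterStack()
stack.push(range_emitter)
stack.push(verbatim_emitter)  # pushed last, so it is consulted first
stack.emit({"headers": {"Content-Range": "bytes 5-15/26"}, "body": b"fghijklmnop"})
print(log)
```

With the custom emitter consulted first, the range-aware emitter never sees the response, so the body is emitted exactly as the application built it.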

3 participants