This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

Work on standardizing multipart/form-data parsing (for Request.prototype.formData) #10

Open
andreubotella opened this issue Jul 30, 2022 · 2 comments


@andreubotella
Member

The fetch spec includes APIs for interacting with form submissions. For example, the Request and Response constructors accept URLSearchParams and FormData objects as the request/response body, which is generally useful and is expected to be part of the common minimum API.

However, the fetch spec also defines the formData() method of the Body interface mixin, which is included in Request and Response. This method parses the HTTP body as a form submission enctype (either application/x-www-form-urlencoded or multipart/form-data) and returns a FormData object. Since form submission bodies generally only make sense as requests, and it's rarely useful to parse a request body from an HTTP client, it wouldn't make much sense to include this method as part of the common minimum API. But it is certainly useful for fetch-based HTTP server APIs, such as those in Deno and Cloudflare Workers (CFW).

For multipart/form-data parsing, however, this method leaves things almost completely unspecified. While there is a formal definition of this format (in RFC 7578, which relies on the multipart definitions in RFC 2046), it takes the form of an ABNF grammar rather than a parsing algorithm, and so implementations differ in how they parse some inputs.
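To make the format concrete, here is a minimal sketch that serializes a FormData object the same way fetch does and prints the raw result. The helper name `serializeFormData` and the sample field are made up for illustration, and the boundary string in the output differs between runtimes and between runs.

```javascript
// Hypothetical helper: serialize a FormData object through a Response,
// which is how fetch turns it into a multipart/form-data body.
async function serializeFormData(formData) {
  const response = new Response(formData);
  return {
    // The boundary parameter is generated by the runtime.
    contentType: response.headers.get("content-type"),
    // The raw multipart/form-data payload as text.
    body: await response.text(),
  };
}

const example = new FormData();
example.set("greeting", "hello");

serializeFormData(example).then(({ contentType, body }) => {
  console.log(contentType);
  console.log(body);
});
```

The printed body contains a Content-Disposition header carrying the field name, and that header is the part whose escaping and parsing the rest of this issue is about.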

What's more, browsers have not always escaped field names and filenames in multipart/form-data payloads in the same way. For example, until last year Firefox escaped double quotes by prepending a backslash and turned newlines into spaces, while Chromium and WebKit used percent-encoding. And while this percent-encoding behavior was added to the HTML spec (whatwg/html#6282), and Firefox's behavior was fixed in turn, no parsing implementation that I'm aware of (including Chromium's and WebKit's!) decodes the percent-encoding escapes:

const original = new FormData();
original.set('a"b', "");
original.set('c"d', new File([], 'e"f'));
log(original);  // a"b c"d e"f

const parsed = await new Response(original).formData();
log(parsed);  // a%22b c%22d e%22f
// (In CFW it's a%22b c%22d undefined, because it seems like files are not
// distinguished from non-file values when parsing.)

function log(formdata) {
  // FormData is pair-iterable.
  const entries = [...formdata];
  const firstEntryName = entries[0][0];
  const secondEntryName = entries[1][0];
  const secondEntryFilename = entries[1][1].name;
  console.log(firstEntryName, secondEntryName, secondEntryFilename);
}
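For reference, the escaping step can be sketched as below, together with one possible inverse. The function names are hypothetical. The escape mirrors the three replacements the HTML spec applies to field names and filenames (LF, CR and the double quote), and skips any newline normalization the spec performs beforehand. The unescape is only an assumption about what a parser could do, since no spec defines it yet.

```javascript
// Escaping applied to field names and filenames when serializing
// multipart/form-data: LF, CR and double quotes are percent-encoded.
function escapeMultipartName(name) {
  return name
    .replace(/\n/g, "%0A")
    .replace(/\r/g, "%0D")
    .replace(/"/g, "%22");
}

// Hypothetical inverse (not specified anywhere): decode only the three
// escapes produced above, and leave every other "%" sequence untouched.
function unescapeMultipartName(escaped) {
  return escaped.replace(/%(0A|0D|22)/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16)),
  );
}

console.log(escapeMultipartName('a"b'));
console.log(unescapeMultipartName('a%22b'));
```

Note that this inverse is not lossless: because "%" itself is never escaped, a name that literally contains the characters %22 serializes unchanged and would be decoded to a double quote. A parsing algorithm would have to decide how to handle that ambiguity.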

For browsers, specifying multipart/form-data parsing is not a big priority, since there are not many use cases for it, and the formData() method has been broken for 8 years or so. But for WinterCG runtimes with a fetch-based HTTP server API, being able to parse form submissions with the existing fetch API is crucial, and being able to accurately parse the form submissions that all browser engines currently send is a large part of that. So this seems like a very interesting issue to tackle as part of the WinterCG project.

@cyco130

cyco130 commented Oct 12, 2022

I agree this is very important.

But also the formData API is not very suitable for server-side usage. It requires buffering the whole request. I think a streaming API for parsing multipart requests in general (and multipart/form-data in particular) is necessary for any kind of real life usage of the fetch API on the server.

@andreubotella
Member Author

> I agree this is very important.
>
> But also the formData API is not very suitable for server-side usage. It requires buffering the whole request. I think a streaming API for parsing multipart requests in general (and multipart/form-data in particular) is necessary for any kind of real life usage of the fetch API on the server.

Certainly. @lucacasonato had some proposals about this. But they would still involve defining a multipart/form-data parsing algorithm, and that is the main bulk of the work I will be setting out to do when I get started on this.

@Ethan-Arrowood Ethan-Arrowood transferred this issue from another repository Jan 19, 2023