Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent treatment of (terminal?) backslashes in Content-Disposition names (all header options?) #3032

Open
xmo-odoo opened this issue Mar 10, 2025 · 3 comments

Comments

@xmo-odoo
Copy link

xmo-odoo commented Mar 10, 2025

Hit the issue because a form in a test case started returning a name None, which displeases python when used as a kwarg. That test form uses multipart/form-data and one of the input names ends with a \.

  • both chrome and firefox do send Content-Disposition: form-data; name="test\" when the field name ends with a \
  • werkzeug's own MultipartEncoder lets backslashes through untouched

I was able to pinpoint 3.0.4 for the change so I assume it's related to #2904 (#2939) though I did not try reverting that.

werkzeug <= 3.0.3

>>> from werkzeug.http import parse_options_header
>>> parse_options_header('form-data; name="test\\"')
('form-data', {'name': 'test\\'})

werkzeug >= 3.0.4

>>> from werkzeug.http import parse_options_header
>>> parse_options_header('form-data; name="test\\"')
('form-data', {})

Full test case

[tox]
env_list =
    py{312,313}-werkzeug30{0,1,2,3,4,5,6}

[testenv]
deps =
     pytest
     werkzeug300: werkzeug==3.0.0
     werkzeug301: werkzeug==3.0.1
     werkzeug302: werkzeug==3.0.2
     werkzeug303: werkzeug==3.0.3
     werkzeug304: werkzeug==3.0.4
     werkzeug305: werkzeug==3.0.5
     werkzeug306: werkzeug==3.0.6
commands = pytest test.py
# test.py
from werkzeug.http import parse_options_header
from werkzeug.datastructures import Headers
from werkzeug.sansio import multipart

def test_browser_content_disposition():
    assert parse_options_header('form-data; name="test\\"') \
        == ('form-data', {'name': 'test\\'})

def test_requests_content_disposition():
    # consistently weird
    assert parse_options_header(
        'form-data; name="test\\"; filename="test\\"',
    ) == (
        'form-data',
        {'name': 'test"; filename='},
    )

def test_roundtrip():
    enc = multipart.MultipartEncoder(b"x")
    body = enc.send_event(multipart.Field("test\\", {}))
    body += enc.send_event(multipart.Data(b"thing", False))
    body += enc.send_event(multipart.Epilogue(b""))

    dec = multipart.MultipartDecoder(b"x")
    dec.receive_data(body)
    dec.receive_data(None)

    assert dec.next_event() == multipart.Preamble(data=b'')
    assert dec.next_event() == multipart.Field(name='test\\', headers=Headers([('Content-Disposition', 'form-data; name="test\\"')]))
    assert dec.next_event() == multipart.Data(data=b'thing', more_data=False)
    assert dec.next_event() == multipart.Epilogue(data=b'')
@xmo-odoo xmo-odoo changed the title Inconsistent treatment of (terminal?) backslashes in Content-Disposition names Inconsistent treatment of (terminal?) backslashes in Content-Disposition names (all header options?) Mar 10, 2025
@davidism
Copy link
Member

davidism commented Mar 10, 2025

This is an inconsistency in how browsers handle escaping quoted values. For some reason, \" at the end of the value should be treated as \\", which makes the parsing way more complicated. The fact that browsers will send this incorrect escape is likely either because it hasn't been reported/fixed in them, or for historical reasons, but it's unlikely to be an indication that such names should be used. I think it might be related to how Internet Explorer worked with Windows paths, and other browsers kept it for historical reasons. Is there a reason you need to generate fields with this name? It seems like a better solution would be to use field names that don't hit corner cases of the inconsistent client behavior.

@davidism
Copy link
Member

davidism commented Mar 10, 2025

Note that the current HTTP spec does not mention this special case. https://httpwg.org/specs/rfc9110.html#quoted.strings Searching back through obsoleted RFCs, I didn't find any mention of the special case. In fact, it says that our current behavior is correct:

Recipients that process the value of a quoted-string MUST handle a quoted-pair as if it were replaced by the octet following the backslash.

@xmo-odoo
Copy link
Author

xmo-odoo commented Mar 11, 2025

Note that the current HTTP spec does not mention this special case. https://httpwg.org/specs/rfc9110.html#quoted.strings Searching back through obsoleted RFCs, I didn't find any mention of the special case. In fact, it says that our current behavior is correct:

Sure, I don't disagree that per the letter of the spec the behaviour is correct (kinda, werkzeug does discard the entire parameter value), and that the entire thing is a minefield (I originally had a link to curl/curl#7789 in there but apparently removed it during one of the edition passes I made). However,

  1. it's still an issue when interacting with browsers (and really most user agents out there from what I've seen), and from there users
  2. the fact that werkzeug's own mime encoder generates broken content per its decoder seems like an issue regardless
  3. the change in a patch release is a bit rough

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants