double quotes " can break decoding #159

markusramsak · 2021-01-05T13:48:08Z

The following simplified version of the original message cannot be parsed correctly because of the closing quote in the "From:" line.

Delivered-To: [email protected]
Date: Thu, 10 Sep 2020 09:29:57 -0400
To: <[email protected]>
From: "Amway =?utf-8?q?=C3=96sterreich"?= <[email protected]>
Subject: Amway Newsletter Nr. 18 - 10. September 2020
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: Quoted-Printable

If I move the closing quote after ?=, it works.

Delivered-To: [email protected]
Date: Thu, 10 Sep 2020 09:29:57 -0400
To: <[email protected]>
From: "Amway =?utf-8?q?=C3=96sterreich?=" <[email protected]>
Subject: Amway Newsletter Nr. 18 - 10. September 2020
Message-ID: <[email protected]>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: Quoted-Printable

Please fix this so that the parser can handle it.


zbateson · 2021-01-05T19:09:45Z

Hi @markusramsak --

A quoted part takes precedence. Specifically, "An 'encoded-word' MUST NOT appear within a 'quoted-string'.", see https://tools.ietf.org/html/rfc2047#section-5

I believe what you're saying is that the mime-encoded part isn't decoded, but that's correct behaviour as far as I'm aware. It would be hard to build an exception for what you want without breaking input that should be considered valid, because the quotes are supposed to take precedence, at least as far as I can tell.

Feel free to correct me with relevant examples, such as handling by popular mail parsers or clients, RFCs, or other libraries that specifically handle your situation differently, to facilitate a discussion about it.

markusramsak · 2021-01-05T19:19:55Z

I know that it shouldn't happen, but I am the programmer of a mail client with more than 100,000 mails to parse and display, and the only thing I can say is: it happens.
I simplified the mail, but the issue is real in every newsletter email from the company Amway (https://www.amway.at).

Other mail clients like Gmail or Apple Mail can decode this mail correctly, and I would like to as well.

Maybe it is just a matter of replacing "?=[space] with ?="[space], but I don't know whether that would break anything.

zbateson · 2021-01-05T19:45:40Z

Unfortunately the way the parser works, the 'part looking for quotes' is separate from the 'part looking for mime encoded parts'. It's semantically okay for a mime-encoded part to have a quote in it, it just won't be handled as a 'control character' terminating (or starting) a quoted-part.
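
One way to picture the precedence issue described above is a toy scanner that pulls out quoted parts first and only then looks for mime-encoded parts. This is a rough sketch in Python, not this library's actual implementation: the patterns `QUOTED` and `ENCODED_WORD`, the function `scan_quotes_first`, and the address `sender@example.com` are all made up for illustration.

```python
import re

# Hypothetical patterns for illustration only; not the library's real grammar.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')                        # a double-quoted part
ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=')     # =?charset?enc?text?=

def scan_quotes_first(value):
    """Split a header value into its quoted parts and whatever is left over."""
    quoted_parts = QUOTED.findall(value)
    remainder = QUOTED.sub("", value)
    return quoted_parts, remainder

# Placeholder address; the display names are taken from the report.
broken = '"Amway =?utf-8?q?=C3=96sterreich"?= <sender@example.com>'
fixed = '"Amway =?utf-8?q?=C3=96sterreich?=" <sender@example.com>'

for label, value in (("broken", broken), ("fixed", fixed)):
    quoted_parts, remainder = scan_quotes_first(value)
    # Is there a complete encoded word inside any quoted part?
    has_word = any(ENCODED_WORD.search(part) for part in quoted_parts)
    print(label, quoted_parts, repr(remainder), has_word)
```

In the broken form the closing quote ends the quoted part before the `?=` terminator, so the terminator is left stranded outside and no complete mime-encoded part remains anywhere. In the fixed form the whole encoded word sits inside the quoted part.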

markusramsak · 2021-01-05T19:51:32Z

If it can't be done on your side, then I would replace these wrong characters in the "From:" line on my side before it is parsed by your parser.
I would call it "preparsing" because it happens before your complex parsing.
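
A minimal sketch of such a "preparsing" step, written in Python for illustration (the same regular expression idea should carry over to other languages). The function name `move_misplaced_quote`, the pattern, and the address `sender@example.com` are assumptions made up for this example and are not part of the library. It only rewrites a quote that sits directly before the closing `?=` of something that otherwise looks like an encoded word, and it has not been checked against other malformed variants.

```python
import re

# Matches =?charset?q-or-b?text"?= where the quote sits right before the
# closing ?= and the whole thing is followed by whitespace or end of string.
_MISPLACED_QUOTE = re.compile(
    r'(=\?[^?\s]+\?[bBqQ]\?[^?\s"]*)"\?=(?=\s|$)'
)

def move_misplaced_quote(header_line: str) -> str:
    """Move a closing quote from before the ?= terminator to after it."""
    return _MISPLACED_QUOTE.sub(r'\1?="', header_line)

# Placeholder address; the display name is taken from the report.
raw = 'From: "Amway =?utf-8?q?=C3=96sterreich"?= <sender@example.com>'
print(move_misplaced_quote(raw))
```

Because the encoded text in the pattern may not itself contain a quote, a line whose encoded-looking part starts with a quote (for example a display name that deliberately contains `=?utf-8?Q?"weird"?=`) is left unchanged by this sketch.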

markusramsak · 2021-01-05T19:54:30Z

By the way, you did an excellent job with this library! About 9995 out of 10000 emails can be parsed on average from my web mail client (backed by your library) without any issues.

zbateson · 2021-01-05T20:14:27Z

> If it can't be done on your side, then I would replace these wrong characters in the "From:" line on my side before it is parsed by your parser.

I'm not sure that it can't, but it would be an effort. I'd have to change the precedence of how things are parsed, which would make some valid but extremely unlikely cases invalid, like From: "My =?utf-8?Q?"weird"?= name" <[email protected]> (i.e. purposely containing what looks like a mime-encoded part in a name), but I can't imagine that would ever be an issue. There may be other things affected too because of how the parsers are built; it would have to be investigated.

If you're able to sanitize for exceptions you know of like that, I think that would be the way to go, at least for now. We can leave this open and look at it when there's time or if it's affecting more people. You could also try emailing the folks at Amway to tell them there's an issue with their emails :) Maybe they're using a house-built system that needs to be patched, or maybe it's a huge commercial system, which would mean handling this scenario should be prioritized.

> By the way, you did an excellent job with this library! About 9995 out of 10000 emails can be parsed on average from my web mail client (backed by your library) without any issues.

Excellent, very happy to hear that!

zbateson added the Future consideration label Jan 5, 2021

zbateson mentioned this issue Jan 5, 2021

Weird multiline quoted "From" cannot be parsed correctly #161

Open
