Weird multiline quoted "From" cannot be parsed correctly #161

Open
markusramsak opened this issue Jan 5, 2021 · 3 comments
markusramsak commented Jan 5, 2021

The "From: " part of this simplified email cannot be parsed correctly.
The result should be: Holger_Akademie für Impulsgebung <[email protected]>

From: =?UTF-8?Q?"Holger=5FAkademie_f=C3=BCr_Impulsge?=
 =?UTF-8?Q?bung"_<[email protected]>?=
To: <[email protected]>
Subject: Nach dem Meisterkreis am 28. April 2016
Date: Thu, 28 Apr 2016 22:05:47 +0200
Message-ID: <[email protected]>

zbateson commented Jan 5, 2021

Hi @markusramsak --

This one is invalid too. It fails on two counts:

  1. The structural parts of a header (quotes, angle brackets, comment parentheses) need to be outside the encoded-words; generally, an encoded-word can only encode the content within them. There is an exception that needs to be made for References/Content-Id headers. See "Lack of RFC 1342 support for IdHeader" (#109) for info on that, but particularly this from RFC 1342:

An encoded-word may replace a "text" token (as defined by RFC 822) in:
(1) a Subject or Comments header field, (2) any extension message
header field, (3) any user-defined message header field, or (4) any
RFC 1341 body part header field (such as Content-Description) for
which the field body contains only "text"s.

That means, for example, that this is a comment: (=?UTF-8?Q?blah?=) but this is not: =?UTF-8?Q?(blah)?= .

  2. Email addresses specifically may not be MIME-header-encoded anyway. My processing may allow it if it's just part of an address (it may not, I can't remember offhand), but it's not allowed by the RFC: "An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'." (https://tools.ietf.org/html/rfc2047#section-5)
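
For comparison, a well-formed version of this header encodes only the display name and leaves the angle brackets and the address outside the encoded-word. The following is a minimal sketch using Python's standard library, not this project's code. The address `info@example.com` is a placeholder, since the real one is redacted in the report.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

# Placeholder address (assumption): the real one is redacted in the report.
name, addr = "Holger_Akademie für Impulsgebung", "info@example.com"

# formataddr() turns only the non-ASCII display name into an encoded-word;
# the <addr-spec> stays outside it, as RFC 2047 section 5 requires.
header_value = formataddr((name, addr), charset="utf-8")

# A strict parser can split on the header structure first...
raw_name, parsed_addr = parseaddr(header_value)
# ...and decode the encoded-word in the display name afterwards.
decoded_name = str(make_header(decode_header(raw_name)))

print(header_value)
print(decoded_name, parsed_addr)
```

Because the structure is visible before any decoding happens, the parser never has to guess whether a quote or angle bracket inside an encoded-word was meant as syntax.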

Again, I welcome discussion/examples of other handling, etc...

markusramsak (Author) commented

I believe you, but this is part of a real email where this case happened. I understand if you don't want to handle these cases, but then I would try to handle them myself.
These cases are rare, but they happen.


zbateson commented Jan 5, 2021

This would also be fixed by prioritizing the MIME-encoded part over the quoted part, as in #159, if it makes sense to do so (the impact and usefulness should be investigated first).
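
One way to picture that lenient approach is to decode the encoded-words first and only then parse the address structure. The sketch below uses Python's standard library to illustrate the idea; it is not how this library is implemented. `parse_from_lenient` is a hypothetical helper, and `info@example.com` stands in for the redacted address.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# The reported header, with a placeholder address (assumption: the real
# address is redacted in the report).
raw_from = (
    '=?UTF-8?Q?"Holger=5FAkademie_f=C3=BCr_Impulsge?=\r\n'
    ' =?UTF-8?Q?bung"_<info@example.com>?='
)

def parse_from_lenient(value: str) -> tuple[str, str]:
    """Hypothetical helper: decode encoded-words first, then parse."""
    # Adjacent encoded-words are joined and the folding whitespace between
    # them is dropped, so the quoted name is reassembled before parsing.
    decoded = str(make_header(decode_header(value)))
    # Only now are the quotes and angle brackets treated as structure.
    return parseaddr(decoded)

display_name, address = parse_from_lenient(raw_from)
print(display_name, address)
```

The trade-off is that decoded content can now introduce structure (for example an encoded comma or angle bracket), which is the reason RFC 2047 keeps encoded-words out of those positions, so a fallback like this is probably best applied only when strict parsing yields nothing usable.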
