When I have an email body that is in fact UTF-8 but the email lacks any charset declaration, you treat it as ASCII and encode it to UTF-8. This is consistent with RFC 2045:
> Default RFC 822 messages without a MIME Content-Type header are taken by this protocol to be plain text in the US-ASCII character set
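The default described in that quote can be sketched as a small charset-resolution step. This is a minimal sketch under stated assumptions, not the API of either library discussed here. `resolveCharset()` is a hypothetical helper that takes the raw Content-Type header value, or `null` when the header is missing. It falls back to US-ASCII both when the header is absent (RFC 2045) and when a `text/*` type has no `charset` parameter (RFC 2046).

```php
<?php
// Hypothetical helper (not part of any library mentioned in this thread):
// choose the charset for decoding a text body, falling back to the RFC default.
function resolveCharset(?string $contentType): string
{
    // No Content-Type header at all: RFC 2045 says to assume
    // "text/plain; charset=us-ascii".
    if ($contentType === null || trim($contentType) === '') {
        return 'us-ascii';
    }
    // Look for a charset parameter, optionally quoted, e.g. text/plain; charset="UTF-8".
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m) === 1) {
        return strtolower($m[1]);
    }
    // Header present but no charset parameter: for text/* RFC 2046 also defaults
    // to US-ASCII. This sketch applies that default to every type.
    return 'us-ascii';
}

echo resolveCharset(null), "\n";                          // us-ascii
echo resolveCharset('text/plain'), "\n";                  // us-ascii
echo resolveCharset('text/plain; charset="UTF-8"'), "\n"; // utf-8
```

Under this default, a UTF-8 body sent without a charset declaration is decoded as US-ASCII. That is the situation this issue describes.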
So `$message->getTextContent()` gives me something like:
Yup, there's no easy fix. If you want to use `mb_check_encoding()` yourself, you're free to do so. However, there's no good way of knowing which charset a body is in without some indication. All that function does is check that the bytes in the passed string are valid in the charset you give it; it doesn't know whether that is the original charset.
Yes, indeed. As far as I can tell, `mb_check_encoding()` just checks whether each byte (or byte sequence) is a formally valid character. So `mb_check_encoding('fööbär', 'UTF-8')` returns `true`. The only thing `mb_check_encoding()` protects you against is outright invalid byte sequences, as mentioned in php-mime-mail-parser/php-mime-mail-parser#329.
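The point about validity checks can be illustrated by testing three byte strings. The sketch uses `preg_match()` with the `u` modifier as a stand-in for `mb_check_encoding($s, 'UTF-8')`. That substitution is an assumption, made so the example does not need the mbstring extension; both reject malformed UTF-8. The helper name `looksLikeUtf8()` is hypothetical.

```php
<?php
// Hypothetical helper: true if $bytes is well-formed UTF-8.
// preg_match() with /u returns false (not 0) when the subject is invalid UTF-8.
function looksLikeUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$utf8     = "f\xC3\xB6\xC3\xB6b\xC3\xA4r";                          // "fööbär" encoded as UTF-8
$latin1   = "f\xF6\xF6b\xE4r";                                      // "fööbär" encoded as ISO-8859-1
$mojibake = "f\xC3\x83\xC2\xB6\xC3\x83\xC2\xB6b\xC3\x83\xC2\xA4r"; // "fÃ¶Ã¶bÃ¤r": UTF-8 bytes re-encoded as if Latin-1

var_dump(looksLikeUtf8($utf8));     // bool(true)
var_dump(looksLikeUtf8($latin1));   // bool(false): invalid byte sequences are caught
var_dump(looksLikeUtf8($mojibake)); // bool(true): garbled text is still valid UTF-8
```

Only the Latin-1 bytes fail. The garbled string passes, so a validity check alone cannot say which charset the sender actually used.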
So I'm no longer using the code I posted here. Therefore I'm closing this.
For comparison, Thunderbird displays such a body the same way `$message->getTextContent()` does. However, php-mime-mail-parser's `$parser->getMessageBody('text')` keeps it as-is. I'm "fixing" this in my code by checking: when I `utf8_decode()` your output (i.e. "reverse" the conversion you did automatically), is the result still valid UTF-8?
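The reverse-and-recheck heuristic described in the issue can be sketched in plain PHP. It assumes that the "ASCII" fallback behaves like a byte-for-byte ISO-8859-1 re-encoding, which is what `utf8_encode()` did. `latin1ToUtf8()`, `utf8ToLatin1()` and `repairIfDoubleEncoded()` are hypothetical helpers. They mimic `utf8_encode()`/`utf8_decode()` (both deprecated as of PHP 8.2) without calling them. This is a heuristic, not a reliable charset detector.

```php
<?php
// Assumption: the library's fallback maps each byte >= 0x80 to the code point of
// the same value (ISO-8859-1) before encoding to UTF-8, like utf8_encode().
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        // ASCII passes through; U+0080..U+00FF become two-byte UTF-8 sequences.
        $out .= $b < 0x80 ? chr($b) : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Reverse of latin1ToUtf8(). Returns null if $utf8 is not valid UTF-8 or contains
// a code point above U+00FF (utf8_decode() would have substituted '?').
function utf8ToLatin1(string $utf8): ?string
{
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return null; // invalid UTF-8
    }
    $out = '';
    foreach ($chars as $c) {
        if (strlen($c) === 1) {
            $out .= $c;
            continue;
        }
        if (strlen($c) !== 2) {
            return null; // three or four bytes: code point above U+07FF
        }
        $cp = ((ord($c[0]) & 0x1F) << 6) | (ord($c[1]) & 0x3F);
        if ($cp > 0xFF) {
            return null;
        }
        $out .= chr($cp);
    }
    return $out;
}

// If undoing the conversion yields different bytes that are valid UTF-8, assume the
// body was UTF-8 decoded as a single-byte charset and return the undone version.
function repairIfDoubleEncoded(string $text): string
{
    $original = utf8ToLatin1($text);
    if ($original !== null && $original !== $text && preg_match('//u', $original) === 1) {
        return $original;
    }
    return $text;
}

$body    = "f\xC3\xB6\xC3\xB6b\xC3\xA4r"; // what the sender wrote: "fööbär" in UTF-8
$garbled = latin1ToUtf8($body);            // "fÃ¶Ã¶bÃ¤r"
var_dump(repairIfDoubleEncoded($garbled) === $body); // bool(true)
var_dump(repairIfDoubleEncoded($body) === $body);    // bool(true): left unchanged
```

Correctly decoded text such as "fööbär" is left alone, because undoing the conversion produces invalid UTF-8. A legitimately written string that happens to look like mojibake (for example a literal "Ã¶") would still be "repaired" wrongly. As noted in the thread, there is no way to be certain without a charset declaration.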