Charset ks_c_5601-1987 not support #216

Open
7i77an opened this issue Jun 4, 2019 · 6 comments

@7i77an

7i77an commented Jun 4, 2019

Hi Vincent,

When processing the body and subject of an email, the following error is raised:
Error parsing body part. Charset ks_c_5601-1987 is not supported. Continuing without decoding.

My platform is Debian, and the charset conversion library is ICU.
Is there any chance the library could support this charset?

try {
	vmime::shared_ptr <vmime::charsetConverter> conv = vmime::charsetConverter::create(tp->getCharset(), vmime::charset("utf-8"));
	vmime::shared_ptr <vmime::utility::charsetFilteredOutputStream> partFilteredStream = conv->getFilteredOutputStream(partStreamAdapter);
	content->extract(*partFilteredStream);
	partFilteredStream->flush();
}
catch(const vmime::exceptions::charset_conv_error& e) {
	jbodyPart["warning"] = e.what();
	g_logger->Warning("Account::ParseBody: Error parsing body part. Charset %s is not supported. Continuing without decoding", tp->getCharset().getName().c_str());
	content->extract(partStreamAdapter);
}

Thanks.

@vincent-richard
Member

Hello!

You could try mapping this charset to the name ICU or iconv actually uses. It seems to be an alias for "EUC-KR", so try this:

charset::charset(const string& name) ... {
    ...
    if (utility::stringUtils::isStringEqualNoCase(m_name, "ks_c_5601-1987")) {
        m_name = "EUC-KR";
    }
}

I think we really need some generic mapping, as there may be many aliases for many charsets. See a related issue here: php-mime-mail-parser/php-mime-mail-parser#26
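A generic mapping of this kind could be a single case-insensitive table lookup performed before the converter is created. The sketch below is standalone C++, not VMime code: `normalizeCharsetLabel` and `resolveCharsetAlias` are hypothetical helpers, and the table holds only a few illustrative entries, including the `ks_c_5601-1987` → `EUC-KR` pairing suggested above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Trim surrounding whitespace and lower-case a charset label, so that
// lookups ignore case (MIME charset names are case-insensitive).
std::string normalizeCharsetLabel(std::string name)
{
    const auto notSpace = [](unsigned char c) { return !std::isspace(c); };
    name.erase(name.begin(), std::find_if(name.begin(), name.end(), notSpace));
    name.erase(std::find_if(name.rbegin(), name.rend(), notSpace).base(), name.end());

    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical helper: map a label to the name the conversion backend
// understands. Unknown labels are returned unchanged.
std::string resolveCharsetAlias(const std::string& name)
{
    // Illustrative subset only; keys must already be lower-case.
    static const std::unordered_map<std::string, std::string> aliases = {
        {"ks_c_5601-1987", "EUC-KR"},
        {"cseuckr",        "EUC-KR"},
        {"latin1",         "ISO-8859-1"},
        {"utf8",           "UTF-8"},
    };

    const auto it = aliases.find(normalizeCharsetLabel(name));
    return it != aliases.end() ? it->second : name;
}
```

Returning unknown labels unchanged keeps the current behaviour for names the table does not cover, so the conversion library still gets a chance to recognize them itself. In VMime, such a lookup would presumably be called from the `charset` constructor, as in the snippet above.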

@7i77an
Author

7i77an commented Jun 4, 2019

That's correct.
We need a generic mapping table, plus a method for adding custom charset mappings.

If you do not plan on working on this, please feel free to close the issue.

Thanks a lot Vincent.
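One way to provide a method for adding custom mappings is a process-wide registry that starts from built-in defaults and lets the application add or override entries. This is a hypothetical sketch: `CharsetAliasRegistry`, `addAlias` and `resolve` are illustrative names, not part of VMime's API, and the defaults are a tiny sample.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry: built-in aliases that the application can extend
// at runtime with its own mappings (e.g. for labels seen in real mail).
class CharsetAliasRegistry
{
public:
    static CharsetAliasRegistry& instance()
    {
        static CharsetAliasRegistry registry;  // initialization is thread-safe since C++11
        return registry;
    }

    // Add or override a mapping; the alias key is stored lower-cased.
    void addAlias(const std::string& alias, const std::string& canonical)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_aliases[toLower(alias)] = canonical;
    }

    // Return the canonical name, or the input unchanged if no alias is known.
    std::string resolve(const std::string& name) const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        const auto it = m_aliases.find(toLower(name));
        return it != m_aliases.end() ? it->second : name;
    }

private:
    CharsetAliasRegistry()
    {
        // Built-in defaults (illustrative subset).
        m_aliases["ks_c_5601-1987"] = "EUC-KR";
        m_aliases["utf8"] = "UTF-8";
    }

    static std::string toLower(std::string s)
    {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    mutable std::mutex m_mutex;
    std::map<std::string, std::string> m_aliases;
};
```

The mutex guards the table because an application might register aliases while messages are being parsed on other threads; if registration only ever happens at startup, the lock could be dropped.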

@7i77an
Author

7i77an commented Jun 5, 2019

Hi Vincent,

I put this code into charset.cpp:

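// Explicitly map alias entries for some charsets.
// Note: "charset" holds the lower-case label as it may appear in a message,
// and "alias" holds the name that is passed on to the conversion library.
struct CharsetAliasEntry
{
    CharsetAliasEntry(const string& charset_, const string& alias_)
        : charset(charset_), alias(alias_)
    {
    }

    const string charset;
    const string alias;
};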

CharsetAliasEntry g_charsetAliasMap[] =
{
    CharsetAliasEntry("ascii",              "us-ascii"),
    CharsetAliasEntry("us-ascii",           "us-ascii"),
    CharsetAliasEntry("ansi_x3.4-1968",     "us-ascii"),
    CharsetAliasEntry("646",                "us-ascii"),
    CharsetAliasEntry("iso-8859-1",         "ISO-8859-1"),
    CharsetAliasEntry("iso-8859-2",         "ISO-8859-2"),
    CharsetAliasEntry("iso-8859-3",         "ISO-8859-3"),
    CharsetAliasEntry("iso-8859-4",         "ISO-8859-4"),
    CharsetAliasEntry("iso-8859-5",         "ISO-8859-5"),
    CharsetAliasEntry("iso-8859-6",         "ISO-8859-6"),
    CharsetAliasEntry("iso-8859-6-i",       "ISO-8859-6-I"),
    CharsetAliasEntry("iso-8859-6-e",       "ISO-8859-6-E"),
    CharsetAliasEntry("iso-8859-7",         "ISO-8859-7"),
    CharsetAliasEntry("iso-8859-8",         "ISO-8859-8"),
    CharsetAliasEntry("iso-8859-8-i",       "ISO-8859-8-I"),
    CharsetAliasEntry("iso-8859-8-e",       "ISO-8859-8-E"),
    CharsetAliasEntry("iso-8859-9",         "ISO-8859-9"),
    CharsetAliasEntry("iso-8859-10",        "ISO-8859-10"),
    CharsetAliasEntry("iso-8859-11",        "ISO-8859-11"),
    CharsetAliasEntry("iso-8859-13",        "ISO-8859-13"),
    CharsetAliasEntry("iso-8859-14",        "ISO-8859-14"),
    CharsetAliasEntry("iso-8859-15",        "ISO-8859-15"),
    CharsetAliasEntry("iso-8859-16",        "ISO-8859-16"),
    CharsetAliasEntry("iso-ir-111",         "ISO-IR-111"),
    CharsetAliasEntry("iso-2022-cn",        "ISO-2022-CN"),
    CharsetAliasEntry("iso-2022-cn-ext",    "ISO-2022-CN"),
    CharsetAliasEntry("iso-2022-kr",        "ISO-2022-KR"),
    CharsetAliasEntry("iso-2022-jp",        "ISO-2022-JP"),
    CharsetAliasEntry("utf-16be",           "UTF-16BE"),
    CharsetAliasEntry("utf-16le",           "UTF-16LE"),
    CharsetAliasEntry("utf-16",             "UTF-16"),
    CharsetAliasEntry("windows-1250",       "windows-1250"),
    CharsetAliasEntry("windows-1251",       "windows-1251"),
    CharsetAliasEntry("windows-1252",       "windows-1252"),
    CharsetAliasEntry("windows-1253",       "windows-1253"),
    CharsetAliasEntry("windows-1254",       "windows-1254"),
    CharsetAliasEntry("windows-1255",       "windows-1255"),
    CharsetAliasEntry("windows-1256",       "windows-1256"),
    CharsetAliasEntry("windows-1257",       "windows-1257"),
    CharsetAliasEntry("windows-1258",       "windows-1258"),
    CharsetAliasEntry("ibm866",             "IBM866"),
    CharsetAliasEntry("ibm850",             "IBM850"),
    CharsetAliasEntry("ibm852",             "IBM852"),
    CharsetAliasEntry("ibm855",             "IBM855"),
    CharsetAliasEntry("ibm857",             "IBM857"),
    CharsetAliasEntry("ibm862",             "IBM862"),
    CharsetAliasEntry("ibm864",             "IBM864"),
    CharsetAliasEntry("utf-8",              "UTF-8"),
    CharsetAliasEntry("utf-7",              "UTF-7"),
    CharsetAliasEntry("shift_jis",          "Shift_JIS"),
    CharsetAliasEntry("big5",               "Big5"),
    CharsetAliasEntry("euc-jp",             "EUC-JP"),
    CharsetAliasEntry("euc-kr",             "EUC-KR"),
    CharsetAliasEntry("gb2312",             "GB2312"),
    CharsetAliasEntry("gb18030",            "gb18030"),
    CharsetAliasEntry("viscii",             "VISCII"),
    CharsetAliasEntry("koi8-r",             "KOI8-R"),
    CharsetAliasEntry("koi8_r",             "KOI8-R"),
    CharsetAliasEntry("cskoi8r",            "KOI8-R"),
    CharsetAliasEntry("koi",                "KOI8-R"),
    CharsetAliasEntry("koi8",               "KOI8-R"),
    CharsetAliasEntry("koi8-u",             "KOI8-U"),
    CharsetAliasEntry("tis-620",            "TIS-620"),
    CharsetAliasEntry("t.61-8bit",          "T.61-8bit"),
    CharsetAliasEntry("hz-gb-2312",         "HZ-GB-2312"),
    CharsetAliasEntry("big5-hkscs",         "Big5-HKSCS"),
    CharsetAliasEntry("gbk",                "gbk"),
    CharsetAliasEntry("cns11643",           "x-euc-tw"),

    //#
    //# Netscape private ...
    //#
    CharsetAliasEntry("x-imap4-modified-utf7","x-imap4-modified-utf7"),
    CharsetAliasEntry("x-euc-tw",             "x-euc-tw"),
    CharsetAliasEntry("x-mac-ce",             "x-mac-ce"),
    CharsetAliasEntry("x-mac-turkish",        "x-mac-turkish"),
    CharsetAliasEntry("x-mac-greek",          "x-mac-greek"),
    CharsetAliasEntry("x-mac-icelandic",      "x-mac-icelandic"),
    CharsetAliasEntry("x-mac-croatian",       "x-mac-croatian"),
    CharsetAliasEntry("x-mac-romanian",       "x-mac-romanian"),
    CharsetAliasEntry("x-mac-cyrillic",       "x-mac-cyrillic"),
    CharsetAliasEntry("x-mac-ukrainian",      "x-mac-cyrillic"),
    CharsetAliasEntry("x-mac-hebrew",         "x-mac-hebrew"),
    CharsetAliasEntry("x-mac-arabic",         "x-mac-arabic"),
    CharsetAliasEntry("x-mac-farsi",          "x-mac-farsi"),
    CharsetAliasEntry("x-mac-devanagari",     "x-mac-devanagari"),
    CharsetAliasEntry("x-mac-gujarati",       "x-mac-gujarati"),
    CharsetAliasEntry("x-mac-gurmukhi",       "x-mac-gurmukhi"),
    CharsetAliasEntry("armscii-8",            "armscii-8"),
    CharsetAliasEntry("x-viet-tcvn5712",      "x-viet-tcvn5712"),
    CharsetAliasEntry("x-viet-vps",           "x-viet-vps"),
    CharsetAliasEntry("iso-10646-ucs-2",      "UTF-16BE"),
    CharsetAliasEntry("x-iso-10646-ucs-2-be", "UTF-16BE"),
    CharsetAliasEntry("x-iso-10646-ucs-2-le", "UTF-16LE"),
    CharsetAliasEntry("x-user-defined",       "x-user-defined"),
    CharsetAliasEntry("x-johab",              "x-johab"),

    //#
    //# Aliases for ISO-8859-1
    //#
    CharsetAliasEntry("latin1",               "ISO-8859-1"),
    CharsetAliasEntry("iso_8859-1",           "ISO-8859-1"),
    CharsetAliasEntry("iso8859-1",            "ISO-8859-1"),
    CharsetAliasEntry("iso8859-2",            "ISO-8859-2"),
    CharsetAliasEntry("iso8859-3",            "ISO-8859-3"),
    CharsetAliasEntry("iso8859-4",            "ISO-8859-4"),
    CharsetAliasEntry("iso8859-5",            "ISO-8859-5"),
    CharsetAliasEntry("iso8859-6",            "ISO-8859-6"),
    CharsetAliasEntry("iso8859-7",            "ISO-8859-7"),
    CharsetAliasEntry("iso8859-8",            "ISO-8859-8"),
    CharsetAliasEntry("iso8859-9",            "ISO-8859-9"),
    CharsetAliasEntry("iso8859-10",           "ISO-8859-10"),
    CharsetAliasEntry("iso8859-11",           "ISO-8859-11"),
    CharsetAliasEntry("iso8859-13",           "ISO-8859-13"),
    CharsetAliasEntry("iso8859-14",           "ISO-8859-14"),
    CharsetAliasEntry("iso8859-15",           "ISO-8859-15"),
    CharsetAliasEntry("iso_8859-1:1987",      "ISO-8859-1"),
    CharsetAliasEntry("iso-ir-100",           "ISO-8859-1"),
    CharsetAliasEntry("l1",                   "ISO-8859-1"),
    CharsetAliasEntry("ibm819",               "ISO-8859-1"),
    CharsetAliasEntry("cp819",                "ISO-8859-1"),
    CharsetAliasEntry("csisolatin1",          "ISO-8859-1"),

    //#
    //# Aliases for ISO-8859-2
    //#
    CharsetAliasEntry("latin2",               "ISO-8859-2"),
    CharsetAliasEntry("iso_8859-2",           "ISO-8859-2"),
    CharsetAliasEntry("iso_8859-2:1987",      "ISO-8859-2"),
    CharsetAliasEntry("iso-ir-101",           "ISO-8859-2"),
    CharsetAliasEntry("l2",                   "ISO-8859-2"),
    CharsetAliasEntry("csisolatin2",          "ISO-8859-2"),

    //#
    //# Aliases for ISO-8859-3
    //#
    CharsetAliasEntry("latin3",               "ISO-8859-3"),
    CharsetAliasEntry("iso_8859-3",           "ISO-8859-3"),
    CharsetAliasEntry("iso_8859-3:1988",      "ISO-8859-3"),
    CharsetAliasEntry("iso-ir-109",           "ISO-8859-3"),
    CharsetAliasEntry("l3",                   "ISO-8859-3"),
    CharsetAliasEntry("csisolatin3",          "ISO-8859-3"),

    //#
    //# Aliases for ISO-8859-4
    //#
    CharsetAliasEntry("latin4",               "ISO-8859-4"),
    CharsetAliasEntry("iso_8859-4",           "ISO-8859-4"),
    CharsetAliasEntry("iso_8859-4:1988",      "ISO-8859-4"),
    CharsetAliasEntry("iso-ir-110",           "ISO-8859-4"),
    CharsetAliasEntry("l4",                   "ISO-8859-4"),
    CharsetAliasEntry("csisolatin4",          "ISO-8859-4"),

    //#
    //# Aliases for ISO-8859-5
    //#
    CharsetAliasEntry("cyrillic",             "ISO-8859-5"),
    CharsetAliasEntry("iso_8859-5",           "ISO-8859-5"),
    CharsetAliasEntry("iso_8859-5:1988",      "ISO-8859-5"),
    CharsetAliasEntry("iso-ir-144",           "ISO-8859-5"),
    CharsetAliasEntry("csisolatincyrillic",   "ISO-8859-5"),

    //#
    //# Aliases for ISO-8859-6
    //#
    CharsetAliasEntry("arabic",                "ISO-8859-6"),
    CharsetAliasEntry("iso_8859-6",            "ISO-8859-6"),
    CharsetAliasEntry("iso_8859-6:1987",       "ISO-8859-6"),
    CharsetAliasEntry("iso-ir-127",            "ISO-8859-6"),
    CharsetAliasEntry("ecma-114",              "ISO-8859-6"),
    CharsetAliasEntry("asmo-708",              "ISO-8859-6"),
    CharsetAliasEntry("csisolatinarabic",      "ISO-8859-6"),

    //#
    //# Aliases for ISO-8859-6-I
    //#
    CharsetAliasEntry("csiso88596i",           "ISO-8859-6-I"),

    //#
    //# Aliases for ISO-8859-6-E
    //#
    CharsetAliasEntry("csiso88596e",           "ISO-8859-6-E"),

    //#
    //# Aliases for ISO-8859-7
    //#
    CharsetAliasEntry("greek",                 "ISO-8859-7"),
    CharsetAliasEntry("greek8",                "ISO-8859-7"),
    CharsetAliasEntry("sun_eu_greek",          "ISO-8859-7"),
    CharsetAliasEntry("iso_8859-7",            "ISO-8859-7"),
    CharsetAliasEntry("iso_8859-7:1987",       "ISO-8859-7"),
    CharsetAliasEntry("iso-ir-126",            "ISO-8859-7"),
    CharsetAliasEntry("elot_928",              "ISO-8859-7"),
    CharsetAliasEntry("ecma-118",              "ISO-8859-7"),
    CharsetAliasEntry("csisolatingreek",       "ISO-8859-7"),

    //#
    //# Aliases for ISO-8859-8
    //#
    CharsetAliasEntry("hebrew",                "ISO-8859-8"),
    CharsetAliasEntry("iso_8859-8",            "ISO-8859-8"),
    CharsetAliasEntry("visual",                "ISO-8859-8"),
    CharsetAliasEntry("iso_8859-8:1988",       "ISO-8859-8"),
    CharsetAliasEntry("iso-ir-138",            "ISO-8859-8"),
    CharsetAliasEntry("csisolatinhebrew",      "ISO-8859-8"),

    //#
    //# Aliases for ISO-8859-8-I
    //#
    CharsetAliasEntry("csiso88598i",           "ISO-8859-8-I"),
    CharsetAliasEntry("iso-8859-8i",           "ISO-8859-8-I"),
    CharsetAliasEntry("logical",               "ISO-8859-8-I"),

    //#
    //# Aliases for ISO-8859-8-E
    //#
    CharsetAliasEntry("csiso88598e",           "ISO-8859-8-E"),

    //#
    //# Aliases for ISO-8859-9
    //#
    CharsetAliasEntry("latin5",                "ISO-8859-9"),
    CharsetAliasEntry("iso_8859-9",            "ISO-8859-9"),
    CharsetAliasEntry("iso_8859-9:1989",       "ISO-8859-9"),
    CharsetAliasEntry("iso-ir-148",            "ISO-8859-9"),
    CharsetAliasEntry("l5",                    "ISO-8859-9"),
    CharsetAliasEntry("csisolatin5",           "ISO-8859-9"),

    //#
    //# Aliases for UTF-8
    //#
    CharsetAliasEntry("unicode-1-1-utf-8",     "UTF-8"),

    //# nl_langinfo(CODESET) in HP/UX returns 'utf8' under UTF-8 locales
    CharsetAliasEntry("utf8",                  "UTF-8"),

    //#
    //# Aliases for Shift_JIS
    //#
    CharsetAliasEntry("x-sjis",                "Shift_JIS"),
    CharsetAliasEntry("shift-jis",             "Shift_JIS"),
    CharsetAliasEntry("ms_kanji",              "Shift_JIS"),
    CharsetAliasEntry("csshiftjis",            "Shift_JIS"),
    CharsetAliasEntry("windows-31j",           "Shift_JIS"),
    CharsetAliasEntry("cp932",                 "Shift_JIS"),
    CharsetAliasEntry("sjis",                  "Shift_JIS"),

    //#
    //# Aliases for EUC_JP
    //#
    CharsetAliasEntry("cseucpkdfmtjapanese",   "EUC-JP"),
    CharsetAliasEntry("x-euc-jp",              "EUC-JP"),

    //#
    //# Aliases for ISO-2022-JP
    //#
    CharsetAliasEntry("csiso2022jp",           "ISO-2022-JP"),

    //# The following are really not aliases ISO-2022-JP, but sharing the same decoder
    CharsetAliasEntry("iso-2022-jp-2",         "ISO-2022-JP"),
    CharsetAliasEntry("csiso2022jp2",          "ISO-2022-JP"),

    //#
    //# Aliases for Big5
    //#
    CharsetAliasEntry("csbig5",                "Big5"),
    CharsetAliasEntry("cn-big5",               "Big5"),

    //# x-x-big5 is not really a alias for Big5, add it only for MS FrontPage
    CharsetAliasEntry("x-x-big5",              "Big5"),

    //# Sun Solaris
    CharsetAliasEntry("zh_tw-big5",            "Big5"),


    //#
    //# Aliases for EUC-KR
    //#
    CharsetAliasEntry("cseuckr",               "EUC-KR"),
    CharsetAliasEntry("ks_c_5601-1987",        "EUC-KR"),
    CharsetAliasEntry("iso-ir-149",            "EUC-KR"),
    CharsetAliasEntry("ks_c_5601",             "EUC-KR"),
    CharsetAliasEntry("ksc_5601",              "EUC-KR"),
    CharsetAliasEntry("ksc5601",               "EUC-KR"),
    CharsetAliasEntry("csksc56011987",         "EUC-KR"),
    CharsetAliasEntry("5601",                  "EUC-KR"),


    //#
    //# Aliases for GB2312
    //#
    //# The following are really not aliases GB2312, add them only for MS FrontPage
    CharsetAliasEntry("gb_2312-80",            "GB2312"),
    CharsetAliasEntry("iso-ir-58",             "GB2312"),
    CharsetAliasEntry("chinese",               "GB2312"),
    CharsetAliasEntry("csiso58gb231280",       "GB2312"),
    CharsetAliasEntry("csgb2312",              "GB2312"),
    CharsetAliasEntry("zh_cn.euc",             "GB2312"),

    //# Sun Solaris
    CharsetAliasEntry("gb_2312",               "GB2312"),

    //#
    //# Aliases for windows-125x 
    //#
    CharsetAliasEntry("x-cp1250",              "windows-1250"),
    CharsetAliasEntry("x-cp1251",              "windows-1251"),
    CharsetAliasEntry("x-cp1252",              "windows-1252"),
    CharsetAliasEntry("x-cp1253",              "windows-1253"),
    CharsetAliasEntry("x-cp1254",              "windows-1254"),
    CharsetAliasEntry("x-cp1255",              "windows-1255"),
    CharsetAliasEntry("x-cp1256",              "windows-1256"),
    CharsetAliasEntry("x-cp1257",              "windows-1257"),
    CharsetAliasEntry("x-cp1258",              "windows-1258"),

    //#
    //# Aliases for windows-874 
    //#
    CharsetAliasEntry("windows-874",           "windows-874"),
    CharsetAliasEntry("ibm874",                "windows-874"),
    CharsetAliasEntry("dos-874",               "windows-874"),

    //#
    //# Aliases for macintosh
    //#
    CharsetAliasEntry("macintosh",             "macintosh"),
    CharsetAliasEntry("x-mac-roman",           "macintosh"),
    CharsetAliasEntry("mac",                   "macintosh"),
    CharsetAliasEntry("csmacintosh",           "macintosh"),

    //#
    //# Aliases for IBM866
    //#
    CharsetAliasEntry("cp866",                 "IBM866"),
    CharsetAliasEntry("cp-866",                "IBM866"),
    CharsetAliasEntry("866",                   "IBM866"),
    CharsetAliasEntry("csibm866",              "IBM866"),

    //#
    //# Aliases for IBM850
    //#
    CharsetAliasEntry("cp850",                 "IBM850"),
    CharsetAliasEntry("850",                   "IBM850"),
    CharsetAliasEntry("csibm850",              "IBM850"),

    //#
    //# Aliases for IBM852
    //#
    CharsetAliasEntry("cp852",                 "IBM852"),
    CharsetAliasEntry("852",                   "IBM852"),
    CharsetAliasEntry("csibm852",              "IBM852"),

    //#
    //# Aliases for IBM855
    //#
    CharsetAliasEntry("cp855",                 "IBM855"),
    CharsetAliasEntry("855",                   "IBM855"),
    CharsetAliasEntry("csibm855",              "IBM855"),

    //#
    //# Aliases for IBM857
    //#
    CharsetAliasEntry("cp857",                 "IBM857"),
    CharsetAliasEntry("857",                   "IBM857"),
    CharsetAliasEntry("csibm857",              "IBM857"),

    //#
    //# Aliases for IBM862
    //#
    CharsetAliasEntry("cp862",                 "IBM862"),
    CharsetAliasEntry("862",                   "IBM862"),
    CharsetAliasEntry("csibm862",              "IBM862"),

    //#
    //# Aliases for IBM864
    //#
    CharsetAliasEntry("cp864",                 "IBM864"),
    CharsetAliasEntry("864",                   "IBM864"),
    CharsetAliasEntry("csibm864",              "IBM864"),
    CharsetAliasEntry("ibm-864",               "IBM864"),

    //#
    //# Aliases for T.61-8bit
    //#
    CharsetAliasEntry("t.61",                  "T.61-8bit"),
    CharsetAliasEntry("iso-ir-103",            "T.61-8bit"),
    CharsetAliasEntry("csiso103t618bit",       "T.61-8bit"),

    //#
    //# Aliases for UTF-7
    //#
    CharsetAliasEntry("x-unicode-2-0-utf-7",   "UTF-7"),
    CharsetAliasEntry("unicode-2-0-utf-7",     "UTF-7"),
    CharsetAliasEntry("unicode-1-1-utf-7",     "UTF-7"),
    CharsetAliasEntry("csunicode11utf7",       "UTF-7"),

    //#
    //# Aliases for ISO-10646-UCS-2
    //#
    CharsetAliasEntry("csunicode",                "UTF-16BE"),
    CharsetAliasEntry("csunicode11",              "UTF-16BE"),
    CharsetAliasEntry("iso-10646-ucs-basic",      "UTF-16BE"),
    CharsetAliasEntry("csunicodeascii",           "UTF-16BE"),
    CharsetAliasEntry("iso-10646-unicode-latin1", "UTF-16BE"),
    CharsetAliasEntry("csunicodelatin1",          "UTF-16BE"),
    CharsetAliasEntry("iso-10646",                "UTF-16BE"),
    CharsetAliasEntry("iso-10646-j-1",            "UTF-16BE"),

    //#
    //# Aliases for ISO-8859-10
    //#
    CharsetAliasEntry("latin6",                   "ISO-8859-10"),
    CharsetAliasEntry("iso-ir-157",               "ISO-8859-10"),
    CharsetAliasEntry("l6",                       "ISO-8859-10"),

    //# The original .properties format could not hold ':' in a key; a C++ literal can
    CharsetAliasEntry("iso_8859-10:1992",         "ISO-8859-10"),
    CharsetAliasEntry("csisolatin6",              "ISO-8859-10"),

    //#
    //# Aliases for ISO-8859-15
    //#
    CharsetAliasEntry("iso_8859-15",              "ISO-8859-15"),
    CharsetAliasEntry("csisolatin9",              "ISO-8859-15"),
    CharsetAliasEntry("l9",                       "ISO-8859-15"),

    //#
    //# Aliases for ISO-IR-111
    //#
    CharsetAliasEntry("ecma-cyrillic",            "ISO-IR-111"),
    CharsetAliasEntry("csiso111ecmacyrillic",     "ISO-IR-111"),

    //#
    //# Aliases for ISO-2022-KR
    //#
    CharsetAliasEntry("csiso2022kr",              "ISO-2022-KR"),

    //#
    //# Aliases for VISCII
    //#
    CharsetAliasEntry("csviscii",                 "VISCII"),

    //#
    //# Aliases for x-euc-tw
    //#
    CharsetAliasEntry("zh_tw-euc",                "x-euc-tw"),

    //#
    //# Following names appears in unix nl_langinfo(CODESET)
    //# They can be compiled as platform specific if necessary
    //# DONT put things here if it does not look generic enough (like hp15CN)
    //#
    CharsetAliasEntry("iso88591",                 "ISO-8859-1"),
    CharsetAliasEntry("iso88592",                 "ISO-8859-2"),
    CharsetAliasEntry("iso88593",                 "ISO-8859-3"),
    CharsetAliasEntry("iso88594",                 "ISO-8859-4"),
    CharsetAliasEntry("iso88595",                 "ISO-8859-5"),
    CharsetAliasEntry("iso88596",                 "ISO-8859-6"),
    CharsetAliasEntry("iso88597",                 "ISO-8859-7"),
    CharsetAliasEntry("iso88598",                 "ISO-8859-8"),
    CharsetAliasEntry("iso88599",                 "ISO-8859-9"),
    CharsetAliasEntry("iso885910",                "ISO-8859-10"),
    CharsetAliasEntry("iso885911",                "ISO-8859-11"),
    CharsetAliasEntry("iso885912",                "ISO-8859-12"),
    CharsetAliasEntry("iso885914",                "ISO-8859-14"),
    CharsetAliasEntry("iso885913",                "ISO-8859-13"),
    CharsetAliasEntry("iso885915",                "ISO-8859-15"),
    //#
    CharsetAliasEntry("tis620",                   "TIS-620"),
    //#
    CharsetAliasEntry("cp1250",                   "windows-1250"),
    CharsetAliasEntry("cp1251",                   "windows-1251"),
    CharsetAliasEntry("cp1252",                   "windows-1252"),
    CharsetAliasEntry("cp1253",                   "windows-1253"),
    CharsetAliasEntry("cp1254",                   "windows-1254"),
    CharsetAliasEntry("cp1255",                   "windows-1255"),
    CharsetAliasEntry("cp1256",                   "windows-1256"),
    CharsetAliasEntry("cp1257",                   "windows-1257"),
    CharsetAliasEntry("cp1258",                   "windows-1258"),

    CharsetAliasEntry("x-gbk",                    "gbk"),
    CharsetAliasEntry("windows-936",              "gbk"),
    CharsetAliasEntry("ansi-1251",                "windows-1251"),

};                                

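void charset::setAliasCharset()
{
    // Compare case-insensitively against the (lower-case) keys of the table.
    const string cset = utility::stringUtils::toLower(m_name);
    const size_t count = sizeof(g_charsetAliasMap) / sizeof(g_charsetAliasMap[0]);

    // Visit every entry, including the last one, and require an exact match:
    // a substring test would map e.g. "iso-8859-15" to "ISO-8859-1".
    for (size_t i = 0 ; i < count ; ++i)
    {
        if (cset == g_charsetAliasMap[i].charset)
        {
            m_name = g_charsetAliasMap[i].alias;
            break;
        }
    }
}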

Then I replaced the existing UTF-7 check with a call to setAliasCharset():

charset::charset(const string& name)
	: m_name(name)
{
    setAliasCharset();
}

void charset::parseImpl
	(const parsingContext& /* ctx */, const string& buffer, const size_t position,
	 const size_t end, size_t* newPosition)
{
	m_name = utility::stringUtils::trim
		(string(buffer.begin() + position, buffer.begin() + end));

	setAliasCharset();

	setParsedBounds(position, end);

	if (newPosition)
		*newPosition = end;
}

charset.hpp:

 private:
    void setAliasCharset();

   string m_name;

I hope you find it useful...
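A table like this should be searched with exact (case-insensitive) matching rather than a substring test. With `find() != npos`, any entry that occurs inside a longer name wins whenever it comes earlier in the table: `iso-8859-1` precedes `iso-8859-15`, and `koi` precedes `koi8-u`. The standalone sketch below reproduces that with a four-entry excerpt kept in the table's order; `lookupSubstring` and `lookupExact` are illustrative names only.

```cpp
#include <string>
#include <utility>
#include <vector>

// Excerpt of the alias table above, in the same relative order
// (lower-case label, canonical name).
const std::vector<std::pair<std::string, std::string>> kAliases = {
    {"iso-8859-1",  "ISO-8859-1"},
    {"iso-8859-15", "ISO-8859-15"},
    {"koi",         "KOI8-R"},
    {"koi8-u",      "KOI8-U"},
};

// Substring matching: the first key found anywhere inside the name wins.
std::string lookupSubstring(const std::string& lowerName)
{
    for (const auto& entry : kAliases)
        if (lowerName.find(entry.first) != std::string::npos)
            return entry.second;
    return lowerName;
}

// Exact matching: only a key equal to the whole name is accepted.
std::string lookupExact(const std::string& lowerName)
{
    for (const auto& entry : kAliases)
        if (lowerName == entry.first)
            return entry.second;
    return lowerName;
}
```

For `iso-8859-15`, `lookupSubstring` returns `ISO-8859-1` while `lookupExact` returns `ISO-8859-15`; likewise `koi8-u` is wrongly mapped to `KOI8-R` by the substring version.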

@vincent-richard
Member

Hello! I need to check for licensing issues that could arise from incorporating an MPL-covered file (or any other file, or even data taken from such files) into VMime, which is dual-licensed (including GPL).

@jengelh
Contributor

jengelh commented Jun 10, 2019

No code import is needed methinks.
The IANA mapping list is https://www.iana.org/assignments/character-sets/character-sets.xhtml, from which the alias mappings can be derived and written independently.
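Deriving the aliases from the IANA registry could mean transcribing each record (preferred MIME name plus its listed aliases) and flattening the records into a case-insensitive map. A minimal sketch, assuming the records are copied by hand from the registry; `CharsetRecord` and `buildAliasMap` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// One registry record: the preferred MIME name plus its listed aliases.
struct CharsetRecord
{
    std::string preferredName;
    std::vector<std::string> aliases;
};

static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Flatten the records into a lower-case alias -> preferred-name map.
// The preferred name is also registered as a key for itself.
std::unordered_map<std::string, std::string>
buildAliasMap(const std::vector<CharsetRecord>& records)
{
    std::unordered_map<std::string, std::string> map;
    for (const auto& rec : records)
    {
        map[toLowerAscii(rec.preferredName)] = rec.preferredName;
        for (const auto& alias : rec.aliases)
            map[toLowerAscii(alias)] = rec.preferredName;
    }
    return map;
}
```

Note that the registry gives KS_C_5601-1987 its own entry rather than listing it as an alias of EUC-KR, so the mapping discussed in this issue would still have to be added on top of the IANA-derived data.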

@jstedfast

FWIW, I can confirm that this maps to EUC-KR. I've used this mapping in all of the MIME libraries I've written over the past two decades (Evolution, MimeKit, and GMime).

Hope that helps.

BTW, I always recommend VMime to anyone who asks me about MIME libraries for c++ :-)
