
Missing documentation of UTF8PROC_DECOMPOSE, UTF8PROC_COMPOSE flags in utf8proc_decompose_char #290

Open · ceztko opened this issue Mar 16, 2025 · 4 comments

ceztko commented Mar 16, 2025

Based on the actual utf8proc_NFKC implementation, I successfully wrote an NFKC normalization C++ function that operates directly on UTF-32 code points:

#include <vector>
#include <iterator> // for std::size
#include "utf8proc.h"

using std::vector;

bool tryNormalizeNFKC(const vector<char32_t>& codePoints, vector<char32_t>& normalized)
{
    normalized.clear();
    normalized.reserve(codePoints.size());

    char32_t buff[8];
    utf8proc_ssize_t rc;
    int lastBoundClass = 0; // must be zero-initialized before the first call
    for (size_t i = 0; i < codePoints.size(); i++)
    {
        // NOTE: UTF8PROC_DECOMPOSE is undocumented for utf8proc_decompose_char but it's necessary
        rc = utf8proc_decompose_char(codePoints[i], (utf8proc_int32_t*)buff, std::size(buff),
            (utf8proc_option_t)(UTF8PROC_DECOMPOSE | UTF8PROC_COMPAT), &lastBoundClass);
        if (rc < 0 || rc > (utf8proc_ssize_t)std::size(buff))
            goto Fail;

        normalized.insert(normalized.end(), buff, buff + rc);
    }

    // Canonically recompose the fully decomposed sequence in place
    rc = utf8proc_normalize_utf32((utf8proc_int32_t*)normalized.data(),
        (utf8proc_ssize_t)normalized.size(), (utf8proc_option_t)(UTF8PROC_COMPOSE | UTF8PROC_STABLE));
    if (rc < 0)
        goto Fail;

    normalized.resize((size_t)rc);
    return true;

Fail:
    normalized.clear();
    return false;
}

This is more convenient for me than utf8proc_NFKC, since I already have the vector of char32_t code points, which I also need to post-process further after the normalization. The only problem I found is that UTF8PROC_DECOMPOSE and UTF8PROC_COMPOSE are not documented as accepted flags for utf8proc_decompose_char, yet one of the two is necessary to perform the desired transformation. Considering that the function already has 'decompose' in its name, this is even more confusing (I got it working only through trial and error and a bit of luck).
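To illustrate, here is a minimal standalone sketch of the behavior I observed (the ligature example and the printout are just for the demo):

#include <cstdio>
#include "utf8proc.h"

int main()
{
    utf8proc_int32_t buff[8];
    int lastBoundClass = 0;

    // Without UTF8PROC_DECOMPOSE (or UTF8PROC_COMPOSE), the small ligature fl
    // (U+FB01) passes through unchanged as a single code point
    utf8proc_ssize_t rc = utf8proc_decompose_char(0xFB01, buff, 8,
        UTF8PROC_COMPAT, &lastBoundClass);
    std::printf("COMPAT only:      %ld code point(s), first = U+%04X\n",
        (long)rc, (unsigned)buff[0]);

    // With UTF8PROC_DECOMPOSE, the compatibility decomposition 'f' + 'l' is produced
    lastBoundClass = 0;
    rc = utf8proc_decompose_char(0xFB01, buff, 8,
        (utf8proc_option_t)(UTF8PROC_DECOMPOSE | UTF8PROC_COMPAT), &lastBoundClass);
    std::printf("DECOMPOSE|COMPAT: %ld code point(s): U+%04X U+%04X\n",
        (long)rc, (unsigned)buff[0], (unsigned)buff[1]);
    return 0;
}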

While you're at it, could you also clarify a couple of other things:

  • What's the maximum size I need for the dst buffer of utf8proc_decompose_char (I assume there exists a static maximum value)?
  • I noticed UTF8PROC_STABLE currently appears to be unused in the utf8proc code, correct?
stevengj (Member) commented Mar 19, 2025

What's the maximum size I need for the dst buffer of utf8proc_decompose_char

4 bytes. This should really be documented explicitly, but it's intrinsic to the UTF-8 encoding. (Edit: sorry, I was thinking of the encoding.)

Usually, to be safe, we call it twice: once to get the buffer size and once to do the decomposition. I agree that in principle there should be an upper bound, but unfortunately it may depend on the Unicode version. I'm not sure what the current upper bound is, but it could be computed easily and put in the docs (with a test to make sure that it doesn't need to be changed in future Unicode versions).

The problem with documenting the current upper bound, however, is that updating the Unicode version may then potentially break binary compatibility, even if the API doesn't otherwise change.

So the safest thing is to set some reasonable upper bound on the buffer size, but always explicitly check for an error return to see if you need a bigger buffer.
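For example, here is a rough sketch of that pattern (my own illustration, relying on utf8proc_decompose_char reporting the required size via a non-negative return value larger than bufsize):

#include <vector>
#include "utf8proc.h"

// Appends the compatibility decomposition of cp to out, growing the buffer on demand
bool decomposeChar(utf8proc_int32_t cp, std::vector<utf8proc_int32_t>& out, int& lastBoundClass)
{
    utf8proc_int32_t small[8]; // reasonable upper bound for the common case
    utf8proc_ssize_t rc = utf8proc_decompose_char(cp, small, 8,
        (utf8proc_option_t)(UTF8PROC_DECOMPOSE | UTF8PROC_COMPAT), &lastBoundClass);
    if (rc < 0)
        return false; // hard error
    if (rc <= 8)
    {
        out.insert(out.end(), small, small + rc);
        return true;
    }

    // The fixed buffer was too small: rc is the required size, so retry once
    std::vector<utf8proc_int32_t> big((size_t)rc);
    rc = utf8proc_decompose_char(cp, big.data(), (utf8proc_ssize_t)big.size(),
        (utf8proc_option_t)(UTF8PROC_DECOMPOSE | UTF8PROC_COMPAT), &lastBoundClass);
    if (rc < 0 || (size_t)rc > big.size())
        return false;
    out.insert(out.end(), big.begin(), big.begin() + rc);
    return true;
}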

I noticed UTF8PROC_STABLE currently appears to be unused in the utf8proc code, correct?

I think so, yes — all of the decompositions are already backwards compatible.

ceztko (Author) commented Mar 19, 2025

4 bytes. This should really be documented explicitly, but it's intrinsic to the UTF-8 encoding.

Hmmm... UTF-8 should not be relevant at all in this transformation. It's the decomposition that, for example, converts the small ligature fl (the single code point \ufb01) into the two code points f and l. This is intrinsic to Unicode, not UTF-8, and it depends on actual natural-language scripts. Do you happen to remember the maximum number of code points a single code point can be decomposed into?

ceztko (Author) commented Mar 19, 2025

OK, I read the edited answer. If you compute the value, please update the docs/this issue. Sorry, I'm a newbie utf8proc user, but I'm happy I could use it for the task and integrate it nicely into C++, without spurious heap allocations.

stevengj (Member) commented:
I've computed the value: it's currently 4 chars. I have a PR that adds a note to the documentation (while noting that the value may increase in future Unicode versions), along with a test to make sure that the hint remains current: #291
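For reference, here is a brute-force sketch along the lines of what I computed (my own illustration, not the actual test in #291; the result depends on the Unicode version utf8proc was built with):

#include <cstdio>
#include "utf8proc.h"

// Largest single-code-point decomposition for the given options. Note that even
// if the scratch buffer were too small, the return value is still the required size.
static long maxDecomposition(utf8proc_option_t options)
{
    utf8proc_int32_t buff[64];
    utf8proc_ssize_t maxLen = 0;
    for (utf8proc_int32_t cp = 0; cp <= 0x10FFFF; cp++)
    {
        if (!utf8proc_codepoint_valid(cp))
            continue; // skip surrogates and other invalid values
        int lastBoundClass = 0;
        utf8proc_ssize_t rc = utf8proc_decompose_char(cp, buff, 64, options, &lastBoundClass);
        if (rc > maxLen)
            maxLen = rc;
    }
    return (long)maxLen;
}

int main()
{
    std::printf("canonical: %ld, compatibility: %ld\n",
        maxDecomposition(UTF8PROC_DECOMPOSE),
        maxDecomposition((utf8proc_option_t)(UTF8PROC_DECOMPOSE | UTF8PROC_COMPAT)));
    return 0;
}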
