Update the Unicode support FAQ documentation (#256)
* Clarify when ConsoleAppender uses locale

* Document ConsoleAppender behaviour

* Silence SonarCloud
swebb2066 authored Aug 16, 2023
1 parent b60da4f commit d482258
Showing 4 changed files with 44 additions and 21 deletions.
2 changes: 1 addition & 1 deletion src/main/cpp/charsetdecoder.cpp
@@ -450,7 +450,7 @@ class LocaleCharsetDecoder : public CharsetDecoder
// Decode characters that may be represented by multiple bytes
while (0 < remain)
{
wchar_t ch;
wchar_t ch = 0;
size_t n = std::mbrtowc(&ch, p, remain, &this->state);
if (0 == n) // NULL encountered?
{
10 changes: 8 additions & 2 deletions src/main/include/log4cxx/consoleappender.h
@@ -25,8 +25,14 @@ namespace log4cxx

/**
* ConsoleAppender appends log events to <code>stdout</code> or
* <code>stderr</code> using a layout specified by the user. The
* default target is <code>stdout</code>.
* <code>stderr</code> using a layout specified by the user.
*
* The default target is <code>stdout</code>.
*
* You can use <a href="https://en.cppreference.com/w/c/io/fwide">fwide(stdout, 1)</a> in your configuration code
* or use the cmake directive `LOG4CXX_FORCE_WIDE_CONSOLE=ON` when building Log4cxx
* to force Log4cxx to use <a href="https://en.cppreference.com/w/c/io/fputws">fputws</a>.
* If doing this ensure the cmake directive `LOG4CXX_WCHAR_T` is also enabled.
*/
class LOG4CXX_EXPORT ConsoleAppender : public WriterAppender
{
10 changes: 10 additions & 0 deletions src/main/include/log4cxx/net/telnetappender.h
@@ -87,7 +87,17 @@ class LOG4CXX_EXPORT TelnetAppender : public AppenderSkeleton
return true;
}

/**
The current encoding value.
\sa setOption
*/
LogString getEncoding() const;
/**
Set the encoding to \c value.
\sa setOption
*/
void setEncoding(const LogString& value);


43 changes: 25 additions & 18 deletions src/site/markdown/faq.md
@@ -47,14 +47,32 @@ DLL" with release builds of Log4cxx and "Multithread DLL Debug" with debug build
Yes. Apache Log4cxx exposes API methods in multiple string flavors supporting differently encoded
textual content, like `char*`, `std::string`, `wchar_t*`, `std::wstring`, `CFStringRef` et al. All
provided texts will be converted to the `LogString` type before further processing, which is one of
several supported Unicode representations selected by the `LOG4CXX_CHAR` cmake option. If methods are
several supported internal representations and is selected by the `LOG4CXX_CHAR` cmake option. If methods are
used that take `LogString` as arguments, the macro `LOG4CXX_STR()` can be used to convert literals
to the current `LogString` type. FileAppenders support an encoding property as well, which should be
explicitly specified to `UTF-8` or `UTF-16` for e.g. XML files. The important point is to get the
chain of input, internal processing and output correct and that might need some additional setup in
the app using Log4cxx:
to the current `LogString` type.

According to the [libc documentation](https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html),
The default external representation is controlled by the `LOG4CXX_CHARSET` cmake option.
This default is used to encode multi-byte characters
unless an `Encoding` property is explicitly configured
for the log4cxx::FileAppender specialization you use.
Note that you should use `UTF-8` or `UTF-16` encoding when writing XML or JSON layouts.
Log4cxx also implements character set encodings for `US-ASCII` (`ISO646-US` or `ANSI_X3.4-1968`)
and `ISO-8859-1` (`ISO-LATIN-1` or `CP1252`).
You are highly encouraged to stick to `UTF-8` for the best support from tools and operating systems.

The `locale` character set encoding provides support beyond the above internally implemented options.
It allows you to use any multi-byte encoding provided by the standard library.
If you use the `locale` character set encoding, or
you use `fwide` to make `stdout` or `stderr` wide-oriented (log4cxx::ConsoleAppender then uses `fputws`),
you will need to explicitly configure the system locale at startup,
for example by using:

```
std::setlocale( LC_ALL, "" ); /* Set user-preferred locale for C functions */
std::locale::global(std::locale("")); /* Set user-preferred locale for C++ functions */
```

This is necessary because, according to the [libc documentation](https://www.gnu.org/software/libc/manual/html_node/Setting-the-Locale.html),
all programs start in the `C` locale by default, which is the [same as ANSI_X3.4-1968](https://stackoverflow.com/questions/48743106/whats-ansi-x3-4-1968-encoding)
and what's commonly known as the encoding `US-ASCII`. That encoding supports a very limited set of
characters only, so inputting Unicode with that encoding in effect to output characters can't work
@@ -70,15 +88,4 @@ loggername - ?????????? ???? ??????????????
```

The important thing to understand is that this is an always-applied, backwards-compatible default
behaviour and even the case when the current environment sets a locale like `en_US.UTF-8`. One might
need to explicitly tell the app at startup to use the locale of the environment and make things
compatible with Unicode this way. See also [some SO post](https://stackoverflow.com/questions/571359/how-do-i-set-the-proper-initial-locale-for-a-c-program-on-windows)
on setting the default locale in C++.

```
std::setlocale( LC_ALL, "" ); /* Set locale for C functions */
std::locale::global(std::locale("")); /* set locale for C++ functions */
```

See [LOGCXX-483](https://issues.apache.org/jira/browse/LOGCXX-483) or [GHPR #31](https://github.com/apache/logging-log4cxx/pull/31#issuecomment-668870727)
for additional details.
behaviour that applies even when the current environment sets a locale like `en_US.UTF-8`.
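
As a rough illustration of the startup configuration described in this FAQ entry, the following sketch wraps the two locale calls and the `fwide` request in small helpers that report failure instead of assuming success. The names `useEnvironmentLocale` and `makeWide` are hypothetical and not part of Log4cxx; only `std::setlocale`, `std::locale::global` and `std::fwide` come from the standard library.

```cpp
#include <clocale>
#include <cstdio>
#include <cwchar>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not a Log4cxx API): adopt the user-preferred locale
// for both the C and the C++ runtime.
// Returns false when the environment names a locale that is not available.
bool useEnvironmentLocale()
{
    // std::setlocale returns a null pointer if the request cannot be honoured
    if (std::setlocale(LC_ALL, "") == nullptr)
        return false;
    try
    {
        // std::locale("") throws std::runtime_error for an unsupported locale
        std::locale::global(std::locale(""));
    }
    catch (const std::runtime_error&)
    {
        return false;
    }
    return true;
}

// Hypothetical helper: request wide orientation for a stream.
// This only has an effect before the first read or write on the stream.
// std::fwide returns a positive value when the stream is wide-oriented.
bool makeWide(std::FILE* stream)
{
    return 0 < std::fwide(stream, 1);
}
```

An application would typically call `useEnvironmentLocale()` first and, if wide console output is wanted, `makeWide(stdout)` before anything is written to `stdout` and before Log4cxx is configured. Checking both results makes a missing locale visible at startup rather than as `?` characters in the log output.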
