Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate whether we still need to compile with /source-charset:utf-8 /execution-charset:us-ascii on Windows #2345

Open
wantehchang opened this issue Jul 31, 2024 · 5 comments
Assignees

Comments

@wantehchang
Copy link
Collaborator

wantehchang commented Jul 31, 2024

@tongyuantongyu Yuan: I need your review of this issue.

You added the MSVC compiler flags /source-charset:utf-8 /execution-charset:us-ascii to libavif. I am wondering if we should still compile libavif with these flags. There are three reasons that make me reconsider this issue:

  1. While adding clang-cl compilation support, I found that clang-cl does not support /execution-charset:us-ascii.
  2. The vcpkg libavif package removes these flags.
  3. avifenc.exe and avifdec.exe are now configured to use the UTF-8 code pages.

It seems that the third reason alone should make it unnecessary to compile libavif with /execution-charset:us-ascii. I am not familiar with internationalization issues on Windows, so I would appreciate your expert opinion. Thanks!

@wantehchang wantehchang self-assigned this Jul 31, 2024
@tongyuantongyu
Copy link
Contributor

This flag was added to ensure we don't use non-ASCII characters in strings in the source code. Here are the two reasons:

  • Some terminals may not be able to render non-ASCII characters, or render it correctly, potentially creating confusion to the user.
  • Some programmatic way of calling subprocesses may assume stdout is using the active codepage and fail to decode.

So the flag is more like a canary, than to solve some Windows-specific issue.

@wantehchang
Copy link
Collaborator Author

Yuan: Thank you for the reply. Using non-ASCII characters in strings in the source code is very unlikely. The only reasons I can think of are:

  • (A programming error) copy and paste texts from a PDF or rich text file that has non-ASCII quotation marks or dashes.
  • Personal names (especially the European Latin character sets).
  • The copyright, trademark, or register trademark symbols.

All of these reasons are unlikely. So I will probably just remove these flags or try to improve the comments for them.

I am afraid that none of the libavif maintainers understand the purpose of these flags, so after the recent rework of the build system, the flags are only used when compiling the source files in the avif and avif_apps libraries. It is a maintenance burden.

@wantehchang
Copy link
Collaborator Author

wantehchang commented Aug 3, 2024

Yuan: I did some experiments on my Windows laptop. I understand how the compiler options work now. I think they can be used as compiler warnings, although /execution-charset:us-ascii does convert non-ASCII characters to question marks. But the details are quite complicated, and I am not sure if I will still remember the details in a few months.

Surprisingly, it is possible to bypass /execution-charset:us-ascii by using octal or hex escape sequence, e.g.,

      "San Jos\xC3\xA9",  // San José

I also experienced the incorrect rendering of non-ASCII letters in Command Prompt unless I followed the procedure in a Stack Overflow article to "use Unicode UTF-8 for worldwide language support."

@wantehchang
Copy link
Collaborator Author

Here is an example of the compiler warning:

C:\src\libavif\src\avif.c(1065): warning C4566: character represented by universal-character-name '\u00E9' cannot be represented in the current code page (20127)

@tongyuantongyu
Copy link
Contributor

I am afraid that none of the libavif maintainers understand the purpose of these flags

We may use the /utf-8 flag which aligns with other platforms, supported by and is the default of clang-cl, and is more intuitive.

Since libavif is using manifest to request UTF-8 Console, the only downside would be no longer able to ensure we never print non-ASCII character unless explicit user request. Based on your analysis this is quite unlikely to happen.

wantehchang added a commit to wantehchang/libavif that referenced this issue Aug 5, 2024
Use the MSVC /source-charset:utf-8 /execution-charset:us-ascii options
only when warnings are treated as errors (/WX) so that we never
substitute a question mark (?) for a non-ASCII character in string or
character literals.

Update the comments for /source-charset:utf-8
/execution-charset:us-ascii basec on improved understranding of these
options.

Related to AOMediaCodec#2345.
wantehchang added a commit to wantehchang/libavif that referenced this issue Aug 5, 2024
Use the MSVC /source-charset:utf-8 /execution-charset:us-ascii options
only when warnings are treated as errors (/WX) so that we never
substitute a question mark (?) for a non-ASCII character in string or
character literals.

Update the comments for /source-charset:utf-8
/execution-charset:us-ascii basec on improved understranding of these
options.

Related to AOMediaCodec#2345.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants