Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use /execution-charset:us-ascii only with /WX #2368

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wantehchang
Copy link
Collaborator

Use the MSVC /source-charset:utf-8 /execution-charset:us-ascii options only when warnings are treated as errors (/WX) so that we never substitute a question mark (?) for a non-ASCII character in string or character literals.

Update the comments for /source-charset:utf-8
/execution-charset:us-ascii basec on improved understranding of these options.

Related to #2345.

Use the MSVC /source-charset:utf-8 /execution-charset:us-ascii options
only when warnings are treated as errors (/WX) so that we never
substitute a question mark (?) for a non-ASCII character in string or
character literals.

Update the comments for /source-charset:utf-8
/execution-charset:us-ascii basec on improved understranding of these
options.

Related to AOMediaCodec#2345.
@wantehchang
Copy link
Collaborator Author

@tongyuantongyu Yuan: Please do a preliminary review of this pull request. Because of the substitution of a question mark (?) for a non-ASCII character in string or character literals, my first thought is that these options should only be used when warnings are treated as errors. So that's what this pull request does.

You also made a good point that for compatibility with clang-cl and with non-Windows platforms, we should use the /utf-8 option instead. We can do that next, but it's good to give this a try. However, if you or my coworkers don't understand my comment in this pull request, then I will switch to the /utf-8 option directly.

@tongyuantongyu
Copy link
Contributor

I prefer to always add /source-charset:utf-8 whether or not /WX is used. /source-charset:utf-8 is more to "tell MSVC to treat source code in UTF-8" than "validate that the source code contains only characters that can be represented in UTF-8", otherwise MSVC will default to use ANSI code page, decoding source code wrongly, and emit C4819 when a decode error is encounted.

My intention to add /source-charset:utf-8 was to make MSVC decode source code correctly. My intention to add /execution-charset:us-ascii was to ensure we don't force user to deal with i18n and character encodings when using CLI. If you agree with my intention, let's add these to the comments; or we need to recheck whether we really want the two flags.

Your new comments explained many possible misunderstandings a user or new maintainer may have. Can you move the explanations to a new paragraph, so the first paragraph explains what we want to do and what the flags do, and the second paragraph explains more details (like "check the strings you added" when seeing C4566) and resolves misunderstandings (like it won't interfere escape sequences and won't prevent printing user provided UTF-8 strings)?

@wantehchang wantehchang marked this pull request as draft August 6, 2024 16:04
@wantehchang
Copy link
Collaborator Author

Yuan: Thank you very much for your comment. I will try to revise the comments based on your suggestions. However, I am now wondering if we should just switch to /utf-8 or simply delete these options.

To fully understand these options, I had to read three pages of Microsoft documentation (for /source-charset, /execution-charset, and /validate-charset), and it is necessary to understand four terms used in the C standard (the source charset, the basic source charset, the extended source charset, and the execution charset) and the phases of compilation referred to in the definitions. I also had to resolve ambiguity in the documentation by experiments.

I think it is the /execution-charset:us-ascii option that raises concerns and questions about UTF-8 support. On the other hand, the /utf-8 option is unlikely to raise questions, so we may not even need to document it. (People want UTF-8 support, so they are unlikely to wonder what the /utf-8 option is for.)

I will look into why the vcpkg libavif package removed the /source-charset:utf-8 /execution-charset:us-ascii options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants