GHC needs to output UTF-8 data on Windows #738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Shouldn't GHC just call hSetEncoding stdout utf8? That's what I do in Hoogle and it seems to work fine. If you can't get GHC to change, setting the code page is one alternative. Writing to a temp file and then re-encoding to the console might be nicer, since changing the code page could have other effects later on.
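For reference, a minimal sketch of what the hSetEncoding suggestion looks like inside a Haskell program. The helper name useUtf8Output is made up for illustration; hSetEncoding, hGetEncoding and utf8 are from System.IO in base. This only affects the handles of the program that calls it, so it is not a claim about how GHC or stack behave.

```haskell
import System.IO (hGetEncoding, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: force UTF-8 on both output handles,
-- regardless of the locale or console code page.
useUtf8Output :: IO ()
useUtf8Output = mapM_ (\h -> hSetEncoding h utf8) [stdout, stderr]

main :: IO ()
main = do
  useUtf8Output
  enc <- hGetEncoding stdout      -- Just an encoding for text-mode handles
  putStrLn ("stdout encoding: " ++ maybe "binary" show enc)
  putStrLn "λ → שלום"             -- non-ASCII text, encoded as UTF-8 bytes
```

Whether the console then renders those bytes correctly still depends on the console code page and font.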
I definitely intend to open up a GHC bug report referring to this issue. Nonetheless, we'll need to come up with a stop-gap solution for the current situation. Writing to a temp file won't actually help, because the issue is that GHC is deciding what it should be generating based on this global-ish setting. One possible enhancement to this branch I can think of is to
Can't you write to the temp file with a binary handle? Changing code page is a bit dangerous. Running a process in a finally seems a bit dodgy. It's ok if there is nothing better, but yuk. Also, Cygwin sometimes screws up the code page (1 in 10,000 we see at work).
Yes, I agree it's terrible. But GHC is being run as an external process, so whether we open up a binary handle or not is irrelevant. The specific case here is we're running
I guess not much choice then!
Sigh. I was hoping you had some Win32 magic hidden away that would save us. Ideally, we'd be able to do something like create a new anonymous pipe and just set the code page on that new pipe. But from everything I've read, that's not the way Windows works.
I think GHC should probably always talk UTF-8 and ignore the code page. The default Windows approach is somewhat broken (being a remnant of history rather than a modern design - it's fixed in Windows CE).
I'm in favor of that approach on all operating systems. The default of respecting either some semi-obscure environment variables (non-Windows) or some truly obtuse codepages (Windows) has led to lots of bugs in my experience. I'm sure this proposal will go over very well :/
GHC already assumes all input is UTF-8 anyway, so this seems consistent. I would be in favour of GHC using the code page only to decide between normal quotes and "smart quotes", which always get corrupted on Windows anyway.
To be honest, I've never liked the smart quotes, even when rendered as intended :)
GHC issue created: https://ghc.haskell.org/trac/ghc/ticket/10762
The lack of Hebrew looks like a problem with the font rendering (no characters for Hebrew in that font and no fallbacking like Pango), rather than encoding, I expect. (In a world where things are logical, anyway. And this is Windows.)
Set code page to UTF-8 (65001) on Windows #738
PR merged, closing
In particular, this affects #734, where we want GHC to dump .hi file contents. This operation will fail on Windows if the file contains characters not supported by the code page. Ideally, we'd like to be able to tell GHC to simply output in UTF-8, but there does not seem to be any way to do that other than going through the standard code page auto-discovery on Windows. That means stack has to set the code page for the entire console to 65001. I've already experimented with this extensively, and I think it's the right behavior, but I'm writing up an issue and then a pull request for posterity, and so that others can test before this is merged to master.
Pinging @borsboom due to #734. @ndmitchell as our resident Windows expert, any thoughts on this? Also pinging @fumieval, as this looks somewhat related to #422.
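To make the "set the code page, run GHC, put it back" idea from the thread concrete, here is a rough sketch of the set-and-restore pattern using bracket. It is not the code from the pull request. The helper withTemporarily is hypothetical, and the getter and setter are passed in as parameters so the sketch runs anywhere. On Windows they would presumably be getConsoleOutputCP and setConsoleOutputCP from System.Win32.Console in the Win32 package, but that wiring is an assumption and is not shown here.

```haskell
import Control.Exception (bracket)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Hypothetical helper: install a new value for some global-ish setting,
-- run an action, then restore the old value even if the action throws.
withTemporarily :: IO a -> (a -> IO ()) -> a -> IO b -> IO b
withTemporarily getIt setIt new action =
  bracket
    (getIt <* setIt new)  -- remember the old value, then install the new one
    setIt                 -- always put the old value back
    (const action)

main :: IO ()
main = do
  -- An IORef stands in for the console code page in this demo.
  codePage <- newIORef (437 :: Int)
  inside <- withTemporarily (readIORef codePage) (writeIORef codePage) 65001
              (readIORef codePage)  -- here one would run the external process
  after <- readIORef codePage
  print (inside, after)   -- prints (65001,437)
```

This still changes the setting for everything sharing the console while the action runs, which is the concern raised in the comments above.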