GO version of sqlcmd does not parse ANSI text files correctly #494

Open
robertwmcnulty opened this issue Dec 21, 2023 · 2 comments
Labels: bug (Something isn't working), Localization

@robertwmcnulty

If a SQL text file is encoded as ANSI (as opposed to UTF-8 or similar), the newer Go version of sqlcmd does not correctly parse non-ASCII characters.

For example, a file may contain non-breaking spaces (character 160, U+00A0), which T-SQL generally treats identically to a normal space. In ANSI (Windows-1252), this character is encoded as the single byte 0xA0.

The Go version of sqlcmd appears to assume that all files are UTF-encoded: it treats such a character as unknown and replaces it with Unicode character 65533 (U+FFFD, the replacement character). This is consistent with assuming UTF-8, because a lone 0xA0 byte is not valid UTF-8.

The attached file is a simple example text file, saved as ANSI in Windows Notepad, containing "SELECT{Non-breaking-space}CURRENT_TIMESTAMP":

testfile.txt
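If the attachment is unavailable, an equivalent file can be produced without Notepad. This Go sketch writes the bytes an ANSI (Windows-1252) save of that text would contain, assuming no byte order mark and no trailing newline; testFileContents is a hypothetical helper name.

```go
package main

import (
	"log"
	"os"
)

// testFileContents returns the Windows-1252 encoding of
// "SELECT{Non-breaking-space}CURRENT_TIMESTAMP": plain ASCII plus the single
// byte 0xA0 for the non-breaking space, with no byte order mark.
func testFileContents() []byte {
	return []byte("SELECT\xa0CURRENT_TIMESTAMP")
}

func main() {
	// Write the file into the current directory so it can be passed to sqlcmd -i.
	if err := os.WriteFile("testfile.txt", testFileContents(), 0o644); err != nil {
		log.Fatal(err)
	}
}
```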

It can be run in sqlcmd with a command like:
sqlcmd -i testfile.txt

The original ODBC version of sqlcmd has no problem running the above file, returning the expected timestamp.

The Go version, however, fails with:
"Could not find stored procedure 'SELECT�CURRENT_TIMESTAMP'."

The Go sqlcmd should either match the ODBC behavior, or the lack of support for ANSI-encoded text files should be documented under "Breaking changes from sqlcmd (ODBC)".

@shueybubbles
Collaborator

Thanks for opening the issue. This is related to #111.
ODBC SqlCmd treats non-Unicode/non-UTF-8 files as "system code page encoded" and, at least on Windows, converts them to UTF-16 on read using the Win32 API MultiByteToWideChar. I am not sure what its Linux version does.
There's not much support in the Go developer community for code pages, and we encourage folks who develop cloud-first applications that run on Linux, etc., to use UTF-8 or UTF-16 encoded files instead of relying on ambient properties like the system code page.
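Before any conversion, a Go implementation would have to decide whether a file is Unicode at all. The sketch below is one possible heuristic, not sqlcmd's actual logic, and detectEncoding and the Encoding constants are hypothetical names: a UTF-8 or UTF-16 byte order mark wins, BOM-less input that is valid UTF-8 is read as UTF-8, and anything else falls back to the system code page, as ODBC SqlCmd does.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// Encoding is a hypothetical label for how an input file should be decoded.
type Encoding string

const (
	UTF8           Encoding = "UTF-8"
	UTF16LE        Encoding = "UTF-16LE"
	UTF16BE        Encoding = "UTF-16BE"
	SystemCodePage Encoding = "system code page"
)

// detectEncoding applies a simple heuristic: a byte order mark wins; otherwise
// valid UTF-8 (which includes pure ASCII) is treated as UTF-8, and anything
// else is assumed to be in the legacy system code page.
func detectEncoding(b []byte) Encoding {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return UTF8
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return UTF16LE
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return UTF16BE
	case utf8.Valid(b):
		return UTF8
	default:
		return SystemCodePage
	}
}

func main() {
	fmt.Println(detectEncoding([]byte("SELECT\xa0CURRENT_TIMESTAMP")))   // ANSI bytes
	fmt.Println(detectEncoding([]byte("SELECT\u00a0CURRENT_TIMESTAMP"))) // UTF-8 bytes
}
```

ASCII-only ANSI files pass the UTF-8 check, which is harmless because they decode identically either way. BOM-less UTF-16 text would be misclassified as UTF-8, since its zero bytes are valid UTF-8.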

I do want to support the code page conversions but we just haven't had the time to do the work yet. I will update the README appropriately.

@shueybubbles
Collaborator

This content is relevant for ODBC SqlCmd on Linux and may guide our implementation.
I don't know offhand what the Go method to detect the "current locale" is.

https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/programming-guidelines?view=sql-server-ver16#character-set-support
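Go's standard library has no locale API, so on Linux a Go implementation would likely have to read the POSIX locale environment variables itself. The sketch below (localeCharset is a hypothetical name) extracts the codeset from LC_ALL, LC_CTYPE, or LANG in POSIX precedence order. It does not call setlocale or nl_langinfo the way a C program such as the ODBC driver can, and mapping the codeset name to a decoder is left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// localeCharset returns the codeset part of the effective LC_CTYPE locale,
// following the POSIX precedence LC_ALL > LC_CTYPE > LANG (empty values are
// skipped). getenv is injected so the function can be tested without touching
// the real environment.
func localeCharset(getenv func(string) string) string {
	locale := ""
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := getenv(name); v != "" {
			locale = v
			break
		}
	}
	// Locale names look like language[_territory][.codeset][@modifier].
	if i := strings.IndexByte(locale, '@'); i >= 0 {
		locale = locale[:i]
	}
	if i := strings.IndexByte(locale, '.'); i >= 0 {
		return locale[i+1:]
	}
	return "" // "C", "POSIX", or no codeset given: the caller must pick a default
}

func main() {
	fmt.Printf("%q\n", localeCharset(os.Getenv))
}
```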

@dlevy-msft added the bug (Something isn't working) and Localization labels on Jan 3, 2024
@dlevy-msft added this to the Backlog milestone on Jan 4, 2024