You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, UTF-8 is the only supported encoding for guardonce. UTF-16 and UTF-32 files cannot be processed. I began to improve the handling of those files in #22 by ensuring they were flagged as a problem or ignored. However, it's possible to do better.
There's are a few heuristics that can be used to guess that a file is UTF-16 or UTF-32. The simplest is to check if the first few bytes of the file match a BOM. It's extremely unlikely that any header file in another encoding would start with the same bytes as a UTF BOM, so this seems sufficient for our case.
The encoding used for reading should also be used for writing when processing the file in place. However, I'm less sure about the correct behaviour for printing the new file to stdout. I suspect that I should just use the same encoding there too, though perhaps I should use the output stream's desired encoding if it is known.
The text was updated successfully, but these errors were encountered:
Currently, UTF-8 is the only supported encoding for guardonce. UTF-16 and UTF-32 files cannot be processed. I began to improve the handling of those files in #22 by ensuring they were flagged as a problem or ignored. However, it's possible to do better.
There's are a few heuristics that can be used to guess that a file is UTF-16 or UTF-32. The simplest is to check if the first few bytes of the file match a BOM. It's extremely unlikely that any header file in another encoding would start with the same bytes as a UTF BOM, so this seems sufficient for our case.
The encoding used for reading should also be used for writing when processing the file in place. However, I'm less sure about the correct behaviour for printing the new file to stdout. I suspect that I should just use the same encoding there too, though perhaps I should use the output stream's desired encoding if it is known.
The text was updated successfully, but these errors were encountered: