OCI CLI json output is not unicode as required by spec #871

Open
triatic opened this issue Nov 12, 2024 · 16 comments

Comments

@triatic

triatic commented Nov 12, 2024

When executing commands such as oci compute instance list, json_decode() in PHP can fail when decoding the JSON output. This is because the output can contain non-ASCII characters that are not encoded as Unicode (UTF-8), as the JSON specification requires.

OCI version 3.50.0 (msi package)
Windows 10 version 10.0.19045.5011

@NupurGupta3101
Member

CLI output does not contain non-Unicode characters. If you have an example, please share it and we will investigate.

@triatic
Author

triatic commented Nov 12, 2024

mb_detect_encoding() reports ASCII for the output of oci compute instance list. Note that this JSON output contained non-ASCII characters that were not encoded as Unicode.

C:\>php -r "var_dump(mb_detect_encoding(shell_exec('oci compute instance list --compartment-id ocid1.tenancy.oc1..removed')));"
string(5) "ASCII"
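
An encoding guess can be replaced by a strict byte-level check. The following is a minimal Python sketch, assuming the raw CLI output has already been captured as bytes; `classify_encoding` is a hypothetical helper written for illustration and is not part of OCI CLI or PHP, and the sample inputs are made up rather than captured from a real run.

```python
def classify_encoding(raw: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'other' (hypothetical helper)."""
    if all(b < 0x80 for b in raw):
        return "ascii"
    try:
        # Strict decoding rejects a lone byte such as 0xAE.
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


# Made-up samples standing in for CLI output.
print(classify_encoding(b'{"a": "3.0 GHz"}'))                       # ascii
print(classify_encoding('{"a": "Ampere\u00ae"}'.encode("utf-8")))   # utf-8
print(classify_encoding('{"a": "Ampere\u00ae"}'.encode("cp1252")))  # other
```

A result of "other" means the bytes are neither pure ASCII nor valid UTF-8, which is the situation described in this report.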

@adizohar
Member

Can you please share the output of the oci command, or at least the start of it, where the data begins? Are there any errors or warnings?

{
  "data": {
    "items": [
      {

Which Python version do you use?

@triatic
Author

triatic commented Nov 12, 2024

I'm using the latest Windows OCI CLI MSI package downloaded from GitHub, which bundles Python. The JSON is well formed, apart from the non-Unicode characters.

The line that breaks things is this:

"processor-description": "3.0 GHz Ampere® Altra™",

Start of output:

C:\>oci compute instance list --compartment-id ocid1.tenancy.oc1..removed
{
  "data": [
    {
      "agent-config": {
... etc

@adizohar
Member

I asked about the Python version :)
I ran the command and didn't see any non-ASCII characters. I will wait for the OCI CLI team to respond.

@triatic
Author

triatic commented Nov 12, 2024

> I asked about the Python version :)

Whatever the MSI package installs? I can see python38.dll in the installation directory, and I do not have Python globally installed in Windows.

@adizohar
Member

Thank you for that

@triatic
Author

triatic commented Nov 12, 2024

> I tried to run and didn't see any non ascii

"3.0 GHz Ampere® Altra™" contains non-ASCII characters: ® and ™. The problem for me is that oci also does not emit them in a Unicode encoding, as the JSON spec requires.

@adizohar
Member

Understood, it is the processor type. Nupur, please take this up with the OCI CLI team.
"processor-description": "3.0 GHz Ampere® Altra™"

@triatic
Author

triatic commented Nov 12, 2024

@adizohar just to clarify, are you saying that only ASCII characters should be returned by oci's JSON output, and that the expected fix is to remove ® and ™ from the JSON output?

@adizohar
Member

No, I don't believe this is a bug or an issue that needs to be fixed. I have asked the OCI CLI team to take a look. In the meantime, you can filter out the non-ASCII characters before ingesting the JSON, or use the OCI Python SDK to read and handle these characters.
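
The filtering workaround suggested above could look roughly like this. It is a minimal Python sketch, not an OCI-provided utility: `parse_ascii_only` is a hypothetical helper, and the sample bytes are an assumption modelled on the "processor-description" line quoted in this thread. Note that the approach is lossy, since the ® and ™ symbols are dropped rather than preserved.

```python
import json


def parse_ascii_only(raw: bytes):
    """Drop every byte >= 0x80, then parse the remainder as JSON (lossy)."""
    cleaned = bytes(b for b in raw if b < 0x80)
    return json.loads(cleaned.decode("ascii"))


# Assumed sample: the quoted line, encoded as Windows-1252 (cp1252).
sample = '{"processor-description": "3.0 GHz Ampere\u00ae Altra\u2122"}'.encode("cp1252")
print(parse_ascii_only(sample))
# {'processor-description': '3.0 GHz Ampere Altra'}
```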

@triatic
Author

triatic commented Nov 12, 2024

Ok. At the moment I am converting oci's output from ASCII to UTF-8 where the ® and ™ characters are present, which prevents json_decode() from failing.
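
A conversion step of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the reporter's actual PHP code: `parse_cli_json` is a hypothetical helper, and the fallback encoding cp1252 (Windows-1252) is assumed from the discussion later in this thread.

```python
import json


def parse_cli_json(raw: bytes, fallback: str = "cp1252"):
    """Decode as UTF-8 when possible, otherwise as the fallback code page, then parse."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: output that is not valid UTF-8 is in the fallback encoding.
        text = raw.decode(fallback)
    return json.loads(text)


# Assumed sample: the quoted line, encoded as Windows-1252.
sample = '{"processor-description": "3.0 GHz Ampere\u00ae Altra\u2122"}'.encode("cp1252")
result = parse_cli_json(sample)
print(result["processor-description"] == "3.0 GHz Ampere\u00ae Altra\u2122")  # True
```

Unlike stripping non-ASCII bytes, this keeps the ® and ™ symbols, but it only gives correct text if the fallback encoding guess is right.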

@NupurGupta3101
Member

NupurGupta3101 commented Nov 13, 2024

According to https://thesmsworks.co.uk/unicode-detector, ® and ™ are Unicode characters.

@triatic
Author

triatic commented Nov 13, 2024

> According to https://thesmsworks.co.uk/unicode-detector ® and ™ are unicode characters.

They can be encoded in Unicode, but OCI CLI encodes them in Windows-1252, which is not valid for JSON: https://en.wikipedia.org/wiki/Windows-1252
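
The distinction between a character and its byte encoding can be shown directly. The Python sketch below compares the Windows-1252 and UTF-8 bytes for ® (U+00AE) and ™ (U+2122), and also shows that a JSON writer can avoid the question entirely by emitting \u escapes. It illustrates the encodings only; whether OCI CLI actually emits Windows-1252 is the reporter's claim and is not verified here.

```python
import json

# Hex bytes of each character in Windows-1252 (cp1252) and in UTF-8.
byte_table = {}
for ch in ("\u00ae", "\u2122"):  # the registered and trademark signs
    byte_table[f"U+{ord(ch):04X}"] = (ch.encode("cp1252").hex(), ch.encode("utf-8").hex())

for code_point, (cp1252_hex, utf8_hex) in byte_table.items():
    print(code_point, "cp1252:", cp1252_hex, "utf-8:", utf8_hex)
# U+00AE cp1252: ae utf-8: c2ae
# U+2122 cp1252: 99 utf-8: e284a2

# json.dumps escapes non-ASCII characters by default, producing pure-ASCII JSON.
escaped = json.dumps({"processor-description": "3.0 GHz Ampere\u00ae Altra\u2122"})
print(escaped)
```

The single bytes 0xAE and 0x99 are not valid UTF-8 on their own, which is why a strict JSON decoder that expects UTF-8 rejects them.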

@NupurGupta3101
Member

Can you please share the output received (without any further parsing) from oci-cli when you trigger this command (or via a script)? It will be clearer then.
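
Because pasting output into a web form can silently change its encoding, one way to share unparsed output is as a list of byte offsets and hex values. The following Python sketch assumes the output has been saved as raw bytes; `non_ascii_report` is a hypothetical helper, and the sample is an assumed Windows-1252 rendering of the line quoted earlier in this thread.

```python
def non_ascii_report(raw: bytes, context: int = 8):
    """Return (offset, hex value, surrounding bytes) for every byte >= 0x80."""
    report = []
    for i, b in enumerate(raw):
        if b >= 0x80:
            window = raw[max(0, i - context): i + context + 1]
            report.append((i, f"0x{b:02x}", window))
    return report


# Assumed sample, not captured from a real run.
sample = b'"3.0 GHz Ampere\xae Altra\x99"'
for offset, hex_value, window in non_ascii_report(sample):
    print(offset, hex_value, window)
```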

@triatic
Author

triatic commented Nov 14, 2024

Are you happy for me to edit out unique identifiers from the output?
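
If identifiers are edited out, doing so at the byte level keeps the encoding evidence intact. The Python sketch below is one possible approach, not an OCI tool: `redact_ocids` is a hypothetical helper, and the pattern is an assumption based only on the `ocid1.` prefix visible in the commands above, so it may not match every identifier format.

```python
import re

# Assumed shape of an identifier: "ocid1." followed by letters, digits, dots,
# underscores, or hyphens. This is a guess from the examples in this thread.
OCID_PATTERN = re.compile(rb"ocid1\.[A-Za-z0-9._-]+")


def redact_ocids(raw: bytes) -> bytes:
    """Replace each matched identifier while leaving all other bytes untouched."""
    return OCID_PATTERN.sub(b"ocid1.removed", raw)


# Made-up sample with a fake identifier and a Windows-1252 byte (0xAE).
sample = b'{"id": "ocid1.instance.oc1..example", "cpu": "Ampere\xae"}'
redacted = redact_ocids(sample)
print(redacted)
```

Working on bytes rather than decoded text means the 0xAE byte reaches the maintainers exactly as the CLI produced it.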
