Thai Language Formatting Issue #3463

Open
mahirocoko opened this issue Jul 18, 2024 · 6 comments
Labels
A-CLI Area: CLI A-Formatter Area: formatter S-Bug-confirmed Status: report has been confirmed as a valid bug

Comments

@mahirocoko

mahirocoko commented Jul 18, 2024

I am experiencing an issue with the Biome-Zed plugin where Thai language text is not being formatted correctly. Specifically, the word "หน้าหลัก" is being changed to "หนาหลก". This is causing readability issues for Thai users.

Steps to Reproduce

  • Type "หน้าหลัก".
  • After formatting, the text changes to "หนาหลก".

Expected Behavior
The text "หน้าหลัก" should remain as it is and not be altered to "หนาหลก".

Actual Behavior
The text "หน้าหลัก" is incorrectly formatted to "หนาหลก".

Environment
Biome-Zed Version: [0.143.7]
Operating System: [macOS with M1 chip]

My Setting

{
  "format_on_save": "on",
  "lsp": {
    "biome": {
      "settings": {
        "require_config_file": true
      }
    }
  },
  "formatter": {
    "external": {
      "command": "./node_modules/@biomejs/biome/bin/biome",
      "arguments": ["format", "--write", "--stdin-file-path", "{buffer_path}"]
    }
  },
  "code_actions_on_format": {
    "source.fixAll.biome": true,
    "source.organizeImports.biome": true
  }
}
349785754-be501ca7-b15c-480b-91b4-ae9719f4b9a4.mov
@nhedger
Member

nhedger commented Jul 18, 2024

We've had a similar report for the IntelliJ extension and suspect it may be related to output colorization on the CLI.

Would you mind passing --colors=off to your formatting command to see if it fixes the issue for you?

@mahirocoko
Author

mahirocoko commented Jul 18, 2024

We've had a similar report for the IntelliJ extension and suspect it may be related to output colorization on the CLI.

Would you mind passing --colors=off to your formatting command to see if it fixes the issue for you?

Thank you for the suggestion, but unfortunately, turning off colors with --colors=off did not resolve the issue.
Did I apply the option correctly? The recording below shows what I did.

Screen.Recording.2567-07-18.at.11.39.37.mov

@nhedger
Member

nhedger commented Jul 18, 2024

Got access to a laptop and was able to reproduce your issue. --colors=off indeed doesn't seem to fix the issue, thanks for trying! We'll have to investigate what is happening.

Similarly to biomejs/biome-intellij#71, this seems to happen only when using STDIN; calling format on the file directly works as expected.

@nhedger
Member

nhedger commented Jul 18, 2024

Welp, it's actually the other way around.

// หน้าหลัก
const a = 1

Forcing colors

cat test.tsx | biome format --write --colors=force --stdin-file-path test.tsx
// หน้าหลัก
const a = 1;

Disabling colors

cat test.tsx | biome format --write --colors=off --stdin-file-path test.tsx
// หนาหลก
const a = 1;

Would you mind trying --colors=force too?

@mahirocoko
Author

mahirocoko commented Jul 18, 2024

cat test.tsx | biome format --write --colors=force --stdin-file-path test.tsx

Thank you for the suggestion. I tried forcing colors with --colors=force. However, neither approach resolved the issue.

When forcing colors:

[0m'use client'

import type { FC } from 'react'

interface IComponentNameProps {}

const ComponentName: FC<IComponentNameProps> = () => {
  return <div>หน้า</div>
}

export default ComponentName
[0m

The output is still incorrect, because it now contains stray [0m escape sequences. However, forcing colors did resolve the problem with the Thai text.

@nhedger
Member

nhedger commented Jul 18, 2024

This is what I believe is happening:

  1. With --colors=off, or with the default auto setting when reading from stdin, Biome disables colors.
  2. This leads Biome to assume that colors are not supported, so it enters the following block:
    if cfg!(windows) || !self.writer.supports_color() {
        let is_ascii = grapheme_is_ascii(grapheme);
        if !is_ascii {
            let replacement = unicode_to_ascii(grapheme.chars().nth(0).unwrap());
            replacement.encode_utf8(&mut buffer);
            if let Err(err) = self.writer.write_all(&buffer[..replacement.len_utf8()]) {
                self.error = Err(err);
                return Err(fmt::Error);
            }
            continue;
        }
    };
  3. Because the character is not ASCII, we attempt to find a replacement character, but unicode_to_ascii only accepts a single char, so we pass it the first char of the grapheme. Since the น้ grapheme is composed of two chars (the consonant น followed by the combining tone mark ้, U+0E49), this means that we lose the tone mark, ending up with น.
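
The loss described in step 3 can be reproduced in isolation. The following is a minimal sketch, not Biome's actual code: `is_thai_combining_mark`, `clusters`, and `keep_first_char_only` are hypothetical helpers, and the cluster grouping is a simplified, Thai-only stand-in for real grapheme segmentation. It also assumes that a character with no ASCII replacement is written back unchanged, which is what the observed output suggests.

```rust
/// Simplified stand-in for grapheme segmentation (assumption: real code
/// would use full Unicode segmentation). Returns true for the Thai
/// combining marks that attach to the preceding consonant.
fn is_thai_combining_mark(c: char) -> bool {
    matches!(c, '\u{0E31}' | '\u{0E34}'..='\u{0E3A}' | '\u{0E47}'..='\u{0E4E}')
}

/// Hypothetical helper: groups a string into base-plus-marks clusters.
fn clusters(text: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for c in text.chars() {
        if is_thai_combining_mark(c) && !out.is_empty() {
            // Attach the mark to the cluster that is currently being built.
            out.last_mut().unwrap().push(c);
        } else {
            // Any other character starts a new cluster.
            out.push(c.to_string());
        }
    }
    out
}

/// Mimics the lossy step: only the first char of each cluster survives,
/// so every combining mark after it is dropped.
fn keep_first_char_only(text: &str) -> String {
    clusters(text)
        .iter()
        .filter_map(|cluster| cluster.chars().next())
        .collect()
}

fn main() {
    let input = "หน้าหลัก";
    // The cluster "น้" is two chars: U+0E19 followed by U+0E49.
    println!("{}", "น้".chars().count()); // prints 2
    println!("{}", keep_first_char_only(input)); // prints หนาหลก
}
```

Both marks in the word (้ and ั) sit in the second position of their cluster, which is why both disappear from the formatted output.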

I'll move this issue to the main repo as I don't believe it's specific to the extension

@nhedger nhedger transferred this issue from biomejs/biome-zed Jul 18, 2024
@nhedger nhedger added A-CLI Area: CLI A-Formatter Area: formatter S-Bug-confirmed Status: report has been confirmed as a valid bug labels Jul 18, 2024