
Fix mojibake in Japanese file names when moving to and from The Outside World #179

Draft: wants to merge 1 commit into main


seishun commented Apr 23, 2023

I can think of two ways to handle MacJapanese in the emulator:

  • Encode and decode it manually, like MacRoman. The code would be more complex than for MacRoman because MacJapanese has both one-byte and two-byte characters, and there are many more characters to map (see http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT).
  • Pretend it's Shift JIS and use iconv. It seems to work fine in one direction (a more recent version of Emscripten is needed for the other one).
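A minimal Python sketch of the second option, assuming Python's built-in `shift_jis` codec as a stand-in for the iconv conversion the emulator would do. The helper names are hypothetical, and `split_chars` only illustrates the one-/two-byte structure that makes hand-written decoding harder than MacRoman. Shift JIS and MacJapanese differ on some bytes, so this is an approximation.

```python
# Sketch only: treats MacJapanese as Shift JIS (the second option above).
# Helper names are hypothetical; MacJapanese and Shift JIS differ on some
# bytes (see Apple's JAPANESE.TXT), which this approximation ignores.
from typing import List


def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte character, using Shift JIS lead-byte ranges."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_chars(raw: bytes) -> List[bytes]:
    """Split an encoded name into its one- and two-byte character units."""
    units: List[bytes] = []
    i = 0
    while i < len(raw):
        # A lead byte at the very end has no trail byte, so treat it as one byte.
        width = 2 if is_lead_byte(raw[i]) and i + 1 < len(raw) else 1
        units.append(raw[i:i + width])
        i += width
    return units


def decode_name(raw: bytes) -> str:
    # Unmappable byte sequences become U+FFFD instead of raising.
    return raw.decode("shift_jis", errors="replace")


def encode_name(name: str) -> bytes:
    # Characters Shift JIS cannot represent become "?" instead of raising.
    return name.encode("shift_jis", errors="replace")


if __name__ == "__main__":
    raw = encode_name("日本語ｱ.txt")
    print(len(raw), len(split_chars(raw)))  # 11 8
    print(decode_name(raw))                 # 日本語ｱ.txt
```

The three kanji take two bytes each, while the half-width katakana and the ASCII extension take one byte each, which is why a byte count and a character count diverge.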

mihaip (Owner) commented Apr 24, 2023

Doing it as Shift JIS seems like a reasonable start.

There's already some encoding metadata on the Python side, because we need to take it into account when generating the Stickies file:

```python
stickies_encoding: str = "mac_roman"
```

Instead of having to duplicate it on the TypeScript side, can you make the Python code that generates the chunked file JSON definition also include it?

```python
json.dump(
    {
        "name": os.path.splitext(image.name)[0],
        "totalSize": total_size,
        "chunks": chunks,
        "chunkSize": CHUNK_SIZE,
    },
)
```

The TypeScript side reads that data in via this type:

```typescript
export type EmulatorChunkedFileSpec = {
    name: string;
    baseUrl: string;
    totalSize: number;
    chunks: string[];
    chunkSize: number;
    prefetchChunks: number[];
};
```
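A minimal sketch of that suggestion, assuming the encoding is simply passed through into the generated JSON. The function name `chunked_file_spec`, the field name `nameEncoding`, and the `CHUNK_SIZE` value are hypothetical placeholders, not the repository's actual code.

```python
import json
import os

CHUNK_SIZE = 256 * 1024  # hypothetical placeholder; the real constant lives in the build script


def chunked_file_spec(image_name: str, total_size: int, chunks: list,
                      stickies_encoding: str = "mac_roman") -> dict:
    """Build the chunked-file JSON definition, now carrying the disk's encoding."""
    return {
        "name": os.path.splitext(image_name)[0],
        "totalSize": total_size,
        "chunks": chunks,
        "chunkSize": CHUNK_SIZE,
        # Hypothetical new field so the TypeScript side need not duplicate the setting.
        "nameEncoding": stickies_encoding,
    }


if __name__ == "__main__":
    spec = chunked_file_spec("KanjiTalk 7.5.3.dsk", 2 * CHUNK_SIZE, ["c0", "c1"], "shift_jis")
    print(json.dumps(spec, indent=2))
```

Under this assumption, `EmulatorChunkedFileSpec` would also need a matching `nameEncoding: string;` member.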

seishun (Author) commented Apr 24, 2023

Setting name_encoding based on stickies_encoding seems wrong because they're kinda orthogonal. Setting both based on a "common" value (e.g. encoding) makes more sense, but it's tricky: in the case of KanjiTalk, we want to pass "shift_jis" to str.encode() and something that corresponds to MacJapanese (probably a number) to the emulator. Can you think of a simple way to do this?
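One possible shape for such a "common" value, sketched under the assumption that each disk carries a single setting holding both the Python codec name and the number for the emulator. `DiskEncoding`, `stickies_bytes`, `emulator_metadata`, and `nameEncoding` are hypothetical names; the numbers 0 and 1 are the classic Mac OS encoding values for MacRoman and MacJapanese.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskEncoding:
    """One per-disk setting that drives both consumers (hypothetical name)."""
    python_codec: str  # codec name for str.encode(), e.g. when writing the Stickies file
    mac_enum: int      # classic Mac OS text encoding number handed to the emulator


MAC_ROMAN = DiskEncoding(python_codec="mac_roman", mac_enum=0)
# KanjiTalk: Shift JIS is used as an approximation of MacJapanese.
MAC_JAPANESE = DiskEncoding(python_codec="shift_jis", mac_enum=1)


def stickies_bytes(text: str, encoding: DiskEncoding) -> bytes:
    # The Python side only ever sees the codec name.
    return text.encode(encoding.python_codec)


def emulator_metadata(encoding: DiskEncoding) -> str:
    # Only the number crosses over to the emulator side.
    return json.dumps({"nameEncoding": encoding.mac_enum})
```

The design choice is that neither consumer derives its value from the other; both read their own field from the same per-disk object.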

gingerbeardman commented Aug 24, 2023

> something that corresponds to MacJapanese (probably a number)

The ancient and official numbering scheme probably makes the most sense:

| Encoding | Enum |
|---|---|
| MacRoman | 0 |
| MacJapanese | 1 |
| MacChineseTrad | 2 |
| MacKorean | 3 |
| ... | ... |
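On the Python side, the first rows of that table could be mirrored as an `IntEnum`, so the value written to JSON is a plain number while the code keeps readable names. This is a minimal sketch; the class and member names are hypothetical.

```python
from enum import IntEnum


class MacTextEncoding(IntEnum):
    """Classic Mac OS text encoding numbers (first four rows of the table only)."""
    MAC_ROMAN = 0
    MAC_JAPANESE = 1
    MAC_CHINESE_TRAD = 2
    MAC_KOREAN = 3


if __name__ == "__main__":
    enc = MacTextEncoding(1)   # e.g. a value read back from the JSON definition
    print(enc.name, int(enc))  # MAC_JAPANESE 1
```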

Sources for the full list:
