Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix locale #37

Open
qianxichen233 opened this issue Nov 18, 2024 · 12 comments
Open

Fix locale #37

qianxichen233 opened this issue Nov 18, 2024 · 12 comments
Assignees
Labels
WASM WASM related issue

Comments

@qianxichen233
Copy link
Contributor

When fixing bash, I faced a lot of issues related to locale. Previously, locale is not working and I commented them out since I thought they are not important as locale is mainly used for setting up language specific context. But it turns out locale is used much more often than I thought. For example, isblank will look up locale to determine language specific blank character. I currently have to manually modify each function that uses locale, which is working fine, but it might probably be a better choice to just completely fix locale.

@Yaxuan-w Yaxuan-w added the WASM WASM related issue label Nov 19, 2024
@rennergade
Copy link
Contributor

Screenshot from 2024-11-20 16-03-22

From some brief research locale files seem to just be ASCII text, though they are compiled to that? https://stackoverflow.com/questions/65514890/glibcs-isalpha-function-and-the-en-us-utf-8-locale

I believe all we should have to do is compile the locale files and place them in the appropriate directory in the file system

@rennergade
Copy link
Contributor

If what I posted above is true it shouldnt take any actual changes, just fixing the build and setup processes. @robinyuan1002 and @yzhang71 can you try that out.

@qianxichen233
Copy link
Contributor Author

To reproduce the issue, I created a simple program that uses isblank function, which uses locale stuff internally to determine if the character is considered as blank under current language. This program is not working as expected and isblank will always return false.

I digged into this a little bit. And found something interesting (actually by accident). So isblank will try to query __libc_tsd_CTYPE_B. And __libc_tsd_CTYPE_B is something gets set up under __ctype_init function. In crt1.c, we did not call the function to initialize ctype, so __libc_tsd_CTYPE_B is always empty previously. This is fixed by just calling __ctype_init in _start function in crt1.c.
However, isblank is still not working after adding __ctype_init. In __ctype_init, __libc_tsd_CTYPE_B is set to _nl_current_LC_CTYPE, which sets from _nl_global_locale (from lc-ctype.c and localeinfo.h) by default. The _nl_global_locale also has a default value, sets under global-locale.c file. One of the entry in _nl_global_locale is set to _nl_C_LC_CTYPE_class, which should hold all the information about the default character sets. The _nl_C_LC_CTYPE_class is a static array defined in C-ctype.c.
When I tried to inspect the _nl_C_LC_CTYPE_class stored in _nl_global_locale when __ctype_init is called, I found out that _nl_C_LC_CTYPE_class is empty, which confused me a bit since _nl_C_LC_CTYPE_class is supposed to defined in C-ctype.c. I did some further investigation and finally accidentally found out it is probably because _nl_C_LC_CTYPE_class got overriden by something unexpected. Because in global-locale.c, _nl_C_LC_CTYPE_class is defined as weak extern. And if I removed the weak definition, _nl_C_LC_CTYPE_class gets set to the correct value from C-ctype.c and isblank also able to work correctly.

So it is still unknown what is trying to override _nl_C_LC_CTYPE_class, and fixing isblank is just the first step of fixing locale. We probably also need to make sure functions like setlocale are able to work, and also need to make sure bash is able to run after reverting all the previous temporary solution on locale functions.

@rennergade
Copy link
Contributor

Great, @qianxichen233 can you save a branch where we can reproduce this? Robin and I will look into it.

@robinyuan1002 after your finals lets create a few test cases with different locale related functions.

@rennergade rennergade self-assigned this Dec 18, 2024
@rennergade
Copy link
Contributor

The problem might be that wasm doesn't handle weak externs correctly at all. Seems like there's some discussion here: emscripten-core/emscripten#12819

@qianxichen233
Copy link
Contributor Author

I just pushed a branch named "fix-locale", which contains the changes I made. Basically the changes I decribed above, and I also reverted three locale related file back to its unmodified state.

@robinyuan1002
Copy link
Collaborator

The problem now is that when we use uselocale function in a test case and try to compile this test case into wasm, we encountered many errors(e.g /home/lind-wasm/src/glibc/sysroot/lib/wasm32-wasi/libc.a(uselocale.o): relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol _nl_current_IDENTIFICATION). I find out for some reason "nl_current##category" is not in the object file after compile.

@rennergade
Copy link
Contributor

The problem now is that when we use uselocale function in a test case and try to compile this test case into wasm, we encountered many errors(e.g /home/lind-wasm/src/glibc/sysroot/lib/wasm32-wasi/libc.a(uselocale.o): relocation R_WASM_MEMORY_ADDR_TLS_SLEB cannot be used against an undefined symbol _nl_current_IDENTIFICATION). I find out for some reason "nl_current##category" is not in the object file after compile.

There seems to be some weird things happening with linking in _nl_current. I'm not sure if this a vtable thing like @yzhang71 saw before.

Dennis can you possibly help Robin look at this?

@qianxichen233
Copy link
Contributor Author

a little bit more information: we tried to output the pre-compiled c file (with all macro expanded) and we can see clearly _nl_current stuff are defined in the file, something like extern _thread struct .... _nl_current_... attribute("hidden") attribute("tls_...");. But if we look into wasm-objdump the compiled object file, these symbols are not here

@qianxichen233
Copy link
Contributor Author

I found out that these _nl_current_ symbols are actually defined. For example, _nl_current_LC_MESSAGES are defined in locale/lc-messages.c. So it is not because these symbols are not defined.
The reason why these symbols failed the linking is because we have incorrect order of object in the generated libc archive file under sysroot. To verify this, I first tried to manually pass the desired object file when compile (i.e. clang ... locale_program.c build/lc-messages.o -o locale_program.wasm) and the error for _nl_current_LC_MESSAGES is gone. But I checked the libc.a, I found lc-messages.o are inside the libc.a and it contains the desired symbol. But wasm-ld still cannot link the _nl_current_LC_MESSAGES symbol to uselocale.o. One thing I noticed is that lc-messages.o is included below the uselocale.o in the archive file, so I tried to reorder the object file and make lc-messages.o appears above uselocale.o in the archive file. And the linking is successful. It is pretty weird to me the order of object files matters in the archive file. I saw an answer on stackoverflow, it basically said that the order does matter a long time ago, but this should be no longer required "on any UNIX system newer than ~15-20 years" (and the answer was in 2011).
One other thing I noticed is that, if I delete the __thread keyword on _nl_current_ declaration, I am also able to compile the locale program successfully. I checked libc.a in this case as well, and lc-messages.o is still defined below uselocale.o but it can be linked successfully. So my current assumption is that: this might be a bug in wasm-ld, that when it links wasm TLS symbols (variable defined with __thread keyword), the object files have to be order-dependent for some reason. When the symbols are not wasm TLS symbol, it is not order-dependent.

@JustinCappos
Copy link
Member

Okay, so it sounds like you've found a workaround for this now! Let's document this as an issue to work more on later, but just do the ordering in a way that works for us and move on.

@rennergade
Copy link
Contributor

Definitely agree with Justin that this should be good enough for now.

We're having issues with TLS elsewhere yes? Probably worth starting an issue on TLS to track any problems we encounter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WASM WASM related issue
Projects
None yet
Development

No branches or pull requests

6 participants