Parse Starlark files as raw bytes for Bzlmod #24217

fmeum · 2024-11-05T19:37:34Z

As long as Bazel internally represents strings as raw bytes "encoded" in Latin-1, the same must be true for all Starlark files that may contain file system paths.
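To illustrate the "raw bytes encoded in Latin-1" representation, here is a minimal Python sketch. The helper names `to_internal` and `to_raw` are hypothetical and are not Bazel APIs. The idea is that every byte maps to exactly one Latin-1 code point, so any byte sequence (including the UTF-8 bytes of a path) survives the round trip unchanged.

```python
def to_internal(raw: bytes) -> str:
    """Hypothetical helper: view raw bytes as a string, one char per byte."""
    return raw.decode("latin-1")


def to_raw(internal: str) -> bytes:
    """Hypothetical helper: recover the original bytes from the internal string."""
    return internal.encode("latin-1")


# A path with non-ASCII characters, as it would appear on disk in UTF-8.
path_bytes = "dir/ünïcödé.txt".encode("utf-8")
internal = to_internal(path_bytes)

# The internal string has one character per byte, not per code point.
assert len(internal) == len(path_bytes)
# The round trip is lossless for arbitrary bytes.
assert to_raw(internal) == path_bytes
```

If a Starlark file were instead decoded as UTF-8, a string literal in that file would no longer compare equal to the same path obtained from the file system in this representation.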

Also includes changes to the Python test setup:

- ScratchFile now always writes files as UTF-8
- RunProgram encodes and decodes stdin/stderr/stdout as UTF-8
- download no longer leaks a file
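As a rough sketch of what the first two test setup changes amount to (this is not the actual Bazel test code; `scratch_file` and `run_program` are hypothetical stand-ins for ScratchFile and RunProgram), the helpers below always use UTF-8 explicitly instead of relying on the platform's default encoding:

```python
import os
import subprocess
import sys
import tempfile


def scratch_file(root, rel_path, lines):
    """Write lines to root/rel_path, always encoded as UTF-8."""
    path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")
    return path


def run_program(args, stdin_text=""):
    """Run a program, passing stdin and reading stdout/stderr as UTF-8."""
    proc = subprocess.run(
        args,
        input=stdin_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    return (
        proc.returncode,
        proc.stdout.decode("utf-8"),
        proc.stderr.decode("utf-8"),
    )


# Usage: echo non-ASCII stdin back through a child process, byte for byte.
echo = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"
exit_code, out, err = run_program([sys.executable, "-c", echo], "äöü")
```

Being explicit about the encoding keeps the tests independent of the locale of the machine they run on.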

fmeum · 2024-11-05T19:37:54Z

This needs #24010 to get its test to pass.

fmeum · 2024-11-05T19:38:01Z

@bazel-io fork 8.0.0

fmeum · 2024-11-07T18:24:14Z

Ready for review now.

@tjgq FYI, this is the other Unicode-related PR I am planning to get into 8.0.0. If you see anything else that looks incompatible, please let me know.

fmeum · 2024-11-07T19:01:33Z

Turns out that there is an interesting failure, switching back to draft.

fmeum · 2024-11-07T22:48:05Z

@Wyverald CI should be green now, I just had to fix some issues in the Python test setup code.

Wyverald · 2024-11-13T23:17:31Z

hmm, actually -- does this require the "re-land" PR first? if so, we should hold back on the import.

fmeum · 2024-11-13T23:37:55Z

> hmm, actually -- does this require the "re-land" PR first? if so, we should hold back on the import.

I think it does for the test to pass.

As long as Bazel internally represents strings as raw bytes "encoded" in Latin-1, the same must be true for all Starlark files that may contain e.g. file system paths.

fmeum · 2024-11-14T18:05:22Z

@Wyverald The reland has happened.

Wyverald · 2024-11-15T19:14:24Z

Do I understand correctly that, while this PR fixes the use case of non-ASCII characters in file paths, it "breaks" the case where non-ASCII characters are passed in e.g. string attrs in tags?

fmeum · 2024-11-15T19:44:27Z

> Do I understand correctly that, while this PR fixes the use case of non-ASCII characters in file paths, it "breaks" the case where non-ASCII characters are passed in e.g. string attrs in tags?

It actually fixes both: the .bzl files containing the module extension source are read as raw bytes, module and repo context functions read and write raw bytes from files, ... With this change, string attrs in tags will also line up with e.g. string literals in extension code.

Wyverald · 2024-11-15T20:01:34Z

I see!

> module and repo context functions read and write raw bytes from files

I just double-checked, and it seems that we try to write files in UTF-8 in mctx/rctx: code

It's been broken like this forever, so presumably nobody has been relying on it working...
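For readers following along, here is a hedged Python sketch of why writing such strings as UTF-8 is a problem. It assumes the internal string is the Latin-1 view of the original UTF-8 bytes, as described in the PR description; the variable names are purely illustrative.

```python
# The character as it appears in a UTF-8 encoded source file.
original_bytes = "é".encode("utf-8")            # two bytes: 0xC3 0xA9

# Assumed internal representation: one Latin-1 character per byte.
internal = original_bytes.decode("latin-1")     # two characters: 'Ã' and '©'

# Writing the internal string as UTF-8 encodes each byte a second time.
double_encoded = internal.encode("utf-8")       # four bytes
# Writing it as Latin-1 reproduces the original bytes exactly.
raw = internal.encode("latin-1")                # two bytes

assert double_encoded == b"\xc3\x83\xc2\xa9"
assert raw == original_bytes
```

A UTF-8 reader of the double-encoded file sees `Ã©` rather than `é`, which is the familiar mojibake pattern.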

Wyverald · 2024-11-15T20:02:57Z

Also, presumably this means print(some_tag.some_attr) could yield gibberish, but that's nothing new. (To be clear, the import is already under way -- this is just me trying to understand the problem better)

fmeum · 2024-11-15T20:06:17Z

> I just double-checked, and it seems that we try to write files in UTF-8 in mctx/rctx: code
> It's been broken like this forever, so presumably nobody has been relying on it working...

That's being fixed in #24182 :-)

> Also, presumably this means print(some_tag.some_attr) could yield gibberish, but that's nothing new.

It doesn't, since the Bazel client ultimately outputs raw bytes as well. That's being tested in #24243.

> To be clear, the import is already under way -- this is just me trying to understand the problem better

In case it is not too late, I pushed a new test to demonstrate this end-to-end for tags.

Wyverald · 2024-11-15T20:11:54Z

Ahh, thanks for the explanation. It sounds like we're actually just treating Starlark strs as byte arrays everywhere now, right?

> In case it is not too late, I pushed a new test to demonstrate this end-to-end for tags.

Ack -- I'll make sure that's imported too

fmeum · 2024-11-15T20:25:57Z

> Ahh, thanks for the explanation. It sounds like we're actually just treating Starlark strs as byte arrays everywhere now, right?

Yes, that's correct. That's actually not the worst model in a world where you increasingly deal only with binary data and UTF-8 strings.
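One practical consequence of the byte-array view, sketched in Python rather than Starlark (so this models the idea and is not a statement about exact Starlark behavior): length and slicing count bytes, so a slice can cut a multi-byte character in half. The helpers `as_internal` and `is_valid_utf8` are hypothetical.

```python
def as_internal(text: str) -> str:
    """Hypothetical model: UTF-8 bytes viewed as one Latin-1 char per byte."""
    return text.encode("utf-8").decode("latin-1")


def is_valid_utf8(internal: str) -> bool:
    """Check whether the bytes behind an internal string are valid UTF-8."""
    try:
        internal.encode("latin-1").decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = as_internal("ü.txt")

# "ü" occupies two bytes in UTF-8, so the length is 6 rather than 5.
assert len(name) == 6
# Slicing after the first byte splits "ü" and leaves invalid UTF-8.
assert not is_valid_utf8(name[:1])
# Slicing on a character boundary is fine.
assert is_valid_utf8(name[:2])
```

As long as strings are only concatenated, compared, and passed through, the bytes stay intact end to end.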

fmeum · 2024-11-15T20:26:40Z

@bazel-io fork 8.0.0

fmeum requested review from Wyverald and meteorcloudy as code owners November 5, 2024 19:37

github-actions bot added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. awaiting-review PR is awaiting review from an assigned reviewer labels Nov 5, 2024

bazel-io mentioned this pull request Nov 5, 2024

[8.0.0] Parse Starlark files as raw bytes for Bzlmod #24218

Open

fmeum marked this pull request as draft November 7, 2024 16:26

fmeum force-pushed the 23859-unicode-bzlmod branch from 5b5f9a5 to 31f0271 Compare November 7, 2024 18:22

fmeum marked this pull request as ready for review November 7, 2024 18:22

fmeum marked this pull request as draft November 7, 2024 19:01

fmeum force-pushed the 23859-unicode-bzlmod branch 8 times, most recently from 165f03b to a68c20e Compare November 7, 2024 22:46

fmeum marked this pull request as ready for review November 7, 2024 22:47

Wyverald approved these changes Nov 13, 2024


Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 13, 2024

Wyverald added awaiting-review PR is awaiting review from an assigned reviewer and removed awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally labels Nov 13, 2024

Parse Starlark files as raw bytes for Bzlmod

e2a9596

As long as Bazel internally represents strings as raw bytes "encoded" in Latin-1, the same must be true for all Starlark files that may contain e.g. file system paths.

fmeum force-pushed the 23859-unicode-bzlmod branch from a68c20e to e2a9596 Compare November 14, 2024 18:03

fmeum requested a review from Wyverald November 14, 2024 18:10

Wyverald added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 15, 2024

Add tag test

07c1e7d

bazel-io mentioned this pull request Nov 15, 2024

[8.0.0] Parse Starlark files as raw bytes for Bzlmod #24348

Closed
