Issues with international characters #42

Open
jacobalberty opened this issue Sep 5, 2019 · 1 comment

Comments

@jacobalberty

Just fired up the Docker image and I'm getting the error below over and over. After I removed the affected files, things seem to be working fine.

mirror_1  | java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <clipped for privacy>/Fran?ais/timeQplus Guide de D?marrage Rapide (2016_09_12 16_41_37 UTC).pdf
mirror_1  |     at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
mirror_1  |     at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
mirror_1  |     at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
mirror_1  |     at java.nio.file.Paths.get(Paths.java:84)
mirror_1  |     at mirror.UpdateTree.find(UpdateTree.java:167)
mirror_1  |     at mirror.UpdateTree.addUpdate(UpdateTree.java:108)
mirror_1  |     at mirror.UpdateTree.addLocal(UpdateTree.java:97)
mirror_1  |     at mirror.MirrorSession.lambda$calcInitialState$1(MirrorSession.java:84)
mirror_1  |     at java.util.ArrayList.forEach(ArrayList.java:1257)
mirror_1  |     at mirror.MirrorSession.calcInitialState(MirrorSession.java:84)
mirror_1  |     at mirror.MirrorClient.startSession(MirrorClient.java:88)
mirror_1  |     at mirror.MirrorClient.access$300(MirrorClient.java:27)
mirror_1  |     at mirror.MirrorClient$SessionStarter.runOneLoop(MirrorClient.java:198)
mirror_1  |     at mirror.tasks.ThreadBasedTask.run(ThreadBasedTask.java:62)
mirror_1  |     at mirror.tasks.ThreadBasedTask.lambda$new$0(ThreadBasedTask.java:39)
mirror_1  |     at java.lang.Thread.run(Thread.java:748)
mirror_1  | 2019-09-05 03:35:21 INFO  Stopping session
mirror_1  | 2019-09-05 03:35:21 INFO  Connected, starting session, version unspecified
mirror_1  | 2019-09-05 03:35:21 INFO  Watchman root is /data/
@stephenh
Owner

stephenh commented Sep 7, 2019

Hm, yeah, I'm not entirely surprised... I've had similar issues reported in the past.

The core issue is that mirror uses watchman for file watching, which is native code built on POSIX APIs. Those APIs are not UTF-8-aware: file names are raw bytes in whatever encoding the file system happens to use.

But for everything else that is not file watching, mirror uses the regular Java APIs, which assume UTF-8.

So for plain strings (e.g. ASCII) whose bytes are identical in the file system's encoding (as seen via watchman and JNI) and in UTF-8, everything works by happenstance.
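
A minimal sketch of that mismatch, assuming ISO-8859-1 stands in for "whatever encoding the file system happens to use" (the thread doesn't say which encoding was actually involved). The class and method names are hypothetical. An ASCII name has the same bytes in both encodings, so decoding it as UTF-8 is harmless; "Français" stored as ISO-8859-1 contains the single byte `0xE7` for `ç`, which is not valid UTF-8 at that position, so Java's lenient decoder silently substitutes U+FFFD.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; ISO-8859-1 is an assumed stand-in for the file system's encoding.
final class EncodingMismatchDemo {

    // Decodes raw file-name bytes as UTF-8, as code that assumes UTF-8 would.
    // new String(bytes, UTF_8) replaces malformed sequences with U+FFFD instead of failing.
    static String decodeAssumingUtf8(byte[] rawName) {
        return new String(rawName, StandardCharsets.UTF_8);
    }

    // True when a name encodes to the same bytes in ISO-8859-1 and in UTF-8 (e.g. plain ASCII).
    static boolean sameBytesInBothEncodings(String name) {
        return Arrays.equals(
            name.getBytes(StandardCharsets.ISO_8859_1),
            name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String name = "Fran\u00e7ais"; // "Français", written with an escape to avoid source-encoding issues
        byte[] latin1Bytes = name.getBytes(StandardCharsets.ISO_8859_1); // 'ç' becomes the single byte 0xE7
        String decoded = decodeAssumingUtf8(latin1Bytes);

        System.out.println(decoded.equals(name));              // false
        System.out.println(decoded.indexOf('\uFFFD') >= 0);    // true: replacement character
        System.out.println(sameBytesInBothEncodings("Guide")); // true
        System.out.println(sameBytesInBothEncodings(name));    // false
    }
}
```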

For reference, I've tried to hack around some of this by ignoring "can't be UTF-8" string failures from watchman's Java library:

stephenh/watchman@55847b5

https://github.com/stephenh/mirror/blob/master/src/main/java/mirror/watchman/WatchmanFileWatcher.java#L151
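
A rough sketch of the "ignore names that can't be UTF-8" idea in plain Java. This is not the code in the linked commit or in `WatchmanFileWatcher`; `tryDecodeUtf8` and `decodableNames` are hypothetical helpers. It uses a strict `CharsetDecoder` that reports malformed input (rather than substituting U+FFFD) and drops any name that fails to decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper class; not part of mirror or watchman.
final class StrictUtf8Names {

    // Strictly decodes raw bytes as UTF-8; returns empty instead of guessing on bad input.
    static Optional<String> tryDecodeUtf8(byte[] rawName) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)       // throw on invalid byte sequences
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(rawName)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Keeps only the names that are valid UTF-8, silently skipping the rest.
    static List<String> decodableNames(List<byte[]> rawNames) {
        List<String> result = new ArrayList<>();
        for (byte[] raw : rawNames) {
            tryDecodeUtf8(raw).ifPresent(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> names = new ArrayList<>();
        names.add("Guide.pdf".getBytes(StandardCharsets.UTF_8));
        names.add("Fran\u00e7ais".getBytes(StandardCharsets.UTF_8));      // valid UTF-8
        names.add("Fran\u00e7ais".getBytes(StandardCharsets.ISO_8859_1)); // not valid UTF-8
        System.out.println(decodableNames(names).size()); // 2
    }
}
```

The trade-off is the one described in this thread: skipped files simply never get mirrored, which avoids the crash but hides the problem.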

But for some reason your paths "made it past" the original POSIX -> watchman -> JNI -> mirror hop, and only failed when mirror then tried to pass that presumed-UTF-8 string back through Java's own string-to-native conversion layer (here `Paths.get`, which throws from `UnixPath.encode` in the trace above).
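
One defensive option at that second hop is to catch `InvalidPathException` around `Paths.get` and skip the entry, which automates what the reporter did by hand. This is a hypothetical sketch (`toPathOrSkip` is not a mirror method). Whether `Paths.get` rejects a non-ASCII name depends on the JVM's native path encoding, so the usage below uses an embedded NUL character, which Java always rejects, as a stand-in for an unmappable name.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper class; not part of mirror.
final class SafePaths {

    // Converts a name to a Path, or returns empty (and logs) if the platform cannot represent it.
    static Optional<Path> toPathOrSkip(String name) {
        try {
            return Optional.of(Paths.get(name));
        } catch (InvalidPathException e) {
            System.err.println("Skipping unrepresentable path: " + e.getReason());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toPathOrSkip("docs/Guide.pdf").isPresent()); // true
        // NUL is rejected on every platform; it stands in for an unmappable character here.
        System.out.println(toPathOrSkip("bad\0name").isPresent());      // false
    }
}
```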

I really don't have any good ideas and realistically won't dive deeper on this at the moment.

If you can write a test case that creates a file path (checked into git / the PR) that reproduces the failure above, that'd be great.
