NoSuchFileException during dynamic Javac execution with experimental_delay_virtual_input_materialization #12904

Closed
kevingessner opened this issue Jan 26, 2021 · 6 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Local-Exec Issues and PRs for the Execution (Local) team type: bug

Comments

@kevingessner
Contributor

kevingessner commented Jan 26, 2021

Description of the problem:

When building Java targets with dynamic execution, I frequently (but not always) get a crash while compiling java_library targets. Two stack traces are below; they seem to show the same problem.

  • It's always a jar's .params file that is missing.
  • The problem seems to only occur with --experimental_delay_virtual_input_materialization enabled.
  • Enabling --worker_sandboxing doesn't help.
  • It crashes much more frequently with small values of --experimental_local_execution_delay.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I can't provide a minimal example because I don't have a public remote exec environment to share, but here are the flags I use for dynamic execution:

build:dynamic --internal_spawn_scheduler --nolegacy_spawn_scheduler --experimental_delay_virtual_input_materialization --worker_sandboxing
build:dynamic --experimental_local_execution_delay=2
build:dynamic --strategy=Javac=dynamic

I've seen it with experimental_local_execution_delay as high as 250; a small value like 2 makes it occur on almost every build.

What operating system are you running Bazel on?

Linux 5.1.0-1.el7.elrepo.x86_64

What's the output of bazel info release?

release 3.7.2

Any other information, logs, or outputs that you want to share?

ERROR: /[redacted]/lib/common/src/main/java/com/foo/bar/util/BUILD.bazel:1:13: Building lib/common/src/main/java/com/etsy/foo/bar/libutil.jar (1 source file) failed: IOException while borrowing a worker from the pool:

---8<---8<--- Exception details ---8<---8<---
java.nio.file.NoSuchFileException: /[redacted]/execroot/com_etsy_search/bazel-out/k8-fastbuild/bin/lib/common/src/main/java/com/foo/bar/util/libutil.jar-0.params
        at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(Unknown Source)
        at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
        at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
        at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(Unknown Source)
        at java.base/java.nio.file.Files.newInputStream(Unknown Source)
        at java.base/java.nio.file.Files.newBufferedReader(Unknown Source)
        at java.base/java.nio.file.Files.readAllLines(Unknown Source)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.expandArgument(WorkerSpawnRunner.java:341)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.createWorkRequest(WorkerSpawnRunner.java:299)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:422)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:225)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:142)
        at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:240)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:140)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.runLocally(DynamicSpawnStrategy.java:429)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.access$200(DynamicSpawnStrategy.java:69)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$1.callImpl(DynamicSpawnStrategy.java:311)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$Branch.call(DynamicSpawnStrategy.java:522)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$Branch.call(DynamicSpawnStrategy.java:459)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
---8<---8<--- End of exception details ---8<---8<---
---8<---8<--- Exception details ---8<---8<---
java.io.FileNotFoundException: /home/kgessner/.cache/bazel/_bazel_kgessner/5051302d21274f293ef38239118817b9/execroot/com_etsy_search/bazel-out/k8-fastbuild/bin/apps/mmx/mmx_autosuggest-class.jar-0.params (No such file or directory)
        at com.google.devtools.build.lib.unix.NativePosixFiles.lstat(Native Method)
        at com.google.devtools.build.lib.unix.UnixFileSystem.statInternal(UnixFileSystem.java:185)
        at com.google.devtools.build.lib.unix.UnixFileSystem.stat(UnixFileSystem.java:175)
        at com.google.devtools.build.lib.unix.UnixFileSystem.resolveOneLink(UnixFileSystem.java:130)
        at com.google.devtools.build.lib.vfs.FileSystem.appendSegment(FileSystem.java:353)
        at com.google.devtools.build.lib.vfs.FileSystem.resolveSymbolicLinks(FileSystem.java:421)
        at com.google.devtools.build.lib.vfs.Path.resolveSymbolicLinks(Path.java:622)
        at com.google.devtools.build.lib.vfs.DigestUtils.manuallyComputeDigest(DigestUtils.java:251)
        at com.google.devtools.build.lib.vfs.DigestUtils.getDigestWithManualFallback(DigestUtils.java:219)
        at com.google.devtools.build.lib.actions.FileArtifactValue.create(FileArtifactValue.java:257)
        at com.google.devtools.build.lib.actions.FileArtifactValue.createFromStat(FileArtifactValue.java:238)
        at com.google.devtools.build.lib.exec.SingleBuildFileCache.lambda$getMetadata$0(SingleBuildFileCache.java:67)
        at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
        at com.google.devtools.build.lib.exec.SingleBuildFileCache.getMetadata(SingleBuildFileCache.java:61)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$DelegatingPairFileCache.getMetadata(SkyframeActionExecutor.java:1752)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.createWorkRequest(WorkerSpawnRunner.java:306)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:422)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:225)
        at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:142)
        at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:240)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:140)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.runLocally(DynamicSpawnStrategy.java:429)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.access$200(DynamicSpawnStrategy.java:69)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$1.callImpl(DynamicSpawnStrategy.java:311)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$Branch.call(DynamicSpawnStrategy.java:522)
        at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy$Branch.call(DynamicSpawnStrategy.java:459)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
---8<---8<--- End of exception details ---8<---8<---
@gregestren gregestren added team-Local-Exec Issues and PRs for the Execution (Local) team type: bug untriaged labels Jan 26, 2021
@meisterT meisterT added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Feb 10, 2021
@larsrc-google
Contributor

I've been chasing down something similar internally. It's a mix of poor error reporting and the inherent hard-to-reproduce state of dynamic execution bugs. In the case I was looking at, a rule was creating outputs that were set non-readable, which under certain conditions would trigger a crash that looked like this.

@larsrc-google
Contributor

The internal problem didn't look similar to this. I have seen nothing like this.

@kevingessner
Contributor Author

Perhaps eb762d4 in 4.1.0 will help here?

@ron-stripe
Contributor

I see it too when mixing remote with workers when using dynamic.

---8<---8<--- Exception details ---8<---8<---
java.io.FileNotFoundException: /private/var/tmp/_bazel/a9c5c6412b075cac44add7ca183b7f52/execroot/com_stripe_uppsala/bazel-out/host/bin/src/xxxxx/XXXXXGenerator.jar-0.params (No such file or directory)
at com.google.devtools.build.lib.unix.NativePosixFiles.stat(Native Method)
at com.google.devtools.build.lib.unix.UnixFileSystem.statInternal(UnixFileSystem.java:184)
at com.google.devtools.build.lib.unix.UnixFileSystem.stat(UnixFileSystem.java:175)
at com.google.devtools.build.lib.vfs.Path.stat(Path.java:418)
at com.google.devtools.build.lib.exec.SingleBuildFileCache.lambda$getMetadata$0(SingleBuildFileCache.java:67)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
at com.google.common.cache.LocalCache.get(LocalCache.java:3951)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
at com.google.devtools.build.lib.exec.SingleBuildFileCache.getMetadata(SingleBuildFileCache.java:61)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$DelegatingPairFileCache.getMetadata(SkyframeActionExecutor.java:1752)
at com.google.devtools.build.lib.worker.WorkerSpawnRunner.createWorkRequest(WorkerSpawnRunner.java:306)
at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:422)
at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:225)
at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:142)
at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:240)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:140)
at com.google.devtools.build.lib.dynamic.LegacyDynamicSpawnStrategy.runLocally(LegacyDynamicSpawnStrategy.java:368)
at com.google.devtools.build.lib.dynamic.LegacyDynamicSpawnStrategy.access$200(LegacyDynamicSpawnStrategy.java:66)
at com.google.devtools.build.lib.dynamic.LegacyDynamicSpawnStrategy$1.callImpl(LegacyDynamicSpawnStrategy.java:199)
at com.google.devtools.build.lib.dynamic.LegacyDynamicSpawnStrategy$DynamicExecutionCallable.call(LegacyDynamicSpawnStrategy.java:424)
at com.google.devtools.build.lib.dynamic.LegacyDynamicSpawnStrategy$DynamicExecutionCallable.call(LegacyDynamicSpawnStrategy.java:403)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
---8<---8<--- End of exception details ---8<---8<---

@larsrc-google
Contributor

I notice the paths are all in the execroot, not the worker directories, so it might well be a race condition. I was just talking with @tjgq about the materialization logic and how the workers don't really need to re-read the file. Keeping the file content in memory would save some disk reads and hopefully prevent this problem. The expansion (reading files into the request arguments) happens at src/main/java/com/google/devtools/build/lib/worker/WorkerSpawnRunner.java:301; the writing of the files happens at src/main/java/com/google/devtools/build/lib/sandbox/SandboxHelpers.java:477. While the write itself is atomic, it's possible that other workers interfere later.
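
To illustrate the "keep the file content in memory" idea, here is a minimal standalone sketch. It is not Bazel code: the class `InMemoryParams` and its methods `register` and `readLines` are hypothetical names made up for this example, and the assumption is simply that whoever generates a params file can also hand its lines to the reader directly, so the reader no longer depends on the file still existing in the execroot.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these names are Bazel APIs. */
public class InMemoryParams {
    // Content of generated ("virtual") inputs, keyed by exec-root-relative path.
    private final Map<String, List<String>> virtualInputs = new ConcurrentHashMap<>();

    /** Remembers the lines of a generated file so later reads need no disk access. */
    public void register(String relativePath, List<String> lines) {
        virtualInputs.put(relativePath, List.copyOf(lines));
    }

    /** Returns the in-memory lines if registered, otherwise falls back to reading from disk. */
    public List<String> readLines(Path execRoot, String relativePath) throws IOException {
        List<String> cached = virtualInputs.get(relativePath);
        if (cached != null) {
            return cached;
        }
        return Files.readAllLines(execRoot.resolve(relativePath), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path execRoot = Files.createTempDirectory("execroot");
        String name = "libutil.jar-0.params";
        List<String> lines = List.of("--output", "libutil.jar");

        // The file is written to disk and its content is also kept in memory.
        Files.write(execRoot.resolve(name), lines, StandardCharsets.UTF_8);
        InMemoryParams inputs = new InMemoryParams();
        inputs.register(name, lines);

        // Simulate something else removing the file before the reader gets to it.
        Files.delete(execRoot.resolve(name));

        // The registered content is still available.
        System.out.println(inputs.readLines(execRoot, name));

        // An unregistered path still goes to disk and fails the way the traces above do.
        try {
            inputs.readLines(execRoot, "other.jar-0.params");
        } catch (NoSuchFileException e) {
            System.out.println("disk read failed for " + e.getFile());
        }
    }
}
```

The disk fallback is kept only to show the contrast: the unregistered path fails with the same `NoSuchFileException` as in the first trace, while the registered one survives the deletion.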

@larsrc-google
Contributor

If this is still happening, could you give some more details on what the command lines look like for these worker requests? Are there multiple or recursive flagfiles?
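
For anyone unsure what "recursive flagfiles" means here, the following is a rough sketch of that kind of expansion. It is an illustration under assumptions, not the actual `WorkerSpawnRunner` code: the class `FlagfileExpansion` and the method `expand` are hypothetical, and the convention assumed is that an argument starting with a single `@` names a file whose lines are arguments themselves, while `@@` is treated as a literal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of recursive flagfile expansion; not Bazel's implementation. */
public class FlagfileExpansion {

    /** Appends arg to out, replacing "@file" with the (recursively expanded) lines of that file. */
    static void expand(Path execRoot, String arg, List<String> out) throws IOException {
        if (arg.startsWith("@") && !arg.startsWith("@@")) {
            // Every nested flagfile is another disk read, and therefore another
            // point where a missing file surfaces as NoSuchFileException.
            Path flagfile = execRoot.resolve(arg.substring(1));
            for (String line : Files.readAllLines(flagfile, StandardCharsets.UTF_8)) {
                expand(execRoot, line, out);
            }
        } else {
            out.add(arg);
        }
    }

    public static void main(String[] args) throws IOException {
        Path execRoot = Files.createTempDirectory("execroot");
        // outer.params refers to inner.params, which makes the expansion recursive.
        Files.write(execRoot.resolve("inner.params"), List.of("Foo.java", "Bar.java"));
        Files.write(execRoot.resolve("outer.params"), List.of("-d", "@inner.params"));

        List<String> expanded = new ArrayList<>();
        expand(execRoot, "@outer.params", expanded);
        System.out.println(expanded); // prints [-d, Foo.java, Bar.java]
    }
}
```

With multiple or nested flagfiles there are more reads from the execroot per work request, which is why the shape of the command line matters for narrowing down a race like this one.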
