Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The runDatagen task can never be cached by gradle, as it depends on its own outputs #1120

Open
solonovamax opened this issue May 21, 2024 · 4 comments

Comments

@solonovamax
Copy link
Contributor

The run datagen task can never be cached by gradle, as its inputs will always depend on its outputs.

Reproducing:

  1. Create simple fabric mod with data generation enabled
  2. Add the following code (kotlin; too lazy to port to groovy. the steps to reproduce still match up without this code, this code just helps understand it)
    tasks {
        val runDatagen by named("runDatagen") {
            doFirst {
                println(inputs.files.files)
            }
        }
    
        processResources {
            doFirst {
                println(inputs.files.files)
                println(outputs.files.files)
            }
        }
    }
  3. Run the runDatagen task several times without changing anything.

Expected:

All runs after the first are cached.

Actual:

None of the runs are cached.


Using the print statements, you can see that the runDatagen task depends on its own outputs in the following manner:

  • processResources -> takes /src/main/generated/ as an input
  • processResources -> outputs to /build/resources/main/
  • runDatagen -> takes /build/resources/main/ as an input
  • runDatagen -> outputs to /src/main/generated/

Since processResources is never cached and always re-run, this causes runDatagen to always be re-run.
This is quite annoying for people who don't want to check src/main/generated/* into VCS, and instead have their jar task depend on runDatagen (to ensure the generated resources always remain up to date), as it will cause runDatagen to constantly be re-run, even when no changes have been made.

@modmuss50
Copy link
Member

Do you have an idea on how to fix this? I dont think there is a solution as unfortunately there is a funimental circular dependency issue with datagen.

Datagen requires the mod to be built to run, but the mod requires datagen to be ran to build. By default we suggest mod devs manually run datagen to work around this.

@solonovamax
Copy link
Contributor Author

I was going to suggest manually adding the resources to the classpath of the required Jar and RunGameTask tasks, however I realized that you're setting the resources via SourceSet#resources, so this isn't particularly the best idea, for as long as the intent is to have the code for generating the resources be in the same source set as the code for the mod.

In the past, I believe I've seen similar things where a new source set is created, named smth like datagen, so new directories src/datagen/resources/, src/datagen/java/, src/datagen/kotlin/, etc. are introduced.

Looking at how ksp deals with smth similar, it seems that what they do is they filter the inputs to exclude any of the output directories.
I would recommend probably doing something similar.

although, do keep in mind that technically this can lead to invalid results, if, for whatever reason, the classes that do the data generation depend on some of the files they generate. (eg. attempt to load them and then use that to determine what is generated), then this can lead to incorrectly marking it as not needing to be re-run, when it does actually need to be re-run.

@Ampflower
Copy link

Ampflower commented Jun 25, 2024

I have noticed that inputs also depends on arguments containing a practically guaranteed to be random field, /tmp/loom-classpath17467358589055212817.args. I believe this also causes caching to be practically invalid as the hash will never match, even if you otherwise attempt to remove the reciprocal dependency. It'd be ideal if at all possible to make the classpath's filename a stable hash dependent on inputs and just hold a lock from deletion in the case of competing Looms.

This is shown by --debug as

[INFO] [org.gradle.internal.execution.steps.SkipUpToDateStep] Task ':xplat:runDatagen' is not up-to-date because:
  Value of input property 'jvmArgs' has changed for task ':xplat:runDatagen'

Disclaimer; I have found this through archloom, although if it is present in upstream (here), it would be contributing to the cache invalidation, which it seems to be here:

try {
final Path argsFile = Files.createTempFile("loom-classpath", ".args");
Files.writeString(argsFile, content, StandardCharsets.UTF_8);
args.add("@" + argsFile.toAbsolutePath());
} catch (IOException e) {
throw new UncheckedIOException("Failed to create classpath file", e);
}

@modmuss50
Copy link
Member

I have noticed that inputs also depends on arguments containing a practically guaranteed to be random field, /tmp/loom-classpath17467358589055212817.args. I believe this also causes caching to be practically invalid as the hash will never match, even if you otherwise attempt to remove the reciprocal dependency. It'd be ideal if at all possible to make the classpath's filename a stable hash dependent on inputs and just hold a lock from deletion in the case of competing Looms.

This is shown by --debug as

[INFO] [org.gradle.internal.execution.steps.SkipUpToDateStep] Task ':xplat:runDatagen' is not up-to-date because:
  Value of input property 'jvmArgs' has changed for task ':xplat:runDatagen'

Disclaimer; I have found this through archloom, although if it is present in upstream (here), it would be contributing to the cache invalidation, which it seems to be here:

try {
final Path argsFile = Files.createTempFile("loom-classpath", ".args");
Files.writeString(argsFile, content, StandardCharsets.UTF_8);
args.add("@" + argsFile.toAbsolutePath());
} catch (IOException e) {
throw new UncheckedIOException("Failed to create classpath file", e);
}

Oh wow, nice job spotting that. That can indeed be fixed! Would you mind opening an issue on the loom repo? One thing to keep in mind is we wouldn't want to start skipping the normal run game tasks if their inputs havent changed, only the data generation tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants