Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding cache test WDL and GitHub Action #66

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

tefirman
Copy link
Collaborator

@tefirman tefirman commented Jan 16, 2025

Description

  • Call-caching is only triggered during multiple calls of the same workflow
  • Adding a new GitHub action to call a very simple workflow twice
  • Custom Cromwell configuration is necessary to enable call-caching
  • Action checks the Cromwell logs of the second run to ensure call-caching was performed
  • Third run uses slightly different inputs to ensure that call-caching is not used in that case
  • Action again checks Cromwell logs to confirm this for the third run.

Related Issues

Testing

  • Executed GitHub Action on this branch multiple times, works as expected.

@tefirman tefirman linked an issue Jan 16, 2025 that may be closed by this pull request
@tefirman
Copy link
Collaborator Author

Added @sitapriyamoorthi for review of the Cromwell/WDL aspects.
Added @sckott for review of the GitHub Action yml.
Tagging @seankross for visibility.

If we feel like this shouldn't be its own GitHub Action, we can definitely explore other options, just let me know.

@tefirman tefirman added unit test Adding or modifying a unit test infrastructure Infrastructure fix to execute WDL GitHub Actions labels Jan 16, 2025
Copy link
Collaborator

@sckott sckott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking great

.github/workflows/test-cromwell-cache.yml Outdated Show resolved Hide resolved
.github/workflows/test-cromwell-cache.yml Outdated Show resolved Hide resolved
cacheTest/README Outdated Show resolved Hide resolved
.github/workflows/test-cromwell-cache.yml Show resolved Hide resolved
Copy link
Collaborator

@sitapriyamoorthi sitapriyamoorthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tefirman sorry looking for some clarification here:

  1. Why do we need to modify the cromwell.conf? Can we not articulate this simple by having three WDLs with different options.json run in sequence with different caching configurations?
  2. Wouldnt modifying the cromwell.conf affect the other unit tests?

Sorry just trying to gain some clarity as to why we need to reconfigure cromwell.conf file?

@tefirman
Copy link
Collaborator Author

tefirman commented Jan 27, 2025

@sitapriyamoorthi -- No worries, the Cromwell setup is super confusing honestly.

  1. The call-caching capability first needs to be enabled in your underlying config, and only then can it actually be triggered via the options json. Call-caching is disabled by default, so without providing that cromwell.conf file, the caching arguments in the options json get ignored and caching is never invoked. See Cromwell docs here.
  2. The way the GitHub Actions are set up, cromwell.conf is only utilized for this caching unit test, so all the other unit tests will remain unchanged.

Copy link
Collaborator

@sitapriyamoorthi sitapriyamoorthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions

  1. Write two separate unit tests:
    • One for validating that outputs are properly written to the cache.
    • Another for ensuring the workflow reads outputs from the cache when appropriate.
  2. Optionally, include a third combined test for end-to-end behavior to validate the full caching cycle.

Why Separate Tests?

  1. Clarity: By isolating the read and write behaviors, you can clearly identify which part of the caching functionality is working or failing.

    • A failure in a combined test could make it harder to pinpoint whether the issue lies in writing to the cache, reading from it, or both. Especially knowing that the cluster permission issues can also affect this.
  2. Modular Testing: Unit tests should ideally focus on a single functionality or behavior.

    • A separate test for writing ensures that the cache is being populated correctly.
    • A separate test for reading ensures that previously cached results are being accessed as expected.
  3. Debugging: If something goes wrong, separate tests make it easier to debug. For example:

    • If the write-to-cache test fails, you know there's an issue with how the cache is being created or populated.
    • If the read-from-cache test fails, it’s likely an issue with cache retrieval or matching logic.
  4. Future Scalability: Separate tests make it easier to expand coverage for more complex caching scenarios in the future (e.g., testing caching with different runtime conditions or inputs).

@tefirman
Copy link
Collaborator Author

I definitely agree with splitting these unit tests in concept. I will say that in practice, that becomes kinda difficult due to fresh environments for each GitHub Action, i.e. I can't read from cache without writing from the cache first. It's possible in theory though, let me give it a shot. I'm thinking the general approach will be:

  • Write both jobs in the same yml
  • Have the read-from-cache check rely on the write-to-cache check
  • Pass the cache metadata from write-to-cache to read-to-cache

Although, now I'm wondering if this would just be easier as one of the python-based "api-tests"... @sckott, penny-for-your-thoughts here?

@sckott
Copy link
Collaborator

sckott commented Jan 28, 2025

I'm wondering if this would just be easier as one of the python-based "api-tests"

happy to think about it with you but don't totally understand - maybe we could chat about it on a call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure Infrastructure fix to execute WDL GitHub Actions unit test Adding or modifying a unit test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching with identical inputs Cache invalidation on environmental changes
3 participants