Skip to content

Conversation

criticic
Copy link

@criticic criticic commented Jul 31, 2025

Closes #6528

cc @tdhock @Anirban166

This PR updates the Autocomment-atime-results action to version v1.5.0, which brings two key improvements:

  1. Secure support for PRs from forks
  2. Improved build times via caching of built packages

Due to GitHub's cache access restrictions, caches created in PRs are not accessible across workflows or branches. To work around this, the cache needs to be populated from the base branch (e.g., on a push or via workflow_dispatch, as currently implemented), so that PRs can reuse it.

The README has also been updated with more information about how caching works:
Autocomment-atime-results README.md

For further context, see the PR which added this change: Anirban166/Autocomment-atime-results#44

Some Stats (Before vs Now):

  • First run: 17 minutes -> 10 minutes
  • Subsequent runs: 6 minutes -> 3 minutes
    Note: Version v1.4.3 already introduced caching via r-lib/actions/setup-r-dependencies@v2, which implicitly cached-R packages by default. With v1.5.0, we now additionally cache libgit, resulting in further reductions in run time.

The first three runs here would be relevant: https://github.com/criticic/data.table-test-work-workflow/actions

@criticic criticic requested a review from MichaelChirico as a code owner July 31, 2025 06:53
Copy link

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.77%. Comparing base (d875823) to head (bd196db).
⚠️ Report is 49 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7230   +/-   ##
=======================================
  Coverage   98.77%   98.77%           
=======================================
  Files          81       81           
  Lines       15241    15241           
=======================================
  Hits        15055    15055           
  Misses        186      186           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tdhock
Copy link
Member

tdhock commented Jul 31, 2025

Thanks for your PR!
The code changes seem reasonable, but I inspected the results, and I think there is an issue.
criticic/data.table-test-work-workflow#26 has the following result figure currently:
68747470733a2f2f61737365742e636d6c2e6465762f653265366437366462633965366364306164313164373565623666383533323462386362383665323f636d6c3d706e672663616368652d6279706173733d63653933373864662d646365352d343130612d386
it shows 4 tests where HEAD=Fast/Fixed which is slower than CRAN. That is not normal.

For example on this recent PR #7232 we see the result below
68747470733a2f2f61737365742e636d6c2e6465762f663165626162393536316430393264623039336662393962323336633635353065336635316332333f636d6c3d706e672663616368652d6279706173733d34636532346335312d386330662d343431382d613
where CRAN is just as fast as HEAD/Fast/Fixed.

These data suggest that there may be some issue with the caching code, or at least with the testing repo https://github.com/criticic/data.table-test-work-workflow (which I expect is giving a result that is similar to what we should expect in the main data.table repo after merging this PR, is that right?)

Can you please investigate and fix/explain?

@criticic
Copy link
Author

I'll rerun the tests and try to figure out what's causing this.

@criticic
Copy link
Author

criticic commented Aug 7, 2025

Update, the changes seem to be root of the problem, atleast on a repeated run before and after, but I cannot pinpoint the issue, neither am i able replicate this locally, i'll report back if i have any progress

@criticic criticic marked this pull request as draft August 18, 2025 08:08
@criticic
Copy link
Author

@Anirban166 @tdhock sorta stuck here, not sure how to proceed, what should i do?

@tdhock
Copy link
Member

tdhock commented Aug 19, 2025

can you please explain what you have done to investigate?

@tdhock
Copy link
Member

tdhock commented Aug 27, 2025

the installing different package versions seems to be going a lot faster, even though this PR was never merged. Why?

One run is https://github.com/Rdatatable/data.table/actions/runs/17267722462

which gave this result (20 seconds)
image

@criticic
Copy link
Author

criticic commented Aug 27, 2025

it is because 1.4.3 introduced the r-lib/actions/setup-r-dependencies@v2, which caches the dependencies by default, so only the libgit2 build time is the major blocker there

see https://github.com/Rdatatable/data.table/actions/caches

@tdhock
Copy link
Member

tdhock commented Aug 27, 2025

ok then if caching is already working, then what is the benefit of this pr?

@criticic
Copy link
Author

libgit is not being cached here, and this PR also supports PR from forks.

@tdhock
Copy link
Member

tdhock commented Aug 27, 2025

ok that makes sense, thanks!
did you ever figure out why CRAN version is slow in atime result figures using this PR?

@criticic
Copy link
Author

criticic commented Aug 27, 2025

i did not yet, one thing i was trying was to just compare the cached files between runs see what changed, and another would prolly be to test part by part. I will look into it, this weekend.

@tdhock
Copy link
Member

tdhock commented Aug 28, 2025

For the current action (1.4.3), I have observed that the first run in a new branch/pr is slow (~10 minutes for the installing different package versions) whereas the next runs are fast (~20 seconds). does that make sense to you?
image

The docs say "To get the most benefit, you can "warm up" the cache by running the workflow on your main branch. This is because PRs cannot share caches with each other, but they can access caches created from the base branch (e.g., main)."
So does that mean the action has not been run on data.table master?
Looking at https://github.com/Rdatatable/data.table/actions/workflows/performance-tests.yml I do not see any runs on master branch?

@criticic
Copy link
Author

yup that's just because currently all the caches are scoped to that specific PR, so it helps on subsequent runs on the same PR. if it was run on master, then all PRs could use this cache, rather than each having to regenerate it

@jangorecki
Copy link
Member

Regarding supporting PRs from forks. As long as it runs on github CI runners then it's ok.
In case we want to migrate atime performance tests away from github to any kind of internal CI runners, we should prohibit that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

keeping built/installed historical versions between atime CI runs
4 participants