
CI tests take 20 mins #231

Open
peterdudfield opened this issue Dec 10, 2024 · 8 comments · May be fixed by #233
Labels
good first issue Good for newcomers

Comments

@peterdudfield
Contributor

Is there a way to speed up the CI tests?
Locally they run a lot quicker for me.
We should investigate what is taking so long in CI.
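One quick way to see where the time goes is pytest's built-in duration report. A sketch below (the demo test file is made up purely to show the output; in CI the flag can simply be added to the existing pytest invocation):

```shell
# Write a throwaway test file with one deliberately slow test, then ask pytest
# to report the slowest test phases. --durations=5 prints the five slowest.
cat > test_slow_demo.py <<'EOF'
import time

def test_fast():
    assert True

def test_slow():
    time.sleep(1.5)  # stands in for a slow network call or file load
    assert True
EOF
python -m pytest --durations=5 test_slow_demo.py
```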

peterdudfield added the good first issue label Dec 10, 2024
@laurenceandrews

Happy to take a look at this as my first issue here! Let me know your thoughts, and any info I'm missing about the current CI setup, as this is my first look at the code.

A full pytest run takes around 3:45 for me locally. It doesn't seem like I have permission to re-run somebody else's GitHub Actions run to test CI speed, so I'm assuming the easiest way to test it is to make my own branch and open a PR?


CI runners suffer from several handicaps (beyond generally limited resources):

1. Cold environment

  • Dependencies, environments, and temp files are re-installed and rebuilt every run (this is probably the main difference maker if it's not something we're already conscious of in the CI setup).

2. Higher network latency

  • CI runners can experience slower external API calls or downloads compared to a fast, stable local network.

3. Restrictions on parallelisation

  • Tests that run in parallel locally (if they do at all) may not be optimally configured to do so in CI.

4. Slower disk I/O

  • File reading/writing and the creation of large datasets/temp files are often slower on CI runners than on local SSDs.

Below are some areas we could look at to help tackle the above:

1. Cold Environment

  • Cache Python dependencies using GitHub Actions cache to avoid reinstalling them every run.
  • Explore caching temporary or intermediate files created during tests.

2. Higher Network Latency

  • Replace API or external service calls with mocks where possible to eliminate network delays.
  • Consider using pre-downloaded static datasets or artifacts in place of live network fetches.

3. Restrictions on Parallelisation

  • Look into pytest-xdist to run tests in parallel within the CI environment.
  • Adjust parallelisation settings in pytest to find the optimal number of workers for CI's limited resources.

4. Slower Disk I/O

  • Reduce reliance on larger temporary datasets or files during tests, or mock their creation.
  • Optimize file operations by using memory-efficient formats or in-memory operations (e.g., StringIO instead of actual files).
  • Profile specific tests that rely heavily on I/O to identify bottlenecks and optimize them.
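To make the first and third remedies concrete, the workflow could look something like the fragment below. This is purely a sketch: I haven't checked the repo's actual workflow file, Python version, or install method, so every value here is an assumption.

```yaml
# Hypothetical workflow fragment: cache pip downloads and run tests in parallel.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"          # reuses pip's wheel cache between runs
      - run: pip install -e . pytest pytest-xdist
      - run: pytest -n auto     # pytest-xdist: one worker per available core
```

The `cache: "pip"` input on `actions/setup-python` handles the GitHub Actions cache keying automatically, which is simpler than wiring up `actions/cache` by hand.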

@peterdudfield
Contributor Author

Any help would be great

@peterdudfield
Contributor Author

Yeah, if you fork and run GitHub Actions yourself, that's probably the best way to do it.

@laurenceandrews

Sorry, going very slowly on this with Christmas activities going on.

Upon cloning I immediately ran into some failing tests, so I need to sort those out before I can analyse the speed issue.

Any idea if the failing tests are a common/known issue before I look deeper into them?

@peterdudfield
Contributor Author

Which tests are failing? You might need to log in to Hugging Face, but I'm not totally sure.

@laurenceandrews

Thanks, here's what I've done to consistently get to this impasse:

  • Cloned from scratch
  • Created a venv using Python 3.11.0
  • Ran `conda install -c conda-forge pyresample` and `pip install quartz-solar-forecast`
  • Ran `pip install pytest fastapi huggingface_hub`
  • Ran `huggingface-cli login` and entered my write-access token

I got the following failures, which all seem environment-related. Let me know if there are any specific setup steps you can think of that I might have missed.

test_run_forecast (FileNotFoundError):
Problem: The test tries to load model-0.3.0.pkl, but that file is missing from open-source-quartz-solar-forecast/venv/Lib/site-packages/quartz_solar_forecast/models/. I only see model-0.4.0.pkl in that folder within the venv, although both model-0.3.0.pkl and model-0.4.0.pkl are present in root > quartz_solar_forecast > models. Is there anything I should know about setting up a venv correctly with this project?
Quick fix: Manually copying model-0.3.0.pkl into the venv folder works, but I'm not happy with that solution!

test_run_eval and test_get_pv_metadata (EmptyDataError):
Problem: Both tests fail because no columns are found in the metadata.csv file. Again, I'm slightly suspicious that pv.py, which contains get_pv_metadata, sits within the venv, whilst the generated metadata.csv sits in the project's root > data > pv directory.
Potential fix: Venv sync/setup fix or manual download required?

test_get_pv (OSError):
Problem: Similar to the above, the test attempts to load a NetCDF file (data/pv/pv.netcdf) which exists but is empty.
Potential fix: Venv sync/setup fix or manual download required?

test_get_file_path (ValueError):
Problem: The format string used for generating file paths is invalid because it is platform-specific.
Proper fix: I've updated test_file_path.py to use an f-string for platform-independent formatting.

@aryanbhosale
Member

Is there a way to speed up CI tests? Locally they are a lot quicker for me? Investigate what is taking a long time in CI

[image: screenshot of the repository's GitHub Actions workflows]
@peterdudfield which one of these?

@peterdudfield
Copy link
Contributor Author

The pytest.yaml workflow is the slow one in CI. For me the tests are also quick locally.

@aryanbhosale aryanbhosale linked a pull request Jan 3, 2025 that will close this issue