[question] Best practices for handling test data in a CI pipeline #17133

Open
1 task done
AndreasAckermannTSystems opened this issue Oct 8, 2024 · 5 comments
Labels
responded Responded by Conan team type: question

Comments

@AndreasAckermannTSystems

What is your question?

Hi everyone,

I'm working on establishing Conan packages and a CI pipeline for a product line of applications built from a shared set of modules, each contained in its own repository. These modules (e.g. modA and modB) all access a common database via classes contained in a module called orm.

The orm repository also contains database dumps for a test database, an import script, and a database configuration file used by our applications' unit tests. The tests of modA and modB expect an initialized test database, and expect this configuration file in a well-known location relative to their test binaries.

The CI pipeline runs in ephemeral Docker containers for each repository, and as such, the test database needs to be recreated by importing the dumps on each run.

My current intended approach is the following:

  • In orm, package the test database dumps, config file and an importer script into the orm package
    • Potential issue: Wasted space due to increased package sizes, as the test database changes rarely
  • In modA, initialize the test database during the conanfile.py's build method, if a CI=1 environment variable is detected, by copying out the config file from orm and executing the data import script contained there as well
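The second step could be sketched as a small standard-library helper that a recipe's build method might call. This is only an illustration under stated assumptions: the `testdata/` layout, the file names `db_config.ini` and `import_dumps.sh`, and the `--config` flag are hypothetical and not taken from the actual orm package. In a Conan 2 recipe, the orm package directory would typically come from `self.dependencies["orm"].package_folder`.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_test_database(orm_package_dir, test_bin_dir, run=subprocess.run):
    """Copy the DB config next to the test binaries and run the import script.

    Returns the path of the copied configuration file.
    """
    orm_package_dir = Path(orm_package_dir)
    test_bin_dir = Path(test_bin_dir)

    # Hypothetical layout inside the orm package (assumption, see lead-in).
    config_src = orm_package_dir / "testdata" / "db_config.ini"
    importer = orm_package_dir / "testdata" / "import_dumps.sh"

    # The tests expect the config in a well-known location relative to
    # their binaries, so place it directly in the test binary directory.
    test_bin_dir.mkdir(parents=True, exist_ok=True)
    config_dst = test_bin_dir / config_src.name
    shutil.copy2(config_src, config_dst)

    # check=True makes a failed import abort the build instead of letting
    # the tests run against a half-initialized database.
    # The "--config" flag is a made-up interface for the import script.
    run([str(importer), "--config", str(config_dst)], check=True)
    return config_dst
```

Passing `run` as a parameter keeps the helper easy to exercise locally without a database, by substituting a function that only records the command.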

Are there best practices / better ways to handle test data with Conan in a CI pipeline setting?

Have you read the CONTRIBUTING guide?

  • I've read the CONTRIBUTING guide
@memsharded memsharded self-assigned this Oct 8, 2024
@memsharded
Member

Hi @AndreasAckermannTSystems

Thanks for your question

As a general guideline for CI at scale, the ongoing work in conan-io/docs#3799 might be useful. Hopefully it can be published soon, but in the meantime you might be able to generate the docs locally. It does not address your question directly, but it might help with the general issues of defining a CI pipeline.

Regarding your question: you could indeed put more artifacts inside the orm package, but as you pointed out, the size of the dump and the other files might be relevant, especially if you often use the orm library artifacts without those test artifacts.

If, on balance, this turns out to be a real problem, there are some alternatives to consider, like storing the test artifacts in a separate package that can be used as test_requires, or maybe using the "package metadata files" feature. But I think I'd probably start by putting things in the orm package and learn from there (unless you tell me the DB test dump would be GBs in size).

> if a CI=1 environment variable is detected

In general, it is better for Conan to model things more explicitly, for example with the Conan conf mechanism. The idea is that things can be easily reproduced locally, and developers can run the tests on their machines just with conan install ... -c user.myorg:build_tests=True or something like that. It also makes it easy to run fast jobs in CI that skip those heavy tests but might run other tests. Note there are also some built-in confs like tools.build:skip_test that could already be used in recipes.
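As a rough sketch of the conf-based approach, the decision could live in a small helper that a recipe calls with `self.conf`. The conf name `user.myorg:build_tests` is the example name from above, not an established convention. `DictConf` is a simplified stand-in written for this sketch so the helper can be tried outside Conan; the real `self.conf` also converts values given on the command line, which the stand-in does not.

```python
def should_import_test_db(conf):
    """Decide from explicit confs whether to import the test database.

    `conf` is expected to behave like a recipe's `self.conf`, i.e. to
    offer get(name, default=..., check_type=...).
    """
    # Built-in conf: when tests are skipped there is nothing to prepare.
    if conf.get("tools.build:skip_test", default=False, check_type=bool):
        return False
    # User-defined conf (example name), off unless explicitly requested.
    return bool(conf.get("user.myorg:build_tests", default=False, check_type=bool))


class DictConf:
    """Simplified stand-in for self.conf (assumption, for local experiments)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name, default=None, check_type=None):
        value = self._values.get(name, default)
        if (
            check_type is not None
            and value is not None
            and not isinstance(value, check_type)
        ):
            raise TypeError(f"{name} must be of type {check_type.__name__}")
        return value


# Inside a recipe's build() this might be used as:
#     if should_import_test_db(self.conf):
#         ...import the dumps, then build and run the tests...
```

With this shape, a developer and a CI job take the same code path and differ only in the confs they pass, rather than in an environment variable the recipe inspects.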

So having a bit more info about the test artifact sizes and the patterns/frequency of usage could help decide in one direction or another.

@memsharded
Member

Another important aspect to take into account would be the build time of orm.
If it is fast enough, then it wouldn't be a concern to just rebuild things to create a separate orm_data package, apart from the orm one containing the actual libraries. That orm_data package could then be used, for example, as test_requires.
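On the consumer side, such a split might look roughly like the recipe below. It is a minimal sketch, not a complete recipe: the package name `moda` and the references `orm/1.0` and `orm_data/1.0` are placeholders, and the `try`/`except` around the import exists only so the sketch can be loaded without Conan installed (a real recipe would import `ConanFile` directly).

```python
try:
    from conan import ConanFile
except ImportError:
    # Fallback for reading/running this sketch without Conan installed.
    ConanFile = object


class ModAConan(ConanFile):
    name = "moda"        # placeholder name
    version = "1.0"      # placeholder version
    settings = "os", "arch", "compiler", "build_type"

    def requirements(self):
        # Regular dependency: the orm libraries that modA links against.
        self.requires("orm/1.0")

    def build_requirements(self):
        # Test-only dependency: dumps, config file and import script.
        # It is needed to build and test modA, not by modA's consumers.
        self.test_requires("orm_data/1.0")
```

The point of the design is that consumers of orm that never run the database tests do not have to download the dumps.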

@memsharded
Member

Hi @AndreasAckermannTSystems

Any further question or clarification? Thanks for the feedback!

@memsharded memsharded added type: question responded Responded by Conan team labels Oct 18, 2024
@AndreasAckermannTSystems
Author

Hi @memsharded,

thanks so much for your advice! Unfortunately other tasks came up in the meantime which is why I could not yet fully implement the suggestions, but I'm happy to share an update :)

I went with including the database dumps in the orm package and adopted a user.myorg.myproject:import_test_database_dumps config variable. I then created a new profile with -ci suffix where this is set and use it during the conan create call in the CI pipeline. This works well for executing the tests during the orm build.

I unfortunately did not yet get to setting up the build for the downstream repositories for modA and modB. Once that's done, I'll provide another update.

Thanks again :)

@memsharded
Member

Thanks very much for the feedback @AndreasAckermannTSystems !

> This works well for executing the tests during the orm build.

Sounds good, happy that it is working.

If you don't need more advice or feedback from us at the moment, it is fine to close the ticket. When you have further questions or feedback, you can re-open it or create a new ticket (depending on the case, as you wish).

2 participants