[question] Best practices for handling test data in a CI pipeline #17133

AndreasAckermannTSystems · 2024-10-08T14:27:07Z

What is your question?

Hi everyone,

I'm working on establishing Conan packages and a CI pipeline for a product line of applications built from a shared set of modules, each contained in its own repository. These modules (e.g. modA and modB) all access a common database via classes contained in a module called orm.

The orm repository also contains database dumps for a test database, an import script, and a database configuration file used by our applications' unit tests. The tests of modA and modB expect an initialized test database, and they expect this configuration file in a well-known location relative to their test binaries.

The CI pipeline runs in ephemeral Docker containers for each repository, and as such, the test database needs to be recreated by importing the dumps on each run.

My current intended approach is the following:

- In orm, package the test database dumps, config file and an importer script into the orm package
  - Potential issue: Wasted space due to increased package sizes, as the test database changes rarely
- In modA, initialize the test database during the conanfile.py's build method, if a CI=1 environment variable is detected, by copying out the config file from orm and executing the data import script contained there as well

Are there best practices / better ways to handle test data with Conan in a CI pipeline setting?

Have you read the CONTRIBUTING guide?

I've read the CONTRIBUTING guide

memsharded · 2024-10-08T15:15:15Z

Hi @AndreasAckermannTSystems

Thanks for your question

As a general guideline for CI at scale, the ongoing work in conan-io/docs#3799 might be useful. Hopefully it can be published soon, but in the meantime you might be able to generate the docs locally. This is not really about your questions, but it might help with the general issues of defining a CI pipeline.

Regarding your question, indeed you could put more artifacts inside the orm package, but as you pointed out, the size of the dump and the other files might be relevant, especially if you use the orm library artifacts very often without those test artifacts.

If it turns out that this is a real problem, then there are some alternatives to consider, like storing the test artifacts in a separate package that can be used as test_requires, or maybe using the "package metadata files" feature. But I think I'd probably start by putting things in the orm package and learn from there (unless you tell me the DB test dump would be like GBs in size).

> if a CI=1 environment variable is detected

In general, it is better in Conan to model things more explicitly, for example with the Conan conf mechanism. The idea is that things can be easily reproduced locally, and developers can run the tests on their machines just with conan install ... -c user.myorg:build_tests=True or something like that. It also means you can easily run fast jobs in CI that skip those heavy tests but still run other tests. Note there are also some built-in confs like tools.build:skip_test that can be used in recipes already.
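To make the conf idea concrete, here is a minimal sketch of how a modA recipe could gate the database-backed tests, assuming Conan 2.x. The recipe name and version, the helper `db_tests_enabled` and the log message are hypothetical; `user.myorg:build_tests` is the example conf name from above, and `tools.build:skip_test` is the built-in conf. The import guard exists only so the sketch can be loaded where Conan is not installed; a real recipe would import `ConanFile` unconditionally.

```python
try:
    from conan import ConanFile  # Conan 2.x
except ImportError:  # keeps this sketch importable without Conan installed
    ConanFile = object


def db_tests_enabled(conf):
    """Return True when the database-backed tests should run.

    `conf` is expected to be the recipe's `self.conf`; any object with a
    compatible `get(name, default=..., check_type=...)` method works too.
    """
    # Built-in conf: a fast CI job can use it to skip every test.
    if conf.get("tools.build:skip_test", default=False, check_type=bool):
        return False
    # User-defined conf, off by default so plain builds stay fast.
    return bool(conf.get("user.myorg:build_tests", default=False, check_type=bool))


class ModAConan(ConanFile):
    name = "moda"      # hypothetical
    version = "0.1"    # hypothetical

    def build(self):
        # ... compile modA here ...
        if db_tests_enabled(self.conf):
            self.output.info("Importing test database dumps and running DB tests")
            # Hypothetical next steps: copy the config file from the orm
            # package, run its import script, then run the test binaries.
```

With something like this, a developer could opt in locally with `conan create . -c user.myorg:build_tests=True`, and the same conf could be set in the `[conf]` section of a CI profile instead of relying on an environment variable.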

So having a bit more info about the test artifacts sizes and patterns/frequency of usage, could help deciding in one direction or another.

memsharded · 2024-10-09T11:18:24Z

Another important aspect to take into account would be the time of building the orm thing.
If it is fast enough, then it wouldn't be a concern to just re-build things to create a separate orm_data package, separate from the orm one containing the actual libraries. That orm_data package could be used for example as test_requires.
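A rough sketch of how a downstream recipe such as modA could consume that split, assuming Conan 2.x and that an `orm_data` package has been created. The versions and the recipe name are hypothetical. As above, the import guard is only there so the sketch loads without Conan installed.

```python
try:
    from conan import ConanFile  # Conan 2.x
except ImportError:  # keeps this sketch importable without Conan installed
    ConanFile = object


class ModAConan(ConanFile):
    name = "moda"      # hypothetical
    version = "0.1"    # hypothetical

    def requirements(self):
        # Regular dependency on the library package (version is hypothetical).
        self.requires("orm/1.0")

    def build_requirements(self):
        # Test-only dependency: the dumps, config file and import script.
        # test_requires are not propagated to the consumers of modA.
        self.test_requires("orm_data/1.0")
```

Consumers that only need the orm libraries would then never download the dumps, while modA and modB would still get them when building and testing.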

memsharded · 2024-10-18T11:35:54Z

Hi @AndreasAckermannTSystems

Any further question or clarification? Thanks for the feedback!

AndreasAckermannTSystems · 2024-10-22T09:44:27Z

Hi @memsharded,

thanks so much for your advice! Unfortunately other tasks came up in the meantime which is why I could not yet fully implement the suggestions, but I'm happy to share an update :)

I went with including the database dumps in the orm package and adopted a user.myorg.myproject:import_test_database_dumps config variable. I then created a new profile with -ci suffix where this is set and use it during the conan create call in the CI pipeline. This works well for executing the tests during the orm build.

I unfortunately did not yet get to setting up the build for the downstream repositories for modA and modB. Once that's done, I'll provide another update.

Thanks again :)

memsharded · 2024-10-22T11:14:57Z

Thanks very much for the feedback @AndreasAckermannTSystems !

> This works well for executing the tests during the orm build.

Sounds good, happy that it is working.

If you don't need more advice or feedback from us at the moment, it is fine to close the ticket, and when you have any further question or feedback, you can re-open or create a new ticket (depending on the case, as you wish).

memsharded self-assigned this Oct 8, 2024

memsharded added type: question responded Responded by Conan team labels Oct 18, 2024
