rapids-dependency-file-generator
is a Python CLI tool that generates conda environment.yaml
files and requirements.txt
files from a single YAML file, typically named dependencies.yaml
.
When installed, it makes the rapids-dependency-file-generator
CLI command available which is responsible for parsing a dependencies.yaml
configuration file and generating the appropriate conda environment.yaml
and requirements.txt
dependency files.
rapids-dependency-file-generator
is available on PyPI. To install, run:
pip install rapids-dependency-file-generator
When rapids-dependency-file-generator
is invoked, it will read a dependencies.yaml
file from the current directory and generate children dependency files.
The dependencies.yaml
file has the following characteristics:
- it is intended to be committed to the root directory of repositories
- it can define matrices that enable the output dependency files to vary according to any arbitrary specification (or combination of specifications), including CUDA version, machine architecture, Python version, etc.
- it contains bifurcated lists of dependencies based on the dependency's purpose (i.e. build, runtime, test, etc.). The bifurcated dependency lists are merged according to the description in the "How Dependency Lists Are Merged" section below.
The dependencies.yaml
file has three relevant top-level keys: files
, channels
, and dependencies
. These keys are described in detail below.
The top-level files
key is responsible for determining the following:
- which types of dependency files should be generated (i.e. conda
environment.yaml
files and/orrequirements.txt
files) - where the generated files should be written to (relative to the
dependencies.yaml
file) - which variant files should be generated (based on the provided matrix)
- which of the dependency lists from the top-level
dependencies
key should be included in the generated files
Here is an example of what the files
key might look like:
files:
all: # used as the prefix for the generated dependency file names for conda or requirements files (has no effect on pyproject.toml files)
output: [conda, requirements] # which dependency file types to generate. required, can be "conda", "requirements", "pyproject", "none" or a list of non-"none" values
conda_dir: conda/environments # where to put conda environment.yaml files. optional, defaults to "conda/environments"
requirements_dir: python/cudf # where to put requirements.txt files. optional, but recommended. defaults to "python"
pyproject_dir: python/cudf # where to put pyproject.toml files. optional, but recommended. defaults to "python"
matrix: # (optional) contains an arbitrary set of key/value pairs to determine which dependency files that should be generated. These values are included in the output filename.
cuda: ["11.5", "11.6"] # which CUDA version variant files to generate.
arch: [x86_64] # which architecture version variant files to generate. This value should be the result of running the `arch` command on a given machine.
includes: # a list of keys from the `dependencies` section which should be included in the generated files
- build
- test
- runtime
build: # multiple `files` children keys can be specified
output: requirements
conda_dir: conda/environments
requirements_dir: python/cudf
matrix:
cuda: ["11.5"]
arch: [x86_64]
py: ["3.8"]
includes:
- build
The result of the above configuration is that the following dependency files would be generated:
conda/environments/all_cuda-115_arch-x86_64.yaml
conda/environments/all_cuda-116_arch-x86_64.yaml
python/cudf/requirements_all_cuda-115_arch-x86_64.txt
python/cudf/requirements_all_cuda-116_arch-x86_64.txt
python/cudf/requirements_build_cuda-115_arch-x86_64_py-38.txt
The all*.yaml
and requirements_all*.txt
files would include the contents of the build
, test
, and runtime
dependency lists from the top-level dependency
key. The requirements_build*.txt
file would only include the contents of the build
dependency list from the top-level dependency
key.
The value of output
can also be none
as shown below.
files:
test:
output: none
includes:
- test
When output: none
is used, the conda_dir
, requirements_dir
and matrix
keys can be omitted. The use case for output: none
is described in the Additional CLI Notes section below.
A given file may include an extras
entry that may be used to provide inputs specific to a particular file type
Here is an example:
files:
build:
output: pyproject
includes: # a list of keys from the `dependencies` section which should be included in the generated files
- build
extras:
table: table_name
key: key_name
Currently the supported extras by file type are:
- pyproject.toml
- table: The table in pyproject.toml where the dependencies should be written. Acceptable values are "build-system", "project", and "project.optional-dependencies".
- key: The key corresponding to the dependency list in
table
. This may only be provided for the "project.optional-dependencies" table since the key name is fixed for "build-system" ("requires") and "project" ("dependencies"). Note that this implicitly prohibits including optional dependencies via an inline table under the "project" table.
The top-level channels
key specifies the channels that should be included in any generated conda environment.yaml
files.
It might look like this:
channels:
- rapidsai
- conda-forge
In the absence of a channels
key, some sensible defaults for RAPIDS will be used (see constants.py).
The top-level dependencies
key is where the bifurcated dependency lists should be specified.
Underneath the dependencies
key are sets of key-value pairs. For each pair, the key can be arbitrarily named, but should match an item from the includes
list of any files
entry.
The value of each key-value pair can have the following children keys:
common
- contains dependency lists that are the same across all matrix variationsspecific
- contains dependency lists that are specific to a particular matrix combination
The values of each of these keys are described in detail below.
The common
key contains a list of objects with the following keys:
output_types
- a list of output types (e.g. "conda" forenvironment.yaml
files or "requirements" forrequirements.txt
files) for the packages in thepackages
keypackages
- a list of packages to be included in the generated output file
The specific
key contains a list of objects with the following keys:
output_types
- same asoutput_types
for thecommon
key abovematrices
- a list of objects (described below) which define packages that are specific to a particular matrix combination
Each list item under the matrices
key contains a matrix
key and a packages
key.
The matrix
key is used to define which matrix combinations from files.[*].matrix
will use the associated packages.
The packages
key is a list of packages to be included in the generated output file for a matching matrix.
This is elaborated on in How Dependency Lists Are Merged.
An example of the above structure is exemplified below:
dependencies:
build: # dependency list name
common: # dependencies common among all matrix variations
- output_types: [conda, requirements] # the output types this list item should apply to
packages:
- common_build_dep
- output_types: conda
packages:
- cupy
- pip: # supports `pip` key for conda environment.yaml files
- some_random_dep
specific: # dependencies specific to a particular matrix combination
- output_types: conda # dependencies specific to conda environment.yaml files
matrices:
- matrix:
cuda: "11.5"
packages:
- cudatoolkit=11.5
- matrix:
cuda: "11.6"
packages:
- cudatoolkit=11.6
- matrix: # an empty matrix entry serves as a fallback if there are no other matrix matches
packages:
- cudatoolkit
- output_types: [conda, requirements]
matrices:
- matrix: # dependencies specific to x86_64 and 11.5
cuda: "11.5"
arch: x86_64
packages:
- a_random_x86_115_specific_dep
- matrix: # an empty matrix/package entry to prevent error from being thrown for non 11.5 and x86_64 matches
packages:
- output_types: requirements # dependencies specific to requirements.txt files
matrices:
- matrix:
cuda: "11.5"
packages:
- another_random_dep=11.5.0
- matrix:
cuda: "11.6"
packages:
- another_random_dep=11.6.0
test:
common:
- output_types: [conda, requirements]
packages:
- pytest
The information from the top-level files
and dependencies
keys are used to determine which dependencies should be included in the final output of the generated dependency files.
Consider the following top-level files
key configuration:
files:
all:
output: conda
conda_dir: conda/environments
requirements_dir: python/cudf
matrix:
cuda: ["11.5", "11.6"]
arch: [x86_64]
includes:
- build
- test
In this example, rapids-dependency-file-generator
will generate two conda environment files: conda/environments/all_cuda-115_arch-x86_64.yaml
and conda/environments/all_cuda-116_arch-x86_64.yaml
.
Since the output
value is conda
, rapids-dependency-file-generator
will iterate through any dependencies.build.common
and dependencies.test.common
list entries and use the packages
of any entry whose output_types
key is conda
or [conda, ...]
.
Further, for the 11.5
and x86_64
matrix combination, any build.specific
and test.specific
list items whose output includes conda
and whose matrices
list items matches any of the definitions below would also be merged:
specific:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
packages:
- some_dep1
- some_dep2
# or
specific:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
arch: "x86_64"
packages:
- some_dep1
- some_dep2
# or
specific:
- output_types: conda
matrices:
- matrix:
arch: "x86_64"
packages:
- some_dep1
- some_dep2
Every matrices
list must have a match for a given input matrix (only the first matching matrix in the list of matrices
will be used).
If no matches are found for a particular matrix combination, an error will be thrown.
In instances where an error should not be thrown, an empty matrix
and packages
list item can be used:
- output_types: conda
matrices:
- matrix:
cuda: "11.5"
arch: x86_64
py: "3.8"
packages:
- a_very_specific_115_x86_38_dep
- matrix: # an empty matrix entry serves as a fallback if there are no other matrix matches
packages:
Merged dependency lists are sorted and deduped.
Invoking rapids-dependency-file-generator
without any arguments is meant to be the default behavior for RAPIDS developers. It will generate all of the necessary dependency files as specified in the top-level files
configuration.
However, there are CLI arguments that can augment the files
configuration values before the files are generated.
Consider the example when output: none
is used:
files:
test:
output: none
includes:
- test
The test
file generated by the configuration above is useful for CI, but it might not make sense to necessarily commit those files to a repository. In such a scenario, the following CLI arguments can be used:
ENV_NAME="cudf_test"
rapids-dependency-file-generator \
--file-key "test" \
--output "conda" \
--matrix "cuda=12.5;arch=$(arch)" > env.yaml
mamba env create --file env.yaml
mamba activate "$ENV_NAME"
# install cudf packages built in CI and test them in newly created environment...
The --file-key
argument is passed the test
key name from the files
configuration. Additional flags are used to generate a single dependency file. When the CLI is used in this fashion, it will print to stdout
instead of writing the resulting contents to the filesystem.
The --file-key
, --output
, and --matrix
flags must be used together. --matrix
may be an empty string if the file that should be generated does not depend on any specific matrix variations.
Where multiple values for the same key are passed to --matrix
, e.g. cuda_suffixed=true;cuda_suffixed=false
, only the last value will be used.
Where --file-key
is supplied multiple times in the same invocation, the output printed to stdout
will contain a union (without duplicates) of all of the corresponding dependencies. For example:
rapids-dependency-file-generator \
--file-key "test" \
--file-key "test_notebooks" \
--output "conda" \
--matrix "cuda=12.5;arch=$(arch)" > env.yaml
The --prepend-channel
argument accepts additional channels to use, like rapids-dependency-file-generator --prepend-channel my_channel --prepend-channel my_other_channel
.
If both --output
and --prepend-channel
are provided, the output format must be conda.
Prepending channels can be useful for adding local channels with packages to be tested in CI workflows.
Running rapids-dependency-file-generator -h
will show the most up-to-date CLI arguments.