A Python package with container support for Toil pipelines.

Check the example! This package was built to support the 🍪 cookiecutter-toil repository. It has been tested against the following Singularity versions:
- Singularity 2.4.2
- Singularity 2.6.1
## 📦 Easy Installation

```bash
pip install toil_container
```
## 🐳 Container System Calls

`docker_call` and `singularity_call` are functions that run containerized commands with the same calling signature. By default the exit code is returned; to get the `stdout` instead, pass `check_output=True`. You can also set the `env`, `cwd`, `volumes`, and `working_dir` for the container call. `working_dir` is used as the `/tmp` directory inside the container.

```python
from toil_container import docker_call
from toil_container import singularity_call

cmd = ["cowsay", "hello world"]

status = docker_call("docker/whalesay", cmd)
output = docker_call("docker/whalesay", cmd, check_output=True)

status = singularity_call("docker://docker/whalesay", cmd)
output = singularity_call("docker://docker/whalesay", cmd, check_output=True)
```
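The extra keyword arguments work the same way for both calls. A minimal sketch of passing them; the paths, environment variable, and values here are illustrative placeholders, not defaults:

```python
from toil_container import docker_call

# illustrative values only: the paths and environment variable are assumptions
cmd = ["cowsay", "hello world"]
status = docker_call(
    "docker/whalesay",
    cmd,
    env={"GREETING": "hello"},               # environment inside the container
    cwd="/",                                 # working directory for the command
    volumes=[("/shared-fs-path", "/data")],  # (src, dst) tuples to mount
    working_dir="/scratch",                  # used as /tmp inside the container
)
```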
## 🛳 Container Job Class

`ContainerJob` is a Toil Job class with a `call` method that executes commands with either `Docker`, `Singularity`, or `Subprocess`, depending on image availability. Check out the simple whalesay example below! The Job must be constructed with an `options` argument of type `argparse.Namespace` that may have the following attributes:

| attribute             | action                  | description                 |
| --------------------- | ----------------------- | --------------------------- |
| `options.docker`      | use docker              | name or path to image       |
| `options.singularity` | use singularity         | name or path to image       |
| `options.workDir`     | set as container `/tmp` | path to work directory      |
| `options.volumes`     | volumes to be mounted   | list of `(src, dst)` tuples |
## 🔌 Extended LSF Functionality

Running with `--batchSystem custom_lsf` provides two features:

1. Allows passing a per-job `runtime (int)` to LSF using `-W`.
2. Automatically retries a job with double the initial runtime if the job is killed by `TERM_RUNLIMIT`.

Additionally, it provides an optimization that caches the status of running jobs by calling `bjobs` once for all current jobs, instead of once per job.

NOTE: the original `toil.Job` class doesn't provide an option to set `runtime` per job; you could only set a wall runtime globally by adding `-W <runtime>` to `TOIL_LSF_ARGS` (see BD2KGenomics/toil#2065). Please note that our hack encodes the `runtime` requirement in the job's `unitName`, so your log files will have longer names. Let us know if you need more custom parameters or if you know of a better solution 😄.

Configure `custom_lsf` with the following environment variables:

| variable                       | description                                        |
| ------------------------------ | -------------------------------------------------- |
| `TOIL_CONTAINER_RUNTIME`       | set a default runtime in minutes                   |
| `TOIL_CONTAINER_RETRY_MEM`     | retry memory in integer GB (default "60")          |
| `TOIL_CONTAINER_RETRY_RUNTIME` | retry runtime in integer minutes (default "40000") |
| `TOIL_CONTAINER_RUNTIME_FLAG`  | bsub runtime flag (default "-W")                   |
| `TOIL_CONTAINER_LSF_PER_CORE`  | "Y" if LSF resources are per core, not per job     |
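For example, a sketch of running the whalesay pipeline below on an LSF cluster; the runtime and memory values are illustrative:

```bash
# assumptions: an LSF cluster and the whalesay.py example below
export TOIL_CONTAINER_RUNTIME=90      # default runtime of 90 minutes per job
export TOIL_CONTAINER_RETRY_MEM=60    # retry with 60 GB if a job is killed

whalesay.py jobstore --batchSystem custom_lsf --docker docker/whalesay
```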
## 📘 Container Parser With Short Toil Options

`ContainerArgumentParser` adds the `--docker`, `--singularity`, and `--volumes` arguments to the options namespace. This parser only prints the required toil arguments when using `--help`; the full list of toil rocketry is printed with `--help-toil`. If you don't need the container options but want to use `--help-toil`, use `ToilShortArgumentParser`.

```text
whalesay.py --help-container

usage: whalesay [-h] [-v] [--help-toil] [TOIL OPTIONAL ARGS] jobStore

optional arguments:
  -h, --help          show this help message and exit
  --help-toil         show help with toil arguments and exit
  --help-container    show help with container arguments and exit

container arguments:
  --docker            name/path of the docker image available in daemon
  --singularity       name/path of the singularity image available in daemon
  --volumes           tuples of (local path, absolute container path)

toil arguments:
  TOIL OPTIONAL ARGS  see --help-toil for a full list of toil parameters
  jobStore            the location of the job store for the workflow [REQUIRED]
```
`whalesay.py` is an example that runs a toil pipeline with the famous whalesay docker container. The pipeline can be executed with either Docker, Singularity, or Subprocess.
```python
# whalesay.py
from toil_container import ContainerJob
from toil_container import ContainerArgumentParser


class WhaleSayJob(ContainerJob):

    def run(self, fileStore):
        """Run `cowsay` with Docker, Singularity or Subprocess."""
        msg = self.call(["cowsay", self.options.msg], check_output=True)
        fileStore.logToMaster(msg)


def main():
    parser = ContainerArgumentParser()
    parser.add_argument("-m", "--msg", default="Hello from the ocean!")
    options = parser.parse_args()
    job = WhaleSayJob(options=options)
    ContainerJob.Runner.startToil(job, options)


if __name__ == "__main__":
    main()
```
Then run:

```bash
# run with docker
whalesay.py jobstore -m 'hello world' --docker docker/whalesay

# run with singularity
whalesay.py jobstore -m 'hello world' --singularity docker://docker/whalesay

# run with subprocess, if cowsay is available in the environment
whalesay.py jobstore -m 'hello world'
```
If you want to convert a docker image into a singularity image instead of using the `docker://` prefix, check out docker2singularity, and use `-m '/shared-fs-path /shared-fs-path'` to make sure your shared file system is mounted inside the singularity image.
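A sketch of such a conversion; the output directory, mount points, and image name are placeholders, so check the docker2singularity documentation for the exact invocation:

```bash
# placeholders: adjust the output directory, mount points, and image name
docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /shared-fs-path/images:/output \
    --privileged -t --rm \
    quay.io/singularity/docker2singularity \
    -m '/shared-fs-path /shared-fs-path' \
    docker/whalesay
```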
Contributions are welcome and greatly appreciated! Check our contributing guidelines, and make sure you add your name to the contributors list:
- 🐋 Juan S. Medina @jsmedmar
- 🐴 Juan E. Arango @juanesarango
- 🐒 Max F. Levine @mflevine
- 🐼 Joe Zhou @zhouyangyu
- This repo was inspired by toil's implementation of a `Docker Call` and toil_vg's interface for `Singularity Calls`.
- This package was initiated with Cookiecutter and the audreyr/cookiecutter-pypackage project template.