Use datalad containers with sandbox #157
Comments
I have never encountered such a need. Could you please elaborate a bit: what commands do you run to "unpack"? Any sample public singularity image (if this is image-type specific)?
Hi Yaroslav, of course.
The command I call (from the root of a subject-specific ephemeral clone) is:
The container installed with datalad containers-add is:
The containers-add command is:
Environment variables used here: PROJ_DIR=. (superdataset root).
During test runs on a single subject the container works well once unpacked, but until then ~15 minutes pass. When I run multiple subjects on one node in parallel, converting the containers takes hours, rendering the whole computation effectively sequential. That's why I want to avoid unpacking for every subject run. We are aiming for computationally optimized processing of imaging data on our HPC, using datalad for provenance tracking and collaboration.
My scripts:
pipelines_submission.txt
pipelines_parallelization.txt
pipelines_processing.txt
smriprep.txt
Hope that clarifies a bit what my problem is and what I am trying to achieve. Using datalad for all of these things is a little overwhelming for me at the moment. Thanks a lot in advance.
Regards,
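For illustration, a hypothetical sketch of the kind of setup described above (the layout of ENV_DIR, the image name, and the final command are assumptions, not the poster's actual commands):

# Get a pre-downloaded .sif image from the local environment dataset (hypothetical layout)
datalad clone "$ENV_DIR" envs
datalad get envs/images/smriprep.sif
# Register the local image with the superdataset under the name "smriprep"
datalad containers-add -d "$PROJ_DIR" --url envs/images/smriprep.sif smriprep
# Execute a (placeholder) command through the container with provenance capture
datalad containers-run -n smriprep echo hello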
I see -- so it is the conversion of the container from docker to singularity upon each run. Here is a test on a smaller container to confirm that:
smaug:/tmp
$> datalad create containers-add-docker-test
[INFO ] Creating a new annex repo at /tmp/containers-add-docker-test
[INFO ] Scanning for unlocked files (this may take some time)
create(ok): /tmp/containers-add-docker-test (dataset)
2 5442.....................................:Thu 29 Jul 2021 09:44:42 AM EDT:.
smaug:/tmp
$> cd containers-add-docker-test
2 5443.....................................:Thu 29 Jul 2021 09:44:43 AM EDT:.
(git)smaug:/tmp/containers-add-docker-test[master]
$> datalad containers-add --url docker://neurodebian:nd100 test-docker
[INFO ] Building Singularity image for docker://neurodebian:nd100 (this may take some time)
INFO: Starting build...
Getting image source signatures
Copying blob 627b765e08d1 skipped: already exists
Copying blob ff66d7acb9e0 skipped: already exists
Copying blob 4ac627f2d764 skipped: already exists
Copying blob b33c3e9e07dc skipped: already exists
Copying blob 2c9c4b1dfc17 skipped: already exists
Copying config 6a5f86f6be done
Writing manifest to image destination
Storing signatures
2021/07/29 09:44:57 info unpack layer: sha256:627b765e08d177e63c9a202ca4991b711448905b934435c70b7cbd7d4a9c7959
2021/07/29 09:44:59 info unpack layer: sha256:ff66d7acb9e05c47e0621027afb45a8dfa4665301a45f8f794a16bd8c8ae8205
2021/07/29 09:44:59 info unpack layer: sha256:4ac627f2d764f56d7380099754dd943f54a31247e9400d632a215b8bc4ec5fa2
2021/07/29 09:44:59 info unpack layer: sha256:b33c3e9e07dc85213c696c81aff06d286ba379571b799845954b6cc776457e5f
2021/07/29 09:44:59 info unpack layer: sha256:2c9c4b1dfc179a7cb39a5bcbff0f4d529c84d1fab9395e799ab8a0350c620e58
INFO: Creating SIF file...
INFO: Build complete: image
[WARNING] Got jobs=6 but we cannot use threads with Pythons versions prior 3.8.0. Will run serially
add(ok): .datalad/config (file)
add(ok): .datalad/environments/test-docker/image (file)
save(ok): . (dataset)
containers_add(ok): /tmp/containers-add-docker-test/.datalad/environments/test-docker/image (file)
action summary:
add (ok: 2)
containers_add (ok: 1)
save (ok: 1)
datalad containers-add --url docker://neurodebian:nd100 test-docker 24.41s user 2.23s system 165% cpu 16.096 total
2 5444.....................................:Thu 29 Jul 2021 09:45:08 AM EDT:.
(git)smaug:/tmp/containers-add-docker-test[master]
$> datalad containers-run -n test-docker ls
[INFO ] Making sure inputs are available (this may take some time)
[INFO ] == Command start (output follows) =====
[INFO ] == Command exit (modification check follows) =====
[WARNING] Got jobs=6 but we cannot use threads with Pythons versions prior 3.8.0. Will run serially
action summary:
get (notneeded: 1)
save (notneeded: 1)
So I am still not 100% sure what conversion in relation to singularity we are dealing with. Maybe you have some output/logging which shows it?
So, are you talking about those 15 minutes as the time of concern? My guess is that it is
I think there is a misunderstanding. I install the singularity container from a local datalad dataset called ENV_DIR, which contains pre-downloaded singularity containers (pulled from Docker Hub), before executing a SLURM submission. I do not download any singularity containers from Docker Hub on the fly during the respective job. It is the conversion of the singularity container image to a sandbox, enforced on our HPC (I think because of an incompatibility between the file system and singularity), that takes a long time or is practically impossible when done multiple times in parallel.
The workaround I am now establishing is to execute
Sorry, we forgot about this issue discussion.
It is still not clear to me (please point to / cite specific lines) what exactly such a conversion entails. My only guess: copying the singularity container from a partition where it cannot be executed (e.g.
well,
Thanks a lot for your reply. By now we have established another solution, foregoing datalad. What I mean by a sandbox is the conversion of the singularity container image to a writable directory (https://sylabs.io/guides/3.0/user-guide/build_a_container.html#creating-writable-sandbox-directories). Automated conversion of the containers to sandbox directories before using them is enforced on our HPC, and due to its very slow filesystem this process takes forever, hampering computations.
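For reference, a minimal sketch of such a sandbox conversion with standard Singularity commands (the image and directory names are hypothetical):

# One-time conversion of a SIF image into a writable sandbox directory
# (this is the slow step on a slow filesystem)
singularity build --sandbox smriprep_sandbox/ smriprep.sif
# Subsequent runs can execute directly from the unpacked directory
singularity exec smriprep_sandbox/ ls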
FWIW, as the command used to actually execute the container is fully configurable in datalad-container (via the --call-fmt option of containers-add), it should be possible to point execution at a preconverted sandbox directory.
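A minimal sketch of that idea, assuming a sandbox already exists at a fixed path (the docker URL, the sandbox path, and the container name are assumptions; {img} and {cmd} are the documented --call-fmt placeholders):

# Register the container but override the execution command so that it runs
# from the preconverted sandbox directory instead of the SIF image ({img} is
# deliberately unused here; {cmd} is substituted with the command to run)
datalad containers-add smriprep --url docker://nipreps/smriprep:0.8.1 \
    --call-fmt 'singularity exec /scratch/sandboxes/smriprep {cmd}'
datalad containers-run -n smriprep ls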
Hi,
our HPC enforces unpacking singularity containers to sandboxes, which takes a really long time if done multiple times in parallel. One way to circumvent unpacking every time would be to use preconverted sandboxes. Is there a way to use datalad containers with a sandbox?
Thanks!