Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker adapter: Provide better handling of non-unique references #105

Closed
kyleam opened this issue Sep 17, 2019 · 5 comments · Fixed by #206
Closed

docker adapter: Provide better handling of non-unique references #105

kyleam opened this issue Sep 17, 2019 · 5 comments · Fixed by #206
Assignees
Labels
docker Issues relating to docker support easy

Comments

@kyleam
Copy link
Contributor

kyleam commented Sep 17, 2019

As I was working on gh-98, I hit into a case where save resulted in a multi-image directory that load can't handle.

To trigger this, set up at least two images so that a reference without a tag will match both.

$ docker pull busybox:musl
$ docker pull busybox:latest
$ docker images busybox
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
busybox             musl                8cd3c91eb512        13 days ago         1.44MB
busybox             latest              19485c79a9bb        13 days ago         1.22MB

If we call save with the image ID or a tag that narrows things to a single image, things work fine:

$ cd $(mktemp -d --tmpdir dc-XXXXXXX)
$ python -m datalad_container.adapters.docker save busybox:latest latest
[INFO   ] Saved busybox:latest to latest 
Saved busybox:latest to latest
$ python -m datalad_container.adapters.docker run latest ls
latest  notag

But if we leave off the tag, save will dump everything matched, and load will rightly complain that the directory doesn't have a unique image ID.

$ python -m datalad_container.adapters.docker save busybox notag
[INFO   ] Saved busybox to notag
Saved busybox to notag
$ python -m datalad_container.adapters.docker run notag ls
[...]
  File "/home/kyle/src/python/datalad-container/datalad_container/adapters/docker.py", line 95, in load
    image_id = "sha256:" + get_image(path)
[...]
ValueError: Could not find a unique JSON configuration object in notag

The tree output for both these directories is included below. Note the multiple top-level json files.

save should check that the directory it produces only has one image and abort if it doesn't.

tree output
notag
|-- 0520e8d0c59f650aa60000987f6ac9ce99121c38c4283ef6eb92689e28c45144
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- 19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json
|-- 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- 8cd3c91eb5121065cdbae44e77b70a0b2848904d0d75fd8a9ecffb2747b3d741.json
|-- manifest.json
`-- repositories
latest
|-- 19485c79a9bbdca205fce4f791efeaa2a103e23431434696cc54fdd939e9198d.json
|-- 65836406f9479e26bb2dc27439df3efdae3c298edd1ea781dcb3ac7a7baae542
|   |-- json
|   |-- layer.tar
|   `-- VERSION
|-- manifest.json
`-- repositories
@kyleam kyleam self-assigned this Sep 17, 2019
@yarikoptic yarikoptic added the docker Issues relating to docker support label Aug 30, 2022
@yarikoptic yarikoptic assigned jwodder and unassigned kyleam Mar 10, 2023
@yarikoptic
Copy link
Member

I ran into exactly the same problem only now (4 years later) with alpine image.

Here is the content of `.datalad/environments/alpine/image/manifest.json` I got after `datalad containers-add --url dhub://alpine alpine`
[
  {
    "Config": "af341ccd2df8b0e2d67cf8dd32e087bfda4e5756ebd1c76bbf3efa0dc246590e.json",
    "RepoTags": [
      "alpine:3.10"
    ],
    "Layers": [
      "71dba1fabbde4baabcdebcde4895d3f3887e388b09cef162f8159cf7daffa1b8/layer.tar"
    ]
  },
  {
    "Config": "43773d1dba76c4d537b494a8454558a41729b92aa2ad0feb23521c3e58cd0440.json",
    "RepoTags": [
      "alpine:3.6"
    ],
    "Layers": [
      "eea126ee9bd5ff1ae7944088f4d6b32abb9874a6b00935646b5526e1b57ceb86/layer.tar"
    ]
  },
  {
    "Config": "b2aa39c304c27b96c1fef0c06bee651ac9241d49c4fe34381cab8453f9a89c7d.json",
    "RepoTags": [
      "alpine:latest"
    ],
    "Layers": [
      "31f5b0c484b3651f7c883d1be2ef442c2da71d103bc7ab20cd0b1f825e67e4e7/layer.tar"
    ]
  },
  {
    "Config": "a24bb4013296f61e89ba57005a7b3e52274d8edd3ae2077d04395f806b63d83e.json",
    "RepoTags": null,
    "Layers": [
      "fa27ccac46e9a5b7ff1d188e99763db1a5fb45adb34a917c1c482dfde99ad432/layer.tar"
    ]
  }
]

The reason is that docker save IMAGE without tag would export all of them and manifest.json would list all of them. Before we address save issue (not yet sure how), to make it possible for people to use potentially already multiple saved images, I think we should

  • allow for options to python -m datalad_container.adapters.docker run
    • --repo-tag TAG to select among available configs by TAG
    • --config IDPREFIX , where IDPREFIX could be beginning sufficient to tell one config from another. I.e. in my case could be full af341ccd2df8b0e2d67cf8dd32e087bfda4e5756ebd1c76bbf3efa0dc246590e.json or just af since af is already uniquely identifies config
  • in that exception
    • print path to that manifest.json and inform that multiple configs and tags available.
    • inform that user can use --repo-tag or --config option in invocation to run to disambiguate which config to use/run. Inform that such invocation is within .datalad/config

Implementation could be tested on CI on alpine images by docker pulling first a few distinct tags, and then adding that container and trying to run it, adjusting config upon error, and trying to run again.

@jwodder
Copy link
Member

jwodder commented Mar 13, 2023

@yarikoptic Is the TAG argument to --repo-tag supposed to be just the image "version" (e.g. 3.10) or the image name as listed in manifest.json (e.g., alpine:3.10)?

Inform that such invocation is within .datalad/config

I don't know what you mean by this.

@yarikoptic
Copy link
Member

I would say full alpine:3.10 so it is all explicit. who knows -- may be some images would have full tags with different prefix etc.

@yarikoptic Is the TAG argument to --repo-tag supposed to be just the image "version" (e.g. 3.10) or the image name as listed in manifest.json (e.g., alpine:3.10)?

Inform that such invocation is within .datalad/config

I don't know what you mean by this.

Something like, "You might want to edit .datalad/config file containing the invocation of docker adapter and specify --repo-tag or --config to disambiguate".

@yarikoptic
Copy link
Member

#202 is about to resolve run issue (thank you @jwodder).

For the save issue:I think docker runs :latest tag if none is specified explicitly, so let's do for save:

  • if no :TAG provided , assume :latest (since that is AFAIK what docker run does/uses)

The test within https://github.com/datalad/datalad-container/pull/202/files#diff-62bc40a7a5373426a633182d1a054499477c5154895e45a0251d3a141a42579fR118 would no longer be able to create a problematic case it is testing unfortunately, but could just simulate by creating a sample manifest.json with multiple versions inplace.

The test for this change would be just in that unittest that it would export only the :latest

yarikoptic added a commit that referenced this issue Mar 13, 2023
Add `--repo-tag` and `--config` to `datalad_container.adapters.docker run`
@github-actions
Copy link

Issue fixed in 1.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docker Issues relating to docker support easy
Projects
None yet
3 participants