This repo is a sandbox for support code for running and testing workflows on Dockstore and for indexing GA4GH tool registries. Send issues to the main dockstore repo.
Your environment needs to have Maven and Docker installed.
Before you can run the script, you must generate the jar files.
cd toolbackup && mvn clean install
Once Maven and Docker are installed, you may wish to run the tests, in which case you also need S3Proxy.
docker pull andrewgaul/s3proxy
docker run -d --publish 8080:80 --env S3PROXY_AUTHORIZATION=none andrewgaul/s3proxy
To use the script, you must provide an endpoint in ~/.toolbackup/config.ini:
token = XXX
server-url = https://dockstore.org:443/api
endpoint = XXX
By default the token is empty. The default value for the server-url is shown above. These two values are used to retrieve Dockstore tools. Only the endpoint is mandatory, so a minimal config can look like this:
endpoint = XXX
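The defaults described above can be sketched as follows. This is a minimal illustration in Python, not the tool's own Java parsing code. The function name `load_toolbackup_config` is hypothetical, and the sketch assumes the keys sit at the top of the file without a section header, as in the examples above.

```python
import configparser

# Defaults as described above: empty token, public Dockstore API URL.
DEFAULTS = {"token": "", "server-url": "https://dockstore.org:443/api"}


def load_toolbackup_config(text):
    """Parse config.ini text and apply the documented defaults.

    Assumption: top-level keys have no section header, so a dummy
    [top] header is prepended to satisfy configparser.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("[top]\n" + text)
    values = dict(DEFAULTS)
    values.update(parser["top"])
    # The endpoint is the only mandatory value.
    if not values.get("endpoint"):
        raise ValueError("endpoint is mandatory in ~/.toolbackup/config.ini")
    return values
```

Passing the contents of a file that holds only an `endpoint` line returns that endpoint together with the empty token and the default server-url.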
If ~/.aws/credentials does not exist, the script generates it during testing with only the default profile. The default profile is necessary for this script's tests. If you already have a credentials file but it is missing the default profile, you must add it:
[default]
aws_access_key_id=MOCK_ACCESS_KEY
aws_secret_access_key=MOCK_SECRET_KEY
You must also supply the file with a dockstore profile and the proper keys:
[default]
aws_access_key_id=MOCK_ACCESS_KEY
aws_secret_access_key=MOCK_SECRET_KEY
[dockstore]
aws_access_key_id=MOCK_ACCESS_KEY
aws_secret_access_key=MOCK_SECRET_KEY
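A quick way to check a credentials file for the two profiles mentioned above is sketched below. This is only an illustration in Python; the helper name `missing_profiles` is hypothetical and the tool itself does not provide it.

```python
import configparser


def missing_profiles(credentials_text, required=("default", "dockstore")):
    """Return the required profile names that are absent from the text.

    Section names are case-sensitive in configparser, so a lowercase
    [default] profile is treated as an ordinary section.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return [name for name in required if not parser.has_section(name)]
```

Calling it with the contents of ~/.aws/credentials returns an empty list when both profiles are present.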
This is the script to back up dockstore images from quay.io into OpenStack.
java -jar target/client.jar --bucket-name clientbucket --key-prefix client --local-dir /home/ubuntu/clientEx --test-mode-activate true
We are running with test mode activated, which means we will not download all dockstore images. The targeted dockstore images will be stored on OpenStack in the bucket clientbucket, under the key-prefix client within that bucket. Neither the bucket nor the key-prefix needs to exist beforehand. The directory /home/ubuntu/clientEx acts as temporary storage, and it also need not exist beforehand.
Client will pull all the GA4GH tools from the server-url and save them locally. It will not pull a docker image again if its size has not changed. It will then detect which files need to be uploaded to OpenStack based on file sizes. However, the script will always upload report files in the report subdirectory. This subdirectory will be generated in the local directory that the user specified. For this example, /home/ubuntu/clientEx/report will be generated containing the HTML report and map.JSON.
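The upload decision described above can be sketched as follows. This is a minimal illustration in Python, not the tool's Java implementation. The function name `files_to_upload`, the dictionary inputs, and the `report/` key prefix are assumptions made for the example.

```python
def files_to_upload(local_sizes, remote_sizes, report_prefix="report/"):
    """Return the relative paths that should be uploaded, in sorted order.

    local_sizes and remote_sizes map a relative path to its size in bytes.
    A file is selected when it is missing remotely or its size differs.
    Anything under report_prefix is always selected.
    """
    selected = []
    for path, size in sorted(local_sizes.items()):
        if path.startswith(report_prefix) or remote_sizes.get(path) != size:
            selected.append(path)
    return selected
```

Comparing sizes alone is cheap but cannot detect a changed file whose size happens to stay the same, which is a trade-off of the size-based approach.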
[
  {
    "toolname": "dockstore-tool-bamstats",
    "versions": [
      {
        "version": "develop",
        "metaVersion": "2016-11-11 18:17:46.0",
        "dockerSize": 541073189,
        "fileSize": 558081536,
        "valid": true,
        "timesOfExecution": [
          "10-02-2017 10:39:55",
          "10-02-2017 10:40:59"
        ],
        "path": ""
      },
      {
        "version": "develop",
        "metaVersion": "2016-11-11 18:17:46.0",
        "dockerSize": 681073489,
        "fileSize": 692073489,
        "valid": true,
        "timesOfExecution": [
          "10-02-2017 11:28:39"
        ],
        "path": "/home/kcao/clientEx/quay.io/collaboratory/dockstore-tool-bamstats/develop.tar"
      }
    ]
  }
]
The listing above is an example map.JSON file. It keeps track of tools, their versions, and their meta-versions. It also records each image's size on Docker and its file size when saved locally. If an image could not be pulled, its valid field is false. timesOfExecution lists the times at which the script ran while this version of the tool remained unchanged. path is the local file path of the saved image. If the image changes, as shown here, a new version object is added. If the modified image is valid, the old version is left with an empty path.
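To show how this structure can be consumed, here is a small sketch in Python that lists the images still saved locally. The function name `saved_images` is hypothetical; it relies only on the fields described above (toolname, versions, version, valid, path).

```python
import json


def saved_images(map_json_text):
    """Return (toolname, version, path) for each valid entry with a local file."""
    tools = json.loads(map_json_text)
    result = []
    for tool in tools:
        for entry in tool["versions"]:
            # Superseded entries have an empty path, and failed pulls are not valid.
            if entry["valid"] and entry["path"]:
                result.append((tool["toolname"], entry["version"], entry["path"]))
    return result
```

For a file shaped like the example, only the newer develop entry would be returned, because the older one has an empty path.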
The index.html is the main menu for all GA4GH tools. It also displays how many GBs have been added to the cloud as well as how many GBs were previously on the cloud. The individual tool reports, which can be accessed from index.html, display the information in map.JSON in a more readable format with the following columns:
- Version
- Meta-Version (API)
- Size (GB)
- Recent Executions
- Availability
- File Path
Please note that Recent Executions in the report shows at most three times of execution. Availability is the same as valid in the JSON file.
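The mapping from a map.JSON version entry to a report row might look roughly like the sketch below. It is written in Python for illustration and rests on several assumptions that the text does not confirm: that the size column is derived from fileSize, that GB means decimal gigabytes, and that timesOfExecution is stored oldest first. The function name `report_row` is hypothetical.

```python
def report_row(entry):
    """Build one report row from a map.JSON version entry."""
    return {
        "Version": entry["version"],
        "Meta-Version (API)": entry["metaVersion"],
        # Assumption: the size column is fileSize in decimal gigabytes.
        "Size (GB)": round(entry["fileSize"] / 1e9, 2),
        # Assumption: timestamps are stored oldest first, so the last
        # three items are the most recent executions.
        "Recent Executions": entry["timesOfExecution"][-3:],
        # Availability mirrors the valid field.
        "Availability": entry["valid"],
        "File Path": entry["path"],
    }
```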
This is the script to download images from OpenStack to the user's local file system.
java -jar target/downloader.jar --bucket-name clientbucket --key-prefix client --destination-dir /home/ubuntu/downloaderEx
We are downloading everything under the key-prefix client within the bucket clientbucket into /home/ubuntu/downloaderEx, a directory that need not have already been created.
The tests do not require a configuration file, but if you wish to set up the values yourself, you can add them to the aforementioned ~/.toolbackup/config.ini. The default values are shown here.
bucket = testbucket
prefix = testprefix
img = docker/whalesay
baseDir = /home/ubuntu/dockstore-saver
dir = /home/ubuntu/dockstore-saver/dir
checkSizeDir = /home/ubuntu/dockstore-saver/checkSize
[nonexistent]
bucket = dockstore-saver-gibberish
dir = dockstore-saver-gibberish
img = dockstore-saver-gibberish
- bucket: Amazon bucket you wish to use for testing the client and downloader
- prefix: Consider it a "subdirectory" of the bucket
- img: A valid image that can be pulled by any environment which has Docker
- baseDir: A local directory which will not be deleted
- dir: A local directory which will be deleted; if not specified, it will be ~/...baseDir.../dir
- checkSizeDir: A local directory to test that the calculation of files' sizes is correct
- nonexistent.bucket: A non-existent bucket
- nonexistent.dir: A non-existent local directory
- nonexistent.img: A non-existent Docker image
The tests will clean up everything but the baseDir. It would be best if you specify directories which do not currently exist, even for the baseDir.
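How the test settings and their defaults fit together can be sketched as below. This is an illustration in Python, not the test suite's own code. The function name `load_test_settings` is hypothetical, and deriving the dir default by appending dir to baseDir is one reading of the description above.

```python
import configparser
import posixpath

TEST_DEFAULTS = {
    "bucket": "testbucket",
    "prefix": "testprefix",
    "img": "docker/whalesay",
    "baseDir": "/home/ubuntu/dockstore-saver",
    "checkSizeDir": "/home/ubuntu/dockstore-saver/checkSize",
}
NONEXISTENT_DEFAULTS = {
    "bucket": "dockstore-saver-gibberish",
    "dir": "dockstore-saver-gibberish",
    "img": "dockstore-saver-gibberish",
}


def load_test_settings(text):
    """Merge config.ini text with the default test values shown above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase keys such as baseDir
    # Assumption: top-level keys have no section header of their own.
    parser.read_string("[top]\n" + text)
    settings = dict(TEST_DEFAULTS)
    settings.update(parser["top"])
    # Assumption: an unspecified dir falls back to <baseDir>/dir.
    settings.setdefault("dir", posixpath.join(settings["baseDir"], "dir"))
    nonexistent = dict(NONEXISTENT_DEFAULTS)
    if parser.has_section("nonexistent"):
        nonexistent.update(parser["nonexistent"])
    settings["nonexistent"] = nonexistent
    return settings
```

With an empty file, every value falls back to the defaults listed above, including the three nonexistent entries.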