The current workflow to run algorithm X on dataset Y is something like this:
1. `python install.py` builds the docker container
2. `python create_dataset.py` sets up the datasets
3. `python run.py --dataset Y --algorithm X` mounts `data/`, `results/`, and `benchmark/` into the container for X
   - it takes care of parsing the definitions file and checking existing runs to figure out which runs still have to be carried out
   - `docker-py` is used to spawn the container from within the Python process (a minimal sketch follows this list)
   - results are written to `results/`
4. `python plot.py` / `data_export.py` / ... to evaluate the results
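
For reference, the container handling in step 3 relies on the Docker SDK for Python; the following is a minimal sketch of how spawning the container with the three mounts could look. The image name, host paths, and command are placeholders for illustration, not the actual values used by `run.py`:

```python
import docker

# Minimal sketch of how run.py can spawn the algorithm container via docker-py.
# Image name, host paths, and command are placeholders, not the real values.
client = docker.from_env()
container = client.containers.run(
    image="billion-scale-benchmark-X",            # hypothetical image for algorithm X
    command=["--dataset", "Y", "--algorithm", "X"],
    volumes={
        "/abs/path/to/data": {"bind": "/home/app/data", "mode": "ro"},
        "/abs/path/to/results": {"bind": "/home/app/results", "mode": "rw"},
        "/abs/path/to/benchmark": {"bind": "/home/app/benchmark", "mode": "ro"},
    },
    detach=True,
)
for line in container.logs(stream=True):          # stream the container output
    print(line.decode().rstrip())
status = container.wait()                         # dict containing the exit StatusCode
```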
Given @harsha-simhadri's and @sourcesync's frustrations and some directions discussed in other meetings, I think we should relax step 3 a bit and allow more flexibility in the container setup. One direction could look like this:
1. `python install.py` builds the docker container; participants are expected to override the entry point to point to their own implementation (file `algorithms/X/Dockerfile`)
2. `python create_dataset.py` sets up the datasets
3. A Python/shell script that contains the logic to run the container for X (in `algorithms/X/run.{py,sh}`; see the first sketch after this list):
   - as arguments, we provide the task, the dataset, where the results should be written, and some additional parameters
   - we mount `data/`, `results/`, and the config file that is used by the implementation (`algorithms/X/config.yaml`, maybe task specific)
4. The following is done by the implementation inside the container:
   a. file I/O in the container, loading/building the index
   b. running the experiment and providing timings
   c. writing results in a standard format (as before, `results/Y/X/run_identifier.hdf5`; see the second sketch below)
5. `python plot.py` / `data_export.py` / ... to evaluate the results
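
A minimal sketch of what such a per-algorithm run script (`algorithms/X/run.py`, step 3) could look like; the flag names, image name, and mount points are assumptions for illustration, not a fixed interface:

```python
#!/usr/bin/env python3
"""Hypothetical algorithms/X/run.py: launch the container for algorithm X."""
import argparse
import subprocess
from pathlib import Path


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", required=True)             # e.g. "T1" (assumed name)
    parser.add_argument("--dataset", required=True)          # e.g. "Y"
    parser.add_argument("--results-dir", default="results")
    parser.add_argument("--config", default="algorithms/X/config.yaml")
    args, extra = parser.parse_known_args()                  # pass extra flags through

    results = Path(args.results_dir).resolve()
    results.mkdir(parents=True, exist_ok=True)

    # The image is assumed to have been built by install.py, with the entry point
    # overridden in algorithms/X/Dockerfile (image name is a placeholder).
    subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{Path('data').resolve()}:/data:ro",
        "-v", f"{results}:/results",
        "-v", f"{Path(args.config).resolve()}:/config.yaml:ro",
        "billion-scale-benchmark-X",
        "--task", args.task, "--dataset", args.dataset,
        *extra,
    ], check=True)


if __name__ == "__main__":
    main()
```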
We would provide a default run script for inspiration, which would be pretty close to the current setup. Putting all the logic into the container could mean a lot of code duplication, but isolated containers would allow for much easier orchestration.
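
For step 4c, a sketch of what the implementation inside the container might do to write a run, assuming `h5py`; the attribute and dataset names are illustrative, and the real schema should mirror the existing result format:

```python
import h5py
import numpy as np


def write_result(path, neighbors, distances, build_time, search_times, params):
    """Write one run to e.g. results/Y/X/run_identifier.hdf5.

    Attribute and dataset names here are illustrative placeholders.
    """
    with h5py.File(path, "w") as f:
        f.attrs["build_time"] = build_time
        f.attrs["best_search_time"] = float(np.min(search_times))
        f.attrs["name"] = params.get("name", "X")
        f.create_dataset("neighbors", data=neighbors)    # shape: (n_queries, k)
        f.create_dataset("distances", data=distances)    # shape: (n_queries, k)


# Example usage with dummy data:
# write_result("results/Y/X/run_identifier.hdf5",
#              neighbors=np.zeros((100, 10), dtype=np.int64),
#              distances=np.zeros((100, 10), dtype=np.float32),
#              build_time=12.3, search_times=[0.5, 0.4], params={"name": "X"})
```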
I can provide a proof-of-concept if this sounds promising.