To fetch the dataset used for this test, run the materialize-data.sh script:
./materialize-data.sh
Then run the following command to materialize the containers needed by the workflow:
./materialize-containers.sh
The data will be placed in TCGA_full_data, fetched from
https://github.com/inab/TCGA_benchmarking_workflow/tree/master/TCGA_sample_data
The data in that remote resource was derived from the materials of the following manuscript:
Comprehensive Characterization of Cancer Driver Genes and Mutations, Bailey et al, 2018, Cell
- Folder data contains benchmarking metrics results from the 2018 TCGA-PanCancer benchmark for the 34 analyzed cancer types. These files follow the structure of the 'aggregation' datasets from the Elixir Benchmarking Data Model. JSON schemas for those datasets can be found here
- Folder metrics_ref_datasets contains the gold standards defined by the community for each of the cancer types.
- Folder public_ref contains the reference data used by the community for validation/predictions.
- All_Together.txt is a gene predictions file which can be used as input to test the workflow.
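
After materializing the data, a quick sanity check of the layout described above can catch an incomplete download. The following is only a minimal sketch: it assumes the four entries listed above sit directly under TCGA_full_data, which may not match the actual layout. The function name check_layout and the DATA_DIR variable are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Sketch: verify that the materialized data directory has the expected entries.
# Assumption: the entries sit directly under the data directory.

check_layout() {
    dir="$1"
    missing=0
    for entry in data metrics_ref_datasets public_ref All_Together.txt; do
        # Report every entry that is absent instead of stopping at the first one.
        if [ ! -e "$dir/$entry" ]; then
            echo "missing: $dir/$entry"
            missing=1
        fi
    done
    return "$missing"
}

# DATA_DIR is a hypothetical override; the default is the directory named above.
if check_layout "${DATA_DIR:-TCGA_full_data}"; then
    echo "layout looks complete"
else
    echo "layout incomplete; consider re-running ./materialize-data.sh"
fi
```

If the entries turn out to live in a subdirectory, pointing DATA_DIR at that location should be enough to reuse the check.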