Merge pull request #368 from EGA-archive/develop
Adding ri-tools volumes for integrated deployment
costero-e authored Aug 30, 2024
2 parents 442ded8 + 7229a18 commit d7b620e
Showing 55 changed files with 9,134 additions and 2 deletions.
2 changes: 1 addition & 1 deletion beacon/api_version.yml
@@ -1 +1 @@
-api_version: v2.0-2e81a7e
+api_version: v2.0-442ded8
7 changes: 6 additions & 1 deletion deploy/docker-compose.yml
@@ -14,12 +14,17 @@ services:
   ###########################################

   beacon-ri-tools:
-    image: ghcr.io/ega-archive/beacon2-ri-tools-v2:main
+    image: ghcr.io/ega-archive/beacon2-ri-tools-v2:latest
     hostname: beacon-ri-tools
     container_name: ri-tools
     networks:
       - beacon-priv
     tty: true
+    volumes:
+      - ./ri-tools/output_docs:/usr/src/app/output_docs
+      - ./ri-tools/conf:/usr/src/app/conf
+      - ./ri-tools/files/vcf/files_to_read:/usr/src/app/files/vcf/files_to_read
+      - ./ri-tools/csv:/usr/src/app/csv

   ###########################################
   # training-ui
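
Review note: the four new bind mounts expect host directories under `deploy/ri-tools/`. A minimal sketch of a pre-flight step that creates any that are missing before the stack is started. The helper name `ensure_host_dirs` is hypothetical and not part of this commit; only the paths are taken from the hunk above.

```python
from pathlib import Path

# Host-side paths copied from the volumes entries in the hunk above,
# relative to the directory that holds docker-compose.yml (deploy/).
HOST_DIRS = [
    "ri-tools/output_docs",
    "ri-tools/conf",
    "ri-tools/files/vcf/files_to_read",
    "ri-tools/csv",
]


def ensure_host_dirs(deploy_root):
    """Create missing bind-mount source directories; return the ones created.

    Hypothetical helper for illustration only.
    """
    created = []
    for rel in HOST_DIRS:
        path = Path(deploy_root) / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)
            created.append(rel)
    return created
```

Calling `ensure_host_dirs("deploy")` from the repository root before bringing the services up would leave the directories owned by the invoking user, which may be preferable to letting the container runtime create them.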
7 changes: 7 additions & 0 deletions deploy/ri-tools/conf/conf.py
@@ -0,0 +1,7 @@
+#### Input and Output files config parameters ####
+csv_folder = './csv/examples/'
+output_docs_folder='./output_docs/'
+
+#### VCF Conversion config parameters ####
+allele_frequency=1 # introduce float number, leave 1 if you want to convert all the variants
+reference_genome='GRCh38' # Choose one between NCBI36, GRCh37, GRCh38
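
Review note: the comments in `conf.py` describe the accepted values but nothing in this diff enforces them. A minimal sketch of a sanity check, assuming `allele_frequency` is meant as a proportion between 0 and 1 (the range is an assumption; the file only says "float number" and that 1 converts all variants). The function `validate_conf` is hypothetical and not part of ri-tools.

```python
# Genome names listed in the conf.py comment.
ALLOWED_GENOMES = {"NCBI36", "GRCh37", "GRCh38"}


def validate_conf(allele_frequency, reference_genome):
    """Return a list of problems; an empty list means the values look usable.

    Hypothetical helper. The 0..1 range for allele_frequency is an assumption.
    """
    problems = []
    is_number = isinstance(allele_frequency, (int, float)) and not isinstance(
        allele_frequency, bool
    )
    if not is_number:
        problems.append("allele_frequency must be a number")
    elif not 0 <= allele_frequency <= 1:
        problems.append("allele_frequency must be between 0 and 1")
    if reference_genome not in ALLOWED_GENOMES:
        problems.append(
            "reference_genome must be one of " + ", ".join(sorted(ALLOWED_GENOMES))
        )
    return problems
```

With the values committed here, `validate_conf(1, "GRCh38")` returns an empty list.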
2,505 changes: 2,505 additions & 0 deletions deploy/ri-tools/csv/examples/analyses.csv

Large diffs are not rendered by default.

2,505 changes: 2,505 additions & 0 deletions deploy/ri-tools/csv/examples/biosamples.csv

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions deploy/ri-tools/csv/examples/cohorts.csv
@@ -0,0 +1,2 @@
cohortDataTypes|id,cohortDataTypes|label,cohortDesign|id,cohortDesign|label,cohortSize,cohortType,collectionEvents|eventAgeRange|availability,collectionEvents|eventAgeRange|availabilityCount,collectionEvents|eventAgeRange|distribution,collectionEvents|eventDataTypes|availability,collectionEvents|eventDataTypes|availabilityCount,collectionEvents|eventDataTypes|distribution,collectionEvents|eventDiseases|availability,collectionEvents|eventDiseases|availabilityCount,collectionEvents|eventDiseases|distribution,collectionEvents|eventEthnicities|availability,collectionEvents|eventEthnicities|availabilityCount,collectionEvents|eventEthnicities|distribution,collectionEvents|eventGenders|availability,collectionEvents|eventGenders|availabilityCount,collectionEvents|eventGenders|distribution,collectionEvents|eventLocations|availability,collectionEvents|eventLocations|availabilityCount,collectionEvents|eventLocations|distribution,collectionEvents|eventPhenotypes|availability,collectionEvents|eventPhenotypes|availabilityCount,collectionEvents|eventPhenotypes|distribution,collectionEvents|eventTimeline|availability,collectionEvents|eventTimeline|availabilityCount,collectionEvents|eventTimeline|distribution,exclusionCriteria|ageRange|end|iso8601duration,exclusionCriteria|ageRange|start|iso8601duration,exclusionCriteria|diseaseConditions|ageOfOnset,exclusionCriteria|diseaseConditions|diseaseCode|id,exclusionCriteria|diseaseConditions|diseaseCode|label,exclusionCriteria|diseaseConditions|familyHistory,exclusionCriteria|diseaseConditions|notes,exclusionCriteria|diseaseConditions|severity|id,exclusionCriteria|diseaseConditions|severity|label,exclusionCriteria|diseaseConditions|stage|id,exclusionCriteria|diseaseConditions|stage|label,exclusionCriteria|ethnicities|id,exclusionCriteria|ethnicities|label,exclusionCriteria|genders|id,exclusionCriteria|genders|label,exclusionCriteria|locations|id,exclusionCriteria|locations|label,exclusionCriteria|phenotypicConditions|evidence|evidenceCode,
exclusionCriteria|phenotypicConditions|evidence|reference,exclusionCriteria|phenotypicConditions|excluded,exclusionCriteria|phenotypicConditions|featureType|id,exclusionCriteria|phenotypicConditions|featureType|label,exclusionCriteria|phenotypicConditions|modifiers,exclusionCriteria|phenotypicConditions|notes,exclusionCriteria|phenotypicConditions|onset,exclusionCriteria|phenotypicConditions|resolution,exclusionCriteria|phenotypicConditions|severity|id,exclusionCriteria|phenotypicConditions|severity|label,exclusionCriteria|type|availability,exclusionCriteria|type|availabilityCount,id,inclusionCriteria|ageRange|end|iso8601duration,inclusionCriteria|ageRange|start|iso8601duration,inclusionCriteria|diseaseConditions|ageOfOnset,inclusionCriteria|diseaseConditions|diseaseCode|id,inclusionCriteria|diseaseConditions|diseaseCode|label,inclusionCriteria|diseaseConditions|familyHistory,inclusionCriteria|diseaseConditions|notes,inclusionCriteria|diseaseConditions|severity|id,inclusionCriteria|diseaseConditions|severity|label,inclusionCriteria|diseaseConditions|stage|id,inclusionCriteria|diseaseConditions|stage|label,inclusionCriteria|ethnicities|id,inclusionCriteria|ethnicities|label,inclusionCriteria|genders|id,inclusionCriteria|genders|label,inclusionCriteria|locations|id,inclusionCriteria|locations|label,inclusionCriteria|phenotypicConditions|evidence|evidenceCode,inclusionCriteria|phenotypicConditions|evidence|reference,inclusionCriteria|phenotypicConditions|excluded,inclusionCriteria|phenotypicConditions|featureType|id,inclusionCriteria|phenotypicConditions|featureType|label,inclusionCriteria|phenotypicConditions|modifiers,inclusionCriteria|phenotypicConditions|notes,inclusionCriteria|phenotypicConditions|onset,inclusionCriteria|phenotypicConditions|resolution,inclusionCriteria|phenotypicConditions|severity|id,inclusionCriteria|phenotypicConditions|severity|label,inclusionCriteria|type|availability,inclusionCriteria|type|availabilityCount,name
,,,,,study-defined,,,,,,,TRUE,1705,"{""diseases"": {""acute bronchitis"": 121,""agranulocytosis"": 111,""asthma"": 134,""bipolar affective disorder"": 134,""cardiomyopathy"": 133,""dental caries"": 139,""eating disorders"": 134,""fibrosis and cirrhosis of liver"": 132,""gastro-oesophageal reflux disease"": 140,""haemorrhoids"": 127,""influenza due to certain identified influenza virus"": 135,""insulin-dependent diabetes mellitus"": 165,""iron deficiency anaemia"": 142,""multiple sclerosis"": 125,""obesity"": 136,""sarcoidosis"": 136,""schizophrenia"": 138,""thyroiditis"": 141,""varicose veins of lower extremities"": 139}}",TRUE,2287,"{""ethnicities"": {""African"": 119,""Any other Asian background"": 120,""Any other Black background"": 104,""Any other mixed background"": 92,""Any other white background"": 114,""Asian or Asian British"": 125,""Bangladeshi"": 96,""Black or Black British"": 131,""British"": 114,""Caribbean"": 127,""Chinese"": 100,""Indian"": 110,""Irish"": 111,""Mixed"": 127,""Other ethnic group"": 116,""Pakistani"": 115,""White"": 105,""White and Asian"": 114,""White and Black African"": 115,""White and Black Caribbean"": 132}}",TRUE,1597,"{""genders"": {""female"": 1271,""male"": 1233}}",TRUE,1597,"{""locations"": {""England"": 322,""Northern Ireland"": 317,""Republic of Ireland"": 311,""Scotland"": 308,""Wales"": 339}}",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,CINECA_synthetic_cohort_UK1,P65Y,P18Y,,,,,,,,,,,,NCIT:C16576|NCIT:C20197,female|male,GAZ:00150372,UK,,,,,,,,,,,,,,CINECA synthetic cohort UK1
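
Review note: the column names in `cohorts.csv` use `|` to spell out a path into the nested Beacon cohort document (for example `inclusionCriteria|ageRange|start|iso8601duration`). A rough sketch of how such headers could be folded into nested dictionaries. This is an illustration only, not the ri-tools conversion code; `nest_row`, `load_nested`, and `SAMPLE` are hypothetical names, and the sample uses a handful of columns and values taken from the row above.

```python
import csv
import io

# A few columns and values lifted from the example row in cohorts.csv.
SAMPLE = (
    "id,name,cohortType,"
    "inclusionCriteria|ageRange|start|iso8601duration,"
    "inclusionCriteria|genders|id\n"
    "CINECA_synthetic_cohort_UK1,CINECA synthetic cohort UK1,study-defined,"
    "P18Y,NCIT:C16576|NCIT:C20197\n"
)


def nest_row(row):
    """Fold 'a|b|c' column names into nested dicts, skipping empty cells.

    Cell values are kept as plain strings; whether a '|' inside a value
    separates list items is left to the real converter.
    """
    doc = {}
    for column, value in row.items():
        if value == "":
            continue
        *parents, leaf = column.split("|")
        node = doc
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return doc


def load_nested(text):
    """Parse CSV text and return one nested dict per data row."""
    return [nest_row(row) for row in csv.DictReader(io.StringIO(text))]
```

For the sample, `load_nested(SAMPLE)[0]["inclusionCriteria"]["ageRange"]["start"]["iso8601duration"]` evaluates to `"P18Y"`.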
2 changes: 2 additions & 0 deletions deploy/ri-tools/csv/examples/datasets.csv
@@ -0,0 +1,2 @@
createDateTime,dataUseConditions|duoDataUse|description,dataUseConditions|duoDataUse|id,dataUseConditions|duoDataUse|label,dataUseConditions|duoDataUse|modifiers,dataUseConditions|duoDataUse|version,description,externalUrl,id,info,name,updateDateTime,version
2021-12-29T20:33:40Z,,"DUO:0000019,DUO:00000020","publication required,publication required",,"7-1-19,7-1-19","Please note: This synthetic data set (with cohort participants / subjects marked with FAKE) has no identifiable data and cannot be used to make any inference about cohort data or results. The purpose of this dataset is to aid development of technical implementations for cohort data discovery, harmonization, access, and federated analysis. In support of FAIRness in data sharing, this dataset is made freely available under the Creative Commons Licence (CC-BY). Please ensure this preamble is included with this dataset and that the CINECA project (funding: EC H2020 grant 825775) is acknowledged. For any questions please contact [email protected] or [email protected] This dataset (CINECA_synthetic_cohort_EUROPE_UK1) consists of 2521 samples which have genetic data based on 1000 Genomes data (https://www.nature.com/articles/nature15393), and synthetic subject attributes and phenotypic data derived from UKBiobank (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001779). These data were initially derived using the TOFU tool (https://github.com/spiros/tofu), which generates randomly generated values based on the UKBiobank data dictionary. Categorical values were randomly generated based on the data dictionary, continuous variables generated based on the distribution of values reported by the UK Biobank showcase, and date / time values were random. Additionally we split the phenotypes and attributes into 4 main classes - general, cancer, diabetes mellitus, and cardiac. We assigned the general attributes to all the samples, and the cardiac / diabetes mellitus / cancer attributes to a proportion of the total samples. Once the initial set of phenotypes and attributes were generated, the data data was checked for consistency and where possible dependent attributes were calculated from the independent variables generated by TOFU. 
For example, BMI was calculated from height and weight data, and age at death generated by date of death and date of birth. These data were then loaded to the development instance of Biosamples (https://www.ebi.ac.uk/biosamples/) which accessioned each of the samples. The genetic data are derived from the 1000 Genomes Phase 3 release (https://www.internationalgenome.org/category/phase-3/). The genotype data consists of a single joint call vcf files with call genotypes for all 2504 samples, plus bed, bim, fam, and nosex files generated via plink for these samples and genotypes. The genotype data has had a variety of errors introduced to mimic real data and as a test for quality control pipelines. These include gender mismatches, ethnic background mislabelling and low call rates for a randomly chosen subset of sample data as well as deviations from Hardy Weinberg equilibrium and low call rates for a random selection of variants. Additionally 40 samples have raw genetic data available in the form of both bam and cram files, including unmapped data. The gender of the samples in the 1000 genomes data has been matched to the synthetic phenotypic data generated for these samples. 
The genetic data was then linked to the synthetic data in BioSamples, and submitted to EGA."",""externalUrl"": ""https://ega-archive.org/datasets/EGAD00001006673/",https://ega-archive.org/datasets/EGAD00001006673/,CINECA_synthetic_cohort_EUROPE_UK1,"{""beacon"": {""contact"": ""[email protected]"",""mapping"": ""Manuel Rueda"",""version"": ""v2.0""},""dataset"": {""derived"": [{""EGA"": {""contact"": ""[email protected]"",""externalUrl"": ""https://ega-archive.org/datasets/EGAD00001006673"",""license"": {""$ref"": ""#/dataUseConditions/duoDataUse""}}},{""BioSamples"": {""contact"": ""[email protected]"",""externalUrl"": ""https://www.ebi.ac.uk/biosamples"",""license"": ""Creative Commons Licence (CC-BY)""}}],""origin"": [{""CINECAprojectEU"": {""contact"": ""[email protected]"",""externalUrl"": ""https://www.cineca-project.eu/cineca-synthetic-dataset"",""license"": ""Creative Commons Licence (CC-BY)"",""managers"": "" Coline Thomas, Isuru Liyanage and Dylan Spalding""}},{""1000Genomes"": {""externalUrl"": ""https://www.internationalgenome.org/category/phase-3"",""license"": ""CC BY-NC-SA 3.0"",""version"": ""v5a phase 3 VCF""}}]}}",CINECA_synthetic_cohort_EUROPE_UK1,2021-12-29T20:33:40Z,v1.0