This directory contains various analysis modules in the OpenPBTA project. See the README of an individual analysis module for more information about that module.
The table below is intended to help project organizers quickly get an idea of what files (and therefore types of data) are consumed by each analysis module, what the module does, and what output files it produces that can be consumed by other analysis modules.
In addition, this table reflects which analyses are included in the OpenPBTA manuscript.
This is in service of documenting interdependent analyses.
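As a rough illustration of how the interdependencies can be used, the sketch below transcribes a handful of rows from the table into a dependency map and derives one valid run order with Python's standard `graphlib` module. The `depends_on` dictionary and the `run_order` helper are hypothetical names introduced for this example; they are not part of the repository, and only a few modules are included.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, hand-transcribed subset of the table below:
# each module maps to the modules whose output files it reads.
depends_on = {
    "chromothripsis": {"chromosomal-instability"},
    "focal-cn-file-preparation": {"copy_number_consensus_call", "collapse-rnaseq"},
    "gene-set-enrichment-analysis": {"collapse-rnaseq"},
    "fusion-summary": {"fusion_filtering"},
    "molecular-subtyping-EPN": {
        "chromosomal-instability",
        "gene-set-enrichment-analysis",
        "fusion-summary",
    },
}


def run_order(graph):
    """Return one ordering in which every module follows its upstream modules."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(depends_on)
print(order)
```

Several orderings can satisfy the same constraints, so the printed list is only one of them.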
In the field `Output Files Consumed by Other Analyses`, if the given data file is marked `(included in data download)`, that means the analysis module created the data file, but the relevant "other analyses" will read that file in from the data release directly, not from that module's internal results.
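The convention can be sketched as a small path-resolution helper. This is only an illustration: the `data` and `analyses` directory names, the `resolve_input` function, and its parameters are assumptions made for the example, not code from the repository.

```python
from pathlib import Path

# Assumed locations (hypothetical): the data release and the analyses directory.
DATA_DIR = Path("data")
ANALYSES_DIR = Path("analyses")


def resolve_input(module, relative_path, in_data_download):
    """Return the path a downstream module would read for an upstream output.

    Files marked "(included in data download)" are read from the data release;
    all other files are read from the producing module's own directory.
    """
    if in_data_download:
        # Only the file name is kept; the release directory is assumed to be flat.
        return DATA_DIR / Path(relative_path).name
    return ANALYSES_DIR / module / relative_path


# A file marked "(included in data download)" in the table.
print(resolve_input("fusion_filtering", "results/pbta-fusion-putative-oncogenic.tsv", True))
# A file that is not marked, so it is read from the module's results.
print(resolve_input("gene-set-enrichment-analysis", "results/gsva_scores_stranded.tsv", False))
```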
Note that nearly all modules use the harmonized clinical data file (`pbta-histologies.tsv`) even when it is not explicitly included in the table below.
Module | Input Files | Brief Description | Output Files Consumed by Other Analyses | Analysis included in manuscript? | Produces files for data release? |
---|---|---|---|---|---|
chromosomal-instability | pbta-histologies.tsv pbta-sv-manta.tsv.gz pbta-cnv-cnvkit.seg.gz | Evaluates chromosomal instability by calculating chromosomal breakpoint densities and by creating circular plot visuals | analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv | Yes | No |
chromothripsis | pbta-sv-manta.tsv.gz pbta-cnv-consensus.seg.gz independent-specimens.wgs.primary-plus.tsv figures/palettes/histology_label_color_table.tsv analyses/chromosomal-instability/breakpoint-data/cnv_breaks_densities.tsv analyses/chromosomal-instability/breakpoint-data/sv_breaks_densities.tsv | This module runs ShatterSeek, identifies chromothripsis regions, and visualizes the results. | N/A | Yes | No |
cnv-chrom-plot | pbta-cnv-consensus-gistic.zip analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg | Plots genome-wide visualizations relating to copy number results | N/A | Yes | No |
cnv-comparison | Earlier version of SEG files | Deprecated; compared earlier version of the CNV methods. | N/A | No | No |
collapse-rnaseq | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds gencode.v27.primary_assembly.annotation.gtf.gz | Collapses RSEM FPKM matrices such that gene symbols are de-duplicated. | results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds (included in data download; too large for tracking via GitHub) results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds (included in data download; too large for tracking via GitHub) | Yes | Yes |
comparative-RNASeq-analysis | pbta-gene-expression-rsem-tpm.polya.rds pbta-gene-expression-rsem-tpm.stranded.rds pbta-histologies.tsv pbta-mend-qc-manifest.tsv pbta-mend-qc-results.tar.gz | Produces expression outlier profiles per #229 | N/A | No | No |
compare-gistic | analyses/run-gistic/results/pbta-cnv-consensus-gistic.zip analyses/run-gistic/results/pbta-cnv-consensus-hgat-gistic.zip analyses/run-gistic/results/pbta-cnv-consensus-lgat-gistic.zip analyses/run-gistic/results/pbta-cnv-consensus-medulloblastoma-gistic.zip | Compares the GISTIC results of the entire cohort with the GISTIC results of three individual histologies, namely LGAT, HGAT, and medulloblastoma (#547) | N/A | No | No |
copy_number_consensus_call | pbta-cnv-cnvkit.seg.gz pbta-cnv-controlfreec.tsv.gz pbta-sv-manta.tsv.gz | Produces consensus copy number calls per #128 and a set of excluded regions where CNV calls are not made | results/cnv_consensus.tsv results/pbta-cnv-consensus.seg.gz (included in data download) ref/cnv_excluded_regions.bed ref/cnv_callable.bed | Yes | Yes |
count-contributions | N/A - uses Git logs | Counts Git contributions to the repository | N/A | No | No |
create-subset-files | All files | This module contains the code to create the subset files used in continuous integration | All subset files for continuous integration | Not directly | No |
focal-cn-file-preparation | pbta-cnv-cnvkit.seg.gz pbta-cnv-controlfreec.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.gz | Maps from copy number variant caller segments to "most focal unit" | results/cnvkit_annotated_cn_autosomes.tsv.gz results/cnvkit_annotated_cn_x_and_y.tsv.gz results/controlfreec_annotated_cn_autosomes.tsv.gz results/controlfreec_annotated_cn_x_and_y.tsv.gz results/consensus_seg_annotated_cn_autosomes.tsv.gz (included in data download) results/consensus_seg_annotated_cn_x_and_y.tsv.gz (included in data download) results/consensus_seg_with_status.tsv (included in data download) | Yes | Yes |
fusion_filtering | pbta-fusion-arriba.tsv.gz pbta-fusion-starfusion.tsv.gz | Standardizes, filters, and prioritizes fusion calls | results/pbta-fusion-putative-oncogenic.tsv (included in data download) results/pbta-fusion-recurrent-fusion-byhistology.tsv (included in data download) results/pbta-fusion-recurrent-fusion-bysample.tsv (included in data download) | Yes | Yes |
fusion-summary | pbta-histologies.tsv pbta-fusion-putative-oncogenic.tsv pbta-fusion-arriba.tsv.gz pbta-fusion-starfusion.tsv.gz | Generates summary tables from fusion files (#398; #623) | results/fusion_summary_embryonal_foi.tsv (included in data download) results/fusion_summary_ependymoma_foi.tsv (included in data download) results/fusion_summary_ewings_foi.tsv (included in data download) | Yes | Yes |
gene-set-enrichment-analysis | analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds analyses/collapse-rnaseq/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Updated gene set enrichment analysis with appropriate RNA-seq expression data | results/gsva_scores_stranded.tsv results/gsva_scores_polya.tsv (for stranded and polya expression data, respectively) | Yes | No |
hotspots-detection | pbta-snv-strelka2.vep.maf.gz pbta-snv-mutect2.vep.maf.gz pbta-snv-vardict.vep.maf.gz pbta-snv-lancet.vep.maf.gz | Scavenges any cancer hotspot calls from each caller and merges them with the consensus (3/3) calls if they were missed in the snv-caller workflow. | pbta-snv-hotspots-mutation.maf.tsv.gz (included in data download) | Yes | Yes |
immune-deconv | pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | Immune/Stroma characterization across PBTA (part of #15) | results/quantiseq_deconv-output.rds | Yes | No |
independent-samples | pbta-histologies.tsv | Generates independent specimen lists for WGS/WXS samples | results/independent-specimens.wgs.primary.tsv (included in data download) results/independent-specimens.wgs.primary-plus.tsv (included in data download) results/independent-specimens.wgswxs.primary.tsv (included in data download) results/independent-specimens.wgswxs.primary-plus.tsv (included in data download) | Yes | Yes |
interaction-plots | independent-specimens.wgs.primary-plus.tsv pbta-snv-consensus-mutation.maf.tsv.gz | Creates interaction plots for mutation mutual exclusivity/co-occurrence #13; may be updated to include other data types (e.g., fusions) | N/A | Yes | No |
molecular-subtyping-ATRT | analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz pbta-snv-consensus-mutation-tmb-all.tsv pbta-cnv-consensus-gistic.zip | Summarizes data into tabular format in order to molecularly subtype ATRT samples #244; this analysis did not work | N/A | No | No |
molecular-subtyping-CRANIO | pbta-histologies-base.tsv pbta-snv-consensus-mutation.maf.tsv.gz pbta-snv-scavenged-hotspots.maf.tsv.gz | Molecular subtyping of craniopharyngioma samples #810 | results/CRANIO_molecular_subtype.tsv | Yes | No |
molecular-subtyping-EPN | pbta-histologies-base.tsv pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-cnv-consensus-gistic.zip analyses/chromosomal-instability/breakpoint-data/union_of_breaks_densities.tsv fusion_summary_ependymoma_foi.tsv analyses/gene-set-enrichment-analysis/results/gsva_scores_stranded.tsv | Molecular subtyping of ependymoma tumors | results/EPN_all_data_withsubgroup.tsv | Yes | No |
molecular-subtyping-EWS | pbta-histologies-base.tsv fusion_summary_ewings_foi.tsv | Reclassification of tumors based on the presence of defining fusions for Ewing Sarcoma per #623 | results/EWS_samples.tsv | Yes | No |
molecular-subtyping-HGG | pbta-histologies-base.tsv pbta-snv-consensus-mutation.maf.tsv.gz pbta-snv-scavenged-hotspots.maf.tsv.gz consensus_seg_annotated_cn_autosomes.tsv.gz pbta-fusion-putative-oncogenic.tsv pbta-cnv-consensus-gistic.zip pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Molecular subtyping of high-grade glioma samples #249 | results/HGG_molecular_subtype.tsv | Yes | No |
molecular-subtyping-LGAT | pbta-histologies-base.tsv pbta-snv-consensus-mutation.maf.tsv.gz pbta-snv-scavenged-hotspots.maf.tsv.gz analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv pbta-fusion-recurrently-fused-genes-bysample.tsv | Molecular subtyping of low-grade astrocytic tumor samples #631 | results/lgat_subtyping.tsv | Yes | No |
molecular-subtyping-MB | pbta-histologies-base.tsv pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | Molecular classification of medulloblastoma subtypes (part of #731) | results/MB_molecular_subtype.tsv results/MB_batchcorrected_molecular_subtype.tsv (for the uncorrected and batch-corrected input matrices, respectively) | Yes | No |
molecular-subtyping-SHH-tp53 | pbta-histologies pbta-snv-consensus-mutation.maf.tsv.gz | Deprecated; identifies the SHH-classified medulloblastoma samples that have TP53 mutations #247 | N/A | No | No |
molecular-subtyping-chordoma | consensus_seg_annotated_cn_autosomes.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | In progress; identifying poorly-differentiated chordoma samples per #250 | N/A | Yes | No |
molecular-subtyping-embryonal | pbta-histologies-base.tsv fusion_summary_embryonal_foi.tsv pbta-sv-manta.tsv.gz consensus_seg_annotated_cn_x_and_y.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors #251 | results/embryonal_tumor_molecular_subtypes.tsv | Yes | No |
molecular-subtyping-integrate | pbta-histologies-base.tsv results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv | Adds molecular subtype information to the base histology file | results/pbta-histologies.tsv (included in data download) | Yes | Yes |
molecular-subtyping-neurocytoma | pbta-histologies-base.tsv | Molecular subtyping of neurocytoma samples #805 | results/neurocytoma_subtyping.tsv | Yes | No |
molecular-subtyping-pathology | analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv analyses/molecular-subtyping-EPN/results/EPN_all_data_withsubgroup.tsv analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv analyses/molecular-subtyping-EWS/results/EWS_samples.tsv analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv analyses/molecular-subtyping-chordoma/results/chordoma_smarcb1_status.tsv | Compiles output from other molecular subtyping modules and incorporates pathology feedback #645 | results/compiled_molecular_subtyping_with_clinical_feedback.tsv results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv results/compiled_molecular_subtypes_with_clinical_pathology_feedback_and_report_info.tsv | Yes | No |
mutational-signatures | pbta-snv-consensus-mutation.maf.tsv.gz | Performs three separate analyses of mutational signatures: 1) Analyzes COSMIC and Alexandrov et al. mutational signatures using the consensus SNV data; 2) Performs de novo signature extraction using only the WGS samples from the consensus SNV data; 3) Fits known CNS signatures to the WGS samples from the consensus SNV data | N/A | Yes | No |
mutect2-vs-strelka2 | pbta-snv-mutect2.vep.maf.gz pbta-snv-strelka2.vep.maf.gz | Deprecated; comparison of only two SNV callers, subsumed by snv-callers | N/A | No | No |
oncoprint-landscape | pbta-snv-consensus-mutation.maf.tsv.gz pbta-fusion-putative-oncogenic.tsv consensus_seg_annotated_cn_autosomes.tsv.gz consensus_seg_annotated_cn_x_and_y.tsv.gz independent-specimens.* | Combines mutation, copy number, and fusion data into an OncoPrint plot | N/A | Yes | No |
rna-seq-composition | pbta-gene-expression-rsem-tpm.stranded.rds pbta-histologies.tsv pbta-mend-qc-results.tar.gz pbta-mend-qc-manifest.tsv pbta-star-log-manifest.tsv pbta-star-log-final.tar.gz | Analyzes the fraction of read types that comprise each RNA-Seq sample; flags samples with unusual composition | N/A | No | No |
run-gistic | pbta-histologies.tsv pbta-cnv-consensus.seg.gz | Runs GISTIC 2.0 on SEG files | pbta-cnv-consensus-gistic.zip (included in data download) | Yes | Yes |
sample-distribution-analysis | pbta-histologies.tsv | Produces plots and tables that illustrate the distribution of different histologies in the PBTA data | N/A | No | No |
selection-strategy-comparison | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds | Deprecated; comparison of RNA-seq data from different selection strategies | N/A | No | No |
sex-prediction-from-RNASeq | pbta-gene-expression-kallisto.stranded.rds pbta-histologies.tsv | Predicts genetic sex using RNA-seq data (#84) | N/A | No | No |
snv-callers | pbta-snv-lancet.vep.maf.gz pbta-snv-mutect2.vep.maf.gz pbta-snv-strelka2.vep.maf.gz pbta-snv-vardict.vep.maf.gz tcga-snv-lancet.vep.maf.gz tcga-snv-mutect2.vep.maf.gz tcga-snv-strelka2.vep.maf.gz | Generates consensus SNV and indel calls for PBTA and TCGA data; calculates tumor mutation burden using the consensus calls | results/consensus/pbta-snv-consensus-mutation.maf.tsv.gz (included in data download; too large for tracking via GitHub) results/consensus/pbta-snv-consensus-mutation-tmb-all.tsv (included in data download) results/consensus/pbta-snv-consensus-mutation-tmb-coding.tsv (included in data download; too large for tracking via GitHub) results/consensus/tcga-snv-consensus-mutation.maf.tsv.gz (included in data download) results/consensus/tcga-snv-mutation-tmb.tsv (included in data download) results/consensus/tcga-snv-mutation-tmb-coding.tsv (included in data download) | Yes | Yes |
ssgsea-hallmark | pbta-gene-counts-rsem-expected_count.stranded.rds | Deprecated; performs GSVA using Hallmark gene sets | N/A | No, subsumed by gene-set-enrichment-analysis | No |
survival-analysis | pbta-histologies.tsv independent-specimens.wgswxs.primary.tsv tp53_altered_status.tsv (results from tp53_nf1_score module) quantiseq_deconv-output.rds (results from immune-deconv module) pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds | Performs Kaplan-Meier, log-rank, and/or Cox regression univariate or multivariate survival modeling | N/A | Yes | No |
telomerase-activity-prediction | pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds pbta-gene-counts-rsem-expected_count.stranded.rds pbta-gene-counts-rsem-expected_count.polya.rds | Quantifies telomerase activity across pediatric brain tumors (part of #148) | results/TelomeraseScores_PTBAPolya_counts results/TelomeraseScores_PTBAPolya_FPKM.txt results/TelomeraseScores_PTBAStranded_counts.txt results/TelomeraseScores_PTBAStranded_FPKM.txt results/EXTENDScores_{broad_histology}.tsv | Yes | No |
tmb-compare | pbta-snv-consensus-mutation-tmb-coding.tsv | Deprecated. Compares PBTA tumor mutation burden to adult TCGA data. | N/A | Not directly, similar figure generated in figures/ | No |
tp53_nf1_score | pbta-snv-consensus-mutation.maf.tsv.gz pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds pbta-gene-expression-rsem-fpkm-collapsed.polya.rds | Applies TP53 inactivation, NF1 inactivation, and Ras activation classifiers to RNA-seq data #165 | N/A | Yes | No |
transcriptomic-dimension-reduction | pbta-gene-expression-rsem-fpkm.polya.rds pbta-gene-expression-rsem-fpkm.stranded.rds pbta-gene-expression-kallisto.polya.rds pbta-gene-expression-kallisto.stranded.rds | Dimension reduction and visualization of RNA-seq data | N/A | Yes | No |
tcga-capture-kit-investigation | pbta-snv-lancet.vep.maf.gz pbta-snv-mutect2.vep.maf.gz pbta-snv-strelka2.vep.maf.gz tcga-snv-lancet.vep.maf.gz tcga-snv-mutect2.vep.maf.gz tcga-snv-strelka2.vep.maf.gz pbta-histologies.tsv pbta-tcga-manifest.tsv WGS.hg38.lancet.unpadded.bed WGS.hg38.strelka2.unpadded.bed WGS.hg38.mutect2.vardict.unpadded.bed | Deprecated; investigation of the TMB discrepancy between PBTA and TCGA data | results/*.bed | No | No |
tumor-purity-exploration | pbta-histologies.tsv | This module explores tumor purity distributions and potential covariates, and establishes a cancer group-specific threshold for selecting high tumor purity samples. | thresholded_rna_stranded_same-extraction.tsv | Yes | No |