Penn-CNV Instructions

REQUIRED FILES:
- hg_38.pfb
- cnv.sif
- hhall.hmm
Run the raw CNV calling commands for each trio by running the following in the terminal:
    sbatch raw_cnv_batch
Sheet in V2_OFFICIAL_2024 Olfson_CNV_Calls_data_and commands associated with these commands: indiv-rawcnv-command
Example of a command it will run:
    apptainer exec cnv.sif perl -test -hmm lib/hhall.hmm -pfb hg_38.pfb Signal/207166290017_R01C02 Signal/207166290017_R02C02 Signal/207166290017_R03C02 -log Signals/Complete_RawCNV-logs/85.log -out Signals/Complete_RawCNV/85-OmniExpress.rawcnv
Run the trio CNV calling commands for each trio by running the following in the terminal:
    sbatch trio_cnv_batch
    sbatch trio_cnv_batch_part2
Sheet in V2_2024 Olfson_CNV_Calls_data_and commands associated with these commands: trio-command
Example of a command it will run:
    apptainer exec cnv.sif perl -trio -hmm lib/hhall.hmm -pfb hg_38.pfb -cnv Signals_Complete_Final/Complete_RawCNV/402-OmniExpress.rawcnv Signals_Complete_Final/207166300081_R07C02 Signals_Complete_Final/207166300081_R08C02 Signals_Complete_Final/207166300081_R09C02 -log Signals_Complete_Final/Complete_RawCNV-logs/402-OmniExpress.log -out Signals_Complete_Final/Complete_TrioCNV/402-OmniExpress.triocnv
Run the following in the terminal in the triocnv directory to concatenate the trio CNV calls and the sample logs:
    cd signal/TrioCNV
    cat *.triocnv > OCD73trioCNVconcat
    cd RawCNV-logs
    cat *.log > OCD73-sample-logs-concat
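A quick way to confirm that a concatenation picked up every file is to compare line counts before and after. The sketch below does this on toy files in a scratch directory; the file contents and the names under demo_dir are placeholders, not real calls.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo directory standing in for the TrioCNV directory.
demo_dir=$(mktemp -d)
printf 'call-a1\ncall-a2\n' > "$demo_dir/1-OmniExpress.triocnv"
printf 'call-b1\n' > "$demo_dir/2-OmniExpress.triocnv"

# Count the lines in every input file before concatenating.
expected=0
for f in "$demo_dir"/*.triocnv; do
    n=$(wc -l < "$f")
    expected=$((expected + n))
done

# Same concatenation as in the step above.
cat "$demo_dir"/*.triocnv > "$demo_dir/OCD73trioCNVconcat"
actual=$(( $(wc -l < "$demo_dir/OCD73trioCNVconcat") ))

if [ "$expected" -eq "$actual" ]; then
    echo "OK: $actual lines"
else
    echo "MISMATCH: expected $expected, got $actual" >&2
fi
```

The output file name must not match the input glob (here it does not end in .triocnv); otherwise a second run would fold the previous output back into itself.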
Quality control parameters:
apptainer exec cnv.sif perl -length 30k OCD73trioCNVconcat -output OCD_QC1
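To check that the length filter did what was intended, the calls below a threshold can be counted directly. This is a minimal sketch, assuming each call line carries a length= field with comma thousands separators (for example length=4,040); count_short, the toy samples, and the SNP names are made up for the demonstration.

```shell
#!/usr/bin/env bash
set -eu

# count_short FILE MIN_BP: number of calls whose length= field is below MIN_BP.
count_short() {
    awk -v min="$2" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^length=/) {
                    len = $i
                    sub(/^length=/, "", len)   # drop the field name
                    gsub(/,/, "", len)         # drop thousands separators
                    if (len + 0 < min) short++
                }
            }
        }
        END { print short + 0 }
    ' "$1"
}

# Toy call lines (hypothetical samples and SNP names).
demo_calls=$(mktemp)
cat > "$demo_calls" <<'EOF'
chr1:156783977-156788016 numsnp=6 length=4,040 state2,cn=1 demo_child.txt startsnp=rsA endsnp=rsB
chr3:1000000-1045000 numsnp=20 length=45,001 state5,cn=3 demo_child.txt startsnp=rsC endsnp=rsD
EOF

short_n=$(count_short "$demo_calls" 30000)
echo "calls shorter than 30000 bp: $short_n"
```

Run against OCD_QC1 with a threshold of 30000, the count should be zero if the 30k filter was applied.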
apptainer exec cnv.sif perl OCD_QC1 --qclogfile OCD73-sample-logs-concat --qclrrsd 0.3 --qcpassout sampleall.qcpass -qcsumout sampleall.qsum -out sampleall.goodcnvs
apptainer exec cnv.sif perl sampleall.goodcnvs imm_region -minqueryfrac 0.5 > cnvcall.imm; fgrep -v -f cnvcall.imm sampleall.goodcnvs > cnvcall.clean
apptainer exec cnv.sif perl cnvcall.clean centromeric_telomeric_regions -minqueryfrac 0.5 > cnvcall.imm; fgrep -v -f cnvcall.imm cnvcall.clean > sampleall.clean
apptainer exec cnv.sif perl sampleall.clean repeatmaskerregions.txt -minqueryfrac 0.5 > cnvcall.repeat; fgrep -v -f cnvcall.repeat sampleall.clean > sampleallclean.clean
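The region-exclusion steps share one pattern: write the overlapping calls to a file, then drop those lines from the call file with fgrep -v -f. The sketch below shows only the second half on toy data, assuming the overlap file holds complete call lines copied from the call file; grep -F is the current spelling of fgrep, and all file names and calls here are placeholders.

```shell
#!/usr/bin/env bash
set -eu

work=$(mktemp -d)

# Toy call file (hypothetical calls).
cat > "$work/calls.clean" <<'EOF'
chr1:100-200 numsnp=5 length=101 state2,cn=1 demo_a.txt
chr6:500-900 numsnp=9 length=401 state5,cn=3 demo_a.txt
chr9:300-800 numsnp=7 length=501 state2,cn=1 demo_b.txt
EOF

# Stand-in for the region-overlap output: the calls to be excluded,
# copied verbatim from the call file.
cat > "$work/calls.overlap" <<'EOF'
chr6:500-900 numsnp=9 length=401 state5,cn=3 demo_a.txt
EOF

# Keep every call line that does not contain any line of the overlap file.
grep -F -v -f "$work/calls.overlap" "$work/calls.clean" > "$work/calls.kept"

kept_n=$(( $(wc -l < "$work/calls.kept") ))
echo "kept $kept_n of 3 calls"
```

Because the patterns are fixed strings, a blank line in the overlap file would match every call and empty the output, so it is worth checking that the overlap file has no blank lines.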
awk '{print $2":"$3"-"$4}' genomicSuperDups.txt > formatted_dupgenomic_regions.txt
apptainer exec cnv.sif perl sampleallclean.clean formatted_dupgenomic_regions.txt -minqueryfrac 0.5 > cnvcall.dup; fgrep -v -f cnvcall.dup sampleallclean.clean > FINALcleanCNV.clean
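The awk reformatting used for the segmental duplication regions can be tried on a couple of rows before running it on the full table. This sketch assumes genomicSuperDups.txt is laid out as bin, chrom, chromStart, chromEnd followed by further columns, which is what the $2, $3, $4 in the awk command imply; the rows and duplication names are toy values.

```shell
#!/usr/bin/env bash
set -eu

work=$(mktemp -d)

# Two toy rows in the assumed layout: bin, chrom, chromStart, chromEnd, other columns.
printf '585\tchr1\t10000\t87112\tdemo_dup_1\n' > "$work/genomicSuperDups.txt"
printf '586\tchr2\t20000\t45000\tdemo_dup_2\n' >> "$work/genomicSuperDups.txt"

# Same awk command as in the step above.
awk '{print $2":"$3"-"$4}' "$work/genomicSuperDups.txt" > "$work/formatted_dupgenomic_regions.txt"

# Count output lines that do not look like chrom:start-end.
bad_n=$(awk '$0 !~ /^[^:]+:[0-9]+-[0-9]+$/ { bad++ } END { print bad + 0 }' \
    "$work/formatted_dupgenomic_regions.txt")
first_region=$(head -n 1 "$work/formatted_dupgenomic_regions.txt")
echo "first region: $first_region, malformed lines: $bad_n"
```

If the downloaded table starts with a header row, that row would show up as a malformed line in this check and should be removed before the overlap scan.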
sbatch polyNregion_download
awk '{print "chr"$1":"$2"-"$3}' human_g1k_v37-N.bed > human_g1k_v37-N.txt
apptainer exec cnv.sif perl FINALcleanCNV.clean human_g1k_v37-N.txt -minqueryfrac 0.5 > cnvcall.polyN; fgrep -v -f cnvcall.polyN FINALcleanCNV.clean > FINALQCCNV.clean
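To see how many calls each QC stage removed, the line counts of the intermediate files can be listed in pipeline order. The sketch below is one way to do that; stage_counts is a hypothetical helper, and the demo runs on two toy files in a scratch directory rather than on real output.

```shell
#!/usr/bin/env bash
set -eu

# stage_counts DIR: print "file<TAB>lines" for each QC stage file, in pipeline order.
stage_counts() {
    for name in OCD73trioCNVconcat OCD_QC1 sampleall.goodcnvs cnvcall.clean \
                sampleall.clean sampleallclean.clean FINALcleanCNV.clean FINALQCCNV.clean; do
        if [ -f "$1/$name" ]; then
            printf '%s\t%d\n' "$name" "$(( $(wc -l < "$1/$name") ))"
        else
            printf '%s\tmissing\n' "$name"
        fi
    done
}

# Demo with two toy stage files in a scratch directory.
work=$(mktemp -d)
printf 'a\nb\nc\n' > "$work/OCD73trioCNVconcat"
printf 'a\nb\n' > "$work/OCD_QC1"
report=$(stage_counts "$work")
printf '%s\n' "$report"
```

Calling stage_counts . from the directory that holds the QC files gives the real counts; a stage whose count is higher than the one before it points to a mix-up in file names.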
sbatch Merging_Large_CNVs
Annotation:
    apptainer exec cnv.sif perl merged_sampleall.clean -knowngene knownGene_hg38.txt -kgxref kgXref_hg38.txt > OlfsonCleanedAnnotated.rg38
Bed conversion:
    apptainer exec cnv.sif perl OlfsonCleanedAnnotated.rg38 -format bed -output Olfson_cleaned_rg38
Run frequency calculation: sbatch n /gpfs/gibbs/project/olfson/srg52/Project_3_Emily/CNV_Emily/Signals_Complete_Final/STOP2-AFQC/BED_CONVERSION
############# Large, rare CNV #############
############# De Novo Calling #############
Download the results and filter in Excel by "Offspring only". Re-upload the new spreadsheet.
Run the following script in RStudio: Merging_family_for_de_novo_calling.R, then download the resulting spreadsheet. Use this information to modify the commands in the sheet in Official_2024 Olfson_CNV_Calls_data_and commands associated with these commands: De_novo-command. The modified commands are then added, in batches, to the batch scripts below.
Run the de novo commands for each trio by running the following in the terminal:
    sbatch denovo_cnv_batch_part1
    sbatch denovo_cnv_batch_part2
    sbatch denovo_cnv_batch_part3
    sbatch denovo_cnv_batch_part4
    sbatch denovo_cnv_batch_part5
Example of a command it will run:
    apptainer exec cnv.sif perl -pfb hg_38.pfb -hmm lib/hhall.hmm -denovocn 3 Signals_Complete_Final/207221090005_R04C02 Signals_Complete_Final/207221090005_R05C02 Signals_Complete_Final/207221090005_R06C02 -start GSA-rs72917720 -end GSA-rs77445654 -log Signals_Complete_Final/Complete_DeNovo/log_207221090005_R06C02_chr2:193228891-193446437 -out Signals_Complete_Final/Complete_DeNovo/207221090005_R06C02_chr2:193228891-193446437
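Since the de novo commands are spread over five batch scripts, one possible way to divide a long command list is to deal the lines round-robin into five part files and paste each part into its script. This is only a sketch: de_novo_commands.txt and the denovo_commands_part names are hypothetical, and the commands here are placeholders.

```shell
#!/usr/bin/env bash
set -eu

work=$(mktemp -d)

# Hypothetical list of de novo commands, one per line (placeholders here).
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "echo placeholder-command-$i"
done > "$work/de_novo_commands.txt"

# Deal the lines round-robin into five part files.
awk -v n=5 -v prefix="$work/denovo_commands_part" '
    { out = prefix ((NR - 1) % n + 1) ".txt"; print > out }
' "$work/de_novo_commands.txt"

part1_n=$(( $(wc -l < "$work/denovo_commands_part1.txt") ))
total_n=$(( $(cat "$work"/denovo_commands_part*.txt | wc -l) ))
echo "part 1 holds $part1_n commands; $total_n commands in total"
```

Round-robin dealing keeps the parts within one line of each other in size; it does not preserve the original order within a part, which only matters if the commands depend on one another.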
Run in the terminal to select for variants marked as likely de novo:
    cat 20* > combined_file.csv
    cat log* > merged_data.csv
Run Unique_Denovo_Trios.R in /gpfs/gibbs/project/olfson/srg52/Project_3_Emily/CNV_Emily/Signals_Complete_Final/Complete_DeNovo