Major Release: Better Integration of PRSet

@choishingwan choishingwan released this 17 May 14:54
· 850 commits to master since this release

Update Log

General

  • Standardized command-line parameters. Any parameter that acts on a file other than the target file now carries a prefix naming that file. For example, --base-info performs INFO score filtering on the base file, --ld-info performs INFO score filtering on the LD reference file, and --info performs INFO score filtering on the target file.
  • Changed --cov-file and --pheno-file to --cov and --pheno because I am lazy
  • Removed --se and --prslice because we don't use those options. We might add them back when we introduce new functions
  • Added --id-delim to allow more flexible control of sample ID concatenation
  • --maf and --ld-maf calculations are now restricted to founders, similar to PLINK.
  • Restructured the code to allow easier diagnosis
  • Added full unit tests for some of the classes, such as Region and SNP. We have not had time to cover the other classes.
  • Slight optimization of the GLM algorithm.
  • Executables for OS X and Linux are now compiled with the Intel MKL library, which should provide some speed boost
  • Fixed some of the usage and log messages
  • Updated and reorganized our user manual

Default Changed

  • Default for --clump-kb changed from --clump-kb 250K to --clump-kb 1M
  • Default for --lower changed from --lower 0.0001 to --lower 5e-08

PRSet

  • Added documentation for --wind-3 and --wind-5, which pad each genomic region at the 3' or 5' end respectively. (These have been available since 2.1.9, but we forgot to document them.)
  • Combined --snp-set and --snp-sets into --snp-set. PRSet now automatically detects whether the input contains one column (in which case the whole file is one gene set) or more than one column (in which case each row is one gene set).
  • Added documentation for --background. Use --background to specify a background region for competitive p-value calculation
  • Added the --full-back parameter, which tells PRSet to use the whole genome as the background
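
The one-column versus multi-column detection described for --snp-set can be illustrated with a minimal Python sketch. This is not PRSet's actual code: the function name parse_snp_sets, the placeholder set name "snp_set", whitespace-delimited input, and treating the first field of a multi-column row as the set name are all assumptions made for illustration.

```python
def parse_snp_sets(lines, single_set_name="snp_set"):
    """Return {set_name: [snp_id, ...]} from the lines of a --snp-set style file.

    Hypothetical helper, not part of PRSice. Assumes whitespace-delimited
    fields and, for multi-column input, that the first field of each row is
    the set name and the remaining fields are SNP IDs.
    """
    # Split every non-empty line into its fields.
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return {}
    if all(len(row) == 1 for row in rows):
        # One column: the whole file is a single set.
        return {single_set_name: [row[0] for row in rows]}
    # More than one column: each row is its own set.
    return {row[0]: row[1:] for row in rows}


single = parse_snp_sets(["rs1", "rs2", "rs3"])
multi = parse_snp_sets(["SetA rs1 rs2", "SetB rs3"])
```

Here `single` holds one set with three SNPs, while `multi` holds two sets keyed by the first field of each row.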

Note: if neither --full-back nor --background is provided, and both --gtf and --set-perm are specified, we will use the GTF file to construct the background. If --gtf is also missing, we cannot perform the competitive p-value calculation.
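
The fallback order in the note can be written out as a small decision function. The sketch below is only an illustration in Python, not PRSet's implementation: the function name choose_background and its return strings are hypothetical, and the precedence of --full-back over --background when both are supplied is an assumption, since the note does not cover that case.

```python
def choose_background(full_back=False, background=None, gtf=None, set_perm=None):
    """Describe which background would be used for the competitive p-value.

    Hypothetical helper mirroring the note. Returns None when no background
    can be constructed from the given options.
    """
    if full_back:
        # Assumption: --full-back wins if both it and --background are given.
        return "whole genome"
    if background is not None:
        return f"background file: {background}"
    if gtf is not None and set_perm is not None:
        # Neither --full-back nor --background: fall back to the GTF file.
        return f"GTF file: {gtf}"
    # No usable source: the competitive p-value cannot be calculated.
    return None


print(choose_background(gtf="genes.gtf", set_perm=10000))
print(choose_background(set_perm=10000))
```

The file name genes.gtf and the permutation count are placeholder values.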

BGEN

  • Changed the --hard-thres and --ld-hard-thres parameters. They are now used to specify the hardcall threshold, i.e. a hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold; otherwise a missing code is saved. See our documentation for more information
  • Added the --dose-thres and --ld-dose-thres parameters. They are similar to our old --hard-thres: for any SNP, if the highest probability of any dosage is less than the value specified in --dose-thres, it will be set as missing.
  • We have performed a manual check. Scores generated from PRSice when --hard is used are now identical to those generated from PLINK. Scores generated using dosage also have high correlation with those generated from --hard.
  • Supports both SNP_ID and RS_ID for the BGEN format. If the RS_ID is not found in the base file, we will try to match with the SNP_ID
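
To make the difference between the two BGEN thresholds concrete, here is a rough Python sketch of how one sample's genotype probabilities could be turned into a hardcall. It is an illustration under assumptions, not PRSice's code: the function name hard_call is hypothetical, the expected dosage is taken as P(1 copy) + 2 × P(2 copies), the dose threshold is checked before the hardcall threshold, and the threshold values in the usage lines are arbitrary.

```python
def hard_call(probs, hard_thres, dose_thres):
    """Return a hardcall (0, 1 or 2) or None for a missing genotype.

    probs is (P(0 copies), P(1 copy), P(2 copies)) of the counted allele.
    Hypothetical helper; the order in which the two filters are applied is
    an assumption.
    """
    # --dose-thres style filter: missing if no genotype is probable enough.
    if max(probs) < dose_thres:
        return None
    # Expected dosage of the counted allele.
    dosage = probs[1] + 2.0 * probs[2]
    nearest = round(dosage)
    # --hard-thres style filter: keep the call only when the dosage lies
    # within the threshold of the nearest hardcall.
    if abs(dosage - nearest) < hard_thres:
        return nearest
    return None


confident = hard_call((0.02, 0.95, 0.03), hard_thres=0.1, dose_thres=0.9)
ambiguous = hard_call((0.5, 0.5, 0.0), hard_thres=0.1, dose_thres=0.0)
```

In this sketch `confident` is a heterozygous call, while `ambiguous` is missing because its dosage of 0.5 sits too far from both 0 and 1.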