Major Release: Better Integration of PRSet
Update Log
General
- Standardize command line parameters. Any parameter that acts on a file other than the target now carries a prefix naming that file. For example, --base-info performs INFO score filtering on the base file, --ld-info performs INFO score filtering on the LD reference file, and --info performs INFO score filtering on the target file.
- Changed --cov-file and --pheno-file to --cov and --pheno because I am lazy.
- Removed --se and --prslice because we don't use those options. We might add them back when we introduce new functions.
- Add --id-delim to allow more flexible control of sample ID concatenation.
- --maf and --ld-maf calculations are now restricted to founders, similar to PLINK.
- Restructured the code to allow easier diagnosis
- Add full unit testing for some of the classes, such as Region and SNP. Don't have time for all other classes.
- Slight optimization of the GLM algorithm.
- Executables for OS X and Linux are now compiled with the Intel MKL library, which should provide some speed boost
- Fix some of the usage and log messages
- Updated and reorganized our user manual
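
To illustrate what restricting --maf and --ld-maf to founders means, here is a minimal Python sketch. It is not the PRSice implementation: the function name `founder_maf` and its inputs are hypothetical, and it assumes the PLINK convention that a founder is a sample whose paternal and maternal IDs are both "0".

```python
from typing import Optional, Sequence


def founder_maf(
    genotypes: Sequence[Optional[int]],
    fathers: Sequence[str],
    mothers: Sequence[str],
) -> Optional[float]:
    """Minor allele frequency computed from founders only.

    genotypes: allele counts (0, 1 or 2) per sample, None if missing.
    fathers / mothers: parental IDs per sample; "0" means unknown parent
    (assumed founder definition, following the PLINK .fam convention).
    """
    allele_sum = 0
    n_founders = 0
    for geno, pat, mat in zip(genotypes, fathers, mothers):
        if geno is None:
            continue  # missing genotype: skip this sample
        if pat != "0" or mat != "0":
            continue  # non-founder: excluded from the frequency
        allele_sum += geno
        n_founders += 1
    if n_founders == 0:
        return None  # no usable founder genotypes
    freq = allele_sum / (2 * n_founders)
    return min(freq, 1.0 - freq)


if __name__ == "__main__":
    # The last sample has known parents, so it does not contribute.
    print(founder_maf([0, 1, 0, 2], ["0", "0", "0", "P1"], ["0", "0", "0", "M1"]))
```

In this toy input the non-founder carries two copies of the counted allele, so including it would raise the estimated frequency; leaving it out is the point of the change.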
Default Changed
- Default for --clump-kb changed from --clump-kb 250K to --clump-kb 1M
- Default for --lower changed from --lower 0.0001 to --lower 5e-08
PRSet
- Add documentation for --wind-3 and --wind-5, which pad each genomic region at the 3' or 5' end respectively. (These have been available since 2.1.9, but we forgot to document them.)
- Combine --snp-set and --snp-sets into --snp-set. PRSet will now automatically detect whether the input contains one column (in which case the whole file is one gene set) or more than one column (in which case each row is one gene set).
- Add documentation for --background. Use --background to specify a background region for competitive p-value calculation.
- Add parameter --full-back, which tells PRSet to use the whole genome as the background.
Note: if --full-back and --background aren't provided, and --gtf and --set-perm are specified, we will use the GTF file to construct the background. If --gtf is missing, then we cannot perform competitive p-value calculation.
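
The two PRSet behaviours described above (layout detection for --snp-set and the choice of background) can be sketched in Python as follows. This is only an illustration of the rules as stated in these notes, not the actual PRSet code. The function names and return labels are hypothetical. The sketch also assumes that --full-back takes precedence over --background when both are given, and that the layout is judged from all non-empty lines; neither detail is stated above.

```python
from typing import Iterable, Optional


def snp_set_layout(lines: Iterable[str]) -> str:
    """Classify a --snp-set input by its number of columns.

    Returns "single_set" when every non-empty line has one column
    (the whole file is one gene set), otherwise "set_per_row"
    (each row is one gene set).
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        raise ValueError("empty SNP set input")
    if all(len(row) == 1 for row in rows):
        return "single_set"
    return "set_per_row"


def choose_background(
    full_back: bool,
    background: Optional[str],
    gtf: Optional[str],
    set_perm: bool,
) -> Optional[str]:
    """Pick the background source for the competitive p-value.

    Returns a label for the source, or None when no background can be
    built and the competitive p-value cannot be calculated.
    """
    if full_back:
        return "whole_genome"  # assumed to win over --background
    if background is not None:
        return "background_file"
    if gtf is not None and set_perm:
        return "gtf"  # fall back to regions from the GTF file
    return None


if __name__ == "__main__":
    print(snp_set_layout(["rs1", "rs2", "rs3"]))
    print(snp_set_layout(["SetA rs1 rs2", "SetB rs3"]))
    print(choose_background(False, None, "genes.gtf", True))
```

The file name `genes.gtf` in the usage lines is a placeholder.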
BGEN
- Change --hard-thres and --ld-hard-thres parameters. They are now used to specify the hardcall threshold, i.e. a hardcall is saved when the distance to the nearest hardcall is less than the hardcall threshold; otherwise a missing code is saved. See our documentation for more information.
- Add --dose-thres and --ld-dose-thres parameters. They are similar to our old --hard-thres: for any SNP, if the highest probability of any dosage is less than what's specified in --dose-thres, it will be set as missing.
- We have performed a manual check. Scores generated from PRSice when --hard is used are now identical to those generated from PLINK. Scores generated using dosage also have high correlation with those generated from --hard.
- Support both SNP_ID and RS_ID for BGEN format. If RS_ID is not found in base, we will try to match with SNP_ID.
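
The difference between the new --hard-thres and --dose-thres can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not PRSice's code: the function names are hypothetical, a genotype is given as three probabilities for 0, 1 and 2 copies of the counted allele, and the "distance to the nearest hardcall" is taken to be the absolute difference between the expected dosage and the nearest integer.

```python
from typing import Optional, Sequence


def hard_call(probs: Sequence[float], hard_thres: float) -> Optional[int]:
    """Hardcall (0, 1 or 2), or None for a missing code.

    probs: probabilities of carrying 0, 1 and 2 copies of the allele.
    A hardcall is kept only when the expected dosage lies strictly
    closer to an integer than hard_thres.
    """
    dosage = probs[1] + 2.0 * probs[2]
    nearest = round(dosage)
    if abs(dosage - nearest) < hard_thres:
        return int(nearest)
    return None


def passes_dose_thres(probs: Sequence[float], dose_thres: float) -> bool:
    """False when the highest genotype probability is below dose_thres,
    in which case the genotype would be set to missing."""
    return max(probs) >= dose_thres


if __name__ == "__main__":
    confident = (0.02, 0.03, 0.95)
    uncertain = (0.3, 0.4, 0.3)
    print(hard_call(confident, 0.1), passes_dose_thres(confident, 0.9))
    print(hard_call(uncertain, 0.1), passes_dose_thres(uncertain, 0.9))
```

The second genotype shows why the two filters are separate: its expected dosage is exactly 1, so it yields a hardcall under the distance rule, yet no single genotype probability is high enough to pass a strict --dose-thres.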