
Set based permutation speed up, memory mapping and more

@choishingwan released this 11 Sep 15:44
· 600 commits to master since this release

Update Log

  • PRSice can now recognize gz files without the gz suffix
  • Added better checks for parameters related to distance
  • Cleaned up some code to make the code base more readable
  • Fixed some memory leaks related to #128, #131, #137
  • Completely remove Pearson Correlation clumping. We don't have the manpower to maintain the code base, and Pearson correlation clumping does not provide enough benefit for us to consider supporting it.
  • Added a memory mapping feature (--enable-mmap). When a large amount of memory is available and all genotypes are stored in the same file, --enable-mmap might help to speed things up a bit
  • Updated the linear regression code, which now adopts code from RcppEigen
  • Updated internal variable types for all-score printing. We can now generate an all-score file for more samples and thresholds. We can allow 5.270498e+17 samples if there is one threshold, or 1.844674e+18 thresholds if there are 500k samples.
  • Updated internal variable types for p-value thresholding. Previously, if a user required an ultra-small step size (e.g. < 1e-20), PRSice would generate abnormal thresholds (see here). To accommodate this use case, PRSice now detects whether the number of thresholds required exceeds what we can store and, if so, uses a slower alternative to generate the thresholds.
  • Set-based permutation was too slow to be practical. We now use some algebraic tricks to speed up the process. For more details, refer to our full manual.
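
To illustrate the p-value thresholding change, here is a minimal Python sketch of the general idea: count how many thresholds a given step size implies, and switch to a slower, step-by-step generator when that count exceeds what a fixed-width index can hold. This is not PRSice's actual implementation. The function names, the 64-bit limit `MAX_INDEXED`, and the use of exact fractions are all assumptions made for illustration.

```python
from fractions import Fraction

# Assumed capacity of the index type (hypothetical: an unsigned 64-bit integer).
MAX_INDEXED = 2**64 - 1


def count_thresholds(lower, upper, step):
    """Count the thresholds lower, lower + step, ... that do not exceed upper."""
    lower, upper, step = Fraction(lower), Fraction(upper), Fraction(step)
    if step <= 0 or upper < lower:
        raise ValueError("step must be positive and upper must be >= lower")
    # Floor division of two Fractions yields an exact Python int.
    return (upper - lower) // step + 1


def indexed_thresholds(lower, step, count):
    """Fast path: threshold i is computed directly from its index."""
    for i in range(count):
        yield float(lower + i * step)


def iterative_thresholds(lower, upper, step):
    """Slow path: walk from lower to upper without relying on an index."""
    current = lower
    while current <= upper:
        yield float(current)
        current += step


def thresholds(lower, upper, step, max_indexed=MAX_INDEXED):
    """Return (mode, generator), choosing the path by the required count."""
    count = count_thresholds(lower, upper, step)
    lower, upper, step = Fraction(lower), Fraction(upper), Fraction(step)
    if count <= max_indexed:
        return "indexed", indexed_thresholds(lower, step, count)
    return "iterative", iterative_thresholds(lower, upper, step)


if __name__ == "__main__":
    mode, values = thresholds("0", "1", "0.25")
    print(mode, list(values))
    # A step of 1e-20 over [0, 0.5] needs more thresholds than 64 bits can index.
    mode, _ = thresholds("0", "0.5", "1e-20")
    print(mode)
```

Passing the bounds and step as strings keeps the arithmetic exact, so the count is not affected by floating-point rounding at very small step sizes.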