Add support for alternative chromosome styles #158
Merged
+309
−265
This update replaces the `rmv_chrPrefix` parameter in `format_sumstats()` with a `chr_style` parameter, as discussed in PR https://github.com/neurogenomics/MungeSumstats/pull/157.

The new `chr_style` parameter supports all four of the chromosome naming styles available in GenomeInfoDb (NCBI, UCSC, dbSNP, and Ensembl), with Ensembl being the default. Thus, the default setting matches the previous default behavior of MSS while more robustly handling improperly formatted chromosome names. I have also updated all relevant documentation, vignettes, and tests.

The updated functions worked as expected on my GWAS, but it would be great if you could also double-check the code.
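
For reference, here is a small base-R sketch of how the four styles differ for a few human chromosomes. The lookup table reflects my understanding of the human naming conventions in GenomeInfoDb rather than code from this PR, and `to_style()` is a hypothetical helper that is not part of MungeSumstats.

```r
# Hypothetical lookup table (assumption: values follow the human naming
# conventions as I understand them from GenomeInfoDb; only a few rows shown).
chr_styles <- data.frame(
  NCBI    = c("1", "2", "X", "Y", "MT"),
  UCSC    = c("chr1", "chr2", "chrX", "chrY", "chrM"),
  dbSNP   = c("ch1", "ch2", "chX", "chY", "chMT"),
  Ensembl = c("1", "2", "X", "Y", "MT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map Ensembl-style names to the requested style.
# Names that are not in the table come back as NA.
to_style <- function(chr, chr_style = "Ensembl") {
  chr_style <- match.arg(chr_style, names(chr_styles))
  idx <- match(chr, chr_styles$Ensembl)
  chr_styles[[chr_style]][idx]
}

to_style(c("1", "X", "MT"), chr_style = "UCSC")
```

Note that in this table the NCBI and Ensembl columns are identical, which is consistent with the description treating "Ensembl/NCBI" as a single default style.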
Note that the `chr_check()` function now serves to standardize chromosome names to the default Ensembl/NCBI style and to remove SNPs on nonstandard or user-specified chromosomes. Chromosome name mapping to `chr_style`, on the other hand, is performed directly in `format_sumstats()`, right before the final summary statistics data.table is written to disk. This order of operations is necessary because several of the checks that come after `chr_check()` (e.g. liftover) expect chromosome names to be formatted according to the default Ensembl/NCBI style, and I did not want to modify them.
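
The order of operations described above can be sketched roughly as follows. This is a simplified illustration in base R, not the code in this PR: `standardize_chr()`, `needs_ensembl_style()`, `apply_chr_style()`, and the `drop_chr` argument are all hypothetical names, a plain data.frame stands in for the data.table, and the final mapping step only handles the Ensembl/NCBI and UCSC styles.

```r
standard_chr <- c(as.character(1:22), "X", "Y", "MT")

# Hypothetical stand-in for the standardization step: strip "chr"/"ch"
# prefixes, normalize the mitochondrial name, then drop rows on nonstandard
# or user-specified chromosomes.
standardize_chr <- function(sumstats, drop_chr = character(0)) {
  chr <- toupper(as.character(sumstats$CHR))
  chr <- sub("^CHR?", "", chr)
  chr[chr == "M"] <- "MT"
  sumstats$CHR <- chr
  keep <- chr %in% setdiff(standard_chr, drop_chr)
  sumstats[keep, , drop = FALSE]
}

# Hypothetical stand-in for a downstream check (such as liftover) that
# assumes Ensembl/NCBI-style names and fails otherwise.
needs_ensembl_style <- function(sumstats) {
  stopifnot(all(sumstats$CHR %in% standard_chr))
  sumstats
}

# Hypothetical final step: map names to the requested style just before
# writing. Only "Ensembl"/"NCBI" (no change) and "UCSC" are handled here.
apply_chr_style <- function(sumstats, chr_style = "Ensembl") {
  if (chr_style == "UCSC") {
    sumstats$CHR <- paste0("chr", sub("^MT$", "M", sumstats$CHR))
  }
  sumstats
}

raw <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  CHR = c("chr1", "ch2", "chrM", "chrUn_gl000220"),
  stringsAsFactors = FALSE
)

clean   <- standardize_chr(raw, drop_chr = "MT")   # step 1: standardize and filter
checked <- needs_ensembl_style(clean)              # step 2: intermediate checks
out     <- apply_chr_style(checked, chr_style = "UCSC")  # step 3: map style last
out$CHR
```

Mapping to the requested style as the very last step means the intermediate checks never see anything other than the default style, so they can stay unchanged.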