How do I create synthetic VCF data to test the scoring engine across multiple scoring models? #289

Answered by nebfield
tiwalayo asked this question in Q&A
I'm not an expert (that's @smlmbrt!), but HAPNEST might be helpful to get you started making synthetic genomes:

https://academic.oup.com/bioinformatics/article/39/9/btad535/7255913

https://github.com/intervene-EU-H2020/synthetic_data has lots of configuration options, including limiting the generated variants to a specific subset.

It won't output a VCF, but it's easy to create one from the HAPNEST output with plink2 (see --export vcf, the plink2 counterpart of plink 1.9's --recode vcf).
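
Here is a rough sketch of that conversion step in Python, assuming the HAPNEST output is a PLINK bed/bim/fam fileset and that plink2 is installed and on your PATH. The function names and the file prefixes in the usage note are made up for illustration; the only plink2 options used are --bfile, --export vcf (with the bgz modifier) and --out.

```python
import shutil
import subprocess
from pathlib import Path


def build_plink2_vcf_command(bfile_prefix, out_prefix, bgzip=True):
    """Build the plink2 argument list that exports a bed/bim/fam fileset as VCF.

    bfile_prefix: path prefix shared by the .bed/.bim/.fam files (assumed input).
    out_prefix:   path prefix for the output; plink2 appends the extension.
    bgzip:        if True, ask plink2 for a bgzip-compressed .vcf.gz.
    """
    export_args = ["vcf", "bgz"] if bgzip else ["vcf"]
    return [
        "plink2",
        "--bfile", str(bfile_prefix),
        "--export", *export_args,
        "--out", str(out_prefix),
    ]


def convert_to_vcf(bfile_prefix, out_prefix, bgzip=True):
    """Run plink2 and return the path of the VCF it is expected to write."""
    if shutil.which("plink2") is None:
        raise FileNotFoundError("plink2 was not found on PATH")
    cmd = build_plink2_vcf_command(bfile_prefix, out_prefix, bgzip=bgzip)
    # check=True raises CalledProcessError if plink2 exits with an error.
    subprocess.run(cmd, check=True)
    suffix = ".vcf.gz" if bgzip else ".vcf"
    return Path(f"{out_prefix}{suffix}")
```

For example, convert_to_vcf("synthetic_chr22", "synthetic_chr22") would run plink2 on a hypothetical synthetic_chr22.bed/.bim/.fam fileset and return the path synthetic_chr22.vcf.gz. Building the command separately from running it makes it easy to print and check the exact plink2 call before generating data for every chromosome.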

Answer selected by smlmbrt