This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Compare cbgen performance to bgen-reader-py #24

Open
eric-czech opened this issue Sep 25, 2020 · 0 comments

This comparison should mainly help us understand whether using dask for parallelism on top of a thread-safe reader library has any obvious disadvantages compared with putting the parallelism in the reader library itself. Comparing two different libraries is not an ideal way to draw that conclusion, but Carl had the following numbers handy for bgen-reader-py, so we should make sure cbgen is comparable once #20 is done:

> I added multithreading to the NumPy-inspired reader. Using this API, on my 6-processor machine reading from an SSD, I was able to read 109 variants/second (53 million distributions/second). This was on the file ‘merged_487400x220000.bgen’, which tries to be like the UK Biobank data.
> (Single-threaded performance is 31 variants/second and 15 million distributions/second. We also verified that the cbgen interface is thread-safe.)
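
To make the cbgen comparison apples-to-apples, the measurement could be set up roughly as below. This is only a sketch: `read_variant_stub` is a hypothetical stand-in for whatever thread-safe per-variant read call ends up being benchmarked (it is not the cbgen or bgen-reader-py API), and the sample count of 487,400 is assumed from the file name `merged_487400x220000.bgen`. That assumption is at least consistent with the quoted figures, since 109 variants/second times 487,400 samples is about 53 million distributions/second.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: the file name encodes <samples>x<variants>.
N_SAMPLES = 487_400


def distributions_per_second(variants_per_second, n_samples=N_SAMPLES):
    """Convert variant throughput into per-sample distribution throughput."""
    return variants_per_second * n_samples


def read_variant_stub(index):
    """Hypothetical placeholder for a thread-safe read of one variant.

    Replace with the real reader call; the sleep only simulates I/O latency.
    """
    time.sleep(0.001)
    return index


def measure_variants_per_second(read_fn, n_variants, n_threads):
    """Time n_variants reads using n_threads worker threads."""
    start = time.perf_counter()
    if n_threads == 1:
        results = [read_fn(i) for i in range(n_variants)]
    else:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            results = list(pool.map(read_fn, range(n_variants)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


if __name__ == "__main__":
    for n_threads in (1, 6):
        rate = measure_variants_per_second(read_variant_stub, 200, n_threads)
        print(n_threads, "threads:", round(rate), "variants/s,",
              round(distributions_per_second(rate)), "distributions/s")
```

For reference, the quoted numbers imply a multithreaded speedup of roughly 109 / 31 ≈ 3.5x on 6 processors, which is the figure a dask-over-cbgen setup would need to approach.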
