Compare cbgen performance to bgen-reader-py #24

eric-czech · 2020-09-25T15:38:42Z

This will mainly help us understand whether using dask for parallelism on top of a thread-safe reader library has any obvious disadvantages compared with putting the parallelism in the reader library itself. Drawing that conclusion from two different libraries won't be ideal, but Carl had these numbers handy for bgen-reader-py, so we should make sure cbgen is comparable once #20 is done:
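
To make the "parallelism over a thread-safe reader" option concrete, here is a minimal sketch. It assumes a hypothetical `read_fn(start, stop)` that returns the probabilities for variants `[start, stop)` and is safe to call from several threads at once. This is not cbgen's actual API. The standard-library `ThreadPoolExecutor` stands in for dask's threaded scheduler, where each chunk would instead become a task (e.g. via `dask.delayed`). Threads only give a real speedup if the underlying reader releases the GIL while decoding, which is also an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical thread-safe reader signature: read_fn(start, stop) -> one row per variant.
ReadFn = Callable[[int, int], List[List[float]]]


def chunk_ranges(n_variants: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, n_variants) into contiguous (start, stop) variant ranges."""
    return [(s, min(s + chunk_size, n_variants)) for s in range(0, n_variants, chunk_size)]


def read_parallel(read_fn: ReadFn, n_variants: int, chunk_size: int, n_workers: int) -> List[List[float]]:
    """Caller-side parallelism: the reader itself is single-threaded but thread-safe,
    so independent variant chunks are read concurrently and concatenated in order."""
    ranges = chunk_ranges(n_variants, chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in input order, so blocks line up with `ranges`.
        blocks = list(pool.map(lambda r: read_fn(r[0], r[1]), ranges))
    return [row for block in blocks for row in block]


if __name__ == "__main__":
    # Hypothetical fake reader: one single-value "distribution" per variant.
    def fake_reader(start: int, stop: int) -> List[List[float]]:
        return [[float(i)] for i in range(start, stop)]

    rows = read_parallel(fake_reader, n_variants=10, chunk_size=4, n_workers=3)
    print(len(rows))
```

The alternative being compared (bgen-reader-py's approach) moves this chunking and thread management inside the reader library, so callers see a single call.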

> I added multithreading to the NumPy-inspired reader. Using this API on my 6-processor machine, reading from an SSD, I was able to read 109 variants/second (53 million distributions/second). This was on the file `merged_487400x220000.bgen`, which is meant to resemble the UK Biobank data.
>
> (Single-threaded performance is 31 variants/second and 15 million distributions/second. We also verified that the cbgen interface is thread-safe.)
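
The two throughput units in the quote are related by the number of samples: each variant carries one probability distribution per sample. The sketch below checks the quoted figures. It assumes the 487,400 in the file name is the sample count, which is inferred from the name and consistent with the quoted numbers. The speedup and per-core efficiency are simple derived ratios, not separately measured results.

```python
N_SAMPLES = 487_400  # assumed sample count, taken from merged_487400x220000.bgen
N_CORES = 6          # "6 processor machine" from the quote


def distributions_per_second(variants_per_second: int, n_samples: int) -> int:
    """Each variant holds one genotype probability distribution per sample."""
    return variants_per_second * n_samples


multi = distributions_per_second(109, N_SAMPLES)   # 53,126,600, i.e. "53 million"
single = distributions_per_second(31, N_SAMPLES)   # 15,109,400, i.e. "15 million"
speedup = 109 / 31                                 # about 3.52x over single-threaded
efficiency = speedup / N_CORES                     # about 0.59 of ideal linear scaling

print(f"{multi:,} {single:,} {speedup:.2f} {efficiency:.2f}")
```

These are the baseline numbers cbgen plus dask would need to roughly match on the same file and hardware.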
