This repository has been archived by the owner on Oct 15, 2020. It is now read-only.
This will primarily help us understand whether using dask for parallelism over a thread-safe reader library has any obvious disadvantages compared with putting the parallelism in the reader library itself. Drawing this conclusion from two different libraries isn't ideal, but Carl had these numbers handy for bgen-reader-py, so we should make sure cbgen is comparable once #20 is done:
I added multithreading to the NumPy-inspired reader. Using this API on my 6-processor machine, reading from an SSD, I was able to read 109 variants/second (53 million distributions/second). This was on the file ‘merged_487400x220000.bgen’, which is meant to resemble the UK Biobank data.
(Single-threaded performance is 31 variants/second and 15 million distributions/second. We also verified that the cbgen interface is thread-safe.)
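To make the cbgen comparison concrete, here is a minimal sketch of the "parallelism outside the reader" side: a thread pool fans per-variant reads out over a thread-safe read function, roughly what dask's threaded scheduler does when each task reads one chunk. `read_variant`, `benchmark_reader` and `speedup_and_efficiency` are hypothetical names, not cbgen or bgen-reader-py APIs, and `concurrent.futures.ThreadPoolExecutor` stands in for dask to keep the example self-contained. It assumes the reader releases the GIL while decompressing and decoding; otherwise the threads will not overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple


def benchmark_reader(
    read_variant: Callable[[int], Sequence],
    variant_indices: Sequence[int],
    n_threads: int,
) -> Tuple[float, float]:
    """Read every variant in `variant_indices` using `n_threads` threads.

    `read_variant` is a hypothetical stand-in for a thread-safe reader call
    (not a real cbgen or bgen-reader-py function). It is assumed to return
    one probability distribution per sample for the requested variant.
    Returns (variants per second, distributions per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each task reads one variant; we only keep the number of
        # distributions it returned, not the data itself.
        n_distributions = sum(
            len(probs) for probs in pool.map(read_variant, variant_indices)
        )
    elapsed = time.perf_counter() - start
    return len(variant_indices) / elapsed, n_distributions / elapsed


def speedup_and_efficiency(
    parallel_rate: float, serial_rate: float, n_cores: int
) -> Tuple[float, float]:
    """Speedup over the single-threaded rate, and speedup per core."""
    speedup = parallel_rate / serial_rate
    return speedup, speedup / n_cores
```

With the numbers above, `speedup_and_efficiency(109, 31, 6)` gives a speedup of about 3.5x, or roughly 59% parallel efficiency on 6 cores, which is the baseline a dask-over-cbgen run would need to match. The reported rates also imply about 487,000 distributions per variant (53 million / 109), consistent with the first dimension of the file name if that is the sample count.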