Database I/O and conformer/fingerprint storage #36
The most comprehensive benchmarks we've run with E3FP are in Table S3 and Figure S10 of the supplement of the paper. I've included them below. The code that ran these benchmarks is here: https://github.com/keiserlab/e3fp-paper/tree/master/project/benchmark.

As you can see, we haven't rigorously benchmarked on more than 308,315 molecules (ChEMBL20). The runtime should scale linearly with database size: when we scaled from 10,000 to 308,315 molecules, E3FP still took on average ~83s per molecule for conformer generation and ~0.7s per molecule for fingerprinting. Fingerprinting runtime scales sub-linearly with the number of heavy atoms, but conformer generation scales super-linearly with it. So if your database contains very large, flexible molecules (e.g. peptides), conformer generation for these will tend to take a long time, and that could tie up all of your processors.
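To make the linear-scaling point concrete, here is a rough back-of-envelope sketch in Python. It is not part of E3FP: the function name `estimate_wall_hours` and the `n_cores` parameter are hypothetical, and it assumes the quoted per-molecule averages (~83s and ~0.7s) hold for your molecules and that the work parallelizes perfectly across cores, neither of which is guaranteed.

```python
# Rough runtime estimate from the per-molecule averages quoted above.
# Assumptions (not from E3FP itself): averages carry over to your
# dataset, runtime is linear in molecule count, and parallel speedup
# across cores is perfect.

CONFGEN_S_PER_MOL = 83.0  # approx. average conformer generation time (s)
FPRINT_S_PER_MOL = 0.7    # approx. average fingerprinting time (s)


def estimate_wall_hours(n_molecules: int, n_cores: int = 1) -> float:
    """Return estimated wall-clock hours for n_molecules on n_cores."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    total_cpu_seconds = n_molecules * (CONFGEN_S_PER_MOL + FPRINT_S_PER_MOL)
    return total_cpu_seconds / n_cores / 3600.0


if __name__ == "__main__":
    for n in (308_315, 1_000_000, 1_000_000_000):
        hours = estimate_wall_hours(n, n_cores=100)
        print(f"{n:>13,} molecules on 100 cores: ~{hours:,.1f} h")
```

Under these assumptions, one million molecules on 100 cores comes to roughly 232.5 hours (a bit under 10 days), almost all of it conformer generation. A database dominated by large, flexible molecules would likely take longer than this estimate suggests.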
Regarding storage sizes, I haven't run any benchmarks in this area. E3FP's default storage approach is described here. Since it's just a light wrapper of a |
I'd like to use e3fp fingerprints on a very large database of molecules (~millions, possibly billions).
I was wondering if you had any benchmarks on speed and conformer/fingerprint storage sizes. What's the largest dataset you've applied this to?
Thanks!