gwas_tools.py is very slow #1

Open
ofrei opened this issue Feb 10, 2023 · 1 comment

ofrei (Owner) commented Feb 10, 2023

Repro steps: download the prerequisites as described in http://localhost:8891/notebooks/gwas101.ipynb, then execute:

import os
from gwas_tools import read_bed, write_bed
os.system('plink2 --bfile chr21 --chr 21 --from-kb 32000 --to-kb 34000 --make-bed --out chr21_chunk')
geno = read_bed('chr21_chunk')
write_bed(geno, 'chr21_chunk_copy.bed')

This is quite slow. Can read_bed & write_bed be optimized?
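
To see where the time goes, one option is the standard-library profiler. The sketch below is not part of gwas_tools: `profile_call` and `slow_sum` are hypothetical names used only for illustration, and `slow_sum` is a stand-in workload with a per-element Python loop.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report text).

    The report is sorted by cumulative time and limited to `top` rows.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_sum(n):
    # Stand-in workload: one Python-level operation per element.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_sum, 100_000)
print(report)
```

With gwas_tools importable, the same helper could presumably wrap `read_bed` and `write_bed` directly, passing their usual arguments after the function.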

@espenhgn

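I won't try to optimize this immediately, but I ran a quick profile that exposes the main culprits: get_snp_geno (with about 20.7 million memmap __getitem__ calls) in read_bed, and the generator expression inside get_snp_bytes in write_bed.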

geno = read_bed('chr21_chunk')
----
20768929 function calls (20768861 primitive calls) in 49.810 seconds

   Ordered by: cumulative time
   List reduced from 416 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   49.810   49.810 {built-in method builtins.exec}
        1    0.002    0.002   49.810   49.810 <string>:1(<module>)
        1    0.380    0.380   49.808   49.808 gwas_tools.py:55(read_bed)
     8275   31.166    0.004   48.885    0.006 gwas_tools.py:28(get_snp_geno)
 20704050   17.472    0.000   17.647    0.000 memmap.py:333(__getitem__)
        2    0.000    0.000    0.530    0.265 _decorators.py:170(wrapper)
      4/2    0.000    0.000    0.530    0.265 _decorators.py:308(wrapper)
        2    0.000    0.000    0.530    0.265 readers.py:854(read_csv)
        2    0.000    0.000    0.530    0.265 readers.py:571(_read)
        2    0.000    0.000    0.337    0.169 readers.py:1395(__init__)
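
The roughly 20.7 million `memmap.__getitem__` calls suggest the genotypes are being read one element at a time. A possible direction, sketched here without having seen the gwas_tools source, is to unpack all 2-bit codes with NumPy array operations. `decode_bed_bytes` is a hypothetical helper, and the output convention (count of the first .bim allele, -1 for missing) is an assumption that may differ from what `read_bed` returns.

```python
import numpy as np

# PLINK 1 .bed, SNP-major layout: 3 magic bytes, then ceil(n_samples / 4)
# bytes per variant. Each byte packs 4 samples, lowest-order bit pair first.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# 2-bit code -> count of the first .bim allele (assumed output convention):
# 0b00 hom. first allele, 0b01 missing, 0b10 heterozygous, 0b11 hom. second.
CODE_TO_DOSAGE = np.array([2, -1, 1, 0], dtype=np.int8)


def decode_bed_bytes(raw, n_snps, n_samples):
    """Decode packed genotype bytes (magic bytes already stripped)
    into an (n_snps, n_samples) int8 array."""
    bytes_per_snp = (n_samples + 3) // 4
    packed = np.frombuffer(raw, dtype=np.uint8).reshape(n_snps, bytes_per_snp)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    # Broadcast each byte against the four shifts, then mask to 2 bits.
    codes = (packed[:, :, None] >> shifts) & 3
    # Flatten the 4 codes per byte and drop the padding samples.
    codes = codes.reshape(n_snps, bytes_per_snp * 4)[:, :n_samples]
    return CODE_TO_DOSAGE[codes]


# One variant, four samples packed into the single byte 0b11_10_01_00.
example = decode_bed_bytes(bytes([0xE4]), n_snps=1, n_samples=4)
print(example)
```

The packed bytes could come from `np.memmap(path, dtype=np.uint8, mode='r', offset=3)` after checking the three magic bytes, so that the whole chunk is decoded in a handful of array operations rather than one index per genotype.
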
write_bed(geno, 'chr21_chunk_copy.bed')
----
         20722114 function calls in 40.105 seconds

   Ordered by: cumulative time
   List reduced from 29 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   40.105   40.105 {built-in method builtins.exec}
        1    0.000    0.000   40.104   40.104 <string>:1(<module>)
        1    0.102    0.102   40.104   40.104 gwas_tools.py:87(write_bed)
     8275    2.999    0.000   38.640    0.005 gwas_tools.py:42(get_snp_bytes)
 20695775   35.636    0.000   35.636    0.000 gwas_tools.py:48(<genexpr>)
     8276    1.355    0.000    1.355    0.000 {method 'write' of '_io.BufferedWriter' objects}
     8293    0.004    0.000    0.004    0.000 {built-in method builtins.len}
        9    0.000    0.000    0.004    0.000 {built-in method builtins.print}
       18    0.000    0.000    0.004    0.000 iostream.py:526(write)
       18    0.000    0.000    0.003    0.000 iostream.py:456(_schedule_flush)
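
On the write side, nearly all of the time is in the generator expression at gwas_tools.py:48, which runs once per genotype. A vectorized packing step might look like the sketch below. `encode_bed_bytes` is a hypothetical helper, not the existing `get_snp_bytes`, and it assumes genotypes are stored as counts of the first .bim allele (2/1/0) with -1 for missing, which may not match the actual `geno` object.

```python
import numpy as np

# PLINK 1 .bed magic bytes for the SNP-major layout.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def encode_bed_bytes(dosage):
    """Pack an (n_snps, n_samples) array of allele counts into .bed bytes.

    Assumed input convention: 2/1/0 = count of the first .bim allele,
    anything else (e.g. -1) = missing.
    """
    dosage = np.asarray(dosage)
    n_snps, n_samples = dosage.shape
    # Start with every entry as 0b01 (missing), then fill in called genotypes.
    codes = np.full(dosage.shape, 0b01, dtype=np.uint8)
    codes[dosage == 2] = 0b00
    codes[dosage == 1] = 0b10
    codes[dosage == 0] = 0b11
    # Pad the sample axis with zero bits to a multiple of 4.
    pad = (-n_samples) % 4
    codes = np.pad(codes, ((0, 0), (0, pad)))
    quads = codes.reshape(n_snps, -1, 4)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    # First sample goes into the lowest-order bit pair of each byte.
    packed = np.bitwise_or.reduce(quads << shifts, axis=2)
    return BED_MAGIC + packed.astype(np.uint8).tobytes()


payload = encode_bed_bytes([[2, -1, 1, 0, 0], [2, 2, 2, 2, 1]])
print(payload.hex())
```

The returned bytes can be written with a single `write` call, which would also replace the 8276 separate writes seen in the profile.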
