Performance problems with large insertions #9

martijnvermaat · 2015-05-31T15:56:24Z

As you can see, comparing AAAA... with A is instant, but comparing A with AAAA... takes a lot of time:

In [5]: %timeit extractor.describe_dna('A' * 10000, 'A')
10000 loops, best of 3: 129 µs per loop

In [6]: %timeit extractor.describe_dna('A', 'A' * 10000)
1 loops, best of 3: 1.13 s per loop

Perhaps more importantly, memory usage also skyrockets. I couldn't run this test with a 50 kbp sample sequence on a machine with 4 GB of memory; it completely froze the machine for half a minute. I would like to prevent this from happening on the server.

I didn't look into this further, but I suspect it tries to find the inserted sequence in the original sequence, which of course is not possible. Could this be an easy case to optimize?
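Below is a minimal sketch of what such a fast path might look like. It assumes the check runs before the full extractor. The sketch trims the longest common prefix and suffix of the two sequences. If either remaining core is empty, the difference is a single insertion or deletion, and no search for the inserted sequence is needed. `trim_common_affixes` and `simple_case` are hypothetical helpers, not part of the extractor's API. The tuples they return are plain position summaries, not HGVS descriptions; for example, there is no dup detection.

```python
from typing import Optional, Tuple


def trim_common_affixes(reference: str, observed: str) -> Tuple[int, str, str]:
    """Hypothetical helper: strip the longest common prefix and suffix.

    Returns (offset, ref_core, obs_core), where offset is the length of the
    common prefix, i.e. the number of reference bases before the cores.
    Trimming the prefix first shifts an indel as far 3' as possible.
    """
    # Longest common prefix.
    prefix = 0
    limit = min(len(reference), len(observed))
    while prefix < limit and reference[prefix] == observed[prefix]:
        prefix += 1
    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    limit -= prefix
    while suffix < limit and reference[-1 - suffix] == observed[-1 - suffix]:
        suffix += 1
    ref_core = reference[prefix:len(reference) - suffix]
    obs_core = observed[prefix:len(observed) - suffix]
    return prefix, ref_core, obs_core


def simple_case(reference: str, observed: str) -> Optional[tuple]:
    """Hypothetical fast path, run before the expensive extraction.

    Handles identity, pure insertions and pure deletions in linear time;
    returns None when the full extractor is still needed.
    """
    offset, ref_core, obs_core = trim_common_affixes(reference, observed)
    if not ref_core and not obs_core:
        return ("identity",)
    if not ref_core:
        # obs_core is inserted after reference position `offset` (1-based).
        return ("ins", offset, obs_core)
    if not obs_core:
        # Reference positions offset + 1 .. offset + len(ref_core) are deleted.
        return ("del", offset + 1, offset + len(ref_core))
    return None  # Anything else: fall back to the full extractor.
```

With this check in place, `simple_case('A', 'A' * 10000)` returns `("ins", 1, 'A' * 9999)` after a single linear scan. A real implementation would still have to turn that result into a proper description, for example by recognizing it as a duplication.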
