Incorrect results when reading sequences from files #9

Open
threadless-screw opened this issue Mar 28, 2019 · 1 comment

threadless-screw commented Mar 28, 2019

I ran into some issues with the pairwise alignment of sequences read from different files. Although the exact issue varies from case to case, it seems that the first alignment produces the correct result, while further alignments generally do not. Here's an example:

If a first file "f1" contains the following (identical) sequences:

NEEDLE
NEEDLE
NEEDLE

and a second file "f2" contains the following (identical) sequences:

HAYSTACK NEEDLE HAYSTACK
HAYSTACK NEEDLE HAYSTACK
HAYSTACK NEEDLE HAYSTACK

then running the command "smith_waterman --files f1 f2" returns the output:

== Alignment 0 lengths (6, 24):

hit 0.0 score: 12
  NEEDLE  [pos: 0; len: 6]
  NEEDLE  [pos: 9; len: 6]

==
== Alignment 1 lengths (6, 24):

==
== Alignment 2 lengths (6, 24):

==

The results for Alignment 0 are correct, but Alignments 1 and 2 should report the same hit and instead report none. Interestingly, the dynamic programming matrices (printed with the --printmatrices option) are identical for all three alignments, as they should be. The problem is therefore unlikely to lie in the matrix computation itself and more likely sits in a later step, where hits are retrieved or reported.

threadless-screw (Author) commented

I have unfortunately not gotten to the bottom of the issue. As far as I can tell, the re-use of the sw and result variables (see sw_cmdline.c) across the sequences read from the files is at the heart of it: a previous iteration appears to leave state behind that affects the next one. In practice, the problem can be worked around by wrapping each align() call in its own initialization and destruction routines, like so:

    sw = smith_waterman_new();
    result = alignment_create(256);

    align( ....);
  
    smith_waterman_free(sw);
    alignment_free(result);

Since this somewhat wasteful solution is no more than a patch that leaves the underlying issue unresolved, I have not made a pull request.
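
To illustrate the kind of failure I suspect, here is a minimal toy sketch in C. It is not seq-align code: toy_aligner, toy_align_buggy, toy_align_fixed, toy_fetch and toy_count_hits are hypothetical names made up for this example, and I have not confirmed that the real bug works this way. The sketch only shows how per-alignment bookkeeping that survives a re-used object can yield correct matrices but no hits from the second alignment onwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CELLS 64

/* Hypothetical aligner state; NOT the struct used by seq-align. */
typedef struct {
    int scores[MAX_CELLS];     /* stand-in for the DP matrix */
    bool reported[MAX_CELLS];  /* cells already returned as hits */
    size_t n;                  /* number of cells in use */
} toy_aligner;

/* Refill the "matrix" for a new alignment.
 * Deliberately forgets to clear `reported`, so bookkeeping from the
 * previous alignment leaks into this one. */
void toy_align_buggy(toy_aligner *a, const int *scores, size_t n)
{
    if (n > MAX_CELLS) n = MAX_CELLS;
    a->n = n;
    memcpy(a->scores, scores, n * sizeof(int));
}

/* Same refill, but also resets the per-alignment bookkeeping. */
void toy_align_fixed(toy_aligner *a, const int *scores, size_t n)
{
    toy_align_buggy(a, scores, n);
    memset(a->reported, 0, sizeof(a->reported));
}

/* Return the index of the best not-yet-reported cell whose score is
 * at least min_score, marking it as reported; return -1 if none. */
int toy_fetch(toy_aligner *a, int min_score)
{
    int best = -1;
    for (size_t i = 0; i < a->n; i++) {
        if (a->reported[i] || a->scores[i] < min_score) continue;
        if (best < 0 || a->scores[i] > a->scores[best]) best = (int)i;
    }
    if (best >= 0) a->reported[best] = true;
    return best;
}

/* Drain all hits for the current alignment and count them. */
int toy_count_hits(toy_aligner *a, int min_score)
{
    int hits = 0;
    while (toy_fetch(a, min_score) >= 0) hits++;
    return hits;
}
```

With a zero-initialized toy_aligner and the same scores passed in each time, calling toy_align_buggy followed by toy_count_hits finds the hit on the first round and nothing on later rounds, even though the scores array is identical every time. Switching to toy_align_fixed restores the hit. If the real cause is similar, a reset inside the library would make the create/free workaround above unnecessary.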

Steph042 referenced this issue in Steph042/seq-align Mar 18, 2021