bam diff when 2 reads have identical names AND start positions? #47

Open
anderspitman opened this issue Jan 3, 2018 · 1 comment

Comments

@anderspitman

I'm trying to automatically compare BAM files output by bowtie2 (for a continuous integration system). The input data is identical, but when running on 2 different machines I sometimes see 2 records in the BAM files that have the same name and start position but represent different reads. For whatever reason, bowtie2 sometimes reverses the order of those records, so bamUtil's diff reports a mismatch.

Is there any way to get bam diff to hold off for a few lines to see if there is another record with the same name/start pos that actually matches?

@mktrost
Contributor

mktrost commented Jan 13, 2018

Bam diff matches by name and fragment from the flag. In your case, do you have multiple reads in a single file that have the same name and fragment flag such that it is a linear template rather than just paired-end? Bam diff was written with the assumption of paired-end, and currently won't work well (as you are probably seeing) if there are multiple non-first/last reads in the linear template.

Bam diff should hold onto reads until it finds a matching name/flag combination in the other file or until the maximum base pair position between records (posDiff) has been reached or until it reaches the maximum number of records it can hold onto (recPoolSize).
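To make the hold-and-match behavior described above concrete, here is a minimal Python sketch. It is not bamUtil's actual implementation: the record layout `(name, flag, pos, seq)`, the function names, and the default values are assumptions for illustration, and `pos_diff` and `rec_pool_size` only mirror the roles of `posDiff` and `recPoolSize`.

```python
import heapq

FRAGMENT_BITS = 0x40 | 0x80  # SAM flag bits: first / last segment in the template


def key_of(rec):
    """Matching key: read name plus the fragment bits of the flag."""
    name, flag, _pos, _seq = rec
    return (name, flag & FRAGMENT_BITS)


def diff_records(file1, file2, pos_diff=1000, rec_pool_size=1000):
    """Pair records from two position-sorted lists by name and fragment bits.

    Returns (pairs, only_in_1, only_in_2). Unmatched records are held in a
    per-file pool until a partner shows up, the current position is more than
    pos_diff past them, or the pool grows beyond rec_pool_size.
    """
    pools = ([], [])          # held records per file, oldest first
    unmatched = ([], [])
    pairs = []

    # Walk both files in position order, tagging each record with its side.
    tagged1 = ((rec[2], 0, rec) for rec in file1)
    tagged2 = ((rec[2], 1, rec) for rec in file2)
    for pos, side, rec in heapq.merge(tagged1, tagged2):
        other = pools[1 - side]
        partner = next((r for r in other if key_of(r) == key_of(rec)), None)
        if partner is not None:
            other.remove(partner)
            pairs.append((rec, partner) if side == 0 else (partner, rec))
        else:
            pools[side].append(rec)

        # Give up on held records that are too far behind or overflow the pool.
        for s in (0, 1):
            pool = pools[s]
            while pool and (pos - pool[0][2] > pos_diff
                            or len(pool) > rec_pool_size):
                unmatched[s].append(pool.pop(0))

    # Whatever is still held at the end never found a partner.
    for s in (0, 1):
        unmatched[s].extend(pools[s])
    return pairs, unmatched[0], unmatched[1]


# Two different reads share a name, fragment bits, and start position;
# the second file lists them in the opposite order.
x = ("r1", 0x40, 100, "ACGT")
y = ("r1", 0x40, 100, "TTTT")
pairs, only1, only2 = diff_records([x, y], [y, x])
mismatched = [p for p in pairs if p[0] != p[1]]
```

In this sketch the key alone decides which records are paired, so the two reads above get paired with the wrong partner and both pairs differ in content, even though the two inputs hold the same set of records.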

Would you be able to confirm that the issue you are seeing is for linear templates when both 0x40 & 0x80 are set (or both not set) in the flag within multiple records in a single file? If that is the issue you are seeing, I'll look more at linear templates to see if there is an easy way to match beyond just the flag, and also see if there is an easy way to expand the code to enable multiple reads with the same fragment flags.
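One way to check for this is to count how often each combination of read name and fragment bits occurs within a single file. The sketch below assumes the name and flag of each record have already been extracted (for example from the first two columns of SAM text); the function names are hypothetical and not part of bamUtil.

```python
from collections import Counter

FIRST = 0x40  # SAM flag bit: first segment in the template
LAST = 0x80   # SAM flag bit: last segment in the template


def fragment_bits(flag):
    """Keep only the first/last segment bits of a SAM flag."""
    return flag & (FIRST | LAST)


def is_middle_or_unknown(flag):
    """True when both 0x40 and 0x80 are set, or neither is set."""
    bits = fragment_bits(flag)
    return bits == 0 or bits == (FIRST | LAST)


def ambiguous_keys(records):
    """Return {(name, fragment bits): count} for keys seen more than once.

    `records` is an iterable of (name, flag) pairs from one file.
    """
    counts = Counter((name, fragment_bits(flag)) for name, flag in records)
    return {key: n for key, n in counts.items() if n > 1}


records = [
    ("r1", 0x43),  # first segment
    ("r1", 0x83),  # last segment
    ("r2", 0xC1),  # both bits set
    ("r2", 0xC1),  # same name and bits again
    ("r3", 0x00),  # neither bit set
]
dupes = ambiguous_keys(records)
```

Any key reported by `ambiguous_keys` is a name/fragment combination that occurs more than once in the file, which is the case a name-plus-flag match cannot tell apart.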


2 participants