Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve unmapped reads through SetReadSection(chr,beg,end) failed! #11

Open
Griffan opened this issue Mar 6, 2016 · 4 comments
Open

Comments

@Griffan
Copy link

Griffan commented Mar 6, 2016

I tried to retrieve unmapped reads via setting SetReadSection("*",-1,-1) or SetReadSection("",-1,-1), but failed in both ways. I wonder if I used it in a wrong way.
I also tested it using bam file only contains unmapped reads, which results in failure at SamRecord.cpp:625 line

if(ifread(filePtr, &(myRecordPtr->myReferenceID), myRecordPtr->myBlockSize) != (unsigned int)myRecordPtr->myBlockSize)
{
// Error reading the record. Reset it and return failure.
resetRecord();
std::string statusMsg = "Failed to read the record, ";
statusMsg += filePtr->getFileName();
statusMsg += ".";
myStatus.setStatus(SamStatus::FAIL_IO, statusMsg.c_str());
return(SamStatus::FAIL_IO);
}

The snapshot in GDB:
(gdb) p myRecordPtr->myBlockSize $13 = 21840194

Anybody experienced this problem before?

@mktrost
Copy link
Contributor

mktrost commented Mar 6, 2016

Either of those settings should work (and did when I just tested them). Is your index valid? That block size seems very large. Can you point me to your test files or your code that is calling SetReadSection, and I'll see if I can figure out what is going wrong.

@Griffan
Copy link
Author

Griffan commented Mar 6, 2016

The bam file and index:
/net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/read_data_test/unmap.bam
/net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/read_data_test/unmap.bam.bai

The source files lies in:
/net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/gotcloud/src/bamUtil/src/MateVectorByRN.cpp

SetReadSection function called at line 165

Thanks a lot!

@mktrost
Copy link
Contributor

mktrost commented Mar 6, 2016

Are you only running into this problem when you have a file that is just unmapped reads?
That is something I hadn't thought of before - the index file contains no offset into the file. When the code asks the Bam Index file where the previous chromosome ends, it returns 0, and the code assumes unmapped starts at 0 (after the last chromosome).

So yes, I guess that is a bug - it isn't the easiest fix as it will require a change in logic. See BamIndex.cpp lines 234-244. MaxOffset it coming back as 0. We may need to adjust SamFile around 1194 to special handle if chunk_beg is 0 and we are looking for unmapped reads - in that case, it needs to jump to after the header.

I'm heading out on vacation and won't be able to make changes to fix this right now.

@Griffan
Copy link
Author

Griffan commented Mar 6, 2016

I added some mapped reads into this file, and bypassed this problem. Normal bam file should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants