
Segmentfault when aligning 25X e.coli single-end reads #16

Open
i-xiaohu opened this issue Oct 29, 2020 · 8 comments

@i-xiaohu

Hi, Whisper developers.
I ran the command whisper ref/ref data1.fastq, and Whisper (release 2.0.1) produced the following output:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.39412s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 43.4175s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

The ref is a standard E. coli reference sequence, and data1.fq is 593 MB. The first read is shown below:

@SRR1562082.1 HWI-ST1336:80:C3CJUACXX:1:1101:2018:2193/1
ATCGCATCCGGGCAGTAGTATTTTGCTTTTTTCAGAAAATAATCAAAAAAAGTTAGCGTGGTGAATCGATACTTTACCGGTTGAATTTGCATCAATTTCAT
+
@B@FFFFDFHHHHJJGFHHFHGGJHIJIJJJJIJJJJJGIIIJJJJJJJJJFEEHHFFFDDAB@CC@BBBABCDECDCBBBBBDCADDDDEEDDDDECCEE

In the end, Whisper produces an empty SAM file.

Thanks!
i-xiaohu

@agudys agudys self-assigned this Nov 5, 2020
@agudys
Member

agudys commented Nov 5, 2020

Hello,

I'll take a look at that ASAP.

Regards,
Adam

@quito418

quito418 commented Nov 27, 2021

Hello,

I am also experiencing the same issue.

I followed the Quick start guide and ran into a segmentation fault.

My commands:

src/whisper-index human ~/human_ref/human_g1k_v37.fasta ./index ./temp/
src/whisper -r -out mappings ./index/human ~/ERR3239276.fq

Error log:

***** Preprocessing of reads *****
100.0%
Completing the preprocessing (could take a minute or so)
Preprocessing time: 2.44478s
** Loading reference and index **
***** Reads mapping *****
** End of mapping **
Main processing time: 201.566s
***** Postprocessing *****
** Loading reference **
Segmentation fault (core dumped)

When I ran it under GDB, I got the following backtrace:

(gdb) bt
#0  0x0000000000486f85 in CSamGenerator::store_mapped_read(unsigned char*, unsigned char*, unsigned char*, unsigned char*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned char*&) ()
#1  0x0000000000489190 in CSamGenerator::process_group_se() ()
#2  0x000000000048f828 in CSamGenerator::operator()() ()
#3  0x000000000054da14 in execute_native_thread_routine ()
#4  0x000000000041fb19 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x0000000000615ab3 in clone ()
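For anyone reproducing this, a backtrace like the one above can also be captured non-interactively by running the program under GDB in batch mode. The sketch below wraps that in a shell function; capture_backtrace is a hypothetical helper, not part of Whisper, and it relies only on standard GDB options (-batch, -ex, --args). Because the crash occurs in a worker thread (execute_native_thread_routine), it also prints every thread's stack with thread apply all bt.

```shell
# capture_backtrace (hypothetical helper): run a command under GDB in batch
# mode; after a crash, print the faulting thread's stack and then all threads.
# If the program exits normally, GDB reports "No stack." instead.
capture_backtrace() {
  gdb -batch \
    -ex run \
    -ex bt \
    -ex "thread apply all bt" \
    --args "$@"
}
```

For example: capture_backtrace src/whisper -out mappings ./index/human ~/ERR3239276.fq. Frames that show only function names, with no source file and line, as in the backtrace above, usually mean the binary was built without debug symbols.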

Thank you!

@agudys
Member

agudys commented Dec 7, 2021

Hello,

Sorry it took me so long. I was able to reproduce the error. I'll let you know once it's fixed (this time, I promise to do this sooner ;)).

Adam

@agudys
Member

agudys commented Dec 19, 2021

@quito418 @i-xiaohu
I have just committed a fix for the bug you reported. Please let me know whether single-end mode now works properly.

By the way, you don't need to specify the -r option at all for single-end mapping.

@quito418

Thank you for your time.

I will let you know if I have a problem.

Best Regards,

@quito418

quito418 commented Dec 20, 2021

@agudys
Hi,

Thank you. I checked, and it now runs without a segfault after the fix.

I just want to make sure everything is working fine.

In particular, I am currently running Whisper with 48 threads on the human genome, using 800 million 101 bp short reads.

./src/whisper -rs -out mappings -t 48 -temp ./temp/ ./index/human /ssd/ERR194147_1.fastq.gz

The postprocessing stage is taking very long (it has been running for about 2 hours so far) compared with the two preceding stages (preprocessing: 735 s, read mapping: 844 s).

So I wonder whether this is expected, or whether there is a recommended number of threads.


  • htop shows that the cores are not fully utilized with 48 threads
  • iotop shows that I/O is not a bottleneck
  • The machine has 256 GB of RAM, and Whisper uses ~35 GB of memory

I would appreciate any advice.

Best Regards,

@agudys
Member

agudys commented Dec 20, 2021

@quito418
I must admit that the postprocessing time looks strange. In our experiments on 32 cores, approximately 3 hours were needed to perform full paired-end mapping of ~100 GB of gzipped human reads. Maybe there is still something wrong with the single-end mode... Is 48 the number of physical or logical cores on your machine? If it is logical, you could try reducing the number of threads to 24.

Adam
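Following the suggestion above, one way to tell physical cores from logical CPUs (which include SMT/hyper-threading siblings) on Linux is to count the distinct (core, socket) pairs reported by lscpu -p=CORE,SOCKET. The function below is a minimal sketch assuming util-linux lscpu is available; count_physical_cores is a hypothetical helper name, and this thread does not establish that matching -t to the physical core count fixes the slow postprocessing.

```shell
# count_physical_cores (hypothetical helper): read `lscpu -p=CORE,SOCKET`
# output on stdin and print the number of distinct physical cores.
# lscpu -p prints one line per online logical CPU; comment lines start with '#'.
count_physical_cores() {
  grep -v '^#' | sort -u | wc -l
}
```

On a machine with 24 physical and 48 logical cores this should print 24, which could then be passed to -t:

    ./src/whisper -rs -out mappings -t "$(lscpu -p=CORE,SOCKET | count_physical_cores)" -temp ./temp/ ./index/human /ssd/ERR194147_1.fastq.gz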

@quito418

@agudys
Thanks for the information.

I was using 24 physical cores and 48 logical cores for the experiment.

I will reduce the number of threads for my experiment and update the result here!

Best Regards,
