Major changes:

The project has moved here:

https://github.com/bioinfo-pf-curie/mpiBWA

Your are in the master branch of the mpiBWA project.

We recommand you to go to the Experimental branch (git checkout Experimental + git pull). The experimental branch has full reproducibility and a better scaling. This branch will become the master in futur.

Other branches (master, FULLMPI, LAZYCHUNK ) are history.

Release notes

Release 1.0 from the 10/07/2018

Changes in Experimental branch

Fix a bug during the mapping in shared memory of the reference genome This bug didn't appear with openMPI version but Intel compiler complains.
Creation of a google group
https://groups.google.com/forum/#!forum/hpc-bioinformatics-pipeline
To improve performances on Lustre file system removed the “suid” mount option (rw,nosuid,flock,lazystatfs)
With Beegfs set the "flock" to "on" for reproducibility.

Release 1.0 from the 30/04/2018

Changes in Experimental branch

Add support for trimmed reads.
Be aware of the flock mode on parallel file system (Lustre, beegfs): flock must be on. Otherwise reproducibility is not guaranteed.

Release 1.0 from the 21/03/2018

Fix an inversion in file handle (line 1039 and 1056)

Release 1.0 from 23/12/2017

changes in Experimental branch

100% reproducibility with the pipeline control (bwa mem -t 8)
remove memory limits now offset are computed on 1gb buffer
remove some memory leaks
optimization of code

Release 1.0 from 04/12/2017

Changes in the branch Experimental.

Fix a memory leak.

Release 1.0 from 01/12/2017

Changes in the branch Experimental.
Fix the mpi_read_at buffer limit size.

Release 1.0 from 29/11/2017

Add a new branch called Experimental.
Warning: This is experimental work do not use in production. But test it and send us reports.

Rationnal:

The master and FULLMPI branches are made for full reproducibility (independant to the number of jobs) and accuracy but they reach the Amdah'ls law point. Indeed the locking file RMA implementation (the serialization when computing offsets) introduces a bottle neck we are not able to overpass with RMA technics.

This is why we have implemented a new algorithm. Now a lot more master jobs are responsible for chuncking the data the way bwa does and present it to bwa-mem aligner. And instead of doing it linearly on the fastq now they do it independently and in parallel. With a little inter communication they adjust the chunks offets and sizes. This method removes the serialization bottle neck.

As in the master and FULLMPI branches we obtain a full reproducibility and with a better efficiency and scalability.

When testing this branch make sure the total number of jobs you take is (master jobs) * 8.
8 is the number of aligner threads used by bwa-mem.
According to bwa-mem policy all chunks are 10e6 by the number of threads nucleotide bases big.

Remark: The initial buffer of each master jobs is limited to 2gb (due to mpi_read_at buffer size).
This version does not work on trimmed reads.

First results test on broadwell.

Sample:

NA12878 Illumina 300X 2x150 WGS from GIAB chinese trio.

The alignement is done with 352*8 = 2816 cpu
352 = (Forward Fastq size in gb) / 2g, 8 is thenumber of bwa-mem aligner jobs (8 per master jobs)

MPI parameters :
mpi_run -n 352 -c 8
or :
MSUB -n 352
MSUB -c 8

alignment time: 26 mn
Time to compute chunks: 8s

Reproducibility: the pipeline has been tested tested with 5632 and 2816 cpu results are the same.

Next step:

remove initial buffer size limitation
support trimmed reads
need tests on low throughput infrastructures

Release 1.0 from 21/11/2017

Changes in LAZYCHUNCK branch:

forget a end condition

Release 1.0 from 20/11/2017

Changes in LAZYCHUNCK branch:

remove memory leaks
update results section
test with 600Gb (x2) fastq files and 100 jobs: ok. (20 mn to approximate chuncks offset)
reproducibility with constant number of jobs: ok.

Notes: If want 100% reproducibility whatever the number of jobs use the master branch or the FULLMPI branch. But if you want go faster use LAZYCHUNK branch.

Release 1.0 from 17/11/2017

Changes in LAZYCHUNCK branch

Improvement of the algorithm for window sizing.
Change the number of bases for the estimation line 541.

Release 1.0 from 15/11/2017

Changes in LAZYCHUNK branch

Due to a mmap in the previous version the virtual memory may be big when the fastq file is large. Some scheduler like Torque doesn't like this. We review the algorithm of computing chuncks. first we divide the file in window of size (number of jobs) * 1Gb. On each window jobs approximate chuncks of 10 mega bases. this way the virtual memory stay low.
We also saw problem on some architecture with MPI file read shared we replace it with MPI file read at.
We have also other projects of parallelization (sorting, marking duplicates, clustering... ) and we are looking for people willing to help us for developments and tests. Don't hesitate to contact me ([email protected]).

Release 1.0 from 30/06/2017

Major changes:

28/11/2017

884482b956169fabe0e8696021fc0092c403d86f

First release

We have implemented a new algorithm. In this algorithm you have master jobs and aligner jobs. Master jobs are responsible for computing offset and chunk sizes. This way all the chuncks have the same number of bases. Then master jobs send chunks to bwa-mem and are aligned. Finally master jobs write ni the result sam file.

the total number of jobs muste be: (number of master) * 8 8 is the number of threads used by bwa-mem.

First result test on broadwell of TGCC

Condition: Due to mpi_read_at buffer limit of 2g. You should limit the initial buffer read to 2gb per master jobs. Futur release will remove this condition.

Tested on the SRR2052 WGS with 352*8 cpu alignment time: 26 mn Time to compute chunks: 8s Scalability : ok Reproducibility : ok Command line :mpi_run -n 352 -c 8 or : MSUB -n 352 MSUB -c 8

This version does not support trimmed read yet.

Installation

git clone https://github.com/fredjarlier/mpiBWA.git git checkout Experimental git pull .export PATH=/PATH_TO/automake-1.15/bin:/PATH_TO/autoconf-2.69/bin:$PATH ./configure CC=/PATH_TO/mpicc make && make intall

Requirements

You need a C compiler as required for classic BWA program.

You need to install a version of openMPI. (see: https://www.open-mpi.org/)

You need a mpi compiler too. to check your mpi installation tell in a command window whereis mpirun, normally it is installed in /usr/bin/mpirun.

You need a low latency network and a parallel file system to use this branch

Your reads should be paired.

Options

All the BWA-MEM options are available in this version, of course according to the bwa release wrapped.

Known issues:

Primary hits are reproduced between the serial version and the parallel but you can see differences in mapping position for alternate contigs. This problem stems from the randomization of multi-hits reads. When running with the same number of MPI jobs alternative positions are reproduced but when the number of jobs varies the positions can switch for secondary alignments.

How to integrate further version

This version of mpiBWA has been build with 0.7.15 BWA version. To integrate the 0.7.+:

git clone the 0.7.+ of BWA.
in the folder of bwa copy-pass the following function from mpiBWA:

makefile main_parallel_version.c pidx.c

then make.

Compilation

You need automake 1.15 for the installation. You can install automake and autoconf in differents directories and export the path like this: export PATH=../automake-1.15/bin:../autoconf-2.69/bin:$PATH

Download from git. In the folder mpiBWA type: ./configure && make install && make

or for distribution: make dist tar xzf .tar.gz cd pbwa7-1.0 ./configure && make install && make

for passing mpi path: ./configure CC=mpi_bin_path add --prefix in configure if you need

Results are 2 executables pidx and pbwa7.

Build a reference

After the creation of the reference file with BWA, you need to build a mapped reference genome. To do that: pidx my_ref.fa (where my_ref.fa has been build with BWA). Pidx build a reference with the extension .map (my_ref.fa.map). This reference will be mapped in share memory.

The pidx needs the following files my_ref.fa.sa, my_ref.fa.bwt, my_ref.fa.ann, my_ref.fa.pac, my_ref.fa.amb to construct the fa.map. Those files are generated with bwa index. It works also with the fasta.

How to manage the multithreading

You have the option to use the multithreading options. To do that you fix the number of nodes with mpirun and then the number of threads per nodes with the -t.

Examples:

1 server with 8 threads per server: mpirun -n 1 pbwa7 mem -t 8...

2 servers with 10 threads per servers mpirun -n 2 pbwa7 mem -t 10...

And so on.

Example of command lines

Example of bash to run the pbwa7 with openmpi 1.10.7

We launch 30 master jobs with 8 threads each it make a total of 240 jobs

ERROR=$PATH/error.txt SAM=$PATH/mysample.sam TASKS=30 PPN=8 mpirun -np $TASKS --map-by ppr:$PPN:socket:pe=$PE $PBWA mem -t 8 -o $SAM $REF $FASTQ1 $FASTQ2 &> $ERROR

Results

Here results wo obtain from test realized with TGCC (Très Grand Centre de Calcul - Bruyères le Chatel - France).

Remarks

Do not type the .map extension when you give the reference to pbwa7.
Hybrid mode is possible with -t options. MPI rank fix the number of servers and -t options the number of threads per job.
For Lustre or parallel file system users: If you intend to run the mpiSort after the alignment you can tell the striping of the results. This is done according to the striping you set in the mpiSort program with the Lustre "lfs setstripe" command (lfs setstripe -c -1 -s 2gb .). The "-c -1" option tells Lustre to use all the file system servers and and "-s 2gb" is the size of contigues data blocks. For speed purpose reading commands (particularly MPI commands) are aligned on those blocks.

Notes: The striping (like with Hadoop) is the way your data is distributed among servers. This technic accelerates access files and meta file information.

Make sure you have the same version of MPI for compiling and running.
From our experiences some scheduler do not interpret shared memory and jobs ends up stuck for overloading memory. In this case take less than 1gb per job per threads (if you take 8 threads). This is approximately the total reference plus the chunk size.

Questions:

We have created a google forum for questions.

https://groups.google.com/forum/#!forum/hpc-bioinformatics-pipeline

Feel free to ask any question

Future work:

Overlap writing, reading and aligning.

References:

This work is based on the original bwa aligner written by Li et al.

Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. [PMID: 20080505]

Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v1 [q-bio.GN]

Latham R. et al. (2007) Implementing MPI-IO Atomic Mode and Shared File Pointers Using MPI One-Sided Communication Authors

The program has been developed by

Frederic Jarlier from Institut Curie and Nicolas Joly from Institut Pasteur

and supervised by

Philippe Hupe from Institut Curie

Name		Name	Last commit message	Last commit date
Latest commit History 135 Commits
fastq_samples		fastq_samples
.gitexcludes		.gitexcludes
.gitignore		.gitignore
INSTALL		INSTALL
LICENSE.txt		LICENSE.txt
Makefile.am		Makefile.am
Makefile.in		Makefile.in
QSufSort.c		QSufSort.c
QSufSort.h		QSufSort.h
README.md		README.md
README.txt		README.txt
Results_TGCC_Broadwell.jpg		Results_TGCC_Broadwell.jpg
TODO		TODO
aclocal.m4		aclocal.m4
ar-lib		ar-lib
bamlite.c		bamlite.c
bamlite.h		bamlite.h
bntseq.c		bntseq.c
bntseq.h		bntseq.h
bwa.c		bwa.c
bwa.h		bwa.h
bwamem.c		bwamem.c
bwamem.h		bwamem.h
bwamem_extra.c		bwamem_extra.c
bwamem_pair.c		bwamem_pair.c
bwape.c		bwape.c
bwase.c		bwase.c
bwase.h		bwase.h
bwaseqio.c		bwaseqio.c
bwashm.c		bwashm.c
bwt.c		bwt.c
bwt.h		bwt.h
bwt_gen.c		bwt_gen.c
bwt_lite.c		bwt_lite.c
bwt_lite.h		bwt_lite.h
bwtaln.c		bwtaln.c
bwtaln.h		bwtaln.h
bwtgap.c		bwtgap.c
bwtgap.h		bwtgap.h
bwtindex.c		bwtindex.c
bwtsw2.h		bwtsw2.h
bwtsw2_aux.c		bwtsw2_aux.c
bwtsw2_chain.c		bwtsw2_chain.c
bwtsw2_core.c		bwtsw2_core.c
bwtsw2_main.c		bwtsw2_main.c
bwtsw2_pair.c		bwtsw2_pair.c
compat.c		compat.c
compile		compile
configure		configure
configure.ac		configure.ac
depcomp		depcomp
example.c		example.c
fastmap.c		fastmap.c
install-sh		install-sh
is.c		is.c
kbtree.h		kbtree.h
khash.h		khash.h
kopen.c		kopen.c
kseq.h		kseq.h
ksort.h		ksort.h
kstring.c		kstring.c
kstring.h		kstring.h
ksw.c		ksw.c
ksw.h		ksw.h
kthread.c		kthread.c
kvec.h		kvec.h
main.c		main.c
main_parallel_version.c		main_parallel_version.c
malloc_wrap.c		malloc_wrap.c
malloc_wrap.h		malloc_wrap.h
missing		missing
pemerge.c		pemerge.c
pidx.c		pidx.c
utils.c		utils.c
utils.h		utils.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Release notes

Major changes:

Installation

Requirements

Options

Known issues:

How to integrate further version

Compilation

Build a reference

How to manage the multithreading

Example of command lines

Results

Remarks

Questions:

Future work:

References:

About

Releases

Packages

Contributors 2

Languages

License

fredjarlier/mpiBWA

Folders and files

Latest commit

History

Repository files navigation

Release notes

Major changes:

Installation

Requirements

Options

Known issues:

How to integrate further version

Compilation

Build a reference

How to manage the multithreading

Example of command lines

Results

Remarks

Questions:

Future work:

References:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages