fagostini/Gungnir

Gungnir

Approximate string matching (Hamming distance) for genomic sequences, running on GPU
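For readers new to the metric: the Hamming distance between two equal-length sequences is the number of positions at which they differ. The following is a minimal CPU sketch in Python, meant only to illustrate the definition. It is not the GPU implementation from this repository, and `hamming_distance` is a hypothetical name.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


# Two 10-mers that differ only at the ninth base (C vs G).
print(hamming_distance("TCCAGCGCCC", "TCCAGCGCGC"))  # prints 1
```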

Requirements

  • NVIDIA CUDA-capable GPU (NVIDIA publishes a list of CUDA-capable graphics cards)
  • NVIDIA CUDA Toolkit (see NVIDIA's installation instructions)

Clone and compile

Clone the repository:

git clone git@github.com:fagostini/Gungnir.git

Compile the source code:

cd Gungnir
mkdir bin
make

Troubleshoot

If the compilation fails, or if you need different parameters, you may have to modify some hardcoded values in the following files.

Makefile

  • The CUDA root directory is set to the default installation path (i.e., '/usr/local/cuda');
  • The default compute capability is 6.1 (i.e., sm_61).

main.cu

  • The maximum length for the input sequences (MAXLEN) and number of mismatches (MISMATCH) are set to 100 and 9, respectively;
  • The maximum total reference length (MAXREF) is set to 3 billion characters;
  • The block size (BLOCKSIZE) is set to 32 in order to maximise the parallel threads.
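Because these limits are compiled in, it can help to check input files against them before a run. The sketch below is a hypothetical pre-flight helper in Python, not part of the repository. It assumes that MAXLEN applies to every individual sequence and that MAXREF is the sum of all sequence lengths in the reference file. The names `read_fasta` and `check_limits` are invented for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

MAXLEN = 100            # per-sequence limit quoted above for main.cu
MAXREF = 3_000_000_000  # total reference length quoted above for main.cu


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines that precede any header are ignored.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_limits(records, max_len: int = MAXLEN, max_total: int = MAXREF) -> List[str]:
    """Return a list of problems; an empty list means within limits."""
    problems, total = [], 0
    for header, seq in records:
        total += len(seq)
        if len(seq) > max_len:
            problems.append(f"{header}: length {len(seq)} exceeds {max_len}")
    if total > max_total:
        problems.append(f"total length {total} exceeds {max_total}")
    return problems
```

To use it on a file, pass an open file handle to `read_fasta` and hand the result to `check_limits`. If you change MAXLEN or MAXREF in main.cu, pass the new values as `max_len` and `max_total`.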

Usage

The previous step generates an executable file named runHamming, which can be run using the following syntax:

./runHamming <query FASTA> <reference FASTA>

Test [optional]

Run the main test:

make test

The output on the console should be 'Test PASSED', while the output file results.tsv should contain the following lines:

>Reference  TCCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG  1  1  1  1  1  0  0  0  0  0
>HamDist_1  TCCAGCGCGCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG  1  2  1  1  0  0  0  0  0  0
>HamDist_2  TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCAGGAGCAGTG  1  2  2  0  0  0  0  0  0  0
>HamDist_3  TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG  1  2  1  1  0  0  0  0  0  0
>HamDist_4  TCCAGCGTGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG  1  1  1  1  1  0  0  0  0  0
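The listing above is consistent with the following reading, which is inferred from the numbers rather than stated by the source: the five sequences are compared all-against-all, and the k-th count column (k = 0 to 9, matching MISMATCH = 9) holds the number of sequences at Hamming distance exactly k from the row's sequence. Under that assumption, a plain CPU sketch in Python reproduces the ten count columns. `distance_histogram` is a hypothetical name, and this is not the GPU code.

```python
MISMATCH = 9  # maximum number of mismatches quoted for main.cu

SEQS = {
    "Reference": "TCCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG",
    "HamDist_1": "TCCAGCGCGCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG",
    "HamDist_2": "TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCAGGAGCAGTG",
    "HamDist_3": "TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG",
    "HamDist_4": "TCCAGCGTGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG",
}


def hamming_distance(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def distance_histogram(query, references, max_mismatch=MISMATCH):
    """counts[k] = number of references at distance exactly k from query."""
    counts = [0] * (max_mismatch + 1)
    for ref in references:
        d = hamming_distance(query, ref)
        if d <= max_mismatch:
            counts[d] += 1
    return counts


# One tab-separated row per sequence: name, sequence, then the ten counts.
for name, seq in SEQS.items():
    print(f">{name}", seq, *distance_histogram(seq, SEQS.values()), sep="\t")
```

Comparing this output with results.tsv is a quick way to confirm that the GPU run and a straightforward CPU computation agree on the small test set.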

Troubleshoot

If the results.tsv file contains only zeros:

  • Decrease the block size (BLOCKSIZE in main.cu) to either 16 or 8
  • Re-compile
  • Re-run the test.
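To detect this failure mode in a script rather than by eye, something like the following Python sketch could be used. It assumes each row of results.tsv has the layout shown in the expected output (name, sequence, then count columns, separated by whitespace) and that names contain no spaces. `all_counts_zero` is a hypothetical helper, not part of the repository.

```python
def all_counts_zero(lines):
    """Return True if at least one result row exists and every count is zero."""
    saw_row = False
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        saw_row = True
        # fields[0] is the name, fields[1] the sequence, the rest are counts.
        if any(int(count) != 0 for count in fields[2:]):
            return False
    return saw_row
```

For example, `all_counts_zero(open("results.tsv"))` returning True would suggest lowering BLOCKSIZE and re-compiling as described above.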

Run the 1'000 vs 10'000 sequences test:

make test10k

Run the 10'000 vs 100'000 sequences test:

make test100k

Run all tests:

make testall

Profile the execution using nvprof (included in CUDA Toolkit >= 10.0):

make profile

The command produces some basic profiling results on the console and saves a more detailed version in the 'extra/runHamming-analysis.nvprof' file, which can be imported into either nvprof or the NVIDIA Visual Profiler (nvvp).

Bugs and Issues

Report bugs by opening an issue on the GitHub repository.

Author

License

This project is licensed under the MIT License; see the LICENSE file for details.
