Approximate string matching (Hamming distance) for genomic sequences, running on GPU
- NVIDIA CUDA-capable GPU (see NVIDIA's list of CUDA-capable graphics cards)
- NVIDIA CUDA Toolkit (see NVIDIA's installation instructions)
Clone the repository:
git clone git@github.com:fagostini/Gungnir.git
Compile the source code:
cd Gungnir
mkdir bin
make
Troubleshoot
If compilation fails or specific parameters are needed, it may be necessary to modify some of the hardcoded values listed below.
- The CUDA root directory is set to the default installation path (i.e., '/usr/local/cuda');
- The default compute capability is 6.1 (i.e., sm_61).
- The maximum length for the input sequences (MAXLEN) and number of mismatches (MISMATCH) are set to 100 and 9, respectively;
- The maximum total reference length (MAXREF) is set to 3 billion characters;
- The block size (BLOCKSIZE) is set to 32 to maximise the number of parallel threads.
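Before changing MAXLEN or MAXREF, it can help to check whether the input files actually exceed them. The following is a minimal sketch, not part of Gungnir: `read_fasta` and `check_limits` are hypothetical helper names, the limits are copied from the list above, and treating them as inclusive upper bounds is an assumption. The total-length check is only meaningful for the reference file.

```python
MAXLEN = 100            # maximum input sequence length (value from this README)
MAXREF = 3_000_000_000  # maximum total reference length (value from this README)


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def check_limits(records, maxlen=MAXLEN, maxref=MAXREF):
    """Return a list of problems; the list is empty if all limits are met."""
    problems = []
    total = 0
    for name, seq in records:
        total += len(seq)
        if len(seq) > maxlen:
            problems.append(f"{name}: length {len(seq)} exceeds MAXLEN={maxlen}")
    if total > maxref:
        problems.append(f"total length {total} exceeds MAXREF={maxref}")
    return problems


# Small in-memory demonstration; real use would pass an open file handle.
demo = [">short", "ACGT", ">long", "A" * 101]
print(check_limits(read_fasta(demo)))
```

To check a real file, pass an open handle instead of the list, for example `check_limits(read_fasta(open("reference.fa")))`, where `reference.fa` is a placeholder file name.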
Compilation generates an executable file, runHamming, which is run using the following syntax:
./runHamming <query FASTA> <reference FASTA>
Run the main test:
make test
The output on the console should be 'Test PASSED', while the output file results.tsv should contain the following lines:
>Reference TCCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG 1 1 1 1 1 0 0 0 0 0
>HamDist_1 TCCAGCGCGCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG 1 2 1 1 0 0 0 0 0 0
>HamDist_2 TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCAGGAGCAGTG 1 2 2 0 0 0 0 0 0 0
>HamDist_3 TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG 1 2 1 1 0 0 0 0 0 0
>HamDist_4 TCCAGCGTGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG 1 1 1 1 1 0 0 0 0 0
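The counts above are consistent with reading column k (starting from 0) as the number of reference sequences at Hamming distance k from the query, up to 9 mismatches. That reading is inferred from the numbers rather than stated by the project, and it assumes the test reference file contains the same five sequences as the query file. Under those assumptions, a plain CPU sketch in Python (for illustration only, unrelated to the CUDA implementation) reproduces the five count rows:

```python
MISMATCH = 9  # maximum number of mismatches reported (value from this README)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_histogram(query, references, max_mismatch=MISMATCH):
    """Count references at each Hamming distance 0..max_mismatch from query."""
    counts = [0] * (max_mismatch + 1)
    for ref in references:
        d = hamming(query, ref)
        if d <= max_mismatch:
            counts[d] += 1
    return counts


# The five sequences from the expected results.tsv lines above.
SEQS = {
    "Reference": "TCCAGCGCCCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG",
    "HamDist_1": "TCCAGCGCGCGAGCCGTCCAGGCGGCCAGCAGGAGCAGTG",
    "HamDist_2": "TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCAGGAGCAGTG",
    "HamDist_3": "TCCAGCGCGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG",
    "HamDist_4": "TCCAGCGTGCGAGCCGTCCAGGCGCCCAGCTGGAGCAGTG",
}

# Assumption: every sequence is used both as a query and as a reference.
for name, seq in SEQS.items():
    print(f">{name}", seq, *distance_histogram(seq, SEQS.values()))
```

For example, HamDist_2 is at distance 0 from itself, distance 1 from HamDist_1 and HamDist_3, and distance 2 from Reference and HamDist_4, which gives the counts 1 2 2.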
Troubleshoot
If the results.tsv file contains only zeros:
- Decrease the block size to either 16 or 8
- Re-compile
- Re-run the test.
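Whether results.tsv contains only zeros can also be checked with a short script. This is a rough sketch with a hypothetical helper name (`counts_all_zero`), and it assumes each line holds a name, a sequence, and then integer counts separated by whitespace, as in the expected output above, with no header line.

```python
def counts_all_zero(lines):
    """Return True if there is at least one data line and all counts are zero."""
    seen = False
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumption: fields after the name and the sequence are integer counts.
        counts = [int(value) for value in fields[2:]]
        seen = True
        if any(counts):
            return False
    return seen


# In-memory demonstration with one healthy line and one all-zero line.
good = [">Reference TCCAGCGC 1 1 1 1 1 0 0 0 0 0"]
bad = [">Reference TCCAGCGC 0 0 0 0 0 0 0 0 0 0"]
print(counts_all_zero(good), counts_all_zero(bad))
```

For the real file, pass an open handle, for example `counts_all_zero(open("results.tsv"))`; a result of `True` indicates the situation described in this troubleshooting step.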
Run the 1'000 vs 10'000 sequences test:
make test10k
Run the 10'000 vs 100'000 sequences test:
make test100k
Run all tests:
make testall
Profile the execution using nvprof (included in CUDA Toolkit >= 10.0):
make profile
The command produces some basic profiling results on the console and saves a more detailed version in the 'extra/runHamming-analysis.nvprof' file, which can be imported into either nvprof or the NVIDIA Visual Profiler (nvvp).
Report bugs as issues on the GitHub repository
This project is licensed under the MIT License - see the LICENSE file for details