Producing an index for a very large and very fragmented genome assembly #7

Open
andreaswallberg opened this issue Sep 25, 2019 · 1 comment

Dear developers,

I would like to try Whisper for mapping paired-end short reads against a very large and fragmented genome assembly (hundreds of thousands of contigs). Splitting such a reference into one file per contig is inconvenient and hard on the file system, but my understanding of the genome indexing instructions is that this is required, because the program does not index FASTA files containing multiple accessions.

Do I understand the instructions correctly? If so, I would like to make a feature request: that Whisper be able to index FASTA files containing multiple sequences :-)

Cheers!

agudys (Member) commented Sep 30, 2019

Dear Andreas,

Whisper supports FASTA files with multiple accessions, though we haven't tested it on datasets with this many contigs. If you run into problems running it, please let us know.

Regards,
Adam
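
For readers who want to check how many accessions a multi-FASTA reference holds before indexing it, here is a minimal Python sketch. It is not part of Whisper; the function name `count_fasta_records` and the in-memory example data are hypothetical, and it assumes the usual FASTA convention that every record starts with a `>` header line.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in handle if line.startswith(">"))


# Small in-memory stand-in for a real multi-FASTA reference file (assumed data).
example = io.StringIO(">contig_1\nACGT\n>contig_2\nGGCC\nTTAA\n")
n_records = count_fasta_records(example)
print(n_records)
```

For a real assembly, the same function can be given an open file handle, which streams the file line by line and avoids loading the whole reference into memory.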

agudys self-assigned this Sep 30, 2019
agudys added a commit that referenced this issue Dec 19, 2021
* GitHub-hosted and self-hosted tests added.
* Small C++17 compatibility issue fixed.
* Reads longer than 128 bp work properly on CPUs without AVX2.
* Many compilation warnings removed.