-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for reading from standard input? #29
Comments
Hi mikemc, It is (unofficially) supported. :) You can pipe fasta format into BURST, but be sure to use "/dev/stdin" as the input filename. This feature hasn't been tested much. Thanks, |
One more comment -- BURST is designed to map multiple samples of shallow shotgun sequencing simultaneously, rather than one sample at a time. This is the reason for its fasta format requirement. Preprocessing is expected to take place with SHI7 (my QC toolkit), as this toolkit multiplexes multiple samples into a single "combined_seqs.fna" fasta file, much like QIIME1. In this way, there is little distinction between amplicon and shallow shotgun sequencing -- both can be preprocessed by SHI7 in the same way, multiplexed in the same way, and aligned all together with BURST. BURST performs faster, and its capitalist algorithm performs better, when there are more queries in the same input file (scaling sublinearly with sequence depth, provided there is enough RAM to house the entire combined_seqs.fna). It is not the expectation that users simply convert existing fastq's to fasta format and run alignment (although doing so would still work). The expected pipeline is:
Hope this clarifies! |
Thanks Gabe, both clarifications are very helpful. I have been BURST as a point of comparison for BBMap/BBSplit in a BBTools-based pipeline, so haven't bothered with SHI7 but will look into it for future use, especially if I were to use Capitalist. (Allpaths and Forage modes are most useful for my current needs) |
My understanding is that currently BURST can only take in reads as an unzipped Fasta file. I am wondering if taking reads from standard input might be implemented at some point. Here are two use cases where I would find it helpful to be able to pass the reads in from standard input:
My sequence data is stored in Gzipped Fastq files, but BURST wants unzipped Fasta files. It would be handy if I could simply pipe the output from a tool like
seqtk
(which can convert to unzipped Fasta format) into BURST, rather than having to save the intermediate Fasta files, which I don't need for any other purpose. In this way, allowing standard input partially gets around BURST's atypically need for reads as unzipped Fasta files.I am testing mapping pipelines on simulated reads. It would be handy sometimes to be able to pipe the simulated reads right into BURST.
The text was updated successfully, but these errors were encountered: