SRATOOLS_FASTERQDUMP failing due to missing fastq file #317
Comments
Can you try to run that ID manually, please? Some files are simply broken.
If I run it, I see the same error that appears in all of the roughly 40 SRATOOLS_FASTERQDUMP processes, so I assume the same thing is happening in all of them.
Can you show your custom config, please?
Yes, sure. Here it is: custom.conf
You overwrote the default args. Can you add them back as well?
Hi. Using the custom config, I'm adding the … If I get this right, I'm not altering the option used by … Am I missing something? Could the issue be caused by SRR records that are registered as PAIRED in the SRA archive but apparently contain only one read? See this example: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR16208945&display=metadata
That's my bad; of course, I didn't notice that you were customizing prefetch rather than fasterq-dump. I can try looking at this ID myself later. In the meantime, can you remove the IDs that cause errors and run only the rest? Why did you choose sratools for download rather than files?
I ran the pipeline with default settings, and my understanding is that the default download method is FTP, right? The pipeline triggered the sratools workflow automatically for some of the entries, I guess because their FASTQ files were unavailable via FTP? What I can see is that SRA_FASTQ_FTP ran successfully 92 times, and the sratools workflow was also run for some entries.
I just looked at one of the runs, and its read count is reported as zero, so something seems off with that run: https://www.ebi.ac.uk/ena/browser/view/SRR16208975
Okay, I think the problem has a simple cause. The runs that give you problems are marked as having a paired library layout, but in reality they seem to contain single reads. Because the pipeline expects paired reads for these runs, the module passes fasterq-dump an output name without a file extension, since fasterq-dump normally adds the suffixes and extensions itself:

    def outfile = meta.single_end ? "${prefix}.fastq" : prefix

However, because these runs contain single reads, no extension is added, which then causes pigz to fail. Perhaps the most direct way to solve this is to create your own config containing:

    process {
        withName: SRATOOLS_FASTERQDUMP {
            ext.prefix = { "${meta.id}.fastq" }
        }
    }
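To make the failure mode concrete, here is a small bash sketch that mirrors the module's outfile expression and counts the `*.fastq` files a following compression step would see. The helper names `outfile_name` and `count_fastq` are hypothetical, and the assumption that the pigz step globs for `*.fastq` is inferred from the "no fastq files found" error in this issue, not taken from the module source.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the module's Groovy expression
#   def outfile = meta.single_end ? "${prefix}.fastq" : prefix
# outfile_name and count_fastq are hypothetical helper names.

# Print the output name passed to fasterq-dump for a given layout.
outfile_name() {
    local single_end="$1" prefix="$2"
    if [ "$single_end" = "true" ]; then
        echo "${prefix}.fastq"
    else
        echo "${prefix}"
    fi
}

# Print how many *.fastq files are in a directory, i.e. what a step
# that globs for *.fastq would find (assumption about the pigz call).
count_fastq() {
    local dir="$1"
    local -a files
    shopt -s nullglob          # an unmatched glob expands to nothing
    files=("$dir"/*.fastq)
    shopt -u nullglob
    echo "${#files[@]}"
}
```

For a run labelled PAIRED, `outfile_name false SRX12493307_SRR16208975` prints the bare name `SRX12493307_SRR16208975`. If fasterq-dump writes the single reads to that name unchanged, `count_fastq` on the work folder prints 0, which matches the file without an extension reported in the bug description.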
Hi, yes, this was also my hypothesis. Unfortunately, this is not the first time I have seen this kind of inconsistency in single-cell datasets in SRA. I'm not sure whether the pipeline could catch this inconsistency from the SRA record metadata (library layout == PAIRED, but the number of reads in the record == 1) and then generate a more informative warning message. Thanks for your suggestion; I will give it a try. Meanwhile, I worked around it by editing the module script and adding a couple of lines of bash that check for a non-empty file with the expected name but no .fastq extension and, if one is found, add the missing extension. Thanks again for investigating this!
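The workaround described in the last comment could look roughly like the bash sketch below. It is an illustration under stated assumptions, not the reporter's actual patch: the function name `fix_missing_extension` is hypothetical, and it assumes it runs in the task's work folder between the fasterq-dump and pigz commands.

```shell
#!/usr/bin/env bash
# Sketch of the workaround described above (not the module's actual code):
# after fasterq-dump and before pigz, if no *.fastq file exists but a
# non-empty file named exactly "$prefix" does, add the missing extension.
# fix_missing_extension is a hypothetical name; it operates on the current
# directory, which in a Nextflow task is the work folder.
fix_missing_extension() {
    local prefix="$1"
    local -a fastqs
    shopt -s nullglob          # an unmatched glob expands to nothing
    fastqs=(*.fastq)
    shopt -u nullglob

    # Leave output that already has a .fastq extension alone.
    if [ "${#fastqs[@]}" -gt 0 ]; then
        return 0
    fi

    # -s: the file exists and has a size greater than zero.
    if [ -s "$prefix" ]; then
        mv -- "$prefix" "${prefix}.fastq"
        echo "WARNING: '${prefix}' had no .fastq extension; renamed it." \
             "The run may be single-end despite a PAIRED library layout." >&2
    fi
}
```

For the run in this issue, the call would be `fix_missing_extension SRX12493307_SRR16208975`. Printing a warning instead of renaming silently keeps the PAIRED/single-read mismatch visible in the task log, which is close to the more informative message the reporter wished the pipeline could emit.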
Description of the bug
I have a list of SRR IDs in a file and am trying to download them using the pipeline. However, I consistently get an error at the SRATOOLS_FASTERQDUMP step.
This is an example of a failing .command.sh.
From the log, it seems that no FASTQ files are found in the folder when pigz runs after fasterq-dump.
Inspecting the work folder, I can see a file named SRX12493307_SRR16208975, which contains reads but has no .fastq extension, so the pigz command fails.
Command used and terminal output
Here, custom.conf is used to set --max-size 50g, since some of the datasets are larger than 20 GB.
This is the error message from Nextflow.