
Fixed issues #811 and #812, which lead to the pipeline break #813

Closed · wants to merge 7 commits

Conversation


@Dedaniya08 Dedaniya08 commented Dec 9, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).


@Dedaniya08 Dedaniya08 changed the base branch from master to dev December 9, 2024 08:39
@Dedaniya08 (Author)

The pipeline break occurs at two steps, corresponding to issues #811 and #812. I have added a retry with a maximum of 3 attempts, which resolves the error in both files.

[screenshot: second_step]

Both steps are resolved.
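(For context, a change of this kind would presumably sit inside the two affected local modules. A minimal sketch, assuming the retry-with-3-attempts approach described above; the actual diff may differ:)

process RENAME_RAW_DATA_FILES {
    errorStrategy 'retry'   // retry on any failure
    maxRetries    3         // give up after 3 attempts
    // ... rest of the module unchanged; DADA2_FILTNTRIM would get the same directives
}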

@d4straub (Collaborator) commented Dec 9, 2024

Hi there, thanks for the PR. However, retrying on any error code is a very bad idea; it can lead to a lot of confusion. I oppose that change.
We typically only retry on specific error codes, see

errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }

I do not want the behavior to change such that, when there is a reproducible, justified error, a retry is launched for no reason. That's why we do not retry everything by default; in the overwhelming majority of cases that is just a waste of resources.

If that change solves your problem, that's great, but there is no need to change the pipeline code. You can achieve the same result by using -c azure.conf with a file that contains:

process {
    errorStrategy = 'retry'
}
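(To apply such a file, it is passed on the command line with Nextflow's -c option, e.g. nextflow run nf-core/ampliseq -profile docker -c azure.conf --outdir <OUTDIR>, where the profile and paths are placeholders. That way the retry behaviour stays on the user side instead of in the pipeline code.)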

or use a more process-targeted approach, e.g.

withName:QIIME2_EXTRACT {
    cpus   = { 12 * task.attempt }
    memory = { 12.GB * task.attempt }
    time   = { 24.h * task.attempt }
}
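(Combining the two suggestions, a user-side azure.conf restricted to the processes discussed in this PR and to the usual transient exit codes might look like the sketch below; it is untested on Azure, and if the Azure failures return exit codes outside that range, the plain errorStrategy = 'retry' above would still be needed:)

process {
    withName: 'RENAME_RAW_DATA_FILES|DADA2_FILTNTRIM' {
        errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
        maxRetries    = 3
    }
}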

@Dedaniya08 (Author)

The code is encountering frequent issues. In this PR, step #811 focuses on renaming the files, and step #812 maps the filtered files. I verified the files after a retry, and the output remains consistent. There are no discrepancies in the files before and after applying this PR.

The recurring issue is due to Azure's local environment and the process of extracting files from the data lake. This behavior will persist without the changes introduced here. Therefore, I strongly recommend including this PR to address the problem effectively.

@d4straub (Collaborator)

Thanks for explaining your reasoning. I do believe that the resulting files are fine and I acknowledge that this proposed change will solve the issue on Azure. But it will negatively affect the execution of the pipeline on any other platform (retry for all exit codes of RENAME_RAW_DATA_FILES & DADA2_FILTNTRIM, which is not desirable). I would prefer a fix that doesn't have a negative impact at all.
I'll try to lure some more people here to get more opinions.

@ewels (Member) left a comment

This issue seems related to the Azure infrastructure setup and is not general to all users of the pipeline, or really pipeline specific. I do not think that we should apply blanket retry for all users like this.

Alternatives would be setting this on user configs instead, or better still adding to the azurebatch institutional config as a pipeline-specific config.

The latter is already set up in this pipeline, so no changes are needed here; it is purely a change to the nf-core/configs repo. You would need to create a new config file there and then include it here.
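(As a rough illustration of that route: the exact path and inclusion mechanism follow the nf-core/configs conventions and should be checked there, but the pipeline-specific file might look roughly like this, so Azure Batch users pick it up automatically via the institutional profile while other platforms stay untouched:)

// hypothetical conf/pipeline/ampliseq/azurebatch.config in nf-core/configs
process {
    withName: 'RENAME_RAW_DATA_FILES|DADA2_FILTNTRIM' {
        errorStrategy = 'retry'
        maxRetries    = 3
    }
}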

Review comments on modules/local/dada2_filtntrim.nf and modules/local/rename_raw_data_files.nf (outdated, resolved).
@ewels (Member) commented Dec 11, 2024

It's also possible that the retry solves the issue here because the retry attempts give the task more memory (the task.attempt multiplier). This could be needed when pulling data from a data lake, which is a slightly non-standard method of data staging.

memory = { 6.GB * task.attempt }

So something else that you could try in the azure-specific pipeline config is to assign more memory to these tasks. Then they might not fail in the first place and you may not need the retry.
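(In that case, the azure-specific config could simply raise the base memory for the two processes, for example as below; 12.GB is an illustrative value only, not a tested figure:)

process {
    withName: 'RENAME_RAW_DATA_FILES|DADA2_FILTNTRIM' {
        memory = 12.GB   // illustrative value; the right figure would need testing on Azure
    }
}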

Dedaniya08 and others added 2 commits December 11, 2024 15:42
without setting maxRetries as 3, just considering the errorStrategy

Co-authored-by: Phil Ewels <[email protected]>
@Dedaniya08 Dedaniya08 requested a review from ewels December 11, 2024 10:14
@ewels (Member) commented Dec 11, 2024

@d4straub - suggest we close this PR and @Dedaniya08 you can open one to nf-core/configs with the relevant changes.

@d4straub (Collaborator)

Yes I agree @ewels , better solved in nf-core/configs. Thanks a lot!
