Add custom config for seadragon cluster of MD Anderson Cancer Center #831

Open
wants to merge 6 commits into
base: master

@jiawku jiawku commented Jan 18, 2025


name: New Custom Config For Seadragon
about: A new cluster config For MD Anderson Cancer Center Seadragon Cluster

Please follow these steps before submitting your PR:

  • If your PR is a work in progress, include [WIP] in its title
  • Your PR targets the master branch
  • You've included links to relevant issues, if any

Steps for adding a new config profile:

  • Add your custom config file to the conf/ directory
  • Add your documentation file to the docs/ directory
  • Add your custom profile to the nfcore_custom.config file in the top-level directory
  • Add your custom profile to the README.md file in the top-level directory
  • Add your profile name to the profile: scope in .github/workflows/main.yml
  • OPTIONAL: Add your custom profile path and GitHub user name to .github/CODEOWNERS (**/<custom-profile>** @<github-username>)
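As a rough illustration of the third step, a profile entry in nfcore_custom.config usually looks something like the sketch below. The profile name `seadragon` and the path `conf/seadragon.config` are assumptions based on this PR's title and its docs file; the actual file names in the PR may differ.

```groovy
// Sketch of a profile entry inside the existing profiles scope of nfcore_custom.config.
// 'seadragon' and 'conf/seadragon.config' are assumed names, not confirmed by the PR.
profiles {
    seadragon {
        includeConfig "${params.custom_config_base}/conf/seadragon.config"
    }
}
```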

@jfy133
Member

jfy133 commented Jan 18, 2025

@nf-core-bot fix linting

Member

@jfy133 jfy133 left a comment

The only blocker is the missing resourceLimits option; the rest are suggestions.

}

env {
SINGULARITY_CACHEDIR="/home/$USER/.singularity/cache"
Member

Have you tested this works, e.g. using --custom_config_base with your pipeline of interest?

In some config scopes, Nextflow will interpret `$USER` as a Nextflow variable rather than a Bash one. I think env is OK, but it is likely worth double-checking.
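To make the concern concrete, here is a minimal sketch of how the quoting style changes who resolves a variable. It assumes the `USER` environment variable is set on the host where Nextflow is launched; the paths mirror the ones in the diff, and this is not the PR's actual config.

```groovy
// Sketch only: illustrates interpolation, not a recommended final config.
singularity {
    // Double quotes: the value is resolved while Nextflow parses the config.
    // System.getenv makes the lookup explicit instead of relying on a bare $USER.
    cacheDir = "/home/${System.getenv('USER')}/.singularity/cache"

    // Single quotes: Nextflow does no interpolation; the literal text is
    // written into the task wrapper and left for the shell to expand there.
    runOptions = '-B ${TMPDIR:-/tmp}'
}
```

Running a small test profile with `--custom_config_base` pointed at a local checkout, as suggested above, is the simplest way to confirm which behaviour a given scope shows.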

Author

Hi James, thank you for the comment. I have tested the config with the RNA-seq and Sarek pipelines. Neither of them reported errors here, so it should be fine.

envWhitelist='APPTAINERENV_NXF_TASK_WORKDIR,APPTAINERENV_NXF_DEBUG,APPTAINERENV_LD_LIBRARY_PATH,SINGULARITY_BINDPATH,LD_LIBRARY_PATH,TMPDIR,SINGULARITY_TMPDIR'
autoMounts = true
runOptions = '-B ${TMPDIR:-/tmp}'
cacheDir = "/home/$USER/.singularity/cache"
Member

See comment above

Comment on lines 101 to 103
max_memory = 950.GB // Maximum memory based on E80 nodes
max_cpus = 80 // Maximum CPUs based on E80 nodes
max_time = 240.h // Maximum runtime for long queues
Member

Note this would mean that you will never be able to access the high-mem nodes.

Furthermore, you need to replicate these values in the Nextflow-native resourceLimits directive, e.g.

process {
    resourceLimits = [
        memory: 1992.GB,
        cpus: 128,
        time: 168.h
    ]
    executor = 'slurm'
    queue = 'qbic'
    scratch = 'true'
}

The max_* parameters have been deprecated in more recent nf-core pipelines, but you should still keep them for backwards compatibility with older pipelines.
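Keeping both mechanisms side by side might look like the following sketch. The numbers are the ones quoted from the diff above (E80 nodes) and are placeholders only: they would need to be raised to the largest node's actual specifications if the high-memory nodes should be reachable.

```groovy
// Sketch only: values copied from the reviewed diff, not verified cluster limits.
params {
    // Legacy caps, still read by older nf-core pipelines.
    max_memory = 950.GB
    max_cpus   = 80
    max_time   = 240.h
}

process {
    // Nextflow-native caps, used by newer pipelines; kept in sync with the params above.
    resourceLimits = [
        memory: 950.GB,
        cpus: 80,
        time: 240.h
    ]
}
```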

Author

Updated max_memory to make the highmem and vhighmem nodes available.

docs/seadragon.md

## Notes

- **Data Storage**: All intermediate files will be stored in the `work/` directory within the job's launch directory. These files can consume significant space, so it is recommended to delete this directory after the pipeline completes successfully.
Member

You could instead add cleanup = true to the config, so all files in this directory get deleted when a run completes successfully (if it fails, the intermediate files don't get deleted)

Member

e.g.

cleanup = true

Author

Hi James, thanks for pointing that out. The reason I keep the work folder is that I may need to use the BAM files from the Sarek pipeline for other custom analyses. Is there a more elegant way to retain the BAM files? Perhaps by custom-defining them as final outputs?
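One possible arrangement, offered as a sketch and not a tested recipe: `cleanup = true` only removes the work directory, so anything a pipeline publishes to its output directory survives. The parameter names `save_mapped`, `save_output_as_bam`, and `publish_dir_mode` below are assumptions about what the pipeline version in use exposes and should be checked against its parameter documentation. A file like this would be a personal config passed with `-c`, not part of the shared cluster profile.

```groovy
// Hypothetical user-level config (e.g. my_run.config); not part of the Seadragon profile.
params {
    save_mapped        = true    // assumed Sarek option: publish mapped alignments
    save_output_as_bam = true    // assumed Sarek option: publish BAM instead of CRAM
    publish_dir_mode   = 'copy'  // copy files so published outputs do not link into work/
}

// Delete work/ only after a run completes successfully.
cleanup = true
```

If outputs were published as symlinks instead of copies, removing the work directory would leave dangling links, which is why the copy mode matters in this sketch.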

3 participants