Adding the MRC LMS Jex cluster #561

Merged 1 commit on Nov 10, 2023

Conversation

A-N-Other (Contributor) commented Oct 2, 2023


name: MRC LMS Jex cluster config
about: Adding the MRC LMS Jex cluster

Please follow these steps before submitting your PR:

  • If your PR is a work in progress, include [WIP] in its title
  • Your PR targets the master branch
  • You've included links to relevant issues, if any
  • Requested review from @nf-core/maintainers and/or #request-review on slack

Steps for adding a new config profile:

  • Add your custom config file to the conf/ directory
  • Add your documentation file to the docs/ directory
  • Add your custom profile to the nfcore_custom.config file in the top-level directory
  • Add your custom profile to the README.md file in the top-level directory
  • Add your profile name to the profile: scope in .github/workflows/main.yml

maxulysse (Member) left a comment

LGTM, let's check if all tests are successful

conf/jex.config (review thread resolved)
pontus (Contributor) commented Oct 3, 2023

Looks good (although it looks a bit odd to me with a 4 Tbyte node with only 16 cores).

A-N-Other (Contributor, Author) commented Oct 3, 2023

> Looks good (although it looks a bit odd to me with a 4 Tbyte node with only 16 cores).

@pontus hmem actually has up to 64 cores, but I don't want jobs pushed into the hmem queue purely because of their cpu requests (rather than RAM) when people use { check_max( x * task.attempt, 'cpus' ) } closures for job resubmissions. I therefore set max_cpus so that all jobs fit within the cpu partition. Is there a better way to do this?

EDIT: Would this work as I'd anticipate?

process {
  executor = 'slurm'
  queue = {
    if ( task.time <= 6.h && task.cpus <= 8 && task.memory <= 64.GB ) {
      'nice'
    } else if ( task.memory > 256.GB ) {
      params.max_cpus = 64
      params.max_time = 7.d
      'hmem'
    } else {
      'cpu'
    }
  }
  clusterOptions = '--qos qos_batch'
}
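For context on the check_max closures mentioned above: the helper caps a requested resource at the matching params.max_* value. Below is a simplified sketch of the 'cpus' branch only, written from memory of the nf-core pipeline template rather than copied from it, so the details should be treated as an assumption. The max_cpus value of 16 is inferred from the 16-core figure raised in review.

```groovy
// Simplified sketch of the 'cpus' branch of an nf-core-style check_max helper.
// Assumption: max_cpus = 16, as implied by the review comment above.
params.max_cpus = 16

def check_max(obj, type) {
    if (type == 'cpus') {
        // Never request more cores than the configured ceiling
        return Math.min(obj, params.max_cpus as int)
    }
    // Memory and time branches are omitted in this sketch
    return obj
}
```

With this cap, a process declaring cpus = { check_max( 12 * task.attempt, 'cpus' ) } (12 is an illustrative number) would request 12 cores on its first attempt and 16, not 24, on its second, so a retry alone cannot raise the core count above max_cpus.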

pontus (Contributor) commented Oct 4, 2023

No, there's no great solution for that. I haven't verified it, but I would expect that setting params in such a closure does not work.
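If assigning params inside the closure does indeed have no reliable effect, a variant of the proposed block that only selects the queue might look like the sketch below. It reuses the thresholds and partition names from the snippet above unchanged; whether it matches the merged conf/jex.config is not shown in this thread.

```groovy
process {
  executor = 'slurm'
  // The closure is evaluated per task and returns only the partition name;
  // it no longer tries to modify params.max_cpus or params.max_time.
  queue = {
    if ( task.time <= 6.h && task.cpus <= 8 && task.memory <= 64.GB ) {
      'nice'   // short, small jobs
    } else if ( task.memory > 256.GB ) {
      'hmem'   // only large-memory jobs reach the high-memory partition
    } else {
      'cpu'
    }
  }
  clusterOptions = '--qos qos_batch'
}
```

In this form, routing to hmem depends on the memory request alone, and the static max_* limits continue to apply to every job.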

One possible (not great) solution would be to add pipeline-specific configurations that change the number of cpus requested by the problematic processes. That said, I would probably also have taken the shortcut of just using 16, even though it wastes a lot of cores on the hmem machine (I assume the scheduler is set up to allow high-memory, low-core jobs).
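The pipeline-specific workaround described above could be sketched as follows. The process name HYPOTHETICAL_HMEM_PROCESS and the resource values are placeholders invented for illustration; they do not come from this PR or from any particular pipeline.

```groovy
process {
    // Hypothetical process that is known to need the high-memory partition
    withName: 'HYPOTHETICAL_HMEM_PROCESS' {
        cpus   = 32     // illustrative: above a 16-core cap, within hmem's 64 cores
        memory = 1.TB   // illustrative: large enough to justify an hmem node
    }
}
```

As far as I understand, a fixed value set this way is not passed through check_max, which is why the override has to be written per pipeline and per process rather than once in the cluster profile.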

A-N-Other (Contributor, Author) commented Oct 4, 2023

I hadn't seen an example of dynamic setting of max_* in any of the other configs, so that was what I'd figured.

I'd rather limit cpus overall than have the hmem partition clogged, so I'll leave it as it is. It's fairly rare for any nf process to request 4T anyway, so the rest of the capacity remains available for other jobs that SLURM can backfill into the space.

@jfy133 jfy133 merged commit f1d5daa into nf-core:master Nov 10, 2023

5 participants