Merge branch 'master' into update-nf-customize
vdauwera authored Oct 4, 2024
2 parents 5969095 + 7fc27fc commit 6432f6e
Showing 44 changed files with 1,214 additions and 636 deletions.
33 changes: 33 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,33 @@
{
    "name": "nfcore",
    "image": "nfcore/gitpod:latest",
    "remoteUser": "gitpod",

    // Configure tool-specific properties.
    "customizations": {
        // Configure properties specific to VS Code.
        "vscode": {
            // Set *default* container specific settings.json values on container create.
            "settings": {
                "python.defaultInterpreterPath": "/opt/conda/bin/python",
                "python.linting.enabled": true,
                "python.linting.pylintEnabled": true,
                "python.formatting.autopep8Path": "/opt/conda/bin/autopep8",
                "python.formatting.yapfPath": "/opt/conda/bin/yapf",
                "python.linting.flake8Path": "/opt/conda/bin/flake8",
                "python.linting.pycodestylePath": "/opt/conda/bin/pycodestyle",
                "python.linting.pydocstylePath": "/opt/conda/bin/pydocstyle",
                "python.linting.pylintPath": "/opt/conda/bin/pylint"
            },

            // Add the IDs of extensions you want installed when the container is created.
            "extensions": ["ms-python.python", "ms-python.vscode-pylance", "nf-core.nf-core-extensionpack"]
        }
    },
    "portsAttributes": {
        "3000": {
            "label": "Application",
            "onAutoForward": "openPreview"
        }
    }
}
13 changes: 13 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,13 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: "v3.1.0"
    hooks:
      - id: prettier
        additional_dependencies:
          - [email protected]

  - repo: https://github.com/editorconfig-checker/editorconfig-checker.python
    rev: "2.7.3"
    hooks:
      - id: editorconfig-checker
        alias: ec
6 changes: 3 additions & 3 deletions docs/advanced/configuration.md
@@ -36,10 +36,10 @@ These configuration values would be inherited by every run on that system withou

## Overriding for a run - `$PWD/nextflow.config`

Move into the chapter example directory:
Create a chapter example directory:

```
cd configuration
mkdir configuration && cd configuration
```
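
As a rough sketch of what a run-specific override could look like (the directive values below are illustrative, not taken from the training material), a `nextflow.config` placed in the launch directory might contain:

```groovy
// $PWD/nextflow.config - applies only to runs launched from this directory
process {
    cpus   = 2
    memory = 4.GB
}
```

Any run launched from this directory would pick these values up on top of the system-wide configuration.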

### Overriding Process Directives
@@ -72,7 +72,7 @@ Glob pattern matching can also be used:

```groovy
process {
withLabel: '.*:INDEX' {
withName: '.*:INDEX' {
cpus = 2
}
}
167 changes: 134 additions & 33 deletions docs/basic_training/cache_and_resume.md
@@ -107,7 +107,7 @@ It’s good practice to organize each **experiment** in its own folder. The main
The `nextflow log` command lists the executions run in the current folder:

```console
$ nextflow log
nextflow log
```

```console title="Output"
@@ -146,7 +146,7 @@ nextflow log tiny_fermat
The `-f` (fields) option can be used to specify which metadata should be printed by the `log` command:

```console
$ nextflow log tiny_fermat -f 'process,exit,hash,duration'
nextflow log tiny_fermat -f 'process,exit,hash,duration'
```

```console title="Output"
@@ -167,7 +167,7 @@ nextflow log -l
The `-F` option allows the specification of filtering criteria to print only a subset of tasks:

```console
$ nextflow log tiny_fermat -F 'process =~ /fastqc/'
nextflow log tiny_fermat -F 'process =~ /fastqc/'
```

```console title="Output"
@@ -283,58 +283,159 @@ process FOO {
    val x
    output:
    tuple val(task.index), val(x)
    stdout
    script:
    """
    sleep \$((RANDOM % 3))
    echo -n "$x"
    """
}
process BAR {
    input:
    val x
    output:
    stdout
    script:
    """
    echo -n "$x" | tr '[:upper:]' '[:lower:]'
    """
}
process FOOBAR {
    input:
    val foo
    val bar
    output:
    stdout
    script:
    """
    echo $foo - $bar
    """
}
workflow {
    channel.of('A', 'B', 'C', 'D') | FOO | view
    ch_letters = channel.of('A', 'B', 'C', 'D')
    FOO(ch_letters)
    BAR(ch_letters)
    FOOBAR(FOO.out, BAR.out).view()
}
```

Just like you saw at the beginning of this tutorial with HELLO WORLD or WORLD HELLO, the output of the snippet above can be:
Processes FOO and BAR receive the same inputs and each return something on standard output. Process FOOBAR was written to take those processed outputs and generate a combined output from matched inputs. However, even though the order of the input values is fixed, FOO and BAR emit their outputs whenever their tasks complete, without respecting the input order. FOOBAR therefore does not receive the two channels in matching order, as you can see from the output:

```console title="Output"
[0, A]
[3, C]
[4, D]
[2, B]
[1, A]
...
B - c

D - a

C - d

A - b
```

..and that order will likely be different every time the workflow is run.
So D is matched with 'a' here, which was not the intention. The order will likely be different every time the workflow is run, meaning that the processing is not deterministic and caching will not work either, since the inputs to FOOBAR vary from run to run.

!!! question "Exercise"

Re-run the above code a couple of times using `-resume`, and determine whether the FOOBAR process re-runs or uses cached results.

??? solution

You should see that while FOO and BAR reliably re-use their cache, FOOBAR will re-run at least a subset of its tasks due to differences in the combinations of inputs it receives.

Imagine that you now have two processes like this, whose output channels feed a third process. The order of each channel is independently random, so the third process must not expect them to stay paired. If it assumes that the first element of one process's output channel corresponds to the first element of the other's, there will be a mismatch.
The output will look like this:

A common solution for this is to use what is commonly referred to as a _meta map_. A groovy object with sample information is passed out together with the file results within an output channel as a tuple. This can then be used to pair samples from separate channels together for downstream use. For example, instead of putting just `/some/path/myoutput.bam` into a channel, you could use `['SRR123', '/some/path/myoutput.bam']` to make sure the processes do not run into a mismatch. Check the example below:
```console title="Output"
[58/f117ed] FOO (4) [100%] 4 of 4, cached: 4 ✔
[84/e88fd9] BAR (4) [100%] 4 of 4, cached: 4 ✔
[6f/d3f672] FOOBAR (1) [100%] 4 of 4, cached: 2 ✔
D - c

A - d

C - a

B - b
```

A common solution for this is to use what is commonly referred to as a _meta map_: a Groovy object containing sample information that is passed along with the file results inside an output channel, as a tuple. This can then be used to pair samples from separate channels for downstream use.

To illustrate, here is a change to the above workflow, with meta maps added:

```groovy linenums="1" title="snippet.nf"
// For example purposes only.
// These would normally be outputs from upstream processes.
Channel
    .of(
        [[id: 'sample_1'], '/path/to/sample_1.bam'],
        [[id: 'sample_2'], '/path/to/sample_2.bam']
    )
    .set { bam }
process FOO {
    input:
    tuple val(meta), val(x)
// NB: sample_2 is now the first element, instead of sample_1
Channel
    .of(
        [[id: 'sample_2'], '/path/to/sample_2.bai'],
        [[id: 'sample_1'], '/path/to/sample_1.bai']
    output:
    tuple val(meta), stdout
    script:
    """
    sleep \$((RANDOM % 3))
    echo -n "$x"
    """
}
process BAR {
    input:
    tuple val(meta), val(x)
    output:
    tuple val(meta), stdout
    script:
    """
    echo -n "$x" | tr '[:upper:]' '[:lower:]'
    """
}
process FOOBAR {
    input:
    tuple val(meta), val(foo), val(bar)
    output:
    stdout
    script:
    """
    echo $foo - $bar
    """
}
workflow {
    ch_letters = channel.of(
        [[id: 'A'], 'A'],
        [[id: 'B'], 'B'],
        [[id: 'C'], 'C'],
        [[id: 'D'], 'D']
    )
    .set { bai }
    FOO(ch_letters)
    BAR(ch_letters)
    FOOBAR(FOO.out.join(BAR.out)).view()
}
```

Now we define `ch_letters` with a meta map (e.g. `[id: 'A']`). Both FOO and BAR pass `meta` through and attach it to their outputs. Then, in the call to FOOBAR, we use the `join` operator to ensure that only matched values are passed. Running this code gives us matched outputs, as we'd expect:

```console title="Output"
...
D - d

B - b

A - a

// Instead of feeding the downstream process with these two channels separately, you can
// join them and provide a single channel where the sample meta map is implicitly matched:
bam
.join(bai)
| PROCESS_C
C - c
```
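
To see what `join` is doing here in isolation, here is a minimal sketch you could run on its own (the channel contents are illustrative, not taken from the training material). By default, `join` matches elements on the first item of each tuple, here the meta map, so pairing is preserved regardless of emission order:

```groovy
workflow {
    // Hypothetical keyed channels; in practice these would be process outputs
    ch_a = channel.of([[id: 'sample_1'], 'A'], [[id: 'sample_2'], 'B'])
    ch_b = channel.of([[id: 'sample_2'], 'b'], [[id: 'sample_1'], 'a'])

    // join pairs elements that share the same first element (the meta map),
    // regardless of the order in which each channel emits them
    ch_a.join(ch_b).view()
    // e.g. [[id:sample_1], A, a] and [[id:sample_2], B, b]
}
```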

If meta maps are not possible, an alternative is to use the [`fair`](https://nextflow.io/docs/edge/process.html#fair) process directive. When this directive is specified, Nextflow will guarantee that the order of outputs will match the order of inputs (not the order in which the tasks run, only the order of the output channel).
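
As a rough sketch of how that could look, here is the FOO process from above with the directive added (usage follows the linked documentation; treat it as an illustration rather than the training material's own solution):

```groovy
process FOO {
    // With 'fair' enabled, outputs are emitted in the same order as the
    // corresponding inputs, even if individual tasks finish out of order
    fair true

    input:
    val x

    output:
    stdout

    script:
    """
    sleep \$((RANDOM % 3))
    echo -n "$x"
    """
}
```
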
2 changes: 1 addition & 1 deletion docs/basic_training/channels.md
@@ -42,7 +42,7 @@ ch.view() // (1)!

!!! question "Exercise"

The script `snippet1.nf` contains the code from above. Execute it with Nextflow and view the output.
The script `snippet.nf` contains the code from above. Execute it with Nextflow and view the output.

??? solution

18 changes: 11 additions & 7 deletions docs/basic_training/containers.md
@@ -85,26 +85,26 @@ To exit from the container, stop the BASH session with the `exit` command.

### Your first Dockerfile

Docker images are created by using a so-called `Dockerfile`, a simple text file containing a list of commands to assemble and configure the image with the software packages required. For example, a Dockerfile to create a container with `cowsay` installed could be as simple as this:
Docker images are created by using a so-called `Dockerfile`, a simple text file containing a list of commands to assemble and configure the image with the software packages required. For example, a Dockerfile to create a container with `curl` installed could be as simple as this:

```dockerfile linenums="1" title="Dockerfile"
FROM debian:bullseye-slim

LABEL image.author.name "Your Name Here"
LABEL image.author.email "[email protected]"

RUN apt-get update && apt-get install -y curl cowsay
RUN apt-get update && apt-get install -y curl

ENV PATH=$PATH:/usr/games/
```

Once your Dockerfile is ready, you can build the image by using the `build` command. For example:

```bash
docker build -t <my-image> .
docker build -t my-image .
```

Where `<my-image>` is the user-specified name for the container image you plan to build.
Where `my-image` is the user-specified name for the container image you plan to build.

!!! tip

@@ -193,6 +193,10 @@ docker build -t my-image .

### Run Salmon in the container

!!! tip

If you didn't complete the steps above, use the `rnaseq-nf` image from elsewhere in these materials by specifying `nextflow/rnaseq-nf` in place of `my-image` in the following examples.

You can run the software installed in the container by using the `run` command. For example, you can check that Salmon is running correctly in the container generated above by using the following command:

```bash
@@ -609,8 +613,8 @@ Contrary to other registries that will pull the latest image when no tag (versio
You can also install `galaxy-tool-util` and search for _mulled_ containers from the command line. You'll find instructions below, using conda to install the tool.

```bash
conda activate a-conda-env-you-already-have
conda install galaxy-tool-util
conda create -n galaxy-tool-util -y galaxy-tool-util # Create a new environment with 'galaxy-tool-util' installed
conda activate galaxy-tool-util
mulled-search --destination quay singularity --channel bioconda --search bowtie samtools | grep mulled
```

@@ -670,7 +674,7 @@ Nextflow automatically sets up an environment for the given package names listed

!!! question "Exercise"

The tools `fastqc` and `salmon` are both available in BioContainers. Add the appropriate `container` directives to the `FASTQC` and `QUANTIFICATION` processes in `script5.nf` to use BioContainers instead of the container image you have been using in this training.
The tools `fastqc` and `salmon` are both available in BioContainers (`biocontainers/fastqc:v0.11.5` and `quay.io/biocontainers/salmon:1.7.0--h84f40af_0`, respectively). Add the appropriate `container` directives to the `FASTQC` and `QUANTIFICATION` processes in `script5.nf` to use BioContainers instead of the container image you have been using in this training.

!!! tip "Hint"
