add raredisease config (#2725)
* add raredisease config

* typo

* add samplesheet and pipeline parameters

* update config case function

* update config case function

* update config case function

* reach case_sample

* reach case_sample

* bug fixing

* fix samplesheet creation

* fix samplesheet creation

* fix samplesheet creation

* start plan for raredisease parameters

* fix linting

* use general tower binary path

* change to staticmethod and clean up sonarcloud issue

* start preparing test for cg workflow raredisease config-case

* comment out part of the test

* move option_from_start to common

* add config manipulation

* black

* fix path

* add on-the-fly parameters

* add on-the-fly parameters, fix typo

* fix PipelineParameters

* config: write to stdout

* config: write to stdout

* add newline

* add newline

* black

* black

* fix some code smells

* black

* keep using RnafusionParameters

* black

* change type

* update docstring

* uncomment tests

* adapt server fixture

* get parameters info in dry-run

* remove binary path from raredisease server fixture

* add outdir

* add all pipeline parameters to the string

* fix issues

* black

* Update cg/cli/workflow/nf_analysis.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* add config file extension and use

* add config file extension and use

* remove config read/write/concat that can be in txt io

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* rename function config_case in RarediseaseApi

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Henrik Stranneheim <[email protected]>

* use named arguments

* types and named args

* use constant for empty string and double quote

* use PlinkPhenotypeStatus + update docstring

* use params instead of config, and overwrite nf-analysis in raredisease case, later to be moved in nf-analysis

* clarify docstring

* Update cg/models/raredisease/raredisease.py

Co-authored-by: Sebastian Diaz <[email protected]>

* use Sex class constants

* move getting the parental id to StatusDB API

* move Pipeline to Workflow

* create RarediseaseSampleSheetHeaders class, similar to Fluffy

* add test raredisease config case

* rename reformat function to clarify

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Sebastian Diaz <[email protected]>

* Update cg/meta/workflow/raredisease.py

Co-authored-by: Sebastian Diaz <[email protected]>

* update docstring

* samplesheet to sample sheet

* fix dependencies

* fix dependencies

* workflow

* workflow

* fix dependencies

* adapt concat

* black

* update dry-run message in tests

* black

* debugging

* debugging

* debugging

* debugging

* black

* black

* debugging

* add test for concat

* continue adding concat tests

* continue adding concat tests

* fix typo

* fix tests

* black

* add test for sample sheet creation

* add test for sample sheet creation

* test writing

* test for writing config file

* black

* test for writing config file

* type for raredisease config to str

* test for writing config file

* black

* black

* test for writing config file

* black

* test for writing config file

* test for writing config file

* black

* make reformat_sample_content a property

* fix error

* Update cg/io/config.py

Co-authored-by: ChristianOertlin <[email protected]>

* Update cg/meta/workflow/raredisease.py

Co-authored-by: ChristianOertlin <[email protected]>

* key to parameters

* Update cg/io/txt.py

Co-authored-by: ChristianOertlin <[email protected]>

* Update cg/store/models.py

Co-authored-by: ChristianOertlin <[email protected]>

* black

* parental id as property

* black

* paternal/maternal ids cannot be None, should return empty string

---------

Co-authored-by: Annick Renevey <[email protected]>
Co-authored-by: Henrik Stranneheim <[email protected]>
Co-authored-by: Sebastian Diaz <[email protected]>
Co-authored-by: ChristianOertlin <[email protected]>
5 people authored Mar 4, 2024
1 parent 36cf6b6 commit dd59c1a
Showing 28 changed files with 584 additions and 52 deletions.
7 changes: 7 additions & 0 deletions cg/cli/workflow/nf_analysis.py
@@ -77,6 +77,13 @@
    default=None,
    help="NF-Tower ID of run to relaunch. If not provided the latest NF-Tower ID for a case will be used.",
)
OPTION_FROM_START = click.option(
    "--from-start",
    is_flag=True,
    default=False,
    show_default=True,
    help="Start workflow from start without resuming execution",
)


@click.command("metrics-deliver")
21 changes: 20 additions & 1 deletion cg/cli/workflow/raredisease/base.py
@@ -3,13 +3,16 @@
import logging

import click
from pydantic.v1 import ValidationError

from cg.cli.utils import echo_lines
from cg.cli.workflow.commands import ARGUMENT_CASE_ID, OPTION_DRY
-from cg.constants.constants import MetaApis
+from cg.constants.constants import DRY_RUN, MetaApis
from cg.meta.workflow.analysis import AnalysisAPI
from cg.meta.workflow.raredisease import RarediseaseAnalysisAPI
from cg.models.cg_config import CGConfig
from cg.exc import CgError


LOG = logging.getLogger(__name__)

@@ -22,6 +25,22 @@ def raredisease(context: click.Context) -> None:
    context.obj.meta_apis[MetaApis.ANALYSIS_API] = RarediseaseAnalysisAPI(config=context.obj)


@raredisease.command("config-case")
@ARGUMENT_CASE_ID
@DRY_RUN
@click.pass_obj
def config_case(context: CGConfig, case_id: str, dry_run: bool) -> None:
    """Create sample sheet file and params file for a given case."""
    analysis_api: RarediseaseAnalysisAPI = context.meta_apis[MetaApis.ANALYSIS_API]
    LOG.info(f"Creating config files for {case_id}.")
    try:
        analysis_api.status_db.verify_case_exists(case_internal_id=case_id)
        analysis_api.write_config_case(case_id=case_id, dry_run=dry_run)
    except (CgError, ValidationError) as error:
        LOG.error(f"Could not create config files for {case_id}: {error}")
        raise click.Abort() from error


@raredisease.command("panel")
@OPTION_DRY
@ARGUMENT_CASE_ID
2 changes: 1 addition & 1 deletion cg/cli/workflow/rnafusion/base.py
@@ -11,6 +11,7 @@
from cg.cli.workflow.nf_analysis import (
    OPTION_COMPUTE_ENV,
    OPTION_CONFIG,
+    OPTION_FROM_START,
    OPTION_LOG,
    OPTION_PARAMS_FILE,
    OPTION_PROFILE,
@@ -22,7 +23,6 @@
    report_deliver,
)
from cg.cli.workflow.rnafusion.options import (
-    OPTION_FROM_START,
    OPTION_REFERENCES,
    OPTION_STRANDEDNESS,
)
8 changes: 0 additions & 8 deletions cg/cli/workflow/rnafusion/options.py
@@ -2,14 +2,6 @@

from cg.constants.constants import Strandedness

-OPTION_FROM_START = click.option(
-    "--from-start",
-    is_flag=True,
-    default=False,
-    show_default=True,
-    help="Start workflow from start without resuming execution",
-)
-
OPTION_STRANDEDNESS = click.option(
    "--strandedness",
    type=str,
5 changes: 4 additions & 1 deletion cg/cli/workflow/taxprofiler/base.py
@@ -9,6 +9,7 @@
from cg.cli.workflow.nf_analysis import (
    OPTION_COMPUTE_ENV,
    OPTION_CONFIG,
+    OPTION_FROM_START,
    OPTION_LOG,
    OPTION_PARAMS_FILE,
    OPTION_PROFILE,
@@ -19,7 +20,9 @@
    metrics_deliver,
    report_deliver,
)
-from cg.cli.workflow.taxprofiler.options import OPTION_FROM_START, OPTION_INSTRUMENT_PLATFORM
+from cg.cli.workflow.taxprofiler.options import (
+    OPTION_INSTRUMENT_PLATFORM,
+)
from cg.constants import EXIT_FAIL, EXIT_SUCCESS
from cg.constants.constants import DRY_RUN, CaseActions, MetaApis
from cg.constants.nf_analysis import NfTowerStatus
8 changes: 0 additions & 8 deletions cg/cli/workflow/taxprofiler/options.py
@@ -2,14 +2,6 @@

from cg.constants.sequencing import SequencingPlatform

-OPTION_FROM_START = click.option(
-    "--from-start",
-    is_flag=True,
-    default=False,
-    show_default=True,
-    help="Start workflow from the start",
-)
-
OPTION_INSTRUMENT_PLATFORM = click.option(
    "--instrument-platform",
    show_default=True,
1 change: 1 addition & 0 deletions cg/constants/constants.py
@@ -181,6 +181,7 @@ class HastaSlurmPartitions(StrEnum):
class FileExtensions(StrEnum):
    BED: str = ".bed"
    COMPLETE: str = ".complete"
+    CONFIG: str = ".config"
    CRAM: str = ".cram"
    CSV: str = ".csv"
    FASTQ: str = ".fastq"
17 changes: 17 additions & 0 deletions cg/io/config.py
@@ -0,0 +1,17 @@
"""Module to read or write config files"""

from pathlib import Path
from typing import Any
from cg.constants.symbols import EMPTY_STRING


def write_config_nextflow_style(content: dict[str, Any] | None) -> str:
    """Write content to stream accepted by Nextflow config files with non-quoted booleans and quoted strings."""
    string: str = EMPTY_STRING
    double_quotes: str = '"'
    for parameter, value in content.items():
        if isinstance(value, Path):
            value: str = value.as_posix()
        quotes = double_quotes if type(value) is str else EMPTY_STRING
        string += f"params.{parameter} = {quotes}{value}{quotes}\n"
    return string
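The function above turns a parameter dictionary into `params.<name> = <value>` lines, quoting strings and leaving other values bare. A minimal standalone sketch of that rendering is shown here; `render_nextflow_params` and the example values are hypothetical and not part of `cg`. Unlike the committed function, the sketch also lower-cases booleans, on the assumption that the Groovy-based Nextflow config syntax expects `true`/`false` rather than Python's `True`/`False`.

```python
from pathlib import Path
from typing import Any


def render_nextflow_params(parameters: dict[str, Any]) -> str:
    """Render key/value pairs as `params.<key> = <value>` lines (hypothetical helper)."""
    lines: list[str] = []
    for name, value in parameters.items():
        if isinstance(value, Path):
            # Paths are written as POSIX strings and therefore quoted below.
            value = value.as_posix()
        if isinstance(value, bool):
            # Assumption: booleans are written as Groovy literals (true/false).
            rendered = str(value).lower()
        elif isinstance(value, str):
            rendered = f'"{value}"'
        else:
            rendered = str(value)
        lines.append(f"params.{name} = {rendered}\n")
    return "".join(lines)


example: str = render_nextflow_params(
    {
        "input": Path("/cases/mycase/mycase_samplesheet.csv"),
        "save_reference": False,
        "threads": 4,
    }
)
print(example, end="")
```

With the `WorkflowParameters` model from this commit only `input` and `outdir` are passed, both as paths, so the boolean branch would only matter once further parameters are added.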
17 changes: 16 additions & 1 deletion cg/io/txt.py
@@ -1,7 +1,8 @@
"""Module to read or write txt files."""

from pathlib import Path
from typing import Any
from typing import List, Optional
from cg.constants.symbols import EMPTY_STRING


def read_txt(file_path: Path, read_to_string: bool = False) -> list[str] | str:
@@ -19,3 +20,17 @@ def write_txt(content: list[str] | str, file_path: Path) -> None:
            file.writelines(content)
        else:
            file.write(content)


def concat_txt(
    file_paths: list[Path], target_file: Path, str_content: Optional[List[str]] = None
) -> None:
    """Concatenate files and eventual string content."""
    content: str = EMPTY_STRING
    if str_content:
        for txt in str_content:
            content += f"{txt}\n"
    for file_path in file_paths:
        file_content: str = read_txt(file_path, read_to_string=True)
        content += f"{file_content}\n"
    write_txt(content=content, file_path=target_file)
3 changes: 3 additions & 0 deletions cg/meta/workflow/nf_analysis.py
@@ -148,6 +148,9 @@ def get_workdir_path(self, case_id: str, work_dir: Path | None = None) -> Path:
            return work_dir.absolute()
        return Path(self.get_case_path(case_id), NFX_WORK_DIR)

    def set_cluster_options(self, case_id: str) -> str:
        return f'process.clusterOptions = "-A {self.account} --qos={self.get_slurm_qos_for_case(case_id=case_id)}"\n'

    @staticmethod
    def extract_read_files(
        metadata: list[FastqFileMeta], forward_read: bool = False, reverse_read: bool = False
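`set_cluster_options` in the hunk above returns a single Nextflow config line that forwards the Slurm account and quality of service to every process. A small sketch of the resulting string, with a hypothetical free function `build_cluster_options` and made-up account and QOS values in place of the API attributes:

```python
def build_cluster_options(account: str, qos: str) -> str:
    """Return a Nextflow config line passing account and QOS to Slurm (hypothetical helper)."""
    return f'process.clusterOptions = "-A {account} --qos={qos}"\n'


# "development" and "normal" are placeholder values, not taken from the commit.
line: str = build_cluster_options(account="development", qos="normal")
print(line, end="")
```

The trailing newline lets the line be appended directly to other config fragments when the params file is assembled.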
136 changes: 136 additions & 0 deletions cg/meta/workflow/raredisease.py
@@ -1,13 +1,26 @@
"""Module for Raredisease Analysis API."""

import logging
from typing import Any
from pathlib import Path

from cg.io.txt import concat_txt
from cg.io.config import write_config_nextflow_style
from cg.constants import GenePanelMasterList, Workflow
from cg.constants.constants import FileExtensions
from cg.constants.subject import PlinkPhenotypeStatus, PlinkSex
from cg.constants.gene_panel import GENOME_BUILD_37
from cg.meta.workflow.analysis import add_gene_panel_combo
from cg.meta.workflow.nf_analysis import NfAnalysisAPI
from cg.models.cg_config import CGConfig
from cg.models.fastq import FastqFileMeta
from cg.models.raredisease.raredisease import (
RarediseaseSampleSheetEntry,
RarediseaseSampleSheetHeaders,
)
from cg.models.nf_analysis import WorkflowParameters
from cg.store.models import Case, CaseSample


LOG = logging.getLogger(__name__)

@@ -22,6 +35,129 @@ def __init__(
        workflow: Workflow = Workflow.RAREDISEASE,
    ):
        super().__init__(config=config, workflow=workflow)
        self.root_dir: str = config.raredisease.root
        self.nfcore_workflow_path: str = config.raredisease.workflow_path
        self.references: str = config.raredisease.references
        self.profile: str = config.raredisease.profile
        self.conda_env: str = config.raredisease.conda_env
        self.conda_binary: str = config.raredisease.conda_binary
        self.config_platform: str = config.raredisease.config_platform
        self.config_params: str = config.raredisease.config_params
        self.config_resources: str = config.raredisease.config_resources
        self.tower_binary_path: str = config.tower_binary_path
        self.tower_workflow: str = config.raredisease.tower_workflow
        self.account: str = config.raredisease.slurm.account
        self.compute_env: str = config.raredisease.compute_env
        self.revision: str = config.raredisease.revision

    def write_config_case(
        self,
        case_id: str,
        dry_run: bool,
    ) -> None:
        """Create a parameter (.config) file and a Nextflow sample sheet input for Raredisease analysis."""
        self.create_case_directory(case_id=case_id, dry_run=dry_run)
        sample_sheet_content: list[list[Any]] = self.get_sample_sheet_content(case_id=case_id)
        workflow_parameters: WorkflowParameters = self.get_workflow_parameters(case_id=case_id)
        if dry_run:
            LOG.info("Dry run: nextflow sample sheet and parameter file will not be written")
            return
        self.write_sample_sheet(
            content=sample_sheet_content,
            file_path=self.get_sample_sheet_path(case_id=case_id),
            header=RarediseaseSampleSheetHeaders.headers(),
        )
        self.write_params_file(case_id=case_id, workflow_parameters=workflow_parameters.dict())

    def get_sample_sheet_content_per_sample(
        self, case: Case = "", case_sample: CaseSample = ""
    ) -> list[list[str]]:
        """Get sample sheet content per sample."""
        sample_metadata: list[FastqFileMeta] = self.gather_file_metadata_for_sample(
            case_sample.sample
        )
        fastq_forward_read_paths: list[str] = self.extract_read_files(
            metadata=sample_metadata, forward_read=True
        )
        fastq_reverse_read_paths: list[str] = self.extract_read_files(
            metadata=sample_metadata, reverse_read=True
        )
        sample_sheet_entry = RarediseaseSampleSheetEntry(
            name=case_sample.sample.internal_id,
            fastq_forward_read_paths=fastq_forward_read_paths,
            fastq_reverse_read_paths=fastq_reverse_read_paths,
            sex=self.get_sex_code(case_sample.sample.sex),
            phenotype=self.get_phenotype_code(case_sample.status),
            paternal_id=case_sample.get_paternal_sample_id,
            maternal_id=case_sample.get_maternal_sample_id,
            case_id=case.internal_id,
        )
        return sample_sheet_entry.reformat_sample_content

    def get_sample_sheet_content(
        self,
        case_id: str,
    ) -> list[list[Any]]:
        """Return Raredisease nextflow sample sheet content for a case."""
        case: Case = self.status_db.get_case_by_internal_id(internal_id=case_id)
        sample_sheet_content = []
        LOG.info("Getting sample sheet information")
        LOG.info(f"Samples linked to case {case_id}: {len(case.links)}")
        for link in case.links:
            sample_sheet_content.extend(
                self.get_sample_sheet_content_per_sample(case=case, case_sample=link)
            )
        return sample_sheet_content

    def get_workflow_parameters(self, case_id: str) -> WorkflowParameters:
        """Return parameters."""
        LOG.info("Getting parameters information")
        return WorkflowParameters(
            sample_sheet_path=self.get_sample_sheet_path(case_id=case_id),
            outdir=self.get_case_path(case_id=case_id),
        )

    def get_params_file_path(self, case_id: str, params_file: Path | None = None) -> Path:
        """Return parameters file or a path where the default parameters file for a case id should be located."""
        if params_file:
            return params_file.absolute()
        case_path: Path = self.get_case_path(case_id)
        return Path(case_path, f"{case_id}_params_file{FileExtensions.CONFIG}")
        # This function should be moved to nf-analysis to replace the current one when all nextflow pipelines are using the same config files approach

    def write_params_file(self, case_id: str, workflow_parameters: dict) -> None:
        """Write params-file for analysis."""
        LOG.debug("Writing parameters file")
        config_files_list = [self.config_platform, self.config_params, self.config_resources]
        extra_parameters_str = [
            write_config_nextflow_style(workflow_parameters),
            self.set_cluster_options(case_id=case_id),
        ]
        concat_txt(
            file_paths=config_files_list,
            target_file=self.get_params_file_path(case_id=case_id),
            str_content=extra_parameters_str,
        )

    @staticmethod
    def get_phenotype_code(phenotype: str) -> int:
        """Return Raredisease phenotype code."""
        LOG.debug("Translate phenotype to integer code")
        try:
            code = PlinkPhenotypeStatus[phenotype.upper()]
        except KeyError:
            raise ValueError(f"{phenotype} is not a valid phenotype")
        return code

    @staticmethod
    def get_sex_code(sex: str) -> int:
        """Return Raredisease sex code."""
        LOG.debug("Translate sex to integer code")
        try:
            code = PlinkSex[sex.upper()]
        except KeyError:
            raise ValueError(f"{sex} is not a valid sex")
        return code

    @property
    def root(self) -> str:
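`get_phenotype_code` and `get_sex_code` in the hunk above both translate a status string into an integer by looking up an enum member by its upper-cased name and converting a `KeyError` into a `ValueError`. The sketch below shows that pattern in isolation. `PhenotypeCode`, `SexCode` and `lookup_code` are hypothetical stand-ins, and the numeric values are an assumption based on the usual PLINK pedigree convention; the real `PlinkPhenotypeStatus` and `PlinkSex` definitions are not shown in this diff.

```python
from enum import IntEnum


class PhenotypeCode(IntEnum):
    """Hypothetical stand-in for PlinkPhenotypeStatus (values assumed from PLINK)."""

    UNKNOWN = 0
    UNAFFECTED = 1
    AFFECTED = 2


class SexCode(IntEnum):
    """Hypothetical stand-in for PlinkSex (values assumed from PLINK)."""

    UNKNOWN = 0
    MALE = 1
    FEMALE = 2


def lookup_code(enum_class: type[IntEnum], label: str) -> int:
    """Return the integer code for a label, matching enum member names case-insensitively."""
    try:
        return int(enum_class[label.upper()])
    except KeyError as error:
        # Chain the original KeyError so the traceback keeps the failed lookup.
        raise ValueError(f"{label} is not a valid {enum_class.__name__}") from error


affected_code: int = lookup_code(PhenotypeCode, "affected")
female_code: int = lookup_code(SexCode, "Female")
```

Chaining with `from error` is a small difference from the committed methods, which raise the `ValueError` without the explicit cause.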
8 changes: 6 additions & 2 deletions cg/models/cg_config.py
@@ -168,10 +168,14 @@ class MipConfig(BaseModel):
    script: str


-class RareDiseaseConfig(CommonAppConfig):
+class RarediseaseConfig(CommonAppConfig):
    binary_path: str | None = None
    compute_env: str
    conda_binary: str | None = None
    conda_env: str
    config_platform: str
    config_params: str
    config_resources: str
    launch_directory: str
    workflow_path: str
    profile: str
@@ -339,7 +343,7 @@ class CGConfig(BaseModel):
    mip_rd_dna: MipConfig = Field(None, alias="mip-rd-dna")
    mip_rd_rna: MipConfig = Field(None, alias="mip-rd-rna")
    mutant: MutantConfig = None
-    raredisease: RareDiseaseConfig = Field(None, alias="raredisease")
+    raredisease: RarediseaseConfig = Field(None, alias="raredisease")
    rnafusion: RnafusionConfig = Field(None, alias="rnafusion")
    taxprofiler: TaxprofilerConfig = Field(None, alias="taxprofiler")

10 changes: 5 additions & 5 deletions cg/models/nf_analysis.py
@@ -5,16 +5,16 @@
from cg.exc import SampleSheetError


-class PipelineParameters(BaseModel):
-    clusterOptions: str = Field(..., alias="cluster_options")
-    priority: str
+class WorkflowParameters(BaseModel):
+    input: Path = Field(..., alias="sample_sheet_path")
+    outdir: Path = Field(..., alias="outdir")


class NextflowSampleSheetEntry(BaseModel):
-    """Nextflow samplesheet model.
+    """Nextflow sample sheet model.
    Attributes:
-        name: sample name, corresponds to case_id
+        name: sample name, or case id
        fastq_forward_read_paths: list of all fastq read1 file paths corresponding to sample
        fastq_reverse_read_paths: list of all fastq read2 file paths corresponding to sample
    """