Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raredisease configurator - Create services for file creation #4260

Open
wants to merge 39 commits into
base: dev-start-pipelines
Choose a base branch
from

Conversation

diitaz93
Copy link
Contributor

@diitaz93 diitaz93 commented Mar 3, 2025

Description

Alternate version of #4255

Need to solve: The Nextflow configurators need to create the following files:

  1. Sample sheet
  2. Config file
  3. Params file

Some pipelines need to create more files.

The process of file creation follows a common structure:

  1. Define a Path
  2. Create content
  3. Write content to a file in the path

The content creation is specific for each pipeline and for each file type.

The file creation process was defined in the 3 steps mentioned above. In its abstract form, content creation (step 2) requires a class called FileContentCreator, which is implemented for every pipeline and file type.

Added

Changed

Fixed

How to prepare for test

  • Ssh to relevant server (depending on type of change)
  • Use stage: us
  • Paxa the environment: paxa
  • Install on stage (example for Hasta):
    bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_cg -t cg -b [THIS-BRANCH-NAME] -a

How to test

  • Do ...

Expected test outcome

  • Check that ...
  • Take a screenshot and attach or copy/paste the output.

Review

  • Tests executed by
  • "Merge and deploy" approved by
    Thanks for filling in who performed the code review and the test!

This version is a

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

  • Document in ...
  • Deploy this branch on ...
  • Inform to ...

Copy link
Contributor

@islean islean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue looking after lunch!

def __init__(self, store: Store, lims: LimsAPI, params: str):
self.store = store
self.lims = lims
self.params = params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "params"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a path to a file in hasta. It is pipeline-specific. We get it from the RareDiseaseConfig class in the CGConfig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR and probably not something we can change without cooperation from biodev, but it would be nice to have it type hinted as a Path and a name called parameters_file or something.

@diitaz93 diitaz93 marked this pull request as ready for review March 7, 2025 12:57
@diitaz93 diitaz93 requested a review from a team as a code owner March 7, 2025 12:57
Copy link

sonarqubecloud bot commented Mar 7, 2025



class PipelineExtension:
def configure(self, case_id: str, case_path: Path):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets make an effort to give this a better name



class RarediseaseExtension(PipelineExtension):
"""Configurator for Raredisease analysis."""
Copy link
Contributor Author

@diitaz93 diitaz93 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Configurator for Raredisease analysis."""
"""Class holding all additional actions needed to create a RarediseaseCaseConfig not specified in the NextflowConfigurator."""


@staticmethod
def get_file_path(case_id: str, case_path: Path) -> Path:
"""Return the path to the nextflow config file."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Return the path to the nextflow config file."""
"""Return the path to the Nextflow config file."""

return Path(case_path, f"{case_id}_nextflow_config").with_suffix(FileExtensions.JSON)

def create(self, case_id: str, case_path: Path) -> None:
"""Create the nextflow config file for a case."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Create the nextflow config file for a case."""
"""Create the Nextflow config file for a case."""

content: str = self._get_content(case_path=case_path)
write_json(file_path=file_path, content=content)

def _get_content(self, case_path: Path) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_content(self, case_path: Path) -> str:
def _get_content(self, case_id: str, case_path: Path) -> str:

Comment on lines +54 to +55
Add gene panels combinations for gene panels being part of gene panel combination and
return updated gene panels.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

Comment on lines +20 to +25
@staticmethod
def get_file_path(case_path: Path) -> Path:
return Path(case_path, "gene_panels").with_suffix(FileExtensions.BED)

def create(self, case_path: Path) -> None:
file_path: Path = self.get_file_path(case_path=case_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This static method is not used anywhere else, we could remove it

Suggested change
@staticmethod
def get_file_path(case_path: Path) -> Path:
return Path(case_path, "gene_panels").with_suffix(FileExtensions.BED)
def create(self, case_path: Path) -> None:
file_path: Path = self.get_file_path(case_path=case_path)
def create(self, case_path: Path) -> None:
file_path= Path(case_path, "gene_panels").with_suffix(FileExtensions.BED)

def get_file_path(case_id: str, case_path: Path) -> Path:
pass

@abstractmethod
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add _get_content method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with our AI overlords and private methods are very rarely a part of an interface
image

Copy link
Contributor

@islean islean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nice works with the tests!

self.account = account

@staticmethod
def get_file_path(case_id: str, case_path: Path) -> Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get rid of passing both case_id and case_path one could consider setting a root path in servers and having two methods, get_case_path = Path(self.root_path, case_id) and get_file_path=Path(self.root_path, case_id, <file_name>.

vep_filters_scout_fmt=f"{case_path}/{ScoutExportFileName.PANELS}",
)

def _get_data_analysis_type(self, case_id: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this method should be in the store instead?

Comment on lines +79 to +81
"""
Return the target bed file from LIMS or use default capture kit for WHOLE_GENOME_SEQUENCING.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
Return the target bed file from LIMS or use default capture kit for WHOLE_GENOME_SEQUENCING.
"""
"""
Return the target bed file from LIMS or use default capture kit for WHOLE_GENOME_SEQUENCING.
Raises:
ValueError if not capture kit can be assigned to the case.
"""

maternal_id=case_sample.get_maternal_sample_id,
case_id=case_sample.case.internal_id,
)
return sample_sheet_entry.reformat_sample_content
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cryptic return statement to me. What does reformat sample content do?

Comment on lines +121 to +124
if forward_read and not reverse_read:
read_direction = 1
elif reverse_read and not forward_read:
read_direction = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, but surely we should only have one boolean as input for this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

return case_path.name


def get_genome_build(workflow: Workflow) -> GenePanelGenomeBuild:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider having this as a bare dict

from cg.constants.gene_panel import GenePanelGenomeBuild


def get_case_id_from_path(case_path: Path) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not anymore

Comment on lines +83 to +88
def _get_case_priority(self, case_id: str) -> SlurmQos:
"""Get case priority."""
case: Case = self.store.get_case_by_internal_id(case_id)
return SlurmQos(case.slurm_priority)

def _get_case_workflow(self, case_id: str) -> Workflow:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel like we should have these methods in this class. They seem like store methods to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion for nor against

@@ -2962,6 +2942,7 @@ def raredisease_deliverables_file_path(raredisease_dir, raredisease_case_id) ->
).with_suffix(FileExtensions.YAML)


# TODO: Take a look at this for the tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be resolved in this PR?


@pytest.fixture
def raredisease_extension() -> PipelineExtension:
return PipelineExtension()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do a small PR and test the actual RarediseaseExtension

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true

write_yaml_nextflow_style(file_path=file_path, content=content)

def _get_content(self, case_id: str, case_path: Path, sample_sheet_path: Path) -> dict:
"""Create parameters file for a case."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Create parameters file for a case."""
"""Return the content of a params file for a case."""

case_workflow_parameters: dict = self._get_case_parameters(
case_id=case_id, case_path=case_path, sample_sheet_path=sample_sheet_path
).model_dump()
workflow_parameters: any = read_yaml(self.params)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix this type hint

import re


def replace_values_in_params_file(workflow_parameters: dict) -> dict:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having an inheritance and utils module may be redundant. We could include these functions in the parent class and get rid of this module


class RarediseaseSampleSheetEntry(NextflowSampleSheetEntry):
"""Raredisease sample model is used when building the sample sheet."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up here, I removed a sex attribute that was a str

from cg.store.models import Case, CaseSample, Sample
from cg.store.store import Store

LOG = logging.getLogger(__name__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove LOG

def create(self, case_id: str, case_path: Path) -> None:
"""Create the sample sheet for a case."""
file_path: Path = self.get_file_path(case_id=case_id, case_path=case_path)
content: any = self._get_content(case_id=case_id, case_path=case_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
content: any = self._get_content(case_id=case_id, case_path=case_path)
content: list[list[str]] = self._get_content(case_id=case_id, case_path=case_path)

content: any = self._get_content(case_id=case_id, case_path=case_path)
write_csv(file_path=file_path, content=content)

def _get_content(self, case_id: str, case_path: Path) -> list[list[str]]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_content(self, case_id: str, case_path: Path) -> list[list[str]]:
def _get_content(self, case_id: str) -> list[list[str]]:


def __init__(
self,
config: CommonAppConfig,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
config: CommonAppConfig,
pipeline_config: CommonAppConfig,

params_file_creator: ParamsFileCreator,
pipeline_extension: PipelineExtension = PipelineExtension(),
):
self.root_dir: str = config.root
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can just inject the root dir instead of the whole pipeline config

self.params_file_creator = params_file_creator

def create_config(self, case_id: str) -> NextflowCaseConfig:
"""Create a Nextflow case config."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improve docstring with a numbered list of the steps involved in the configuration. It is not just "creating a config", it involves much more



@pytest.fixture
def expected_raredisease_params_file_content(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in the wrong file

"configurator_fixture, case_id_fixture",
[("raredisease_configurator", "raredisease_case_id")],
"configurator_fixture, case_id_fixture, case_path_fixture",
[("raredisease_configurator", "raredisease_case_id", "raredisease_case_path")],
ids=["raredisease"],
)
def test_create_nextflow_config_file_exists(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be an integration test that checks that all required files were created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants