Homogenise tags and better define tag usage #108

Open
ChrOertlin opened this issue Apr 15, 2024 · 4 comments
Labels
Effort L Effort large Gain S Gain small Urgency S Urgency small

Comments

@ChrOertlin
Contributor

ChrOertlin commented Apr 15, 2024

Description

A question on Slack prompted a discussion about how tags are used in Housekeeper.

The question was whether to add samplesheet as a tag for samplesheets used in workflows. However, samplesheet is currently "reserved" for, or "limited to", flow-cell samplesheets. The current solution is to add a new tag: nextflow-samplesheet.

Basically, we are creating a new tag out of two tags, which seems counterintuitive to me. Ideally these should be two separate tags: nextflow and samplesheet. Furthermore, this tag-tag pattern already exists for files like nextflow-config.
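
To make the two-tag idea concrete, here is a minimal sketch in plain Python. It is not Housekeeper's API: TaggedFile, find_files, the file paths and the illumina tag are all hypothetical and only illustrate selecting a file by the set of tags it carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedFile:
    """Hypothetical stand-in for a stored file and its tags."""
    path: str
    tags: frozenset[str]


def find_files(files, required_tags):
    """Return the files that carry every tag in required_tags."""
    required = frozenset(required_tags)
    return [f for f in files if required <= f.tags]


# Illustrative records only; paths and tag combinations are assumptions.
files = [
    TaggedFile("flow_cell/SampleSheet.csv", frozenset({"samplesheet", "illumina"})),
    TaggedFile("case/samplesheet.csv", frozenset({"samplesheet", "nextflow"})),
    TaggedFile("case/nextflow.config", frozenset({"config", "nextflow"})),
]

# Two general tags together identify the workflow samplesheet.
workflow_samplesheets = find_files(files, {"nextflow", "samplesheet"})
```

With this approach, asking for samplesheet alone returns both the flow-cell and the workflow samplesheet, so callers would have to pass the second tag whenever they need exactly one of them.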

To do

Discuss the design patterns of tags, decide what to do, document and implement decision.

Some other points brought up

With new technologies coming into production that may also require samplesheets (PacBio, ONT, Sephyr) and other files, we will likely need to introduce new tags and retrospectively alter the Illumina tags.

Example of inefficient tag usage / construction

The VariantTags in hermes, although there might be some additional step involved that I do not fully understand yet.
image

@diitaz93

diitaz93 commented Apr 17, 2024

I think for the upcoming technologies it makes sense to have an ONT samplesheet and an Illumina samplesheet, but I think the workflow sample sheet is completely different from the sequencing sample sheet. It is just unfortunate that they have the same name. Putting the sequencing sample sheet and the workflow sample sheet under the same tag would irreversibly mix up these two file types.

@ChrOertlin
Contributor Author

I would argue that, while their purposes are different, both the flow-cell samplesheets and the workflow samplesheets are samplesheets. Each is just a file that contains samples and sample metadata and is consumed in one way or another.

@ChrOertlin
Contributor Author

ChrOertlin commented Apr 17, 2024

Points

  1. Workflows use different approaches to identify files. Some use 'vcf', 'index'; others use 'vcf-index' to fetch similar files.
  2. Splitting tags has an upfront cost: knowing which tags to add.
  3. Making unique tags is easier, but can lead to the problem in point 1.
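
One way to picture point 1 is that the same file can end up described by either a compound tag or its parts. The sketch below assumes an explicit, hand-maintained mapping from compound tags to general tags; COMPOUND_TAGS and normalise_tags are hypothetical names, and the mapping only lists the examples mentioned in this thread.

```python
# Hypothetical mapping from compound tags to the general tags they combine.
COMPOUND_TAGS = {
    "vcf-index": frozenset({"vcf", "index"}),
    "nextflow-samplesheet": frozenset({"nextflow", "samplesheet"}),
    "nextflow-config": frozenset({"nextflow", "config"}),
}


def normalise_tags(tags):
    """Expand known compound tags and leave all other tags untouched."""
    normalised = set()
    for tag in tags:
        normalised |= COMPOUND_TAGS.get(tag, {tag})
    return frozenset(normalised)


# Both styles of tagging resolve to the same set of general tags.
same = normalise_tags(["vcf-index"]) == normalise_tags(["vcf", "index"])
```

An explicit mapping is used here rather than splitting on the hyphen, since a tag may contain a hyphen without being a combination of two other tags.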

Decision

  1. Investigate tag usage of "not yet in production" workflows. (short term, @ivadym , @ChrOertlin )
  2. Investigate workflow files, e.g. balsamic, and identify whether multiple, more general tags are adequate to uniquely describe specific files. (short term, @ivadym , @ChrOertlin )
  3. Based on 1, set up a framework, done by either pipeline developers or system development, that describes the tags used for files. (short term, @ivadym , @ChrOertlin )
  4. Make the framework available to others so that it can be easily found and understood.
  5. Implement future tags.
  6. Refactor past tags. (long term)
  7. Set up a project.

@henrikstranneheim
Contributor

henrikstranneheim commented Apr 18, 2024

If we use StrEnum and go for many tags we can do:

from enum import StrEnum, auto

class Prerequisite(StrEnum):
    CONFIG = auto()
    SAMPLESHEET = auto()

which is more MERRy (Maintainable, Extendable, Readable and Robust) and pythonic.

@ivadym ivadym transferred this issue from Clinical-Genomics/housekeeper Apr 18, 2024
@ivadym ivadym added Effort L Effort large Gain S Gain small Urgency S Urgency small labels Aug 12, 2024