Automate FOHM and GISAID uploads for mutant analyses #3887

Closed
1 task
beatrizsavinhas opened this issue Oct 24, 2024 · 4 comments

@beatrizsavinhas
Contributor

beatrizsavinhas commented Oct 24, 2024

As a production user,
I want the upload of mutant results to FOHM and GISAID to be automatic,
So that it does not require running a command manually.

Work impact

Answer the following questions:

  • Is there currently a workaround for this issue? If so, what is it?
    • Manually running the command below (in Notes)
  • How much time would be saved by implementing this feature on a weekly basis?
    • About 5 minutes per command, which does not include the waiting.
  • How many users are affected by this issue?
    • ProdBioinfo
  • Are customers affected by this issue?
    • No, not directly.

Acceptance Criteria

  • Results from Mutant analyses that have passed the QC check are uploaded automatically to FOHM and GISAID

Notes

Command we currently run to do this upload:

cg upload fohm preprocess-all -c <case_id> -c <case_id> |& tee -a /home/proj/production/logs/GISAID-upload.log

Since the automation has its own log, /home/proj/production/logs/GISAID-upload.log should not be necessary anymore.

Related to #3659

@ChrOertlin
Contributor

ChrOertlin commented Dec 12, 2024

Currently, preprocess-all takes a manually supplied list of cases (see code below). To automate this, we likely need a set of requirements and a flow for identifying the cases that should go to FOHM and GISAID, so that we can fetch them from the database.

Also, @karlnyr mentioned that the FOHM upload should happen in a daily batch. Should the upload then be separate from the customer delivery?

So basically I need these questions answered:

  1. What are the requirements or characteristics of a mutant workflow case that is ready to be uploaded to FOHM / GISAID?
  2. Is the FOHM / GISAID upload to be considered a separate flow from the customer delivery?

@fohm.command("preprocess-all")
@OPTION_CASES
@DRY_RUN
@ARGUMENT_DATE
@click.pass_obj
def preprocess_all(
    context: CGConfig, cases: list, dry_run: bool = False, datestr: str | None = None
):
    """Create all FOHM upload files, upload to GISAID, sync SFTP and mail reports for all provided cases."""
    fohm_api = FOHMUploadAPI(
        config=context,
        dry_run=dry_run,
        datestr=datestr,
    )
    gisaid_api = GisaidAPI(config=context)
    cases = list(cases)
    upload_cases = []
    for case_id in cases:
        try:
            gisaid_api.upload(case_id=case_id)
            fohm_api.update_upload_started_at(case_id=case_id)
            LOG.info(f"Upload of case {case_id} to GISAID was successful")
            upload_cases.append(case_id)
        except Exception as error:
            LOG.error(
                f"Upload of case {case_id} to GISAID unsuccessful {error}, case {case_id} "
                f"will be removed from delivery batch"
            )
    try:
        fohm_api.aggregate_delivery(upload_cases)
    except ValidationError as error:
        LOG.warning(error)
    fohm_api.sync_files_sftp()
    fohm_api.send_mail_reports()
    for case_id in upload_cases:
        fohm_api.update_uploaded_at(case_id=case_id)
    LOG.info("Upload to FOHM completed")

@karlnyr
Contributor

karlnyr commented Dec 12, 2024

Hey!

  1. The requirement for a case to be uploaded today is that it passes the QC we perform on completed cases in Trailblazer. The only things that signify that a case has passed the QC are that the latest analysis of the case has a comment with the QC results and that we write the report file to hasta:

https://github.com/Clinical-Genomics/cg/blob/master/cg/meta/workflow/mutant/quality_controller/report_generator_utils.py#L13

This can of course be changed. For instance, if we set the uploaded-at date when we automate the delivery to our customers for cases that have passed the QC, then at the end of the day we could gather all analyses from that day that have the workflow mutant and deliver those. That would, however, require some changes to how the uploaded column of the analysis table is used for SARS-CoV-2 analyses.

  2. Yes, since we would like a case to be delivered to our customers as soon as it is completed, whereas the FOHM and GISAID uploads have to happen once all cases are completed for that day.
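
A minimal sketch of the end-of-day selection described in point 1, assuming a simplified in-memory record rather than the real cg database models. The `Analysis` dataclass, its field names, and `select_cases_for_daily_upload` are hypothetical and only illustrate the filtering logic (workflow is mutant, QC passed, completed on the given day).

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Analysis:
    """Hypothetical stand-in for an analysis row; not the real cg model."""

    case_id: str
    workflow: str
    completed_at: datetime
    qc_passed: bool


def select_cases_for_daily_upload(analyses: list[Analysis], day: date) -> list[str]:
    """Return case ids of mutant analyses that passed QC and completed on `day`."""
    selected: list[str] = []
    for analysis in analyses:
        if (
            analysis.workflow == "mutant"
            and analysis.qc_passed
            and analysis.completed_at.date() == day
            and analysis.case_id not in selected  # avoid duplicate case ids
        ):
            selected.append(analysis.case_id)
    return selected
```

The resulting list could then be handed to the existing batch upload in place of the manually typed `-c <case_id>` options; how it would be wired into the command is left open here.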

@beatrizsavinhas
Contributor Author

beatrizsavinhas commented Dec 13, 2024

Adding something that might be relevant:

  1. If a case passes QC, it is stored automatically, so the existence of the analysis entry for the case already shows that QC passed (unless this was manually overridden).
    I do agree, though, that it sounds more logical to use the delivered_at date as the criterion for fetching the cases for the day once the upload is automated as well. A counterpoint is that, currently, it is the FOHM and GISAID upload command that sets this date. If we change it so that the upload to the customer sets it, the database would no longer distinguish analyses that have been uploaded to FOHM and GISAID from those that have not.
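
One possible way to keep that distinction, sketched under the assumption that customer delivery and the FOHM/GISAID upload each record their own timestamp. The `AnalysisRecord` class and the `fohm_uploaded_at` field are hypothetical; the discussion above only refers to a single date column.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AnalysisRecord:
    """Hypothetical record with separate dates for the two flows."""

    case_id: str
    delivered_at: Optional[datetime] = None  # assumed: set by customer delivery
    fohm_uploaded_at: Optional[datetime] = None  # assumed: set by FOHM/GISAID upload


def pending_fohm_upload(records: list[AnalysisRecord]) -> list[str]:
    """Return case ids delivered to the customer but not yet uploaded to FOHM/GISAID."""
    return [
        record.case_id
        for record in records
        if record.delivered_at is not None and record.fohm_uploaded_at is None
    ]
```

With two dates, the daily batch could pick up exactly the cases that have been delivered but not yet uploaded, at the cost of a schema change.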

@ChrOertlin
Copy link
Contributor

Implemented in PR #4028.
