Initialise ICAv2 Data Copy Manager #788

Open
wants to merge 14 commits into
base: main
from

Conversation

alexiswl
Member

Centralised service for copying data around with ICAv2 jobs. It uses task tokens for 'synced' events (those raised from inside AWS Step Functions).

For example, an event might look like:

{
  "EventBusName": "OrcaBusMain",
  "Source": "Whatever",
  "DetailType": "ICAv2DataCopySync",
  "Detail": {
    "payload": {
      "sourceUriList": [
        "icav2://project-id-or-name/path-to-data.txt",
        "icav2://project-id-or-name/path-to-folder/"
      ],
      "destinationUri": "icav2://project-id-or-name/path-to-destination/"
    },
    "taskToken": "your-task-token"
  }
}

Or, for a copy that does not wait on a task token:

{
  "EventBusName": "OrcaBusMain",
  "Source": "Whatever",
  "DetailType": "ICAv2DataCopy",
  "Detail": {
    "payload": {
      "sourceUriList": [
        "icav2://project-id-or-name/path-to-data.txt",
        "icav2://project-id-or-name/path-to-folder/"
      ],
      "destinationUri": "icav2://project-id-or-name/path-to-destination/"
    }
  }
}
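
A minimal Python sketch of assembling either event, assuming it is published with boto3's EventBridge put_events call. The helpers build_copy_entry and publish are hypothetical names and not part of this PR. Note that PutEvents expects Detail as a JSON string rather than a nested object.

```python
import json
from typing import Optional


def build_copy_entry(
    source_uris: list[str],
    destination_uri: str,
    task_token: Optional[str] = None,
    source: str = "Whatever",
) -> dict:
    """Build one PutEvents entry (hypothetical helper, not part of this PR)."""
    detail: dict = {
        "payload": {
            "sourceUriList": source_uris,
            "destinationUri": destination_uri,
        }
    }
    # Only the 'sync' flavour carries a task token
    if task_token is not None:
        detail["taskToken"] = task_token
    return {
        "EventBusName": "OrcaBusMain",
        "Source": source,
        "DetailType": "ICAv2DataCopySync" if task_token is not None else "ICAv2DataCopy",
        # PutEvents requires Detail to be a JSON-encoded string
        "Detail": json.dumps(detail),
    }


def publish(entry: dict) -> dict:
    """Send the entry to EventBridge (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("events").put_events(Entries=[entry])
```

Keeping the builder separate from the publish call means the event shape can be checked locally without touching AWS.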

One can generate the Detail from the source with the following AWS Step Functions code:

{
  "QueryLanguage": "JSONata",
  ...
  "States": {
    "Copy Files to Destination": {
      "Type": "Task",
      "Resource": "arn:aws:states:::events:putEvents.waitForTaskToken",
      "Arguments": {
        "Entries": [
          {
            "Detail": "{% $merge(\n  [\n    {\n      \"taskToken\": $states.context.Task.Token\n    }, \n    $copy_event_detail\n  ]\n) %}",
            "DetailType": "ICAv2DataCopySync",
            "EventBusName": "OrcaBusMain",
            "Source": "Whatever"
          }
        ]
      },
      "Next": ...
    }
  }
}
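
For readers unfamiliar with JSONata, $merge folds an array of objects into a single object, with later objects overriding earlier keys. A rough Python equivalent of the Detail expression above, assuming $copy_event_detail holds an object with the payload key (the function name merge_detail is hypothetical):

```python
def merge_detail(task_token: str, copy_event_detail: dict) -> dict:
    """Mimic JSONata's $merge([{"taskToken": ...}, $copy_event_detail])."""
    merged: dict = {}
    # Objects are applied left to right, so later keys win on conflict
    for obj in ({"taskToken": task_token}, copy_event_detail):
        merged.update(obj)
    return merged


detail = merge_detail(
    "your-task-token",
    {
        "payload": {
            "sourceUriList": ["icav2://project-id-or-name/path-to-data.txt"],
            "destinationUri": "icav2://project-id-or-name/path-to-destination/",
        }
    },
)
```

One consequence of the ordering is that a taskToken key already present in $copy_event_detail would override the one taken from the state context.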

@alexiswl alexiswl added the feature New feature label Dec 19, 2024
@alexiswl alexiswl self-assigned this Dec 19, 2024
@victorskl
Member

Please hold off merging this PR until the 0.3.0 release wraps up.

@victorskl victorskl removed the on hold label Jan 3, 2025
@victorskl
Member

Depends on the task token implemented in #789

@alexiswl
Member Author

> Depends on the task token implemented in #789

Ah, not quite. This one would have its own task token management system; #789 was for workflow events specifically.

@victorskl
Member

The same applies to this PR as well; ditto the related conclusion comment here.

Please set the PR to Ready once finalised. Thanks.

* SDK versions from 2.177 to 2.182
* Added vpc contexts to cdk.context.json
* Upgraded yarn version to 4.7.0
* Run prettier as 'yarn prettier' (takes longer, but plain prettier was breaking for me)
Updated toolkits (and typehints) for the following services:
* workflow manager
* metadata manager
* file manager
* fastq manager

Also updated the DynamoDB partition table to use pointInTimeRecoverySpecification, since the pointInTimeRecovery parameter is deprecated
* Comprises fastq objects and linked fastq objects through fastq sets
* Interacts with filemanager to store ingest ids for fastq objects
* Runs QC analysis on samples with Sequali
* Collects fileCompression information, allowing an easy transition between ORA and GZIP
* Runs NTSM services for easy comparison within a library and for comparison between libraries
Introduced a 'ResponseDict' concept to separate Response Classes from Response Dictionaries, since model_dump() -> Self isn't always a viable solution. This also allows us to have dynamic logic in our file objects, which can be returned either with or without s3 details.
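
A minimal sketch of the class/dictionary split described above. It uses stdlib dataclasses and TypedDict instead of the pydantic models in this PR so that it stays self-contained, and every class and field name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict


class FileResponseDict(TypedDict, total=False):
    # Plain-dictionary shape handed back to callers (hypothetical fields)
    ingestId: str
    s3Uri: str  # only present when s3 details are requested and known


@dataclass
class FileResponse:
    """Response class that owns the logic (hypothetical fields)."""

    ingest_id: str
    s3_uri: Optional[str] = None

    def to_dict(self, include_s3_details: bool = False) -> FileResponseDict:
        """Return the dictionary form, with or without s3 details."""
        data: FileResponseDict = {"ingestId": self.ingest_id}
        if include_s3_details and self.s3_uri is not None:
            data["s3Uri"] = self.s3_uri
        return data


record = FileResponse(ingest_id="abc123", s3_uri="s3://bucket/key.fastq.ora")
without_s3 = record.to_dict()
with_s3 = record.to_dict(include_s3_details=True)
```

Typing the dictionary separately lets the serialised shape differ from the class, which a dump method returning the class type cannot express.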

* NTSM Evaluation fixes
Handle pagination for queries
TODO list
* Build stateful stack for this fastapi service
* Integrate API with step functions
Allows other services' step functions to 'hang' on a put event task until the said fastq set has become available.
Handling copy jobs is now one step function that generates jobs and then waits for those jobs to complete.
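
The generate-then-wait pattern can be sketched in plain Python as below. This is an in-memory illustration only, not the step function in this PR; CopyJobTracker and its methods are hypothetical names. In AWS, the on_complete callback would typically call the Step Functions send_task_success API with the stored task token so that the waiting caller resumes.

```python
from typing import Callable, Iterable


class CopyJobTracker:
    """Tracks the jobs spawned for one copy request (hypothetical sketch)."""

    def __init__(
        self,
        task_token: str,
        job_ids: Iterable[str],
        on_complete: Callable[[str], None],
    ) -> None:
        self.task_token = task_token
        self.pending = set(job_ids)
        self.on_complete = on_complete
        self.released = False

    def mark_job_done(self, job_id: str) -> bool:
        """Record a finished job; release the token once none remain."""
        self.pending.discard(job_id)
        if not self.pending and not self.released:
            self.released = True  # guard against releasing the token twice
            self.on_complete(self.task_token)
        return self.released


released_tokens: list[str] = []
tracker = CopyJobTracker("your-task-token", ["job-1", "job-2"], released_tokens.append)
first = tracker.mark_job_done("job-1")   # one job still pending
second = tracker.mark_job_done("job-2")  # last job done, token released
```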
@alexiswl alexiswl force-pushed the feature/icav2-data-copy-service branch from 7e51bdb to 171e397 Compare March 18, 2025 07:30
@alexiswl
Member Author

This one is ready to be reviewed now.

@alexiswl alexiswl marked this pull request as ready for review March 18, 2025 08:55
Labels
feature New feature
2 participants