Commit

adding the AHI indexing project

Ubuntu committed Sep 25, 2023
1 parent 4349ad2 commit 9890140
Showing 68 changed files with 5,841 additions and 1 deletion.
6 changes: 5 additions & 1 deletion README.md
@@ -15,12 +15,16 @@ AWS HealthImaging is a new HIPAA-eligible capability that enables healthcare pro

### [S3 StoreSCP](s3-storescp)

- This AWS CDK project implements a DICOM [StoreSCP](https://dicom.nema.org/medical/dicom/current/output/html/part04.html#sect_B.2.2) listener capable of receiving DIMSE messages and storing the received SOP instances as DICOM Part10 files on Amazon S3. The listener is deployed as service on [AWS ECS Fargate](https://aws.amazon.com/fargate/). DICOM Part10 files stored on S3 can be then imported into [AWS HealthImaging](https://aws.amazon.com/healthimaging).
+ This AWS CDK project implements a DICOM [StoreSCP](https://dicom.nema.org/medical/dicom/current/output/html/part04.html#sect_B.2.2) listener capable of receiving DIMSE messages and storing the received SOP instances as DICOM Part10 files on Amazon S3. The listener is deployed as a service on [AWS ECS Fargate](https://aws.amazon.com/fargate/). DICOM Part10 files stored on S3 can then be imported into [AWS HealthImaging](https://aws.amazon.com/healthimaging).

### [DICOM Ingestion From On-Prem to AWS HealthImaging](dicom-ingestion-to-s3-healthimaging/)

This [AWS CDK](https://aws.amazon.com/cdk/) project lets you host a DICOM service that receives data via DICOM DIMSE and ingests it to S3 and HealthImaging. The on-premises service is hosted as part of the [AWS IoT Greengrass](https://aws.amazon.com/greengrass/) service. The project also demonstrates how to profile DICOM data, index it into a database, and manage a queue of import jobs into AWS HealthImaging.

### [AWS HealthImaging metadata index with RDBMS and Datalake (Athena)](metadata-index/)

This [AWS CDK](https://aws.amazon.com/cdk/) project indexes DICOM metadata as it is imported into AWS HealthImaging. The metadata can be stored in a relational database (RDS MySQL) and/or a data lake (Amazon S3 with [Amazon Athena](https://aws.amazon.com/athena/)), enabling query and analytics capabilities.
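
For a quick illustration, once the datalake mode has populated S3 and its Glue catalog, the index can be queried with Athena. A minimal sketch using boto3 (the database, table, and result-bucket names are hypothetical, not this project's actual schema):

```python
import boto3

# Run a study-level query against the hypothetical data lake tables.
athena = boto3.client("athena")
execution = athena.start_query_execution(
    QueryString="SELECT studyinstanceuid, studydate FROM study LIMIT 10",
    QueryExecutionContext={"Database": "ahi_datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(execution["QueryExecutionId"])
```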

## Validate/Verify

### [Pixel Data Verification](pixel-data-verification/)
23 changes: 23 additions & 0 deletions metadata-index/README.md
@@ -0,0 +1,23 @@

# AWS HealthImaging Indexing project

A project to index the metadata of DICOM studies stored in AWS HealthImaging into Aurora MySQL, an Amazon S3 data lake, or Amazon OpenSearch Service.


## Table of Contents

### Architecture
Follow this link for documentation on the solution architecture: [Architecture](./doc/architecture/README.md)

### Solution deployment
Follow this link to learn more about this project and how to deploy it in your AWS account: [Deployment](./doc/deployment/README.md)

### Data models
Each mode has a slightly different data model due to the target data store's characteristics. The data model for each data store is described in the following sections, and a minimal sketch of the parser flow follows the list:
- [RDBMS data model and Lambda parser](./doc/data_models/rdbms/README.md)
- [Datalake data model and Lambda parser](./doc/data_models/datalake/README.md)
- [OpenSearch data model and Lambda parser](./doc/data_models/opensearch/README.md) (not implemented yet)
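
For orientation, here is a minimal sketch of what each Lambda parser does; the event handling, identifiers, and metadata fields are illustrative assumptions, not the exact implementation:

```python
import gzip
import json

import boto3

ahi = boto3.client("medical-imaging")

def handler(event, context):
    # The SNS message wraps the S3 event for job-output-manifest.json; the
    # manifest (not parsed here) identifies the datastore and its image sets.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    # Fetch the ImageSet metadata blob (gzip-compressed JSON) from HealthImaging.
    blob = ahi.get_image_set_metadata(
        datastoreId="<datastore-id>", imageSetId="<imageset-id>"
    )["imageSetMetadataBlob"].read()
    metadata = json.loads(gzip.decompress(blob))
    # Flatten patient/study/series/instance tags here and write them to the target store.
    return metadata["Study"]["DICOM"]["StudyInstanceUID"]
```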




7 changes: 7 additions & 0 deletions metadata-index/TODO.md
@@ -0,0 +1,7 @@
## TODO LIST:

* Add the ability to define user-specified columns in the tables.
* Add a technical table for tags at the series level.
* Add a domain table.
* Evaluate how to add URIs for specific tags in the DICOM tables (large values).
* Add a column in the study table with a hash of the Study Instance UID (see the sketch below).
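
For the last item, one possible approach (a sketch, not a committed design) is a deterministic digest of the UID:

```python
import hashlib

def study_uid_hash(study_instance_uid: str) -> str:
    # Fixed-length, deterministic key, e.g. for indexed joins or partitioning.
    return hashlib.sha256(study_instance_uid.encode("utf-8")).hexdigest()
```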
Empty file.
Empty file.
32 changes: 32 additions & 0 deletions metadata-index/backend/app.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
'''
AHI-Index CDK App
Description: This CDK project creates the infrastructure for the AHI-Index application. It can be configured to deploy an index on RDS Aurora MySQL or export the AHI metadata to S3 as a data lake.
License: MIT-0
'''
import os

import aws_cdk as cdk
from aws_cdk import Aspects
import config

from backend.backend import BackendStack
import cdk_nag
from cdk_nag import NagSuppressions, NagPackSuppression

app_name = config.CDK_APP_NAME
app = cdk.App()
env = cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'), region=os.getenv('CDK_DEFAULT_REGION'))
backend_stack = BackendStack(app, app_name, config, env=env)

Aspects.of(app).add(cdk_nag.AwsSolutionsChecks())
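# cdk_nag's AwsSolutionsChecks aspect scans every construct at synth time and fails the
# synth on unsuppressed error-level findings; the suppressions below record findings
# that are intentional in this sample.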
NagSuppressions.add_stack_suppressions(backend_stack, suppressions=[
    NagPackSuppression(id='AwsSolutions-IAM4', reason='Roles created by CDK constructs.'),
    NagPackSuppression(id='AwsSolutions-IAM5', reason='Access to getImageSetMetadata at the datastore level does not grant any privileges by itself, but is necessary to allow privileges at the lower level of ImageSets within the datastore.'),
    NagPackSuppression(id='AwsSolutions-SMG4', reason='Password rotation not required.'),
    NagPackSuppression(id='AwsSolutions-RDS11', reason='Default port is preferred. Access is secured by security group.'),
    NagPackSuppression(id='AwsSolutions-RDS16', reason='Auditing disabled.')])
app.synth()


137 changes: 137 additions & 0 deletions metadata-index/backend/backend/backend.py
@@ -0,0 +1,137 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""

import json
import aws_cdk as cdk
from aws_cdk import (
aws_kms as kms,
aws_iam as iam,
aws_lambda as _lambda,
aws_s3_notifications as s3n,
aws_s3 as s3,
aws_sns as sns,
aws_ec2 as ec2,
Stack,
aws_secretsmanager as secretsmanager,
aws_logs as logs
)
from aws_cdk import SecretValue
from constructs import Construct
from .function import PythonLambda
from .network import Vpc
from .security_groups import SecurityGroups
from .lambda_roles import LambdaRoles
from .custom import CustomLambdaResource
from .database import AuroraServerlessDB
from .glue import GlueDatabase



class BackendStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, config, **kwargs) -> None:
super().__init__(scope, construct_id, **kwargs)
stack_name = self.stack_name.lower()
        # Read Lambda-related configs and create the Lambda role
lambda_config = config.LAMBDA_CONFIG

        # Use the VPC from the config if an ID is specified; otherwise create a new one.
        if (config.VPC["USE_VPC"] == True):
            if (config.VPC["EXISTING_VPC_ID"] != ""):
                vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id=config.VPC["EXISTING_VPC_ID"])
            else:
                vpc_cidr = config.VPC["NEW_VPC_CIDR"]
                vpc_construct = Vpc(self, "Network", vpc_cidr)
                vpc = vpc_construct.getVpc()
        else:
            vpc = None
# Create Security groups
sec_groups = SecurityGroups(self, "Security Groups", vpc=vpc)

        ahi_datastore_arn = config.AHI_DATASTORE_ARN


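        # Import-completion fan-out: HealthImaging writes job-output-manifest.json to the
        # import output bucket, and the S3 event is published to an encrypted SNS topic
        # that triggers whichever indexing Lambdas are enabled below.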
        sns_key = kms.Key(self, "sns-topic", enable_key_rotation=True)
        sns_topic = sns.Topic(self, "ahi-to-index-topic", display_name=stack_name+"ahi-to-index-topic", master_key=sns_key)
        ahi_output_bucket = s3.Bucket.from_bucket_attributes(self, "ImportedBucket", bucket_arn=config.AHI_IMPORT_OUPUT_BUCKET_ARN)
        ahi_output_bucket.add_event_notification(s3.EventType.OBJECT_CREATED, s3n.SnsDestination(sns_topic), s3.NotificationKeyFilter(suffix='job-output-manifest.json'))
        sns_key.grant_encrypt_decrypt(iam.ServicePrincipal("s3.amazonaws.com"))
        sns_key.grant_encrypt_decrypt(iam.ServicePrincipal("lambda.amazonaws.com"))

if config.RDBMS_CONFIG["enabled"] == True:
            # Create the database
            aurora_security_group = sec_groups.getAuroraSecGroup()
            db_min_acu_capacity = config.RDBMS_CONFIG["min_acu_capacity"]
            db_max_acu_capacity = config.RDBMS_CONFIG["max_acu_capacity"]
            db_name = config.RDBMS_CONFIG["db_name"]
            db = AuroraServerlessDB(self, "ahi-to-rdbms-Aurora-DB", vpc=vpc, db_name=db_name, aurora_security_group=aurora_security_group, min_acu_capacity=db_min_acu_capacity, max_acu_capacity=db_max_acu_capacity)
            db_secret_arn = db.getDbCluster().secret.secret_arn

            db_user_secret = secretsmanager.Secret(self, "Secret", secret_object_value={
                    "username": SecretValue.unsafe_plain_text("ahi_parser"),
                    "host": SecretValue.unsafe_plain_text(db.getDbCluster().cluster_endpoint.hostname),
                    "dbname": SecretValue.unsafe_plain_text(db_name),
                },
                secret_name=stack_name+"-ahi-db-user-secret")




            # MySQL DB init Lambda creation.
            db_init_role = LambdaRoles(self, 'ahi-to-rdbms-db-init-lambdarole', db_secret_arn=db_secret_arn)
            fn_db_init = PythonLambda(self, "ahi-to-rdbms-db-Init", lambda_config["DbInit"], db_init_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_db_init.getFn().add_environment(key="DB_SECRET", value=db_secret_arn)

            # Deploy the database schema
iep_schema = CustomLambdaResource(self, "db-schema", fn_db_init.getFn())
iep_schema.node.add_dependency(db.getDbCluster())

            rdbms_lambda_role = LambdaRoles(self, 'ahi-to-rdbms-lambdarole', db_secret_arn=db_user_secret.secret_arn, datastore_arn=ahi_datastore_arn, database_resource_id=db.getDbCluster().cluster_resource_identifier)
            fn_ahi_to_rdbms = PythonLambda(self, "ahi-to-rdbms", lambda_config["AHItoRDBMS"], rdbms_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_rdbms.getFn().add_environment(key="DB_SECRET", value=db_user_secret.secret_arn)
            fn_ahi_to_rdbms.getFn().add_environment(key="POPULATE_INSTANCE_LEVEL", value=str(config.RDBMS_CONFIG["populate_instance_level"]))
            fn_ahi_to_rdbms.getFn().add_environment(key="POPULATE_FRAME_LEVEL", value=str(config.RDBMS_CONFIG["populate_frame_level"]))
            fn_ahi_to_rdbms.getFn().add_environment(key="AHLI_ENDPOINT", value="")  # 08/27/2023 - jpleger: This is a workaround for the medical-imaging service descriptor, not nice... Will fix soon.
            ahi_output_bucket.grant_read(fn_ahi_to_rdbms.getFn())



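            # Allow SNS to invoke the parser Lambda and subscribe it to the import-completion topic.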
            fn_ahi_to_rdbms.getFn().add_permission("ahi-to-rdbms-allow-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-rdbms-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_rdbms.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)

        if config.OPENSEARCH_CONFIG["enabled"] == True:
            opensearch_lambda_role = LambdaRoles(self, 'ahi-to-opensearch-lambdarole', datastore_arn=ahi_datastore_arn)
            fn_ahi_to_opensearch = PythonLambda(self, "ahi-to-opensearch", lambda_config["AHItoOpenSearch"], opensearch_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_opensearch.getFn().add_environment(key="DOMAIN_ENDPOINT", value="")
            ahi_output_bucket.grant_read(fn_ahi_to_opensearch.getFn())

            fn_ahi_to_opensearch.getFn().add_permission("ahi-to-opensearch-allow-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-opensearch-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_opensearch.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)

        if config.DATALAKE_CONFIG["enabled"] == True:
            if config.DATALAKE_CONFIG["destination_bucket_name"] == "":
                bucket_name = None
            else:
                bucket_name = config.DATALAKE_CONFIG["destination_bucket_name"]
            datalake_lambda_role = LambdaRoles(self, 'ahi-to-datalake-lambdarole', datastore_arn=ahi_datastore_arn)
            access_log_role = iam.Role(self, "Role", assumed_by=iam.ServicePrincipal("logging.s3.amazonaws.com"), description="Grants the S3 logging service permission to put access logs.")
            access_log_bucket = s3.Bucket(self, "ahi-to-datalake-access_log-bucket", bucket_name=None, block_public_access=s3.BlockPublicAccess.BLOCK_ALL, removal_policy=cdk.RemovalPolicy.RETAIN, enforce_ssl=True, encryption=s3.BucketEncryption.S3_MANAGED)
            access_log_bucket.grant_put(access_log_role)
            destination_bucket = s3.Bucket(self, "ahi-to-datalake-destination-bucket", bucket_name=bucket_name, block_public_access=s3.BlockPublicAccess.BLOCK_ALL, removal_policy=cdk.RemovalPolicy.RETAIN, enforce_ssl=True, encryption=s3.BucketEncryption.S3_MANAGED, server_access_logs_prefix="access-logs/", server_access_logs_bucket=access_log_bucket)
            fn_ahi_to_datalake = PythonLambda(self, "ahi-to-datalake", lambda_config["AHItoDatalake"], datalake_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_datalake.getFn().add_environment(key="DESTINATION_BUCKET", value=destination_bucket.bucket_name)
            fn_ahi_to_datalake.getFn().add_environment(key="POPULATE_INSTANCE_LEVEL", value=str(config.DATALAKE_CONFIG["populate_instance_level"]))
            fn_ahi_to_datalake.getFn().add_environment(key="AHLI_ENDPOINT", value="")  # 08/27/2023 - jpleger: This is a workaround for the medical-imaging service descriptor, not nice... Will fix soon.
            destination_bucket.grant_read_write(fn_ahi_to_datalake.getFn())
            ahi_output_bucket.grant_read(fn_ahi_to_datalake.getFn())
            if config.DATALAKE_CONFIG["deploy_glue_default_config"] == True:
                GlueDatabase(self, "ahi-datalake-db", datalake_bucket=destination_bucket, stack_name=stack_name)



            fn_ahi_to_datalake.getFn().add_permission("ahi-to-datalake-allow-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-datalake-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_datalake.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)


21 changes: 21 additions & 0 deletions metadata-index/backend/backend/custom.py
@@ -0,0 +1,21 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""

from constructs import Construct
from aws_cdk import CustomResource, custom_resources as cr, aws_logs as logs


class CustomLambdaResource(Construct):
def __init__(self, scope: Construct, id: str, lambda_handler, cr_properties={}, **kwargs) -> None:
super().__init__(scope, id, **kwargs)

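        # Wrap the given Lambda as a CloudFormation custom resource: the provider invokes
        # the handler on stack create/update/delete, which this project uses to run the
        # database schema initialization at deploy time.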
cr_provider = cr.Provider(
self, "CustomLambdaResourceProvider", on_event_handler=lambda_handler, log_retention=logs.RetentionDays.THREE_DAYS
)

self.cr = CustomResource(
self, "CustomLambdaResource", service_token=cr_provider.service_token, properties=cr_properties,
)

53 changes: 53 additions & 0 deletions metadata-index/backend/backend/database.py
@@ -0,0 +1,53 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
Creates Aurora Serverless database for the application.
"""

from constructs import Construct
from aws_cdk import (
Duration,
RemovalPolicy,
aws_rds as rds,
aws_ec2 as ec2,
Stack,
)


class AuroraServerlessDB(Construct):

def __init__(self, scope: Construct, id: str, vpc: ec2.Vpc, db_name: str, aurora_security_group: ec2.SecurityGroup, min_acu_capacity: int, max_acu_capacity: int, **kwargs) -> None:
super().__init__(scope, id, **kwargs)
stack_name = Stack.of(self).stack_name.lower()
self._subnetGroup = rds.SubnetGroup(self, "ahi-index-Aurora-Subnet-Group", vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), description="ahi index Aurora DB Subnet Group")
self._db_adminpassword = rds.Credentials.from_generated_secret(username="admin")

self._dbCluster = rds.DatabaseCluster(
self,
"ahi-index-DBCluster",
instances=1,
            engine=rds.DatabaseClusterEngine.aurora_mysql(version=rds.AuroraMysqlEngineVersion.of('8.0.mysql_aurora.3.04.0')),
parameter_group=rds.ParameterGroup.from_parameter_group_name(self, "ahi-index-db-cluster-ParameterGroup", parameter_group_name="default.aurora-mysql8.0"),
cluster_identifier=stack_name+"-ahi-index-db-cluster",
default_database_name=db_name,
security_groups=[aurora_security_group,],
credentials=self._db_adminpassword,
subnet_group=self._subnetGroup,
deletion_protection=True,
removal_policy=RemovalPolicy.SNAPSHOT,
storage_encrypted=True,
iam_authentication=True,
backtrack_window=Duration.hours(24),
instance_props=rds.InstanceProps(
vpc=vpc,
instance_type=ec2.InstanceType("Serverless"),
publicly_accessible=False,
)
)
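        # Aurora Serverless v2 isn't modeled natively by this construct version: the
        # "Serverless" instance type is a placeholder, and the v2 scaling range is
        # applied through the raw CloudFormation property override below.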
self._dbCluster.node.default_child.add_property_override('ServerlessV2ScalingConfiguration', {"MinCapacity": min_acu_capacity, "MaxCapacity": max_acu_capacity})
#self._dbCluster.add_rotation_single_user(exclude_characters="\"@/\\" , vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS))

def getDbCluster(self) -> rds.DatabaseCluster:
return self._dbCluster
