add-metadata-index-v2-from-wangrob #71

Open · wants to merge 21 commits into base: main

Commits (21):
0fa6515
add-metadata-index-v2-from-wangrob
wangrob Oct 16, 2024
24a6990
reset AHI_DATASTORE_ARN to parametrized value
wangrob Oct 16, 2024
fc18660
add cdk.context.json changes to deployment doc
wangrob Oct 20, 2024
492c297
correct section name for cdk.context.json doc
wangrob Oct 20, 2024
d076094
add step for cdk bootstrap
wangrob Oct 20, 2024
15b09bc
remove prompt so command could be copied and pasted in terminal
wangrob Oct 20, 2024
6f9fe45
add code scanners
wangrob Oct 31, 2024
d2e82d1
create virtual env in project root to make code scan cleaner
wangrob Oct 31, 2024
946ba12
fully spell out Python virtual environment
wangrob Oct 31, 2024
eb456a8
replace try-except-pass and skip unneeded bandit checks
Nov 1, 2024
6925f71
remove semgrep b/c it's dup to bandit
Nov 1, 2024
6fe7301
migrate blog post solution architecture content to repo doc page
wangrob Nov 11, 2024
3347fcd
fix formatting errors and typos in solution architecture doumentation
wangrob Nov 11, 2024
ff1c7ca
migrate solution deployment, testing, and clean up instructions to do…
wangrob Nov 11, 2024
fdf2c9f
add images to testing instructions
wangrob Nov 11, 2024
e9ac6a0
revise clean up instructinos for clarity
wangrob Nov 11, 2024
54d7741
back up old version and move v2 to current version
wangrob Dec 26, 2024
21d4b35
replace metadata-index-v2 with metadata-index
wangrob Dec 26, 2024
f10f2a1
Merge branch 'aws-samples:main' into add-metadata-index-v2-from-wangrob
wangrob Dec 26, 2024
c834f73
remove v2 from project title
wangrob Dec 26, 2024
0159877
merge in new SQL optimizations from jpleger
wangrob Jan 30, 2025
23 changes: 23 additions & 0 deletions metadata-index-old/README.md
@@ -0,0 +1,23 @@

# AWS HealthImaging Indexing project

A project to index the metadata of DICOM studies stored in AWS HealthImaging into Amazon Aurora MySQL, an Amazon S3 data lake, or Amazon OpenSearch Service.


## Table of Contents

### Architecture
Follow this link for documentation about the solution architecture: [Architecture](./doc/architecture/README.md)

### Solution deployment
Follow this link to learn more about this project and how to deploy it in your AWS account: [Deployment](./doc/deployment/README.md)

### Data models
Each mode has a slightly different data model due to the characteristics of its target data store. The data model for each data store is described in the following sections:
- [RDBMS data model and Lambda parser](./doc/data_models/rdbms/README.md)
- [Datalake data model and Lambda parser](./doc/data_models/datalake/README.md)
- [Opensearch data model and Lambda parser](./doc/data_models/opensearch/README.md) (not implemented yet)
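
The indexing mode (or modes) is selected through a single config module. The following is a rough, hypothetical sketch only: the attribute names mirror those referenced in backend/app.py and backend/backend.py, while every value is an illustrative placeholder, not the contents of the actual config.py in this PR.

```python
# config.py -- hypothetical sketch; names mirror the attributes the CDK code
# reads, values are placeholders.
CDK_APP_NAME = "ahi-index"

AHI_DATASTORE_ARN = "arn:aws:medical-imaging:..."   # HealthImaging datastore (elided)
AHI_IMPORT_OUPUT_BUCKET_ARN = "arn:aws:s3:::..."    # import-job output bucket (elided; spelling matches the code)

VPC = {
    "USE_VPC": True,        # False -> Lambdas are built without a VPC
    "EXISTING_VPC_ID": "",  # empty -> a new VPC is created from NEW_VPC_CIDR
    "NEW_VPC_CIDR": "10.10.0.0/16",
}

RDBMS_CONFIG = {
    "enabled": True,
    "db_name": "ahiindex",
    "min_acu_capacity": 0.5,  # Aurora Serverless v2 ACUs
    "max_acu_capacity": 16,
    "populate_instance_level": True,
    "populate_frame_level": False,
}

OPENSEARCH_CONFIG = {"enabled": False}  # parser not implemented yet

DATALAKE_CONFIG = {
    "enabled": False,
    "destination_bucket_name": "",  # empty -> CDK generates a bucket name
    "populate_instance_level": True,
    "deploy_glue_default_config": True,
}

# One entry per function; keys match what backend/function.py expects.
LAMBDA_CONFIG = {
    "DbInit": {"entry": "lambda/db_init", "index": "index", "handler": "handler",
               "timeout": 5, "memory": 512, "layers": [], "envs": {}},
    # "AHItoRDBMS", "AHItoOpenSearch", "AHItoDatalake" follow the same shape.
}
```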




7 changes: 7 additions & 0 deletions metadata-index-old/TODO.md
@@ -0,0 +1,7 @@
## TODO LIST:

* Add the capability to define user-defined columns on the tables.
* Add a technical table for tags at the series level.
* Add a domain table.
* Evaluate how to add URIs for specific tags in the DICOM tables (large values).
* Add a column to the study table with a hash of the Study Instance UID (see the sketch below).
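
For the last item, a minimal sketch of what the hash could look like, assuming SHA-256 (the TODO does not commit to an algorithm):

```python
# Hypothetical sketch for the "hash of the Study Instance UID" TODO item;
# SHA-256 is an assumption, not a decision recorded in this PR.
import hashlib

def study_uid_hash(study_instance_uid: str) -> str:
    """Return a fixed-width hex digest, e.g. for an indexed CHAR(64) column."""
    return hashlib.sha256(study_instance_uid.encode("ascii")).hexdigest()

# study_uid_hash("1.2.840.113619.2.55.3")  -> 64-character hex string
```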
Empty file.
Empty file.
32 changes: 32 additions & 0 deletions metadata-index-old/backend/app.py
@@ -0,0 +1,32 @@
#!/usr/bin/env python3
'''
AHI-Index CDK App

Description: This CDK project creates the infrastructure for the AHI-Index application. It can be configured to deploy an index on RDS Aurora MySQL or to export the AHI metadata to S3 as a data lake.
License: MIT-0
'''
import os

import aws_cdk as cdk
from aws_cdk import Aspects
import config as config

from backend.backend import BackendStack
import cdk_nag
from cdk_nag import NagSuppressions, NagPackSuppression

app_name = config.CDK_APP_NAME
app = cdk.App()
env = cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'), region=os.getenv('CDK_DEFAULT_REGION'))
backend_stack = BackendStack(app, app_name, config, env=env)

Aspects.of(app).add(cdk_nag.AwsSolutionsChecks())
NagSuppressions.add_stack_suppressions(backend_stack, suppressions=[
    NagPackSuppression(id='AwsSolutions-IAM4', reason='Roles created by CDK constructs.'),
    NagPackSuppression(id='AwsSolutions-IAM5', reason='Access to getImageSetMetadata at the datastore level does not grant any privileges by itself, but is necessary to allow privileges at the lower ImageSet level within the datastore.'),
    NagPackSuppression(id='AwsSolutions-SMG4', reason='Password rotation not required.'),
    NagPackSuppression(id='AwsSolutions-RDS11', reason='Default port is preferred. Access is secured by security group.'),
    NagPackSuppression(id='AwsSolutions-RDS16', reason='Auditing disabled.')])
app.synth()


148 changes: 148 additions & 0 deletions metadata-index-old/backend/backend/backend.py
@@ -0,0 +1,148 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""

import json
import aws_cdk as cdk
from aws_cdk import (
aws_kms as kms,
aws_iam as iam,
aws_lambda as _lambda,
aws_s3_notifications as s3n,
aws_s3 as s3,
aws_sns as sns,
aws_ec2 as ec2,
Stack,
aws_secretsmanager as secretsmanager,
aws_logs as logs,
CfnOutput,
)
from aws_cdk import SecretValue
from constructs import Construct
from .function import PythonLambda
from .network import Vpc
from .security_groups import SecurityGroups
from .lambda_roles import LambdaRoles
from .custom import CustomLambdaResource
from .database import AuroraServerlessDB
from .glue import GlueDatabase



class BackendStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, config: dict, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        stack_name = self.stack_name.lower()
        # Read the Lambda-related configs (roles are created per function below).
        lambda_config = config.LAMBDA_CONFIG

        # Use the VPC ID from the config if specified, otherwise create a new one.
        if config.VPC["USE_VPC"] == True:
            if config.VPC["EXISTING_VPC_ID"] != "":
                vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_id=config.VPC["EXISTING_VPC_ID"])
            else:
                vpc_cidr = config.VPC["NEW_VPC_CIDR"]
                vpc_construct = Vpc(self, "Network", vpc_cidr)
                vpc = vpc_construct.getVpc()
        else:
            vpc = None
        # Create security groups.
        sec_groups = SecurityGroups(self, "Security Groups", vpc=vpc)

        ahi_datastore_arn = config.AHI_DATASTORE_ARN

        sns_key = kms.Key(self, "sns-topic", enable_key_rotation=True)
        sns_topic = sns.Topic(self, "ahi-to-index-topic", display_name=stack_name+"ahi-to-index-topic", master_key=sns_key)
        ahi_output_bucket = s3.Bucket.from_bucket_attributes(self, "ImportedBucket", bucket_arn=config.AHI_IMPORT_OUPUT_BUCKET_ARN)
        ahi_output_bucket.add_event_notification(s3.EventType.OBJECT_CREATED, s3n.SnsDestination(sns_topic), s3.NotificationKeyFilter(suffix='job-output-manifest.json'))
        sns_key.grant_encrypt_decrypt(iam.ServicePrincipal("s3.amazonaws.com"))
        sns_key.grant_encrypt_decrypt(iam.ServicePrincipal("lambda.amazonaws.com"))

        if config.RDBMS_CONFIG["enabled"] == True:
            # Create the database.
            aurora_security_group = sec_groups.getAuroraSecGroup()
            db_min_acu_capacity = config.RDBMS_CONFIG["min_acu_capacity"]
            db_max_acu_capacity = config.RDBMS_CONFIG["max_acu_capacity"]
            db_name = config.RDBMS_CONFIG["db_name"]
            db = AuroraServerlessDB(self, "ahi-to-rdbms-Aurora-DB", vpc=vpc, db_name=db_name, aurora_security_group=aurora_security_group, min_acu_capacity=db_min_acu_capacity, max_acu_capacity=db_max_acu_capacity)
            db_secret_arn = db.getDbCluster().secret.secret_arn

            db_user_secret = secretsmanager.Secret(self, "Secret", secret_object_value={
                    "username": SecretValue.unsafe_plain_text("ahi_parser"),
                    "host": SecretValue.unsafe_plain_text(db.getDbCluster().cluster_endpoint.hostname),
                    "dbname": SecretValue.unsafe_plain_text(db_name),
                },
                secret_name=stack_name+"-ahi-db-user-secret")

            # MySQL DBInit Lambda creation.
            db_init_role = LambdaRoles(self, 'ahi-to-rdbms-db-init-lambdarole', db_secret_arn=db_secret_arn)
            fn_db_init = PythonLambda(self, "ahi-to-rdbms-db-Init", lambda_config["DbInit"], db_init_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_db_init.getFn().add_environment(key="DB_SECRET", value=db_secret_arn)

            # Deploy the database schema.
            iep_schema = CustomLambdaResource(self, "db-schema", fn_db_init.getFn())
            iep_schema.node.add_dependency(db.getDbCluster())

            rdbms_lambda_role = LambdaRoles(self, 'ahi-to-rdbms-lambdarole', db_secret_arn=db_user_secret.secret_arn, datastore_arn=ahi_datastore_arn, database_resource_id=db.getDbCluster().cluster_resource_identifier)
            fn_ahi_to_rdbms = PythonLambda(self, "ahi-to-rdbms", lambda_config["AHItoRDBMS"], rdbms_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_rdbms.getFn().add_environment(key="DB_SECRET", value=db_user_secret.secret_arn)
            fn_ahi_to_rdbms.getFn().add_environment(key="POPULATE_INSTANCE_LEVEL", value=str(config.RDBMS_CONFIG["populate_instance_level"]))
            fn_ahi_to_rdbms.getFn().add_environment(key="POPULATE_FRAME_LEVEL", value=str(config.RDBMS_CONFIG["populate_frame_level"]))
            fn_ahi_to_rdbms.getFn().add_environment(key="AHLI_ENDPOINT", value="")  # 08/27/2023 - jpleger: This is a workaround for the medical-imaging service descriptor, not nice... Will fix soon.
            ahi_output_bucket.grant_read(fn_ahi_to_rdbms.getFn())

            fn_ahi_to_rdbms.getFn().add_permission("ahi-to-rdbms-allow-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-rdbms-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_rdbms.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)

        if config.OPENSEARCH_CONFIG["enabled"] == True:
            opensearch_lambda_role = LambdaRoles(self, 'ahi-to-opensearch-lambdarole', db_secret_arn=db_secret_arn, datastore_arn=ahi_datastore_arn)
            fn_ahi_to_opensearch = PythonLambda(self, "ahi-to-opensearch", lambda_config["AHItoOpenSearch"], opensearch_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_opensearch.getFn().add_environment(key="DOMAIN_ENDPOINT", value="")
            ahi_output_bucket.grant_read(fn_ahi_to_opensearch.getFn())

            fn_ahi_to_opensearch.getFn().add_permission("ahi-to-opensearch-allow-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-opensearch-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_opensearch.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)

        if config.DATALAKE_CONFIG["enabled"] == True:
            if config.DATALAKE_CONFIG["destination_bucket_name"] == "":
                bucket_name = None
            else:
                bucket_name = config.DATALAKE_CONFIG["destination_bucket_name"]
            datalake_lambda_role = LambdaRoles(self, 'ahi-to-datalake-lambdarole', datastore_arn=ahi_datastore_arn)
            access_log_lambda_role = iam.Role(self, "Role", assumed_by=iam.ServicePrincipal("logging.s3.amazonaws.com"), description="Grants S3 service to put access logs.")
            access_log_bucket = s3.Bucket(self, "ahi-to-datalake-access_log-bucket", bucket_name=None, block_public_access=s3.BlockPublicAccess.BLOCK_ALL, removal_policy=cdk.RemovalPolicy.RETAIN, enforce_ssl=True, encryption=s3.BucketEncryption.S3_MANAGED)
            access_log_bucket.grant_put(access_log_lambda_role)
            destination_bucket = s3.Bucket(self, "ahi-to-datalake-destination-bucket", bucket_name=bucket_name, block_public_access=s3.BlockPublicAccess.BLOCK_ALL, removal_policy=cdk.RemovalPolicy.RETAIN, enforce_ssl=True, encryption=s3.BucketEncryption.S3_MANAGED, server_access_logs_prefix="access-logs/", server_access_logs_bucket=access_log_bucket)
            fn_ahi_to_datalake = PythonLambda(self, "ahi-to-datalake", lambda_config["AHItoDatalake"], datalake_lambda_role.getLambdaRole(), vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), security_group=sec_groups.getLambdaSecGroup())
            fn_ahi_to_datalake.getFn().add_environment(key="DESTINATION_BUCKET", value=destination_bucket.bucket_name)
            fn_ahi_to_datalake.getFn().add_environment(key="POPULATE_INSTANCE_LEVEL", value=str(config.DATALAKE_CONFIG["populate_instance_level"]))
            fn_ahi_to_datalake.getFn().add_environment(key="AHLI_ENDPOINT", value="")  # 08/27/2023 - jpleger: This is a workaround for the medical-imaging service descriptor, not nice... Will fix soon.
            destination_bucket.grant_read_write(fn_ahi_to_datalake.getFn())
            ahi_output_bucket.grant_read(fn_ahi_to_datalake.getFn())
            if config.DATALAKE_CONFIG["deploy_glue_default_config"] == True:
                GlueDatabase(self, "ahi-datalake-db", datalake_bucket=destination_bucket, stack_name=stack_name)

            fn_ahi_to_datalake.getFn().add_permission("ahi-to-datalake-allows-sns", principal=iam.ServicePrincipal("sns.amazonaws.com"), action="lambda:InvokeFunction")
            sns.Subscription(self, "ahi-to-datalake-sns-subscription", topic=sns_topic, endpoint=fn_ahi_to_datalake.getFn().function_arn, protocol=sns.SubscriptionProtocol.LAMBDA)

        if config.VPC["USE_VPC"] == True:
            CfnOutput(self, "ahi-vpc-id", export_name=f"{stack_name}-ahi-vpc-id", value=vpc.vpc_id)
        CfnOutput(self, "ahi-output-bucket", export_name=f"{stack_name}-ahi-output-bucket", value=ahi_output_bucket.bucket_name)
        CfnOutput(self, "ahi-datastore-arn", export_name=f"{stack_name}-ahi-datastore-arn", value=ahi_datastore_arn)
        if config.RDBMS_CONFIG["enabled"] == True:
            CfnOutput(self, "rdbms-cluster-id", export_name=f"{stack_name}-rdbms-database-arn", value=db.getDbCluster().cluster_resource_identifier)
            CfnOutput(self, "rdbms-database-secret-arn", export_name=f"{stack_name}-rdbms-database-secret-arn", value=db_secret_arn)
            CfnOutput(self, "rdbms-database-name", export_name=f"{stack_name}-rdbms-database-name", value=db_name)
            CfnOutput(self, "rdbms-database-security-group", export_name=f"{stack_name}-rdbms-database-security-group", value=aurora_security_group.security_group_id)
        if config.DATALAKE_CONFIG["enabled"] == True:
            CfnOutput(self, "datalake-destination-bucket", export_name=f"{stack_name}-datalake-destination-bucket", value=destination_bucket.bucket_name)
21 changes: 21 additions & 0 deletions metadata-index-old/backend/backend/custom.py
@@ -0,0 +1,21 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""

from constructs import Construct
from aws_cdk import CustomResource, custom_resources as cr, aws_logs as logs


class CustomLambdaResource(Construct):
    def __init__(self, scope: Construct, id: str, lambda_handler, cr_properties={}, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        cr_provider = cr.Provider(
            self, "CustomLambdaResourceProvider", on_event_handler=lambda_handler, log_retention=logs.RetentionDays.THREE_DAYS
        )

        self.cr = CustomResource(
            self, "CustomLambdaResource", service_token=cr_provider.service_token, properties=cr_properties,
        )
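
Usage is as shown in backend.py above; restated as a minimal sketch, the construct wraps an existing Lambda as a CloudFormation custom resource and can be ordered after other resources:

```python
# Taken from the db-schema wiring in backend.py: run the DB-init Lambda as a
# custom resource once the Aurora cluster exists.
iep_schema = CustomLambdaResource(self, "db-schema", fn_db_init.getFn())
iep_schema.node.add_dependency(db.getDbCluster())
```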

53 changes: 53 additions & 0 deletions metadata-index-old/backend/backend/database.py
@@ -0,0 +1,53 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

Creates the Aurora Serverless database for the application.
"""

from constructs import Construct
from aws_cdk import (
Duration,
RemovalPolicy,
aws_rds as rds,
aws_ec2 as ec2,
Stack,
)


class AuroraServerlessDB(Construct):

    def __init__(self, scope: Construct, id: str, vpc: ec2.Vpc, db_name: str, aurora_security_group: ec2.SecurityGroup, min_acu_capacity: int, max_acu_capacity: int, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        stack_name = Stack.of(self).stack_name.lower()
        self._subnetGroup = rds.SubnetGroup(self, "ahi-index-Aurora-Subnet-Group", vpc=vpc, vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS), description="ahi index Aurora DB Subnet Group")
        self._db_adminpassword = rds.Credentials.from_generated_secret(username="admin")

        self._dbCluster = rds.DatabaseCluster(
            self,
            "ahi-index-DBCluster",
            instances=1,
            engine=rds.DatabaseClusterEngine.aurora_mysql(version=rds.AuroraMysqlEngineVersion.of('8.0.mysql_aurora.3.04.0')),
            parameter_group=rds.ParameterGroup.from_parameter_group_name(self, "ahi-index-db-cluster-ParameterGroup", parameter_group_name="default.aurora-mysql8.0"),
            cluster_identifier=stack_name+"-ahi-index-db-cluster",
            default_database_name=db_name,
            security_groups=[aurora_security_group,],
            credentials=self._db_adminpassword,
            subnet_group=self._subnetGroup,
            deletion_protection=True,
            removal_policy=RemovalPolicy.SNAPSHOT,
            storage_encrypted=True,
            iam_authentication=True,
            backtrack_window=Duration.hours(24),
            instance_props=rds.InstanceProps(
                vpc=vpc,
                instance_type=ec2.InstanceType("Serverless"),
                publicly_accessible=False,
            )
        )
        # Set the Serverless v2 min/max ACUs via a CloudFormation property override.
        self._dbCluster.node.default_child.add_property_override('ServerlessV2ScalingConfiguration', {"MinCapacity": min_acu_capacity, "MaxCapacity": max_acu_capacity})
        # self._dbCluster.add_rotation_single_user(exclude_characters="\"@/\\", vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS))

    def getDbCluster(self) -> rds.DatabaseCluster:
        return self._dbCluster

62 changes: 62 additions & 0 deletions metadata-index-old/backend/backend/function.py
@@ -0,0 +1,62 @@
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

Generates Lambda functions for the application.
"""

from constructs import Construct
from aws_cdk import (
aws_logs as logs,
aws_iam as iam,
aws_ec2 as ec2,
aws_lambda as lambda_,
aws_lambda_python_alpha as aws_lambda_python,
Duration as Duration,
)


class PythonLambda(Construct):
    def __init__(self, scope: Construct, id: str, config, role: iam.Role, vpc: ec2.Vpc = None, vpc_subnets: ec2.SubnetSelection = None, security_group: ec2.SecurityGroup = None, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        layers = []
        for l in config["layers"]:
            layers.append(aws_lambda_python.PythonLayerVersion(self, "ahi-to-index-"+l,
                # code=lambda_.Code.from_asset("lambda_layer/"+l),
                entry="lambda_layer/"+l,
                compatible_runtimes=[lambda_.Runtime.PYTHON_3_11],
                license="Apache-2.0",
                description=""
            ))
        if (vpc is None) or (vpc_subnets is None) or (security_group is None):  # Build the Lambda without any VPC.
            self._fn = lambda_.Function(self, id,
                runtime=lambda_.Runtime.PYTHON_3_11,
                handler=config["index"]+"."+config["handler"],
                code=lambda_.Code.from_asset(config["entry"]),
                layers=layers,
                role=role,
                reserved_concurrent_executions=None,
                timeout=Duration.minutes(int(config["timeout"])),
                memory_size=config["memory"]
            )
        else:
            self._fn = lambda_.Function(self, id,
                runtime=lambda_.Runtime.PYTHON_3_11,
                handler=config["index"]+"."+config["handler"],
                code=lambda_.Code.from_asset(config["entry"]),
                layers=layers,
                role=role,
                vpc=vpc,
                vpc_subnets=vpc_subnets,
                security_groups=[security_group,],
                reserved_concurrent_executions=None,
                timeout=Duration.minutes(int(config["timeout"])),
                memory_size=config["memory"]
            )

        for env in config["envs"]:
            self._fn.add_environment(key=str(env), value=str(config["envs"][env]))

    def getFn(self) -> lambda_.Function:
        return self._fn
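
For reference, this is how backend.py instantiates the construct: one LAMBDA_CONFIG entry supplies entry/index/handler/timeout/memory/layers/envs, and the VPC arguments are either all provided or all omitted:

```python
# Usage mirroring backend.py: a VPC-attached function built from a config entry.
fn_db_init = PythonLambda(
    self, "ahi-to-rdbms-db-Init",
    lambda_config["DbInit"],
    db_init_role.getLambdaRole(),
    vpc=vpc,
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    security_group=sec_groups.getLambdaSecGroup(),
)
fn_db_init.getFn().add_environment(key="DB_SECRET", value=db_secret_arn)
```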