UPDATE to have a single URL as pds_nucleus_opensearch_url for all the nodes, have Node-specific OpenSearch registry names, and use a data source for S3 to read the existing staging bucket in MCP Prod.
ramesh-maddegoda committed Feb 12, 2025
1 parent b9e8a79 commit 8c6c0fa
Showing 8 changed files with 74 additions and 50 deletions.
22 changes: 6 additions & 16 deletions terraform/README.md
@@ -81,6 +81,9 @@ Note: Examples of `terraform.tfvars` files are available at `terraform/variable
- pds_node_names = List of PDS Node names to be supported (E.g.: ["PDS_SBN", "PDS_IMG", "PDS_ENG"]). The following node name format should be used.
- (PDS_ATM, PDS_ENG, PDS_GEO, PDS_IMG, PDS_NAIF, PDS_RMS, PDS_SBN, PSA, JAXA, ROSCOSMOS)
- Please check https://nasa-pds.github.io/registry/user/harvest_job_configuration.html for PDS Node name descriptions.
- pds_nucleus_opensearch_url : OpenSearch URL to be used with the Harvest tool
- pds_nucleus_opensearch_registry_names : List of Node-specific OpenSearch registry names (E.g.: ["pds-nucleus-sbn-registry", "pds-nucleus-img-registry"])
- pds_nucleus_opensearch_urls : List of Node specific OpenSearch URLs (E.g.: ["https://abcdef.us-west-2.aoss.amazonaws.com", "https://opqrst.us-west-2.aoss.amazonaws.com"])
- pds_nucleus_opensearch_credential_relative_url : OpenSearch Credential Relative URL (E.g.: "http://<IP ADDRESS>/AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
- pds_nucleus_harvest_replace_prefix_with_list : List of harvest replace with strings (E.g.: ["s3://pds-sbn-nucleus-staging","s3://pds-img-nucleus-staging"])
@@ -121,7 +124,8 @@ aws_secretmanager_key_arn = "arn:aws:kms:us-west-2:12345678:key/12345-12
# Please check https://nasa-pds.github.io/registry/user/harvest_job_configuration.html for PDS Node name descriptions.
pds_node_names = ["PDS_SBN", "PDS_IMG"]
pds_nucleus_opensearch_urls = ["https://abcdef.us-west-2.aoss.amazonaws.com", "https://opqrst.us-west-2.aoss.amazonaws.com"]
pds_nucleus_opensearch_url = "https://abcdef.us-west-2.aoss.amazonaws.com"
pds_nucleus_opensearch_registry_names = ["pds-nucleus-sbn-registry", "pds-nucleus-img-registry"]
pds_nucleus_opensearch_credential_relative_url = "http://<IP ADDRESS>/AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
pds_nucleus_harvest_replace_prefix_with_list = ["s3://pds-sbn-nucleus-staging", "s3://pds-img-nucleus-staging"]
@@ -183,21 +187,7 @@ terraform apply
13. The DAGs can be added to the Airflow by uploading Airflow DAG files to the DAG folder of S3 bucket
configured as `mwaa_dag_s3_bucket_name` in the `terraform.tfvars` file.
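
    For example, a DAG file can be uploaded with boto3; a minimal sketch (the bucket and file names below are hypothetical placeholders for the values configured in `terraform.tfvars`):

    ```python
    import boto3

    s3 = boto3.client("s3")

    # Hypothetical names: replace "my-mwaa-dag-bucket" with the bucket configured
    # as mwaa_dag_s3_bucket_name, and upload the DAG file into its dags/ prefix.
    s3.upload_file("pds_nucleus_example_dag.py", "my-mwaa-dag-bucket", "dags/pds_nucleus_example_dag.py")
    ```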

14. Go to AWS Secrets Manager (https://us-west-2.console.aws.amazon.com/secretsmanager/listsecrets?region=us-west-2) and locate the secrets in the following format:
- pds/nucleus/opensearch/creds/<PDS NODE NAME>/user
- pds/nucleus/opensearch/creds/<PDS NODE NAME>/password

E.g.:
- pds/nucleus/opensearch/creds/PDS_IMG/user
- pds/nucleus/opensearch/creds/PDS_SBN/user
- pds/nucleus/opensearch/creds/PDS_IMG/password
- pds/nucleus/opensearch/creds/PDS_SBN/password

15. Obtain the OpenSearch username and password for each PDS Node and update the above secrets with the relevant usernames and passwords.
- To update a secret, click on a secret -> Retrieve secret value -> Edit -> Save
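
    The same update can be scripted; a minimal sketch with boto3 (the node name and password below are hypothetical example values, not real credentials):

    ```python
    import boto3

    client = boto3.client("secretsmanager", region_name="us-west-2")

    # Hypothetical example values; repeat for each PDS Node's /user and /password secrets.
    client.put_secret_value(
        SecretId="pds/nucleus/opensearch/creds/PDS_IMG/password",
        SecretString="example-opensearch-password",
    )
    ```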


15. Use the PDS Data Upload Manager (DUM) tool to upload files to pds_nucleus_staging_bucket.
16. Use the PDS Data Upload Manager (DUM) tool to upload files to pds_nucleus_staging_bucket.


## Steps to Access Nucleus Airflow UI With Cognito Credentials
4 changes: 2 additions & 2 deletions terraform/main.tf
@@ -102,7 +102,8 @@ module "product-copy-completion-checker" {
pds_nucleus_cold_archive_bucket_name_postfix = var.pds_nucleus_cold_archive_bucket_name_postfix

pds_node_names = var.pds_node_names
pds_nucleus_opensearch_urls = var.pds_nucleus_opensearch_urls
pds_nucleus_opensearch_url = var.pds_nucleus_opensearch_url
pds_nucleus_opensearch_registry_names = var.pds_nucleus_opensearch_registry_names
pds_nucleus_opensearch_credential_relative_url = var.pds_nucleus_opensearch_credential_relative_url
pds_nucleus_harvest_replace_prefix_with_list = var.pds_nucleus_harvest_replace_prefix_with_list

@@ -141,4 +142,3 @@ module "cognito-auth" {
cognito_user_pool_id = var.cognito_user_pool_id
aws_elb_account_id_for_the_region = var.aws_elb_account_id_for_the_region
}

@@ -33,6 +33,7 @@
dag_name = os.environ.get('AIRFLOW_DAG_NAME')
pds_node_name = os.environ.get('PDS_NODE_NAME')
opensearch_endpoint = os.environ.get('OPENSEARCH_ENDPOINT')
opensearch_registry_name = os.environ.get('OPENSEARCH_REGISTRY_NAME')
pds_nucleus_opensearch_credential_relative_url = os.environ.get('OPENSEARCH_CREDENTIAL_RELATIVE_URL')
replace_prefix_with = os.environ.get('REPLACE_PREFIX_WITH')
efs_mount_path = os.environ.get('EFS_MOUNT_PATH')
@@ -45,6 +46,7 @@
pds_hot_archive_bucket_name = os.environ.get('PDS_HOT_ARCHIVE_S3_BUCKET_NAME')
pds_cold_archive_bucket_name = os.environ.get('PDS_COLD_ARCHIVE_S3_BUCKET_NAME')
pds_staging_bucket_name = os.environ.get('PDS_STAGING_S3_BUCKET_NAME')
product_batch_size = int(os.environ.get('PRODUCT_BATCH_SIZE', '10'))  # environment variables are strings; cast to int for use as a batch size

replace_prefix = efs_mount_path

@@ -98,7 +100,7 @@ def process_completed_products():
logger.debug(f"Number of completed product labels : {str(response['records'])}")
logger.debug(f"Number of completed product labels : {str(len(response['records']))}")

n = 10
n = product_batch_size
count = 0
list_of_product_labels_to_process = []

@@ -222,7 +224,7 @@ def create_harvest_configs_and_trigger_nucleus(list_of_product_labels_to_process
logger.info(f"Created harvest config XML file: {harvest_config_file_path}")

connection_xml_content = f"""<?xml version="1.0" encoding="UTF-8"?>
<registry_connection index="en-registry">
<registry_connection index="{opensearch_registry_name}">
<ec2_credential_url endpoint="{opensearch_endpoint}">{pds_nucleus_opensearch_credential_relative_url}</ec2_credential_url>
</registry_connection>
"""
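The `n = product_batch_size` change above replaces the previously hard-coded batch size of 10. A minimal standalone sketch of the batching pattern, with illustrative names rather than the Lambda's actual helpers:

```python
def chunk_labels(labels, batch_size):
    """Yield successive batches of product labels, batch_size labels at a time."""
    for start in range(0, len(labels), batch_size):
        yield labels[start:start + batch_size]

# Illustrative only: 25 labels with PRODUCT_BATCH_SIZE=10 yield batches of
# 10, 10, and 5; each batch would trigger one Nucleus DAG run.
product_labels = [f"s3://staging/product_{i}.xml" for i in range(25)]
for batch in chunk_labels(product_labels, 10):
    print(len(batch))
```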
@@ -235,13 +235,36 @@ resource "aws_s3_bucket" "pds_nucleus_s3_config_bucket" {
force_destroy = true
}

# Create a staging S3 Bucket for each PDS Node
resource "aws_s3_bucket" "pds_nucleus_s3_staging_bucket" {
count = length(var.pds_node_names)
# convert PDS node name to S3 bucket name compatible format
# This data source is added to access existing S3 buckets, because an S3 staging bucket is already available in the MCP Prod environment.
data "aws_s3_bucket" "pds_nucleus_s3_staging_bucket" {
count = length(var.pds_node_names)
bucket = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_staging_bucket_name_postfix}"
}

# Commented out the following S3 bucket resources, because an S3 staging bucket is already available in the MCP Prod environment.
# However, these resources are useful when deploying in a fresh environment.

# # Create a staging S3 Bucket for each PDS Node
# resource "aws_s3_bucket" "pds_nucleus_s3_staging_bucket" {
# count = length(var.pds_node_names)
# # convert PDS node name to S3 bucket name compatible format
# bucket = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_staging_bucket_name_postfix}"
# }

# # Create an aws_s3_bucket_notification for each s3 bucket of each Node
# resource "aws_s3_bucket_notification" "pds_nucleus_s3_staging_bucket_notification" {
#
# count = length(var.pds_node_names)
# # convert PDS node name to S3 bucket name compatible format
# bucket = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_staging_bucket_name_postfix}"
#
# queue {
# events = ["s3:ObjectCreated:*"]
# queue_arn = aws_sqs_queue.pds_nucleus_files_to_save_in_database_sqs_queue[count.index].arn
# }
# }


# Create pds_nucleus_s3_file_file_event_processor_function for each PDS Node
resource "aws_lambda_function" "pds_nucleus_s3_file_file_event_processor_function" {
count = length(var.pds_node_names)
@@ -292,15 +315,17 @@ resource "aws_lambda_function" "pds_nucleus_product_completion_checker_function"
DB_SECRET_ARN = aws_secretsmanager_secret.pds_nucleus_rds_credentials.arn
EFS_MOUNT_PATH = "/mnt/data"
ES_AUTH_CONFIG_FILE_PATH = "/etc/es-auth.cfg"
OPENSEARCH_ENDPOINT = var.pds_nucleus_opensearch_urls[count.index]
OPENSEARCH_ENDPOINT = var.pds_nucleus_opensearch_url
OPENSEARCH_REGISTRY_NAME = var.pds_nucleus_opensearch_registry_names[count.index]
OPENSEARCH_CREDENTIAL_RELATIVE_URL = var.pds_nucleus_opensearch_credential_relative_url
PDS_NODE_NAME = var.pds_node_names[count.index]
PDS_NUCLEUS_CONFIG_BUCKET_NAME = var.pds_nucleus_config_bucket_name
REPLACE_PREFIX_WITH = var.pds_nucleus_harvest_replace_prefix_with_list[count.index]
PDS_MWAA_ENV_NAME = var.airflow_env_name
PDS_HOT_ARCHIVE_S3_BUCKET_NAME = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_hot_archive_bucket_name_postfix}"
PDS_COLD_ARCHIVE_S3_BUCKET_NAME = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_cold_archive_bucket_name_postfix}"
PDS_STAGING_S3_BUCKET_NAME = aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].id
PDS_STAGING_S3_BUCKET_NAME = data.aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].id
PRODUCT_BATCH_SIZE = var.product_batch_size
}
}
}
@@ -342,7 +367,7 @@ resource "aws_lambda_permission" "s3-lambda-permission" {
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.pds_nucleus_s3_file_file_event_processor_function[count.index].function_name
principal = "s3.amazonaws.com"
source_arn = aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].arn
source_arn = data.aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].arn
}

# Create an SQS queue to receive S3 bucket notifications for each s3 bucket of each Node
@@ -374,7 +399,7 @@ data "aws_iam_policy_document" "pds_nucleus_files_to_save_in_database_sqs_queue_
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = [aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].arn]
values = [data.aws_s3_bucket.pds_nucleus_s3_staging_bucket[count.index].arn]
}
}
}
@@ -386,20 +411,6 @@ resource "aws_sqs_queue_policy" "pds_nucleus_files_to_save_in_database_sqs_queue
policy = data.aws_iam_policy_document.pds_nucleus_files_to_save_in_database_sqs_queue_policy_document[count.index].json
}

# Create an aws_s3_bucket_notification for each s3 bucket of each Node
resource "aws_s3_bucket_notification" "pds_nucleus_s3_staging_bucket_notification" {

count = length(var.pds_node_names)
# convert PDS node name to S3 bucket name compatible format
bucket = "${lower(replace(var.pds_node_names[count.index], "_", "-"))}-${var.pds_nucleus_staging_bucket_name_postfix}"

queue {
events = ["s3:ObjectCreated:*"]
queue_arn = aws_sqs_queue.pds_nucleus_files_to_save_in_database_sqs_queue[count.index].arn
}
}


resource "time_sleep" "wait_for_database" {
create_duration = "2m"

@@ -75,9 +75,15 @@ variable "pds_node_names" {
sensitive = true
}

variable "pds_nucleus_opensearch_urls" {
description = "List of PDS Nucleus OpenSearch Config file paths"
type = list(string)
variable "pds_nucleus_opensearch_url" {
description = "List of PDS Nucleus OpenSearch URL"
type = string
sensitive = true
}

variable "pds_nucleus_opensearch_registry_names" {
description = "List of PDS Nucleus OpenSearch Registry Names"
type = list(string)
sensitive = true
}

@@ -119,6 +125,12 @@ variable "airflow_env_name" {
type = string
}

variable "product_batch_size" {
description = "Size of the product batch to send to Nuclees DAG top process per given DAG invocation"
default = 10
type = number
}

variable "region" {
description = "AWS Region"
type = string
13 changes: 10 additions & 3 deletions terraform/variables.tf
@@ -101,15 +101,22 @@ variable "pds_nucleus_default_airflow_dag_id" {
variable "pds_node_names" {
description = "List of PDS Node Names"
type = list(string)
default = ["pds-sbn", "pds-img"]
sensitive = true
}

variable "pds_nucleus_opensearch_url" {
description = "List of PDS Nucleus OpenSearch URL"
type = string
sensitive = true
}

variable "pds_nucleus_opensearch_urls" {
description = "List of PDS Nucleus OpenSearch Config file paths"
variable "pds_nucleus_opensearch_registry_names" {
description = "List of PDS Nucleus OpenSearch Registry Names"
type = list(string)
sensitive = true
}


variable "pds_nucleus_opensearch_credential_relative_url" {
description = "List of PDS Nucleus OpenSearch Credential Relative URL"
type = string
3 changes: 2 additions & 1 deletion terraform/variables/terraform.tfvars.dev
@@ -13,7 +13,8 @@ aws_secretmanager_key_arn = "arn:aws:kms:us-west-2:12345678:key/abcdef-a
# (PDS_ATM, PDS_ENG, PDS_GEO, PDS_IMG, PDS_NAIF, PDS_RMS, PDS_SBN, PSA, JAXA, ROSCOSMOS)

pds_node_names = ["PDS_SBN", "PDS_IMG"]
pds_nucleus_opensearch_urls = ["https://abcdef.us-west-2.aoss.amazonaws.com", "https://pqrst.us-west-2.aoss.amazonaws.com"]
pds_nucleus_opensearch_url = "https://abcdef.us-west-2.aoss.amazonaws.com"
pds_nucleus_opensearch_registry_names = ["pds-nucleus-sbn-registry", "pds-nucleus-img-registry"]
pds_nucleus_opensearch_credential_relative_url = "http://<IP ADDRESS>/AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
pds_nucleus_harvest_replace_prefix_with_list = ["s3://pds-sbn-nucleus-staging", "s3://pds-img-nucleus-staging"]

3 changes: 2 additions & 1 deletion terraform/variables/terraform.tfvars.test
@@ -13,7 +13,8 @@ aws_secretmanager_key_arn = "arn:aws:kms:us-west-2:12345678:key/abcdef-a
# (PDS_ATM, PDS_ENG, PDS_GEO, PDS_IMG, PDS_NAIF, PDS_RMS, PDS_SBN, PSA, JAXA, ROSCOSMOS)

pds_node_names = ["PDS_SBN", "PDS_IMG"]
pds_nucleus_opensearch_urls = ["https://abcdef.us-west-2.aoss.amazonaws.com", "https://pqrst.us-west-2.aoss.amazonaws.com"]
pds_nucleus_opensearch_url = "https://abcdef.us-west-2.aoss.amazonaws.com"
pds_nucleus_opensearch_registry_names = ["pds-nucleus-sbn-registry", "pds-nucleus-img-registry"]
pds_nucleus_opensearch_credential_relative_url = "http://<IP ADDRESS>/AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
pds_nucleus_harvest_replace_prefix_with_list = ["s3://pds-sbn-nucleus-staging", "s3://pds-img-nucleus-staging"]

