Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with external PostgreSQL on AWS RDS #270

Open
miaeyg opened this issue Jan 30, 2024 · 26 comments
Open

Issues with external PostgreSQL on AWS RDS #270

miaeyg opened this issue Jan 30, 2024 · 26 comments
Assignees

Comments

@miaeyg
Copy link

miaeyg commented Jan 30, 2024

I have an issue and a question about using an external PostgreSQL on AWS (RDS) instead of the SAS internal default postgreSQL (Crunchy):

  1. I think there is a error in the description of the "storage_size" parameter described here https://github.com/sassoftware/viya4-iac-aws/blob/main/docs/CONFIG-VARS.md#postgresql-server
    The descriptions says: "Max storage allowed for the PostgreSQL server in MB"
    but looking at https://registry.terraform.io/modules/terraform-aws-modules/rds/aws/latest and the parameter "allocated_storage" (which uses the "storage_size" input var) is measured in GB, not MB: "The allocated storage in gigabytes". In addition the parameter's default value of 50 seems low compared to the Crunchy default value which is 128GB.

  2. I installed SAS with an external PostgreSQL and used the default value for "ssl_enforcement_enabled" which is "true" and now my SAS services to not start. I see in the log of sas-logon-app (as an example) this Java SSL exception below. Question is how do I fix this exception?

java.lang.IllegalStateException: org.postgresql.util.PSQLException: SSL error: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at com.sas.commons.db.postgres.PostgresDbInitializer.doCreate(PostgresDbInitializer.java:199)
at com.sas.commons.db.postgres.PostgresDbInitializer.createSchema(PostgresDbInitializer.java:158)

I downloaded the AWS CA in PEM format for my region from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html#UsingWithRDS.SSL.RegionCertificates and saved in a local directory then when I invoked the DaC project I added this option to the invocation of the viya-deployment Ansible project: --volume /home/sasinst/deployments/aws-ca:/config/v4_cfg_tls_trusted_ca_certs viya4-deployment and added this line to ansible-vars.yaml: V4_CFG_TLS_TRUSTED_CA_CERTS: "/home/sasinst/deployments/aws-creds" but it did not help

@dhoucgitter
Copy link
Member

Hi @miaeyg. The steps that you've followed to download the AWS CA file in PEM format are what you need to fix the error. Since you are using the DAC docker container and specifying the volume mapping as --volume /home/sasinst/deployments/aws-ca:/config/v4_cfg_tls_trusted_ca_certs, the docker-entrypoint.sh script should set the correct V4_CFG_TLS_TRUSTED_CA_CERTS environment variable value for you when the playbook runs inside the container. Try commenting out the V4_CFG_TLS_TRUSTED_CA_CERTS setting that you have in ansible-vars.yaml to see if that helps. The value you have set for that variable reflects what you might use if you were running DAC using the locally installed Ansible binary since it points to a location on your local workstation.

@miaeyg
Copy link
Author

miaeyg commented Jan 30, 2024

Hi @dhoucgitter, tried your suggestion as well and got the same results (Java SSL exceptions in all SAS services)

@dhoucgitter
Copy link
Member

Hi @miaeyg, did you run DAC using "viya,uninstall" tags to remove your previous deployment if you haven't already, then reinstall with "viya,install"? Also scan for the string "Running: ansible-playbook" in the initial lines of the docker run output to inspect the value of V4_CFG_TLS_TRUSTED_CA_CERTS being set.

@miaeyg
Copy link
Author

miaeyg commented Jan 30, 2024

Hi @dhoucgitter, each time I do a test I always uninstall and reinstall.
I can tell you that looking at the deployment folders created by DaC I saw these files created:

site-config/vdm/security/cacerts/il-central-1-bundle.pem
site-config/vdm/generators/customer-provided-ca-certificates.yaml

The contents of the second file is

apiVersion: builtin
kind: ConfigMapGenerator
metadata:
  name: sas-customer-provided-ca-certificates
behavior: merge
files:
  - cacerts1.pem=site-config/vdm/security/cacerts/il-central-1-bundle.pem

The second file is also included in the "kustomization.yaml" like this:

generators:
## vdm defined generators (pre)
- site-config/vdm/generators/sas-license.yaml
- site-config/vdm/generators/sas-shared-config.yaml
- site-config/vdm/generators/sas-consul-config-secret.yaml
- site-config/vdm/generators/ingress-input.yaml
- site-config/vdm/generators/openldap-bootstrap-config.yaml
- site-config/vdm/generators/postgres-default-secrets.yaml
- site-config/vdm/generators/customer-provided-ca-certificates.yaml
- site-config/vdm/generators/customer-provided-merge-sas-certframe-configmap.yaml

Note this line:

  • site-config/vdm/generators/customer-provided-ca-certificates.yaml

So it looks like the AWS CA PEM file was captured in the DaC docker and included in the correct places.

@miaeyg
Copy link
Author

miaeyg commented Jan 30, 2024

In addition to my previous reply I want to suggest that the generated file "site-config/vdm/generators/customer-provided-ca-certificates.yaml" contains a syntax error "cacerts1.pem=" in the last line:


apiVersion: builtin
kind: ConfigMapGenerator
metadata:
  name: sas-customer-provided-ca-certificates
behavior: merge
files:
  - cacerts1.pem=site-config/vdm/security/cacerts/il-central-1-bundle.pem

The source of this possible syntax error is here https://github.com/sassoftware/viya4-deployment/blob/4ef910891e6d6e6dca0bccbb7fd013c93a589ae0/roles/vdm/templates/generators/customer-provided-ca-certificates.yaml#L8

By the way, I tried to fix this by removing the "cacerts1.pem=" text and it did not help.

@dhoucgitter
Copy link
Member

In addition to my previous reply I want to suggest that the generated file "site-config/vdm/generators/customer-provided-ca-certificates.yaml" contains a syntax error "cacerts1.pem=" in the last line:

The DAC generated customer-provided-ca-certificates.yaml file I have contains the same format. There are other .yaml files that contain a similar

files:
  - token=filespec

format as well so I do not expect that is causing the problem.

@dhoucgitter
Copy link
Member

Hi @miaeyg, thanks for noticing that and good catch. The description for the storage_size input variable should indicate that the units are in GB, not MB. I will open a ticket to fix that doc reference. Will also handle adjusting the default size if needed.

  1. I think there is a error in the description of the "storage_size" parameter described here https://github.com/sassoftware/viya4-iac-aws/blob/main/docs/CONFIG-VARS.md#postgresql-server
    The descriptions says: "Max storage allowed for the PostgreSQL server in MB"
    but looking at https://registry.terraform.io/modules/terraform-aws-modules/rds/aws/latest and the parameter "allocated_storage" (which uses the "storage_size" input var) is measured in GB, not MB: "The allocated storage in gigabytes".

@miaeyg
Copy link
Author

miaeyg commented Jan 30, 2024 via email

@miaeyg
Copy link
Author

miaeyg commented Jan 31, 2024

Hi @dhoucgitter, about the SSL problem - by bad! I was using the wrong AWS CA PEM file as I downloaded by mistake the PEM for a different region than the region my RDS was running on (I have two accounts in different regions so I downloaded the PEM appropriate for the second account). Got it to work now. Sorry for the mess.

I have another related question - the generated RDS instance is marked as "Publicly accessible=yes". What if I do not want to make it public. How can I make it private and not public using this Terraform project? Is it supported to have it non-public?

Also, how can I control in which AZ the generated RDS instance will be created using this project? SAS recommends that the RDS instance will be in the same AZ as the SAS instances to avoid latency and to improve performance.

Also, it seems like the RDS instance generated by this project uses the following CA from AWS: "rds-ca-2019". According to AWS https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL-certificate-rotation.html they recommend to consider using newer CAs as this CA expires on August 2024 and not auto-rotated. How can I change the CA used what AWS recommend "rds-ca-rsa2048-g1 CA" from this Terraform project?

@dhoucgitter
Copy link
Member

dhoucgitter commented Jan 31, 2024

Hi @miaeyg, where do you see that your RDS instance is using the rds-ca-2019 CA?
In the external RDS instance I have created by IAC AWS, I am seeing rds-ca-rsa2048-g1 as the CA that's being used. Wondering if this could be a regional decision?
Showing this CA for my RDS instance
image

@dhoucgitter dhoucgitter self-assigned this Jan 31, 2024
@miaeyg
Copy link
Author

miaeyg commented Feb 1, 2024

Hi @dhoucgitter, I have already destroyed my env. so I do not have a screenshot for you but I went to RDS and saw the DB instance, clicked on it and in the details screen there was a "Certificate Authority' option and the value was "rds-ca-2019". I am sure about it. I have created the RDS instance in the "us-east-1" region. Is this the same region you are testing?

Is there an option to control the CA used for the RDS instance within any of the parameters/options of the "postgres_servers" variable in terraform.tfvars file?

Here is my "postgres_servers" section from terraform.tfvars:

postgres_servers = {
  default = {
    "server_version"          = "15"
    "instance_type"           = "db.r6idn.xlarge"
    "storage_size"            = 128
    "ssl_enforcement_enabled" = true
    "administrator_login"     = "pgadmin"
    "administrator_password"  = "my$up3rS3cretPassw00rd"
  }
}

@dhoucgitter
Copy link
Member

Found an AWS blog entry that seems relevant to the discussion:

Now through January 25, 2024 – new RDS DB instances will have the rds-ca-2019 certificate by default, unless you specify a different CA via the ca-certificate-identifier option on the create-db-instance API; or you specify a default CA override for your account like mentioned in the above section. Starting January 26, 2024 – any new database instances will default to using the rds-ca-rsa2048-g1 certificate. If you wish for new instances to use a different certificate, you can specify which certificate to use with the AWS console or the AWS CLI.

@miaeyg
Copy link
Author

miaeyg commented Feb 1, 2024

Thanks @dhoucgitter !
Just to mention that I created the RDS database instance after January 26th and still got rds-ca-2019 but I can try again.
What about my other two questions? Here they are again:

I have another related question - the generated RDS instance is marked as "Publicly accessible=yes". What if I do not want to make it public. How can I make it private and not public using this Terraform project? Is it supported to have it non-public?

Also, how can I control in which AZ the generated RDS instance will be created using this project? SAS recommends that the RDS instance will be in the same AZ as the SAS instances to avoid latency and to improve performance.

@dhoucgitter
Copy link
Member

Hi @miaeyg, the CA being assigned to new RDS instances seems to be different in different regions. I have opened a AWS case asking about the discrepancy between RDS instances created in us-west-1 where they are assigned the rds-ca-rsq2048-g1 CA and us-east-1 where they were still getting the rds-ca-2019 CA as of yesterday.

how can I control in which AZ the generated RDS instance will be created using this project? SAS recommends that the RDS instance will be in the same AZ as the SAS instances to avoid latency and to improve performance.

The RDS instance AZs are controlled by the AZs of the subnets that the RDS instance is using. The last few sentences of the first paragraph in the Subnet Requirements section helps explain the possibilities.

@miaeyg
Copy link
Author

miaeyg commented Feb 4, 2024

Hi @dhoucgitter thanks.

Reading the doc you pointed me to says:

Amazon RDS relies on having two subnets in separate AZs to provide database redundancy should a failure occur in one AZ.

Can I assume that first database subnet (and hence the corresponding AZ) on the provided list to terraform.tfvars file will be used for the RDS Master node and the second database subnet on the list will be used for the RDS Standby node? If not, then I do not understand how I can force the RDS Master to be on the same AZ as the private subnet's AZ (where all SAS components reside)?

@dhoucgitter
Copy link
Member

Hi @miaeyg, as you described, the behavior that I have observed is that RDS deploys the PostgresSQL database instance to the AZ corresponding to the first subnet in the list, whether that is the public subnet list or the database subnet list. I have not experienced or forced a DB failover however my expectation is that if a local AZ event caused the running PostgresSQL db to fail, RDS would start up a new instance in the AZ corresponding to the second subnet in the list as you describe.

@miaeyg
Copy link
Author

miaeyg commented Feb 5, 2024

Hi @dhoucgitter thanks. I think this info should be reflected in the project's doc to have the RDS in the same AZ as SAS components by setting so the first database subnet will be in the same AZ as the private subnet's AZ for performance reasons.

Still waiting on the last question about how can I make the RDS instance non-public.

Thanks for all your kind help!

@miaeyg
Copy link
Author

miaeyg commented Feb 6, 2024

Hi @dhoucgitter in addition to my previous comment from yesterday, there is one more question that came up - when setting up the "default" and "CDS" PostgreSQL as external PostgreSQL meaning as AWS RDS Database instances is it possible to setup only one database instance for both and have the two databases (default + CDS) reside within this single database instance?

I believe that using the supplied examples in this project will result in two database instances each with one database - correct me if I am wrong

@dhoucgitter
Copy link
Member

Hi @miaeyg,

I believe that using the supplied examples in this project will result in two database instances each with one database - correct me if I am wrong

You are right, the intent is to set up two separate database instances, each with it's own database. There may be additional concerns, but one reason for the separate instance is to provide a dedicated and thus higher performance path for the applications that plan to rely upon the CDS instance without suffering any latencies that may occur for those apps when using the default DB instance that most other Viya applications may already using.

@dhoucgitter
Copy link
Member

Hi @miaeyg, circling back around on the unanswered question.

I have another related question - the generated RDS instance is marked as "Publicly accessible=yes". What if I do not want to make it public. How can I make it private and not public using this Terraform project? Is it supported to have it non-public?

Yes, IAC supports creation of a private or publicy_accessible=false RDS instance. If your .tfvars content specifies a CIDR range for either default_public_access_cidrs or postgres_public_access_cidrs then publicly_accessible=true will get set. Setting both of those variables to null should result in creating a private RDS instance where publicy_accessible=false.

@miaeyg
Copy link
Author

miaeyg commented Feb 27, 2024

Thanks @dhoucgitter for answering this leftover question and all my other previous RDS questions!

@miaeyg miaeyg closed this as completed Feb 27, 2024
@miaeyg
Copy link
Author

miaeyg commented Jun 17, 2024

Hi @dhoucgitter,

I am reopening this issue - maybe you will have an idea. I just deployed SAS Viya in AWS using this project and noticed that RDS instance was deployed in the AZ corresponding to the second (not first as the assumption was) in the "database" list. I am using BYON scenario #2. Question is if and how to control in which AZ the RDS deploys to.

Here are screenshots of what I noticed:

Note RDS instance was deployed to the il-central-1b AZ:
image

The terraform.tfvars includes this:
image

And you can see that the first subnet in the "database" list is in the il-central-1a AZ:
image

I guess this is bad from performance perspective. Any idea what happened and how to control this behavior of RDS deployment?

Here is your reply about this some time ago:

Hi @miaeyg, as you described, the behavior that I have observed is that RDS deploys the PostgresSQL database instance to the AZ corresponding to the first subnet in the list, whether that is the public subnet list or the database subnet list. I have not experienced or forced a DB failover however my expectation is that if a local AZ event caused the running PostgresSQL db to fail, RDS would start up a new instance in the AZ corresponding to the second subnet in the list as you describe.

@miaeyg miaeyg reopened this Jun 17, 2024
@miaeyg
Copy link
Author

miaeyg commented Jun 17, 2024

Hi @dhoucgitter,

I am reopening this issue - maybe you will have an idea. I just deployed SAS Viya in AWS using this project and noticed that RDS instance was deployed in the AZ corresponding to the second (not first as the assumption was) in the "database" list. I am using BYON scenario #2. Question is if and how to control in which AZ the RDS deploys to.

Here are screenshots of what I noticed:

Note RDS instance was deployed to the il-central-1b AZ:
image

The terraform.tfvars includes this:
image

And you can see that the first subnet in the "database" list is in the il-central-1a AZ:
image

I guess this is bad from performance perspective. Any idea what happened and how to control this behavior of RDS deployment?

Here is your reply about this some time ago:

Hi @miaeyg, as you described, the behavior that I have observed is that RDS deploys the PostgresSQL database instance to the AZ corresponding to the first subnet in the list, whether that is the public subnet list or the database subnet list. I have not experienced or forced a DB failover however my expectation is that if a local AZ event caused the running PostgresSQL db to fail, RDS would start up a new instance in the AZ corresponding to the second subnet in the list as you describe.

@dhoucgitter
Copy link
Member

dhoucgitter commented Jun 19, 2024

Hi @miaeyg , what happens if you remove the second subnet and only supply the first one. Is RDS able to successfully place the postgres instance there?

This is only to verify that RDS can create the instance there and that your ID is correct along with the other bits, not a permanent solution.

@miaeyg
Copy link
Author

miaeyg commented Jun 20, 2024

Hi @dhoucgitter your suggestion to supply only 1 database subnet fails to create an RDS instance with this error:

DBSubnetGroupDoesNotCoverEnoughAZs: The DB subnet group doesn't meet Availability Zone (AZ) coverage requirement. 
Current AZ coverage: il-central-1a. Add subnets to cover at least 2 AZs.

It seems like supplying 2 subnets in two different AZs is an AWS requirement but it seems like AWS RDS instance is not necessarily placed in the first database subnet provided to Terraform, as I have shown in my previous comment, which can cause SAS private subnet to be placed in AZ1 and RDS instance to be placed in AZ2 which I understand is bad performance-wise. How can we force AWS to place the RDS instance in the same AZ as the SAS private subnet?

@miaeyg
Copy link
Author

miaeyg commented Jul 31, 2024

Hi @dhoucgitter did you see me previous update on Jun 20th? Do you have any suggestion for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants