This section makes an attempt to cover all possible aspects of IAM access management for Capillaries and Capillaries deployment. It requires some understanding of how AWS resource access management works.
There are two sides to this story:
- AWS identities used by Capillaries binaries (webapi, daemon) after the deployment is complete ("AWS: who is trying to access S3 buckets with Capillaries data and configuration files?")
- AWS identities used to create a deployment ("AWS: who runs capideploy?")
Let's start with IAM access management for Capillaries binaries.
Capillaries binaries running in your AWS deployment will need to read and write files from/to an S3 bucket. If you paid close attention to the Capillaries S3 setup for integration tests, you probably remember that we used a dedicated test IAM user UserAccessCapillariesTestbucket and provided its AWS credentials when building test Docker images. The bucket policy explicitly gives the user arn:aws:iam::<your_aws_account>:user/UserAccessCapillariesTestbucket access to the bucket.
For a production environment, it makes more sense to use the AWS feature called instance profile: it allows binaries running on specific EC2 instances to access specified AWS resources (S3 buckets, in our case) without using user credentials. The capideploy S3 bucket access model uses a separate policy and a separate role with this policy attached, and Capillaries instances can assume that role using the instance profile mechanism.
Which AWS identity should be used to run capideploy? There are some options.
1. Run capideploy under your AWS root account. This is generally discouraged; it is a trivial case and we will not consider it.
2. Run capideploy under a dedicated IAM user UserCapideployOperator that has permissions to create and maintain all required AWS resources.
3. Pretend that capideploy is executed by a third party that does not have an IAM user within your AWS account. You want to grant that third party specific permissions that allow it to create a Capillaries deployment in your AWS account. Giving a third party access to your AWS resources is a standard practice, and the recommended way to do that is to use IAM roles. We will be using PolicySaasCapideployOperator attached to UserSaasCapideployOperator on the SaaS side, and we will ask the customer's AWS account to trust UserSaasCapideployOperator with the permissions to create and maintain AWS resources within the customer's AWS account.
The rest of the IAM settings section discusses the AWS IAM preparation steps required to create the necessary role structure for options 2 and 3. Basic familiarity with the AWS console is required. Throughout the document we will refer to two different AWS accounts:
- customer's ("your") AWS account: this account technically OWNS created AWS resources and will be BILLED by Amazon for them;
- "SaaS" AWS account: this account will NOT be billed by Amazon, it only uses its UserSaasCapideployOperator user to assume a role within the customer's (your) AWS account; NO resources will be created within the "SaaS" AWS account.
Let's assume all capideploy activities (creation and maintenance of AWS resources) are performed on behalf of an IAM user named UserCapideployOperator. As a first step, create this user in the IAM->Users section of the customer's (your) AWS console.
We will attach PolicyCapideployOperator to this user later.
Create credentials for UserCapideployOperator and save them in UserCapideployOperator.rc:
export AWS_ACCESS_KEY_ID=AK...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
If you want to run capideploy under this account (not under some SaaS provider account as described below), source this .rc file before running capideploy, so the AWS SDK can use those credentials.
This section discusses the steps required to implement the instance profile-based S3 bucket access mentioned above.
Capillaries binaries running in your AWS deployment will need to read and write files from/to an S3 bucket. As per the Capillaries S3 instructions, we assume that you already have an S3 bucket for your future Capillaries deployment. Let's assume the name of the bucket is capillaries-testbucket (in practice, it will be more like acme-corp-prod-files) and that it has the Block all public access setting on (assuming you do not want strangers to see your files).
In IAM->Policies, let's create a policy PolicyAccessCapillariesTestbucket that allows access to the bucket we will be using:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::capillaries-testbucket"
},
{
"Effect": "Allow",
"Action": [
"s3:DeleteObject",
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::capillaries-testbucket/*"
}
]
}
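Since a real bucket will be named differently from capillaries-testbucket, the policy above can be generated rather than edited by hand. The following is a minimal sketch, not part of capideploy; the function name render_bucket_access_policy is hypothetical and the bucket name is passed as the only argument.

```shell
# Hypothetical helper: print the S3 access policy shown above for the
# bucket named in $1. It only writes JSON to stdout and calls no AWS API.
render_bucket_access_policy() {
  bucket=$1
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:DeleteObject", "s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Example: render_bucket_access_policy acme-corp-prod-files > policy.json
```

The resulting file can be pasted into the console policy editor, or, assuming the AWS CLI is configured with sufficient IAM permissions, passed to aws iam create-policy with --policy-document file://policy.json.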
In IAM->Roles, create a role RoleAccessCapillariesTestbucket with Trusted entity type set to AWS Service and:
- attach the newly created PolicyAccessCapillariesTestbucket to it (Permissions tab);
- under Trust relationships, make sure that the ec2 service can assume this role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Please note that, since we created the role with Trusted entity type set to AWS Service, RoleAccessCapillariesTestbucket has two ARNs, as a role and as an instance profile:
Name type | Name |
---|---|
ARN | arn:aws:iam::<customer_aws_account_id>:role/RoleAccessCapillariesTestbucket |
Instance profile ARN | arn:aws:iam::<customer_aws_account_id>:instance-profile/RoleAccessCapillariesTestbucket |
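The two ARNs in the table differ only in the resource type segment. As a small illustration, the helpers below (hypothetical names, not part of capideploy) build both from an account id and a name; the account id used in the example comment is a placeholder.

```shell
# Hypothetical helpers: build the role ARN and the instance profile ARN
# from an AWS account id ($1) and a role/profile name ($2).
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

instance_profile_arn() {
  printf 'arn:aws:iam::%s:instance-profile/%s\n' "$1" "$2"
}

# Example (123456789012 is a placeholder account id):
#   role_arn 123456789012 RoleAccessCapillariesTestbucket
#   instance_profile_arn 123456789012 RoleAccessCapillariesTestbucket
```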
Run the following command as AWS root or as UserCapideployOperator (if you have already assigned the iam:GetInstanceProfile permission to it, see below):
$ aws iam get-instance-profile --instance-profile-name RoleAccessCapillariesTestbucket
The result shows that the role RoleAccessCapillariesTestbucket is "wrapped" by the instance profile RoleAccessCapillariesTestbucket.
As we agreed above, UserCapideployOperator (who can potentially be a third party) needs only a very restricted set of permissions. This user will need permissions to do two major things:
- create/delete AWS resources (networks, subnets, instances etc) that will provide infrastructure to run Capillaries binaries and Cassandra cluster
- give created instances permission to read/write config/data files from/to S3 bucket
In IAM->Policies, create a customer-managed policy PolicyCapideployOperator:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateAddress",
"ec2:AssociateIamInstanceProfile",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateImage",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:CreateVpc",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSnapshot",
"ec2:DeleteSubnet",
"ec2:DeleteVolume",
"ec2:DeleteVpc",
"ec2:DeregisterImage",
"ec2:DescribeAddresses",
"ec2:DescribeImages",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInternetGateways",
"ec2:DescribeKeyPairs",
"ec2:DescribeNatGateways",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSnapshots",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVpcs",
"ec2:DetachInternetGateway",
"ec2:DetachVolume",
"ec2:ReleaseAddress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"iam:GetInstanceProfile",
"tag:GetResources"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::<customer_aws_account_id>:role/RoleAccessCapillariesTestbucket"
}
]
}
The first part is obvious: it lists all AWS API calls performed by capideploy. As for the second part, it adds the PassRole permission for RoleAccessCapillariesTestbucket created above. Without this permission, the AssociateIamInstanceProfile call (that tells AWS to allow instances to access the bucket) will fail.
Just in case - to list all AWS API calls used by capideploy, run:
grep -r -e "ec2Client\.[A-Za-z]*" --include "*.go"
grep -r -e "tClient\.[A-Za-z]*" --include "*.go"
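The grep commands above print whole matching lines. To compare the result against the Action list in the policy, it can help to reduce it to a unique, sorted list of ec2: names. This is a sketch under assumptions: list_ec2_actions is a hypothetical helper, it relies on the grep -o, -h and --include options (available in GNU grep), and it assumes the SDK method name equals the IAM action name, which is worth verifying for each entry.

```shell
# Hypothetical helper: list unique ec2Client method names found in *.go
# files under directory $1 (default: current directory), rewritten as
# ec2:<Name> so they can be compared with the policy's Action list.
list_ec2_actions() {
  src_dir=${1:-.}
  grep -rhoE 'ec2Client\.[A-Za-z]+' --include '*.go' "$src_dir" \
    | sed 's/^ec2Client\./ec2:/' \
    | sort -u
}

# Example: list_ec2_actions ./pkg
```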
In IAM->Users->UserCapideployOperator->Permissions, attach PolicyCapideployOperator.
This section is relevant only for those who decide to use the third IAM scenario with UserSaasCapideployOperator. It assumes "you" are the "customer" of the SaaS provider and you give this SaaS provider access to your AWS resources.
All AWS console activities are performed under SaaS AWS account, not customer's ("your") AWS account.
In the SaaS provider console IAM->Users, create a new user UserSaasCapideployOperator. This will be the account capideploy will be running under. Create credentials for UserSaasCapideployOperator and save them in UserSaasCapideployOperator.rc:
export AWS_ACCESS_KEY_ID=AK...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
If you want to run capideploy under this SaaS account (not under your UserCapideployOperator account as described above), source this .rc file before running capideploy, so the AWS SDK can use those credentials.
In the SaaS provider console IAM->Policies, create a new policy PolicySaasCapideployOperator as follows:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateAddress",
"ec2:AssociateIamInstanceProfile",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateImage",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:CreateVpc",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSnapshot",
"ec2:DeleteSubnet",
"ec2:DeleteVolume",
"ec2:DeleteVpc",
"ec2:DeregisterImage",
"ec2:DescribeAddresses",
"ec2:DescribeImages",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInternetGateways",
"ec2:DescribeKeyPairs",
"ec2:DescribeNatGateways",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSnapshots",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVpcs",
"ec2:DetachInternetGateway",
"ec2:DetachVolume",
"ec2:ReleaseAddress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"iam:GetInstanceProfile",
"tag:GetResources",
"iam:PassRole",
"sts:AssumeRole"
],
"Resource": "*"
}
]
}
This policy is very similar to your PolicyCapideployOperator discussed above, but there are two important differences:
- it allows iam:PassRole for all resources (because the SaaS provider user will work with many customers, it will need access not only to your arn:aws:iam::<customer_aws_account_id>:role/RoleAccessCapillariesTestbucket, but to all relevant roles from many customers);
- it allows sts:AssumeRole: capideploy will call the AWS API AssumeRole("arn:aws:iam::<customer_aws_account_id>:role/RoleCapideployOperator", externalId) when establishing an AWS service session, so it will create/delete all resources on your (<customer_aws_account_id>) behalf.
Attach PolicySaasCapideployOperator to UserSaasCapideployOperator.
In your (customer's) AWS console, create RoleCapideployOperator with PolicyCapideployOperator attached to it.
In IAM->Roles->RoleCapideployOperator->Trust relationships, add:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<saas_provider_aws_account_id>:user/UserSaasCapideployOperator"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "someExternalId"
}
}
}
]
}
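The trust policy above has two values that change per customer and per SaaS provider: the SaaS provider account id and the external ID. A minimal sketch that fills them in follows; render_trust_policy is a hypothetical helper, not part of capideploy, and it only prints JSON.

```shell
# Hypothetical helper: print the trust policy shown above for
# SaaS provider account id $1 and external ID $2.
render_trust_policy() {
  saas_account_id=$1
  external_id=$2
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${saas_account_id}:user/UserSaasCapideployOperator"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "${external_id}" }
      }
    }
  ]
}
EOF
}

# Example (placeholder account id and external ID):
#   render_trust_policy 123456789012 someExternalId > trust.json
```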
This will allow UserSaasCapideployOperator to assume the role RoleCapideployOperator and perform all actions listed in your (customer's) PolicyCapideployOperator on your (customer's) AWS resources.
If you want to run capideploy as the SaaS provider's UserSaasCapideployOperator, make sure to set these environment variables:
export CAPIDEPLOY_AWS_ROLE_TO_ASSUME_ARN="arn:aws:iam::<customer_aws_account_id>:role/RoleCapideployOperator"
export CAPIDEPLOY_AWS_ROLE_TO_ASSUME_EXTERNAL_ID="..."
They tell capideploy to assume the specified role before performing any action, so it will look like someone from the customer's (your) AWS account performs them. If you are not sure what an external ID is, there are a lot of AWS-related articles that cover it. In short: a random GUID is good enough.
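One possible way to produce such a random value without extra tools is to read 16 bytes from /dev/urandom and print them as hex. This is a sketch, not something capideploy provides: new_external_id is a hypothetical helper, and the result is a 32-character hex string rather than a formatted GUID. Whatever value is chosen must match the sts:ExternalId condition in the role's trust policy.

```shell
# Hypothetical helper: print a random 128-bit value as 32 lowercase hex
# characters, suitable as an external ID.
new_external_id() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
  echo
}

# Example:
#   export CAPIDEPLOY_AWS_ROLE_TO_ASSUME_EXTERNAL_ID="$(new_external_id)"
```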
A sample capideploy_aws.rc file to source before running capideploy contains the variables used in the .jsonnet file:
# Variables used in jsonnet
# Alphanumeric characters only. Make it unique.
export CAPIDEPLOY_DEPLOYMENT_NAME="sampleaws001"
# Subnets, volumes and instances created here
export CAPIDEPLOY_SUBNET_AVAILABILITY_ZONE="us-east-1c"
# 1. aws or azure, 2. amd64 or arm64, 3. Flavor family, 4. Number of cores in Cassandra nodes. Daemon cores are 4 times less.
export CAPIDEPLOY_DEPLOYMENT_FLAVOR_POWER="aws.arm64.c7g.8"
# Cassandra cluster size - 4,8,16,32
export CAPIDEPLOY_CASSANDRA_CLUSTER_SIZE="4"
# SSH access to EC2 instances
export CAPIDEPLOY_SSH_USER=ubuntu
# Name of the keypair stored at AWS
export CAPIDEPLOY_AWS_SSH_ROOT_KEYPAIR_NAME=sampledeployment005-root-key
# Exported PEM file with private SSH key from the AWS keypair: either a file (/home/johndoe/.ssh/sampledeployment005_rsa) or PEM key contents
export CAPIDEPLOY_AWS_SSH_ROOT_KEYPAIR_PRIVATE_KEY_OR_PATH="-----BEGIN RSA PRIVATE KEY-----...-----END RSA PRIVATE KEY-----"
# NGINX IP address filter: your IP address(es) or cidr(s), for example: "135.23.0.0/16,136.104.0.21"
export CAPIDEPLOY_BASTION_ALLOWED_IPS="..."
# This is where capideploy takes Capillaries binaries from, see https://github.com/capillariesio/capillaries/blob/main/binaries_upload.sh
export CAPIDEPLOY_CAPILLARIES_RELEASE_URL=https://capillaries-release.s3.us-east-1.amazonaws.com/latest
# RabbitMQ admin access (RabbitMQ Mgmt UI), can be anything
export CAPIDEPLOY_RABBITMQ_ADMIN_NAME=...
export CAPIDEPLOY_RABBITMQ_ADMIN_PASS=...
# RabbitMQ user access (used by Capillaries components to talk to RabbitMQ), can be anything
export CAPIDEPLOY_RABBITMQ_USER_NAME=...
export CAPIDEPLOY_RABBITMQ_USER_PASS=...
# Goes to /home/$SSH_USER/.aws/config: default/region (without it, AWS API called by Capillaries binaries will not locate S3 buckets)
export CAPIDEPLOY_S3_AWS_DEFAULT_REGION=us-east-1
# Capideploy will use this instance profile when creating instances that need access to S3 bucket
export CAPIDEPLOY_AWS_INSTANCE_PROFILE_WITH_S3_ACCESS=RoleAccessCapillariesTestbucket
# Variables not used in jsonnet, but used by capideploy binaries. It's just more convenient to use env variables instead of cmd parameters
# These two variables are required only for the arn:aws:iam::<saas_provider_aws_account_id>:user/UserSaasCapideployOperator scenario.
# If CAPIDEPLOY_AWS_ROLE_TO_ASSUME_ARN is empty, capideploy runs under arn:aws:iam::<customer_aws_account_id>:user/UserCapideployOperator
# ARN of the role to assume, if needed
export CAPIDEPLOY_AWS_ROLE_TO_ASSUME_ARN="arn:aws:iam::...:role/RoleCapideployOperator"
# External id of the role to assume, can be empty. If CAPIDEPLOY_AWS_ROLE_TO_ASSUME_ARN is specified, it is recommended to use external id
export CAPIDEPLOY_AWS_ROLE_TO_ASSUME_EXTERNAL_ID="..."
# Variables not used in jsonnet, but used by AWS SDK called from capideploy binaries
# arn:aws:iam::<customer_aws_account_id>:user/UserCapideployOperator or arn:aws:iam::<saas_provider_aws_account_id>:user/UserSaasCapideployOperator
export AWS_ACCESS_KEY_ID=AK...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
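Because the file above defines many variables, a quick check that the important ones are non-empty can save a failed deployment attempt. The sketch below is an assumption-labeled convenience, not a capideploy feature: require_env is a hypothetical helper that reports every missing name on stderr and returns non-zero if any is missing.

```shell
# Hypothetical helper: verify that every variable named in the arguments
# is set and non-empty; print the missing ones to stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, after sourcing capideploy_aws.rc:
#   require_env CAPIDEPLOY_DEPLOYMENT_NAME CAPIDEPLOY_SSH_USER \
#     CAPIDEPLOY_BASTION_ALLOWED_IPS AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
#     || echo "fix capideploy_aws.rc before running capideploy"
```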
go build ./pkg/cmd/capideploy
Run
source ~/capideploy_aws.rc
./capideploy deployment_create -p sample.jsonnet -v > deploy.log
If everything goes well, it will create a Capillaries deployment accessible at the BASTION_IP address. capideploy does not use DNS, so you will have to access your deployment by IP address. Find it in deploy.log, which suggests a BASTION_IP environment variable setting for it.
Capillaries UI: http://$BASTION_IP
RabbitMQ console: http://$BASTION_IP:15672/#/queues
Prometheus: Cassandra stats http://$BASTION_IP:9090/graph?g0.expr=sum(irate(cassandra_clientrequest_localrequests_count%7Bclientrequest%3D%22Write%22%7D%5B1m%5D))&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=1&g0.range_input=15m&g1.expr=sum(irate(cassandra_clientrequest_localrequests_count%7Bclientrequest%3D%22Read%22%7D%5B1m%5D))&g1.tab=0&g1.display_mode=lines&g1.show_exemplars=0&g1.range_input=15m
Prometheus: CPU stats http://$BASTION_IP:9090/graph?g0.expr=100%20-%20(avg%20by(instance)%20(rate(node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B1m%5D))%20*%20100)&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=15m
Cassandra status: ssh -o StrictHostKeyChecking=no -i ~/.ssh/private_key -J $BASTION_IP ubuntu@<cassandra_node_internal_ip> 'nodetool status'
The Capillaries repository has a few tests that are ready to run in the cloud deployment:
- lookup quicktest S3: run test_one_run_cloud.sh
- Fannie Mae quicktest S3: run test_one_run_cloud.sh
- Fannie Mae bigtest: run test_one_run.sh
- Portfolio bigtest: run test_one_run.sh
You will probably have to run these tests using the UserAccessCapillariesTestbucket IAM user as per the Capillaries S3 instructions: that user should have access to the S3 bucket to upload/download config/data files.
Please note that in order to run these tests or your own scripts in your newly created deployment you only need access to the S3 bucket and HTTP access to the bastion host (which should allow HTTP access from all machines matching the CAPIDEPLOY_BASTION_ALLOWED_IPS addresses or CIDRs). The UserCapideployOperator and UserSaasCapideployOperator users are NOT involved at this point.
In general, you can start a Capillaries run in your deployment via REST API as follows:
source ~/UserAccessCapillariesTestbucket.rc
CAPILLARIES_AWS_TESTBUCKET=capillaries-testbucket
keyspace="lookup_quicktest_s3"
cfgS3=s3://$CAPILLARIES_AWS_TESTBUCKET/capi_cfg/lookup_quicktest
outS3=s3://$CAPILLARIES_AWS_TESTBUCKET/capi_out/lookup_quicktest
scriptFile=$cfgS3/script.json
paramsFile=$cfgS3/script_params_one_run_s3.json
webapiUrl=http://$BASTION_IP:6544
startNodes=read_orders,read_order_items
curl -s -w "\n" -d '{"script_url":"'$scriptFile'", "script_params_url":"'$paramsFile'", "start_nodes":"'$startNodes'"}' -H "Content-Type: application/json" -X POST $webapiUrl"/ks/$keyspace/run"
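The inline quoting in the curl command above is easy to get wrong when editing. One alternative is to build the request body separately with printf. This is only a sketch: build_run_payload is a hypothetical helper, and it performs no JSON escaping, so it assumes the values contain no double quotes or backslashes (true for the S3 URLs and node names used here).

```shell
# Hypothetical helper: build the JSON body for the run-start request from
# script URL ($1), script params URL ($2) and start nodes ($3).
# No JSON escaping is done; values must not contain quotes or backslashes.
build_run_payload() {
  printf '{"script_url":"%s", "script_params_url":"%s", "start_nodes":"%s"}\n' "$1" "$2" "$3"
}

# Example, using the variables defined above:
#   curl -s -w "\n" -d "$(build_run_payload "$scriptFile" "$paramsFile" "$startNodes")" \
#     -H "Content-Type: application/json" -X POST "$webapiUrl/ks/$keyspace/run"
```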
Q. The run starts, but no nodes are processed.
A. For some reason, Capillaries daemon(s) cannot process RabbitMQ messages created when the run was started. Check out the combined capidaemon logs at the bastion:
ssh -o StrictHostKeyChecking=no -i ~/.ssh/private_key ubuntu@$BASTION_IP less /mnt/capi_log/capidaemon.log
Q. Getting an HTTP 403 (forbidden) error when navigating to the UI or calling webapi.
A. Make sure you specified the proper CAPIDEPLOY_BASTION_ALLOWED_IPS environment variable when you ran capideploy. To fix the problem in the deployment, edit /etc/nginx/includes/allowed_ips.conf on the bastion and restart nginx.
Q. When the UI calls webapi, some error is returned.
A. Make sure that the UI calls webapi at the right URL, not at localhost:6543. There is a section in pkg/rexec/scripts/ui/config.sh that patches the UI js file, make sure it is working as expected.
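Regarding the 403 answer above: a malformed CAPIDEPLOY_BASTION_ALLOWED_IPS value is easier to catch before deployment than after. The sketch below is a rough format check only, under stated assumptions: valid_allowed_ips is a hypothetical helper, it accepts comma-separated IPv4 addresses with an optional /0 to /32 prefix, and it does not verify that each octet is at most 255 or that your current address actually falls within the list.

```shell
# Hypothetical helper: return 0 if $1 is a non-empty comma-separated list
# of IPv4 addresses or CIDRs (format check only, octet range not verified).
valid_allowed_ips() {
  old_ifs=$IFS
  IFS=,
  # Split the list on commas into positional parameters.
  set -- $1
  IFS=$old_ifs
  [ "$#" -gt 0 ] || return 1
  for entry in "$@"; do
    printf '%s\n' "$entry" \
      | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}(/([0-9]|[12][0-9]|3[0-2]))?$' \
      || return 1
  done
  return 0
}

# Example:
#   valid_allowed_ips "$CAPIDEPLOY_BASTION_ALLOWED_IPS" || echo "check the value"
```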
To delete all AWS resources that your deployment uses, run
source ~/capideploy_aws.rc
./capideploy deployment_delete -p sample.jsonnet -v -i > undeploy.log