
Commit b5eb8fa

Example for Cloudera on AWS env with TF and data services with Ansible (#151)
* Add example for Cloudera on AWS env with TF and data services with Ansible
* Add license header to source files
* Update AWS Terraform documentation
* Add example for Cloudera on GCP env with TF
* Add gitignore for AWS TF
* Add example for Cloudera on Azure env with TF
* Update AWS Terraform scripts
* Update AWS and Azure config file
* Update required variables in TF definitions
* Update AWS terraform example

Signed-off-by: Jim Enright <[email protected]>
1 parent b7358a4 commit b5eb8fa

33 files changed (+2720, -0 lines changed)

public-cloud/aws/terraform/.gitignore

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
```
# Copyright 2023 Cloudera, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

ansible-navigator.log
runs
context

# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# .lock files
*.terraform.lock.hcl

# .tfvars files
*.tfvars
```

public-cloud/aws/terraform/README.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# CDP Public Cloud - Environment and Data Services Example

Constructs a CDP Public Cloud Environment, Datalake and specified Data Services on AWS.

> Uses the [cdp-tf-quickstarts](https://github.com/cloudera-labs/cdp-tf-quickstarts) Terraform module, called via Ansible, to generate the prerequisite AWS infrastructure resources, the CDP environment and the datalake. The [cloudera.cloud](https://github.com/cloudera-labs/cloudera.cloud) Ansible collection is used to deploy the data services.

> **NOTE:** This deployment example does not use a `definition.yml` based configuration file. Instead, a standard Ansible extra vars configuration file is used.

## Requirements

To run, you need:

* Docker (or a Docker alternative)
* AWS credentials (set via `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` env vars)
* CDP credentials (set via `CDP_ACCESS_KEY_ID`, `CDP_PRIVATE_KEY` env vars)

## Set Up

First, set up your `ansible-navigator` (aka `cdp-navigator`) environment by following the instructions in the [NAVIGATOR document](https://github.com/cloudera-labs/cldr-runner/blob/main/NAVIGATOR.md) in `cloudera-labs/cldr-runner`.

Then, clone this project and change your working directory.

```bash
git clone https://github.com/cloudera-labs/cloudera-deploy.git; cd cloudera-deploy/public-cloud/aws/terraform
```

## Configure

Set the required environment variables:

```bash
export AWS_ACCESS_KEY_ID=your-aws-access-key-id
export AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
export AWS_SESSION_TOKEN=your-aws-session-token # optional; needed only with temporary credentials, e.g. AWS SSO
export CDP_ACCESS_KEY_ID=your-cdp-access-key-id
export CDP_PRIVATE_KEY=your-cdp-private-key
```
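
A missing credential is easier to catch before a long run than partway through it. The sketch below is a minimal POSIX shell pre-flight check; `check_env` is a hypothetical helper, not part of this project, and it treats `AWS_SESSION_TOKEN` as optional.

```bash
# Hypothetical pre-flight check (not part of cloudera-deploy): reports each
# required credential variable that is unset or empty and returns non-zero
# if any are missing. AWS_SESSION_TOKEN is optional, so it is not checked.
check_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY CDP_ACCESS_KEY_ID CDP_PRIVATE_KEY; do
    # POSIX-compatible indirection: copy the value of the variable named by $var
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env && ansible-navigator run setup.yml -e @./config.yml` only starts the playbook when all four variables are set.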

Tweak the `config.yml` parameters to suit your deployment. In particular, set or change the parameters below. Data Services settings (e.g. GPU instances for CML) are also edited in `config.yml`.

```yaml
name_prefix: ex01 # Keep this short (4-7 characters)
infra_region: us-east-2 # CSP region for infra

deployment_template: public # Deployment pattern; options are public, semi-private or private

# Change data services to enable as required
enable_cdf: False
enable_cml: False
enable_cdw: False
enable_cde: False
```

> [!NOTE]
> You can override these parameters with any typical Ansible _extra variables_ flags, e.g. `-e name_prefix=ex03`. See the [cldr-runner FAQ](https://github.com/cloudera-labs/cldr-runner/blob/main/FAQ.md#how-do-i-add-extra-variables-and-tags-to-ansible-navigator) for details.
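
The 4-7 character guidance for `name_prefix` can be checked before passing a new value on the command line. A minimal sketch; `valid_prefix` is a hypothetical helper and it enforces only that length range, not any other naming rules CDP or AWS may apply.

```bash
# Hypothetical helper: succeeds only if the prefix is 4-7 characters long,
# matching the length guidance for name_prefix. No other rules are checked.
valid_prefix() {
  [ "${#1}" -ge 4 ] && [ "${#1}" -le 7 ]
}
```

For example: `valid_prefix ex03 && ansible-navigator run setup.yml -e @./config.yml -e name_prefix=ex03`.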

### SSH Keys

This deployment creates a new SSH key pair on the host, named `<name_prefix>-ssh-key.{pem,pub}` and stored in the `./tf-cdp-env` directory. An AWS key pair is then created from the generated public key.
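
The key file names follow directly from `name_prefix`. A minimal sketch that prints the expected paths, assuming the playbook is run from `public-cloud/aws/terraform` so that `./tf-cdp-env` resolves as described; `ssh_key_paths` is a hypothetical helper.

```bash
# Hypothetical helper: prints the private (.pem) and public (.pub) key paths
# for a given name_prefix, following the <name_prefix>-ssh-key naming above.
ssh_key_paths() {
  printf '%s\n' "./tf-cdp-env/$1-ssh-key.pem" "./tf-cdp-env/$1-ssh-key.pub"
}
```

With `name_prefix: ex01`, `ssh_key_paths ex01` prints `./tf-cdp-env/ex01-ssh-key.pem` and `./tf-cdp-env/ex01-ssh-key.pub`; the `.pem` file is the one to pass to `ssh -i`.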

## Execute

Next, set up the CDP Public Cloud by running the playbook:

```bash
ansible-navigator run setup.yml -e @./config.yml
```

> ⏱️ **Note:** The deployment can take up to **60 minutes**.

> ⚠️ **Note:** Because Terraform deploys the Cloudera environment and datalake, take care when cancelling a deployment mid-execution, as doing so may corrupt the Terraform state file.

### Terraform resource files

The Terraform root module resource files run by the playbook are in the `./tf-cdp-env/` sub-directory.

Standard Terraform commands, e.g. `terraform output` and `terraform console`, can be run from within this directory.
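
Instead of changing directory, Terraform's global `-chdir` option (available since Terraform 0.14) runs a command against another directory. A minimal sketch, assuming a `terraform` binary is available where you run it and can read the state the playbook wrote to `./tf-cdp-env`; `tf_env` is a hypothetical wrapper.

```bash
# Hypothetical wrapper: runs any Terraform subcommand against the root module
# in ./tf-cdp-env without changing the current working directory.
tf_env() {
  terraform -chdir=./tf-cdp-env "$@"
}
```

For example, `tf_env output` lists the root module outputs and `tf_env state list` lists the resources tracked in state.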

## Tear Down

The cleanup is split into two separate playbooks: one removes all Data Services and the other removes the Cloudera Environment and infrastructure.

Tear down the data services by running the following command:

```bash
ansible-navigator run teardown-cdp-ds.yml -e @./config.yml
```

Tear down the CDP environment and infrastructure by running the command below:

```bash
ansible-navigator run teardown-cdp-env.yml -e @./config.yml
```
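
Since the data services run inside the CDP environment, the two teardown playbooks are presumably meant to run in the order shown. A minimal sketch that chains them so the environment teardown starts only if the data services teardown succeeds; `teardown_all` and the `NAVIGATOR` override are hypothetical, added only so the command can be swapped out.

```bash
# Hypothetical helper: removes the data services first, then the environment.
# The && ensures the second playbook runs only if the first exits successfully.
# NAVIGATOR defaults to ansible-navigator; it is an illustrative override only.
teardown_all() {
  nav="${NAVIGATOR:-ansible-navigator}"
  "$nav" run teardown-cdp-ds.yml -e @./config.yml &&
    "$nav" run teardown-cdp-env.yml -e @./config.yml
}
```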
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
```yaml
---
# Copyright 2024 Cloudera, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

ansible-navigator:
  playbook-artifact:
    save-as: "runs/{playbook_name}-{time_stamp}.json"

  ansible-runner:
    artifact-dir: runs
    rotate-artifacts-count: 3

  logging:
    level: debug
    append: False

  ansible:
    inventory:
      entries:
        - inventory.yml

  execution-environment:
    enabled: True
    environment-variables:
      pass:
        - CDP_ACCESS_KEY_ID
        - CDP_PRIVATE_KEY
        - AWS_ACCESS_KEY_ID
        - AWS_SECRET_ACCESS_KEY
        - AWS_SESSION_TOKEN
      set:
        ANSIBLE_CALLBACK_WHITELIST: "ansible.posix.profile_tasks"
        ANSIBLE_GATHERING: "smart"
        ANSIBLE_DEPRECATION_WARNINGS: False
        ANSIBLE_HOST_KEY_CHECKING: False
        ANSIBLE_SSH_RETRIES: 10
        ANSIBLE_SSH_CONTROL_PATH: "/dev/shm/cp%%h-%%p-%%r"
    image: ghcr.io/cloudera-labs/cldr-runner-aws:latest
    pull:
      policy: missing
    container-options:
      - "--network=host"
```

public-cloud/aws/terraform/config.yml

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
```yaml
---
# Copyright 2024 Cloudera, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Short prefix added to the names of all created resources
name_prefix: "<ENTER_VALUE>" # str; keep short 4-7 characters

# AWS region for the CDP Public Cloud
infra_region: "<ENTER_VALUE>" # str; e.g. (us-east-2)

# Tags to apply to resources, examples shown below
tags:
  owner: "<ENTER_VALUE>"
  enddate: "<ENTER_VALUE>"

# CDP Environment Deployment type. Options are public, semi-private and private
deployment_template: public

# Limit UI and SSH access to the caller/controller
# Replace with a list of CIDRs if access is required from other endpoints
allowed_cidrs: "{{ lookup('ansible.builtin.url', 'https://api.ipify.org', wantlist=True) | product(['32']) | map('join', '/') | list }}"

# Network connectivity
ingress_extra_cidrs_and_ports:
  cidrs: "{{ allowed_cidrs }}"
  ports: [22, 443] # Cloud-only access

# Data Services to enable
enable_cdf: False
enable_cml: False
enable_cdw: False
enable_cde: False
# NOTE: Data Services configurations (e.g. GPU for CML) can be edited below.

# Data Flow Configurations
df:
  nodes_min: 3
  nodes_max: 20
  public_loadbalancer: yes
  loadbalancer_ip_ranges: "{{ ingress_extra_cidrs_and_ports.cidrs }}"
  k8s_ip_ranges: "{{ ingress_extra_cidrs_and_ports.cidrs }}"

# ML workspace definition
ml:
  definitions:
    - name: "{{ name_prefix }}-default-ml"
      tls: yes
      governance: yes
      metrics: yes
      monitoring: yes
      private_cluster: no
      public_loadbalancer: yes
      instance_groups:
        - name: cpu_settings
          instanceCount: 1
          instanceType: "m5.2xlarge"
          instanceTier: "ON_DEMAND"
          autoscaling:
            minInstances: 1
            maxInstances: 3
            enabled: yes
          rootVolume:
            size: 300
        # Example GPU instance group shown below; uncomment to use
        # - name: gpu_settings
        #   autoscaling:
        #     maxInstances: 1
        #     minInstances: 0
        #   instanceCount: 0
        #   instanceTier: "ON_DEMAND"
        #   instanceType: "p2.8xlarge" # AWS
        #   rootVolume:
        #     size: 300

# Data Warehouse Service
dw:
  private_loadbalancer: no
  public_worker_node: yes
  overlay: yes

# Data Engineering Service
de:
  definitions:
    - name: "{{ name_prefix }}-sandbox-default-de"
      instanceType: "m5.4xlarge"
      private_cluster: no
      public_loadbalancer: yes
      workload_analytics: yes
      k8s_ip_ranges: "{{ ingress_extra_cidrs_and_ports.cidrs }}"
      loadbalancer_ip_ranges: "{{ ingress_extra_cidrs_and_ports.cidrs }}"
```
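
The `allowed_cidrs` default fetches the controller's public IP from `https://api.ipify.org`, pairs it with `32` via `product`, and joins each pair with `/`, yielding a single-host CIDR list such as `['203.0.113.7/32']`. The shell sketch below mirrors that transformation for one or more addresses; `to_host_cidrs` is a hypothetical helper and `203.0.113.7` is a documentation address, not a real endpoint.

```bash
# Hypothetical helper mirroring the Jinja expression in config.yml:
#   [ip1, ip2, ...] | product(['32']) | map('join', '/')  ->  ["ip1/32", "ip2/32", ...]
to_host_cidrs() {
  for ip in "$@"; do
    printf '%s/32\n' "$ip"
  done
}

# Example: fetch this machine's public IP the same way the default does.
# to_host_cidrs "$(curl -s https://api.ipify.org)"
```

To allow additional endpoints without editing the file, `allowed_cidrs` can be overridden as a JSON list of extra variables, e.g. `ansible-navigator run setup.yml -e @./config.yml -e '{"allowed_cidrs": ["203.0.113.7/32", "198.51.100.0/24"]}'`.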
