More detailed instructions for the other components: https://github.com/stanislawbartkowski/hdpwiredencryption/wiki
Wired Encryption encrypts all data in motion, both inside the HDP cluster and between the cluster and the outside world. In addition to authorization and authentication, it is another layer of security. Traffic is encrypted not only when dealing with the external world but also internally. Unfortunately, it comes at a price: there is usually a performance penalty of around 10-15% because all data in transit has to be encrypted and decrypted.
Wired Encryption adds not only another layer of security but also another layer of complexity. Unlike Kerberos, the setup is not automated and manual changes are necessary. It can be a painstaking process the first time. The HDP documentation provides all the necessary information, but it is very general and an HDP administrator can be confused trying to extract the practical steps needed to implement encryption.
In this article, I'm going to alleviate this pain and confusion and provide practical steps and tools to deal with it.
The HDP cluster should be installed and healthy. Java and keytool are required.
Wired Encryption does not interact with Kerberos; it does not matter whether the cluster is Kerberized or not.
Encryption requires certificates, which can be signed by a Certificate Authority (CA) or self-signed. CA-signed certificates are recommended; self-signed certificates encrypt the data but do not guarantee full confidentiality.
The encryption method described in this article is enabled for the following services: WebHDFS, MapReduce/TEZ and Yarn.
A passwordless root ssh connection is expected to be enabled between the host where the tool is installed and all other hosts in the cluster.
The tool ships the scripts to the cluster nodes and executes them. The scripts should be removed when the task is completed: source.rc and custom.rc contain the key and truststore passwords in plain text, which is a potential security risk. The scripts are transported to the /root/re directory on all hosts.
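Before launching the tool, it is worth confirming that passwordless root ssh really works for every node. A quick check (assuming hosts.txt is in the current directory):

for host in $(cat hosts.txt); do
  # BatchMode makes ssh fail instead of prompting for a password
  ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$host" "echo OK" || echo "$host FAILED"
done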
The tool comprises several simple, self-explanatory bash scripts. The scripts generate self-signed certificates and distribute them across the cluster.
Script | Description |
---|---|
allkeys.sh | Collects the public certificates from all hosts, prepares the client truststore containing all keys, and distributes it |
custom.rc | Custom rc allowing overwriting defaults in source.rc |
genkey.sh | Creates a self-signed certificate, keystore and truststore |
finalize.sh | Completes the procedure, applies all necessary ownerships and permissions to keystores and keys |
hosts.txt | List of hostnames of the cluster. The tool uses this list to distribute the certificates |
run.sh | The launcher script |
source.rc | Settings of common environment variables. The settings can be customized in custom.rc |
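For orientation, the certificate and keystore produced by genkey.sh correspond roughly to the following keytool calls (a sketch only; the exact flags live in the script, and the passwords and ORGDATA come from source.rc/custom.rc, described below):

keytool -genkeypair -alias $HOSTNAME -keyalg RSA -keysize 2048 \
    -dname "CN=$HOSTNAME,OU=hw,O=hw,L=paloalto,ST=ca,C=us" \
    -keystore /etc/security/serverKeys/keystore.jks \
    -storepass $SERVER_STOREPASS_PASSWORD -keypass $SERVER_KEYPASS_PASSWORD
keytool -exportcert -alias $HOSTNAME \
    -keystore /etc/security/serverKeys/keystore.jks \
    -storepass $SERVER_STOREPASS_PASSWORD \
    -file /etc/security/serverKeys/$HOSTNAME.cert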
The variables in source.rc are described below. The defaults for the key and truststore locations reflect the defaults in the Ambari configuration panel.
Variable | Description | Default | Customize |
---|---|---|---|
SERVER_KEY_LOCATION | Directory to keep server keys | /etc/security/serverKeys | No |
CLIENT_KEY_LOCATION | Directory to keep client keys | /etc/security/clientKeys | No |
HOST_NAME | Host name, assigned automatically | $HOSTNAME | No |
KEYSTORE_FILE | Server keystore file | $SERVER_KEY_LOCATION/keystore.jks | No |
TRUSTSTORE_FILE | Server truststore file | $SERVER_KEY_LOCATION/truststore.jks | No |
CERTIFICATE_NAME | Server certificate file | $SERVER_KEY_LOCATION/$HOST_NAME.cert | No |
ALL_JKS | Client keys | $CLIENT_KEY_LOCATION/allkeys.jks | No |
YARN_USER | Yarn user | yarn | No |
KEYS | Temporary directory | keys | No |
SERVER_STOREPASS_PASSWORD | Server keystore password | $SERVER_KEYPASS_PASSWORD | Yes |
SERVER_TRUSTSTORE_PASSWORD | Server truststore password | $SERVER_KEYPASS_PASSWORD | Yes |
ORGDATA | Organization name | "OU=hw,O=hw,L=paloalto,ST=ca,C=us" | Yes |
Copy the files from the templates directory and modify them.
- hosts.txt : contains the list of all hostnames in the cluster. A passwordless root ssh connection should be configured.
- source.rc : contains common names and locations. Variables in source.rc can be overwritten in custom.rc.
- custom.rc : modify the values of the following variables: SERVER_KEYPASS_PASSWORD, SERVER_STOREPASS_PASSWORD, SERVER_TRUSTSTORE_PASSWORD, CLIENT_ALLKEYS_PASSWORD and ORGDATA (see the sample below).
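A minimal custom.rc could look like the following (the passwords and ORGDATA values below are placeholders, pick your own):

# custom.rc - overrides the defaults from source.rc
SERVER_KEYPASS_PASSWORD=ChangeMeKeyPass
SERVER_STOREPASS_PASSWORD=ChangeMeStorePass
SERVER_TRUSTSTORE_PASSWORD=ChangeMeTrustPass
CLIENT_ALLKEYS_PASSWORD=ChangeMeClientPass
ORGDATA="OU=bigdata,O=mycompany,L=mycity,ST=mystate,C=us"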
- Create self-signed certificates
- Distribute and install certificates in SSL KeyStore
- Create and distribute a client truststore containing public certificates for all hosts
- Configure services for encryption
./run.sh 0
The tool generates a self-signed certificate for every host and creates the server keystore and truststore. Important: the tool wipes out all previous content of /etc/security/clientKeys and /etc/security/serverKeys without warning.
After that, the following directory structure should exist on all hosts.
- /etc/security/clientKeys : empty directory
- /etc/security/serverKeys
- keystore.jks
- <hostname>.cert
- truststore.jks
Verify
keytool -list -v -keystore /etc/security/serverKeys/keystore.jks
Make sure that the organization name reflects the customized name from custom.rc and that the CN is equal to the full hostname.
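To inspect only the relevant fields, the output can be filtered, for example:

keytool -list -v -keystore /etc/security/serverKeys/keystore.jks | grep -E "Owner:|Issuer:"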
- Create self-signed certificates
- Create and collect all CSRs (Certificate Signing Requests)
- Manual step: send the CSRs to a Certificate Authority to have them signed
- The next steps are automated, assuming that the signed certificates follow the prescribed structure
- Distribute and install CA-signed certificates in SSL keystores
- Create and distribute a client truststore containing public certificates for all hosts
- Configure services for encryption
./run.sh 3
Self-signed certificates are created on every node, a CSR is generated for every node, and all CSRs are collected in the csrs directory.
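Per host, the CSR generation corresponds roughly to the following keytool call (a sketch; the key pair created in the previous step is reused):

keytool -certreq -alias $HOSTNAME \
    -keystore /etc/security/serverKeys/keystore.jks \
    -storepass $SERVER_STOREPASS_PASSWORD \
    -file $HOSTNAME.csr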
ll csrs/
total 16
-rw-r--r-- 1 root root 743 10-13 09:43 bushily1.fyre.ibm.com.csr
-rw-r--r-- 1 root root 743 10-13 09:43 bushily2.fyre.ibm.com.csr
-rw-r--r-- 1 root root 743 10-13 09:43 bushily3.fyre.ibm.com.csr
-rw-r--r-- 1 root root 737 10-13 09:43 exile1.fyre.ibm.com.csr
Pick up all the CSR files from the csrs directory and send them for signing.
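If you are testing with your own openssl-based CA, signing a single CSR could look like this (ca.crt and ca.key are assumed to be your CA certificate and private key; a real CA will have its own procedure):

openssl x509 -req -in bushily1.fyre.ibm.com.csr \
    -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 \
    -out bushily1.fyre.ibm.com.cert.pem
# append the CA certificate so the PEM file carries the whole chain
cat ca.crt >> bushily1.fyre.ibm.com.cert.pem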
This step can be done by the tool assuming that the CA-signed certificates match the format below. Otherwise, preparing the signed keystores and distributing them has to be done manually.
All CA-signed certificates should be collected in the certs directory. The certificate for every node, including the certificate chain, should be stored in PEM format. The certificate file name is expected to follow the format: <host name>.cert.pem.
For example, given the following list of hosts in the cluster.
exile1.fyre.ibm.com
bushily1.fyre.ibm.com
bushily2.fyre.ibm.com
bushily3.fyre.ibm.com
The corresponding certs directory.
ll certs/
bushily1.fyre.ibm.com.cert.pem
bushily2.fyre.ibm.com.cert.pem
bushily3.fyre.ibm.com.cert.pem
exile1.fyre.ibm.com.cert.pem
./run.sh 4
The signed certificates are imported into the appropriate keystores and a server truststore is created containing the CA-signed certificate only.
ls /etc/security/serverKeys/ -ltr
exile1.fyre.ibm.com.csr
exile1.fyre.ibm.com.cert.pem
keystore.jks
exile1.fyre.ibm.com.cert
truststore.jks
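Optionally, the signed certificate can be verified against the CA before proceeding (assuming the CA certificate is available locally as ca.pem):

openssl verify -CAfile ca.pem /etc/security/serverKeys/exile1.fyre.ibm.com.cert.pem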
This step is common for self-signed and CA-signed certificates.
./run.sh 1
The tool creates a client truststore containing the public certificates from all hosts. The truststore is then shipped to all hosts and saved as the /etc/security/clientKeys/allkeys.jks file.
Verify the content of the truststore
keytool -list -v -keystore /etc/security/clientKeys/allkeys.jks
The number of entries should be equal to the number of hosts listed in the hosts.txt file.
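The two numbers can be compared directly, for example:

keytool -list -keystore /etc/security/clientKeys/allkeys.jks -storepass $CLIENT_ALLKEYS_PASSWORD | grep -c trustedCertEntry
wc -l < hosts.txt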
This step is common for self-signed and CA-signed certificates.
./run.sh 2
In this step, the tool applies the proper ownerships and permissions to the keystores and truststores. All files in /etc/security/serverKeys should be readable only by users belonging to the hadoop group and closed to all other users. The file /etc/security/clientKeys/allkeys.jks should be readable by all.
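In essence, finalize.sh applies something along the following lines on every host (a sketch; the exact modes are set by the script):

chown -R root:hadoop /etc/security/serverKeys
chmod 440 /etc/security/serverKeys/*
chown root:hadoop /etc/security/clientKeys/allkeys.jks
chmod 444 /etc/security/clientKeys/allkeys.jks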
The next step is to enable SSL for the basic Hadoop services: WebHDFS, MapReduce2, TEZ and Yarn. After applying the settings, the cluster should be restarted to put the changes into force.
ssl-server settings:
Parameter | Add/modify | Value |
---|---|---|
ssl.server.truststore.location | Modify | /etc/security/serverKeys/truststore.jks |
ssl.server.truststore.password | Modify | $SERVER_TRUSTSTORE_PASSWORD |
ssl.server.truststore.type | Accept default | jks |
ssl.server.keystore.location | Accept default | /etc/security/serverKeys/keystore.jks |
ssl.server.keystore.password | Modify | $SERVER_KEYPASS_PASSWORD |
ssl.server.keystore.type | Accept default | jks |
ssl.server.keystore.keypassword | Modify | $SERVER_KEYPASS_PASSWORD |
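Before restarting, a quick sanity check that the passwords entered in Ambari actually open the stores can save a debugging session, for example:

keytool -list -keystore /etc/security/serverKeys/keystore.jks -storepass <ssl.server.keystore.password>
keytool -list -keystore /etc/security/serverKeys/truststore.jks -storepass <ssl.server.truststore.password>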
ssl-client settings:
Parameter | Add/modify | Value |
---|---|---|
ssl.client.truststore.location | Modify | /etc/security/clientKeys/allkeys.jks |
ssl.client.truststore.password | Modify | $CLIENT_ALLKEYS_PASSWORD |
ssl.client.truststore.type | Accept default | jks |
core-site settings:
Parameter | Add/modify | Value |
---|---|---|
hadoop.rpc.protection | Add | privacy (remove authentication) |
hdfs-site settings (data transfer encryption):
Parameter | Add/modify | Value |
---|---|---|
dfs.encrypt.data.transfer | Add new | true |
dfs.encrypt.data.transfer.algorithm | Add new | 3des |
hdfs-site settings (HTTPS policy):
Parameter | Add/modify | Value |
---|---|---|
dfs.http.policy | Modify | HTTPS_ONLY |
dfs.datanode.https.address | Accept default | 0.0.0.0:50475 |
dfs.namenode.https-address | Accept default or Add | <hostname>:50470 or 0.0.0.0:50470 (listening on all networks) |
dfs.namenode.secondary.https-address | Add only if HA is not activated | <secondary namenode hostname>:50091 |
yarn-site settings:
Parameter | Add/modify | Value |
---|---|---|
yarn.http.policy | Modify | HTTPS_ONLY |
yarn.log.server.url | Modify | Change to the HTTPS URL: https://<host>:19889/jobhistory/logs |
yarn.resourcemanager.webapp.https.address | Accept default | <host name>:8090 |
yarn.nodemanager.webapp.https.address | Accept default | 0.0.0.0:8042 |
yarn.log.server.web-service.url | Modify | Change to the HTTPS URL: https://<host name>:8190/ws/v1/applicationhistory |
mapred-site settings (job history HTTPS policy):
Parameter | Add/modify | Value |
---|---|---|
mapreduce.jobhistory.http.policy | Modify | HTTPS_ONLY |
mapred-site settings (shuffle encryption):
Parameter | Add/modify | Value |
---|---|---|
mapreduce.jobhistory.webapp.https.address | Add | <JHS>:<JHS_HTTPS_PORT> (Yarn history server and secure port) |
mapreduce.ssl.enabled | Add | true |
mapreduce.shuffle.ssl.enabled | Add | true |
tez-site settings:
Parameter | Add/modify | Value |
---|---|---|
tez.runtime.shuffle.ssl.enable | Add | true |
tez.runtime.shuffle.keep-alive.enabled | Add | true |
After setting up wired encryption, the Ambari Metrics component is up and running but the metrics are dead. To enable them, the certificate of the host where the HDFS NameNode is installed should be imported into the Ambari truststore.
The certificate can be taken from /etc/security/serverKeys/<hostname>.cert or copied and pasted from the output of:
openssl s_client -connect <namenode host>:50470
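The certificate can also be captured non-interactively and saved to a file, for example:

openssl s_client -connect <namenode host>:50470 </dev/null 2>/dev/null | openssl x509 -outform PEM -out namenode.cert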
The next step is to import the certificate into the Ambari truststore:
ambari-server setup-security
Using python /usr/bin/python
Security setup options...
===========================================================================
Choose one of the following options:
[1] Enable HTTPS for Ambari server.
[2] Encrypt passwords stored in ambari.properties file.
[3] Setup Ambari kerberos JAAS configuration.
[4] Setup truststore.
[5] Import certificate to truststore.
===========================================================================
Enter choice, (1-5): 5
Use option 4) if the truststore has not been created yet; otherwise go directly to option 5).
After fixing the truststore, restart the Ambari server.
Run a health-check for all services.
Launch the HDFS, Yarn, and MapReduce2 UIs. Verify that the browser is using a secure (https) connection.
Review the HDFS parameters again (here the NameNode hostname is mdp1.sb.com):
- dfs.namenode.http-address mdp1.sb.com:50070
- dfs.namenode.https-address mdp1.sb.com:50470
The expected result is that the non-secure connection on port 50070 is refused and WebHDFS is enabled on the secure port 50470.
nc -zv mdp1.sb.com 50070
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection refused
nc -zv mdp1.sb.com 50470
Ncat: Connected to 192.168.122.129:50470.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
Test WebHDFS over the secure connection.
With Kerberos authentication enabled:
curl -i -k --negotiate -u : -X GET https://mdp1.sb.com:50470/webhdfs/v1/?op=LISTSTATUS
With Kerberos authentication disabled:
curl -i -k -X GET https://mdp1.sb.com:50470/webhdfs/v1/?op=LISTSTATUS
HTTP/1.1 401 Authentication required
..........
.........
{"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"childrenNum":3,"fileId":16392,"group":"hadoop","length":0,"modificationTime":1553698288837,"owner":"yarn","pathSuffix":"app-logs","permission":"1777","replication":0,"storagePolicy":0,"type":"DIRECTORY"},
....................
er":"hdfs","pathSuffix":"warehouse","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}
]}}