Skip to content

Amazon Simple Storage Service (S3) resource plugin for iRODS

License

Notifications You must be signed in to change notification settings

heliumdatacommons/irods_resource_plugin_s3

 
 

Repository files navigation

iRODS S3 Resource Plugin
------------------------

To build the S3 Resource Plugin, you will need to have:

 - the iRODS Development Tools (irods-dev and irods-runtime) installed for your platform
     http://irods.org/download

 - libxml2-dev / libxml2-devel

 - libcurl4-gnutls-dev / curl-devel

 - libs3 installed for your platform
     https://github.com/irods/libs3


To use this resource plugin:

  irods@hostname $ iadmin mkresc compResc compound
  irods@hostname $ iadmin mkresc cacheResc unixfilesystem <hostname>:</full/path/to/Vault>
  irods@hostname $ iadmin mkresc archiveResc s3 <hostname>:/<s3BucketName>/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=</full/path/to/AWS.keypair>;S3_RETRY_COUNT=<num reconn tries>;S3_WAIT_TIME_SEC=<wait between retries>;S3_PROTO=<HTTP|HTTPS>"
  irods@hostname $ iadmin addchildtoresc compResc cacheResc cache
  irods@hostname $ iadmin addchildtoresc compResc archiveResc archive
  irods@hostname $ iput -R compResc foo.txt

The AWS/S3 keypair file should have two values (Access Key ID and Secret Access Key):

  AKDJFH4KJHFCIOBJ5SLK
  rlgjolivb7293r928vu98n498ur92jfgsdkjfh8e

You may specify more than one host:IP as the S3_DEFAULT_HOSTNAME by listing them with a comma (,) between them:
ex:  S3_DEFAULT_HOSTNAME=192.168.122.128:443,192.168.122.129:443,192.168.122.130:443

To control multipart uploads, add the resource variables "S3_MPU_CHUNK" and "S3_MPU_THREADS" to the creation line.
* S3_MPU_CHUNK is the size of each part to be uploaded in parallel (in MB, default is 5MB).  Objects smaller than this will be uploaded with standard PUTs.
* S3_MPU_THREADS is the number of parts to upload in parallel (only under Linux, default is 10).  On non-Linux OSes, this parameter is ignored and multipart uploads are performed sequentially.

To ensure end-to-end data integrity, MD5 checksums can be calculated and used for S3 uploads.  Note that this requires 2x the disk IO (because the file must first be read to calculate the MD5 before the S3 upload can start) and a corresponding increase in CPU usage
S3_ENABLE_MD5=[0|1]  (default is 0, off)

S3 server side encryption can be enabled using the parameter S3_SERVER_ENCRYPT=[0|1] (default=0=off).  This is not the same as HTTPS, and implies that the data will be stored on disk encrypted.  To encrypt during the network transport to S3, please use S3_PROTO=HTTPS (the default)

Using this plugin with the AWS assume role (ARN) functionality (AWS only)
--------------------------------------------

AWS allows a resource owner to provide access to objects in a bucket by mapping account numbers to each other using STS acces policies.  Once these policies are in place one user can access objects from another account. For a specific example of this see https://docs.google.com/document/d/1z_WZFmc2mqPqF5hAadB1a3Nhw4dn9-c47XxIjR5-SDw.  The iRODS s3 driver has been enhanced to take advantage of this functionality. To use this functionality, follow these steps:

* If you are using Ubuntu 16, you can simply install the binaries from GitHub. The steps for this are:

* git clone https://github.com/heliumdatacommons/irods_resource_plugin_s3.git
* cd download dir/dist/xenial
* as root, copy the 2 .so files to /usr/local/lib: sudo cp *.so /usr/local/lib
* as root, install the driver:  sudo dpkg -i irods-resource-plugin-s3_2.3.0~xenial_amd64.deb
* Continue to the configuration section

* Download or build the AWS SDK. The package should be part of the irods externals repository. If not, see below.
* Download the S3 driver from the helium repo: git clone https://github.com/heliumdatacommons/irods_resource_plugin_s3.git. You'll also need all of the iRODS externals and the development package.
* Build the package: cd download dir; mkdir build && cd build; /opt/irods-externals/clang3.8-0/bin/clang/cmake -DCMAKE_C_COMPILER=/opt/irods-externals/clang3.8-0/bin/clang -DCMAKE_CXX_COMPILER=/opt/irods-externals/clang3.8-0/bin/clang++ -DCMAKE_CXX_FLAGS=-std=c++14 -DUSING_AWS_SDK_CPP:boolean=true ..; make package
* install the package using rpm or dpkg as appropriate

Configuration
--------------------------------------------
* Configure a compound resource to access the target bucket.  Here's an example of how to configure the archive resource

iadmin mkresc AWSCloudArchive-ARN-test s3 ip-172-31-43-128.ec2.internal:nih-nhgri-datacommons "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/.ssh/irods-s3-resource-1.keypair;S3_RETRY_COUNT=10;S3_WAIT_TIME_SEC=10;S3_PROTO=HTTPS;S3_ENABLE_MPU=1;S3_ENABLE_ROLE_ASSUMPTION=1;S3_ASSUME_ROLE_FILE=/etc/irods/irods-s3-arn.config"

Note the new flags: S3_ENABLE_ROLE_ASSUMPTION=1 tells the driver that this resource will use the ARN functionality. The 3_ASSUME_ROLE_FILE=/etc/irods/irods-s3-arn.config tells the driver where to find the ARN specific configuration for the resource. This configuration file has two lines: the first is the ARN info and the second is the max duration of new session in seconds. Here's an example file:

arn:aws:iam::xxxxxxxxxxxx:role/new_role 
900

where the string of x is replaced by the new role id and the string new_role is replaced by the role to be assumed.

The archive and compound resource configurations are unchanged.

Building the AWS SDK
--------------------------------------------
Download the SDK from Github with git clone https://github.com/aws/aws-sdk-cpp.git,  You'll also need all of the iRODS externals and the development package.

Add these lines to the CMakeLists.txt file
#RENCI
include_directories(BEFORE SYSTEM /opt/irods-externals/clang3.8-0/include/c++/v1)
set(CMAKE_SHARED_LINKER_FLAGS_INIT "${CMAKE_SHARED_LINKER_FLAGS} -stdlib=libc++")
link_directories(/opt/irods-externals/clang-runtime3.8-0/lib)
link_libraries(-lc++)

Build the SDK: cd download dir; mkdir build && cd build; /opt/irods-externals/clang3.8-0/bin/clang/cmake -DCMAKE_C_COMPILER=/opt/irods-externals/clang3.8-0/bin/clang -DCMAKE_CXX_COMPILER=/opt/irods-externals/clang3.8-0/bin/clang++ -DCMAKE_CXX_FLAGS=-std=c++14  ..

This is a big build, so we just want to make and install the pieces we need. So cd to the aws-cpp-sdk-core directory and type make. Then cd ../aws-cpp-sdk-sts and type make.  Install libaws-cpp-sdk-sts.so and libaws-cpp-sdk-core.so in /usr/local/lib (Ubuntu) or /usr/local/lib64 (Centos).

Using this plugin with Google Cloud services
--------------------------------------------

This plugin has been certified to work with google cloud storage. This works becuase Google has implmented the s3 protocol.  There are several differences:

* Google does not seem to support multpart uploads.  So it is neccessary to disable this feature by adding the S3_ENABLE_MPU=0 flag to the context string.
* The default hostname in the context string should be set to storage.googleapis.com
* The signature version should be set to s3v4
* The values in the key file have to be generated according to the instructions at this link: https://cloud.google.com/storage/docs/migrating#keys

About

Amazon Simple Storage Service (S3) resource plugin for iRODS

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 64.6%
  • Python 30.9%
  • CMake 4.5%