sugar for s3
https://gallantlab.github.io/cottoncandy
A python scientific library for storing and accessing numpy array data on S3. This is achieved by reading arrays from memory and downloading arrays directly into memory. This means that you don't have to download your array to disk, and then load it from disk into your python session.
This library relies heavily on boto3
Clone the repo from GitHub and do the usual python install from the command line
$ git clone https://github.com/gallantlab/cottoncandy.git
$ cd cottoncandy
$ sudo python setup.py install
A configuration file will be saved under ~/.config/cottoncandy/options.cfg
. Upon installation cottoncandy will try to find your AWS keys and store them in this file. See the default file for more configuration options.
Object and bucket permissions are set to authenticated-read
by default. If you wish to keep all your objects private, modify the configuration file and set default_acl = private
. See AWS ACL overview for more information on S3 permissions.
Setup the connection (endpoint, access and secret keys can be specified in the configuration file instead)::
>>> import cottoncandy as cc
>>> cci = cc.get_interface('my_bucket',
ACCESS_KEY='FAKEACCESSKEYTEXT',
SECRET_KEY='FAKESECRETKEYTEXT',
endpoint_url='https://s3.amazonaws.com')
>>> import numpy as np
>>> arr = np.random.randn(100)
>>> s3_response = cci.upload_raw_array('myarray', arr)
>>> arr_down = cci.download_raw_array('myarray')
>>> assert np.allclose(arr, arr_down)
>>> arr = np.random.randn(100,600,1000)
>>> s3_response = cci.upload_dask_array('test_dim', arr, axis=-1)
>>> dask_object = cci.download_dask_array('test_dim')
>>> dask_object
dask.array<array, shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> dask_slice = dask_object[..., :200]
>>> dask_slice
dask.array<getitem..., shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> downloaded_data = np.asarray(dask_slice) # this downloads the array
>>> downloaded_data.shape
(100, 600, 200)
>>> cci.glob('/path/to/*/file01*.grp/image_data')
['/path/to/my/file01a.grp/image_data',
'/path/to/my/file01b.grp/image_data',
'/path/to/your/file01a.grp/image_data',
'/path/to/your/file01b.grp/image_data']
>>> cci.glob('/path/to/my/file02*.grp/*')
['/path/to/my/file02a.grp/image_data',
'/path/to/my/file02a.grp/text_data',
'/path/to/my/file02b.grp/image_data',
'/path/to/my/file02b.grp/text_data']
>>> import cottoncandy as cc
>>> browser = cc.get_browser('my_bucket_name',
ACCESS_KEY='FAKEACCESSKEYTEXT',
SECRET_KEY='FAKESECRETKEYTEXT',
endpoint_url='https://s3.amazonaws.com')
>>> browser.sweet_project.sub<TAB>
browser.sweet_project.sub01_awesome_analysis_DOT_grp
browser.sweet_project.sub02_awesome_analysis_DOT_grp
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp
<cottoncandy-group <bucket:my_bucket_name> (sub01_awesome_analysis.grp: 3 keys)>
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01
<cottoncandy-dataset <bucket:my_bucket_name [1.00MB:shape=(10000)]>
cottoncandy
can also use Google Drive as a back-end. This equires a client_secrets.json
file in your ~/.config/cottoncandy
folder and the pydrive package.
See the Google Drive setup instructions for more details.
>>> import cottoncandy as cc
>>> cci = cc.get_interface(backend='gdrive')
cottoncandy
provides a transparent encryption interface for AWS S3 and Google Drive. This requires the pycrypto
package. This is HIGHLY EXPERIMENTAL.
>>> import cottoncandy as cc
>>> cci = cc.get_encrypted_interface('my_bucket_name',
ACCESS_KEY='FAKEACCESSKEYTEXT',
SECRET_KEY='FAKESECRETKEYTEXT',
endpoint_url='https://s3.amazonaws.com')
Nunez-Elizalde AO, Zhang T, Huth AG, Gao JS, Slivkoff, Lescroart MD, Deniz F, McNeil C, Gibboni R, Popham SF, Rokem A, Oliver MD and Gallant JL. cottoncandy: scientific python package for easy cloud storage. Zenodo. 2017. http://doi.org/10.5281/zenodo.1034342