Welcome to cottoncandy!

sugar for s3

https://gallantlab.github.io/cottoncandy

What is cottoncandy?

A Python scientific library for storing and accessing numpy array data on S3. It uploads arrays straight from memory and downloads them straight into memory, so you don't have to download an array to disk and then load it from disk into your Python session.
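As a rough illustration of what "directly into memory" means, the sketch below round-trips an array through an in-memory byte buffer using only numpy and the standard library. It is not cottoncandy's implementation; the helper names `array_to_bytes` and `bytes_to_array` are hypothetical and no S3 call is made.

```python
import io

import numpy as np


def array_to_bytes(arr):
    """Serialize an array to .npy-formatted bytes without touching disk."""
    buffer = io.BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(payload):
    """Rebuild an array from .npy-formatted bytes held in memory."""
    return np.load(io.BytesIO(payload), allow_pickle=False)


arr = np.arange(12, dtype=np.float64).reshape(3, 4)
payload = array_to_bytes(arr)        # bytes that an upload could send
restored = bytes_to_array(payload)   # array that a download could return
assert np.array_equal(arr, restored)
```

The byte payload stands in for the body of an S3 object; at no point is a file written to the local file system.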

This library relies heavily on boto3.

Installation

Clone the repo from GitHub and do the usual Python install from the command line:

$ git clone https://github.com/gallantlab/cottoncandy.git
$ cd cottoncandy
$ sudo python setup.py install

A configuration file will be saved under ~/.config/cottoncandy/options.cfg. Upon installation cottoncandy will try to find your AWS keys and store them in this file. See the default file for more configuration options.

Object and bucket permissions are set to authenticated-read by default. If you wish to keep all your objects private, modify the configuration file and set default_acl = private. See AWS ACL overview for more information on S3 permissions.
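The change described above can also be checked programmatically. The sketch below parses a small in-memory excerpt with the standard-library `configparser`. The `[basic]` section name is an assumption made for illustration; check your own options.cfg for the section that actually holds `default_acl`.

```python
import configparser

# Hypothetical excerpt of options.cfg; the [basic] section name is an assumption.
EXAMPLE_CFG = """
[basic]
default_acl = authenticated-read
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_CFG)

# Switch the default ACL so that newly created objects are private.
config["basic"]["default_acl"] = "private"

print(config["basic"]["default_acl"])  # private
```

Editing the file by hand is usually simpler, and it preserves any comments in the file, which `configparser` drops when it writes a file back.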

Getting started

Set up the connection (the endpoint, access key, and secret key can instead be specified in the configuration file):

>>> import cottoncandy as cc
>>> cci = cc.get_interface('my_bucket',
                           ACCESS_KEY='FAKEACCESSKEYTEXT',
                           SECRET_KEY='FAKESECRETKEYTEXT',
                           endpoint_url='https://s3.amazonaws.com')

Storing numpy arrays

>>> import numpy as np
>>> arr = np.random.randn(100)
>>> s3_response = cci.upload_raw_array('myarray', arr)
>>> arr_down = cci.download_raw_array('myarray')
>>> assert np.allclose(arr, arr_down)

Storing dask arrays

>>> arr = np.random.randn(100,600,1000)
>>> s3_response = cci.upload_dask_array('test_dim', arr, axis=-1)
>>> dask_object = cci.download_dask_array('test_dim')
>>> dask_object
dask.array<array, shape=(100, 600, 1000), dtype=float64, chunksize=(100, 600, 100)>
>>> dask_slice = dask_object[..., :200]
>>> dask_slice
dask.array<getitem..., shape=(100, 600, 200), dtype=float64, chunksize=(100, 600, 100)>
>>> downloaded_data = np.asarray(dask_slice) # this downloads the array
>>> downloaded_data.shape
(100, 600, 200)
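Because dask evaluates lazily, only the chunks that overlap the requested slice need to be computed. The sketch below works through that arithmetic for the example above, assuming each chunk along the last axis is stored as a separate object. The function `chunks_for_slice` is hypothetical and is not part of cottoncandy or dask.

```python
def chunks_for_slice(axis_length, chunk_size, start, stop):
    """Return indices of the chunks along one axis that overlap [start, stop)."""
    if not 0 <= start < stop <= axis_length:
        raise ValueError("expected 0 <= start < stop <= axis_length")
    first = start // chunk_size        # chunk holding the first element
    last = (stop - 1) // chunk_size    # chunk holding the last element
    return list(range(first, last + 1))


# The last axis has 1000 elements in chunks of 100; the slice is [..., :200].
needed = chunks_for_slice(axis_length=1000, chunk_size=100, start=0, stop=200)
print(needed)  # [0, 1]
```

Under these assumptions, the slice touches 2 of the 10 chunks rather than the whole array.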

Command-line search

>>> cci.glob('/path/to/*/file01*.grp/image_data')
['/path/to/my/file01a.grp/image_data',
 '/path/to/my/file01b.grp/image_data',
 '/path/to/your/file01a.grp/image_data',
 '/path/to/your/file01b.grp/image_data']
>>> cci.glob('/path/to/my/file02*.grp/*')
['/path/to/my/file02a.grp/image_data',
 '/path/to/my/file02a.grp/text_data',
 '/path/to/my/file02b.grp/image_data',
 '/path/to/my/file02b.grp/text_data']
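To show how patterns like these select object names, here is a minimal sketch that matches keys one path segment at a time with the standard-library `fnmatch` module. It is not how `cci.glob` is implemented; the function `glob_keys` is hypothetical, and the rule that a wildcard never crosses a "/" is an assumption of this sketch.

```python
from fnmatch import fnmatchcase


def glob_keys(pattern, keys):
    """Return the keys matching a shell-style pattern, segment by segment."""
    pattern_parts = pattern.split("/")
    matches = []
    for key in keys:
        key_parts = key.split("/")
        # A wildcard never crosses a "/" here, so segment counts must agree.
        if len(key_parts) != len(pattern_parts):
            continue
        if all(fnmatchcase(k, p) for k, p in zip(key_parts, pattern_parts)):
            matches.append(key)
    return sorted(matches)


keys = [
    "/path/to/my/file01a.grp/image_data",
    "/path/to/my/file02a.grp/image_data",
    "/path/to/my/file02a.grp/text_data",
    "/path/to/your/file01a.grp/image_data",
]
print(glob_keys("/path/to/*/file01*.grp/image_data", keys))
```

With the four sample keys, the pattern selects the two `file01a.grp/image_data` entries, one under `my` and one under `your`.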

File system-like object browsing

>>> import cottoncandy as cc
>>> browser = cc.get_browser('my_bucket_name',
                             ACCESS_KEY='FAKEACCESSKEYTEXT',
                             SECRET_KEY='FAKESECRETKEYTEXT',
                             endpoint_url='https://s3.amazonaws.com')
>>> browser.sweet_project.sub<TAB>
browser.sweet_project.sub01_awesome_analysis_DOT_grp
browser.sweet_project.sub02_awesome_analysis_DOT_grp
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp
<cottoncandy-group <bucket:my_bucket_name> (sub01_awesome_analysis.grp: 3 keys)>
>>> browser.sweet_project.sub01_awesome_analysis_DOT_grp.result_model01
<cottoncandy-dataset <bucket:my_bucket_name [1.00MB:shape=(10000)]>
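In the session above, the object name `sub01_awesome_analysis.grp` appears as the attribute `sub01_awesome_analysis_DOT_grp`, since a "." is not valid inside a Python attribute name. The sketch below reproduces only that visible substitution. The helper names are hypothetical, and how the browser treats other special characters is not shown here.

```python
def key_to_attribute(name):
    """Make an object name usable as an attribute by replacing each '.'."""
    return name.replace(".", "_DOT_")


def attribute_to_key(attribute):
    """Recover the object name from its attribute form."""
    return attribute.replace("_DOT_", ".")


attribute = key_to_attribute("sub01_awesome_analysis.grp")
print(attribute)  # sub01_awesome_analysis_DOT_grp
assert attribute_to_key(attribute) == "sub01_awesome_analysis.grp"
```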

Google Drive backend (experimental)

cottoncandy can also use Google Drive as a back-end. This requires a client_secrets.json file in your ~/.config/cottoncandy folder and the pydrive package.

See the Google Drive setup instructions for more details.

>>> import cottoncandy as cc
>>> cci = cc.get_interface(backend='gdrive')

Encryption (highly experimental)

cottoncandy provides a transparent encryption interface for AWS S3 and Google Drive. This requires the pycrypto package. This is HIGHLY EXPERIMENTAL.

>>> import cottoncandy as cc
>>> cci = cc.get_encrypted_interface('my_bucket_name',
                                      ACCESS_KEY='FAKEACCESSKEYTEXT',
                                      SECRET_KEY='FAKESECRETKEYTEXT',
                                      endpoint_url='https://s3.amazonaws.com')

Cite as

Nunez-Elizalde AO, Zhang T, Huth AG, Gao JS, Slivkoff, Lescroart MD, Deniz F, McNeil C, Gibboni R, Popham SF, Rokem A, Oliver MD and Gallant JL. cottoncandy: scientific python package for easy cloud storage. Zenodo. 2017. http://doi.org/10.5281/zenodo.1034342
