Utilities for working with our data lake.
python3 -m pip install git+https://github.com/noverde/novlake#egg=novlake
To run the tests:
- Add a configuration entry for the user "test" to novlake-settings.yaml.
- Create the database user_test in Athena:
CREATE DATABASE user_test;
- Run pytest
pytest
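The "Create the database" step can also be scripted when setting up the test user repeatedly. The sketch below only builds the request; it assumes the query is sent with boto3's Athena client, which novlake may or may not use internally, and create_database_request is a hypothetical helper. It uses CREATE DATABASE IF NOT EXISTS instead of the plain statement so that re-running the setup is harmless.

```python
import re

# Athena database names may contain only lowercase letters, digits and underscores.
_SCHEMA_NAME = re.compile(r"[a-z0-9_]+")


def create_database_request(schema_name: str, output_location: str) -> dict:
    """Build keyword arguments for boto3's Athena start_query_execution.

    Hypothetical helper, not part of novlake. schema_name is the user's
    athena_schema_name; output_location is the user's athena_output.
    """
    if not _SCHEMA_NAME.fullmatch(schema_name):
        raise ValueError(f"invalid Athena schema name: {schema_name!r}")
    if not output_location.startswith("s3://"):
        raise ValueError(f"athena_output must be an s3:// URI: {output_location!r}")
    return {
        # IF NOT EXISTS makes re-running the setup harmless.
        "QueryString": f"CREATE DATABASE IF NOT EXISTS {schema_name}",
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

With boto3 installed and AWS credentials configured, the request can be sent with boto3.client("athena").start_query_execution(**create_database_request("user_test", "s3://novlake-test-data/athena_output/")). Athena runs the query asynchronously, so check its state with get_query_execution before running pytest.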
Create a .env file in your home directory with the following line, which points NOVLAKE_SETTINGS at the settings file stored in S3:
export NOVLAKE_SETTINGS=s3://<BUCKET_NAME>/novlake-settings.yaml
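NOVLAKE_SETTINGS holds an s3://<BUCKET_NAME>/<key> URI. The sketch below shows one way a script could check the variable and split it into bucket and key once it is in the environment (the README does not say whether novlake loads .env itself or expects the shell to source it). settings_location is a hypothetical helper; novlake's own loader may work differently.

```python
import os
from typing import Mapping, Optional, Tuple
from urllib.parse import urlparse


def settings_location(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Split NOVLAKE_SETTINGS (s3://<bucket>/<key>) into (bucket, key).

    Hypothetical helper: novlake's own settings loader may work differently.
    """
    env = os.environ if env is None else env
    uri = env.get("NOVLAKE_SETTINGS")
    if not uri:
        raise RuntimeError("NOVLAKE_SETTINGS is not set")
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    # Require the s3 scheme, a bucket name and a non-empty object key.
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"expected s3://<bucket>/<key>, got {uri!r}")
    return parsed.netloc, key
```

The resulting pair can be passed to boto3 as boto3.client("s3").get_object(Bucket=bucket, Key=key) to fetch the file.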
Then query the data lake from Python:
from novlake.lake import Lake
lake = Lake("camila")
lake.query("SELECT * FROM dumps.loans LIMIT 10")
novlake-settings.yaml must follow this schema:
documentation_home: ""
users:
  default:
    notebook_path: s3://sample-notebooks/default/
    athena_schema_name: user_default
    s3_repo: s3://sample-repo/user_default/
    athena_output: s3://aws-athena-query-results-sample/novlake/user_default/
  test:
    notebook_path: s3://novlake-test-data/notebooks/user_test/
    athena_schema_name: user_test
    s3_repo: s3://novlake-test-data/user_test/
    athena_output: s3://novlake-test-data/athena_output/
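A quick sanity check on a settings file can catch a missing key before Athena queries fail. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load) and treats every key in the schema above as required; validate_settings and user_config are hypothetical helpers, not part of novlake.

```python
REQUIRED_USER_KEYS = ("notebook_path", "athena_schema_name", "s3_repo", "athena_output")
S3_KEYS = ("notebook_path", "s3_repo", "athena_output")


def validate_settings(settings: dict) -> None:
    """Raise ValueError if settings does not match the documented schema."""
    if "documentation_home" not in settings:
        raise ValueError("missing top-level key: documentation_home")
    users = settings.get("users")
    if not isinstance(users, dict) or not users:
        raise ValueError("'users' must be a non-empty mapping")
    for name, cfg in users.items():
        if not isinstance(cfg, dict):
            raise ValueError(f"user {name!r} must map to a set of keys")
        missing = [k for k in REQUIRED_USER_KEYS if k not in cfg]
        if missing:
            raise ValueError(f"user {name!r} is missing: {', '.join(missing)}")
        # Paths and output locations are expected to be S3 URIs.
        for key in S3_KEYS:
            if not str(cfg[key]).startswith("s3://"):
                raise ValueError(f"user {name!r}: {key} must be an s3:// URI")


def user_config(settings: dict, user: str) -> dict:
    """Return the settings block for one user, e.g. "test" (hypothetical helper)."""
    validate_settings(settings)
    if user not in settings["users"]:
        raise KeyError(f"no entry for user {user!r} under 'users'")
    return settings["users"][user]
```

user_config assumes that a name such as "test" is looked up directly under users; whether novlake falls back to the default entry for an unknown user is not specified here, so the sketch simply raises KeyError.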