This repository was archived by the owner on Oct 2, 2023. It is now read-only.

NYCPlanning/db-data-library

NOTE: this repo has been archived. This code now lives in NYCPlanning Data Engineering's primary repo

db-data-library

Archive datasets to S3 via CLI

library archive --help

library archive --name dcp_boroboundaries --version 22c --output-format csv

library archive --name dcp_commercialoverlay --s3 --latest

library archive --name sca_e_pct --version 20230425 --output-format postgres --postgres-url $RECIPE_ENGINE

Usage

Because GDAL dependencies are difficult to install, we recommend running the library CLI commands via our Docker image, nycplanning/library:ubuntu-latest.

Method A: Run a single command

If you have environment variables stored in a .env file:

docker run --rm --env-file .env \
    nycplanning/library:ubuntu-latest < library ... >
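
Before starting the container, it can help to confirm that the .env file defines the variables the CLI is expected to read. The following is a minimal Python sketch and not part of this repo: `missing_env_keys` is a hypothetical helper, and it assumes the four AWS_* variables that this README passes in the explicit `docker run -e ...` form are the ones required. It follows the `KEY=value` and `#` comment conventions of a Docker env file.

```python
from __future__ import annotations

# Assumption: the variables the container needs are the four that this
# README passes explicitly with `docker run -e ...`.
REQUIRED_KEYS = (
    "AWS_S3_ENDPOINT",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_S3_BUCKET",
)


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that env_text does not define (hypothetical helper)."""
    defined = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, _value = line.partition("=")
        # Only KEY=value lines count as definitions in this sketch.
        if sep:
            defined.add(key.strip())
    return [key for key in required if key not in defined]


# Example with placeholder values; a real check would read the .env file.
sample = "AWS_S3_ENDPOINT=https://example.invalid\nAWS_S3_BUCKET=my-bucket\n"
print(missing_env_keys(sample))
```

With the sample text above, the two credential variables are reported as missing. To check a real file, pass the contents of .env (for example via `pathlib.Path(".env").read_text()`).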

Otherwise, use docker run with explicit environment variables:

docker run --rm \
    -e AWS_S3_ENDPOINT=< endpoint > \
    -e AWS_SECRET_ACCESS_KEY=< secret access key > \
    -e AWS_ACCESS_KEY_ID=< access key id > \
    -e AWS_S3_BUCKET=< bucket name > \
    nycplanning/library:ubuntu-latest < library ... >

Here, library ... can be any of the library commands (e.g. library archive --name dcp_commercialoverlay -s -c).
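
For scripted use, the explicit form of Method A can be assembled programmatically. The sketch below is an illustration, not code from this repo: `docker_run_command` is a hypothetical helper, and it assumes the four AWS_* variables shown above are the full set to forward. It only builds the argument list; nothing is executed.

```python
import os
import shlex

IMAGE = "nycplanning/library:ubuntu-latest"

# Assumption: these four variables (from the README) are all that is needed.
ENV_KEYS = (
    "AWS_S3_ENDPOINT",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_S3_BUCKET",
)


def docker_run_command(library_args, env=None):
    """Build the `docker run` argument list for a library command (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [key for key in ENV_KEYS if key not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    cmd = ["docker", "run", "--rm"]
    for key in ENV_KEYS:
        # Same `-e KEY=value` form as the explicit command in the README.
        cmd += ["-e", f"{key}={env[key]}"]
    cmd.append(IMAGE)
    cmd += list(library_args)
    return cmd


# Placeholder values for illustration only.
example_env = {key: "placeholder" for key in ENV_KEYS}
cmd = docker_run_command(
    ["library", "archive", "--name", "dcp_commercialoverlay", "--s3", "--latest"],
    env=example_env,
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to run the container. Building a list rather than a shell string avoids quoting problems when values contain spaces or special characters.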

Method B: Use a dev container in VS Code

  1. Open the repo in a Remote Window in VS Code (either when prompted or via the green icon at the bottom left)

  2. Start a poetry shell via poetry shell

  3. Install python packages via poetry install

  4. Run library commands

Method C: Use GitHub Actions

💡 Note: This method will always push to S3

  1. Navigate to the Actions section of the repo

  2. Select Update a Single Dataset

  3. Within Run workflow, enter the relevant inputs and click Run workflow

Dev Instructions

  1. Make sure you have GDAL installed (we are using version 3.2.1+dfsg-1+b1):
sudo apt install -y gdal-bin libgdal-dev python3-gdal
  2. Then install poetry:
curl -sSL https://install.python-poetry.org | python3 -
  3. Use poetry to install dependencies: poetry install
  4. Install pre-commit: poetry run pre-commit install
  5. Check out what's available via the CLI: poetry run library --help
  6. To add/update documentation, run poetry run pdoc -o docs --html library

Testing

To test all functions within a script: poetry run pytest tests/{test script}.py -s

To test a specific function: poetry run pytest tests/{test script}.py::{test name} -s

Note: the -s flag is optional. It disables pytest's output capturing so that print output (stdout) appears in the test output as the tests run; without it, pytest captures that output and shows it only for failing tests.
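
To make the two invocations concrete, here is a hypothetical test module. The file name tests/test_example.py and the functions in it are illustrative only and are not files from this repo.

```python
# tests/test_example.py  (hypothetical file, for illustration only)


def normalize_version(version: str) -> str:
    """Toy function under test: trim whitespace and lowercase a version tag."""
    return version.strip().lower()


def test_normalize_version():
    result = normalize_version(" 22C ")
    # Shown in the terminal when pytest runs with -s (or if the test fails).
    print(f"normalized: {result}")
    assert result == "22c"


def test_normalize_version_is_idempotent():
    once = normalize_version("22c")
    assert normalize_version(once) == once
```

With this file in place, poetry run pytest tests/test_example.py -s runs both tests and prints the "normalized: ..." line, while poetry run pytest tests/test_example.py::test_normalize_version -s runs only the first test.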
