A PostGIS database of rain on snow events from AROSS stations, indexed by space and time.

aross-stations-db

Reads Automated Surface Observing System (ASOS) data from disk on the NSIDC archive to create a temporally and geospatially indexed database for quickly searching events.

Note

TODO: Is this data available publicly and documented? How is it produced? Links!

Usage

To get started quickly, install Docker.

Important

Instructions that follow presume the current working directory is the root of this repository unless otherwise stated.

Dev quickstart

‼️ Don't worry about this unless you intend to change the code!

View the contributing docs for more details!

Set up the development compose configuration to be automatically loaded:

ln -s compose.dev.yml compose.override.yml

Before starting the containers: dev environment setup

You will need local tooling like Nox and pre-commit to do development. Use whatever Python version management tool you prefer (Conda, VirtualEnv, PyEnv, ...) to create a virtual environment, then install this package and its dev dependencies:

pip install --editable ".[dev]"

Important

Do this step before starting the stack in dev mode, or you may encounter an error (in which case, see the troubleshooting section for an explanation!).

Debugging

You may wish to run the API process from an attached shell for interactive debugging. You can set up the relevant container to "sleep" in compose.dev.yml:

  api:
    <<: *dev-common
    entrypoint: "sleep"
    command: ["9999999"]
    # command: ["dev", "--host", "0.0.0.0", "./src/aross_stations_db/api"]

Then you can manually run the dev server interactively:

docker compose exec api fastapi dev --host 0.0.0.0 ./src/aross_stations_db/api

From here, you can interactively pause at any breakpoint() calls in the Python code.

Set envvars

Create a .env file or otherwise export the required envvars. If you use an .env file, it should look like this (feel free to change the password 😄):

POSTGRES_PASSWORD="supersecret"
AROSS_DB_CONNSTR="postgresql+psycopg://aross:${POSTGRES_PASSWORD}@db:5432/aross"
AROSS_DATA_BASEDIR="/path/to/aross-data-dir"

Important

$AROSS_DATA_BASEDIR should be Andy's data directory containing expected "metadata" and "events" subdirectories. TODO: Document how that data is created! How can the public access it?

Note

The connection string shown here is for connecting within the Docker network to a container with the hostname db.
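As a quick sanity check, the pieces of that connection string can be inspected with Python's standard library (a sketch: `urlsplit` doesn't understand SQLAlchemy's `dialect+driver` scheme, so swap in a plain scheme before parsing):

```python
from urllib.parse import urlsplit

connstr = "postgresql+psycopg://aross:supersecret@db:5432/aross"
# Replace the "postgresql+psycopg" prefix with a plain scheme so
# urlsplit parses the netloc (user, host, port) normally
url = urlsplit("postgresql://" + connstr.split("://", 1)[1])

print(url.username)            # aross
print(url.hostname, url.port)  # db 5432
print(url.path.lstrip("/"))    # aross (the database name)
```

If a container can't reach the database, checking that the hostname here matches the service name in `compose.yml` is a good first step.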

Start the application stack

The stack is configured within compose.yml and includes containers:

  • aross-stations-db: A PostGIS database for quickly storing and accessing event records.
  • aross-stations-admin: An Adminer container for inspecting the database in the browser.
  • aross-stations-api: An HTTP API for accessing data in the database.

Start the stack:

docker compose up --pull=always --detach

Important

If you've pulled the images before, you may need to fetch new ones! Bring down the running containers:

docker compose down --remove-orphans

...then run the "up" command again.

Inspect the database

You can use the included Adminer container for quick inspection. Navigate in your browser to http://localhost:80 and enter:

Field      Value
System     PostgreSQL
Server     aross-stations-db
Username   aross
Password   Whatever you set in the POSTGRES_PASSWORD environment variable
Database   aross

Note

At this point, the database is empty. We're just verifying we can connect. Continue to ingest next!

Run ingest

docker compose run ingest init  # Create empty tables (deleting any pre-existing ones)
docker compose run ingest load  # Load the tables from event files

From a fast disk, this should take under 2 minutes.

✨ Check out the data!

Now, you can use Adminer's SQL Query menu to select some data:

Example SQL query
select event.*
from event
join station on event.station_id = station.id
where
  ST_Within(
    station.location,
    ST_SetSRID(
      ST_GeomFromText('POLYGON ((-159.32130625160698 69.56469019745796, -159.32130625160698 68.08208920517862, -150.17196253090276 68.08208920517862, -150.17196253090276 69.56469019745796, -159.32130625160698 69.56469019745796))'),
      4326
    )
  )
  AND event.time_start > '2023-01-01'::date
  AND event.time_end < '2023-06-01'::date
;
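The polygon in the query above is just a closed bounding-box ring in WKT. If you find yourself writing these by hand, a small helper can build one from corner coordinates (a sketch; `bbox_to_wkt` is a hypothetical helper, not part of this project):

```python
def bbox_to_wkt(lon_min: float, lat_min: float, lon_max: float, lat_max: float) -> str:
    """Build a closed WKT polygon ring from bounding-box corners.

    WKT coordinate order is (longitude latitude), and the first point
    must be repeated at the end to close the ring.
    """
    corners = [
        (lon_min, lat_max),
        (lon_min, lat_min),
        (lon_max, lat_min),
        (lon_max, lat_max),
        (lon_min, lat_max),  # close the ring
    ]
    ring = ", ".join(f"{lon} {lat}" for lon, lat in corners)
    return f"POLYGON (({ring}))"

print(bbox_to_wkt(-159.32, 68.08, -150.17, 69.56))
# POLYGON ((-159.32 69.56, -159.32 68.08, -150.17 68.08, -150.17 69.56, -159.32 69.56))
```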

Or you can check out the API docs in your browser at http://localhost:8000/docs or submit an HTTP query:

Example HTTP query
http://localhost:8000/v1/?start=2023-01-01&end=2023-06-01&polygon=POLYGON%20((-159.32130625160698%2069.56469019745796,%20-159.32130625160698%2068.08208920517862,%20-150.17196253090276%2068.08208920517862,%20-150.17196253090276%2069.56469019745796,%20-159.32130625160698%2069.56469019745796))
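That URL-encoding is tedious to write by hand; the standard library can build the query string (a sketch, assuming the `/v1/` endpoint and the `start`/`end`/`polygon` parameters shown above — note `urlencode` percent-encodes spaces as `+`, which servers treat the same as `%20`):

```python
from urllib.parse import urlencode

params = {
    "start": "2023-01-01",
    "end": "2023-06-01",
    "polygon": (
        "POLYGON ((-159.32130625160698 69.56469019745796, "
        "-159.32130625160698 68.08208920517862, "
        "-150.17196253090276 68.08208920517862, "
        "-150.17196253090276 69.56469019745796, "
        "-159.32130625160698 69.56469019745796))"
    ),
}
# urlencode handles the percent-encoding of spaces, parens, and commas
url = "http://localhost:8000/v1/?" + urlencode(params)
print(url)
```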

View logs

In this example, we view and follow logs for the api service:

docker compose logs --follow api

You can replace api with any other service name, or omit it to view logs for all services.

View UI

For now, it's just a JupyterLab instance with a demo notebook. In your browser, navigate to http://localhost:8888. The password is the same as the database password you set earlier.

This UI will likely be replaced with something more robust. Who knows ;)

Shutdown

docker compose down

Cleanup

Database

Remove the _db/ directory to start over with a fresh database.

Containers and images

# Bring down containers, even if a service name has changed
docker compose down --remove-orphans
# Clean up all unused images aggressively
docker system prune -af

Troubleshooting

Permission denied errors on API startup

When this error occurs, the webserver still responds to queries, but hot-reloading doesn't work.

You may need to grant read access to the _data/ directory if you're running locally. The problem is that FastAPI's hot-reloading functionality in dev mode needs to watch the current directory for changes, and I don't know of a way to exclude this directory, which is usually not readable. The directory is likely owned by root (assuming Docker created it automatically), so you may need to use sudo:

sudo chmod -R ugo+r _data

API fails to start in dev with No module named 'aross_stations_db._version'

Unfortunately, this project doesn't work perfectly with Docker for development yet. This is because our project configuration (pyproject.toml) is set up to dynamically generate version numbers from source control at build-time:

[tool.hatch]
version.source = "vcs"
build.hooks.vcs.version-file = "src/aross_stations_db/_version.py"

If you freshly clone this project and immediately start up the Docker containers in dev mode, the dynamically generated version module, _version.py, won't exist yet in the source directory (because it is git-ignored). The source directory is then mounted into the Docker container, overwriting the pre-built source directory in the image that does include _version.py (well, it did until it was overwritten 😉).

It's very important to complete the initial setup step of creating a local environment and installing the package and its development dependencies if you plan to do development. This will also give you Nox and pre-commit for automating development tasks.
