We use Airflow to implement our ETL pipelines, and uv to manage dependencies and the virtual environment.
Below are the steps to create a virtual environment using uv:
- Create a virtual environment with dependencies installed:

  ```bash
  uv sync
  ```

  By default, uv sets up the virtual environment in `.venv`.
- After creating the virtual environment, activate it:

  ```bash
  source .venv/bin/activate
  ```
- When you're done working in the virtual environment, deactivate it with:

  ```bash
  deactivate
  ```
- For development or testing, run:

  ```bash
  cp .env.template .env.staging
  ```

  For production, run:

  ```bash
  cp .env.template .env.production
  ```

- Follow the instructions in `.env.<staging|production>` and fill in your secrets. If you are running the staging instance for development as a sandbox and do not need to access any specific third-party services, leaving `.env.staging` as-is should be fine.
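The `.env.<staging|production>` files are plain `KEY=VALUE` text. As a rough illustration, a minimal reader that flags unfilled secrets could look like the sketch below (the `load_env` helper and the key names are assumptions for this example, not part of the repo; in practice the values are consumed by the services directly):

```python
def load_env(path: str) -> dict[str, str]:
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments are skipped."""
    env: dict[str, str] = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Hypothetical usage: list keys in .env.staging that still need a value
# missing = [k for k, v in load_env(".env.staging").items() if not v]
```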
Contact the maintainer if you don't have these secrets.
⚠ WARNING: Please do not use the `.env` file for local development, as it might affect the production tables.
Set up authentication for GCP: https://googleapis.dev/python/google-api-core/latest/auth.html
- After running

  ```bash
  gcloud auth application-default login
  ```

  you will get a credentials JSON file located at `$HOME/.config/gcloud/application_default_credentials.json`. Run `export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"` if you have it.

- `service-account.json`: Please contact @david30907d via email or Discord. You do not need this JSON file if you are running the sandbox staging instance for development.
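Client libraries that follow Application Default Credentials check `GOOGLE_APPLICATION_CREDENTIALS` first and fall back to the gcloud-managed file. A small sketch of that resolution order (the helper name here is ours for illustration, not a Google API):

```python
import os
from pathlib import Path

# Default file written by `gcloud auth application-default login`
DEFAULT_ADC = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"

def resolve_credentials_path() -> Path:
    """Mimic ADC resolution: an explicit GOOGLE_APPLICATION_CREDENTIALS wins,
    otherwise fall back to the gcloud default location."""
    override = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    return Path(override) if override else DEFAULT_ADC
```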
If you are a developer 👨💻, please check the Contributing Guide.
If you are a maintainer 👨🔧, please check the Maintenance Guide.
For development/testing:

```bash
# Build the local dev/test image
make build-dev

# Start dev/test services
make deploy-dev

# Stop dev/test services
make down-dev
```
The difference between production and dev/test compose files is that the dev/test compose file uses a locally built image, while the production compose file uses the image from Docker Hub.
If you are an authorized maintainer, you can pull the image from the GCP Artifact Registry.
Your Docker client must be configured to use the GCP Artifact Registry:

```bash
gcloud auth configure-docker asia-east1-docker.pkg.dev
```
Then, pull the image:
```bash
docker pull asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:{tag}
```
There are several tags available:
- `cache`: cache the image for faster deployment
- `test`: for testing purposes, including the test dependencies
- `staging`: when pushing to the staging environment
- `latest`: when pushing to the production environment
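The full image reference is simply the registry path from the `docker pull` command above plus one of these tags. A tiny helper (hypothetical, for illustration only) that guards against tag typos:

```python
# Registry path from the `docker pull` command above
REGISTRY = "asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl"

# Tags published for this image
KNOWN_TAGS = {"cache", "test", "staging", "latest"}

def image_ref(tag: str) -> str:
    """Build the full image reference, rejecting unknown tags early."""
    if tag not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return f"{REGISTRY}:{tag}"

print(image_ref("staging"))
# asia-east1-docker.pkg.dev/pycontw-225217/data-team/pycon-etl:staging
```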
Please check the Production Deployment Guide.