feat: Add Dagster jobs to compute exchange rates (#29495)
1 parent bd37883, commit 487488d
Showing 14 changed files with 811 additions and 125 deletions.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,99 @@ | ||
# Dagster | ||
# PostHog Dagster DAGs | ||
|
||
## Running locally | ||
This directory contains [Dagster](https://dagster.io/) data pipelines (DAGs) for PostHog. Dagster is a data orchestration framework that allows us to define, schedule, and monitor data workflows. | ||
|
||
You'll need to set DAGSTER_HOME | ||
## What is Dagster? | ||
|
||
Dagster is an open-source data orchestration tool designed to help you define and execute data pipelines. Key concepts include: | ||
|
||
- **Assets**: Data artifacts that your pipelines produce and consume (e.g., tables, files) | ||
- **Ops**: Individual units of computation (functions) | ||
- **Jobs**: Collections of ops that are executed together | ||
- **Resources**: Shared infrastructure and connections (e.g., database connections) | ||
- **Schedules**: Time-based triggers for jobs | ||
- **Sensors**: Event-based triggers for jobs | ||
|
||
## Project Structure | ||
|
||
- `definitions.py`: Main Dagster definition file that defines assets, jobs, schedules, sensors, and resources | ||
- `common.py`: Shared utilities and resources | ||
- Individual DAG files (e.g., `exchange_rate.py`, `deletes.py`, `person_overrides.py`) | ||
- `tests/`: Tests for the DAGs | ||
|
||
## Local Development | ||
|
||
### Environment Setup | ||
|
||
Dagster uses the `DAGSTER_HOME` environment variable to determine where to store instance configuration, logs, and other local artifacts. If it is not set, Dagster uses a temporary folder that is erased when you stop `dagster dev`. | ||
|
||
```bash | ||
# Set DAGSTER_HOME to a directory of your choice | ||
export DAGSTER_HOME=/path/to/your/dagster/home | ||
``` | ||
|
||
For consistency with the PostHog development environment, you might want to set this to a subdirectory within your project: | ||
|
||
```bash | ||
export DAGSTER_HOME=$(pwd)/.dagster_home | ||
``` | ||
|
||
You can add this to your shell profile if you want Dagster's local state to persist across sessions, or to your local `.env` file, which `dagster dev` detects automatically. | ||
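The same lookup can be mirrored in a small Python helper, for example in a local setup script. This is a minimal sketch and not part of the commit: `ensure_dagster_home` is a hypothetical name, and the `.dagster_home` fallback simply follows the suggestion above.

```python
import os
from pathlib import Path


def ensure_dagster_home(default_name: str = ".dagster_home") -> Path:
    """Return the DAGSTER_HOME directory, creating it if necessary.

    Hypothetical helper: falls back to <cwd>/<default_name> when the
    variable is unset, mirroring the shell snippet above.
    """
    raw = os.environ.get("DAGSTER_HOME")
    home = Path(raw) if raw else Path.cwd() / default_name
    # Resolve to an absolute path before creating the directory.
    home = home.expanduser().resolve()
    home.mkdir(parents=True, exist_ok=True)
    # Export the resolved path so child processes started later can see it.
    os.environ["DAGSTER_HOME"] = str(home)
    return home
```

Calling `ensure_dagster_home()` before launching `dagster dev` from a script keeps the directory stable across runs.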
|
||
### Running the Development Server | ||
|
||
To run the Dagster development server locally: | ||
|
||
```bash | ||
# Important: Set DEBUG=1 when running locally to use local resources | ||
DEBUG=1 dagster dev | ||
``` | ||
|
||
Setting `DEBUG=1` is critical: without it, the development server will not run properly on your machine. | ||
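This README does not show how `DEBUG` is consumed. As a rough illustration only, a module could branch on the variable along these lines; `is_debug`, `pick_clickhouse_host`, and both host names are hypothetical and are not PostHog's actual code.

```python
import os


def is_debug() -> bool:
    """Treat DEBUG=1 (or true/yes) as a request for local resources."""
    return os.environ.get("DEBUG", "").strip().lower() in {"1", "true", "yes"}


def pick_clickhouse_host() -> str:
    # Placeholder host names, chosen only for this example.
    return "localhost" if is_debug() else "clickhouse.internal.example"
```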
|
||
The Dagster UI will be available at http://localhost:3000 by default, where you can: | ||
|
||
- Browse assets, jobs, and schedules | ||
- Manually trigger job runs | ||
- View execution logs and status | ||
- Debug pipeline issues | ||
|
||
## Adding New DAGs | ||
|
||
When adding a new DAG: | ||
|
||
1. Create a new Python file for your DAG | ||
2. Define your assets, ops, and jobs | ||
3. Import and register them in `definitions.py` | ||
4. Add appropriate tests in the `tests/` directory | ||
|
||
## Running Tests | ||
|
||
Tests are implemented using pytest. The following command will run all DAG tests: | ||
|
||
Easiest is to just start jobs from your cli | ||
```bash | ||
dagster job execute -m dags.export_query_logs_to_s3 --config dags/query_log_example.yaml | ||
# From the project root | ||
pytest dags/ | ||
``` | ||
|
||
You can also run the interface | ||
To run a specific test file: | ||
|
||
```bash | ||
dagster dev | ||
pytest dags/tests/test_exchange_rate.py | ||
``` | ||
|
||
By default this will run on http://127.0.0.1:3000/ | ||
To run a specific test: | ||
|
||
```bash | ||
pytest dags/tests/test_exchange_rate.py::test_name | ||
``` | ||
|
||
Add `-v` for verbose output: | ||
|
||
```bash | ||
pytest -v dags/tests/test_exchange_rate.py | ||
``` | ||
|
||
## Additional Resources | ||
|
||
- [Dagster Documentation](https://docs.dagster.io/) | ||
- [PostHog Documentation](https://posthog.com/docs) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,24 @@ | ||
from dagster import ( | ||
Config, | ||
MaterializeResult, | ||
asset, | ||
) | ||
import dagster | ||
|
||
from posthog.clickhouse.client import sync_execute # noqa | ||
|
||
|
||
class ClickHouseConfig(Config): | ||
class ClickHouseConfig(dagster.Config): | ||
result_path: str = "/tmp/clickhouse_version.txt" | ||
|
||
|
||
@asset | ||
def get_clickhouse_version(config: ClickHouseConfig) -> MaterializeResult: | ||
@dagster.asset | ||
def get_clickhouse_version(config: ClickHouseConfig) -> dagster.MaterializeResult: | ||
version = sync_execute("SELECT version()")[0][0] | ||
with open(config.result_path, "w") as f: | ||
f.write(version) | ||
return MaterializeResult(metadata={"version": version}) | ||
|
||
return dagster.MaterializeResult(metadata={"version": version}) | ||
|
||
@asset(deps=[get_clickhouse_version]) | ||
|
||
@dagster.asset(deps=[get_clickhouse_version]) | ||
def print_clickhouse_version(config: ClickHouseConfig): | ||
with open(config.result_path) as f: | ||
print(f.read()) # noqa | ||
return MaterializeResult(metadata={"version": config.result_path}) | ||
|
||
return dagster.MaterializeResult(metadata={"version": config.result_path}) |
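The asset logic above can be exercised without a ClickHouse server if the query function is passed in as a parameter. The following is a minimal sketch under that assumption; `write_version`, `fake_execute`, and `test_write_version` are hypothetical names that do not appear in the commit, and the version string is made up.

```python
import tempfile
from pathlib import Path
from typing import Callable


def write_version(execute: Callable[[str], list], result_path: str) -> str:
    """Run the version query and persist the result, as the asset body does."""
    version = execute("SELECT version()")[0][0]
    Path(result_path).write_text(version)
    return version


def fake_execute(query: str) -> list:
    # Stand-in for sync_execute: one row with one column.
    return [("24.8.1",)]


def test_write_version() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "clickhouse_version.txt")
        assert write_version(fake_execute, path) == "24.8.1"
        assert Path(path).read_text() == "24.8.1"
```

Keeping the file-writing logic in a plain function like this lets pytest cover it directly, while the Dagster asset stays a thin wrapper.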