This project explores how Dagster, a modern data orchestrator, can be combined with R, a statistical programming language. It shows how business logic written in R can be integrated into the Dagster framework.
- Docker Integration: Execute R code in isolated environments using Docker container ops.
- Dagster Pipes: Run R scripts within a subprocess, leveraging Dagster's experimental Pipes feature.
- Reticulate Bridge: Utilize the {reticulate} R package to create a bridge between Python and R, enhancing interoperability.
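The subprocess idea behind the Pipes approach can be illustrated with a plain-Python sketch. It does not use Dagster Pipes and is not this project's implementation; it only shows the underlying pattern of launching an external interpreter (assumed here to be `Rscript` in real use), passing values through environment variables, and capturing what the script prints. The helper name `run_external_script` and the `GREETING` variable are hypothetical.

```python
import os
import subprocess
import sys


def run_external_script(command, extra_env=None):
    """Run an external command and return its stdout as a list of lines.

    Hypothetical helper: for an R script, `command` would look like
    ["Rscript", "path/to/script.R"].
    """
    # Merge extra variables into a copy of the current environment so the
    # child process sees them without mutating this process.
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(
        command,
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Stand-in for an R script so the sketch runs without R installed.
    demo = [sys.executable, "-c", "import os; print(os.environ['GREETING'])"]
    print(run_external_script(demo, {"GREETING": "hello from the child"}))
```

Dagster Pipes builds on the same pattern but adds a structured channel for logs, context, and metadata, which a bare subprocess call like this one does not provide.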
To begin exploring the integration of Dagster and R:
- Clone the Repository
  git clone https://github.com/philiporlando/dagster-and-r.git
- Navigate to Directory
  cd dagster-and-r
- Install Dependencies
  Using poetry, install the package and its dependencies:
  poetry install
- Set RETICULATE_PYTHON environment variable
  Determine the path to the python binary associated with this project's poetry environment:
  poetry run which python # /home/user/.cache/pypoetry/virtualenvs/dagster-and-r-kS5e8P_l-py3.10/bin/python
  Create a new .Renviron file at the root of the project and set the RETICULATE_PYTHON variable to this path.
- Launch the Dagster UI
  Start the Dagster web server:
  poetry run dagster dev
  Access the UI at http://localhost:3000 in your browser.
- Materialize Assets
  Click the "Materialize all" button in the top right of the UI. Each of the assets within this project should materialize without error.
- Inspect the Run
  Click the "Runs" tab and navigate to the latest run of the pipeline to access detailed information, including custom logs, asset checks, and environment variables being passed from an external R session.
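The RETICULATE_PYTHON step above can also be scripted. The following is a minimal sketch, not part of this project, that writes the variable into a .Renviron file while preserving any other entries already present. The function name `write_renviron` is hypothetical, and the default interpreter path assumes the script itself is started with `poetry run python` so that `sys.executable` points at the Poetry environment.

```python
import sys
from pathlib import Path


def write_renviron(project_root, python_path=sys.executable):
    """Write or update RETICULATE_PYTHON in <project_root>/.Renviron."""
    renviron = Path(project_root) / ".Renviron"
    entry = f"RETICULATE_PYTHON={python_path}"

    # Keep unrelated entries, dropping any previous RETICULATE_PYTHON line.
    kept = []
    if renviron.exists():
        kept = [
            line
            for line in renviron.read_text().splitlines()
            if not line.startswith("RETICULATE_PYTHON=")
        ]

    renviron.write_text("\n".join(kept + [entry]) + "\n")
    return renviron
```

Calling `write_renviron(".")` from the project root would then record the current interpreter. R reads .Renviron at startup, so an R session that is already running would need to be restarted to pick up the change.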
- Create Assets
  Begin writing assets in dagster_and_r/assets.py. They are automatically loaded into the Dagster code location.
  Then, start the Dagster UI web server:
  poetry run dagster dev -m dagster_and_r
  Open http://localhost:3000 with your browser to see the project.
- Pass logs between an external R session and Dagster
- Pass environment variables and context between an external R session and Dagster
- Asset checks defined in R
- In-memory data passing
- Pass markdown metadata between R and Dagster (e.g. head() of a data.frame)
- Execute external R code from a Docker container op.
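To make the markdown metadata item above more concrete, here is a small Python sketch of what "the head() of a data.frame as markdown" amounts to: the first few rows rendered as a Markdown table string. This is an illustration only; in this project the table is produced on the R side, and the function name `rows_to_markdown` and the sample rows are made up.

```python
def rows_to_markdown(columns, rows, n=6):
    """Render the first `n` rows as a Markdown table (similar to R's head())."""
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(str(value) for value in row) + " |"
        for row in rows[:n]
    ]
    return "\n".join([header, divider, *body])


if __name__ == "__main__":
    # Made-up sample rows, used only to show the output shape.
    columns = ["id", "value"]
    rows = [(1, "a"), (2, "b"), (3, "c")]
    # On the Dagster side, a string like this could be attached to an asset
    # as Markdown metadata, for example with dagster.MetadataValue.md(...).
    print(rows_to_markdown(columns, rows, n=2))
```

Dagster renders Markdown metadata in the UI, which is why a preview table is a convenient thing to send back from an external session.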
To add new Python packages to the project:
poetry add <pkg-name>
Unit tests help ensure code reliability and are still being developed. Run the existing tests using pytest:
poetry run pytest dagster_and_r_tests
Note: Unit tests are a work in progress.
To enable Schedules and Sensors, ensure the Dagster Daemon is running. The dagster dev command starts the daemon alongside the web server:
poetry run dagster dev
With the Daemon running, you can start using schedules and sensors for your jobs.
Contributions to enhance or expand the project are welcome! Feel free to fork the repository, make changes, and submit a pull request.