Note
This project has been moved to the Dagster monorepo. The updated version can be found here.
Orchestrate your flexible compute workloads with the combined forces of Dagster and Modal.
In this example project, we show how to write a pipeline that automatically detects newly published podcasts, transcribes them on GPUs, and notifies you with a summary, so that you can focus on the things that truly matter (not listening to people talk into fancy microphones).
Install the project dependencies:
pip install -e ".[dev]"
Run Dagster:
dagster dev
Open http://localhost:3000 in your browser.
Modal
- Modal Docs
- Modal Docs: Parallel podcast transcription using Whisper
- Modal Docs: Mounting Cloudflare R2 Buckets
Dagster
- Dagster Docs
- Dagster Docs: Pipes Subprocess Reference
- Dagster Docs: OpenAI Integration
- Dagster Docs: AWS Integration
OpenAI
Miscellaneous
Podcasts are sourced from RSS feeds; a sample of feeds is listed in the table below:
Title | RSS Feed |
---|---|
Practical AI: Machine Learning, Data Science, LLM | RSS Feed |
The Data Exchange with Ben Lorica | RSS Feed |
Hub & Spoken: Data | RSS Feed |
Making Data Simple | RSS Feed |
The Data Chief | RSS Feed |
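
To illustrate how newly published episodes might be detected from feeds like these, here is a minimal sketch using only the Python standard library. It is not the project's actual implementation: the names `Episode`, `parse_episodes`, `new_episodes`, and `SAMPLE_FEED` are hypothetical, the feed content is made up for illustration, and it assumes a plain RSS 2.0 layout in which each episode is an `<item>` with an `<enclosure>` pointing at the audio file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up feed for illustration only; real feeds would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Text-only announcement</title>
      <guid>post-3</guid>
    </item>
  </channel>
</rss>"""


@dataclass(frozen=True)
class Episode:
    guid: str
    title: str
    audio_url: str


def parse_episodes(rss_xml: str) -> list[Episode]:
    """Return every <item> in the feed that carries an audio enclosure."""
    root = ET.fromstring(rss_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        if not audio_url:
            continue  # no audio file attached, so nothing to transcribe
        # Fall back to the audio URL when the feed omits <guid>.
        guid = (item.findtext("guid") or audio_url).strip()
        title = (item.findtext("title") or "").strip()
        episodes.append(Episode(guid=guid, title=title, audio_url=audio_url))
    return episodes


def new_episodes(episodes: list[Episode], seen_guids: set[str]) -> list[Episode]:
    """Keep only the episodes whose GUID has not been processed before."""
    return [episode for episode in episodes if episode.guid not in seen_guids]


if __name__ == "__main__":
    already_processed = {"ep-1"}  # assumed to be persisted between runs
    for episode in new_episodes(parse_episodes(SAMPLE_FEED), already_processed):
        print(episode.title, episode.audio_url)
```

In a pipeline, a check like this would typically run on a schedule, with the set of seen GUIDs stored somewhere durable so that each episode is transcribed only once.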