- Data comes from multiple separate sources, such as databases, CSV files, and JSON files.
- Constraints for each problem will be specifically defined in the project description section.
An edu-tech platform called "pinter-skuy" provides online courses facilitated by professional mentors, and anyone can enroll in them. As the business gains momentum, management wants to monitor and evaluate these online courses.
Therefore, the information currently stored in different sources is to be consolidated into a single source of truth for subsequent analysis.
Clone the repository and start the services:

git clone https://github.com/rifa8/capstone-project-with-dynamic-dag
docker compose up -d

Then open localhost:8080 to access Airflow (username: airflow, password: airflow).
Next, set up connections in Airflow. Go to Admin >> Connections in the Airflow UI, then add the connections. This project uses two: to_bq for connecting to BigQuery, and pg_conn for connecting to the PostgreSQL database.
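As an alternative to the UI, the same connections can be created with the Airflow CLI. This is only a sketch: the host, credentials, schema, and key-file path below are assumptions, and the exact extra keys for the Google connection depend on your Google provider version, so adjust them to your environment.

```shell
# Hypothetical Postgres credentials -- replace with your own.
airflow connections add 'pg_conn' \
    --conn-type 'postgres' \
    --conn-host 'localhost' \
    --conn-schema 'pinter_skuy' \
    --conn-login 'postgres' \
    --conn-password 'postgres' \
    --conn-port 5432

# BigQuery connection; the key-file path is a placeholder.
airflow connections add 'to_bq' \
    --conn-type 'google_cloud_platform' \
    --conn-extra '{"extra__google_cloud_platform__key_path": "/path/to/key.json"}'
```

Run these inside the Airflow container (e.g. via docker compose exec) so the CLI talks to the same metadata database as the UI.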
Then run the DAGs.

TIP: Let the DAGs run on their schedule. Do not trigger them manually, so that the ExternalTaskSensor can activate automatically.

First, activate the DAG dag_etl_to_dwh and wait until its status is success. After that, activate the DAG dag_etl_to_datamart; its ExternalTaskSensor will run automatically because the task in dag_etl_to_dwh has already succeeded, as specified in the dag_etl_to_datamart script with allowed_states=['success'].
task_wait_ext_task = ExternalTaskSensor(
    # Unique task id derived from the upstream DAG/task being waited on
    task_id=f"wait_{ext_task_depen['dag_id']}_{ext_task_depen['task_id']}",
    external_dag_id=ext_task_depen['dag_id'],    # upstream DAG to wait for
    external_task_id=ext_task_depen['task_id'],  # upstream task to wait for
    allowed_states=['success'],                  # proceed only once it has succeeded
    # Offset between the two DAGs' schedules, so the sensor checks the right run
    execution_delta=timedelta(minutes=ext_task_depen['minutes_delta'])
)
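To see how the sensor's parameters are derived, here is a minimal sketch using a hypothetical ext_task_depen entry. The dict's exact contents come from the project's own configuration, so the task name and delta below are assumptions for illustration:

```python
from datetime import timedelta

# Hypothetical dependency entry; the real values come from the project config.
ext_task_depen = {
    "dag_id": "dag_etl_to_dwh",  # upstream DAG to wait for
    "task_id": "load_to_dwh",    # upstream task name (assumed)
    "minutes_delta": 30,         # datamart DAG runs 30 min after the DWH DAG
}

# Same expressions as in the ExternalTaskSensor arguments
task_id = f"wait_{ext_task_depen['dag_id']}_{ext_task_depen['task_id']}"
execution_delta = timedelta(minutes=ext_task_depen["minutes_delta"])

print(task_id)          # wait_dag_etl_to_dwh_load_to_dwh
print(execution_delta)  # 0:30:00
```

The execution_delta tells the sensor which upstream run to check: the one whose logical date is that much earlier than the sensor's own, which is why the two DAGs' schedules must be offset by exactly this amount.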
Finally, check the tables in BigQuery.
To view the visualization results (dashboard), you can also open the following link: Looker Studio