In this project you are going to build a data pipeline that processes the Green Taxi Trips data from the NYC Taxi Trip Dataset.
-
The first task is to write a script that uploads the data for the first three months of 2021 (January to March) directly to a GCS bucket.
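A minimal sketch of such an upload script, assuming the `requests` and `google-cloud-storage` packages, application-default credentials, and that the monthly files are the Parquet files the NYC TLC serves under the CloudFront prefix below (verify the URL before relying on it). The `raw/green/` object prefix and all function names are hypothetical. Each file is downloaded into memory and uploaded straight to the bucket, so nothing is written to local disk.

```python
from __future__ import annotations

from typing import Iterable

# Assumption: the NYC TLC serves monthly Green Taxi Parquet files under this prefix.
TLC_BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def month_keys(year: int, months: Iterable[int]) -> list[str]:
    """Return keys like '2021-01' for the requested months."""
    return [f"{year}-{month:02d}" for month in months]


def source_url(key: str) -> str:
    """Public download URL for one month of Green Taxi data (assumed naming)."""
    return f"{TLC_BASE_URL}/green_tripdata_{key}.parquet"


def blob_name(key: str) -> str:
    """Hypothetical object name inside the bucket."""
    return f"raw/green/green_tripdata_{key}.parquet"


def upload_month(bucket_name: str, key: str) -> str:
    """Download one month into memory and upload it directly to GCS.

    Returns the gs:// URI of the uploaded object.
    """
    # Imported here so the pure helpers above work without these packages.
    import requests
    from google.cloud import storage

    response = requests.get(source_url(key), timeout=120)
    response.raise_for_status()  # fail loudly on 403/404 instead of uploading an error page

    bucket = storage.Client().bucket(bucket_name)  # uses application-default credentials
    bucket.blob(blob_name(key)).upload_from_string(
        response.content, content_type="application/octet-stream"
    )
    return f"gs://{bucket_name}/{blob_name(key)}"


def upload_first_quarter(bucket_name: str, year: int = 2021) -> list[str]:
    """Upload January to March of the given year; returns the object URIs."""
    return [upload_month(bucket_name, key) for key in month_keys(year, range(1, 4))]
```

Calling `upload_first_quarter("my-taxi-bucket")` (a placeholder bucket name) would upload all three months. The monthly Green Taxi files should be small enough to hold in memory; for much larger files a streaming or resumable upload would be the safer choice.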
-
The second task is to write an ETL or ELT pipeline that reads the data from the GCS bucket, processes it, and calculates the revenue per day. The pipeline can run in Google Cloud or on your local machine.
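One way to structure the ETL variant is sketched below. It assumes the raw monthly files sit in the bucket as Parquet objects, that pandas with `pyarrow` and `gcsfs` (for `gs://` paths) is installed for the extract step, and that "revenue" means the sum of `total_amount` grouped by the date of `lpep_pickup_datetime`. The task does not define revenue, so `fare_amount` would be an equally defensible choice. Function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, Mapping


def extract(bucket_name: str, blob_names: Iterable[str]) -> list[dict]:
    """Read the raw monthly Parquet files from GCS into row dicts.

    Assumes pandas, pyarrow and gcsfs are installed so pandas can open gs:// paths.
    """
    import pandas as pd  # only the extract step depends on pandas

    frames = [
        pd.read_parquet(
            f"gs://{bucket_name}/{name}",
            columns=["lpep_pickup_datetime", "total_amount"],
        )
        for name in blob_names
    ]
    return pd.concat(frames, ignore_index=True).to_dict("records")


def transform(rows: Iterable[Mapping], start: date, end: date) -> dict[date, float]:
    """Sum total_amount per pickup date, keeping only dates within [start, end]."""
    totals: dict[date, float] = defaultdict(float)
    for row in rows:
        pickup: datetime = row["lpep_pickup_datetime"]
        day = pickup.date()
        if start <= day <= end:  # skip trips with out-of-range timestamps
            totals[day] += float(row["total_amount"])
    # Round once at the end to hide floating-point noise from the running sums.
    return {day: round(total, 2) for day, total in sorted(totals.items())}


def to_csv(revenue: Mapping[date, float]) -> str:
    """Serialise daily revenue as CSV text, ready to save locally or upload to GCS."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["pickup_date", "revenue"])
    for day, total in revenue.items():
        writer.writerow([day.isoformat(), f"{total:.2f}"])
    return buffer.getvalue()
```

The `start`/`end` filter exists because monthly TLC files can contain a few trips whose pickup timestamps fall outside the file's month; for this project the range would be `date(2021, 1, 1)` to `date(2021, 3, 31)`. In an ELT variant, the same aggregation would instead run inside the warehouse, for example as a `GROUP BY` over the loaded table in BigQuery.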
Bonus task, if you have time:
- Use Prefect for workflow orchestration.

In the README.md, answer the following questions:
- What are the steps you took to complete the project?
- What are the challenges you faced?
- What would you do differently if you had more time?
Submit your solution as a link to a GitHub repository. The repository should contain the scripts and a README.md file that answers the questions above.