This repository contains all the data and notebooks for data cleaning and analysis needed to obtain the conclusions made regarding taxi trips in NYC. Traveling by taxi is one of the most popular modes of transportation in NYC, with over 80,000 trips in the month of July 2021 alone, according to the provided dataset. Because of this, we wanted to analyze some of this data to get a better understanding of fares and other aspects of taxi trips within NYC. The main notebook contains a summary of all our findings regarding fares vs. different metrics of time, tip analysis, a tip prediciton model, and fares and trips vs. location of taxi zones.
The data used in this analysis can be found at the following links:
- https://www.kaggle.com/datasets/anandaramg/taxi-trip-data-nyc
- https://data.cityofnewyork.us/Transportation/NYC-Taxi-Zones/d3c5-ddgc
taxi_clean.csv contains additional columns from the taxi_tripdata.csv:
day_of_week
: Day of the week the trip took place, with 0=Monday and 6=Sundayday_of_month
: Day of the month the trip took placehour_of_day
: Hour of the day the trip startedtrip_duration
: Duration of the entire trip in secondstotal_without_tip
: Total cost of the trip not including tipfare_per_mile
: Dollar cost of the trip per mile (not including tip cost)
For more information regarding the columns in the taxi trips dataset see: https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
To run the code in this repo, either use the provided Binder link (which will set up the environment necessary for analysis), or clone this git repository and do the following:
- Configure the environment by running
make env
and activate this environment by runningconda activate hw7env
. Doing this will also install the taxitools package. - If you'd like to manually install or uninstall the taxitools package, run
pip install .
andpip uninstall taxitools
respectively in the root directory of this repo. - Execute notebooks manually or run
make all
to run all the notebooks (excluding the taxi_clean.ipynb data cleaning notebook). Runningmake clean
will clean all the figures and tables produced by the notebooks.
Note: while there isn't a required order of execution for the analysis notebooks, all the figures from the analysis notebooks should exist before attempting to manually execute main.ipynb (All figures currently exist, but may be removed by calling make clean
).
The following are available make commands:
env
: Creates the environment and configures it by activating it and installing ipykernel into itremove-env
: Removes the configured environmentdata/taxi_clean.csv
: Create the cleaned taxi_tripdata.csvclean
: Remove all figures from the /figures and /tables directoryall
: Create all figures and tables from the notebooks
To run tests in the taxitools package, follow these steps:
- Ensure that the proper environment is setup by running
make env
. If an old version of the environment is currently setup, or you would like to reconfigure the environment, runconda activate
on a different environment and runmake remove-env
first. - After running
conda activate hw7env
, runconda install -c anaconda pytest
. - If the taxitools package is not yet installed, manually run
pip install .
inside of the root directory (hw07-group16). - Run
pytest taxitools
.