
haonen/Chicago-business-prediction


Chicago-business-prediction

Implement a machine learning pipeline to prioritize Chicago business licenses that are likely to die within two years.

The full project report can be found here.
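The README does not state exactly how "die within two years" is turned into a label. As a hedged illustration only, here is one possible rule: a business counts as dead if none of its license periods covers the date two years after a reference date. The `periods` structure and the `dies_within` helper are hypothetical and not taken from the repo.

```python
from datetime import date, timedelta


def dies_within(periods, as_of, years=2):
    """Return 1 if no license period covers the date `years` after `as_of`, else 0.

    Hypothetical labeling rule: `periods` is a list of (start, expiration)
    date pairs for one business. The horizon uses 365-day years, so it
    ignores leap days.
    """
    target = as_of + timedelta(days=365 * years)
    still_licensed = any(start <= target <= end for start, end in periods)
    return 0 if still_licensed else 1


# Made-up license histories for two businesses.
renewed = [(date(2015, 1, 1), date(2017, 1, 1)), (date(2017, 1, 1), date(2019, 1, 1))]
lapsed = [(date(2015, 1, 1), date(2017, 1, 1))]
print(dies_within(renewed, as_of=date(2016, 6, 1)))  # 0: licensed on 2018-06-01
print(dies_within(lapsed, as_of=date(2016, 6, 1)))   # 1: no license on 2018-06-01
```

The real project may measure the horizon from a different reference date or use other fields (for example, license status codes) to decide whether a business is still alive.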

Pipelines

  • cofigs: a folder containing configuration files for different combinations of features. We use them to pass all the parameters the pipeline needs.
  • data: contains a shell script to download the cleaned full dataset.
  • data_collector: a folder containing all the code we use to collect and clean the data.
  • output: a folder where the pipeline saves its results, including performance tables, precision-recall curve plots, and ROC curve plots with AUC.
  • pipeline: contains the modules for imputation, evaluation, discretization, creating dummy variables, and scaling.
  • tests: test code for our pipeline.
  • main.py: the main script to run the models and get results.
  • transformer.py: the code that preprocesses the dataset before modeling.
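The pipeline module is described as covering imputation, discretization, dummy variables, and scaling. Below is a minimal, standard-library-only sketch of those four steps. The helper names (`impute_median`, `discretize`, `get_dummies`, `min_max_scale`) and the example data are hypothetical; the repo's own modules may work differently.

```python
from bisect import bisect_right
from statistics import median


def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def discretize(values, edges):
    """Return the bin index of each value; `edges` are sorted interior cut points.

    A value equal to an edge goes into the bin to the right of that edge.
    """
    return [bisect_right(edges, v) for v in values]


def get_dummies(labels):
    """One-hot encode categorical labels; columns follow sorted label order."""
    levels = sorted(set(labels))
    rows = [[int(label == level) for level in levels] for label in labels]
    return levels, rows


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example with made-up numbers (years a business has held a license).
years = impute_median([2, None, 8, 4])        # [2, 4, 8, 4]
bins = discretize(years, edges=[3, 6])        # [0, 1, 2, 1]
levels, dummies = get_dummies(["retail", "food", "retail"])
scaled = min_max_scale(years)                 # [0.0, 0.333..., 1.0, 0.333...]
```

In a real train/test split, the median, bin edges, category levels, and min/max should be learned from the training set only and then reused on the test set, so that no information from the test data leaks into training.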

Getting Started

Get the full dataset

cd data
sh get_fullfiles.sh

Prerequisites

All package requirements are listed in environment.yml.

To create the environment, run:

conda env create -f environment.yml

To activate the environment, run:

conda activate myenv

Installing

python setup.py install

Running the tests

py.test

Running the pipeline

python main.py --config ./cofigs/acs_geo.yml

The config files in cofigs/ define different combinations of features drawn from the American Community Survey (ACS), reported 311 service requests, reported crimes, and business licenses. Pass the one you want to main.py with --config.
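main.py receives the config path through `--config`. As a hedged sketch (the real main.py may parse arguments differently, and the `feature_groups` key and column names below are hypothetical), an argparse entry point that selects feature groups from a loaded config could look like this:

```python
import argparse

# Hypothetical mapping from feature group to column names; the real configs
# in cofigs/ may use different keys and columns.
FEATURE_GROUPS = {
    "acs": ["median_income", "pct_poverty"],
    "311": ["n_311_requests"],
    "crime": ["n_crimes"],
    "license": ["license_age", "n_prior_licenses"],
}


def parse_args(argv=None):
    """Parse the --config option, mirroring `python main.py --config ...`."""
    parser = argparse.ArgumentParser(description="Run the business survival models.")
    parser.add_argument("--config", required=True, help="path to a YAML file in cofigs/")
    return parser.parse_args(argv)


def select_features(config):
    """Return the columns for every feature group enabled in `config`."""
    columns = []
    for group in config["feature_groups"]:
        if group not in FEATURE_GROUPS:
            raise ValueError(f"unknown feature group: {group!r}")
        columns.extend(FEATURE_GROUPS[group])
    return columns


args = parse_args(["--config", "./cofigs/acs_geo.yml"])
# In practice the file at args.config would be loaded, e.g. with PyYAML's
# yaml.safe_load; a plain dict stands in for the parsed YAML here.
config = {"feature_groups": ["acs", "license"]}
columns = select_features(config)
```

Keeping the feature choice in the config file, rather than in code, is what lets the same main.py run every feature combination.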

Getting results

The results of the pipeline are saved in the output folder.

Under the performance folder, there are CSV files recording the performance of all models.

Under the pr folder, there are precision-recall graphs.

Under the roc folder, there are ROC curve graphs.
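Because the goal is to prioritize licenses, one reasonable way to summarize a model is precision and recall among the top k% of licenses ranked by predicted risk, alongside ROC AUC. The sketch below shows how such numbers could be computed and written to a CSV. The function names, the 40% cutoff, and the CSV column names are assumptions for illustration, not necessarily what the repo's evaluation module reports.

```python
import csv
import io


def precision_recall_at_k(y_true, scores, k_pct):
    """Precision and recall when the top k_pct% highest-scored licenses are flagged.

    Ties in score keep their original order because sorted() is stable.
    """
    n_flag = int(len(scores) * k_pct / 100)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flag]
    true_pos = sum(y_true[i] for i in flagged)
    positives = sum(y_true)
    precision = true_pos / n_flag if n_flag else 0.0
    recall = true_pos / positives if positives else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC: chance a random positive scores above a random negative (ties count half).

    Quadratic pairwise version for illustration; assumes both classes are present.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def write_performance(rows, fileobj):
    """Write one CSV row per model, similar in spirit to the performance folder."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


y = [1, 0, 1, 0, 0]            # made-up labels: 1 = business died within two years
s = [0.9, 0.8, 0.7, 0.2, 0.1]  # made-up model scores
p, r = precision_recall_at_k(y, s, k_pct=40)  # (0.5, 0.5)
auc = roc_auc(y, s)                           # 5/6
buf = io.StringIO()
write_performance([{"model": "example", "precision_at_40": p, "recall_at_40": r, "auc": auc}], buf)
```

For real use, scikit-learn's `sklearn.metrics.roc_auc_score` and `sklearn.metrics.precision_recall_curve` compute these quantities efficiently; the pairwise AUC above is quadratic in the number of samples.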

Additional materials

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

This project is the final project for the Machine Learning for Public Policy course at the University of Chicago.

About

Final Project for CAPP30254 (Machine Learning in Python3)
