Forked from: Khuyen Tran
Note: This template uses poetry. If you prefer using pip, go to the pip branch instead.
This repository is a template for data science projects. It follows the project structure I frequently use for my own data science work.
- Poetry: Dependency management - article
- Hydra: Manage configuration files - article
- pre-commit plugins: Automate code review and formatting - article
- DVC: Data version control - article
- pdoc: Automatically create API documentation for your project
.
├── config
│ ├── main.yaml # Main configuration file
│ ├── model # Configurations for training model
│ │ ├── model1.yaml # First variation of parameters to train model
│ │ └── model2.yaml # Second variation of parameters to train model
│ └── process # Configurations for processing data
│   ├── process1.yaml # First variation of parameters to process data
│   └── process2.yaml # Second variation of parameters to process data
├── data
│ ├── final # data after training the model
│ ├── processed # data after processing
│ ├── raw # raw data
│ └── raw.dvc # DVC file of data/raw
├── docs # documentation for your project
├── dvc.yaml # DVC pipeline
├── .flake8 # configuration for flake8 - a Python linting tool
├── .gitignore # ignore files that should not be committed to Git
├── Makefile # store useful commands to set up the environment
├── models # store models
├── notebooks # store notebooks
├── .pre-commit-config.yaml # configurations for pre-commit
├── pyproject.toml # dependencies for poetry
├── README.md # describe your project
├── src # store source code
│ ├── __init__.py # make src a Python package
│ ├── process.py # process data before training model
│ └── train_model.py # train model
└── tests # store tests
  ├── __init__.py # make tests a Python package
  ├── test_process.py # test functions for process.py
  └── test_train_model.py # test functions for train_model.py
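As a quick sanity check on the layout above, the following is a minimal sketch of a script that reports which of the expected paths are missing from a project directory. The `EXPECTED_PATHS` list and the `missing_paths` helper are hypothetical names introduced here for illustration and are not part of the template. The list covers only a subset of the tree, on the assumption that these files and folders exist in a freshly generated project.

```python
from pathlib import Path

# Hypothetical subset of the tree above; adjust it to match your own project.
EXPECTED_PATHS = [
    "config/main.yaml",
    "config/model",
    "config/process",
    "src/__init__.py",
    "src/process.py",
    "src/train_model.py",
    "tests/__init__.py",
    "tests/test_process.py",
    "tests/test_train_model.py",
    "Makefile",
    "pyproject.toml",
]


def missing_paths(project_root):
    """Return the expected relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_paths(".")` from the project root returns an empty list when every listed path is present.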
Install Cookiecutter:
pip install cookiecutter
Create a project based on the template:
cookiecutter https://github.com/khuyentran1401/data-science-template
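If you would rather generate the project from a Python script than from the command line, Cookiecutter also exposes a Python entry point. The sketch below assumes Cookiecutter is installed. `create_project` is a hypothetical wrapper name, and the `output_dir` and `no_input` defaults are illustrative choices rather than anything the template prescribes.

```python
TEMPLATE_URL = "https://github.com/khuyentran1401/data-science-template"


def create_project(output_dir=".", no_input=False):
    """Generate a project from the template and return the path Cookiecutter reports."""
    # Imported lazily so this file can be loaded even if cookiecutter is absent.
    from cookiecutter.main import cookiecutter

    # no_input=True skips the interactive prompts and uses the template defaults.
    return cookiecutter(TEMPLATE_URL, output_dir=output_dir, no_input=no_input)
```

Calling `create_project()` prompts for the same values as the command above and needs network access to fetch the template.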
Find a detailed explanation of this template here.