
car-sales-etl



An ETL pipeline that extracts car sales data from a CSV file, transforms it, and loads it into a PostgreSQL database.

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Contact

About the project

Project

This project is about building a data pipeline to extract, transform, and load (ETL) data from a source to a target. The data source is a CSV file containing information about car sales. The target is a PostgreSQL database table.
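
Conceptually, the pipeline reduces to three steps. The sketch below is illustrative only; the repo's actual function and table names may differ, and transform here is a stand-in for the steps listed under Transformations:

    import pandas as pd
    from sqlalchemy import create_engine

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        """Stand-in for the full transformations described below."""
        return df.dropna()

    def run_pipeline(csv_path: str, db_url: str) -> None:
        """Extract the CSV, transform it, and load it into PostgreSQL."""
        df = pd.read_csv(csv_path)  # extract
        df = transform(df)  # transform
        engine = create_engine(db_url)  # e.g. postgresql://user:pass@host/db
        df.to_sql("car_sales", engine, if_exists="append", index=False)  # load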

PostgreSQL was preferred for its richer data handling with multiple data types, its transaction management, and its high scalability, which together give strong performance for CRUD operations.
The project follows an SQLAlchemy model scheme based on OOP concepts, which provides an excellent abstraction when working with multiple datasets in future processing. This high-level abstraction gives greater control over the data being inserted, since the table structure can be defined with multiple constraints and relationships.
For more advanced requirements, transactions, migrations, and other complex operations can be performed through the ORM, so managing large amounts of data won't be an issue.
The project also follows the PEP 8 style guide, checked with Pylint, and includes type hints for variables, function arguments, and more.
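
As an illustration of that model scheme, a minimal SQLAlchemy model for the target table might look like the following (CarSale and its columns are assumptions, not the repo's actual model):

    from sqlalchemy import Column, Date, Float, Integer
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class CarSale(Base):
        """Hypothetical car sales record loaded from the CSV source."""
        __tablename__ = "car_sales"

        id = Column(Integer, primary_key=True)
        car_model = Column(Integer, nullable=False)  # encoded categorical value
        sale_date = Column(Date, nullable=False)
        sale_year = Column(Integer, nullable=False)
        price = Column(Float, nullable=False)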

If performance is critical, consider using Python 3.11, whose zero-cost exception handling makes raising and re-raising exceptions faster.
Assets are also included, with future consideration for HTML and CSS files.
Testing could be done using unittest (to be implemented in a future release).

Transformations

  • Remove any rows with missing values.
  • Convert the date columns to a standard format.
  • Create a new column to store the year of the sale.
  • Replace the categorical values in the "Car Model" column with numerical values.
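
A sketch of these transformations with pandas (the "Car Model" column name comes from the list above; "Date" and "Year" are assumptions):

    import pandas as pd

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        """Apply the transformations listed above to the raw sales data."""
        df = df.dropna()  # remove rows with missing values
        df["Date"] = pd.to_datetime(df["Date"])  # standard datetime format
        df["Year"] = df["Date"].dt.year  # new column with the year of the sale
        # replace the categorical "Car Model" values with numerical codes
        df["Car Model"] = df["Car Model"].astype("category").cat.codes
        return df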

Requirements

  • The target database should be either PostgreSQL or MySQL.
  • The pipeline should be runnable using a command-line interface.
  • The pipeline should have error handling and logging capabilities.
  • The pipeline should be modular and easily extendable to handle additional data sources and transformations.
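
A minimal sketch of a CLI entry point with logging and error handling (the --csv flag and run_pipeline are illustrative, not the project's actual interface):

    import argparse
    import logging
    import sys

    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    logger = logging.getLogger(__name__)

    def run_pipeline(csv_path: str) -> None:
        """Stand-in for the pipeline sketched in the About section."""

    def main() -> None:
        """Run the ETL pipeline from the command line."""
        parser = argparse.ArgumentParser(description="Car sales ETL pipeline")
        parser.add_argument("--csv", default="data/car_sales.csv",
                            help="path to the source CSV file")
        args = parser.parse_args()
        try:
            run_pipeline(args.csv)
        except Exception:
            logger.exception("Pipeline failed")  # logs the full traceback
            sys.exit(1)

    if __name__ == "__main__":
        main()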

(back to top)

Built with

  • Python

(back to top)

Getting started

Prerequisites

Installation

  1. Clone the repository
    git clone https://github.com/jpcadena/car-sales-etl.git
    
  2. Change directory to the project root
    cd car-sales-etl
    
  3. Create a virtual environment venv
    python3 -m venv venv
    
  4. Activate the environment on Windows
    .\venv\Scripts\activate
    
  5. Or on Unix/macOS
    source venv/bin/activate
    
  6. Install requirements with pip
    pip install -r requirements.txt
    

(back to top)

Usage

  1. Rename the file sample.env to .env.
  2. Add your credentials to the .env file.
  3. Run from the console:
    python main.py
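
For reference, a PostgreSQL-flavoured .env might contain entries like the following; the key names here are illustrative, so keep the ones already defined in sample.env:

    POSTGRES_USER=your_username
    POSTGRES_PASSWORD=your_password
    POSTGRES_HOST=localhost
    POSTGRES_PORT=5432
    POSTGRES_DB=car_sales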
    

(back to top)

Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Use docstrings in reStructuredText format by adding triple double quotes """ after the function definition.
Add a brief description of the function and of each parameter, including the return value and its data type.
Please use linting to check your code quality against PEP 8.
Check the documentation for Visual Studio Code or JetBrains PyCharm.
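
For example, a docstring in that style (a generic illustration, not a function from this repo):

    import pandas as pd

    def extract(csv_path: str) -> pd.DataFrame:
        """
        Extract raw car sales data from a CSV file.

        :param csv_path: Path to the source CSV file
        :type csv_path: str
        :return: Raw sales data as a dataframe
        :rtype: pd.DataFrame
        """
        return pd.read_csv(csv_path)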

Recommended plugin for autocompletion: Tabnine

(back to top)

License

Distributed under the MIT License.

(back to top)

Contact

LinkedIn: Juan Pablo Cadena Aguilar

E-mail: Juan Pablo Cadena Aguilar

(back to top)