- This repository captures locally hosted .csv files, transforms them by extracting some example features from the data, and then loads them into a Postgres database.
The organization of this repository is as follows:
    ├─── src
    │    ├─── extraction.py
    │    ├─── loading.py
    │    ├─── setting.py
    │    ├─── transformation.py
    │    └─── tables.sql
    ├─── pyproject.toml
    ├─── poetry.lock
    ├─── README.md
    └─── requirements.txt
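The module names above suggest an extract, transform, load flow. The following is only a minimal sketch of that flow, not the repository's actual code: the function names, the `name` column, and the `people` table are hypothetical, and the standard-library `sqlite3` module stands in for Postgres so that the example runs without a database server.

```python
import csv
import io
import sqlite3


def extract(csv_file):
    """Read all rows of an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def transform(rows):
    """Clean the hypothetical 'name' column and derive one example feature."""
    transformed = []
    for row in rows:
        name = row["name"].strip()
        transformed.append({"name": name, "name_length": len(name)})
    return transformed


def load(rows, connection):
    """Insert the transformed rows into a hypothetical 'people' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS people (name TEXT, name_length INTEGER)"
    )
    connection.executemany(
        "INSERT INTO people (name, name_length) VALUES (:name, :name_length)",
        rows,
    )
    connection.commit()


def run_pipeline(csv_file, connection):
    """Chain the three steps: extract, then transform, then load."""
    load(transform(extract(csv_file)), connection)


if __name__ == "__main__":
    # In-memory stand-ins for a local .csv file and for the Postgres database.
    sample_csv = io.StringIO("name\nAda \nGrace\n")
    conn = sqlite3.connect(":memory:")
    run_pipeline(sample_csv, conn)
    print(conn.execute("SELECT name, name_length FROM people").fetchall())
```

Keeping each step in its own function mirrors the one-module-per-step layout and lets each step be tested on its own.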
- You must have Docker Desktop installed. If you do not, install it by following the instructions for your operating system.
- You must have Poetry installed. If you do not, go to the following site: https://python-poetry.org/docs/
- Clone the repository with the git clone command, or download it in zip format.
git clone https://github.com/MuttData/ETL_exercise_1
cd ETL_exercise_1
- Open a terminal in the repository folder.
- Run the following command (if you do not have the image yet, it will take a few minutes):
docker pull postgres
- This command downloads the official postgres image from Docker Hub: https://hub.docker.com/_/postgres
- Execute the command
```docker
docker run -d -h <hostname or ip address> -p <port>:5432 --name <container_name> -e POSTGRES_USER=<User> -e POSTGRES_PASSWORD=<Password> -e POSTGRES_DB=<DB> postgres
```
This command creates and starts a Docker container from the postgres image. The database is created with the given user and password, and it is reachable on the chosen port of the host. Replace the values between <> with the desired values.
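If you prefer to start the container from Python, the command can be assembled as an argument list. This is a sketch under assumptions: the function name and the example values (`etl_db`, `5433`, `etl_user`, `change-me`, `etl`) are hypothetical, and the optional `-h` flag is only added when a hostname is given.

```python
import shlex


def build_docker_run_args(container_name, host_port, user, password, database,
                          hostname=None):
    """Return the argv list that starts a postgres container in the background."""
    args = ["docker", "run", "-d"]
    if hostname is not None:
        args += ["-h", hostname]           # hostname inside the container
    args += [
        "-p", f"{host_port}:5432",         # host port mapped to container port 5432
        "--name", container_name,
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        "-e", f"POSTGRES_DB={database}",
        "postgres",                        # image name goes last
    ]
    return args


if __name__ == "__main__":
    # Hypothetical values; replace them with your own.
    args = build_docker_run_args("etl_db", 5433, "etl_user", "change-me", "etl")
    # Print the command for inspection; pass `args` to subprocess.run to execute it.
    print(shlex.join(args))
```

Building a list instead of a single string avoids shell quoting problems if the password contains spaces or special characters.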
[ Note ] If you have a SQL client such as DBeaver or pgAdmin installed, you can run the SQL queries from that client, without going through Docker, using the following configuration:
Host: <localhost>
Port: <Port>
User: <User>
Password: <Password>
DB: <DB>
Schema: <Schema>
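The same settings can also be used from Python. The sketch below builds a PostgreSQL connection URI from environment variables. The variable names (`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`) are assumptions made for illustration, and the way `setting.py` actually reads its configuration may differ.

```python
import os
from urllib.parse import quote


def build_postgres_uri(env=os.environ):
    """Build a postgresql:// URI from (hypothetical) environment variables."""
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    # Percent-encode values that may contain characters such as '@' or '/'.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASSWORD"], safe="")
    database = quote(env["DB_NAME"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical example values, matching the placeholders above.
    example = {
        "DB_PORT": "5433",
        "DB_USER": "etl_user",
        "DB_PASSWORD": "p@ss/word",
        "DB_NAME": "etl",
    }
    print(build_postgres_uri(example))
```

Reading the values from the environment keeps the password out of the source code.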
- Install the dependencies declared in pyproject.toml with Poetry. To do so, run the command
poetry install
To run a script, use the following command. The scripts are in the src folder.
poetry run python <script_name>.py
For example:
poetry run python main.py
[ Note ] Poetry is optional. The packages are also listed in the requirements.txt file, so you can install them with pip, for example inside a Conda environment, with the following command:
pip install -r requirements.txt
If pip is missing, python -m ensurepip --upgrade installs it. To upgrade an existing pip, run the following command:
python -m pip install --upgrade pip
Next you can install the packages with the following command:
pip install -r requirements.txt
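To confirm that the installation worked, you can check which packages from requirements.txt are still missing. This is a minimal sketch: the helper names are hypothetical, and the parser only handles simple lines such as `package==1.0`, skipping comments and pip options.

```python
import re
from importlib import metadata

# A requirement line starts with the package name, before any version specifier.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(lines):
    """Extract package names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements_file(path="requirements.txt"):
    """Read a requirements file and return the packages still missing."""
    with open(path, encoding="utf-8") as handle:
        return missing_packages(parse_requirement_names(handle))
```

Calling `check_requirements_file()` from the repository folder returns an empty list when every listed package is installed.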
To use the scripts, you must run the following command:
python main.py
This repository is intended just for practice.