
YKarsten/Clean-and-tranform

Clean-and-tranform

Project Overview

Welcome to the Clean-and-Transform project, which uses Python's pandas and numpy libraries to refine and enhance a raw dataset. The aim of this project is to demonstrate my skills in data cleaning and transformation.

Why Data Cleaning and Transformation?

Data is at the heart of every data-driven project, and the quality of your data directly impacts the success of your analysis, modeling, and decision-making. In many real-world scenarios, data can be messy, inconsistent, or riddled with inaccuracies. This project provides an opportunity to showcase my skills in addressing these challenges, making the data more reliable, and preparing it for further analysis.

The Dataset

For this project, I've chosen the Global YouTube Statistics dataset (Global YouTube Statistics.csv), a well-documented dataset with a substantial number of data points. Although the data was generated by a machine, it is not free of discrepancies. These discrepancies offer a valuable opportunity to practise data cleaning and transformation, turning raw data into a refined, trustworthy resource.
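
A minimal sketch of loading the CSV and surveying its discrepancies with pandas is shown below. The function names `load_dataset` and `discrepancy_report` are hypothetical, and the Latin-1 fallback is an assumption about the file's encoding rather than something this README states.

```python
import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read the raw CSV, falling back to Latin-1 if UTF-8 decoding fails."""
    try:
        return pd.read_csv(path, encoding="utf-8")
    except UnicodeDecodeError:
        # Assumption: bytes that are not valid UTF-8 are Latin-1 encoded.
        return pd.read_csv(path, encoding="latin-1")


def discrepancy_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise each column's dtype, missing values and distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(),
        }
    )
```

For example, `discrepancy_report(load_dataset("Global YouTube Statistics.csv"))` lists, per column, the dtype, the number and percentage of missing values, and the number of distinct values, which gives a quick overview of where cleaning is needed.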

What to Expect

Throughout this project, I will tackle a variety of data issues, including missing values, inconsistent formats, outliers, and more. I'll employ pandas and numpy to perform these transformations systematically and efficiently. By the end of this project, you can expect to see the data in a much-improved state, ready for further analysis, visualization, or machine learning applications.
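
The kinds of fixes listed above can be sketched with pandas and numpy as follows. This is an illustration under assumptions, not the notebook's actual code: the column names `country`, `uploads` and `subscribers`, the median imputation, and the 1.5 * IQR outlier rule are choices made for the example.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (column names are hypothetical)."""
    out = df.copy()

    # Inconsistent formats: trim whitespace, unify capitalisation, and turn
    # placeholder strings ("nan", empty) into real missing values.
    out["country"] = (
        out["country"]
        .str.strip()
        .str.title()
        .replace({"Nan": np.nan, "": np.nan})
    )

    # Numbers stored as text (e.g. "1,200"): drop thousands separators and
    # coerce; anything unparseable becomes NaN instead of raising an error.
    out["uploads"] = pd.to_numeric(
        out["uploads"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )

    # Missing values: impute numeric gaps with the column median,
    # which is less sensitive to extreme values than the mean.
    out["uploads"] = out["uploads"].fillna(out["uploads"].median())

    # Outliers: flag values outside Tukey's fences (1.5 * IQR beyond the
    # quartiles). NaN compares as False, so missing values are not flagged.
    q1, q3 = out["subscribers"].quantile([0.25, 0.75])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    out["subscribers_outlier"] = out["subscribers"].lt(low) | out["subscribers"].gt(high)

    return out
```

Flagging outliers in a separate column, rather than dropping them, keeps the decision visible; whether to remove, cap, or keep them depends on the analysis that follows.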

Let's roll up our sleeves and get started!

Installation

You can view a rendered version of the notebook here, or a PDF version of the notebook here.

Prerequisites

  • Python: Ensure that Python is installed on your machine. You can download it from python.org.
  • JupyterLab: Install JupyterLab by running the following command in your terminal or command prompt:
    pip install jupyterlab
  • External libraries: Install the required libraries with pip:
    pip install pandas numpy matplotlib

Steps

  1. Download: Download the Jupyter notebook file youtube.ipynb from this repository to your local machine.

  2. Run the JupyterLab server:

  • Open a terminal or command prompt.
  • Navigate to the directory where you saved the notebook file.
  • Run the following command:
    jupyter lab

  3. Access the notebook:
  • Open your web browser and go to the URL displayed in the terminal.
  • Navigate to the notebook file and click on it to open it.
  4. Interact with the notebook:
  • Execute code cells using the "Run" button or by pressing Shift + Enter.

Usage

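It is recommended to run all cells in order (Run > Run All Cells in JupyterLab) so that every cell executes properly, since later cells typically depend on results produced by earlier ones.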

Structure

  1. youtube.ipynb: Jupyter notebook containing all the steps of the data cleaning process.

  2. youtube.pdf: PDF rendition of the aforementioned Jupyter notebook.

  3. Global YouTube Statistics.csv: CSV file of the original dataset.

  4. README.md: Instructions on how to get started, install dependencies, and use the Jupyter notebook.

  5. license.txt: Text file containing the MIT open-source software license.

Dependencies

  • Python: Version 3.10.12
  • Jupyter Lab: Version 4.0.5
  • Libraries:
    • NumPy: Version 1.25.2
    • pandas: Version 2.0.3
    • matplotlib: Version 3.7.2
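
To confirm that a local environment matches the versions listed above, a small Python check along these lines may help. It is a sketch: `mismatches` is a hypothetical helper, and an exact-match comparison is stricter than necessary, since nearby versions may also work.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions taken from the Dependencies list in this README.
EXPECTED_PYTHON = "3.10.12"
EXPECTED_PACKAGES = {
    "jupyterlab": "4.0.5",
    "numpy": "1.25.2",
    "pandas": "2.0.3",
    "matplotlib": "3.7.2",
}


def mismatches(
    expected: Dict[str, str] = EXPECTED_PACKAGES,
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {package: (installed, expected)} for every package that differs."""
    found: Dict[str, Tuple[Optional[str], str]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            found[name] = (have, want)
    return found


if __name__ == "__main__":
    print(f"Python {platform.python_version()} (README lists {EXPECTED_PYTHON})")
    for name, (have, want) in mismatches().items():
        print(f"{name}: installed {have or 'missing'}, README lists {want}")
```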

License

This project is licensed under the MIT License; see the license.txt file for details.

About

Basic Data Cleaning & Transformation using Python
