Manipulate and clean data in Python
In this workshop, you will learn how to use Python and popular libraries such as NumPy and pandas to manipulate and clean data in preparation for analysis.
| Goal | Description |
| --- | --- |
| What you'll learn | How to inspect, clean, and prepare data that's stored in a pandas DataFrame |
| What you'll need | A Visual Studio Code environment set up to run Python and Jupyter notebooks |
| Duration | 1 hr 20 min |
| Just want to try the app or see the solution? | Solution |
| Slides | PowerPoint |
🎥 Click this image to watch Ornella walk you through the workshop
- Visual Studio Code
- Python
- Python extension for Visual Studio Code
- Jupyter extension for Visual Studio Code
- Activated Anaconda environment
- A data science environment in VS Code
Say you want to analyze a dataset that you find interesting, such as the squirrel population of Central Park or various types of French cheese. The first step with any dataset is to clean it up: many datasets have missing information or aren't formatted exactly the way you'd like. In this workshop, you will learn how to use data science libraries to prepare your data for analysis and visualization.
In this section, you'll review an introduction and make sure that your data science environment is set up correctly before continuing on to the next part of the workshop.
Next, you will learn how to use Python libraries to explore an iconic dataset. You will use pandas DataFrames to get a quick sense of the size, shape, and content of a dataset.
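As a rough illustration of that first look at a dataset, the sketch below builds a small DataFrame and inspects it. The data and column names are made up for this example and are not the workshop's dataset.

```python
import pandas as pd

# Hypothetical sample data (assumption): not the dataset used in the workshop.
df = pd.DataFrame(
    {
        "species": ["gray", "gray", "cinnamon", "black"],
        "weight_g": [510.0, 495.5, 530.2, 488.0],
        "age_years": [2, 3, 1, 4],
    }
)

# Size and shape: (number of rows, number of columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Content: peek at the first rows, then list column dtypes and non-null counts.
print(df.head(2))
df.info()

# Summary statistics for the numeric columns.
summary = df.describe()
print(summary)
```

`shape`, `head`, `info`, and `describe` together usually give enough context to decide which cleaning steps a dataset needs.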
Now that you know how to get an overall sense of the dataset you are working with, you will learn how to identify and deal with missing values.
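A minimal sketch of identifying and handling missing values might look like the following. The values are invented for illustration, and whether to drop or fill missing entries depends on your own analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps (assumption): the numbers are made up.
df = pd.DataFrame(
    {
        "cheese": ["Brie", "Comté", "Roquefort", "Camembert"],
        "fat_pct": [28.0, np.nan, 32.0, np.nan],
        "milk": ["cow", "cow", None, "cow"],
    }
)

# Identify: count the missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill the gaps, here with the column mean for numbers
# and a placeholder label for text.
filled = df.fillna({"fat_pct": df["fat_pct"].mean(), "milk": "unknown"})
print(filled)
```

Dropping rows is simple but discards data; filling keeps every row at the cost of introducing estimated values.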
Another common task with most datasets is removing duplicate data. In this section of the workshop, you will learn how to use pandas to detect and remove duplicate entries.
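One possible way to detect and remove duplicates is sketched below, using an invented table of squirrel records. The column names are assumptions for this example.

```python
import pandas as pd

# Hypothetical sample data (assumption): rows 2 and 4 repeat earlier rows.
df = pd.DataFrame(
    {
        "squirrel_id": ["A1", "A2", "A2", "A3", "A1"],
        "fur_color": ["gray", "black", "black", "cinnamon", "gray"],
    }
)

# Detect: True marks a row that is identical to an earlier row.
is_duplicate = df.duplicated()
print(is_duplicate)

# Remove: keep the first occurrence of each row and renumber the index.
deduplicated = df.drop_duplicates().reset_index(drop=True)

# Variation: judge duplicates by one column only and keep the last occurrence.
last_per_id = df.drop_duplicates(subset=["squirrel_id"], keep="last")
print(deduplicated)
```

By default both methods compare entire rows; the `subset` argument narrows the comparison to the columns that define a unique record.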
Sometimes you will need to combine datasets. Luckily, pandas provides several methods to merge and join them.
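The sketch below shows two of those methods, `merge` and `join`, on a pair of invented tables that share a key column. The table and column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample tables (assumption) that share the key "squirrel_id".
sightings = pd.DataFrame(
    {"squirrel_id": ["A1", "A2", "A4"], "times_seen": [3, 1, 2]}
)
colors = pd.DataFrame(
    {"squirrel_id": ["A1", "A2", "A3"], "fur_color": ["gray", "black", "cinnamon"]}
)

# Inner merge: keep only the keys present in both tables.
inner = pd.merge(sightings, colors, on="squirrel_id", how="inner")

# Left merge: keep every row of the left table; unmatched values become NaN.
left = sightings.merge(colors, on="squirrel_id", how="left")

# Join works on the index, so set the key as the index first.
joined = sightings.set_index("squirrel_id").join(
    colors.set_index("squirrel_id"), how="outer"
)

print(inner)
print(left)
print(joined)
```

The `how` argument controls which keys survive: `inner` keeps the intersection, `left` keeps the left table's keys, and `outer` keeps the union.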
So far, you've learned how to use pandas methods to examine some aspects of a DataFrame, and to fill in, remove, and combine data. Finally, you will explore your data by creating visualizations.
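As one possible illustration, the sketch below draws a bar chart of category counts. It assumes matplotlib is installed as the plotting backend for pandas; the source does not say which plotting library the workshop uses, and the data is invented.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data (assumption).
df = pd.DataFrame(
    {"fur_color": ["gray", "gray", "black", "cinnamon", "gray", "black"]}
)

# Count the rows in each category, then draw one bar per category.
counts = df["fur_color"].value_counts()
ax = counts.plot(kind="bar", title="Rows per fur color")
ax.set_xlabel("fur_color")
ax.set_ylabel("count")
plt.tight_layout()
```

In a Jupyter notebook the chart renders below the cell; in a script, `plt.savefig` can write it to a file instead.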
- Explore and analyze data with Python
- Introduction to machine learning
- Discover the role of Python in space exploration
To test your knowledge, try downloading a free dataset from Kaggle that you find interesting. Use the techniques that you learned in this workshop to manipulate and clean your data!
Be sure to give feedback about this workshop!