Cleaning-Uber-Dataset

Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

Many people in the data science community put a lot of effort into learning machine learning, deep learning, and other new technologies, but we often forget that the main fuel for all of these tasks is good data. The world produces data at an enormous rate, yet good data is hard to obtain. I found a lot of data online about Uber rides taken around the world, but it contained many anomalies and errors, so I decided to wrangle it to make it more valuable for analysis.

I have classified the datasets into three categories, and every dataset contains some form of error. The datasets have been wrangled using mathematical algorithms and techniques to remove those errors.
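This README does not name the specific algorithms, so the following is only a minimal sketch of one common technique for spotting anomalous values: the interquartile range (IQR) rule. The `fare` column, the sample rides, and the helper names `iqr_bounds` and `split_outliers` are assumptions made up for illustration and are not taken from the notebook.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences of the IQR outlier rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(rows, column, k=1.5):
    """Split rows into (clean, flagged) based on one numeric column."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    clean = [row for row in rows if low <= row[column] <= high]
    flagged = [row for row in rows if not low <= row[column] <= high]
    return clean, flagged


# Hypothetical sample data: the 250.0 fare stands in for an injected error.
fares = [12.5, 9.0, 11.0, 10.5, 13.0, 9.5, 250.0, 12.0]
rides = [{"ride_id": i, "fare": fare} for i, fare in enumerate(fares)]

clean_rides, flagged_rides = split_outliers(rides, "fare")
print("kept:", len(clean_rides), "flagged:", flagged_rides)
```

Flagged rows are returned rather than silently dropped, so each one can be inspected and either corrected or removed, depending on the kind of error.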

The project is written in Python 3. Every step is explained in detail, along with the reason for doing it. The datasets with "_dirty" in their names contain the errors; running the ipynb notebook from start to finish produces the clean datasets.
