Wrangle Report

Introduction

The aim of this project was to iterate over the processes of data gathering, visual and programmatic data assessment, and data cleaning, in order to perform informative analyses.

Data Gathering

The Twitter archive data of the @WeRateDogs account was provided by Udacity in .csv format. The dataframe contained 2356 observations and 17 columns, including tweet_id, dog_stage, name, etc., and covered tweets up until August 1, 2017.

The image predictions dataset is hosted on the Udacity servers and was retrieved programmatically using the Requests library.
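A programmatic download of this kind can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `download_file`, the example URL, and the output file name are assumptions, since the report does not give them.

```python
import requests


def download_file(url, dest_path, timeout=30):
    """Download `url` and write the raw response body to `dest_path`."""
    response = requests.get(url, timeout=timeout)
    # Fail loudly on HTTP errors instead of saving an error page to disk.
    response.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


# Hypothetical usage; the real URL is not stated in this report:
# download_file("https://example.com/image-predictions.tsv", "image_pred.tsv")
```

Writing `response.content` in binary mode keeps the file byte-for-byte identical to what the server sent, so the file can be parsed later without encoding surprises.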

Additional data were retrieved from the Twitter API using the Tweepy library. The information retrieved included the twitter_id, favorite_count and retweet_count.
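Once the API responses are on disk, the fields of interest can be pulled out with the standard library alone. The sketch below assumes the responses were saved as one JSON object per line, which the report does not confirm. The function name `parse_tweet_json` and the output key `tweet_id` are illustrative choices, and the sample lines are made up.

```python
import json


def parse_tweet_json(lines):
    """Extract the ID and engagement counts from one-JSON-object-per-line text."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append(
            {
                "tweet_id": tweet["id"],
                "favorite_count": tweet["favorite_count"],
                "retweet_count": tweet["retweet_count"],
            }
        )
    return records


# Made-up sample lines standing in for saved API responses.
sample = [
    '{"id": 1, "favorite_count": 10, "retweet_count": 3, "lang": "en"}',
    '{"id": 2, "favorite_count": 0, "retweet_count": 1, "lang": "en"}',
]
records = parse_tweet_json(sample)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(records)` to obtain a table with one row per tweet.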

Data Assessment

The datasets were assessed in two ways: visually, by displaying them in a Jupyter notebook and in Excel to detect quality and tidiness issues, and programmatically, using pandas functions and methods. Various issues were detected, and a total of ten quality issues and two tidiness issues were documented.
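A programmatic assessment typically combines a handful of pandas checks. The sketch below shows some common ones. The helper `assess` and the sample data are hypothetical, and the report does not state which checks were actually run.

```python
import pandas as pd


def assess(df):
    """Return a few programmatic checks commonly used to spot quality issues."""
    return {
        "shape": df.shape,
        "missing_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Made-up sample data, only to show the checks in action.
sample = pd.DataFrame(
    {
        "tweet_id": [1, 2, 2, 3],
        "name": ["Rex", "Bo", "Bo", None],
    }
)
summary = assess(sample)
```

Missing values and duplicated rows point to quality issues, while the column data types often reveal fields stored in an unsuitable form, such as IDs kept as integers or timestamps kept as strings.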

Data Cleaning

Each documented issue was then addressed programmatically using a range of pandas functions.
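As an illustration of what such fixes can look like, the sketch below applies two typical data type corrections on a copy of the data. Both fixes, the `timestamp` column, and the helper name `clean_archive` are assumptions for the example. The report does not list the specific issues that were cleaned.

```python
import pandas as pd


def clean_archive(df):
    """Apply two illustrative fixes on a copy, leaving the input untouched."""
    clean = df.copy()
    # Assumed quality fix: store timestamps as datetimes, not strings.
    clean["timestamp"] = pd.to_datetime(clean["timestamp"])
    # Assumed quality fix: IDs are labels, so store them as strings.
    clean["tweet_id"] = clean["tweet_id"].astype(str)
    return clean


# Made-up sample rows standing in for the archive.
sample = pd.DataFrame(
    {
        "tweet_id": [10, 11],
        "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03"],
    }
)
cleaned = clean_archive(sample)
```

Working on a copy keeps the original dataframe available for comparison, which makes it easy to test that each fix did what was intended.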

Conclusion

The required @WeRateDogs data were gathered, assessed and cleaned. The datasets were pooled into one dataframe named twitter_archive_master.csv with 1030 rows and 13 columns. The master dataset is the result of this data cleaning process, but it will require further iterations to truly cover all the quality and tidiness issues that are present, since the data wrangling process is iterative.
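Pooling the three tables into one can be sketched with pandas merges. The inner join on `tweet_id`, the helper `build_master`, the `breed` column, and the sample rows are all assumptions for illustration. The report does not describe how the tables were combined.

```python
import pandas as pd


def build_master(archive, predictions, counts):
    """Inner-join the three tables on tweet_id so only complete rows remain."""
    master = archive.merge(predictions, on="tweet_id", how="inner")
    master = master.merge(counts, on="tweet_id", how="inner")
    return master


# Made-up sample tables; "breed" is a hypothetical prediction column.
archive = pd.DataFrame({"tweet_id": ["1", "2", "3"], "name": ["Rex", "Bo", "Max"]})
predictions = pd.DataFrame({"tweet_id": ["1", "2"], "breed": ["pug", "beagle"]})
counts = pd.DataFrame(
    {"tweet_id": ["2", "3"], "favorite_count": [5, 7], "retweet_count": [1, 2]}
)
master = build_master(archive, predictions, counts)

# Saving the result (file name taken from the report):
# master.to_csv("twitter_archive_master.csv", index=False)
```

An inner join drops any tweet that is missing from one of the tables, which is one plausible reason a master dataset ends up with fewer rows than the original archive.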
