# Getting and Cleaning Data Course Project

Course project for the Coursera/Johns Hopkins University "Getting and Cleaning Data" course.
## Introduction

The assignment was to download the Human Activity Recognition Using Smartphones Data Set from the UCI Machine Learning Repository and to produce an independent, tidy data set from the raw data, according to the project instructions.

This repository contains the following files:

- README.md - this file!
- run_analysis.R - R script that performs the assignment
- CodeBook.md - a code book that explains all variables and transformations
- tidy.txt - the output file from run_analysis.R

Briefly, the R script `run_analysis.R` does the following:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
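
The five steps can be illustrated on a toy data set. This is only a minimal sketch, not the code in `run_analysis.R`: the toy values, the helper `make_set`, the object names, and the renaming rules in step 4 are all assumptions made for illustration, and it assumes dplyr 1.0 or later for `across()`.

```r
library(dplyr)

# Toy stand-ins for the real files (hypothetical values, tiny dimensions).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"))

# Hypothetical helper: builds one set with subject, activity id and measurements.
make_set <- function(subjects, activities, values) {
  x <- as.data.frame(matrix(values, ncol = length(features)))
  names(x) <- features
  data.frame(subject = subjects, activity_id = activities, x,
             check.names = FALSE)
}
train <- make_set(c(1, 1, 2), c(1, 1, 2), 1:9)
test  <- make_set(c(3, 3),    c(2, 2),    11:16)

# Step 1: merge the training and the test sets.
merged <- rbind(train, test)

# Step 2: keep only the mean() and std() measurements (plus the id columns).
keep <- grepl("mean\\(\\)|std\\(\\)", names(merged)) |
  names(merged) %in% c("subject", "activity_id")
selected <- merged[, keep]

# Step 3: replace activity ids with descriptive activity names.
selected$activity <- activity_labels$activity[
  match(selected$activity_id, activity_labels$id)]
selected$activity_id <- NULL

# Step 4: descriptive variable names (these renaming rules are an assumption).
names(selected) <- gsub("\\(\\)", "", names(selected))
names(selected) <- gsub("^t", "time", names(selected))
names(selected) <- gsub("-", "_", names(selected))

# Step 5: average of each variable for each subject and each activity.
tidy_sketch <- selected %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")
```

In this toy example `tidy_sketch` has one row per subject/activity pair and one column per retained measurement, which is the shape the project instructions ask for.
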

Further details can be found in CodeBook.md and in the inline comments of run_analysis.R.

## Requirements

The following assumptions are made:

- The script ("run_analysis.R") is downloaded to your working directory.
- The project data is downloaded from the following address: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
- The data is unzipped to your working directory ("./UCI HAR Dataset/").
- The `dplyr` package by Hadley Wickham is available, because the script depends on it.

If `dplyr` is not yet installed, run the first line below in the console before sourcing the script:

    install.packages("dplyr")
    source("run_analysis.R")
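
The download and unzip assumptions above could be satisfied from within R. The following is a minimal sketch, not part of `run_analysis.R`: the function name `fetch_har_data` and the local zip file name are hypothetical, and it assumes the archive unpacks into a `UCI HAR Dataset` folder as described above.

```r
# Hypothetical helper: downloads and unzips the project data only if the
# "UCI HAR Dataset" folder is not already present in dest_dir.
fetch_har_data <- function(dest_dir = ".") {
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  data_dir <- file.path(dest_dir, "UCI HAR Dataset")
  if (!dir.exists(data_dir)) {
    # The zip file name is illustrative; any name works.
    zip_path <- file.path(dest_dir, "UCI_HAR_Dataset.zip")
    # mode = "wb" keeps the binary archive intact on Windows.
    download.file(url, destfile = zip_path, mode = "wb")
    unzip(zip_path, exdir = dest_dir)
  }
  invisible(data_dir)
}
```

Calling `fetch_har_data()` from the working directory and then `source("run_analysis.R")` should leave the data where the script expects it.
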

To examine the output in `tidy.txt`, I recommend using RStudio and entering the following commands in the console:

    tidy <- read.table("tidy.txt", header = TRUE)
    View(tidy)
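
Beyond viewing the table, a quick structural check can confirm that the output is tidy in the sense of the assignment: one row per subject/activity pair and only numeric averages in the remaining columns. The sketch below is an illustration only; the function name `check_tidy` is hypothetical, and the identifier column names `subject` and `activity` are assumptions that may differ from those in `tidy.txt` (see CodeBook.md for the actual names).

```r
# Hypothetical helper: summarises basic tidiness properties of a data frame.
# id_cols are assumed names for the subject and activity columns.
check_tidy <- function(tidy, id_cols = c("subject", "activity")) {
  stopifnot(all(id_cols %in% names(tidy)))
  ids <- tidy[, id_cols, drop = FALSE]
  measures <- tidy[, setdiff(names(tidy), id_cols), drop = FALSE]
  list(
    rows = nrow(tidy),
    one_row_per_pair = !any(duplicated(ids)),
    measures_numeric = all(vapply(measures, is.numeric, logical(1)))
  )
}

# Toy data (made-up values) standing in for read.table("tidy.txt", header = TRUE).
toy <- data.frame(
  subject = c(1, 1, 2),
  activity = c("WALKING", "SITTING", "WALKING"),
  timeBodyAcc_mean_X = c(0.27, 0.26, 0.28)
)
result <- check_tidy(toy)
```

For the real file, `check_tidy(read.table("tidy.txt", header = TRUE))` would be the equivalent call. Since the data set covers 30 subjects and 6 activities, the `rows` entry would be expected to be 180 if every subject/activity pair is present.
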

David Hood was most helpful in the Coursera forum; please see his detailed FAQ for the course project.