run_analysis.R

Developed on:

Windows 7 Professional 64-bit SP1
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
RStudio version 0.98.1087

R packages required:

dplyr version 0.3.0.2
reshape2 version 1.4.1
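
A quick way to confirm that both packages are available before running the script is sketched below. It uses only base R; the variable names (required, installed, missing_pkgs) are illustrative and do not come from run_analysis.R.

```r
# Packages listed above as required by run_analysis.R.
required <- c("dplyr", "reshape2")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Names of any required packages that are not installed.
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before running run_analysis.R: ",
          paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can then be installed with install.packages().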

To run this code, first update the path (path_wd) to your preferred working directory.

This code performs the following actions:

1. Downloads and extracts raw zipped data to a user-defined working directory
   a. Downloads raw zipped data to the working directory from the web: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
   b. Extracts files to the working directory.
   c. Records the date and time of download in the "dateDownloaded" variable.
2. Merges the training and the test sets to create one data set
   a. Reads in, formats and combines the following common, test and train text files: activity_labels, features, subject_test, X_test, y_test, subject_train, X_train, y_train.
   b. Subject data (integer: 1-30) from subject_test and subject_train is assigned to the "subject" variable.
   c. Dataset source (character: test or train) is recorded in the "dataset" variable.
   d. Dataset, Subject, Activity (Y_data) and All Other Variables (X_data) are combined with cbind() for the test and train datasets independently. The test and train datasets are then combined into a single dataset using rbind().
   e. Generates a single tidy data frame "data_all" with dataset, subject, activity (character: activity_label) and the variables selected in Step 3 for further analysis. Each row is a single observation, each column is a single variable.
   *See also: 4a, 5a and the code comments for more detail.
3. Extracts only the measurements on the mean and standard deviation for each measurement
   a. Selects only variables whose names contain mean() or std() in the original features.txt file; excludes "freq" and "angle" variables. The select statement can be modified to return other or additional variables.
4. Uses descriptive activity names to name the activities in the data set
   a. Maps the values in the y_test and y_train files to the descriptive labels in activity_labels.txt and stores the result in the "activity" variable. The join is on activity_id (integer).
5. Appropriately labels the data set with descriptive variable names
   a. Assigns variable names to the test and train data frames (X_test, X_train) from the downloaded features.txt file. features.txt contains duplicate and invalid variable names. Unique (and valid) variable names are generated using make.names with unique = TRUE.
   b. Original and converted variable names are written to the codebook.txt file generated by this code, for comparison with the raw data README and features_info files.
   *See also: 6c and codebook.md for more detail.
6. Creates a second, independent tidy data set with the mean of each variable for each activity and subject combination
   a. Calculates the mean value of each variable by activity and subject using melt and dcast from reshape2.
   b. Writes out a space-delimited text file "tidy.txt" using write.table with row.names = FALSE.
   c. Writes out a space-delimited text file "codebook.txt" (only includes selected variables) using write.table with row.names = FALSE.
   d. Code to read the output file back into R is included (commented out at the end of the script).
   e. For a description of the variables, see 3a, 5b and codebook.md.
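
The merging, selection and labelling steps (2 through 5) can be sketched on a small made-up data set. This is a minimal illustration, not the contents of run_analysis.R: the feature names imitate entries in features.txt, the numeric values and the helper make_set are hypothetical, and base R (grepl, merge) is used in place of the dplyr calls in the script so that the example runs without extra packages.

```r
# Toy stand-ins for features.txt and activity_labels.txt (made-up subset).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)",
              "fBodyAcc-bandsEnergy()-1,8", "fBodyAcc-bandsEnergy()-1,8")
activity_labels <- data.frame(activity_id = 1:2,
                              activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Step 5a: make.names() replaces invalid characters with "." and, with
# unique = TRUE, appends a suffix to duplicated names.
valid_names <- make.names(features, unique = TRUE)

# Hypothetical helper: cbind() the source, subject, activity id and
# measurements for one data set (step 2d).
make_set <- function(source, subject, activity_id, x) {
  ids <- data.frame(dataset = source, subject = subject,
                    activity_id = activity_id, stringsAsFactors = FALSE)
  measurements <- as.data.frame(x)
  names(measurements) <- valid_names
  cbind(ids, measurements)
}

test  <- make_set("test",  c(2L, 2L), c(1L, 2L), matrix(1:12 / 10, nrow = 2))
train <- make_set("train", c(1L, 1L), c(2L, 1L), matrix(13:24 / 10, nrow = 2))

# Step 2d: stack the test and train sets with rbind().
combined <- rbind(test, train)

# Step 3a: keep names containing mean() or std(); this pattern does not
# match meanFreq() or the angle(...) variables.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- combined[, c("dataset", "subject", "activity_id", valid_names[keep])]

# Step 4a: join the descriptive activity labels on activity_id.
data_all <- merge(selected, activity_labels, by = "activity_id")
data_all <- data_all[, c("dataset", "subject", "activity", valid_names[keep])]
```

Matching the pattern against the original feature names, rather than the converted ones, keeps the selection rule readable, because make.names() turns the parentheses into dots.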

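The averaging and output step (6) can be illustrated with a small example. It is a sketch under stated assumptions: the data frame below is made up, base R's aggregate() stands in for the reshape2 melt and dcast calls used by the script so that no extra packages are needed, and the file is written to a temporary path rather than to tidy.txt in the working directory.

```r
# Made-up stand-in for data_all with two selected variables.
data_all <- data.frame(
  subject  = c(1L, 1L, 1L, 2L, 2L, 2L),
  activity = c("WALKING", "WALKING", "SITTING",
               "WALKING", "SITTING", "SITTING"),
  tBodyAcc.mean...X = c(0.2, 0.4, 0.1, 0.6, 0.3, 0.5),
  tBodyAcc.std...X  = c(-0.9, -0.7, -0.8, -0.5, -0.6, -0.4),
  stringsAsFactors = FALSE
)

# Mean of every variable for each subject and activity combination.
# reshape2 equivalent (not run here):
#   dcast(melt(data_all, id.vars = c("subject", "activity")),
#         subject + activity ~ variable, mean)
tidy <- aggregate(. ~ subject + activity, data = data_all, FUN = mean)

# Write a space-delimited file without row names, as in steps 6b and 6c.
out_file <- tempfile(fileext = ".txt")
write.table(tidy, out_file, row.names = FALSE)

# Read the file back; header = TRUE restores the column names.
tidy_back <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```

The result has one row per subject and activity pair and one column per averaged variable.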