ftm610/Coursera_Getting_Cleaning_Data

run_analysis.R

Developed on:

Windows 7 Professional 64-bit SP1
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
RStudio version 0.98.1087

R packages required:

dplyr version 0.3.0.2
reshape2 version 1.4.1

To run this code, first update the path (path_wd) to your preferred working directory.

This code performs the following actions:

  1. Downloads and extracts raw zipped data to a user-defined working directory
    a. Downloads raw zipped data to working directory from web: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
    b. Extracts files to working directory.
    c. Records date and time of download to the "dateDownloaded" variable.
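
A minimal sketch of step 1, assuming base R only. The function name download_and_extract and the zip file name dataset.zip are hypothetical and are not taken from run_analysis.R; only the URL, path_wd and dateDownloaded come from the description above.

```r
# Hypothetical helper: downloads the zipped data, extracts it into path_wd,
# and returns the time of download.
download_and_extract <- function(path_wd,
                                 zip_name = "dataset.zip") {  # hypothetical file name
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  dir.create(path_wd, showWarnings = FALSE, recursive = TRUE)
  zip_file <- file.path(path_wd, zip_name)

  # mode = "wb" keeps the binary zip file intact on Windows.
  download.file(url, destfile = zip_file, mode = "wb")
  unzip(zip_file, exdir = path_wd)

  # Record the date and time of download.
  Sys.time()
}

# Usage (requires network access):
# dateDownloaded <- download_and_extract(path_wd = "C:/your/working/directory")
```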

  2. Merges the training and the test sets to create one data set
    a. Reads in, formats and combines the following common, test and train text files: activity_labels, features, subject_test, X_test, y_test, subject_train, X_train, y_train.
    b. Subject data (integer: 1-30) from subject_test and subject_train is assigned to "subject" variable.
    c. Dataset source (character: test or train) is recorded in the "dataset" variable.
    d. Dataset, Subject, Activity (Y_data) and All Other Variables (X_data) are combined with cbind() for the test and train datasets independently. The test and train datasets are then combined into a single dataset using rbind().
    e. Generates a single tidy data frame "data_all" with dataset, subject, activity (character: activity_label) and the variables selected in Step 3 for further analysis. Each row is a single observation and each column is a single variable.
    *See Also: 4a, 5a and code comments for more detail.
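
The cbind()/rbind() pattern in step 2d can be sketched as follows. The small data frames are toy stand-ins for the text files that the script reads in; their values and the column names V1, V2 and activity_id are assumptions for illustration only.

```r
# Toy stand-ins for subject_test, y_test, X_test and their train counterparts.
subject_test  <- data.frame(subject = c(2L, 4L))
y_test        <- data.frame(activity_id = c(1L, 5L))
X_test        <- data.frame(V1 = c(0.25, 0.31), V2 = c(-0.98, -0.97))

subject_train <- data.frame(subject = c(1L, 3L, 5L))
y_train       <- data.frame(activity_id = c(2L, 3L, 6L))
X_train       <- data.frame(V1 = c(0.28, 0.27, 0.29), V2 = c(-0.99, -0.96, -0.95))

# Combine columns within each dataset; the scalar "test"/"train" is recycled
# to every row and records the dataset source.
test  <- cbind(dataset = "test",  subject_test,  y_test,  X_test)
train <- cbind(dataset = "train", subject_train, y_train, X_train)

# Stack the two datasets into one.
combined <- rbind(test, train)
```

Because both data frames share the same column names in the same order, rbind() lines the columns up directly.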

  3. Extracts only the measurements on the mean and standard deviation for each measurement
    a. Selects only variables whose names in the original features.txt file contain mean() or std(); excludes the "freq" and "angle" variables.
    The select statement can be modified to return other or additional variables.
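
One way to express the selection rule in step 3a is a regular expression on the original feature names. This is a base R sketch using grepl(), not the select statement from run_analysis.R; the five feature names are a hand-picked sample in the style of features.txt.

```r
# Sample feature names in the style of features.txt.
features <- c("tBodyAcc-mean()-X",
              "tBodyAcc-std()-X",
              "tBodyAcc-mad()-X",
              "fBodyAcc-meanFreq()-X",
              "angle(tBodyAccMean,gravity)")

# Match the literal strings "mean()" and "std()". Requiring the parentheses
# directly after "mean" leaves out the meanFreq() and angle(...) variables.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- features[keep]
```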

  4. Uses descriptive activity names to name the activities in the data set
    a. Maps the integer codes in the y_test and y_train files to the descriptive labels in activity_labels.txt and stores them in the "activity" variable. The join is on activity_id (integer).
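
The lookup in step 4a can be sketched with match() in base R. This is a stand-in for whatever join the script itself performs; the data frame y and its three codes are made up for illustration, while the six labels are those listed in activity_labels.txt.

```r
# Lookup table equivalent to activity_labels.txt.
activity_labels <- data.frame(
  activity_id = 1:6,
  activity    = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                  "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the integer codes read from y_test / y_train.
y <- data.frame(activity_id = c(5L, 1L, 6L))

# match() finds the row of the lookup table for each code, preserving row order.
y$activity <- activity_labels$activity[match(y$activity_id,
                                             activity_labels$activity_id)]
```

Unlike merge(), which sorts by the key by default, this keeps the observations in their original order, which matters when the labels are later bound column-wise to the subject and measurement data.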

  5. Appropriately labels the data set with descriptive variable names.
    a. Assigns variable names to the test and train data frames (X_test, X_train) from the downloaded features.txt file. features.txt contains duplicate and invalid variable names, so unique (and valid) variable names are generated using make.names with unique = TRUE.
    b. Original and converted variable names are written to the codebook.txt file generated by this code for comparison to raw data README and features_info files.
    *See Also: 6c & codebook.md for more detail
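
The effect of make.names with unique = TRUE (step 5a) can be seen on a few names in the style of features.txt. The names name_map, original and converted are hypothetical; the actual layout of codebook.txt may differ.

```r
# Sample raw names; the last two are deliberate duplicates.
raw_names <- c("tBodyAcc-mean()-X",
               "fBodyAcc-bandsEnergy()-1,8",
               "fBodyAcc-bandsEnergy()-1,8")

# Invalid characters become "." and duplicates receive a numeric suffix.
valid_names <- make.names(raw_names, unique = TRUE)

# Side-by-side table of original and converted names.
name_map <- data.frame(original = raw_names,
                       converted = valid_names,
                       stringsAsFactors = FALSE)
```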

  6. Creates a second, independent tidy data set with the mean of each variable for each activity and subject combination.
    a. Calculates the mean value of each variable by activity and subject using reshape2: melt and dcast.
    b. Writes out a space-delimited text file "tidy.txt" using write.table with row.names = FALSE.
    c. Writes out a space-delimited text file "codebook.txt" (only includes selected variables) using write.table with row.names = FALSE.
    d. Code to read the output file back into R is included (commented out at the end of the script).
    e. For a description of the variables, see 3a, 5b and codebook.md.
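
The summary and round trip in step 6 can be sketched as follows. Note the substitution: the script uses reshape2 (melt and dcast), whereas this sketch uses base R aggregate() so that it runs without extra packages. The toy data frame and the temporary output path are assumptions for illustration.

```r
# Toy stand-in for the selected columns of data_all.
data_all <- data.frame(
  subject  = c(1L, 1L, 1L, 2L),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  tBodyAcc.mean...X = c(0.25, 0.75, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Mean of every remaining variable for each subject/activity combination.
tidy <- aggregate(. ~ subject + activity, data = data_all, FUN = mean)

# Write a space-delimited file without row names, then read it back.
out_file <- file.path(tempdir(), "tidy.txt")
write.table(tidy, out_file, row.names = FALSE)
tidy_check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```

Reading with header = TRUE is what restores the column names; without it, the header line would be treated as a row of data.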

Repo for Course Project of Coursera: Getting and Cleaning Data
