run_analysis.R

Developed on:

Windows 7 Professional 64-bit SP1
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
RStudio version 0.98.1087

R packages required:

dplyr version 0.3.0.2
reshape2 version 1.4.1

To run this code, first update the path (path_wd) to your preferred working directory.

This code performs the following actions:

  1. Downloads and extracts raw zipped data to a user-defined working directory
    a. Downloads raw zipped data to working directory from web: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
    b. Extracts files to working directory.
    c. Records date and time of download to the "dateDownloaded" variable.
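Step 1 could be sketched roughly as below. This is a minimal illustration rather than the script's actual code: the helper name `download_and_extract` and the `zip_name` default are assumptions made for the example, while `path_wd`, `dateDownloaded` and the URL come from the description above.

```r
# Hypothetical helper illustrating step 1; its name and the zip_name
# default are assumptions, not taken from run_analysis.R.
download_and_extract <- function(url, path_wd, zip_name = "dataset.zip") {
  # Create the working directory if it does not exist yet
  if (!file.exists(path_wd)) dir.create(path_wd, recursive = TRUE)
  zip_path <- file.path(path_wd, zip_name)
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  # Extract all files into the working directory
  unzip(zip_path, exdir = path_wd)
  # Return the date and time so the caller can keep it as dateDownloaded
  date()
}

# Example call (commented out because it needs network access):
# data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# dateDownloaded <- download_and_extract(data_url, path_wd = "C:/your/working/directory")
```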

  2. Merges the training and the test sets to create one data set
    a. Reads in, formats and combines the following common, test and train text files: activity_labels, features, subject_test, X_test, y_test, subject_train, X_train, y_train.
    b. Subject data (integer: 1-30) from subject_test and subject_train is assigned to "subject" variable.
    c. Dataset source (character: test or train) is recorded in the "dataset" variable.
    d. Dataset, Subject, Activity (Y_data) and All Other Variables (X_data) are combined with cbind() for the test and train datasets independently. The test and train datasets are then combined into a single dataset using rbind().
    e. Generates a single tidy data frame "data_all" with dataset, subject, activity (character: activity_label) and the variables selected in Step 3 for further analysis. Each row is a single observation, each column is a single variable.
    *See Also: 4a, 5a and code comments for more detail
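The cbind()/rbind() pattern in step 2d can be illustrated with toy data. The values and the two-column X tables below are made up for the example; only the object names follow the file names listed in 2a, and the real script may build these frames differently.

```r
# Toy stand-ins for the files read in step 2a (values are made up).
subject_test  <- data.frame(subject = c(2L, 4L))
y_test        <- data.frame(activity_id = c(5L, 1L))
X_test        <- data.frame(V1 = c(0.25, 0.28), V2 = c(-0.02, -0.01))
subject_train <- data.frame(subject = c(1L, 3L, 5L))
y_train       <- data.frame(activity_id = c(4L, 6L, 2L))
X_train       <- data.frame(V1 = c(0.27, 0.29, 0.26), V2 = c(-0.03, 0.00, -0.02))

# Column-bind each set on its own, tagging every row with its source.
test  <- cbind(dataset = "test",  subject_test,  y_test,  X_test)
train <- cbind(dataset = "train", subject_train, y_train, X_train)

# Stack the two sets into a single data frame.
data_all <- rbind(test, train)
```

Each row of `data_all` is then one observation, with the source, subject and activity id alongside the measurement columns.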

  3. Extracts only the measurements on the mean and standard deviation for each measurement
    a. Selects only the variables whose names contain mean() or std() in the original features.txt file; the "freq" and "angle" variables are excluded.
    The select statement can be modified to return other or additional variables.
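The selection rule in step 3 can be sketched with a regular expression on the original feature names. This is a base R illustration using `grepl()`; the script's own select statement may be written differently. The four names are examples of the kinds of entries found in features.txt.

```r
# A handful of feature names in the style of features.txt
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")

# Match the literal strings "mean()" or "std()"; the parentheses are
# escaped so that meanFreq() and angle(...) names do not match.
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- features[keep]
```

Changing the pattern (for example, adding `|meanFreq\\(\\)`) is one way to return additional variables.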

  4. Uses descriptive activity names to name the activities in the data set
    a. Maps the integer values in the y_test and y_train files to the descriptive labels in activity_labels.txt and stores the result in the "activity" variable. Joined on activity_id (integer).
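A minimal sketch of the label lookup in step 4, assuming a lookup table with columns `activity_id` and `activity`. The `y_data` values are made up, and `match()` is used here as a base R stand-in for the join; the script itself may use a different join function.

```r
# Lookup table in the shape of activity_labels.txt
activity_labels <- data.frame(
  activity_id = 1:6,
  activity = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
               "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy activity ids standing in for y_test / y_train
y_data <- data.frame(activity_id = c(5L, 1L, 4L, 6L, 2L))

# Look up each id in the label table. match() keeps the original row
# order, whereas merge() would sort the result by the join key.
y_data$activity <- activity_labels$activity[
  match(y_data$activity_id, activity_labels$activity_id)
]
```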

  5. Appropriately labels the data set with descriptive variable names.
    a. Assigns variable names to the test and train data frames (X_test, X_train) from the downloaded features.txt file. The features.txt file contains duplicate and invalid variable names, so unique (and valid) variable names are generated using make.names with unique = TRUE.
    b. Original and converted variable names are written to the codebook.txt file generated by this code for comparison to raw data README and features_info files.
    *See Also: 6c & codebook.md for more detail
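The effect of make.names with unique = TRUE in step 5a can be seen on a few names in the style of features.txt. The variable names `original` and `converted` are chosen for this example only.

```r
# Names containing characters that are invalid in R, plus one duplicate
original <- c("tBodyAcc-mean()-X",
              "fBodyAcc-bandsEnergy()-1,8",
              "fBodyAcc-bandsEnergy()-1,8")

# Invalid characters become ".", and duplicates get a numeric suffix
converted <- make.names(original, unique = TRUE)

# Side-by-side comparison, similar in spirit to what codebook.txt records
name_map <- data.frame(original = original, converted = converted,
                       stringsAsFactors = FALSE)
print(name_map)
# converted is:
#   "tBodyAcc.mean...X"
#   "fBodyAcc.bandsEnergy...1.8"
#   "fBodyAcc.bandsEnergy...1.8.1"
```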

  6. Creates a second, independent tidy data set with the mean of each variable for each activity and subject combination.
    a. Calculates the mean value of each variable by activity and subject using reshape2: melt and dcast
    b. Writes out a space delimited text file "tidy.txt" using write.table with row.names = FALSE
    c. Writes out a space delimited text file "codebook.txt" (only includes selected variables) using write.table with row.names = FALSE
    d. Code to read the output file back into R is included (commented out at the end of the script).
    e. For a description of the variables, see 3a, 5b & codebook.md
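Step 6 can be sketched with reshape2 on toy data. The data frame `data_sel` and its values are made up for illustration, and a temporary file stands in for "tidy.txt" so the example does not write into the working directory. The read.table call is one plausible way to read the file back, not necessarily the commented-out code in the script.

```r
library(reshape2)

# Toy input: two selected variables for a few subject/activity rows
data_sel <- data.frame(
  subject  = c(1L, 1L, 1L, 2L, 2L),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  tBodyAcc.mean...X = c(0.2, 0.4, 0.1, 0.3, 0.5),
  tBodyAcc.std...X  = c(-0.9, -0.7, -0.5, -0.8, -0.6),
  stringsAsFactors = FALSE
)

# Long format: one row per subject, activity, variable and value
molten <- melt(data_sel, id.vars = c("subject", "activity"))

# Wide format again, averaging each variable per subject and activity
tidy <- dcast(molten, subject + activity ~ variable,
              fun.aggregate = mean, value.var = "value")

# Write a space-delimited file without row names, then read it back
out_file <- tempfile(fileext = ".txt")
write.table(tidy, file = out_file, row.names = FALSE)
tidy_check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```

With header = TRUE the column names written by write.table are restored, so `tidy_check` has the same columns as `tidy`.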