Getting-and-Cleaning-Data

The "Data Science" Specialization

Course Instructions:

You should create one R script called run_analysis.R that does the following.
Merges the training and the test sets to create one data set.
Extracts only the measurements on the mean and standard deviation for each measurement.
Uses descriptive activity names to name the activities in the data set.
Appropriately labels the data set with descriptive activity names.
Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
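Naming the activities amounts to a lookup from numeric activity codes to descriptive names. Below is a minimal R sketch of one common approach. It assumes a hypothetical lookup table, and the codes and names in it are illustrative, not read from the real dataset:

```r
# Hypothetical lookup table: numeric activity code -> descriptive name
activity_labels <- data.frame(
  code = 1:3,
  name = c("walking", "sitting", "standing"),
  stringsAsFactors = FALSE
)

# Toy activity codes, as they might appear in a Y file
y <- c(2, 1, 3, 3)

# A factor whose levels are the codes and whose labels are the names
activity <- factor(y, levels = activity_labels$code, labels = activity_labels$name)
print(activity)
```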

Documentation for run_analysis.R

Initial Housekeeping

Data labels are read into R and scrubbed according to tidy data standards (lowercase, no punctuation; see the sample column names below).
Subject vectors are read into R and scrubbed.
Data sets X (raw and calculated values) and Y (activity) are read into R.
The subject vectors are column-bound (cbind) to data set X for both the test and training sets.
The activity vector (Y) is likewise column-bound to data set X for both the test and training sets.
rbind is used to "merge" the two data sets. Row-binding was chosen for simplicity: both sets have the same columns, so a true key-based merge was unnecessary and would have been comparatively costly here.
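These housekeeping steps can be sketched in R on small in-memory toy data instead of the real files. The scrubbing rule (lowercase, then strip every non-alphanumeric character) is an assumption chosen to reproduce sample names such as tbodyaccmeanx. The helper make_split() and all values are hypothetical:

```r
# Hypothetical raw feature names in the style of the dataset's labels
raw_labels <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")

# Assumed scrub: lowercase, then drop everything that is not a letter or digit
clean_labels <- gsub("[^a-z0-9]", "", tolower(raw_labels))

# Hypothetical helper building one split (test or training) from toy values;
# the real script would read X, Y and subject data from disk instead
make_split <- function(n_rows, subjects, activities) {
  x <- as.data.frame(matrix(runif(n_rows * length(clean_labels)), nrow = n_rows))
  names(x) <- clean_labels
  # Column-bind the subject vector and the activity vector (Y) onto X
  cbind(subject = subjects, activity = activities, x)
}

test  <- make_split(2, subjects = c(2, 2), activities = c(1, 3))
train <- make_split(3, subjects = c(1, 1, 3), activities = c(2, 2, 5))

# Both splits share identical columns, so stacking the rows is enough
merged <- rbind(train, test)
str(merged)
```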

Part 1:

The features vector is searched with grep for names containing "mean" or "std".
The matching entries are used to select the corresponding columns of the merged data set.
The resulting data frame is written to a tab-separated .txt file.
The head of this output, containing 1000 rows of data, is included in the GitHub repository (part1_output.head.txt).
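The grep-and-select steps above might look like this in R, again on hypothetical toy data. Matching with ignore.case = TRUE is an assumption. On the real feature names it would also catch variants such as meanFreq, so tighten the pattern if only mean() and std() are wanted. The output path under tempdir() is a stand-in:

```r
# Toy merged data set with hypothetical values (subject, activity, features)
merged <- data.frame(
  subject = c(1, 2),
  activity = c(1, 1),
  tbodyaccmeanx = c(0.28, 0.27),
  tbodyaccstdx = c(-0.99, -0.98),
  tbodyaccenergyx = c(-0.93, -0.99)
)

# Feature names are every column except the two identifier columns
features <- setdiff(names(merged), c("subject", "activity"))

# Keep feature names containing "mean" or "std", ignoring case
keep <- grep("mean|std", features, ignore.case = TRUE, value = TRUE)

# Select only the matching measurement columns; drop = FALSE keeps a data frame
part1 <- merged[, keep, drop = FALSE]

# Write a tab-separated text file with a header row and no row names
out_file <- file.path(tempdir(), "part1_output.txt")
write.table(part1, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

To carry the identifiers along with the measurements, select c("subject", "activity", keep) instead.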

File details of part1_output.txt:

size: 10.1 MB
rows: 10299
cols: 86
sample col names: tbodyaccmeanx, tbodyaccstdx, etc.
Only part1_output_head.txt (100 lines) is included in the repository, due to file-size limits.

Part 2:

Using the original merged data frame, the rows are split by subject number and the average of each variable is calculated for each subject.
The resulting data frame is written to a tab-separated .txt file.
The full output file (part2_output.txt) is included in the GitHub repository.
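A minimal R sketch of the per-subject averaging, using split() and colMeans() on hypothetical toy data. The subject and activity columns are set aside before averaging so that only measurement variables are summarised. The output path under tempdir() is a stand-in:

```r
# Toy merged data set with hypothetical values
merged <- data.frame(
  subject = c(1, 1, 2, 2, 2),
  activity = c(1, 2, 1, 1, 3),
  tbodyaccmeanx = c(0.2, 0.4, 0.1, 0.3, 0.5),
  tbodyaccmeany = c(-0.1, -0.3, 0.0, 0.2, 0.4)
)

# Measurement columns only: averaging subject or activity codes is meaningless
measures <- merged[, setdiff(names(merged), c("subject", "activity")), drop = FALSE]

# Split the rows by subject, then average every column within each piece
by_subject <- split(measures, merged$subject)
means <- t(sapply(by_subject, colMeans))

# One row per subject; row.names = NULL discards the matrix row names
part2 <- data.frame(subject = as.numeric(rownames(means)), means, row.names = NULL)

# Tab-separated output with a header row and no row names
out_file <- file.path(tempdir(), "part2_output.txt")
write.table(part2, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(part2)
```

aggregate(. ~ subject, data = merged[, -2], FUN = mean) is an equivalent one-liner. Grouping on both columns instead (. ~ subject + activity) would give one row per subject and activity pair.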

File details of part2_output.txt:

size: 321.9 KB
rows: 30
cols: 561
sample col names: tbodyaccmeanx, tbodyaccmeany, etc.