The "Data Science" Specialization
- You should create one R script called run_analysis.R that does the following.
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set
- Appropriately labels the data set with descriptive variable names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
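- Data labels (the feature names) are read into R and scrubbed according to tidy data standards: lowercased and stripped of punctuation, so that tBodyAcc-mean()-X becomes tbodyaccmeanx.
- Subject vectors are read into R and scrubbed.
- Data sets for X (raw and calculated values) and Y (activity) are read into R.
- Subject vectors are bound column-wise (cbind) to data set X for both the test and training sets.
- The activity vector (Y) is likewise bound column-wise to data set X for both the test and training sets.
- rbind is used to "merge" the two data sets. A true merge() was unnecessary here: the test and training sets have identical columns and contain different subjects, so stacking the rows is enough, while a key-based merge would be comparatively costly.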
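The merge steps above can be sketched in base R as follows. This is a minimal illustration, not the actual run_analysis.R: the toy vectors stand in for the files the real script reads (features.txt and the X, y and subject files), and the helper scrub_labels() is hypothetical, with its rules inferred from the sample column names.

```r
# Hypothetical helper: lowercase the raw feature names and strip every
# non-alphanumeric character, e.g. "tBodyAcc-mean()-X" -> "tbodyaccmeanx".
scrub_labels <- function(x) {
  gsub("[^a-z0-9]", "", tolower(x))
}

# Toy stand-ins for features.txt and the X, y and subject files; the real
# script would read these from the UCI HAR directories with read.table().
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

x_train <- data.frame(matrix(c(0.1, 0.2, 0.3,
                               0.4, 0.5, 0.6), nrow = 2, byrow = TRUE))
x_test  <- data.frame(matrix(c(0.7, 0.8, 0.9), nrow = 1))
names(x_train) <- scrub_labels(features)
names(x_test)  <- scrub_labels(features)

subject_train <- c(1, 3)   # one subject ID per row of x_train
subject_test  <- c(2)      # one subject ID per row of x_test
y_train <- c(1, 2)         # activity codes for the x_train rows
y_test  <- c(5)            # activity code for the x_test row

# Bind the subject and activity columns onto each X set (column-wise).
train <- cbind(subject = subject_train, activity = y_train, x_train)
test  <- cbind(subject = subject_test,  activity = y_test,  x_test)

# Stack the rows; rbind() on data frames matches columns by name.
merged <- rbind(train, test)
print(merged)
```

Because rbind() on data frames matches columns by name, the two sets only need identical column names; no key column is involved, which is why a full merge() is not needed for this step.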
- The features vector is searched with grep for "mean" or "std".
- The resulting subset of feature names is used to select the matching columns of the merged data set.
- This data frame is written to a tab-separated .txt file.
- The GitHub repository includes the head of this output, 1000 rows of data (part1_output.head.txt).
- size: 10.1 MB
- rows: 10299
- cols: 86
- sample col names: tbodyaccmeanx, tbodyaccstdx, etc.
- Only part1_output_head.txt is included, due to size limitations (100 lines).
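The column-selection and output steps for part 1 might look like this in base R. It is a sketch under stated assumptions: the small merged data frame is a toy stand-in with already-scrubbed labels, and the output path is hypothetical (a temporary directory rather than the repository).

```r
# Toy merged data frame (scrubbed labels, as produced by the merge step).
merged <- data.frame(subject = c(1, 3, 2), activity = c(1, 2, 5),
                     tbodyaccmeanx = c(0.1, 0.4, 0.7),
                     tbodyaccstdx  = c(0.2, 0.5, 0.8),
                     tbodyaccmaxx  = c(0.3, 0.6, 0.9))

# Feature names are every column except subject and activity.
features <- setdiff(names(merged), c("subject", "activity"))

# Keep only feature names containing "mean" or "std" (value = TRUE returns names).
mean_std_cols <- grep("mean|std", features, value = TRUE)

# Use the filtered names to select columns from the merged data set.
part1 <- merged[, c("subject", "activity", mean_std_cols)]

# Write a tab-separated .txt file (hypothetical path in a temp directory).
out_file <- file.path(tempdir(), "part1_output.txt")
write.table(part1, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

On the full UCI HAR feature list, a plain "mean" pattern also matches the meanFreq features and, once labels are lowercased, the angle(...Mean...) features. Whether to keep them is a design choice; a stricter pattern such as "mean\\(\\)|std\\(\\)" applied to the raw, unscrubbed names would exclude them.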
- Using the original merged data frame, the data is split by subject number and the average of each variable is calculated for each subject.
- This data frame is written to a tab-separated .txt file.
- The GitHub repository includes the full output file (part2_output.txt).
- size: 321.9 KB
- rows: 30
- cols: 561
- sample col names: tbodyaccmeanx, tbodyaccmeany, etc.
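The split-and-average step for part 2 can be sketched with base R's split() and colMeans(). This is an illustrative assumption of how it could be done, not the script's actual code; the toy data frame and output path are hypothetical.

```r
# Toy merged data frame: two rows for subject 1, one row for subject 2.
merged <- data.frame(subject = c(1, 1, 2), activity = c(1, 2, 1),
                     tbodyaccmeanx = c(0.2, 0.4, 0.9),
                     tbodyaccstdx  = c(0.1, 0.3, 0.5))

# Measurement columns only (everything except subject and activity).
measures <- merged[, setdiff(names(merged), c("subject", "activity")), drop = FALSE]

# Split the rows by subject number, then average each variable per group.
by_subject <- split(measures, merged$subject)
# sapply() simplifies to a variables-by-subjects matrix (assumes 2+ measurement
# columns); t() turns it into one row per subject.
means <- t(sapply(by_subject, colMeans))

# The group names from split() become the subject column.
part2 <- data.frame(subject = as.numeric(rownames(means)), means, row.names = NULL)

out_file <- file.path(tempdir(), "part2_output.txt")
write.table(part2, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
print(part2)
```

A more compact base R alternative is aggregate(. ~ subject, data = merged[, -2], FUN = mean). Grouping by subject + activity instead would give one row per subject/activity pair, which is what the final requirement in the assignment list describes.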