title | author | date | output |
---|---|---|---|
README.Rmd |
Simon Filiatrault |
Monday, August 18, 2014 |
html_document |
This README explains the content of the project files for the getting and cleaning data coursera course project. The goal of this project is to:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set
- Appropriately labels the data set with descriptive variable names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
Files | Description | Details |
---|---|---|
run_analysis.R | This file is the main analysis script | It will do the 5 steps above |
CodeBook.Rmd | This is the CookBook explaining the project | Listing all steps in "cooking" the data in details |
README.Rmd | This file | Explains the files and data input / output |
README.html | This file in HTML format | In case you want to see a nice output |
CodeBook.html | The cookbook in HTML | Easier to read |
X_Merge_Data_Averages_Grouped.txt | The merged data set | Created in step 5 above |
NOTE: For the details of how the DATA was "Cooked", please open the CookBook.html
The original dataset is seperated in many files with not very descriptive column names. The main data set has over 500 columns, so this project goal is to extract part of this dataset Another goal is to add the activity and users (two seperated dataset) in the main one When this is done, we can merge the two main data set, the training one and the test one. See the CookBook.html for details.
The column names have been "renamed" to have more descriptive names.
The approch I took is to replace terms like "t" with more descriptive name "Time" for example
The downside of having a more descriptive name is that the column name can become long
So you can choose what to search&replace to suite your needs and taste. Finding the balance that work for you.
Again, details are in the CookBook.html
First you need to download the zip datasets and extract in your working directory. This will create this structure:
- UCI HAR Dataset
- test
- Inertial Signals
- train
- Inertial Signals
- test
The datasets we need are located under the test and train subfolder. The scripts in run_analysis.R is coded for this.
Enjoy!