How to utilize the run_analysis.R functions
===========
The data used for this analysis was collected from the accelerometers from the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
The data will be referred to as the UCI HAR dataset.
To obtain the raw data and its associated documentation, visit: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
The analysis requires the reshape2 library for the final steps, which melt and cast the data into its final tidy form.
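
As a minimal sketch of satisfying that dependency, the snippet below installs reshape2 only if it is missing and then loads it. It assumes a CRAN mirror is reachable; nothing here is taken from run_analysis.R itself.

```r
# Install reshape2 only when it is not already available (assumes CRAN access).
if (!requireNamespace("reshape2", quietly = TRUE)) {
  install.packages("reshape2")
}

# Attach the package so that melt() and dcast() can be called directly.
library(reshape2)
```
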
The file run_analysis.R contains the following functions (and a brief description of each):
- getTrainingData: extracts training data from the subject, x and y training files in the UCI HAR dataset
- getTestData: extracts test data from the subject, x and y test files in the UCI HAR dataset
- mergeData: merges the training and test data; calls getTrainingData and getTestData
- extractData: subsets the relevant columns from the merged data created by the mergeData function; only mean and std deviation variables kept along with identifiers
- addActivityLabel: adds readable text describing the activities in the merged dataset, utilizing activity_labels.txt from the UCI HAR data files
- createTidyData: takes the labeled dataset created by the addActivityLabel function as input; melts the data so that identifiers and measurements are separated; calculates the mean of every measurement column by activity and subject; restructures the result using the dcast function; calls exportTinyData as the final step
- exportTinyData: simply takes the data from createTidyData and writes it to a text file called tidydata.txt
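
The subset, melt and cast steps described above can be illustrated on a toy data frame. This is only a sketch of the general technique, not the code in run_analysis.R: the column names (subject_id, activity, acc_mean_x and so on) and the "mean|std" pattern are assumptions made for the example.

```r
library(reshape2)

# Toy stand-in for the merged, labeled data; all column names are hypothetical.
toy <- data.frame(
  subject_id   = c(1, 1, 2, 2),
  activity     = c("WALKING", "WALKING", "SITTING", "SITTING"),
  acc_mean_x   = c(0.2, 0.4, 0.1, 0.3),
  acc_std_x    = c(1, 3, 2, 4),
  acc_energy_x = c(9, 9, 9, 9),
  stringsAsFactors = FALSE
)

# Keep the identifiers plus any column whose name contains "mean" or "std".
ids <- c("subject_id", "activity")
is_measure <- grepl("mean|std", names(toy))
subset_data <- toy[, names(toy) %in% ids | is_measure]

# Melt: one row per (subject, activity, variable) measurement.
molten <- melt(subset_data, id.vars = ids)

# Cast: average each measurement by subject and activity, one column per variable.
tidy <- dcast(molten, subject_id + activity ~ variable, fun.aggregate = mean)
print(tidy)
```

The result has one row per subject and activity pair, with each remaining measurement column holding the mean of the original values.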
The following steps should be followed to generate the final tidy dataset:
- Create a data frame which contains a merger of the test and training datasets by assigning the output of mergeData() to an object. For example: > data <- mergeData()
- Create a subset of the data from the step above which keeps only the variables with “mean” or “std” (standard deviation) in their names as measurement variables, along with the identifiers (ActivityID and SubjectID). This is done using the extractData() function. For example: > data2 <- extractData(data)
- Add descriptive labels to the activities by merging the activity_labels.txt file with the extracted dataset using the addActivityLabel function. For example: > actData <- addActivityLabel(data2)
- Get the mean value for each measurement in the dataset by Activity and Subject using the createTidyData function. This function expects the dataset as an argument, returns the tidy data, and saves it to a file called “tidydata.txt” in the current working directory. For example: > tidyData <- createTidyData(actData)
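
Put together, the steps above might look like the following script. It is a sketch under assumptions: that run_analysis.R sits in the current working directory and that the UCI HAR data files are located wherever the script expects them. The guard variable can_run is hypothetical and only keeps the example from failing when the script is absent.

```r
# Assumed location of the script; adjust the path if it lives elsewhere.
script <- "run_analysis.R"
can_run <- file.exists(script)

if (can_run) {
  library(reshape2)                      # needed for the melt and cast steps
  source(script)                         # defines the functions listed above

  data     <- mergeData()                # merged training and test data
  data2    <- extractData(data)          # mean and std columns plus identifiers
  actData  <- addActivityLabel(data2)    # readable activity names added
  tidyData <- createTidyData(actData)    # averages; also writes tidydata.txt

  # The final step is expected to leave tidydata.txt in the working directory.
  stopifnot(file.exists("tidydata.txt"))
} else {
  message("run_analysis.R not found in ", getwd())
}
```
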