title: Getdata Assignment 1
author: Robert Hadow
date: 5 October 2015
output: html_document

The reader is referred to UCIHAR Codebook.Rmd, which contains all of the following text as well as the code and explanatory notes.

Study Design

The UCI HAR dataset described herein was generated in a series of observations of human physical activity using a smartphone's built-in accelerometer and gyroscope. The observations of thirty volunteers performing six activities were collected in Genoa in 2012.

Together, these two sensors directly measure the following at a rate of fifty observations per second:

  • linear acceleration in three dimensions
  • orientation in three dimensions

Using time series of these observations, an analyst may calculate other measures (sketched in R after the list), including:

  • axial acceleration in three dimensions
  • jerk in three dimensions (time derivative of acceleration)
  • angular jerk in three dimensions (time derivative of angular acceleration)
  • decomposition of these measures in the frequency domain (Fourier Transform)
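
As a rough illustration of these derived measures (not the laboratory's actual processing), a jerk signal and a frequency decomposition can be computed in R; the signal below is synthetic and stands in for one axis of acceleration:

```r
# Illustration only: a synthetic one-axis acceleration signal sampled at 50 Hz.
fs    <- 50                          # observations per second
t     <- seq(0, 2, by = 1 / fs)      # two seconds of samples
acc_x <- sin(2 * pi * 3 * t)         # stand-in for linear acceleration along one axis

jerk_x <- diff(acc_x) * fs           # jerk: time derivative of acceleration
spec_x <- Mod(fft(acc_x))            # magnitude of the Fourier transform (frequency domain)
```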

The data is further analyzed for measures of central tendency and error. In total, 561 measures are generated.

This data was delivered to us after the first round of calculation and analysis. It includes 18 million data points in 26 data files. There are two additional files of human-readable description. The following data files were used and reorganized here:

  • features - descriptions of summary observations data
  • X_test - summary observation data
  • X_train - summary observation data
  • subject_test - subject identification
  • subject_train - subject identification
  • y_test - exercise type
  • y_train - exercise type

Data Approach

We used the seven data files to develop, conceptually, a tidy cube of data along three dimensions:

  • 6 exercises
  • 30 subjects
  • 77 summary measures

Principles of tidy data organization limited us to presentations in two dimensions: columns, describing attributes of the data, and rows, describing particular observations. A tidy format should allow use of the data with the minimum amount of manipulation, based on the purposes for which the data will be analyzed. We designed our data set to support the following operations, listed here in decreasing order of priority:

  1. viewing as is (no manipulation)
  2. summary measures, including sum, mean, median, etc.
  3. grouping and subsetting (logical rearrangement, row-wise)
  4. reordering (physical rearrangement, row-wise)
  5. numerical manipulation (modifying or adding columns)
  6. disassembly and reassembly of the data block

Ease of Viewing

To accommodate viewing (goal 1), we put exercise and subject in the leftmost columns of the table.
Because we anticipated that users would rearrange the row order to meet their own needs, we left the rows in the order they were delivered to us: first the test data, then the training data (which, if nothing else, is also alphabetical order).

Accommodate Summary Measures

All values of a particular type, e.g. "tBodyAcc-mean", are in the same column (a "wide" or "un-stacked" layout), which accommodates summary measures. We stripped the parentheses from the column names: because they appeared in every name, they added no information, but they did add significantly to the file size.
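
A minimal sketch of the parenthesis stripping, assuming the feature names were read from features.txt into a data frame named `features` with the names in its second column:

```r
# Feature names such as "tBodyAcc-mean()-X" lose their "()" but keep everything else.
featureNames <- features[[2]]
featureNames <- gsub("()", "", featureNames, fixed = TRUE)
```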

Accommodate Grouping

Each observation appears in its own row, with all of its attributes in columns.
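
Because each row is one observation, grouping and subsetting need no reshaping. A sketch, assuming the assembled table is named `ucihar` with identifier columns `subject` and `activity`:

```r
# All rows for one activity code, and a per-subject split, without reshaping anything.
walking   <- ucihar[ucihar$activity == 1, ]
bySubject <- split(ucihar, ucihar$subject)
```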

Reordering

Observations may be reordered row-wise without other manipulation.
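
For example, a row-wise reordering by subject and then activity (column names as assumed above):

```r
# Physical rearrangement of rows; the columns are untouched.
ucihar_ordered <- ucihar[order(ucihar$subject, ucihar$activity), ]
```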

Numerical Manipulation

We chose to leave activity as a numeric attribute in the main data table, which keeps the entire table numeric. A single character column in a 79-column table, we thought, adds complexity but no information. We provide the activity codes in a separate table.
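
A sketch of the two-table arrangement; the labels come from activity_labels.txt in the raw package, while the column names here are our own:

```r
# Small lookup table for the activity codes; the main table stays entirely numeric.
activityCodes <- data.frame(
  activity = 1:6,
  label    = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
               "SITTING", "STANDING", "LAYING")
)

# Labels can be attached on demand with a merge, leaving 'ucihar' untouched.
labeled <- merge(ucihar, activityCodes, by = "activity")
```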

Wholesale Changes to Data Structure

A two-table structure is far easier to manipulate than the 26 files we started with.

Details of Data Conversion

We downloaded the raw data and reference materials according to the script described in the codebook.
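
The codebook holds the authoritative script; the idea, in outline, with a placeholder URL:

```r
# Placeholder URL -- substitute the address given with the assignment materials.
fileUrl <- "https://example.com/UCI_HAR_Dataset.zip"
download.file(fileUrl, destfile = "UCIHAR.zip", mode = "wb")
unzip("UCIHAR.zip")    # extracts the "UCI HAR Dataset" folder used below
```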

Reading Data Files into R

We read all of the data files into R. We used this opportunity to check the parameters for this conversion and to see which files were relevant.
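
A sketch of the reads for the seven files used here, assuming the default "UCI HAR Dataset" folder layout from the zip archive:

```r
features      <- read.table("UCI HAR Dataset/features.txt")
X_test        <- read.table("UCI HAR Dataset/test/X_test.txt")
X_train       <- read.table("UCI HAR Dataset/train/X_train.txt")
subject_test  <- read.table("UCI HAR Dataset/test/subject_test.txt")
subject_train <- read.table("UCI HAR Dataset/train/subject_train.txt")
y_test        <- read.table("UCI HAR Dataset/test/y_test.txt")
y_train       <- read.table("UCI HAR Dataset/train/y_train.txt")
```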

Inspection and House Cleaning

We inspected the files, made adjustments to the scripts, and dumped unnecessary files and objects.
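
The inspection and cleanup might look like this; the list of objects to keep is illustrative:

```r
# Confirm dimensions and types before going further.
dim(X_test); dim(X_train)
str(subject_test)

# Drop everything not needed for the tidy-data steps below.
keep <- c("features", "X_test", "X_train",
          "subject_test", "subject_train", "y_test", "y_train")
rm(list = setdiff(ls(), keep))
```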

Tidy Data

We created a tidy data set using the following steps (sketched in code after the list):

  • organize and apply column names to the test data set
  • organize subject data, append it to the observations
  • organize exercise data, append it to the observations
  • redact the data, eliminating unwanted columns
  • combine the test and train datasets
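
One way to carry out these steps, continuing with the objects and names assumed in the earlier sketches; the exact column filter behind the 77 retained measures is documented in the codebook, and a simple mean/std match stands in for it here:

```r
# Apply the cleaned feature names (from the earlier sketch) to both observation tables.
colnames(X_test)  <- featureNames
colnames(X_train) <- featureNames

# Append subject and activity, keeping them leftmost for easy viewing.
test  <- cbind(subject = subject_test[[1]],  activity = y_test[[1]],  X_test)
train <- cbind(subject = subject_train[[1]], activity = y_train[[1]], X_train)

# Redact: keep the identifiers plus the mean and standard-deviation measures.
wanted <- c(1, 2, grep("mean|std", colnames(test)))
ucihar <- rbind(test[, wanted], train[, wanted])   # test rows first, then train
```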

Create Codebook

We were careful not to make any assumptions about the data that were not supported by the raw data package.

Codebook

The documentation that came with the raw data contains descriptive attribute names from which a reader may surmise the origin and use of the associated data elements. For more detailed explanations, the user is referred to the Non Linear Complex Systems Laboratory, Genoa, activity [email protected].

The data dictionary that follows was generated from the UCIHAR data tables. Similarly named attributes may be used as primary keys.

Trailing digits refer to raw data not included in this set.

All data in the set are normalized to the range -1.00 to 1.00.

Summary Dataset - UCIHAR_summary

UCIHAR contains `r I(dataPoints)` measures. It has `r I(nrow(ucihar))` rows of data.

UCIHAR_summary is a summary of UCIHAR. It provides the means of the data in UCIHAR.
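
A sketch of how such a summary could be produced, assuming (this is our assumption, not stated above) that the means are taken for every measure by subject and activity; the codebook has the script actually used:

```r
# Mean of every measure for each subject/activity pair.
UCIHAR_summary <- aggregate(. ~ subject + activity, data = ucihar, FUN = mean)
```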
