GitHub - strimber/RunAnalysisCleanData: Coursera class assignment (Getting and Cleaning Data)

README for Coursera Assignment

This assignment will perform the following operations on the UCI HAR Dataset

Coursera "Getting and Cleaning Data" course assignment
Purpose: Demonstrate basic skills in data cleaning
Author: Scott Trimber
Date: 4/21/15

1) Check that the required packages are installed and available.
This script depends on the dplyr library for data manipulation. It first checks
whether the package is installed and installs it if it is not among the
installed packages.
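
A minimal sketch of such a check. The helper name ensure_package is hypothetical and is not taken from run_analysis.R; the actual script may perform the check differently.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

Calling ensure_package("dplyr") at the top of a script would then install dplyr on first use and simply attach it on later runs.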

2) Check that the data files exist in the current working directory.
PLEASE NOTE: This script does not search the directory structure created when
the original zip file is unzipped. You need to copy the following files into
the current working directory (flattening out the directory hierarchy):
"subject_train.txt",
"subject_test.txt",
"X_train.txt",
"X_test.txt",
"y_train.txt",
"y_test.txt",
"features.txt",
"activity_labels.txt"
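
One way to sketch this check, using the UCI HAR Dataset file names. The helper name missing_files is hypothetical, and the sketch only reports missing files with message(); a script could equally call stop() at that point.

```r
# The eight files expected in the (flattened) working directory.
required_files <- c("subject_train.txt", "subject_test.txt",
                    "X_train.txt", "X_test.txt",
                    "y_train.txt", "y_test.txt",
                    "features.txt", "activity_labels.txt")

# Hypothetical helper: return the required files that are not found in `dir`.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

missing <- missing_files(required_files)
if (length(missing) > 0) {
  message("Missing data files: ", paste(missing, collapse = ", "))
}
```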

3) Load the data files into separate objects for manipulation.
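
A sketch of the loading step with base R's read.table. The helper name load_har_files and the names of the list elements are assumptions made for illustration; the script may keep each table in its own variable instead.

```r
# Hypothetical helper: read the eight flattened files from `dir` into a named list.
load_har_files <- function(dir = ".") {
  read <- function(name, ...) read.table(file.path(dir, name), ...)
  list(
    subject_train = read("subject_train.txt"),
    subject_test  = read("subject_test.txt"),
    x_train       = read("X_train.txt"),
    x_test        = read("X_test.txt"),
    y_train       = read("y_train.txt"),
    y_test        = read("y_test.txt"),
    # Keep feature and activity names as character strings, not factors.
    features      = read("features.txt", stringsAsFactors = FALSE),
    activities    = read("activity_labels.txt", stringsAsFactors = FALSE)
  )
}
```

With the files in the working directory, har <- load_har_files() would give access to the tables as har$x_train, har$features, and so on.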

4) Merge the training and test data sets into one data set

Rename the table variables to something meaningful and combine the datasets into one table.
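
The merge and renaming can be sketched on toy stand-ins for the loaded tables (the real X files have 561 feature columns). All object names here are assumptions for illustration, not names confirmed from run_analysis.R.

```r
# Toy stand-ins for features.txt and the train/test tables.
features <- data.frame(id = 1:3,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)
x_train <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4), V3 = c(0.5, 0.6))
x_test  <- data.frame(V1 = 0.7, V2 = 0.8, V3 = 0.9)
subject_train <- data.frame(V1 = c(1, 3))
subject_test  <- data.frame(V1 = 2)
y_train <- data.frame(V1 = c(5, 1))
y_test  <- data.frame(V1 = 4)

# Stack training rows on top of test rows, in the same order for every table.
x_all       <- rbind(x_train, x_test)
subject_all <- rbind(subject_train, subject_test)
y_all       <- rbind(y_train, y_test)

# Replace the default V1, V2, ... names with meaningful ones.
names(x_all)       <- features$name
names(subject_all) <- "subjectId"
names(y_all)       <- "activity"
```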

5) Filter the combined dataset to retain only the mean and standard deviation columns, and add variables for subjectId and activity.
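
A sketch of the column filter on toy data. The pattern below keeps only names containing mean() or std(); whether the script also keeps related columns such as meanFreq() is not stated here, so the pattern is an assumption.

```r
# Toy combined measurements with feature-style column names.
x_all <- data.frame(a = c(0.1, 0.2), b = c(0.3, 0.4), c = c(0.5, 0.6))
names(x_all) <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
subject_all <- data.frame(subjectId = c(1, 2))
y_all       <- data.frame(activity = c(5, 1))

# Assumed pattern: keep columns whose names contain "mean()" or "std()".
keep <- grepl("mean\\(\\)|std\\(\\)", names(x_all))

# Put the subject and activity variables in front of the retained measurements.
selected <- cbind(subject_all, y_all, x_all[, keep, drop = FALSE])
```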

6) Relabel the activities with descriptive names in place of their original "int" values.
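
One possible way to do the relabelling is to turn the integer codes into a factor whose labels come from activity_labels.txt. The sketch below hard-codes the six labels of the UCI HAR Dataset in a toy table; the object names are assumptions for illustration.

```r
# Stand-in for activity_labels.txt: integer code and descriptive label.
activity_labels <- data.frame(
  id = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy data with the original integer activity codes.
selected <- data.frame(subjectId = c(1, 1, 2), activity = c(5L, 1L, 6L))

# Map each integer code to its descriptive label.
selected$activity <- factor(selected$activity,
                            levels = activity_labels$id,
                            labels = activity_labels$label)
```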

7) Create a "tidy" dataset with the averages for the values grouped by the subject ID and the activity.
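
The grouping and averaging can be sketched in base R with aggregate(), shown here on toy data. This is a stand-in technique: the script depends on dplyr, where group_by() and summarise() would play the same role. Object names are assumptions for illustration.

```r
# Toy labelled data: several observations per subject and activity.
labelled <- data.frame(
  subjectId = c(1, 1, 1, 2),
  activity  = c("WALKING", "WALKING", "SITTING", "WALKING"),
  "tBodyAcc-mean()-X" = c(0.2, 0.4, 0.1, 0.6),
  "tBodyAcc-std()-X"  = c(1, 3, 5, 7),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column except the two grouping variables is a measurement.
measure_cols <- setdiff(names(labelled), c("subjectId", "activity"))

# One row per (subjectId, activity) pair, holding the mean of each measurement.
tidy <- aggregate(labelled[measure_cols],
                  by = list(subjectId = labelled$subjectId,
                            activity  = labelled$activity),
                  FUN = mean)
```

In this toy input, subject 1 has two WALKING rows, so its row in tidy holds the average of those two observations for each measurement column.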

8) Write out the new "tidy" dataset to the current working directory as a text file (tidy.txt) with row.names = FALSE.
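
A sketch of the final write step on a toy table. The column meanValue is a placeholder for illustration; the real file has one column per retained measurement.

```r
# Toy tidy table standing in for the real result.
tidy <- data.frame(subjectId = c(1, 2),
                   activity  = c("WALKING", "SITTING"),
                   meanValue = c(0.3, 0.1))

# Write to the current working directory without row names.
out_file <- "tidy.txt"
write.table(tidy, file = out_file, row.names = FALSE)

# The file can be read back with its header row for a quick check.
check <- read.table(out_file, header = TRUE)
```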