GitHub - jcfoster02/TidyDataAssignment: Getting & Cleaning Data Assignment

---
#README
###author: Julie Foster
###date: September 4, 2016

## Project summary:

The purpose of this project is to demonstrate the process of collecting, working with and cleaning a dataset.

The dataset is the Human Activity Recognition Using Smartphones dataset from the University of California, Irvine (UCI) Machine Learning Repository. Data processing is performed by the run_analysis.R script, while CodeBook.md serves as a detailed Code Book for creating the output tidy dataset: avg_dataset.

****************
###Table of contents
* README.md (this file)
* run_analysis.R
* CodeBook.md

*****************

##Data Description
###The raw dataset
Details about the Human Activity Recognition Using Smartphones (HARUS) dataset [1] experiments are available from the UCI Machine Learning Repository as well as from the 'README.txt' file that accompanies the measurements. The data were collected in experiments that recorded 30 volunteers (Subject), aged 19-48, performing activities of daily living (Activity) while carrying a waist-mounted Samsung Galaxy S II smartphone. The smartphone was equipped with two sensors, an accelerometer and a gyroscope, that recorded, respectively, the 3-axial linear acceleration and 3-axial angular velocity signals of the subject. The experiments were video-recorded so that the data could be labeled manually. The measurements were randomly partitioned into two subsets: Training (70%) and Test (30%).
###The tidy dataset
The HARUS raw data files used to create the tidy set "avg_dataset" are summarized in the accompanying CodeBook.md. Data processing performed by run_analysis.R includes: combining information from the data files, subsetting to retain only the mean and standard deviation variables, and aggregating these variables as averages per Subject and per Activity.

The "avg_dataset" follows the tidy data principles:
* Each variable is stored in one column.
* Each row holds a different observation of those variables.
* The table contains only one "kind" of variable.
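
One practical consequence of the structure above is that each Subject/Activity pair should appear in exactly one row. The sketch below is a minimal way to check that in base R; the helper `has_unique_keys`, the toy table, and its `AvgValue` column are hypothetical and not part of run_analysis.R.

```r
# Hypothetical helper: TRUE when every combination of the key columns
# appears in exactly one row of the data frame.
has_unique_keys <- function(df, keys) {
  !any(duplicated(df[, keys, drop = FALSE]))
}

# Toy table shaped like "avg_dataset" (values are made up for illustration).
toy <- data.frame(
  Subject  = c(1, 1, 2),
  Activity = c("SITTING", "WALKING", "SITTING"),
  AvgValue = c(0.11, 0.27, 0.09)
)

# Every Subject/Activity pair occurs once in the toy table.
ok <- has_unique_keys(toy, c("Subject", "Activity"))

# Appending a copy of the first row repeats a pair, so the check fails.
not_ok <- has_unique_keys(rbind(toy, toy[1, ]), c("Subject", "Activity"))
```

The same helper could be pointed at the real table once it has been read into a data frame, assuming its key columns are named as in this sketch.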

##run_analysis.R:
The "run_analysis.R" script processes the HARUS raw data to create the "avg_dataset" tidy dataset. The latter dataset is saved to the "avg_dataset.txt" file. The script:
* Downloads the raw dataset archive (.zip) file to the local working directory.
* Reads in the datasets, renames the columns, and merges them into a single dataset, "largeData".
* Extracts the mean measurements into "meanData" and the standard deviation measurements into "stdData", and merges them into the dataset "meanstdData".
* Renames the columns with descriptive variable names.
* Creates a second, independent tidy dataset, "avg_dataset", based on the average of each variable for each subject and each activity.
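
The steps above can be sketched on a small made-up table. This is not the code from run_analysis.R: the toy values, the column names, the renaming rules, and the output format are assumptions chosen for illustration, the download step is left out, and the mean and standard deviation columns are selected in a single pass instead of through separate "meanData" and "stdData" tables.

```r
# Toy stand-ins for the training and test tables; the real HARUS files are
# not read here, and these column names are illustrative assumptions.
train <- data.frame(
  Subject  = c(1, 1, 2),
  Activity = c("WALKING", "WALKING", "SITTING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1),
  `tBodyAcc-std()-X`  = c(-0.9, -0.7, -0.5),
  `tBodyAcc-max()-X`  = c(0.8, 0.9, 0.3),
  check.names = FALSE
)
test <- data.frame(
  Subject  = c(2, 3),
  Activity = c("SITTING", "WALKING"),
  `tBodyAcc-mean()-X` = c(0.3, 0.5),
  `tBodyAcc-std()-X`  = c(-0.3, -0.6),
  `tBodyAcc-max()-X`  = c(0.4, 0.7),
  check.names = FALSE
)

# Step 1: stack the two partitions into one table.
largeData <- rbind(train, test)

# Step 2: keep the identifier columns plus the mean and std measurements.
keep <- grepl("mean\\(\\)|std\\(\\)", names(largeData))
meanstdData <- largeData[, c("Subject", "Activity", names(largeData)[keep])]

# Step 3: replace terse feature names with more descriptive ones
# (these particular substitutions are assumptions, not the script's rules).
names(meanstdData) <- gsub("^t", "Time", names(meanstdData))
names(meanstdData) <- gsub("-mean\\(\\)", "Mean", names(meanstdData))
names(meanstdData) <- gsub("-std\\(\\)", "Std", names(meanstdData))
names(meanstdData) <- gsub("-", "", names(meanstdData))

# Step 4: average every measurement per Subject and per Activity.
avg_dataset <- aggregate(. ~ Subject + Activity, data = meanstdData, FUN = mean)

# Step 5: save the result and read it back; a temporary path is used so the
# sketch does not overwrite a real "avg_dataset.txt".
out_file <- tempfile(fileext = ".txt")
write.table(avg_dataset, out_file, row.names = FALSE)
reloaded <- read.table(out_file, header = TRUE)
```

With this toy input, "avg_dataset" has one row per Subject/Activity pair that occurs in the data, and the `max()` column is dropped because it is neither a mean nor a standard deviation.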

##CodeBook.md:
The Code Book describes the steps for processing the raw data into the tidy dataset "avg_dataset". It summarizes the raw data files used by run_analysis.R, as well as the variables, their units, and a data dictionary.