Skip to content

jcfoster02/TidyDataAssignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

---
#README
###author: Julie Foster
###date: September 4, 2016



## Project summary: 

The purpose of this project is to demonstrate the process of collecting, working with and cleaning a dataset.  

The dataset is from the Human Activity Recognition Using Smartphones at the University of California, Irvine, UCI, Machine Learning Repository. Data processing is achieved by means of the run_analysis.R script while CoodBook.md serves as a detailed Code Book for creating on output tidy dataset: avg_dataset.
 
****************
###Table of contents    
*	README.md (this file)  
*	run_analysis.R
*	CodeBook.md     

*****************

##Data Description
###The raw dataset
Details about The Human Activity Recognition Using Smartphones Dataset, HARUS [1] experiments are available using this link as well as from the measurements associated 'README.txt' file. The database was collected in experiments using recordings of 30 volunteer persons (Subject), aged [19-48], performing daily-living activities (Activity) while carrying a waist-mounted Samsung Galaxy S II smartphone. The smartphone was equipped with two sensors: an accelerometer and a gyroscope that recorded, respectively, the 3-axial linear acceleration and 3-axial angular velocity signals of the subject. The experiments were video-recorded to label the data manually. The measurements were randomly partitioned into two sub-sets: Training (70%) and Test (30%).
###The tidy dataset
The HARUS raw data files used to create the tidy set "avg_dataset" are summarized in the accompanying CodeBook.md. Data processing performed by run_analysis.R includes: grouping information from data files, subsetting to retain only a set of variables, and aggregating these variables as averages per Subject and per Activity.

The "avg_dataset" contains: 
*	Each variable stored in one column.
*	Each row provides a different observation of that variable.
*	The table produces only one "kind" of variables.

##run_analysis.R:
The "run_analysis.R" script processes the HARUS raw data to create the "avg_dataset" tidy dataset. The latter dataset is saved to the "avg_dataset.txt" file.  The script:
*	Downloads the raw dataset archive (.zip) file to the local working directory.
*	Reads in the datasets, renames the columns, and merges them into the dataset,  "largeData".
*	Extracts the mean measurements into "meanData" and std deviation measurements into "stdData", and merges them into the dataset, "meanstdData".
*	Renames the columns with descriptive variable names.
*	Creates a second , independent tidy dataset, "avg_dataset", based on the average  of each variable for each subject and each activity.

##CodeBook.md:
The Code Book contains the steps for processing the raw data into the tidy dataset "avg_dataset". It summarizes the raw data files employed by run_analysis.R as well as the variables units and dictionary.


About

Getting & Cleaning Data Assignment

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages