jcfoster02/TidyDataAssignment
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
--- #README ###author: Julie Foster ###date: September 4, 2016 ## Project summary: The purpose of this project is to demonstrate the process of collecting, working with and cleaning a dataset. The dataset is from the Human Activity Recognition Using Smartphones at the University of California, Irvine, UCI, Machine Learning Repository. Data processing is achieved by means of the run_analysis.R script while CoodBook.md serves as a detailed Code Book for creating on output tidy dataset: avg_dataset. **************** ###Table of contents * README.md (this file) * run_analysis.R * CodeBook.md ***************** ##Data Description ###The raw dataset Details about The Human Activity Recognition Using Smartphones Dataset, HARUS [1] experiments are available using this link as well as from the measurements associated 'README.txt' file. The database was collected in experiments using recordings of 30 volunteer persons (Subject), aged [19-48], performing daily-living activities (Activity) while carrying a waist-mounted Samsung Galaxy S II smartphone. The smartphone was equipped with two sensors: an accelerometer and a gyroscope that recorded, respectively, the 3-axial linear acceleration and 3-axial angular velocity signals of the subject. The experiments were video-recorded to label the data manually. The measurements were randomly partitioned into two sub-sets: Training (70%) and Test (30%). ###The tidy dataset The HARUS raw data files used to create the tidy set "avg_dataset" are summarized in the accompanying CodeBook.md. Data processing performed by run_analysis.R includes: grouping information from data files, subsetting to retain only a set of variables, and aggregating these variables as averages per Subject and per Activity. The "avg_dataset" contains: * Each variable stored in one column. * Each row provides a different observation of that variable. * The table produces only one "kind" of variables. ##run_analysis.R: The "run_analysis.R" script processes the HARUS raw data to create the "avg_dataset" tidy dataset. The latter dataset is saved to the "avg_dataset.txt" file. The script: * Downloads the raw dataset archive (.zip) file to the local working directory. * Reads in the datasets, renames the columns, and merges them into the dataset, "largeData". * Extracts the mean measurements into "meanData" and std deviation measurements into "stdData", and merges them into the dataset, "meanstdData". * Renames the columns with descriptive variable names. * Creates a second , independent tidy dataset, "avg_dataset", based on the average of each variable for each subject and each activity. ##CodeBook.md: The Code Book contains the steps for processing the raw data into the tidy dataset "avg_dataset". It summarizes the raw data files employed by run_analysis.R as well as the variables units and dictionary.