This repository contains the code I have written to perform the specified analysis task on the UCI HAR Dataset for the course project for the coursera course - Getting and Cleaning Data.
The raw data contains both test and training data set from the experiment carried out on 30 subjects for various activities like Walking , Standing etc wearing smartphone and measurements for acceleration etc are catprued.
Download and extract the source data from the URL : "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip" Extract the downloaded Zip to working directory and you should have a folder named UCI HAR Dataset with the test and training folders inside for running the code
#download data url<- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip" download.file(url,"UCI HAR Dataset.zip") unzip("UCI HAR Dataset.zip",exdir="UCI HAR Dataset")
The R script reads and extracts test and training data set. It combines them into a single data set and extracts only mean and std measurements for all subjects and activities. The tidy data set contains aggregated value for the mean and std measurements for all subjects and activities and is written to disk.
The tidy data set can be viewed in excel by modifying the extention to csv(as it is created using ',' as seperator for enabling easy readability)
- The code read the training data sets X-training, Y-training and subject-training files.
- It assigns the column names Subject , Activity , Measurements from the features.txt and activity.txt files.
- Then the data from all three files are combined into one "Training" data set.
- The step 1,2,3 is performed for the test data set also.
- The training and test data sets are combined into one data set (named as "mergeddata").
- The requirement is to extract only mean and std measurements and this is done by extracting only meand and std columns from combined test and training data and the data set named "needed" is created.
- The tidy dataset is created by calculating the aggredated data from this "needed" data set for each subject and activity and the two unwanted column at the last is removed.
- At last , the resultant "tidy" data set is written to disk with ',' seperator for enabling easy readabilty(using excel)
Codebook explains about the resultant data.