Coursera - Getting and cleaning data: Project

Introduction

(From the course project webpage)

One of the most exciting areas in all of data science right now is wearable computing - see for example this article: http://www.insideactivitytracking.com/data-science-activity-tracking-and-the-battle-for-the-worlds-top-sports-brand/ . Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked to from the course website represent data collected from the accelerometers from the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

Objective

(From the course project webpage)

Create one R script called run_analysis.R that does the following.

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Files of the study

README.md: This file.
Codebook.md: The codebook with the information about the tidy dataset and the summary data set.
run_analysis.R: The script with the steps to process the raw data into the tidy dataset and the summary data set.

Technical information

Tests

The script run_analysis.R was executed more than once to confirm it generates the same result each time.
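
One way to make this kind of check repeatable is to compare checksums of the output files from two runs. The following is a minimal sketch, not the check that was actually performed: the helper same_file_contents and the toy table are hypothetical, and two temporary files stand in for the outputs of two runs of the script.

```r
# Hypothetical helper: TRUE when both files have the same MD5 checksum.
same_file_contents <- function(path_a, path_b) {
  unname(tools::md5sum(path_a)) == unname(tools::md5sum(path_b))
}

# Stand-in for the output of two runs: the same toy table written twice.
toy <- data.frame(subject = c(1, 2),
                  activity = c("WALKING", "SITTING"),
                  value = c(0.25, 0.5))
run_a <- tempfile(fileext = ".txt")
run_b <- tempfile(fileext = ".txt")
write.table(toy, run_a, row.names = FALSE)
write.table(toy, run_b, row.names = FALSE)

# A third file with different content, to show the check can also fail.
run_c <- tempfile(fileext = ".txt")
write.table(toy[1, ], run_c, row.names = FALSE)

outputs_match <- same_file_contents(run_a, run_b)
outputs_differ <- !same_file_contents(run_a, run_c)
```

With the real script, run_a and run_b would be copies of the summary file saved after each run.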

Tool versions

The script was written and run in RStudio 0.99.465 with R 3.2.1, on a machine running 64-bit Windows 8.1.

Library dependencies

The script uses the following R libraries:

data.table
plyr
dplyr
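
Before running the script it can help to confirm that these packages are available. The sketch below is an assumption about how one might check this, not part of run_analysis.R. It loads plyr before dplyr, which is the commonly recommended order because loading plyr afterwards masks several dplyr functions.

```r
# Packages listed above, in the order they should be attached.
pkgs <- c("data.table", "plyr", "dplyr")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Report what is missing instead of installing automatically.
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  # Attach in order: data.table, then plyr, then dplyr.
  for (p in pkgs) library(p, character.only = TRUE)
}
```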

Instruction list

In order to generate the tidy datasets:

Step 1: Run the script run_analysis.R in R version 3.2.1 or newer.
Step 2: After the script finishes, the tidy data set is available in R (or RStudio) as the object mergedDataset.
Step 3: The summary data set is written to the /data/ subdirectory of the working directory, in the file Summary UCI HAR Dataset - Average values by subject and activity.txt .
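
The three steps could look roughly like this in an R session. This is a sketch under assumptions: it expects run_analysis.R in the working directory, looks for the summary file first in the data subdirectory and then in the working directory itself as a fallback, and assumes the file was written with a header row.

```r
summary_name <- "Summary UCI HAR Dataset - Average values by subject and activity.txt"

# Candidate locations; the second one is a fallback assumption.
candidates <- c(file.path("data", summary_name), summary_name)

# Step 1: run the script if it is present in the working directory.
if (file.exists("run_analysis.R")) {
  source("run_analysis.R")
}

# Step 2: inspect the tidy data set if the script created it.
if (exists("mergedDataset")) {
  str(mergedDataset)
}

# Step 3: read the summary file back in, if it can be found.
found <- candidates[file.exists(candidates)]
if (length(found) > 0) {
  # header = TRUE assumes the script wrote column names.
  summary_data <- read.table(found[1], header = TRUE)
  str(summary_data)
} else {
  message("Summary file not found; run run_analysis.R first.")
}
```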

Explanation of the script

The script run_analysis.R does the following:

Creates the subdirectory /data/ if it doesn't exist.
Downloads the raw data files from the URL given in the Introduction section, if they haven't been downloaded already.
Reads the labels for features and activities.
Reads and merges the test and training sets of subjects, measurements and activities.
Selects only the mean and standard deviation measurements. The meanFreq() measurements (explained in the features_info.txt file of the original experiment) are excluded because they are outside the scope of this project.
Renames the measurement variables with more descriptive names. At this point the mergedDataset object exists in R and can be used after the script finishes (see step 2 of the Instruction list section).
Generates the summary data set and writes it to the file Summary UCI HAR Dataset - Average values by subject and activity.txt in the working directory.
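
The merge, select, rename and summarize steps can be illustrated on a tiny made-up data set. This is a sketch only: it uses base R so that it runs without extra packages, whereas run_analysis.R relies on data.table, plyr and dplyr and may be written quite differently. The toy values and the renaming rules (Time, Mean, Std) are assumptions for illustration; the real variable names are described in Codebook.md.

```r
# Toy stand-ins for features.txt and activity_labels.txt (made-up subset).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "fBodyAcc-meanFreq()-X")
activity_labels <- data.frame(id = 1:2, activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Toy training and test sets: subject, activity id, one column per feature.
train <- data.frame(subject = c(1, 1), activity_id = c(1, 2),
                    matrix(c(0.2, 0.4, 0.1, 0.3, 0.9, 0.8), nrow = 2))
test <- data.frame(subject = c(2, 2), activity_id = c(1, 1),
                   matrix(c(0.6, 0.8, 0.5, 0.7, 0.9, 0.8), nrow = 2))

# Merge the two sets and label the measurement columns.
merged <- rbind(train, test)
names(merged) <- c("subject", "activity_id", features)

# Keep only mean() and std(); the escaped parentheses exclude meanFreq().
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- merged[, c("subject", "activity_id", features[keep])]

# Replace activity ids with descriptive activity names.
selected$activity <- activity_labels$activity[
  match(selected$activity_id, activity_labels$id)]
selected$activity_id <- NULL

# Rename the measurement variables (hypothetical naming scheme).
names(selected) <- gsub("^t", "Time", names(selected))
names(selected) <- gsub("-mean\\(\\)", "Mean", names(selected))
names(selected) <- gsub("-std\\(\\)", "Std", names(selected))
names(selected) <- gsub("-", "", names(selected))

# Average every measurement for each subject and activity.
summary_data <- aggregate(. ~ subject + activity, data = selected, FUN = mean)

# Write the summary to a temporary file rather than the real output path.
out_file <- tempfile(fileext = ".txt")
write.table(summary_data, out_file, row.names = FALSE)
```

In this toy example, subject 2 has two WALKING rows, so its row in summary_data holds the average of both.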
