Google Analytics, Capstone: How Can a Wellness Technology Company Play It Smart?

Introduction

In this case study, I will perform real-world analysis at a fictional company, Bellabeat, a high-tech manufacturer of health-focused products for women using R and Tableau.

Data Source

FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This Kaggle data set contains personal fitness trackers from thirty Fitbit users.

Scenario

You are a junior data analyst on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide the company's marketing strategy. You will present your analysis to the Bellabeat executive team and your high-level recommendations for Bellabeat’s marketing strategy.

Background

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

Ask

Business Task

Analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices.

Analysis Questions

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?

Data Preparation

Load CSV files

Remember to upload your CSV files to your project from the relevant data source: https://www.kaggle.com/arashnic/fitbit

daily_activity <- read.csv("dailyActivity_merged.csv")

Repeat this step for all csv files

Load and install common packages and libraries

#set working directory
#setwd("~/Fitbit Case Study")
#install.packages('tidyverse')
#install.packages('skimr')
library(tidyverse) #wrangle data
library(dplyr) #clean data
library(skimr) #get summary data
library(ggplot2) #visualize data
library(readr) #save csv

Data Exploration

Explore a few key tables

Take a look at the daily_activity data.

head(daily_activity)

Identify all the columns in the daily_activity data.

colnames(daily_activity)

Take a look at the sleep_day data.

head(sleep_day)

Identify all the columns in the daily_activity data.

colnames(sleep_day)

Data Summary Statistics

How many unique participants are there in each dataframe? It looks like there may be more participants in the daily activity dataset than in the sleep dataset.

n_distinct(daily_activity$Id)
n_distinct(sleep_day$Id)
n_distinct(calries$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

33

24

33

8

How many observations are there in each dataframe?

nrow(daily_activity)
nrow(sleep_day)

Summary statistics for dailyActivity, sleepDay, hourlySteps, and weightInfo

dailyActivity_merged %>% 
  select(TotalSteps,
         TotalDistance,
         VeryActiveMinutes,
         FairlyActiveMinutes,
         LightlyActiveMinutes,
         SedentaryMinutes,
         Calories) %>% 
  summary()

sleepDay_merged %>% 
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed) %>% 
  summary()

hourlySteps_merged %>% 
  select(ActivityHour,
         StepTotal) %>% 
  summary()

weightLogInfo_merged %>% 
  select(WeightPounds,
         Fat,
         BMI) %>% 
summary()

What are some quick summary statistics we'd want to know about each data frame?

Daily activity dataframe:

daily_activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()

Percentage of Activity Types

# Summarize the total minutes spent on different activities
activity_summary <- dailyActivity_merged %>%
  summarise(
    TotalSedentaryMinutes = sum(SedentaryMinutes),
    TotalLightlyActiveMinutes = sum(LightlyActiveMinutes),
    TotalFairlyActiveMinutes = sum(FairlyActiveMinutes),
    TotalVeryActiveMinutes = sum(VeryActiveMinutes)
  )

# Convert the data to long format for easier plotting
activity_long <- activity_summary %>%
  pivot_longer(
    cols = everything(),
    names_to = "ActivityType",
    values_to = "Minutes"
  )

# Calculate the percentage of each activity type
activity_long <- activity_long %>%
  mutate(Percentage = Minutes / sum(Minutes) * 100)

# Define high contrast colors
high_contrast_colors <- c("#FFFF00", "#008000", "#FFA500", "#FF0000")

# Create the pie chart with custom colors and percentage labels
pie_chart <- ggplot(activity_long, aes(x = "", y = Percentage, fill = ActivityType)) +
  geom_bar(width = 1, stat = "identity", color = "white") +
  coord_polar("y") +
  theme_void() +
  scale_fill_manual(values = high_contrast_colors) +
  geom_text(aes(label = sprintf("%.1f%%", Percentage)), 
            position = position_stack(vjust = 0.5)) +
  labs(title = "Percentage of Activity Types")

# Display the pie chart
print(pie_chart)

## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016         13162          8.5             8.5 
## 2 1503960366 4/13/2016         10735          6.97            6.97
## 3 1503960366 4/14/2016         10460          6.74            6.74
## 4 1503960366 4/15/2016          9762          6.28            6.28
## 5 1503960366 4/16/2016         12669          8.16            8.16
## 6 1503960366 4/17/2016          9705          6.48            6.48

Sleep dataframe:

sleep_day %>%  
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()

 ## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
 ## Min.   :1.000     Min.   : 58.0      Min.   : 61.0  
 ## 1st Qu.:1.000     1st Qu.:361.0      1st Qu.:403.0  
 ## Median :1.000     Median :433.0      Median :463.0  
 ## Mean   :1.119     Mean   :419.5      Mean   :458.6  
 ## 3rd Qu.:1.000     3rd Qu.:490.0      3rd Qu.:526.0  
 ## Max.   :3.000     Max.   :796.0      Max.   :961.0

What does this tell us about our sample of activities?

Data Plots

What's the relationship between steps taken in a day and sedentary minutes? How could this help inform the customer segments that we can market to? E.g. position this more as a way to get started in walking more? Or to measure steps that you're already taking?

ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) + geom_point()

What's the relationship between minutes asleep and time in bed? You might expect it to be almost completely linear - are there any unexpected trends?

ggplot(data=sleep_day, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + geom_point()

What could these trends tell you about how to help market this product? Or areas where you might want to explore further?

### Summary statistics and merged data:
```r
combined_data <- merge(sleep_day, daily_activity, by="Id")

Relationship between minutes and time in bed

Relationship between steps taken in a day and sedentary minutes

ggplot(data=dailyActivity_merged, aes(x=TotalSteps, y=SedentaryMinutes)) + 
  geom_point() + 
  geom_smooth() + 
  labs(title="Total Steps vs. Sedentary Minutes",
       x = "Steps", y = "Minutes")

Process

R Studio is used to upload the dataset, format, clean, and prepare data to be loaded into Tableau

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
code		code
data		data
results		results
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Google Analytics, Capstone: How Can a Wellness Technology Company Play It Smart?

Introduction

Data Source

Scenario

Background

Ask

Business Task

Analysis Questions

Data Preparation

Data Exploration

Data Summary Statistics

What are some quick summary statistics we'd want to know about each data frame?

Daily activity dataframe:

Percentage of Activity Types

Sleep dataframe:

Data Plots

Process

R Visualization in Tableau

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Google Analytics, Capstone: How Can a Wellness Technology Company Play It Smart?

Introduction

Data Source

Scenario

Background

Ask

Business Task

Analysis Questions

Data Preparation

Data Exploration

Data Summary Statistics

What are some quick summary statistics we'd want to know about each data frame?

Daily activity dataframe:

Percentage of Activity Types

Sleep dataframe:

Data Plots

Process

R Visualization in Tableau

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages