Question 1)

With diff=0.3 and sd=2.9, at n=50 and alpha=5%, we cannot reject the null hypothesis.
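A minimal R sketch of this decision, assuming the test is a two-tailed one-sample t-test in which diff is the sample mean minus the hypothesized value (the variable names are illustrative):

```r
# Values given in Question 1
mean_diff <- 0.3
sd_obs    <- 2.9
n         <- 50
alpha     <- 0.05

# t statistic: diff divided by the standard error sd / sqrt(n)
se     <- sd_obs / sqrt(n)
t_stat <- mean_diff / se

# Two-tailed critical value with n - 1 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = n - 1)

print(t_stat)           # about 0.73
print(t_crit)           # about 2.01
print(t_stat > t_crit)  # FALSE, so we cannot reject the null hypothesis
```

Each scenario below moves one of these inputs (diff, sd, n or alpha), which is what question ii asks about.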

i. Would this scenario create systematic or random error (or neither)?
ii. Which part of the t-statistic (diff, sd, n, alpha) would be affected?
iii. Will it increase or decrease our power to reject the null hypothesis? (see class notes on power)
iv. Which kind of error (Type I or Type II) becomes more likely because of this scenario?

Consider each of the following scenarios and answer the above questions.

a) You discover that your colleague wanted to target the general population of Taiwanese users of the product. However, he sent out a survey on popular college social networks, and missed many older customers who you suspect might be more satisfied with the product.

i. This is not a measurement error, so in that narrow sense it creates neither systematic nor random error. It is a coverage error: the sample does not properly represent the population being studied. Because the excluded older customers likely differ in one direction (they may be more satisfied), the omission biases diff much as a systematic error would.

ii. Because a large part of the population is missing, diff and sd would both differ from what a representative sample would give. n and alpha would not change.

iii. The missing older customers are suspected to be more satisfied with the product, so leaving them out likely pulls the sample mean, and therefore diff, downward. A smaller diff decreases our power to reject the null hypothesis.

iv. Type II error becomes more likely.

b) You find that 20 of the respondents don't seem to have bought the watch, so they should be removed from the data. These 20 people are just like the others in every other respect.

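i. The sample size drops from 50 to 30, which increases the standard error (sd divided by the square root of n). Because the removed respondents are like the others, this does not bias the estimate; it only adds sampling noise. Therefore, this scenario creates random error.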

ii. n will change here.

iii. Decreasing n reduces the purple area; that is, our power to reject the null hypothesis decreases.

iv. Type II error becomes more likely.
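A sketch comparing n = 50 with n = 30, assuming diff and sd stay at 0.3 and 2.9 (reasonable here, since the removed respondents resemble the rest) and a two-tailed one-sample t-test; `power.t.test()` comes from R's built-in stats package, while the helper names are illustrative:

```r
mean_diff <- 0.3
sd_obs    <- 2.9

# t statistic as a function of sample size
t_for_n <- function(n) mean_diff / (sd_obs / sqrt(n))

# Power of a two-tailed one-sample t-test at alpha = 5%
power_for_n <- function(n) {
  power.t.test(n = n, delta = mean_diff, sd = sd_obs,
               sig.level = 0.05, type = "one.sample")$power
}

print(t_for_n(50))  # about 0.73
print(t_for_n(30))  # about 0.57
print(power_for_n(30) < power_for_n(50))  # TRUE: fewer respondents, less power
```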

c) A very annoying professor visiting your company has criticized your colleague's "95% confidence" criteria, and has suggested relaxing it to just 90%.

i. Neither. We only change the decision criterion, which introduces no new error into the data.

ii. alpha will change (from 5% to 10%).

iii. This scenario increases our power to reject the null hypothesis, since the purple area increases.

iv. Type I error becomes more likely: alpha is the probability of a Type I error, and we are raising it from 5% to 10%.
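The same Question 1 numbers show what relaxing alpha from 5% to 10% does. This sketch again assumes a two-tailed one-sample t-test:

```r
# Question 1 numbers (two-tailed one-sample t-test assumed)
n         <- 50
mean_diff <- 0.3
sd_obs    <- 2.9
t_stat    <- mean_diff / (sd_obs / sqrt(n))

# Critical values at alpha = 5% and alpha = 10%
crit_95 <- qt(1 - 0.05 / 2, df = n - 1)
crit_90 <- qt(1 - 0.10 / 2, df = n - 1)

# Power at each alpha for the same diff, sd and n
power_95 <- power.t.test(n = n, delta = mean_diff, sd = sd_obs,
                         sig.level = 0.05, type = "one.sample")$power
power_90 <- power.t.test(n = n, delta = mean_diff, sd = sd_obs,
                         sig.level = 0.10, type = "one.sample")$power

print(c(crit_95 = crit_95, crit_90 = crit_90))  # about 2.01 and 1.68
print(power_90 > power_95)                      # TRUE
print(t_stat > crit_90)                         # FALSE, t is about 0.73
```

Relaxing alpha lowers the critical value and raises power, but with t at about 0.73 we would still fail to reject the null hypothesis in this particular case.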

d) Your colleague has measured satisfaction as "I would consider buying another digital watch for a friend or family member". But you feel that this measure will score very low for teenagers without disposable incomes, whereas older people might exaggerate their high intentions to buy another watch.

i. The satisfaction measure is biased in a consistent pattern: teenagers tend to score low, and older people tend to exaggerate. Because the error follows a pattern rather than being random noise, this scenario creates systematic error.

ii. diff (depending on the ratio of older to younger respondents) and sd will change.

iii. Because teenagers' scores are pushed down and older people's scores are pushed up, sd will likely become larger. This shrinks the purple area; that is, our power to reject the null hypothesis decreases.

iv. Type II error becomes more likely.
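A small simulation can illustrate scenario d). All numbers here are hypothetical assumptions: true satisfaction is normal with mean 0.3 and sd 2.9, teenagers' answers are shifted down by 1.5, older respondents' answers are shifted up by 1.5, and a large sample is used so that sampling noise does not hide the effect.

```r
set.seed(1)
n_resp <- 10000

# Hypothetical true satisfaction scores
true_score <- rnorm(n_resp, mean = 0.3, sd = 2.9)

# Hypothetical bias: half teenagers (scored low), half older (exaggerated high)
is_teen        <- rep(c(TRUE, FALSE), length.out = n_resp)
bias           <- ifelse(is_teen, -1.5, 1.5)
measured_score <- true_score + bias

print(sd(true_score))      # close to 2.9
print(sd(measured_score))  # larger, close to sqrt(2.9^2 + 1.5^2), about 3.27

# diff depends on the age mix: with 80% teenagers the average bias is
# 0.8 * (-1.5) + 0.2 * 1.5 = -0.9, which pulls the measured mean down
teen_heavy          <- runif(n_resp) < 0.8
avg_bias_teen_heavy <- mean(ifelse(teen_heavy, -1.5, 1.5))
print(avg_bias_teen_heavy)  # close to -0.9
```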

Question 2) A psychological research paper has published an experiment to see if emotion affects our perception of color on different color-axes. Participants viewed one of two videos: either the famous death scene in the Lion King, or a video of a desktop screensaver — let’s call these the sad and neutral conditions, respectively. Afterwards, participants performed a color discrimination task requiring them to classify colors along the red-green color-axis and blue-yellow color axis. The dependent measures are the accuracy in each of the color conditions (red-green and blue-yellow). The researchers found some potential difference in the blue-yellow accuracy of sad versus neutral participants, but not so for red-green accuracy. Let’s examine their findings more carefully. You will find the experiment data in the file study2Data.csv on Canvas.

a) Visualize the differences between blue-yellow accuracy (BY_ACC) and red-green accuracy (RG_ACC) for both the sad and neutral viewers (Emotion_Condition). You are free to choose any visualization method you wish, but only report the most useful or interesting visualizations and any first impressions.

# Load the data (assumes study2Data.csv from Canvas is in the working directory)
study2Data <- read.csv("study2Data.csv")
# Split participants by emotion condition
sadness <- study2Data[study2Data$Emotion_Condition=="Sadness",]
neutral <- study2Data[study2Data$Emotion_Condition=="Neutral",]
# Red-green accuracy: density per group, group means as dashed lines
plot(density(sadness$RG_ACC),main="RG_ACC",lwd = 2)
lines(density(neutral$RG_ACC),col="dodgerblue1",lwd = 2)
legend(0.1,1.5,c("Sadness","Neutral"),lty = c(1,1),col=c("black","dodgerblue1"),cex=0.7)
abline(v=mean(sadness$RG_ACC),lwd = 2,lty=2)
abline(v=mean(neutral$RG_ACC),lwd = 2,lty=2,col="dodgerblue1")
# Blue-yellow accuracy: same layout
plot(density(sadness$BY_ACC),main="BY_ACC",lwd=2)
lines(density(neutral$BY_ACC),col="dodgerblue1",lwd=2)
legend(0.8,4,c("Sadness","Neutral"),lty = c(1,1),col=c("black","dodgerblue1"),cex=0.7)
abline(v=mean(sadness$BY_ACC),lwd = 2,lty=2)
abline(v=mean(neutral$BY_ACC),lwd = 2,lty=2,col="dodgerblue1")
# Per-participant difference between RG and BY accuracy
library(dplyr)
sadness <- sadness %>% mutate(RG_BY_diff = RG_ACC - BY_ACC)
neutral <- neutral %>% mutate(RG_BY_diff = RG_ACC - BY_ACC)
plot(density(sadness$RG_BY_diff),main="RG_ACC and BY_ACC difference",lwd = 2)
lines(density(neutral$RG_BY_diff),col="dodgerblue1",lwd = 2)
legend(0.3,2,c("Sadness","Neutral"),lty = c(1,1),col=c("black","dodgerblue1"),cex=0.7)
abline(v=mean(sadness$RG_BY_diff),lwd = 2,lty=2)
abline(v=mean(neutral$RG_BY_diff),lwd = 2,lty=2,col="dodgerblue1")

The figures above show the density of accuracy under each treatment, with each group's mean marked by a dashed line. At first glance there is no obvious difference between the groups, so hypothesis tests are needed for further discussion.

library(ggplot2)
library(reshape2)
# Collect the four group-by-axis accuracy vectors into one data frame
# (data.frame() needs the two groups to have the same number of participants)
four_variable_df <- data.frame(neutral_RG = neutral$RG_ACC, neutral_BY = neutral$BY_ACC,
                               sadness_RG = sadness$RG_ACC, sadness_BY = sadness$BY_ACC)
mm <- melt(four_variable_df)
means <- aggregate(value ~  variable, mm, mean)
ggplot(mm,aes(x =variable,y =value))+
  geom_boxplot(aes(fill=factor(variable)),show.legend = FALSE)+
  coord_flip()+
  geom_point( alpha = 0.3)+
  # red diamonds mark each group's mean (fun replaces the deprecated fun.y)
  stat_summary(fun=mean, colour="red1", geom="point", shape=18, size=4)+
  # dashed blue line marks the overall mean (linewidth replaces size for lines)
  geom_hline(yintercept = mean(mm$value),linetype = 2,colour = "dodgerblue2",linewidth = 1,show.legend = TRUE)+
  xlab("")+
  ylab("ACC")+
  guides(fill=guide_legend(title=NULL))

Here I use a boxplot to summarize the density plots above. Each group-and-axis combination has its own color, the red diamonds mark the group means, and the blue dashed line marks the overall mean. There seems to be a small difference between the groups in terms of their medians.

b) Run a t-test (traditional or bootstrapped) to check if there's a significant difference in blue-yellow accuracy between sad and neutral participants at 95% confidence.

t.test(neutral$BY_ACC,sadness$BY_ACC)

The t-test gives a p-value below 0.05, so we reject the null hypothesis that the difference in means is zero. In short, there is a significant difference in blue-yellow accuracy between the two treatments.
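The question also allows a bootstrapped test. Below is a minimal percentile-bootstrap sketch in R. `boot_mean_diff_ci()` is a hypothetical helper, not part of any package, and its interval varies slightly from run to run unless a seed is set:

```r
# Percentile bootstrap confidence interval for mean(x) - mean(y)
boot_mean_diff_ci <- function(x, y, n_boot = 2000, conf = 0.95) {
  boot_diffs <- replicate(n_boot, {
    # Resample each group with replacement (indexing avoids sample()'s
    # special behavior when given a single number)
    x_star <- x[sample.int(length(x), replace = TRUE)]
    y_star <- y[sample.int(length(y), replace = TRUE)]
    mean(x_star) - mean(y_star)
  })
  # Keep the middle conf share of the bootstrap distribution
  tail_prob <- (1 - conf) / 2
  quantile(boot_diffs, probs = c(tail_prob, 1 - tail_prob))
}
```

For example, after `set.seed(1)`, `boot_mean_diff_ci(neutral$BY_ACC, sadness$BY_ACC)` returns a 95% interval; if that interval excludes zero, the bootstrap agrees with the traditional t-test.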

c) Run a t-test (traditional or bootstrapped) to check if there's a significant difference in red-green accuracy between sad and neutral participants at 95% confidence.

t.test(neutral$RG_ACC,sadness$RG_ACC)

The p-value is greater than 0.05 and the 95 percent confidence interval contains zero. Therefore, there is no significant difference in red-green accuracy between the two groups.
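Both criteria used here, a p-value below 0.05 and a confidence interval that excludes zero, can be read off the object that `t.test()` returns. The sketch below uses a hypothetical helper, `check_mean_diff()`. For the same test and confidence level the two criteria always agree, because the interval is built from the same t distribution as the p-value:

```r
check_mean_diff <- function(x, y, conf = 0.95) {
  # Welch two-sample t-test (t.test's default, var.equal = FALSE)
  res <- t.test(x, y, conf.level = conf)
  ci  <- res$conf.int
  list(
    p_value      = res$p.value,
    conf_int     = as.numeric(ci),
    p_below      = res$p.value < 1 - conf,
    ci_excludes0 = ci[1] > 0 || ci[2] < 0
  )
}
```

Called as `check_mean_diff(neutral$RG_ACC, sadness$RG_ACC)`, both `p_below` and `ci_excludes0` should be FALSE, matching the red-green result above.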

d) (not graded) Do the above t-tests support a claim that there is an interaction between emotion and color axis? (i.e., does people’s accuracy of color perception along different color-axes depend on their emotion? Here, accuracy is an outcome variable, while color-axis and emotion are independent factors)

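The t-tests above do not support that claim on their own. They compare the two emotion conditions separately within each color axis, so they do not test whether the emotion effect differs between the two discrimination tasks, which is what an interaction between color and emotion means. A significant result on one axis and a non-significant result on the other is not, by itself, evidence of an interaction. The interaction may still affect accuracy, so further analysis is needed.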
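In a 2 x 2 design, the interaction is the difference of differences between cell means: how much the sad-versus-neutral gap changes from one color axis to the other. A sketch with a hypothetical helper, `interaction_contrast()`; its sign depends on the order of the factor levels:

```r
interaction_contrast <- function(accuracy, emotion, color) {
  # 2 x 2 table of cell means: rows = emotion levels, columns = color levels
  cell_means <- tapply(accuracy, list(emotion, color), mean)
  # Emotion effect within each color axis (second row minus first row)
  effect_by_axis <- cell_means[2, ] - cell_means[1, ]
  # Difference of differences: 0 means the emotion effect is equal on both axes
  unname(effect_by_axis[2] - effect_by_axis[1])
}

# Toy example with made-up numbers
print(interaction_contrast(accuracy = c(1, 2, 3, 6),
                           emotion  = c("a", "a", "b", "b"),
                           color    = c("x", "y", "x", "y")))  # 2
```

With the long-format data from part e), it could be called as `interaction_contrast(combined_mm$accuracy, combined_mm$Emotion_Condition, combined_mm$color)`; the ANOVA's interaction term tests whether this contrast differs from zero.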

e) Run a factorial design ANOVA where color perception accuracy is determined by emotion (sad vs. neutral), color-axis (RG vs. BY), and the interaction of emotion and color-axis. Note that you will likely have to reshape the data and create new columns -- please ask/discuss/share your data shaping strategy online. Are any of these three factors (emotion/color-axis/interaction) possibly influencing color perception accuracy at any meaningful level of confidence?

# Combine data from the two treatments, then melt RG_ACC and BY_ACC into long
# format (one row per participant and color axis), as aov() needs
combined_mm <- rbind(neutral %>% select(Emotion_Condition,RG_ACC,BY_ACC),
                  sadness %>% select(Emotion_Condition,RG_ACC,BY_ACC)) %>%
  melt(id.vars = "Emotion_Condition")
colnames(combined_mm) <- c("Emotion_Condition","color","accuracy")
# Main effects of emotion and color axis plus their interaction
discrim_aov <- aov(accuracy ~ Emotion_Condition + color + Emotion_Condition:color,
                   data = combined_mm)
summary(discrim_aov)

According to the summary above, Emotion_Condition stands out more than the other factors, but its p-value is not below 0.05. If we relax the confidence level to 90%, the p-value of Emotion_Condition is below 0.1, so we can claim that emotion possibly influences color perception accuracy at the 90% confidence level. The following figures suggest that the residuals have similar means across groups and are approximately normally distributed.

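# Diagnostic plots for the ANOVA fit in a 2 x 2 grid
layout(matrix(c(1,2,3,4),2,2))
plot(discrim_aov)
# Reset to a single panel, then draw the emotion-by-color interaction plot
par(mfrow = c(1, 1))
with(combined_mm, interaction.plot(x.factor = Emotion_Condition, trace.factor = color,
                                   response = accuracy,lwd=3))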

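The interaction plot uses a narrow accuracy range on its y-axis, so the gaps between the lines may look larger than they really are and could mislead readers.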
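The decision in part e) compares the same ANOVA p-values against both 0.05 and 0.10, and that comparison can be scripted. A sketch, assuming the standard structure of `summary()` on an `aov` fit (a list whose first element is the ANOVA table with a `Pr(>F)` column); `significant_terms()` is a hypothetical helper:

```r
significant_terms <- function(fit, alpha = 0.05) {
  tab   <- summary(fit)[[1]]        # ANOVA table as a data frame
  p     <- tab[["Pr(>F)"]]
  terms <- trimws(rownames(tab))    # row names are padded with spaces
  terms[!is.na(p) & p < alpha]      # the Residuals row has no p-value
}
```

For example, `significant_terms(discrim_aov, 0.05)` and `significant_terms(discrim_aov, 0.10)` list the terms that pass each criterion; per the discussion above, Emotion_Condition should appear only in the second.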