# lab homework using r 1

#Lab 9

#274-Wilcox (Fall 2019)

#Name:

#Student ID:

rm(list=ls())

source('Rallfun-v33.txt')

#PART 1

#Suppose you want to test whether or not boys like the T.V. show Rick and Morty more than girls do. You ask 10 boys and 10 girls to rate the T.V. show on a scale from 1 to 7, where 7 is highly enjoyable and 1 is not enjoyable.

#For the boys, you observe the values:

# 5 7 6 3 1 1 2 4 5 6

#For the girls, you observe the values:

# 1 3 7 7 6 4 3 5 7 4

**#1.1) Are the groups independent?**

**#1.2) Examine the means and SDs for each of the groups**

**#1.3) Look at a histogram for both groups**

**#1.4) Use the appropriate t-test to test the null hypothesis that mean score for boys=girls**

**#1.5) Do you reject or fail to reject the null?**
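#A minimal sketch of how Part 1 could be set up with the base R functions
#from the lecture notes below. The vector names boys and girls are my own;
#the values are copied from the problem statement. Both two-sample tests
#are run here because choosing the "appropriate" one is part of the exercise.

```r
# Part 1 ratings, copied from the problem statement above
boys  = c(5, 7, 6, 3, 1, 1, 2, 4, 5, 6)
girls = c(1, 3, 7, 7, 6, 4, 3, 5, 7, 4)

# 1.2) means and SDs for each group
mean(boys);  sd(boys)
mean(girls); sd(girls)

# 1.3) histograms for each group
hist(boys)
hist(girls)

# 1.4) the two independent-samples t-tests described in the notes
student = t.test(boys, girls, var.equal=TRUE)   # assumes equal variances
welch   = t.test(boys, girls, var.equal=FALSE)  # does not assume equal variances

# 1.5) compare each p-value to your alpha level
student$p.value; welch$p.value
```

#The same pattern (two vectors, then mean(), sd(), hist(), t.test())
#carries over to Part 2 with the Cholmed and placebo values.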

#PART 2

#Suppose you want to test the effectiveness of a new drug, Cholmed, that helps treat high cholesterol in adults. In your study, you have 16 people who have been diagnosed with high cholesterol. To 8 of them, you give regular doses of Cholmed over one month. To the other 8 participants, you give regular doses of a placebo over one month. Then, you calculate the difference in the cholesterol levels for each participant.

#For the Cholmed group, you observe the values (in mg/dl):

# -33 -15 1 -75 -23 -3 -60 -57

#For the placebo group, you observe the values:

# -17 4 15 -8 -30 -7 -2 4

**#2.1) Are the groups independent?**

**#2.2) Examine the means and SDs for each of the groups**

**#2.3) Look at a histogram for both groups**

**#2.4) Use the appropriate t-test to test the null hypothesis that mean score for cholmed=placebo**

**#2.5) Do you reject or fail to reject the null?**

#PART 3

#You are curious about whether or not eating too much of a good thing will decrease a person’s preference for that type of food. You ask 12 people how much they like pizza on a scale from 1 to 7, where 7 is highly enjoyable and 1 is not enjoyable. Then, you make them eat a pizza for dinner every day for two weeks. After that, you ask them how much they like pizza.

#Before the study, you observe the values:

# 7 7 3 1 6 5 4 3 3 4 6 7

#After the study, you observe the values:

# 2 1 4 4 1 1 2 1 6 4 5 4

**#3.1) Are the groups independent? Examine the means and SDs for each of the groups**

**#3.2) Use the appropriate t-test to test the null hypothesis that mean score for before=after. Do you reject or fail to reject the null?**

**Cannot use external packages**

**Please use the professor’s lecture notes below to complete the lab homework above, and follow the functions and formulas they use.**

#Lab 9-Contents (Lecture Notes)

#1. Comparing Independent Groups

#2. The T-test: The influence of differences in the Means

#3. The T-test: The influence of differences in the Variance

#4. The T-test: The influence of differences in Skewness

#5. The T-test with Trimmed Means: Yuen’s T

#6. Comparing Dependent Groups: The Paired T-test

#Goal: In this lab we will look at comparing

#independent and dependent groups in R

#———————————————————————————

# 1. Comparing Independent Groups

#———————————————————————————

#Previously we learned the function t.test(),

#which we used to conduct one-sample t-tests.

#We can extend our knowledge of this function

#for using it to compare 2 independent groups.

#All we need to do is to add another variable to the function

#Two ways to perform an Independent Samples t-test in R:

# t.test(y~g)

#where y is a vector of scores of the data and

#g is a grouping variable

# t.test(x, y)

#where x and y represent scores from group 1 and

#group 2, respectively

#Which to use depends on how your DATA are STRUCTURED.
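#The two call forms above can be sketched side by side. The data here are
#simulated purely for illustration (the names g1, g2, y, g and the seed are
#my own choices), and the point is only that both forms run the same test.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
g1 = rnorm(10, mean=0, sd=1)       # made-up scores for group 1
g2 = rnorm(12, mean=1, sd=1)       # made-up scores for group 2

# Form 1: one vector per group
res_xy = t.test(g1, g2)

# Form 2: one vector of scores plus a grouping variable
y = c(g1, g2)
g = rep(c(1, 2), times=c(10, 12))  # 1 = group 1, 2 = group 2
res_formula = t.test(y ~ g)

# Same test either way, so the statistics and p-values agree
res_xy$statistic;  res_formula$statistic
res_xy$p.value;    res_formula$p.value
```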

#In Chapter 9, you learned two types of two-sample t tests:

#1) Student’s two-sample t-test (assumes the variances are equal)

#2) Welch’s test (does NOT assume the variances are equal)

#The function t.test() can run either;

#you change the parameter “var.equal”

#to change which test you use

#(remember: Student’s t assumes equal variances).

#To run two-sample t-test:

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Student’s t-test: t.test(x, y, var.equal=TRUE) OR

# t.test(y~g, var.equal=TRUE)

# Welch’s test: t.test(x,y, var.equal=FALSE) OR

# t.test(y~g, var.equal=FALSE)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
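#A short sketch of what var.equal changes in practice, using simulated data
#(the names a and b, the seed, and the SDs are illustrative assumptions).
#Note that t.test() uses var.equal=FALSE, i.e. Welch's test, when the
#argument is left out.

```r
set.seed(2)                    # arbitrary seed, only for reproducibility
a = rnorm(20, mean=0, sd=1)
b = rnorm(25, mean=0, sd=3)    # deliberately larger spread than a

student = t.test(a, b, var.equal=TRUE)
welch   = t.test(a, b, var.equal=FALSE)

student$parameter   # df = 20 + 25 - 2 = 43
welch$parameter     # df estimated from the data; usually not a whole number

# The two tests also use different standard errors, so t and p differ
student$statistic; welch$statistic
```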

#We’re going to create some data we can practice on using

#the rnorm() function we learned before.

#Let’s assume that I measured 20 young people and

#25 old people on some measure

#(could be anything…perhaps IQ)

#I’ll simulate what their data might look like.

young=rnorm(n=20, mean=0, sd=1)

old=rnorm(n=25, mean=0.5, sd=1)

#Our interest here is to find out if the mean for this

#variable is different between young and old people.

mean(young); mean(old)

#Because our data is structured such that each group

#has its own variable, we’ll use the t.test function

#that allows for that.

#Similarly, we need to decide if we’re going to use
#Student’s t-test or Welch’s t-test.

#NOTE: we simulated the data so that the variance would be

#equal, so we’ll presume that the student t-test is fine.

#Remember we are testing:

#H0: mu1 = mu2

#HA: mu1 != mu2

#We also have degrees_of_freedom=N1+N2-2

t.test(old, young, var.equal=TRUE)

#We then compare the p-value to our alpha level

#A) If pval < alpha, then Reject the Null Hypothesis

#B) If pval > alpha, then Fail to Reject the Null Hypothesis

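#Fail to reject! (Note: rnorm() draws new random data on every run, so your p-value, and possibly your conclusion, may differ.)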
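#To see where the numbers reported by t.test() come from, Student's t can be
#computed by hand from the pooled variance and df = N1 + N2 - 2. This is a
#sketch on freshly simulated data; the names ending in _demo and the seed
#are my own, chosen so they do not overwrite young and old above.

```r
set.seed(3)                               # arbitrary seed
young_demo = rnorm(n=20, mean=0,   sd=1)
old_demo   = rnorm(n=25, mean=0.5, sd=1)

n1 = length(old_demo); n2 = length(young_demo)

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1)*var(old_demo) + (n2 - 1)*var(young_demo)) / (n1 + n2 - 2)

# t statistic, degrees of freedom, and two-sided p-value
t_by_hand = (mean(old_demo) - mean(young_demo)) / sqrt(sp2 * (1/n1 + 1/n2))
df_demo   = n1 + n2 - 2
p_by_hand = 2 * pt(-abs(t_by_hand), df=df_demo)

# Compare with the built-in function
res = t.test(old_demo, young_demo, var.equal=TRUE)
t_by_hand; res$statistic
p_by_hand; res$p.value
```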

#OK…now, we’re going to rearrange our data so that

#it looks more like it will in real life when you collect it:

yourdata=as.data.frame(matrix(NA, ncol=2, nrow=45))

colnames(yourdata) = c("outcome", "group")

yourdata[1:20, "outcome"]=young

yourdata[1:20, "group"]=1

yourdata[21:45, "outcome"]=old

yourdata[21:45, "group"]=2

yourdata

#We can then run the more usual specification which is:

t.test(yourdata$outcome ~ yourdata$group, var.equal=TRUE)
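#As an aside, the same long-format layout can be built in one step with
#data.frame(), and the formula form of t.test() accepts a data= argument so
#the column names can be used directly. This is a sketch on simulated data;
#demo, young_demo, and old_demo are hypothetical names of my own.

```r
set.seed(4)                               # arbitrary seed
young_demo = rnorm(20, mean=0,   sd=1)
old_demo   = rnorm(25, mean=0.5, sd=1)

# One row per person: the score and which group they belong to
demo = data.frame(outcome = c(young_demo, old_demo),
                  group   = rep(c(1, 2), times=c(20, 25)))

# Two equivalent ways to write the formula call
res_dollar = t.test(demo$outcome ~ demo$group, var.equal=TRUE)
res_data   = t.test(outcome ~ group, data=demo, var.equal=TRUE)

res_dollar$statistic; res_data$statistic
```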

#For the rest of the lab, we will use the t.test()

#specification where each group has its own variable.

#??????????????????????????????????????????????????????????????#

#Thought Question 1: Given all that we’ve just discussed, what does

#it mean to reject the null hypothesis using our t.test?

#??????????????????????????????????????????????????????????????#

#———————————————————————————

# 2. The T-test: The influence of differences in the Means

#———————————————————————————

#Let’s first load the table object lab9.txt

lab9=read.table('lab9.txt', header=TRUE) #Choose lab9.txt

head(lab9)#Check if the table is properly stored

#Your table should look like the one below:

#            x         y1a       y1b        y2a        y2b        y3a        y3b
#1 -3.16919102 -1.16919102 -0.669191  -6.575259  0.6328316  3.7191954 -0.2191954
#2 -6.35526894 -4.35526894 -3.855269 -16.133493 -0.4291944  3.1886523  0.3113477
#3 -1.91975197  0.08024803  0.580248  -2.826942  1.0493113  3.7193226 -0.2193226
#4  2.58377901  4.58377901  5.083779  10.683651  2.5504883  0.4866459  3.0133541
#5 -0.88984272  1.11015728  1.610157   0.262786  1.3926144 -9.3365268 12.8365268
#6  0.02129019  2.02129019  2.521290   2.996185  1.6963254  2.3916263  1.1083737

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 2-1:

# A) What are the means and SDs for the variables x, y1a, and y1b?

# B) Look at histograms of the three variables.

# C) Based on what you saw in #A),

# what are the differences between these 3 variables?

# D) Run the appropriate t-test to test the following 2 null hypotheses:

#H0: mean x = mean y1a

#AND

#H0: mean x = mean y1b

# E) Do you reject or fail to reject the null hypotheses tested above in #D)?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

mean(lab9$x); sd(lab9$x)

mean(lab9$y1a); sd(lab9$y1a)

mean(lab9$y1b); sd(lab9$y1b)

#B)

hist(lab9$x)

hist(lab9$y1a)

hist(lab9$y1b)

#C)

#D)

t.test(lab9$x, lab9$y1a, var.equal=TRUE)

t.test(lab9$x, lab9$y1b, var.equal=TRUE)

#E)

#We then compare the p-value to our alpha level

#A) If pval < alpha, then Reject the Null Hypothesis

#B) If pval > alpha, then Fail to Reject the Null Hypothesis

#H0: mean x = mean y1a

#p-value=… therefore …

#H0: mean x = mean y1b

#p-value=… therefore …

#What conclusion can you make based on the results from

#the two t-tests? In other words, how does the difference

#in Means between two samples affect our t-test result

#assuming other things (e.g., sample size, difference in variance, skewness) are held constant?

#See Figure 9.3 in your book

#Power of any method based on means is highly sensitive to

#small changes in the tails of the distributions and that

#situations where outliers tend to occur have the potential

#of masking an important difference among the bulk of the

#participants.
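#The masking effect described above can be sketched with a small simulation.
#Everything here is illustrative: the names x_demo, y_demo, y_out, the seed,
#and the planted value of -20 are my own assumptions, not part of lab9.txt.

```r
set.seed(5)                              # arbitrary seed
x_demo = rnorm(30, mean=0, sd=1)
y_demo = rnorm(30, mean=1, sd=1)         # the bulk of y sits about 1 unit above x

# Clean data: the difference in means is usually easy to detect
p_clean = t.test(x_demo, y_demo, var.equal=TRUE)$p.value

# Replace a single y value with an extreme low outlier
y_out = y_demo
y_out[1] = -20

# The outlier drags mean(y) toward mean(x) and inflates sd(y)
mean(y_demo); mean(y_out)
sd(y_demo);   sd(y_out)

p_outlier = t.test(x_demo, y_out, var.equal=TRUE)$p.value
p_clean; p_outlier
```

#Only 1 of the 30 values changed, yet the p-value moves a great deal, even
#though the other 29 participants in the second group are untouched.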

#———————————————————————————

# 3. The T-test: The influence of differences in the Variance

#———————————————————————————

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 3-1:

# A) What are the means and SDs for the variables x, y2a, and y2b?

# B) Look at histograms of the three variables.

# C) Based on what you saw in #A), what are the differences between these 3 variables?

# D) Run the appropriate t-test to test the following 2 null hypotheses:

#H0: mean x = mean y2a

#AND

#H0: mean x = mean y2b

# E) Do you reject or fail to reject the null hypotheses tested above in #D)?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

mean(lab9$x); sd(lab9$x)

mean(lab9$y2a); sd(lab9$y2a)

mean(lab9$y2b); sd(lab9$y2b)

#B)

hist(lab9$x)

hist(lab9$y2a)

hist(lab9$y2b)

#C)

#??

#D)

#??

#E)

#We then compare the p-value to our alpha level

#A) If pval < alpha, then Reject the Null Hypothesis

#B) If pval > alpha, then Fail to Reject the Null Hypothesis

#???

#What conclusion can you make based on the results from the

#two t-tests? In other words, how does the difference

#in variance between two samples affect our t-test result

#assuming other things (e.g., sample size,

#difference in means, skewness) are held constant?

#When the variances differ, Student’s t-test has problems controlling

#the probability of a Type I error: it can yield inaccurate confidence

#intervals, even when sampling from normal distributions, if both the

#sample sizes and the variances are unequal.
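#A rough simulation sketch of this Type I error problem. Both groups have
#the same true mean, so every rejection is a false alarm. The sample sizes,
#SDs, number of replications, and seed are illustrative assumptions.

```r
set.seed(6)          # arbitrary seed
reps  = 2000         # number of simulated studies
alpha = 0.05

# Small group has the large variance; the null hypothesis is TRUE
pvals = replicate(reps, {
  small = rnorm(10, mean=0, sd=4)
  large = rnorm(40, mean=0, sd=1)
  c(student = t.test(small, large, var.equal=TRUE)$p.value,
    welch   = t.test(small, large, var.equal=FALSE)$p.value)
})

# Proportion of false rejections; should be close to alpha if the test behaves
rate_student = mean(pvals["student", ] < alpha)
rate_welch   = mean(pvals["welch", ]   < alpha)
rate_student; rate_welch
```

#In this setup Student's t rejects far more often than the nominal 5%,
#while Welch's test stays much closer to it.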

#———————————————————————————

# 4. The T-test: The influence of differences in Skewness

#———————————————————————————

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 4-1:

# A) What are the means and SDs for the variables x, y3a, and y3b?

# B) Look at histograms of the three variables.

# C) Based on what you saw in #A), what are the differences between these 3 variables?

# D) Run the appropriate t-test to test the following 2 null hypotheses:

#H0: mean x = mean y3a

#AND

#H0: mean x = mean y3b

# E) Do you reject or fail to reject the null hypotheses tested above in #D)?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

mean(lab9$x); sd(lab9$x)

mean(lab9$y3a); sd(lab9$y3a)

mean(lab9$y3b); sd(lab9$y3b)

#B)

hist(lab9$x)

hist(lab9$y3a) #Left Skew

hist(lab9$y3b) #Right Skew

#C)

#Same Means (kinda), same variance, just different skew

#D)

t.test(lab9$x, lab9$y3a, var.equal=TRUE)

t.test(lab9$x, lab9$y3b, var.equal=TRUE)

#E)

#We then compare the p-value to our alpha level

#A) If pval < alpha, then Reject the Null Hypothesis

#B) If pval > alpha, then Fail to Reject the Null Hypothesis

#Fail to reject H0

#Reject H0

#Figure 9.2 in your book

#When dealing with groups that differ in skewness,

#again problems with controlling the probability of a Type I error occur,

#and the combination of unequal variances and different amounts of skewness

#makes matters worse.

#———————————————————————————

# 5. The T-test with Trimmed Means: Yuen’s T

#———————————————————————————

#As we saw above, skewness can have an influence on

#our ability to detect statistical significance.

#Using our univariate outlier detection method (MAD-Median),

#we can see that both y3a and y3b have numerous outliers

#NOTE: You must load the Rallfun-v33 source code to use the out() function

out(lab9$y3a) #5 outliers

out(lab9$y3b) #5 outliers
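#out() comes from Rallfun-v33.txt and its source is not reproduced here.
#The sketch below is my own base R version of a MAD-Median style check,
#assuming the usual rule of flagging a value when its distance from the
#median, divided by the rescaled MAD, exceeds 2.24. It may not match out()
#exactly, and the vector scores is made up for illustration.

```r
scores = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # made-up values; 50 is planted

# mad() already rescales the median absolute deviation by 1.4826
dist_from_median = abs(scores - median(scores)) / mad(scores)

flagged = dist_from_median > 2.24
scores[flagged]      # values flagged as outliers
sum(flagged)         # how many were flagged
```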

#Given that we have these outliers, we should consider using

#a technique that compares the Trimmed Means

#The Trimmed Means version of a t-test is called Yuen’s Method

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Yuen’s Trimmed t-test: yuen(x, y, tr=0.2)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#
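#yuen() itself is defined in Rallfun-v33.txt, so it is not called in this
#sketch. What can be shown with base R alone is the quantity it compares:
#the 20% trimmed mean, available through the trim argument of mean().
#The vector scores is made up for illustration.

```r
scores = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # made-up values; 50 is an outlier

# Ordinary mean: pulled upward by the single value 50
mean(scores)

# 20% trimmed mean: drops the 2 lowest and 2 highest of the 10 values,
# then averages the remaining 6 (4, 5, 6, 7, 8, 9)
mean(scores, trim=0.2)
```

#Because the extreme values are removed before averaging, one outlier has
#far less influence on the trimmed mean than on the ordinary mean.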

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#Exercise 5-1:

# A) Re-test the hypotheses from Exercise 4-1 using Yuen’s

# method (with 20% trimming)

# B) How do the results compare to what you concluded previously?

#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#*#

#A)

#HINT: find the p-value to make conclusion

yuen(lab9$x, lab9$y3a, tr=0.2)

yuen(lab9$x, lab9$y3b, tr=0.2)

#B)

#???

#Given what we’ve just seen, how might skewness

#affect the results we get?

#———————————————————————————

# 6. Comparing Dependent Groups: The Paired T-test

#———————————————————————————

#We just learned the various ways to conduct a t-test

#for independent groups (i.e., different people in each group).

#We used Student’s t-test, Welch’s, and Yuen’s.

#Now we are going to learn about what to do when

#our groups are dependent (i.e., same people in each group).

#Often the dependent groups we see are based on time,

#such that people are measured at baseline (group1)

#and we want to see how much they’ve changed at followup (group2).

#Here is an example:

#We are interested in measuring the weight of college freshmen before and after the first year

#(to test whether or not there is any truth to the “freshman 15”?). We tested 16 freshmen on the

#first day of school, and tested them again after finals week of the spring semester. The data

#is recorded in a table in frosh.txt.

frosh=read.table('frosh.txt', header=TRUE)

frosh

#It is not valid to use a two-sample t-test to compare

#the weight gain because the data are dependent (on the

#same person)

#Instead we will use the paired t-test.

#The function for running a paired t-test in R is again t.test().

#However, we change one parameter:

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

# Paired t-test: t.test(x, y, paired=TRUE)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^#

#Let’s run a PAIRED t-test for our data from the freshmen:

t.test(frosh$Weight_before,frosh$Weight_after, paired=TRUE)
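#One way to see what paired=TRUE does: a paired t-test is the same as a
#one-sample t-test on the within-person differences. The sketch below uses
#simulated weights, not frosh.txt; before_demo, after_demo, and the seed
#are hypothetical choices of my own.

```r
set.seed(7)                                          # arbitrary seed
before_demo = rnorm(16, mean=150, sd=20)             # made-up baseline weights
after_demo  = before_demo + rnorm(16, mean=3, sd=5)  # same people, later

# Paired t-test on the two measurements
paired_res = t.test(before_demo, after_demo, paired=TRUE)

# One-sample t-test on the differences, H0: mean difference = 0
diffs = before_demo - after_demo
onesample_res = t.test(diffs, mu=0)

paired_res$statistic; onesample_res$statistic
paired_res$parameter   # df = number of pairs - 1 = 15
```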

#What would have happened if we ignored the fact that

#this was a PAIRED or REPEATED Measure Design?

#Let’s find out by running the normal independent samples T-Test

t.test(frosh$Weight_before,frosh$Weight_after) #Remember: This is WRONG