t-test
Problem
You want to test whether two samples are drawn from populations with different means, or test whether one sample is drawn from a population with a mean different from some theoretical mean.
Solution
Sample data
We will use the built-in sleep data set.
sleep
#> extra group ID
#> 1 0.7 1 1
#> 2 -1.6 1 2
#> 3 -0.2 1 3
#> 4 -1.2 1 4
#> 5 -0.1 1 5
#> 6 3.4 1 6
#> 7 3.7 1 7
#> 8 0.8 1 8
#> 9 0.0 1 9
#> 10 2.0 1 10
#> 11 1.9 2 1
#> 12 0.8 2 2
#> 13 1.1 2 3
#> 14 0.1 2 4
#> 15 -0.1 2 5
#> 16 4.4 2 6
#> 17 5.5 2 7
#> 18 1.6 2 8
#> 19 4.6 2 9
#> 20 3.4 2 10
We’ll also make a wide version of the sleep data; below we’ll see how to work with data in both long and wide formats.
sleep_wide <- data.frame(
ID=1:10,
group1=sleep$extra[1:10],
group2=sleep$extra[11:20]
)
sleep_wide
#> ID group1 group2
#> 1 1 0.7 1.9
#> 2 2 -1.6 0.8
#> 3 3 -0.2 1.1
#> 4 4 -1.2 0.1
#> 5 5 -0.1 -0.1
#> 6 6 3.4 4.4
#> 7 7 3.7 5.5
#> 8 8 0.8 1.6
#> 9 9 0.0 4.6
#> 10 10 2.0 3.4
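Building the wide version by row index, as above, works only because sleep is already sorted by group and ID. As a hedged alternative, the sketch below uses base R's reshape() to pair the values by ID instead of by position. The name sleep_wide2 is just an illustrative choice, and the new measurement columns are named extra.1 and extra.2 rather than group1 and group2.

```r
# Sketch: pivot the long-format sleep data to wide, matching rows on ID
# (sleep_wide2 is a hypothetical name used only for this example)
sleep_wide2 <- reshape(
  sleep,
  idvar     = "ID",     # one output row per subject
  timevar   = "group",  # one output column per group
  direction = "wide"
)

# Inspect the result: one row per ID, one 'extra' column per group
names(sleep_wide2)
head(sleep_wide2)
```

Because the matching is done on ID, this approach keeps the pairs correct even if the rows of the long data are shuffled.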
Comparing two groups: independent two-sample t-test
Suppose the two groups are independently sampled; we’ll ignore the ID variable for the purposes here.
The t.test function can operate on long-format data like sleep, where one column (extra) records the measurement and the other column (group) specifies the grouping; or it can operate on two separate vectors.
# Welch t-test
t.test(extra ~ group, sleep)
#>
#> Welch Two Sample t-test
#>
#> data: extra by group
#> t = -1.8608, df = 17.776, p-value = 0.07939
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -3.3654832 0.2054832
#> sample estimates:
#> mean in group 1 mean in group 2
#> 0.75 2.33
# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2)
By default, t.test does not assume equal variances; instead of Student’s t-test, it uses the Welch t-test. Note that in the Welch t-test, df=17.776, because of the adjustment for unequal variances. To use Student’s t-test, set var.equal=TRUE.
# Student t-test
t.test(extra ~ group, sleep, var.equal=TRUE)
#>
#> Two Sample t-test
#>
#> data: extra by group
#> t = -1.8608, df = 18, p-value = 0.07919
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -3.363874 0.203874
#> sample estimates:
#> mean in group 1 mean in group 2
#> 0.75 2.33
# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2, var.equal=TRUE)
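To see where the two sets of numbers come from, here is a minimal sketch that recomputes both tests by hand from the standard textbook formulas (the Welch-Satterthwaite approximation for the Welch degrees of freedom, and a pooled variance for Student's test). The variable names are illustrative and not part of any API.

```r
# Split the built-in sleep data into the two groups
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]
n_x <- length(x)
n_y <- length(y)

# Welch: each group contributes its own variance / n to the standard error
v_x <- var(x) / n_x
v_y <- var(y) / n_y
t_welch  <- (mean(x) - mean(y)) / sqrt(v_x + v_y)
df_welch <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Student: pool the two variances; df is n_x + n_y - 2
s2_pooled  <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)
t_student  <- (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n_x + 1 / n_y))
df_student <- n_x + n_y - 2
p_student  <- 2 * pt(-abs(t_student), df_student)

# Compare with the t, df and p-value printed by t.test()
round(c(t = t_welch, df = df_welch, p = p_welch), 4)
round(c(t = t_student, df = df_student, p = p_student), 4)
```

Because both groups have the same number of observations, the two standard errors coincide. That is why t is identical in the two outputs and only df, and therefore the p-value, differs.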
Paired-sample t-test
You can also compare paired data, using a paired-sample t-test. You might have observations before and after a treatment, or of two matched subjects with different treatments.
Again, the t.test function can be used on a data frame with a grouping variable, or on two vectors. It relies on the relative position to determine the pairing. If you are using long-format data with a grouping variable, the first row with group=1 is paired with the first row with group=2. It is important to make sure that the data is sorted and that there are no missing observations; otherwise the pairing can be thrown off. In this case, we can sort by the group and ID variables to ensure that the order is the same. For more on sorting, see Sorting.
# Sort by group then ID
sleep <- sleep[order(sleep$group, sleep$ID), ]
# Paired t-test
t.test(extra ~ group, sleep, paired=TRUE)
#>
#> Paired t-test
#>
#> data: extra by group
#> t = -4.0621, df = 9, p-value = 0.002833
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -2.4598858 -0.7001142
#> sample estimates:
#> mean of the differences
#> -1.58
# Same for wide data (two separate vectors)
# t.test(sleep_wide$group1, sleep_wide$group2, paired=TRUE)
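Since the pairing depends entirely on row position, it can be worth checking it explicitly before running the test. The sketch below is one possible way to do that, using hypothetical helper variables g1 and g2. It then runs the paired test on two vectors. The two-vector form is also the safer choice across R versions: to the best of our knowledge, R 4.4.0 and later reject paired=TRUE in the formula interface, so treat the formula call above as version-dependent.

```r
# Sketch: split the long data by group (g1 and g2 are illustrative names)
g1 <- sleep[sleep$group == "1", ]
g2 <- sleep[sleep$group == "2", ]

# Stop early if the pairing would be thrown off
stopifnot(
  nrow(g1) == nrow(g2),                              # same number of rows
  all(as.character(g1$ID) == as.character(g2$ID)),   # row i is the same subject
  !anyNA(g1$extra),                                  # no missing measurements
  !anyNA(g2$extra)
)

# Paired t-test on two vectors
res_paired <- t.test(g1$extra, g2$extra, paired = TRUE)
res_paired
```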
The paired t-test is equivalent to testing whether the difference between each pair of observations has a population mean of 0. (See below for comparing a single group to a population mean.)
# One-sample test on the differences; var.equal has no effect here and is dropped
# The result matches the paired test above: t = -4.0621, df = 9, p-value = 0.002833
t.test(sleep_wide$group1 - sleep_wide$group2, mu=0)
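The same equivalence can be checked by hand. This is a minimal sketch using the standard one-sample formula applied to the within-pair differences; the variable names are illustrative.

```r
# Within-pair differences, taken directly from the long-format data
# (this relies on sleep being sorted by group and then ID)
d <- sleep$extra[sleep$group == "1"] - sleep$extra[sleep$group == "2"]

# One-sample t statistic for the null hypothesis mean(d) = 0
n_pairs  <- length(d)
t_paired <- mean(d) / (sd(d) / sqrt(n_pairs))
df_paired <- n_pairs - 1
p_paired <- 2 * pt(-abs(t_paired), df_paired)

# Compare with the t, df and p-value reported by the paired test
round(c(t = t_paired, df = df_paired, p = p_paired), 6)
```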
Comparing a group against an expected population mean: one-sample t-test
Suppose that you want to test whether the data in column extra is drawn from a population whose true mean is 0. In this case, the group and ID columns are ignored.
t.test(sleep$extra, mu=0)
#>
#> One Sample t-test
#>
#> data: sleep$extra
#> t = 3.413, df = 19, p-value = 0.002918
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> 0.5955845 2.4844155
#> sample estimates:
#> mean of x
#> 1.54
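If you need the numbers rather than the printed summary, the value returned by t.test is a list of class "htest" whose components can be extracted by name. A short sketch, with res as an illustrative variable name:

```r
# Store the test result instead of printing it
res <- t.test(sleep$extra, mu = 0)

res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # p-value as a plain number
res$conf.int    # confidence interval, with a "conf.level" attribute
res$estimate    # sample mean
```

This is handy when many tests are run in a loop and only the p-values or confidence intervals need to be collected.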
To visualize the groups, see Plotting distributions (ggplot2), Histogram and density plot, and Box plot.