1 Aims of this exercise

In this tutorial you will further sharpen your skills in

  • data wrangling,
  • formulating the null and alternative hypotheses of t-tests,
  • critically evaluating the assumptions of t-tests,
  • selecting the appropriate test for answering the research question, and
  • formulating your conclusion in terms of the research question.

2 Cuckoo dataset

The common cuckoo does not build its own nest: it prefers to lay its eggs in other birds’ nests. It has been known since 1892 that the types of cuckoo eggs differ between locations. A study from 1940 showed that cuckoos return to the same nesting area each year, and that they always pick the same bird species to be a “foster parent” for their eggs.

Over the years, this has led to the development of geographically determined subspecies of cuckoos. These subspecies have evolved in such a way that their eggs look as similar as possible to those of their foster parents.

The cuckoo dataset contains information on 120 cuckoo eggs, obtained from randomly selected “foster” nests. For these eggs, researchers have measured the length (in mm) and established the type (species) of foster parent. The type column is coded as follows:

  • type=1: Meadow pipit
  • type=2: Tree pipit
  • type=3: Dunnock
  • type=4: European robin
  • type=5: White wagtail
  • type=6: Eurasian wren

3 Research question

The researchers want to test if the type of foster parent has an effect on the average length of the cuckoo eggs.

In practice, they want to study this for all six species. However, a t-test can only be used to study mean differences between two groups. If we want to analyze multiple groups, there are two options.

  1. We perform t-tests on all pairwise combinations of types. With n = 6 types, this means we need to perform n*(n-1)/2 = 15 t-tests.

  2. We perform an ANOVA analysis.

The second strategy is much more efficient and has a higher statistical power. We will learn all about ANOVA in chapter 7.
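The number of pairwise comparisons mentioned above can be checked quickly in R. This is a minimal sketch using base R only; the names n_types and type_pairs are illustrative and are not used elsewhere in this tutorial.

```r
# Number of foster parent types in the dataset
n_types <- 6

# Number of unordered pairs: n * (n - 1) / 2
n_types * (n_types - 1) / 2

# Same count via the binomial coefficient "6 choose 2"
choose(n_types, 2)

# Enumerate the pairs themselves (one pair per column)
type_pairs <- combn(n_types, 2)
ncol(type_pairs)
```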

In this tutorial, we will assess a single pairwise comparison, between the European robin and the Eurasian wren. In a later tutorial, we will come back to this dataset and perform a full analysis with ANOVA.

Load the required libraries

library(tidyverse)

4 Import data

Cuckoo <- read_tsv("https://raw.githubusercontent.com/statOmics/PSLSData/main/Cuckoo.txt")
head(Cuckoo)

5 Tidy data

For this exercise, we only care about the European robin and the Eurasian wren. Therefore, we can remove the observations of the other types. In addition, it seems that the type column was not imported as a factor. Let’s fix this:

Cuckoo <- Cuckoo %>%
  filter(type %in% c("4", "6")) %>%
  mutate(type = as.factor(type))
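As an optional variation, the numeric codes can be given readable labels when converting to a factor. The sketch below uses a small made-up data frame (toy, not the real Cuckoo data) and base R's factor(). Note that the rest of this tutorial keeps the codes 4 and 6, so applying labels would change the group names shown in plots and test output.

```r
# Hypothetical toy data mimicking the structure of the Cuckoo dataset
toy <- data.frame(
  length = c(22.1, 23.0, 21.0, 20.8),
  type   = c(4, 4, 6, 6)
)

# Convert the codes to a factor and attach species names as labels
toy$type <- factor(
  toy$type,
  levels = c(4, 6),
  labels = c("European robin", "Eurasian wren")
)

# Inspect the result
levels(toy$type)
table(toy$type)
```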

6 Data exploration

How many eggs do we have for each type?

Cuckoo %>%
  count(type)

Visualize the data using a suitable strategy

Cuckoo %>%
  ggplot(aes(x = type, y = length, fill = type)) +
  # Hide the boxplot outliers: geom_jitter() already draws every observation
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  # Mark the group means with a diamond
  stat_summary(
    fun = mean, geom = "point",
    shape = 5, size = 3, color = "black"
  ) +
  scale_fill_manual(values = c("dimgrey", "firebrick")) +
  ggtitle("Boxplot of the length of eggs per type") +
  ylab("length (mm)") +
  theme_bw()

We clearly see that, on average, the eggs laid in the nest of the European robin (type=4) are longer than those laid in the nest of the Eurasian wren (type=6). But is this difference significant?
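The visual impression from the boxplot can be complemented with per-group summary statistics. The sketch below uses simulated lengths, not the real Cuckoo data. The group sizes of 16 and 15, the means and the standard deviation are assumptions chosen only for illustration. Base R is used so that the example runs without extra packages.

```r
set.seed(123)

# Simulated (hypothetical) egg lengths; NOT the real Cuckoo data
sim <- data.frame(
  type   = factor(rep(c("4", "6"), times = c(16, 15))),
  length = c(rnorm(16, mean = 22.5, sd = 0.7),
             rnorm(15, mean = 21.1, sd = 0.7))
)

# Sample size, mean and standard deviation per type (one row per type)
group_summary <- t(sapply(
  split(sim$length, sim$type),
  function(x) c(n = length(x), mean = mean(x), sd = sd(x))
))
group_summary
```

On the real data, the same code can be run with Cuckoo in place of sim.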

7 Analysis

We can test this with an unpaired, two-sample t-test. But before we can start the analysis, we must check if all assumptions to perform a t-test are met.

7.1 Check the assumptions

  1. The observations are independent of each other (in both groups)

  2. The data (length) must be normally distributed (in both groups)

Additionally, we should check if the variability within both groups is similar or not (in the latter case we should use a Welch t-test).

  3. The variability within both groups is similar

The first assumption is met, as we may assume that there are no specific patterns of correlation between eggs from randomly selected nests.

To check the normality assumption, we will use QQ plots.

Cuckoo %>%
  ggplot(aes(sample = length)) +
  geom_qq() +
  geom_qq_line() +
  facet_grid(~ type)

There seems to be no clear deviation from normality.

The third assumption seems to be met based on our visualization with boxplots. Indeed, the interquartile ranges of the two groups are very comparable. As all assumptions are met, we may proceed with the analysis.
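If the spread in the two groups had looked clearly different, the Welch t-test would have been the safer choice. The sketch below compares both versions on simulated lengths, not the real Cuckoo data. The group sizes, means and standard deviation are assumptions made for illustration only.

```r
set.seed(2024)

# Simulated (hypothetical) egg lengths; NOT the real Cuckoo data
sim <- data.frame(
  type   = factor(rep(c("4", "6"), times = c(16, 15))),
  length = c(rnorm(16, mean = 22.5, sd = 0.7),
             rnorm(15, mean = 21.1, sd = 0.7))
)

# Ratio of the larger to the smaller group standard deviation;
# values close to 1 are in line with the equal-variance assumption
sds <- tapply(sim$length, sim$type, sd)
sd_ratio <- max(sds) / min(sds)
sd_ratio

# Pooled-variance t-test (assumes equal variances)
pooled <- t.test(length ~ type, data = sim, var.equal = TRUE)

# Welch t-test (no equal-variance assumption; the default of t.test())
welch <- t.test(length ~ type, data = sim)

# Compare the degrees of freedom and p-values of both versions
c(pooled = unname(pooled$parameter), welch = unname(welch$parameter))
c(pooled = pooled$p.value, welch = welch$p.value)
```

When the group variances are similar, both tests typically give nearly identical results. The Welch version uses fewer degrees of freedom in exchange for not relying on equal variances.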

7.2 Hypothesis testing

output <- t.test(length ~ type, data = Cuckoo, var.equal = TRUE)
output
## 
##  Two Sample t-test
## 
## data:  length by type
## t = 5.633, df = 29, p-value = 4.378e-06
## alternative hypothesis: true difference in means between group 4 and group 6 is not equal to 0
## 95 percent confidence interval:
##  0.9203528 1.9696472
## sample estimates:
## mean in group 4 mean in group 6 
##          22.575          21.130
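The test evaluates the null hypothesis that the mean egg length is the same for both types against the two-sided alternative that the means differ. To make clear what t.test() computes with var.equal = TRUE, the sketch below reproduces the calculation by hand. It uses simulated lengths, not the real Cuckoo data, and all variable names are illustrative.

```r
set.seed(42)

# Simulated (hypothetical) lengths; NOT the real Cuckoo data
x <- rnorm(16, mean = 22.5, sd = 0.7)  # stands in for type 4
y <- rnorm(15, mean = 21.1, sd = 0.7)  # stands in for type 6
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Standard error of the difference in means
se_diff <- sqrt(s2_pooled * (1 / n1 + 1 / n2))

# Test statistic and degrees of freedom
t_stat <- (mean(x) - mean(y)) / se_diff
df <- n1 + n2 - 2

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval for the difference in means
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se_diff

# Compare with the result of t.test()
ref <- t.test(x, y, var.equal = TRUE)
all.equal(unname(ref$statistic), t_stat)
all.equal(ref$p.value, p_value)
all.equal(as.numeric(ref$conf.int), ci)
```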

7.3 Conclusion

There is an extremely significant difference in mean length between cuckoo eggs fostered by the European robin and those fostered by the Eurasian wren (p << 0.001). The eggs fostered by the European robin are on average 1.45 mm longer (95% CI [0.92, 1.97]).
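The numbers in this conclusion can be cross-checked using only the values printed by t.test() in section 7.2. In the sketch below, the variable names are illustrative. Because the inputs are rounded as printed, the results should agree with the output only up to rounding.

```r
# Values copied from the printed t.test() output
mean_robin <- 22.575  # mean in group 4
mean_wren  <- 21.130  # mean in group 6
t_stat     <- 5.633
df         <- 29

# Estimated difference in mean length (robin minus wren), in mm
diff_means <- mean_robin - mean_wren
diff_means

# Standard error implied by the t statistic: t = difference / SE
se_diff <- diff_means / t_stat

# 95% confidence interval: difference +/- t quantile * SE
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff
round(ci, 2)

# Two-sided p-value
p_value <- 2 * pt(t_stat, df, lower.tail = FALSE)
p_value
```

The difference is positive because the mean of group 4 (European robin) is the larger one, which fixes the direction of the effect stated in the conclusion.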
