Smelly armpit dataset
Smelly armpits are not caused by sweat, itself. The smell is caused by specific micro-organisms belonging to the group of Corynebacterium spp. that metabolise sweat. Another group of abundant bacteria are the Staphylococcus spp., these bacteria do not metabolise sweat in smelly compounds.
The CMET-group at Ghent University does research to on transplanting the armpit microbiome to save people with smelly armpits.
Proposed Therapy:
- Remove armpit-microbiome with antibiotics
- Influence armpit microbiome with microbial transplant, see this 2 minute talk on youtube
Experiment:
- 20 students with smelly armpits are attributed to one of two treatment groups
- placebo (only antibiotics)
- transplant (antibiotica followed by microbial transplant).
- The microbiome is sampled 6 weeks upon the treatment
- The relative abundance of Staphylococcus spp. on Corynebacterium spp. + Staphylococcus spp. in the microbiome is measured via DGGE (Denaturing Gradient Gel Electrophoresis).
Goal
The overarching goal of this research was to assess if the relative abundance Staphylococcus spp. in the microbiome of the armpit is affected by transplanting the microbiome. To this end the researchers randomized patients to two treatment: A treatment with antibiotics only and a treatment with antibiotics and a microbial transplant.
In the tutorial on hypotheses testing we will use a formal statistical test to generalize the results from the sample to that of the population.
Import the dataset
#Load the libraries
library(tidyverse)
Import the data
ap <- read_csv("https://raw.githubusercontent.com/statOmics/PSLS21/data/armpit.csv")
## Rows: 20 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): trt
## dbl (1): rel
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 20
## Columns: 2
## $ trt <chr> "placebo", "placebo", "placebo", "placebo", "placebo", "placebo", …
## $ rel <dbl> 54.99208, 31.84466, 41.09948, 59.52064, 63.57341, 41.48649, 30.440…
Data Exploration
A crucial first step in a data analysis is to visualize and to explore the raw data.
ap %>% ggplot(aes(x=trt,y=rel,fill=trt)) +
geom_boxplot(outlier.shape=NA) +
geom_point(position="jitter") +
ylab("relative abundance (%)") +
xlab("treatment group") +
stat_summary(fun.y=mean, geom="point", shape=5, size=3, color="black", fill="black")
## Warning: `fun.y` is deprecated. Use `fun` instead.
We clearly see that, on average, the subjects who had a microbial transplant have a higher relative abundance of Staphylococcus spp. But is this difference statistically
significant so that we can generalized what we observe in the sample to the population?
We can test this with an unpaired, two-sample t-test, which falsifies the null hypothesis that there is on average no difference in relative abundance of Staphylococcus in the armpit microbiome between the transplant and the placebo group against the alternative hypothesis that there is a difference in average abundance of Staphyloccocus in the armpit microbiome between the transplant and placebo treatment.
But, before we can start the analysis, we must check if all assumptions to perform a t-test are met.
Check the assumptions
The observations are independent both within and between groups. This has to be guaranteed by the design.
The data (rel) are normally distributed in each of the groups
The variability within both groups is similar.
To check the normality assumption, we will use QQ plots.
Interpret the QQ plot.
For the third assumption, we must compare the within-group variability of both groups. We can do this visually based on the boxplots.
Interpret the boxplot.
Assess the research question with the two-sample t-test
Analysis
Conclusion
ADDENDUM: Train yourself in checking the assumptions
In order for the learners to get more proficient in evaluating the assumptions we will simulate 9 dataset with sample sizes similar to our data for which the assumptions of normality and equal variance do hold. For the QQ-plots we will only plot the one from one of the groups.
Simulate data
We simulate 9 datasets with the same sample sizes, means and pooled variance as in the sample.
ap <- read_csv("https://raw.githubusercontent.com/statOmics/PSLS21/data/armpit.csv")
## Rows: 20 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): trt
## dbl (1): rel
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nSamp <- 9
## descriptive statistics
apRelSum<-ap %>%
group_by(trt)%>%
summarize_at("rel",
list(mean=~mean(.,na.rm=TRUE),
sd=~sd(.,na.rm=TRUE),
n=function(x) x%>%is.na%>%`!`%>%sum)) %>%
mutate(se=sd/sqrt(n))
sigma <- sqrt(sum(apRelSum$sd^2*(apRelSum$n-1))/(sum(apRelSum$n)-2))
normSim <- matrix(rnorm(sum(apRelSum$n)*nSamp,
mean=c(rep(apRelSum$mean[1],apRelSum$n[1]),
rep(apRelSum$mean[2],apRelSum$n[2])),
sd=sigma),nrow=sum(apRelSum$n)) %>%
as.data.frame %>%
mutate(trt=ap$trt)
Comparisons of variances
normSim %>% gather(samp,data,-trt) %>%
ggplot(aes(x=trt,y=data)) +
geom_boxplot() +
facet_wrap(~samp)
Evaluation of normality
Placebo group
normSim %>%
gather(samp,data,-trt) %>%
filter(trt=="placebo") %>%
ggplot(aes(sample=data)) +
geom_qq() +
geom_qq_line() +
facet_wrap(~samp)
Transplant group
normSim %>%
gather(samp,data,-trt) %>%
filter(trt=="transplant") %>%
ggplot(aes(sample=data)) +
geom_qq() +
geom_qq_line() +
facet_wrap(~samp)
LS0tCnRpdGxlOiAiRXhlcmNpc2UgNS4yOiBIeXBvdGhlc2lzIHRlc3Rpbmcgb24gdGhlIGFybXBpdCBkYXRhc2V0IiAgIAphdXRob3I6ICJMaWV2ZW4gQ2xlbWVudCBhbmQgSmVyb2VuIEdpbGlzIgpkYXRlOiAic3RhdE9taWNzLCBHaGVudCBVbml2ZXJzaXR5IChodHRwczovL3N0YXRvbWljcy5naXRodWIuaW8pIiAgCm91dHB1dDoKICAgIGh0bWxfZG9jdW1lbnQ6CiAgICAgIGNvZGVfZG93bmxvYWQ6IHRydWUgICAgCiAgICAgIHRoZW1lOiBjb3NtbwogICAgICB0b2M6IHRydWUKICAgICAgdG9jX2Zsb2F0OiB0cnVlCiAgICAgIGhpZ2hsaWdodDogdGFuZ28KICAgICAgbnVtYmVyX3NlY3Rpb25zOiB0cnVlCi0tLQoKIyBTbWVsbHkgYXJtcGl0IGRhdGFzZXQKClNtZWxseSBhcm1waXRzIGFyZSBub3QgY2F1c2VkIGJ5IHN3ZWF0LCBpdHNlbGYuIFRoZSBzbWVsbCBpcyBjYXVzZWQKYnkgc3BlY2lmaWMgbWljcm8tb3JnYW5pc21zIGJlbG9uZ2luZyB0byB0aGUgZ3JvdXAgb2YKKkNvcnluZWJhY3Rlcml1bSBzcHAuKiB0aGF0IG1ldGFib2xpc2Ugc3dlYXQuIEFub3RoZXIgZ3JvdXAgb2YgYWJ1bmRhbnQgYmFjdGVyaWEKYXJlIHRoZSAqU3RhcGh5bG9jb2NjdXMgc3BwLiosIHRoZXNlIGJhY3RlcmlhIGRvIG5vdCBtZXRhYm9saXNlIHN3ZWF0IGluIHNtZWxseSAKY29tcG91bmRzLgoKVGhlIENNRVQtZ3JvdXAgYXQgR2hlbnQgVW5pdmVyc2l0eSBkb2VzIHJlc2VhcmNoIHRvIG9uIHRyYW5zcGxhbnRpbmcgdGhlIGFybXBpdCAKbWljcm9iaW9tZSB0byBzYXZlIHBlb3BsZSB3aXRoIHNtZWxseSBhcm1waXRzLgoKLSBQcm9wb3NlZCBUaGVyYXB5OgogIAkxLiBSZW1vdmUgYXJtcGl0LW1pY3JvYmlvbWUgd2l0aCBhbnRpYmlvdGljcwogICAgMi4gSW5mbHVlbmNlIGFybXBpdCBtaWNyb2Jpb21lIHdpdGggbWljcm9iaWFsIHRyYW5zcGxhbnQsIHNlZSB0aGlzIDIgbWludXRlCiAgICAgICB0YWxrIG9uIFt5b3V0dWJlXShodHRwczovL3lvdXR1LmJlLzlSSUZ5cUxYZFZ3KQoKLSBFeHBlcmltZW50OgoKICAgIC0gMjAgc3R1ZGVudHMgd2l0aCBzbWVsbHkgYXJtcGl0cyBhcmUgYXR0cmlidXRlZCB0byBvbmUgb2YgCiAgICAgIHR3byB0cmVhdG1lbnQgZ3JvdXBzCiAgICAtIHBsYWNlYm8gKG9ubHkgYW50aWJpb3RpY3MpCiAgICAtIHRyYW5zcGxhbnQgKGFudGliaW90aWNhIGZvbGxvd2VkIGJ5IG1pY3JvYmlhbCB0cmFuc3BsYW50KS4KICAgIC0gVGhlIG1pY3JvYmlvbWUgaXMgc2FtcGxlZCA2IHdlZWtzIHVwb24gdGhlIHRyZWF0bWVudAogICAgLSBUaGUgcmVsYXRpdmUgYWJ1bmRhbmNlIG9mICpTdGFwaHlsb2NvY2N1cyBzcHAuKiBvbgogICAgICAqQ29yeW5lYmFjdGVyaXVtIHNwcC4qICsgKlN0YXBoeWxvY29jY3VzIHNwcC4qIGluIHRoZQogICAgICBtaWNyb2Jpb21lIGlzIG1lYXN1cmVkIHZpYSBER0dFICgqRGVuYXR1cmluZyBHcmFkaWVudCBHZWwKICAgICAgRWxlY3Ryb3Bob3Jlc2lzKikuCiAgICAKIyBHb2FsCgpUaGUgb3ZlcmFyY2hpbmcgZ29hbCBvZiB0aGlzIHJlc2VhcmNoIHdhcyB0byBhc3Nlc3MgaWYgdGhlIHJlbGF0aXZlIGFidW5kYW5jZSAKKlN0YXBoeWxvY29jY3VzIHNwcC4qCmluIHRoZSBtaWNyb2Jpb21lIG9mIHRoZSBhcm1waXQgaXMgYWZmZWN0ZWQgYnkgdHJhbnNwbGFudGluZyB0aGUgbWljcm9iaW9tZS4gClRvIHRoaXMgZW5kIHRoZSByZXNlYXJjaGVycyByYW5kb21pemVkIHBhdGllbnRzIHRvIHR3byB0cmVhdG1lbnQ6CkEgdHJlYXRtZW50IHdpdGggYW50aWJpb3RpY3Mgb25seSBhbmQgYSB0cmVhdG1lbnQgd2l0aAphbnRpYmlvdGljcyBhbmQgYSBtaWNyb2JpYWwgdHJhbnNwbGFudC4KCkluIHRoZSB0dXRvcmlhbCBvbiBoeXBvdGhlc2VzIHRlc3Rpbmcgd2Ugd2lsbCB1c2UgYSBmb3JtYWwgc3RhdGlzdGljYWwgdGVzdCB0byAKZ2VuZXJhbGl6ZSB0aGUgcmVzdWx0cyBmcm9tIHRoZSBzYW1wbGUgdG8gdGhhdCBvZiB0aGUgcG9wdWxhdGlvbi4KCiMgSW1wb3J0IHRoZSBkYXRhc2V0CgpgYGB7ciwgbWVzc2FnZT1GQUxTRX0KI0xvYWQgdGhlIGxpYnJhcmllcwpsaWJyYXJ5KHRpZHl2ZXJzZSkKYGBgCgpJbXBvcnQgdGhlIGRhdGEKCmBgYHtyfQphcCA8LSByZWFkX2NzdigiaHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL3N0YXRPbWljcy9QU0xTMjEvZGF0YS9hcm1waXQuY3N2IikKYGBgCgpgYGB7cn0KZ2xpbXBzZShhcCkKYGBgCgojIERhdGEgRXhwbG9yYXRpb24KCkEgY3J1Y2lhbCBmaXJzdCBzdGVwIGluIGEgZGF0YSBhbmFseXNpcyBpcyB0byB2aXN1YWxpemUgYW5kIHRvIGV4cGxvcmUgdGhlIHJhdyAKZGF0YS4KCmBgYHtyfQphcCAlPiUgZ2dwbG90KGFlcyh4PXRydCx5PXJlbCxmaWxsPXRydCkpICsgCiAgZ2VvbV9ib3hwbG90KG91dGxpZXIuc2hhcGU9TkEpICsgCiAgZ2VvbV9wb2ludChwb3NpdGlvbj0iaml0dGVyIikgKwogIHlsYWIoInJlbGF0aXZlIGFidW5kYW5jZSAoJSkiKSArCiAgeGxhYigidHJlYXRtZW50IGdyb3VwIikgKyAKICBzdGF0X3N1bW1hcnkoZnVuLnk9bWVhbiwgZ2VvbT0icG9pbnQiLCBzaGFwZT01LCBzaXplPTMsIGNvbG9yPSJibGFjayIsIGZpbGw9ImJsYWNrIikKYGBgCgpXZSBjbGVhcmx5IHNlZSB0aGF0LCBvbiBhdmVyYWdlLCB0aGUgc3ViamVjdHMgd2hvIGhhZCBhCm1pY3JvYmlhbCB0cmFuc3BsYW50IGhhdmUgYSBoaWdoZXIgcmVsYXRpdmUgYWJ1bmRhbmNlIG9mClN0YXBoeWxvY29jY3VzIHNwcC4gQnV0IGlzIHRoaXMgZGlmZmVyZW5jZSBzdGF0aXN0aWNhbGx5ICAKKnNpZ25pZmljYW50KiBzbyB0aGF0IHdlIGNhbiBnZW5lcmFsaXplZCB3aGF0IHdlIG9ic2VydmUgCmluIHRoZSBzYW1wbGUgdG8gdGhlIHBvcHVsYXRpb24/CgpXZSBjYW4gdGVzdCB0aGlzIHdpdGggYW4gdW5wYWlyZWQsIHR3by1zYW1wbGUgdC10ZXN0LCB3aGljaCBmYWxzaWZpZXMgdGhlIG51bGwgCmh5cG90aGVzaXMgdGhhdCB0aGVyZSBpcyBvbiBhdmVyYWdlIG5vIGRpZmZlcmVuY2UgaW4gcmVsYXRpdmUgYWJ1bmRhbmNlIG9mIAoqU3RhcGh5bG9jb2NjdXMqIGluIHRoZSBhcm1waXQgbWljcm9iaW9tZSBiZXR3ZWVuIHRoZSB0cmFuc3BsYW50IGFuZCB0aGUgCnBsYWNlYm8gZ3JvdXAgYWdhaW5zdCB0aGUgYWx0ZXJuYXRpdmUgaHlwb3RoZXNpcyB0aGF0IHRoZXJlIGlzIGEgZGlmZmVyZW5jZSAKaW4gYXZlcmFnZSBhYnVuZGFuY2Ugb2YgKlN0YXBoeWxvY2NvY3VzKiBpbiB0aGUgYXJtcGl0IG1pY3JvYmlvbWUgYmV0d2VlbiAKdGhlIHRyYW5zcGxhbnQgYW5kIHBsYWNlYm8gdHJlYXRtZW50LiAKCkJ1dCwgYmVmb3JlIHdlIGNhbiBzdGFydCB0aGUgYW5hbHlzaXMsIHdlIG11c3QgY2hlY2sgaWYgYWxsIGFzc3VtcHRpb25zIHRvIApwZXJmb3JtIGEgdC10ZXN0IGFyZSBtZXQuCgojIyBDaGVjayB0aGUgYXNzdW1wdGlvbnMKCjEuIFRoZSBvYnNlcnZhdGlvbnMgYXJlIGluZGVwZW5kZW50IGJvdGggd2l0aGluIGFuZCBiZXR3ZWVuIGdyb3Vwcy4gVGhpcyBoYXMgdG8gYmUgCmd1YXJhbnRlZWQgYnkgdGhlIGRlc2lnbi4KCjIuIFRoZSBkYXRhIChyZWwpIGFyZSBub3JtYWxseSBkaXN0cmlidXRlZCBpbiBlYWNoIG9mIHRoZSBncm91cHMKCjMuIFRoZSB2YXJpYWJpbGl0eSB3aXRoaW4gYm90aCBncm91cHMgaXMgc2ltaWxhci4KClRvIGNoZWNrIHRoZSBub3JtYWxpdHkgYXNzdW1wdGlvbiwgd2Ugd2lsbCB1c2UgUVEgcGxvdHMuCgpgYGB7cixldmFsPUZBTFNFfQphcCAlPiUKICAuLi4KYGBgCgpJbnRlcnByZXQgdGhlIFFRIHBsb3QuCgpGb3IgdGhlIHRoaXJkIGFzc3VtcHRpb24sIHdlIG11c3QgY29tcGFyZSB0aGUgd2l0aGluLWdyb3VwCnZhcmlhYmlsaXR5IG9mIGJvdGggZ3JvdXBzLiBXZSBjYW4gZG8gdGhpcyB2aXN1YWxseSBiYXNlZCBvbiB0aGUgYm94cGxvdHMuIAoKYGBge3IsZXZhbD1GQUxTRX0KYXAgJT4lCiAgLi4uCmBgYAoKSW50ZXJwcmV0IHRoZSBib3hwbG90LgoKIyBBc3Nlc3MgdGhlIHJlc2VhcmNoIHF1ZXN0aW9uIHdpdGggdGhlIHR3by1zYW1wbGUgdC10ZXN0CgojIyBBbmFseXNpcwoKYGBge3J9CgpgYGAKCiMjIENvbmNsdXNpb24KCgoKIyBBRERFTkRVTTogVHJhaW4geW91cnNlbGYgaW4gY2hlY2tpbmcgdGhlIGFzc3VtcHRpb25zCgpJbiBvcmRlciBmb3IgdGhlIGxlYXJuZXJzIHRvIGdldCBtb3JlIHByb2ZpY2llbnQgaW4gZXZhbHVhdGluZyB0aGUgYXNzdW1wdGlvbnMgd2Ugd2lsbCBzaW11bGF0ZSA5IGRhdGFzZXQgd2l0aCBzYW1wbGUgc2l6ZXMgc2ltaWxhciB0byBvdXIgZGF0YSBmb3Igd2hpY2ggdGhlIGFzc3VtcHRpb25zIG9mIG5vcm1hbGl0eSBhbmQgZXF1YWwgdmFyaWFuY2UgZG8gaG9sZC4gRm9yIHRoZSBRUS1wbG90cyB3ZSB3aWxsIG9ubHkgcGxvdCB0aGUgb25lIGZyb20gb25lIG9mIHRoZSBncm91cHMuIAoKIyMgU2ltdWxhdGUgZGF0YSAKCldlIHNpbXVsYXRlIDkgZGF0YXNldHMgd2l0aCB0aGUgc2FtZSBzYW1wbGUgc2l6ZXMsIG1lYW5zIGFuZCBwb29sZWQgdmFyaWFuY2UgYXMgaW4gdGhlIHNhbXBsZS4gCgpgYGB7cn0KYXAgPC0gcmVhZF9jc3YoImh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9zdGF0T21pY3MvUFNMUzIxL2RhdGEvYXJtcGl0LmNzdiIpCgpuU2FtcCA8LSA5CiMjIGRlc2NyaXB0aXZlIHN0YXRpc3RpY3MKYXBSZWxTdW08LWFwICU+JQogIGdyb3VwX2J5KHRydCklPiUKICBzdW1tYXJpemVfYXQoInJlbCIsCiAgICAgICAgICAgICAgIGxpc3QobWVhbj1+bWVhbiguLG5hLnJtPVRSVUUpLAogICAgICAgICAgICAgICAgICAgIHNkPX5zZCguLG5hLnJtPVRSVUUpLAogICAgICAgICAgICAgICAgICAgIG49ZnVuY3Rpb24oeCkgeCU+JWlzLm5hJT4lYCFgJT4lc3VtKSkgJT4lCiAgbXV0YXRlKHNlPXNkL3NxcnQobikpCgpzaWdtYSA8LSBzcXJ0KHN1bShhcFJlbFN1bSRzZF4yKihhcFJlbFN1bSRuLTEpKS8oc3VtKGFwUmVsU3VtJG4pLTIpKSAKCm5vcm1TaW0gPC0gbWF0cml4KHJub3JtKHN1bShhcFJlbFN1bSRuKSpuU2FtcCwKICAgICAgICAgICAgICBtZWFuPWMocmVwKGFwUmVsU3VtJG1lYW5bMV0sYXBSZWxTdW0kblsxXSksCiAgICAgICAgICAgICAgICAgICAgIHJlcChhcFJlbFN1bSRtZWFuWzJdLGFwUmVsU3VtJG5bMl0pKSwKICAgICAgICAgICAgICAgICAgICAgc2Q9c2lnbWEpLG5yb3c9c3VtKGFwUmVsU3VtJG4pKSAlPiUKICBhcy5kYXRhLmZyYW1lICU+JSAKICBtdXRhdGUodHJ0PWFwJHRydCkKYGBgCgojIyBDb21wYXJpc29ucyBvZiB2YXJpYW5jZXMKCmBgYHtyfQpub3JtU2ltICU+JSBnYXRoZXIoc2FtcCxkYXRhLC10cnQpICU+JQogIGdncGxvdChhZXMoeD10cnQseT1kYXRhKSkgKwogIGdlb21fYm94cGxvdCgpICsKICBmYWNldF93cmFwKH5zYW1wKQpgYGAKCiMjIEV2YWx1YXRpb24gb2Ygbm9ybWFsaXR5CgojIyMgUGxhY2VibyBncm91cAoKYGBge3J9Cm5vcm1TaW0gJT4lIAogIGdhdGhlcihzYW1wLGRhdGEsLXRydCkgJT4lCiAgZmlsdGVyKHRydD09InBsYWNlYm8iKSAlPiUKICBnZ3Bsb3QoYWVzKHNhbXBsZT1kYXRhKSkgKwogIGdlb21fcXEoKSArCiAgZ2VvbV9xcV9saW5lKCkgKwogIGZhY2V0X3dyYXAofnNhbXApCmBgYAoKIyMjIFRyYW5zcGxhbnQgZ3JvdXAKCmBgYHtyfQpub3JtU2ltICU+JSAKICBnYXRoZXIoc2FtcCxkYXRhLC10cnQpICU+JQogIGZpbHRlcih0cnQ9PSJ0cmFuc3BsYW50IikgJT4lCiAgZ2dwbG90KGFlcyhzYW1wbGU9ZGF0YSkpICsKICBnZW9tX3FxKCkgKwogIGdlb21fcXFfbGluZSgpICsKICBmYWNldF93cmFwKH5zYW1wKQpgYGAK