The lettuce dataset
In a previous tutorial, we analysed the dataset on
lettuce plants using ANOVA. However, it was not clear
if all the assumptions of ANOVA were met. Indeed, with
only 7 datapoints per group, it is very hard to assess
the assumptions of normality and equal variances.
Therefore, we will re-analyse the dataset by using the
non-parametric alternative to ANOVA, the Kruskal-Wallis test.
We will first give a concise overview of what we saw in the
ANOVA analysis.
The researchers want to find out if biochar, compost and
a combination of both biochar and compost have an influence
on the growth of lettuce plants. To this end, they grew
lettuce plants in a greenhouse. The pots were filled with
one of four soil types:
- Soil only (control)
- Soil supplemented with biochar (refoak)
- Soil supplemented with compost (compost)
- Soil supplemented with both biochar and compost (cobc)
The dataset freshweight_lettuce.txt contains the freshweight
(in grams) for 28 lettuce plants (7 per condition).
Load the required libraries
Data import
lettuce <- read_csv("")
Take a glimpse at the data
## Rows: 28
## Columns: 3
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ treatment <chr> "control", "control", "control", "control", "control", "co…
## $ freshweight <dbl> 38, 34, 41, 43, 43, 29, 38, 59, 64, 57, 56, 50, 64, 62, 38…
Data tidying
## set treatment to factor
## ...
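One way to carry out this tidying step is sketched below in base R. The data frame `toy` and its values are made up for illustration (they are not the lettuce measurements), and the level order is an assumption chosen so that the control comes first.

```r
# Hypothetical stand-in for the lettuce data: made-up values, 2 plants per treatment
toy <- data.frame(
  treatment   = rep(c("control", "refoak", "compost", "cobc"), each = 2),
  freshweight = c(31, 35, 33, 36, 55, 58, 52, 56)
)

# Convert the treatment column to a factor; listing the levels explicitly
# puts the control first instead of relying on alphabetical order
toy$treatment <- factor(toy$treatment,
                        levels = c("control", "refoak", "compost", "cobc"))

levels(toy$treatment)
```

Having the control as the first level keeps it leftmost in plots and makes it the reference in pairwise comparisons.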
Data exploration
## Count the number of observations per treatment
Now let’s make a boxplot displaying the freshweight
of each treatment condition:
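A minimal sketch of this exploration step, written in base R so that it runs without extra packages. The `toy` data frame holds made-up values (4 per treatment, not the real measurements), and the axis labels are illustrative choices.

```r
# Hypothetical stand-in for the lettuce data: made-up values, 4 plants per treatment
toy <- data.frame(
  treatment = factor(rep(c("control", "refoak", "compost", "cobc"), each = 4),
                     levels = c("control", "refoak", "compost", "cobc")),
  freshweight = c(31, 35, 38, 42,  33, 36, 40, 44,
                  55, 58, 61, 64,  52, 56, 60, 63)
)

# Number of observations per treatment
n_per_group <- table(toy$treatment)
n_per_group

# Boxplot of fresh weight per treatment; the raw points are overlaid
# because a box summarises very little when each group is this small
bp <- boxplot(freshweight ~ treatment, data = toy,
              xlab = "Treatment", ylab = "Fresh weight (g)")
stripchart(freshweight ~ treatment, data = toy, vertical = TRUE,
           method = "jitter", pch = 16, add = TRUE)
```

With few observations per group, showing the individual points next to the boxes makes it easier to judge spread and possible outliers.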
Interpret the visualization!
In the analysis in chapter 7 (ANOVA_lettuce_plants_half.rmd),
we accepted the assumptions for analyzing the data with an ANOVA.
However, it was not clear if all the assumptions of ANOVA were met.
Indeed, with only 7 values per group, it is very hard to assess
the assumptions of normality and equal variances.
Therefore, we will re-analyse the dataset by using the
non-parametric alternative to ANOVA: the Kruskal-Wallis test.
Kruskal-Wallis rank test
Formulate a correct null and alternative hypothesis for the Kruskal-Wallis test in this analysis.
# set.seed(1)
# kw <- kruskal_test(...)
# kw
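To illustrate the mechanics of the test, the sketch below uses base R's `kruskal.test()`, which reports an asymptotic chi-squared p-value, in place of the `kruskal_test()` call named in the exercise. The `toy` values are made up and are not the lettuce data.

```r
# Hypothetical stand-in for the lettuce data: made-up values, 4 plants per treatment
toy <- data.frame(
  treatment = factor(rep(c("control", "refoak", "compost", "cobc"), each = 4),
                     levels = c("control", "refoak", "compost", "cobc")),
  freshweight = c(31, 35, 38, 42,  33, 36, 40, 44,
                  55, 58, 61, 64,  52, 56, 60, 63)
)

# Kruskal-Wallis rank sum test: are the fresh weight distributions
# the same across all four treatments?
kw <- kruskal.test(freshweight ~ treatment, data = toy)

kw$statistic  # Kruskal-Wallis H statistic, computed from the rank sums per group
kw$parameter  # degrees of freedom: number of groups minus 1
kw$p.value    # asymptotic p-value from the chi-squared approximation
```

A small p-value only indicates that at least one treatment differs from the others in distribution; it does not say which one, which is why a post-hoc analysis follows.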
Interpret the results!
Post-hoc analysis
We will perform a post-hoc analysis with pairwise Wilcoxon rank
sum tests. As we do not want to assume a location-shift model, we
will interpret the outcome in terms of probabilistic indices.
Note that after the analysis, we will need to correct the obtained
p-values for multiple testing.
Formulate a correct null and alternative hypothesis for the Wilcoxon test post-hoc analysis.
## pairwise.wilcox.test(...)
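As a sketch of what such a call can look like, the example below runs `pairwise.wilcox.test()` from base R with a Bonferroni correction. The `toy` data are made up (4 values per treatment), so the p-values say nothing about the real experiment.

```r
# Hypothetical stand-in for the lettuce data: made-up values, 4 plants per treatment
toy <- data.frame(
  treatment = factor(rep(c("control", "refoak", "compost", "cobc"), each = 4),
                     levels = c("control", "refoak", "compost", "cobc")),
  freshweight = c(31, 35, 38, 42,  33, 36, 40, 44,
                  55, 58, 61, 64,  52, 56, 60, 63)
)

# All 6 pairwise Wilcoxon rank sum tests, with Bonferroni-adjusted p-values
pw <- pairwise.wilcox.test(toy$freshweight, toy$treatment,
                           p.adjust.method = "bonferroni")

# Lower-triangular matrix of adjusted p-values (NA above the diagonal)
pw$p.value
```

Note that with only 4 observations per group the smallest possible exact two-sided p-value is 2/70, so after multiplying by 6 no comparison in this toy example can fall below 0.05. With 7 observations per group, as in the lettuce experiment, the test has more room to detect differences.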
What do you observe?
## Alternative: calculate the p-value for each treatment combination with wilcox_test
treatments <- levels(lettuce$treatment)
freshweight <- lettuce$freshweight
pvalues <- combn(treatments, 2, function(x) {
  ## Pairwise Wilcoxon test on the two treatments in x;
  ## droplevels() removes the unused treatment levels left behind by subset()
  test <- wilcox_test(freshweight ~ treatment,
                      data = droplevels(subset(lettuce, treatment %in% x)),
                      distribution = "exact")
  ## Get and store p-value of test
  pvalue(test)
})
## Adjust for multiple testing
pvalues_bonf <- p.adjust(pvalues, method = "bonferroni")
## link the p-value with the correct pairwise test
names(pvalues_bonf) <- combn(levels(lettuce$treatment), 2, paste, collapse = "_VS_")
Based on the chunk of code above, can you extract the point estimates
for the probabilistic indices? Interpret those as well.
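One possible way to obtain such a point estimate is sketched below with base R's `wilcox.test()` rather than the function used in the chunk above. It assumes the usual estimator: the Mann-Whitney statistic W divided by the number of pairs, which estimates P(X > Y) + 0.5 P(X = Y). The two vectors are made-up values, not the lettuce data.

```r
# Made-up fresh weights for two hypothetical groups (not the real data)
ctrl   <- c(31, 35, 38, 42)
refoak <- c(33, 36, 40, 44)

# W counts the pairs (ctrl_i, refoak_j) with ctrl_i > refoak_j
# (ties would count for 0.5 each)
wt <- wilcox.test(ctrl, refoak)

# Probabilistic index: share of all n1 * n2 pairs in which the
# ctrl value exceeds the refoak value
pi_hat <- unname(wt$statistic) / (length(ctrl) * length(refoak))

# Same quantity computed directly from all pairwise comparisons
pi_direct <- mean(outer(ctrl, refoak, ">") + 0.5 * outer(ctrl, refoak, "=="))

pi_hat
pi_direct
```

A value near 0.5 means that a random plant from either group is about equally likely to be the heavier one, while values near 0 or 1 indicate that one group tends to have systematically higher fresh weights.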
Formulate a proper conclusion that answers the research hypothesis.