Assess genotoxicity of 1,2-dimethylhydrazine dihydrochloride (DMH) (EU directive)
dna <- read_delim("https://raw.githubusercontent.com/GTPB/PSLS20/master/data/dna.txt",delim=" ")
dna$dose <- as.factor(dna$dose)
dna
The Kruskal-Wallis Rank Test (KW-test) is a non-parameteric alternative for ANOVA F-test.
Classical \(F\)-teststatistic can be written as \[ F = \frac{\text{SST}/(g-1)}{\text{SSE}/(n-g)} = \frac{\text{SST}/(g-1)}{(\text{SSTot}-\text{SST})/(n-g)} , \]
with \(g\) the number of groups.
SSTot depends only on outcomes \(\mathbf{y}\) and will not vary in permutation test.
SST can be used as statistic \[\text{SST}=\sum_{j=1}^t n_j(\bar{Y}_j-\bar{Y})^2\]
The KW test statistic is based on SST on rank-transformed outcomes1, \[ \text{SST} = \sum_{j=1}^g n_j \left(\bar{R}_j - \bar{R}\right)^2 = \sum_{j=1}^t n_j \left(\bar{R}_j - \frac{n+1}{2}\right)^2 , \]
with \(\bar{R}_j\) the mean of the ranks in group \(j\), and \(\bar{R}\) the mean of all ranks, \[ \bar{R} = \frac{1}{n}(1+2+\cdots + n) = \frac{1}{n}\frac{1}{2}n(n+1) = \frac{n+1}{2}. \]
The KW teststatistic is given by \[ KW = \frac{12}{n(n+1)} \sum_{j=1}^g n_j \left(\bar{R}_j - \frac{n+1}{2}\right)^2. \]
The factor \(\frac{12}{n(n+1)}\) is used so that \(KW\) has a simple asymptotic null distribution. In particular under \(H_0\), given thart \(\min(n_1,\ldots, n_g)\rightarrow \infty\), \[ KW \rightarrow \chi^2_{t-1}. \]
The exact KW-test can be executed by calculating the permutation null distribution (that only depends on \(n_1, \ldots, n_g\)) to test \[H_0: f_1=\ldots=f_g \text{ vs } H_1: \text{ at least two means are different}.\]
In order to allow \(H_1\) to be formulated in terms of means, the assumption of locations shift should be valid.
For DMH example this is not the case.
If location-shift is invalid, we have to formulate \(H_1\) in terms of probabilistic indices: \[H_0: f_1=\ldots=f_g \text{ vs } H_1: \exists\ j,k \in \{1,\ldots,g\} : \text{P}\left[Y_j\geq Y_k\right]\neq 0.5\]
Kruskal-Wallis rank sum test
data: length by dose
Kruskal-Wallis chi-squared = 14, df = 3, p-value = 0.002905
On the \(5\%\) level of significance we can reject the null hypothesis.
R-functie kruskal.test
only returns the asymptotic approximation for \(p\)-values.
With only 6 observaties per groep, this is not a good approximation of the \(p\)-value
With the coin
R package we can calculate the exacte \(p\)-value
library(coin)
kwPerm <- kruskal_test(length~dose,data=dna,
distribution=approximate(B=100000))
kwPerm
Approximative Kruskal-Wallis Test
data: length by dose (0, 1.25, 2.5, 5)
chi-squared = 14, p-value = 0.00043
We conclude that the difference in the distribution of the DNA damages due to the DMH dose is extremely significantly different.
Posthoc analysis with WMW tests.
Pairwise comparisons using Wilcoxon rank sum exact test
data: dna$length and dna$dose
0 1.25 2.5
1.25 0.013 - -
2.5 0.013 0.818 -
5 0.013 0.721 0.788
P value adjustment method: holm
pairwise.wilcox.test
output. Point estimate on probability on higher DNA-damage?nGroup <- table(dna$dose)
probInd <- combn(levels(dna$dose),2,function(x)
{
test <- wilcox.test(length~dose,subset(dna,dose%in%x))
return(test$statistic/prod(nGroup[x]))
}
)
names(probInd) <- combn(levels(dna$dose),2,paste,collapse="vs")
probInd
0vs1.25 0vs2.5 0vs5 1.25vs2.5 1.25vs5 2.5vs5
0.0000000 0.0000000 0.0000000 0.4444444 0.2777778 0.3333333
Because there are doubts on the location-shift model we draw our conclusions in terms of the probabilistic index.
we assume that no ties are available↩︎