1 Approximating the variance stabilizing transformation of a Poisson random variable

  • A random variable \(Y \sim Poi(\mu)\) has \(Var(Y) = E(Y) = \mu\).
  • We are looking for a variance stabilizing transformation (VST) \(f(Y)\) such that \(Var(f(Y)) = c\), with \(c\) any constant. In particular, we need \(Var(f(Y))\) to be independent of \(\mu\).
  • A first-order Taylor series gives us \(f(Y) \approx f(\mu) + (Y - \mu) f'(\mu)\).
  • Rearranging gives us \(\left\{ f(Y) - f(\mu) \right\}^2 = (Y-\mu)^2 f'(\mu)^2\).
  • Which may be written as \(Var(f(Y)) = Var(Y)f'(\mu)^2\).
  • This shows us that we want a transformation such that \(f'(\mu)^2 = 1/\mu\) because then \(\forall \mu: Var(f(Y)) = 1\)! Let’s find out.
  • The last bullet point can be written as \(f'(\mu) = \frac{1}{\sqrt{\mu}}\) and therefore \(f(\mu) = \int \frac{1}{\sqrt{\mu}} d\mu = 2 \mu^{1/2}\).
  • Finally, this shows us that the transformation \(f(Y) = 2 Y^{1/2}\) ensures \(Var(f(Y)) = 1\). Similar, as is often written in the scientific literature, the transformation \(f(Y) = Y^{1/2}\) ensures \(Var(f(Y)) = 1/4\).
  • Note that these derivations all rely on the first-order Taylor expansion to be a good approximation. The VST will therefore work better for random variables with a high mean \(\mu\) as the distribution will be more discrete for random variables with a low mean. We show this using simulation below.
set.seed(871)
N <- 1e3
df <- data.frame(mean=rep(NA, N),
                 var=rep(NA, N),
                 transVar=rep(NA, N))
for(kk in 1:N){
  sampledMu <- runif(n=1, min=1, max=500)
  y <- rpois(n=500, lambda=sampledMu)
  df[kk,] <- c(mean(y), var(y), var(sqrt(y)))
}

plot(x=df$mean, y=df$var,
     xlab = "Mean of RV", ylab="Variance of RV")

plot(x=df$mean, y=df$transVar,
     xlab = "Mean of RV", ylab="Variance of transformed RV")

Question. Why is the first plot heteroscedastic?

Answer.

Remember that \(Var(\hat{\mu}) = \frac{\hat{\mu}}{n}\). Since for all random variables in our simulation \(n\) is equal, in our case we have that \(Var(\hat{\mu}) = c\hat{\mu}\). The variance on estimating the mean thus increases with the mean.

LS0tCnRpdGxlOiAiU2luZ2xlLWNlbGwgUk5BLXNlcXVlbmNpbmc6IHZhcmlhbmNlIHN0YWJpbGl6aW5nIHRyYW5zZm9ybWF0aW9ucyIKYXV0aG9yOiAiS29lbiBWYW4gZGVuIEJlcmdlIgpkYXRlOiAiTGFzdCBjb21waWxlZCBvbiBgciBmb3JtYXQoU3lzLnRpbWUoKSwgJyVkICVCLCAlWScpYCIKb3V0cHV0OiAKICBodG1sX2RvY3VtZW50OgogICAgdG9jOiB0cnVlCiAgICB0b2NfZmxvYXQ6IHRydWUKICBwZGZfZG9jdW1lbnQ6CiAgICB0b2M6IHRydWUKICAgIG51bWJlcl9zZWN0aW9uczogdHJ1ZQogICAgbGF0ZXhfZW5naW5lOiB4ZWxhdGV4Ci0tLQoKCmBgYHtyLCBlY2hvPUZBTFNFfQppZighIkJpb2NNYW5hZ2VyIiAlaW4lIGluc3RhbGxlZC5wYWNrYWdlcygpWywxXSl7CiAgaW5zdGFsbC5wYWNrYWdlcygiQmlvY01hbmFnZXIiKQp9CmBgYAoKIyBBcHByb3hpbWF0aW5nIHRoZSB2YXJpYW5jZSBzdGFiaWxpemluZyB0cmFuc2Zvcm1hdGlvbiBvZiBhIFBvaXNzb24gcmFuZG9tIHZhcmlhYmxlCgogLSBBIHJhbmRvbSB2YXJpYWJsZSAkWSBcc2ltIFBvaShcbXUpJCBoYXMgJFZhcihZKSA9IEUoWSkgPSBcbXUkLgogLSBXZSBhcmUgbG9va2luZyBmb3IgYSB2YXJpYW5jZSBzdGFiaWxpemluZyB0cmFuc2Zvcm1hdGlvbiAoVlNUKSAkZihZKSQgc3VjaCB0aGF0ICRWYXIoZihZKSkgPSBjJCwgd2l0aCAkYyQgYW55IGNvbnN0YW50LiBJbiBwYXJ0aWN1bGFyLCB3ZSBuZWVkICRWYXIoZihZKSkkIHRvIGJlIGluZGVwZW5kZW50IG9mICRcbXUkLgogLSBBIGZpcnN0LW9yZGVyIFRheWxvciBzZXJpZXMgZ2l2ZXMgdXMgJGYoWSkgXGFwcHJveCBmKFxtdSkgKyAoWSAtIFxtdSkgZicoXG11KSQuCiAtIFJlYXJyYW5naW5nIGdpdmVzIHVzICRcbGVmdFx7IGYoWSkgLSBmKFxtdSkgXHJpZ2h0XH1eMiA9IChZLVxtdSleMiBmJyhcbXUpXjIkLgogLSBXaGljaCBtYXkgYmUgd3JpdHRlbiBhcyAkVmFyKGYoWSkpID0gVmFyKFkpZicoXG11KV4yJC4KIC0gVGhpcyBzaG93cyB1cyB0aGF0ICoqd2Ugd2FudCBhIHRyYW5zZm9ybWF0aW9uIHN1Y2ggdGhhdCAkZicoXG11KV4yID0gMS9cbXUkKiogYmVjYXVzZSB0aGVuICRcZm9yYWxsIFxtdTogVmFyKGYoWSkpID0gMSQhIExldCdzIGZpbmQgb3V0LgogLSBUaGUgbGFzdCBidWxsZXQgcG9pbnQgY2FuIGJlIHdyaXR0ZW4gYXMgJGYnKFxtdSkgPSBcZnJhY3sxfXtcc3FydHtcbXV9fSQgYW5kIHRoZXJlZm9yZSAkZihcbXUpID0gXGludCBcZnJhY3sxfXtcc3FydHtcbXV9fSBkXG11ID0gMiBcbXVeezEvMn0kLgogLSBGaW5hbGx5LCB0aGlzIHNob3dzIHVzIHRoYXQgdGhlIHRyYW5zZm9ybWF0aW9uICRmKFkpID0gMiBZXnsxLzJ9JCBlbnN1cmVzICRWYXIoZihZKSkgPSAxJC4gU2ltaWxhciwgYXMgaXMgb2Z0ZW4gd3JpdHRlbiBpbiB0aGUgc2NpZW50aWZpYyBsaXRlcmF0dXJlLCB0aGUgdHJhbnNmb3JtYXRpb24gJGYoWSkgPSBZXnsxLzJ9JCBlbnN1cmVzICRWYXIoZihZKSkgPSAxLzQkLgogLSBOb3RlIHRoYXQgdGhlc2UgZGVyaXZhdGlvbnMgYWxsICoqcmVseSBvbiB0aGUgZmlyc3Qtb3JkZXIgVGF5bG9yIGV4cGFuc2lvbiB0byBiZSBhIGdvb2QgYXBwcm94aW1hdGlvbioqLiBUaGUgVlNUIHdpbGwgdGhlcmVmb3JlIHdvcmsgYmV0dGVyIGZvciByYW5kb20gdmFyaWFibGVzIHdpdGggYSBoaWdoIG1lYW4gJFxtdSQgYXMgdGhlIGRpc3RyaWJ1dGlvbiB3aWxsIGJlIG1vcmUgZGlzY3JldGUgZm9yIHJhbmRvbSB2YXJpYWJsZXMgd2l0aCBhIGxvdyBtZWFuLiBXZSBzaG93IHRoaXMgdXNpbmcgc2ltdWxhdGlvbiBiZWxvdy4KIAogCmBgYHtyfQpzZXQuc2VlZCg4NzEpCk4gPC0gMWUzCmRmIDwtIGRhdGEuZnJhbWUobWVhbj1yZXAoTkEsIE4pLAogICAgICAgICAgICAgICAgIHZhcj1yZXAoTkEsIE4pLAogICAgICAgICAgICAgICAgIHRyYW5zVmFyPXJlcChOQSwgTikpCmZvcihrayBpbiAxOk4pewogIHNhbXBsZWRNdSA8LSBydW5pZihuPTEsIG1pbj0xLCBtYXg9NTAwKQogIHkgPC0gcnBvaXMobj01MDAsIGxhbWJkYT1zYW1wbGVkTXUpCiAgZGZba2ssXSA8LSBjKG1lYW4oeSksIHZhcih5KSwgdmFyKHNxcnQoeSkpKQp9CgpwbG90KHg9ZGYkbWVhbiwgeT1kZiR2YXIsCiAgICAgeGxhYiA9ICJNZWFuIG9mIFJWIiwgeWxhYj0iVmFyaWFuY2Ugb2YgUlYiKQpwbG90KHg9ZGYkbWVhbiwgeT1kZiR0cmFuc1ZhciwKICAgICB4bGFiID0gIk1lYW4gb2YgUlYiLCB5bGFiPSJWYXJpYW5jZSBvZiB0cmFuc2Zvcm1lZCBSViIpCgpgYGAKCioqUXVlc3Rpb24qKi4gV2h5IGlzIHRoZSBmaXJzdCBwbG90IGhldGVyb3NjZWRhc3RpYz8KCjxkZXRhaWxzPjxzdW1tYXJ5PiBBbnN3ZXIuIDwvc3VtbWFyeT48cD4KUmVtZW1iZXIgdGhhdCAkVmFyKFxoYXR7XG11fSkgPSBcZnJhY3tcaGF0e1xtdX19e259JC4KU2luY2UgZm9yIGFsbCByYW5kb20gdmFyaWFibGVzIGluIG91ciBzaW11bGF0aW9uICRuJCBpcyBlcXVhbCwgaW4gb3VyIGNhc2Ugd2UgaGF2ZSB0aGF0ICRWYXIoXGhhdHtcbXV9KSA9IGNcaGF0e1xtdX0kLgpUaGUgdmFyaWFuY2Ugb24gZXN0aW1hdGluZyB0aGUgbWVhbiB0aHVzIGluY3JlYXNlcyB3aXRoIHRoZSBtZWFuLgo8L3A+PC9kZXRhaWxzPgoKCgo=