Aims of this exercise
In this exercise, you will acquire the skills to
- recognize paired data
- conduct a data exploration in R for data from paired experimental designs
- interpret the results of a data exploration for paired experimental designs
The diabetes dataset
The diabetes dataset holds information on a small experiment with 8 patients who were subjected to a glucose tolerance test.
Patients had to fast for eight hours before the test. When the patients entered the hospital, their baseline glucose level was measured (mmol/l).
Patients then had to drink 250 ml of a syrupy glucose solution containing 100 grams of sugar. Two hours later, their blood glucose level was measured again.
The data consist of three variables:
- before: glucose concentration after 8 hours of fasting (mmol/l)
- after: glucose concentration 2 hours after drinking the glucose solution (mmol/l)
- patient: identifier for the patient
Import the data
Data path:
https://raw.githubusercontent.com/statOmics/PSLSData/main/diabetes.txt
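A minimal sketch of how the import could look. The helper read_diabetes() is hypothetical and not part of the exercise, and the field separator is an assumption, since the text does not state how the file is delimited. The helper checks that the three documented columns are present, so a wrong separator is caught early. It is demonstrated on a small temporary file with made-up values, which keeps the example runnable offline.

```r
# Hypothetical helper (not part of the exercise): read the file and verify
# that the three documented columns are present.
read_diabetes <- function(path, sep = "") {
  # sep = "" (any whitespace, tabs included) is an ASSUMPTION about the file
  # layout; pass sep = "," if the file turns out to be comma-separated.
  dat <- read.table(path, header = TRUE, sep = sep)
  expected <- c("before", "after", "patient")
  missing_cols <- setdiff(expected, names(dat))
  if (length(missing_cols) > 0) {
    stop(
      "Missing column(s): ", paste(missing_cols, collapse = ", "),
      ". Check the `sep` argument."
    )
  }
  dat
}

# Demonstration on a temporary file with MADE-UP values
tmp <- tempfile(fileext = ".txt")
writeLines(c("before\tafter\tpatient", "4.5\t6.1\t1", "5.0\t7.2\t2"), tmp)
toy <- read_diabetes(tmp)
str(toy)

# For the real data (requires internet access):
# diabetes <- read_diabetes(
#   "https://raw.githubusercontent.com/statOmics/PSLSData/main/diabetes.txt"
# )
```

If the column check fails on the real file, the separator assumption is the first thing to revisit.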
Have a first look at the data
Data visualization
Note that the dataset is not in the tidy format. The glucose concentration variable is spread across 2 columns, before and after, while the “time” variable is encoded in the column names instead of in a dedicated column. Data in this form are also called wide data. Instead, we want to transform the data to a long format.
To tidy the data, we can use the gather() function to pivot the data. In this case, we want to “gather” the time variable (encoded in the column names before and after) and the concentration variable (encoded in the actual values). The patient column should stay the same. We can specify this with the following syntax.
## Load the tidyverse (provides gather(), %>% and ggplot())
library(tidyverse)

diabetes_tidy <- diabetes %>%
  gather(time, concentration, -patient)
diabetes_tidy
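In current versions of tidyr, gather() is superseded by pivot_longer(), which does the same reshaping with more explicit argument names. The sketch below shows both on a toy stand-in for the wide table. The object diabetes_toy and its values are made up for illustration and are not the real measurements.

```r
library(tidyr)

# Toy stand-in for the wide `diabetes` table (MADE-UP values)
diabetes_toy <- data.frame(
  before  = c(4.5, 5.0, 4.8),
  after   = c(6.1, 7.2, 5.9),
  patient = 1:3
)

# Wide to long: the column names go to `time`, the cell values to
# `concentration`; `patient` is kept as the identifier
diabetes_toy_tidy <- pivot_longer(
  diabetes_toy,
  cols = c(before, after),
  names_to = "time",
  values_to = "concentration"
)
diabetes_toy_tidy

# The gather() call from the exercise, applied to the same toy data
gathered <- gather(diabetes_toy, time, concentration, -patient)
gathered
```

Both results hold the same rows. They differ only in row order: pivot_longer() keeps the two measurements of a patient next to each other, while gather() stacks all before values ahead of all after values.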
Barplot
Not all visualization types will be equally informative.
A barplot is a plot that you will commonly find in scientific publications. The code for generating such a barplot is provided below:
diabetes_tidy %>%
  ## Calculate summary statistics for the "concentration" variable for each "time"
  group_by(time) %>%
  summarize(
    mean = mean(concentration, na.rm = TRUE),
    sd = sd(concentration, na.rm = TRUE),
    n = n()
  ) %>%
  ## Compute the standard errors for the means
  mutate(se = sd / sqrt(n)) %>%
  ggplot(aes(x = time, y = mean, fill = time)) +
  theme_bw() +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  ggtitle("Barplot of glucose measurements") +
  ylab("concentration (mmol/l)")
A barplot, however, is not very informative. The height of the bars only provides us with information on the mean glucose concentration. We don’t see the actual underlying values, so we have, for instance, little information on the spread of the individual observations. It is usually more informative to represent the underlying values as raw as possible. Note that it is possible to add the raw data to the barplot, but we still would not see any measures of the spread, such as the interquartile range.
Another crucial aspect of the data is also not displayed: the data are paired!
Based on these criticisms, can you think of a better visualization strategy for the diabetes data?
Add your proposed visualization strategy here
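One possible strategy, offered as a sketch rather than as the intended solution, is a line plot that shows every raw measurement and connects the two measurements of each patient, so that the pairing becomes visible. The data frame below uses made-up values in the long format produced above. The names paired_toy and paired_plot are chosen for illustration only.

```r
library(ggplot2)

# Toy long-format data (MADE-UP values, not the real measurements)
paired_toy <- data.frame(
  patient = rep(1:3, times = 2),
  time = rep(c("before", "after"), each = 3),
  concentration = c(4.5, 5.0, 4.8, 6.1, 7.2, 5.9)
)

# Put "before" on the left; the default alphabetical order would show
# "after" first
paired_toy$time <- factor(paired_toy$time, levels = c("before", "after"))

paired_plot <- ggplot(paired_toy, aes(x = time, y = concentration)) +
  ## One line per patient makes the pairing explicit
  geom_line(aes(group = patient), colour = "grey40") +
  ## Raw observations instead of means
  geom_point() +
  theme_bw() +
  ggtitle("Glucose concentration per patient") +
  ylab("concentration (mmol/l)")
paired_plot
```

The slope of each line is the within-patient change, which is the quantity a paired design is meant to isolate. A boxplot or dotplot of the per-patient differences would be a reasonable complement.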
Descriptive statistics
- Generate a code chunk to calculate useful summary statistics for the diabetes data
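As a starting point, the sketch below computes the mean, standard deviation and standard error for both time points and for the per-patient difference (after minus before). For paired data the difference is the natural quantity to summarize, because its variance equals var(before) + var(after) - 2 cov(before, after). The toy data frame and the name summary_stats are illustrative, and the values are made up.

```r
# Toy wide-format data (MADE-UP values, not the real measurements)
diabetes_toy <- data.frame(
  before  = c(4.5, 5.0, 4.8),
  after   = c(6.1, 7.2, 5.9),
  patient = 1:3
)

# Per-patient change in glucose concentration
diabetes_toy$difference <- diabetes_toy$after - diabetes_toy$before

vars <- c("before", "after", "difference")
summary_stats <- data.frame(
  variable = vars,
  mean = sapply(vars, function(v) mean(diabetes_toy[[v]])),
  sd = sapply(vars, function(v) sd(diabetes_toy[[v]])),
  n = nrow(diabetes_toy),
  row.names = NULL
)

# Standard error of each mean
summary_stats$se <- summary_stats$sd / sqrt(summary_stats$n)
summary_stats
```

The mean of the differences always equals the difference of the two means. The standard deviation of the differences, in contrast, depends on how strongly the two measurements within a patient are correlated.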