Canonical Correlation Analysis (CCA) is a multivariate data analysis method that aims at finding correlations between two multivariate data sets, \(X\) and \(Y\). The method looks for the linear combination of the \(X\)-variables and the linear combination of the \(Y\)-variables that show maximal correlation. When the number of variables in \(X\) and/or \(Y\) is very large (high-dimensional), the classical CCA method needs to be adapted to deal with the high dimensionality.
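
As a compact statement of this criterion, in one common notation (the symbols \(a\), \(b\), \(S_{XX}\), \(S_{YY}\) and \(S_{XY}\) are introduced here for illustration and may differ from those in the reference given below), let \(S_{XX}\) and \(S_{YY}\) be the sample covariance matrices of \(X\) and \(Y\), and \(S_{XY}\) their cross-covariance matrix. The first pair of canonical vectors can then be written as:

\[
(a_1, b_1) = \arg\max_{a,\, b} \operatorname{cor}(Xa, Yb) = \arg\max_{a,\, b} \frac{a^\top S_{XY}\, b}{\sqrt{a^\top S_{XX}\, a}\,\sqrt{b^\top S_{YY}\, b}}
\]

Subsequent pairs maximise the same criterion, subject to the new canonical variates being uncorrelated with the earlier ones.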

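The aims of this homework assignment are:

* to understand the classical CCA method (based on the literature) and a CCA method for high-dimensional data
* to implement the CCA method and its high-dimensional version (not using existing R packages or R functions for CCA)
* to apply the method to a dataset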

You may consult the literature to find a description of the CCA method. Here I give one possible reference (it is a paper about an R package, but remember that you may not use this R package for the implementation):

González, I., Déjean, S., Martin, P. G., & Baccini, A. (2008). CCA: An R package to extend canonical correlation analysis. Journal of Statistical Software, 23(12), 1-14. http://dx.doi.org/10.18637/jss.v023.i12

The paper also describes a regularised CCA method (section 2.4), which is applicable to high-dimensional data. However, there are other high-dimensional CCA methods described in the literature. You are free to choose the regularised CCA from the paper, or any other appropriate high-dimensional CCA method.
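
As a rough illustration of why regularisation helps, the sketch below uses simulated data (not the nutrimouse data) with more variables than observations. It assumes the ridge-type form of regularisation, in which a multiple of the identity matrix is added to the sample covariance matrix. The names X_sim, S_xx and lambda are illustrative only, and the value lambda = 0.1 is arbitrary.

```r
# Simulated stand-in data with p > n (an assumption for illustration only)
set.seed(1)
n <- 40    # number of observations
p <- 120   # number of variables
X_sim <- matrix(rnorm(n * p), nrow = n, ncol = p)

# The sample covariance matrix is p x p but has rank at most n - 1,
# so it is singular and cannot be inverted
S_xx <- cov(X_sim)
eig_raw <- eigen(S_xx, symmetric = TRUE, only.values = TRUE)$values
n_positive <- sum(eig_raw > 1e-8)   # numerical rank of S_xx

# Ridge-type regularisation: add lambda to the diagonal
lambda <- 0.1                        # arbitrary tuning parameter
S_xx_reg <- S_xx + lambda * diag(p)
eig_reg <- eigen(S_xx_reg, symmetric = TRUE, only.values = TRUE)$values

# All eigenvalues are now at least (approximately) lambda,
# so the regularised matrix can be inverted
S_xx_reg_inv <- solve(S_xx_reg)
```

Comparing n_positive with p shows how far the unregularised matrix is from full rank, while min(eig_reg) shows that the regularised matrix is bounded away from singularity.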

Note that the paper proposes a cross-validation method for selecting, e.g., the tuning parameters in the regularised CCA. You are not required to implement this. If tuning parameters are involved, you may set them manually to arbitrary values (or experiment with them when analysing the dataset and set them to values that seem appropriate to you; there is no need to justify your choice).

You must apply your implemented method to the nutrimouse data, which is part of the CCA R package. More information about the data can be found in the paper. Consider only the first two dimensions of the CCA, which will allow you to make two-dimensional graphs.

The dataset can be accessed in R as follows:

# Check if CCA package is installed and install it if it's not
if (!requireNamespace("CCA", quietly = TRUE)) {
    install.packages("CCA")
}

library(CCA)
data("nutrimouse")

X <- nutrimouse$gene  # the gene expression matrix
dim(X)
#> [1]  40 120
Y <- nutrimouse$lipid # the lipids matrix
dim(Y)
#> [1] 40 21
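
The dimensions above show that the gene matrix has more columns (120) than rows (40). The sketch below uses a simulated matrix of the same shape (an assumption, so that it runs without the CCA package) to show how keeping fewer columns than rows gives a well-conditioned covariance matrix. The choice of the first 10 columns is arbitrary and purely hypothetical.

```r
# Simulated stand-in with the same shape as the gene matrix (40 x 120)
set.seed(2)
X_sim <- matrix(rnorm(40 * 120), nrow = 40, ncol = 120)

# Keep an arbitrary subset of columns so that p < n
keep <- 1:10                 # hypothetical choice of variables
X_sub <- X_sim[, keep]

# Reciprocal condition numbers of the two covariance matrices:
# values close to zero indicate a (numerically) singular matrix
rc_full <- rcond(cov(X_sim))
rc_sub  <- rcond(cov(X_sub))
```

With the real data, the same indexing can be applied to X, for example X[, keep], before running a classical CCA.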

The assignment should be done alone or in groups of 2.

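You should write a report containing the following:

* A short (mathematical) description of the CCA methods (classical and high-dimensional) that you have implemented
* The application of your method to the nutrimouse data
  - Classical CCA on multivariate data with \(p < n\). (Hint: it will not be possible to apply the classical CCA method to the full data matrix \(X\). You should subset the data to reflect the case of \(p < n\).)
  - High-dimensional CCA on data with \(p > n\)
* Interpretation and conclusion of the data analysis results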

The length of the written report (excluding R code, R output and graphs) should be about 2 pages.

It is recommended (but not mandatory) to prepare your report in RMarkdown. You can render it to either HTML (output: html_document) or to PDF (output: pdf_document). In both cases the original .Rmd file should be included when handing in the assignment. If you don’t use RMarkdown, you should include the .R file(s) containing your implementation and analysis scripts.

When submitting, please use the following format:

* Report: HW-Name1-Name2.[pdf|html]
* Source code: HW-Name1-Name2.Rmd (or HW1-Name1-Name2.R)

Submissions should be done through UFora under the “Assignments” tab (UFora-tools --> Assignments).

The deadline for submission is November 12th at 23:59.
