Creating and manipulating matrices

We will create a \(4 \times 2\) matrix, starting from a vector of values.

## Define the matrix X, filling matrix "byrow": 4 x 2
X <- matrix(c(3, 6, 1, 2, 0.23, 0.46, -2, -4), nrow = 4, byrow = TRUE)
X
#>       [,1]  [,2]
#> [1,]  3.00  6.00
#> [2,]  1.00  2.00
#> [3,]  0.23  0.46
#> [4,] -2.00 -4.00

## Alternative way to define the matrix, `byrow = FALSE` is the default
X <- matrix(c(3, 1, 0.23, -2, 6, 2, 0.46, -4), ncol = 2, byrow = FALSE)
X
#>       [,1]  [,2]
#> [1,]  3.00  6.00
#> [2,]  1.00  2.00
#> [3,]  0.23  0.46
#> [4,] -2.00 -4.00

# Creating a second matrix: 2 x 4, using the square of `X`
Y <- matrix(X^2, nrow = 2, ncol = 4)
Y
#>      [,1]   [,2] [,3]    [,4]
#> [1,]    9 0.0529   36  0.2116
#> [2,]    1 4.0000    4 16.0000

## Check that we really have a matrix
class(X)
#> [1] "matrix" "array"

Generally you can check the arguments of a function, e.g. matrix, in R with ?matrix.

The dim function tells you the dimensions of the matrix, nrow and ncol give you the number of rows and columns, respectively.

dim(X)  # nr. of rows, nr. of columns
#> [1] 4 2
nrow(X)
#> [1] 4
ncol(X)
#> [1] 2

dim(Y)
#> [1] 2 4

Subsetting

Note that R sees a matrix as X[row, column], so you can subset X with

X[1, ] # first row of X
#> [1] 3 6
X[, 2] # second column of X
#> [1]  6.00  2.00  0.46 -4.00
X[1:2, ] # first 2 rows of X
#>      [,1] [,2]
#> [1,]    3    6
#> [2,]    1    2

Note that, by default, subsetting a single row or column from a matrix returns a vector, not a column- or row-matrix as might be expected. We can see this from the fact that the subset does not have dimensions anymore (an attribute only defined for matrices in R):

dim(X[, 1]) # NULL = non-existent
#> NULL
class(X[, 1]) # not matrix but numeric vector
#> [1] "numeric"

This can lead to unexpected behavior when doing matrix multiplication (see further below). To avoid this, we can set drop = FALSE in the subsetting brackets.

## This creates a "column-vector"
X[, 1, drop = FALSE]
#>       [,1]
#> [1,]  3.00
#> [2,]  1.00
#> [3,]  0.23
#> [4,] -2.00
dim(X[, 1, drop = FALSE])
#> [1] 4 1

Row- and column-binding

You can also create matrices by ‘binding’ together vectors (of the same length) column-wise or row-wise with cbind() and rbind(), respectively.

# Create vectors from the rows of X
(x1 <- X[1, ])
#> [1] 3 6
(x2 <- X[2, ])
#> [1] 1 2
(x3 <- X[3, ])
#> [1] 0.23 0.46
(x4 <- X[4, ])
#> [1] -2 -4

# Bind them back together row-wise: re-creates X
rbind(x1, x2, x3, x4)
#>     [,1]  [,2]
#> x1  3.00  6.00
#> x2  1.00  2.00
#> x3  0.23  0.46
#> x4 -2.00 -4.00

# Bind them back together column-wise: (essentially the transpose of X)
cbind(x1, x2, x3, x4)
#>      x1 x2   x3 x4
#> [1,]  3  1 0.23 -2
#> [2,]  6  2 0.46 -4

Binding together 2 matrices also works (but be mindful of their dimensions!).

# Row-binding two matrices
# X and Y don't have the same dimensions (4x2 vs. 2x4),
# so binding them directly doesn't work
rbind(X, Y)
#> Error in rbind(...): number of columns of matrices must match (see arg 2)

# For rbind we need the same number of columns
rbind(X, Y[, 1:2]) # works
#>       [,1]    [,2]
#> [1,]  3.00  6.0000
#> [2,]  1.00  2.0000
#> [3,]  0.23  0.4600
#> [4,] -2.00 -4.0000
#> [5,]  9.00  0.0529
#> [6,]  1.00  4.0000

# For cbind, same number of rows
cbind(X[1:2, ], Y)
#>      [,1] [,2] [,3]   [,4] [,5]    [,6]
#> [1,]    3    6    9 0.0529   36  0.2116
#> [2,]    1    2    1 4.0000    4 16.0000

Note that R automatically creates row names or column names from the variable names when using rbind() and cbind().

To add row and columns names to an existing matrix, use rownames(), colnames() or the more general dimnames().

rownames(X) <- c("row_1", "row_2", "row_3", "row_4")
colnames(X) <- c("col_1", "col_2")
X
#>       col_1 col_2
#> row_1  3.00  6.00
#> row_2  1.00  2.00
#> row_3  0.23  0.46
#> row_4 -2.00 -4.00

## Using `NULL` removes the dimnames again
dimnames(X) <- NULL
X
#>       [,1]  [,2]
#> [1,]  3.00  6.00
#> [2,]  1.00  2.00
#> [3,]  0.23  0.46
#> [4,] -2.00 -4.00

Diagonal elements and diagonal matrices: diag()

## Get diagonal elements from X, returns vector
diag(X)
#> [1] 3 2

## Create diagonal matrix, vector as input, returns matrix
diag(c(1, 2, 3 ,4))
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    2    0    0
#> [3,]    0    0    3    0
#> [4,]    0    0    0    4

Identity matrices

The diag function can also be used to create identity matrices with specific dimensions

diag(1, nrow = 4, ncol = 4)  # 4 x 4
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> [3,]    0    0    1    0
#> [4,]    0    0    0    1
diag(1, nrow = 2, ncol = 4)  # 2 x 4
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
diag(1, nrow = 4, ncol = 2)  # 4 x 2
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1
#> [3,]    0    0
#> [4,]    0    0

Math operations

Element-wise operations

Scalars and vectors

Scalar operations occur element-wise for matrices.

X # for reference
#>       [,1]  [,2]
#> [1,]  3.00  6.00
#> [2,]  1.00  2.00
#> [3,]  0.23  0.46
#> [4,] -2.00 -4.00
X + 1
#>       [,1]  [,2]
#> [1,]  4.00  7.00
#> [2,]  2.00  3.00
#> [3,]  1.23  1.46
#> [4,] -1.00 -3.00
X - 100
#>         [,1]    [,2]
#> [1,]  -97.00  -94.00
#> [2,]  -99.00  -98.00
#> [3,]  -99.77  -99.54
#> [4,] -102.00 -104.00
2 * X
#>       [,1]  [,2]
#> [1,]  6.00 12.00
#> [2,]  2.00  4.00
#> [3,]  0.46  0.92
#> [4,] -4.00 -8.00
X / 3
#>             [,1]       [,2]
#> [1,]  1.00000000  2.0000000
#> [2,]  0.33333333  0.6666667
#> [3,]  0.07666667  0.1533333
#> [4,] -0.66666667 -1.3333333
X^2
#>        [,1]    [,2]
#> [1,] 9.0000 36.0000
#> [2,] 1.0000  4.0000
#> [3,] 0.0529  0.2116
#> [4,] 4.0000 16.0000

Note that ^2 also occurs element-wise, not using matrix multiplication (see further below)!

You can also do operations with vectors, though you should be careful with the dimensions. R will add the vector to each column of the matrix. When using a vector with a different length than the number of rows (elements in a column), the vector elements will be recycled. When the longer object is not a multiple of the shorter, the operation still occurs but a warning is produced (technically a message).

X
#>       [,1]  [,2]
#> [1,]  3.00  6.00
#> [2,]  1.00  2.00
#> [3,]  0.23  0.46
#> [4,] -2.00 -4.00
v4 <- c(1, 2, 3, 4)  # vector of length 4
X + v4
#>      [,1] [,2]
#> [1,] 4.00 7.00
#> [2,] 3.00 4.00
#> [3,] 3.23 3.46
#> [4,] 2.00 0.00

v2 <- c(1, 2)  # vector of length 2, will be recycled
X + v2
#>      [,1]  [,2]
#> [1,] 4.00  7.00
#> [2,] 3.00  4.00
#> [3,] 1.23  1.46
#> [4,] 0.00 -2.00

v3 <- c(1, 2, 3) # vector or length 3, also recycled with warning
X + v3
#> Warning in X + v3: longer object length is not a multiple of shorter object
#> length
#>       [,1]  [,2]
#> [1,]  4.00  8.00
#> [2,]  3.00  5.00
#> [3,]  3.23  1.46
#> [4,] -1.00 -2.00

v5 <- c(1, 2, 3, 4, 5) # vector or length 5, also recycled with warning
X + v5
#> Warning in X + v5: longer object length is not a multiple of shorter object
#> length
#>      [,1]  [,2]
#> [1,] 4.00 11.00
#> [2,] 3.00  3.00
#> [3,] 3.23  2.46
#> [4,] 2.00 -1.00

Two matrices

Define a second matrix Y with same dimensions as X.

# We will generate a matrix Y with 2s and 4s at random places (using `sample`)
# Setting a seed allows reproducibility
set.seed(1)
Y <- matrix(sample(c(2, 4), size = nrow(X) * ncol(X), replace = TRUE), 
            nrow = nrow(X), ncol = ncol(X))
Y
#>      [,1] [,2]
#> [1,]    2    4
#> [2,]    4    2
#> [3,]    2    2
#> [4,]    2    2

Using the standard math operators occurs element-wise.

X + Y
#>      [,1]  [,2]
#> [1,] 5.00 10.00
#> [2,] 5.00  4.00
#> [3,] 2.23  2.46
#> [4,] 0.00 -2.00
X - Y
#>       [,1]  [,2]
#> [1,]  1.00  2.00
#> [2,] -3.00  0.00
#> [3,] -1.77 -1.54
#> [4,] -4.00 -6.00
X * Y
#>       [,1]  [,2]
#> [1,]  6.00 24.00
#> [2,]  4.00  4.00
#> [3,]  0.46  0.92
#> [4,] -4.00 -8.00
X / Y
#>        [,1]  [,2]
#> [1,]  1.500  1.50
#> [2,]  0.250  1.00
#> [3,]  0.115  0.23
#> [4,] -1.000 -2.00
X ^ Y
#>        [,1]      [,2]
#> [1,] 9.0000 1296.0000
#> [2,] 1.0000    4.0000
#> [3,] 0.0529    0.2116
#> [4,] 4.0000   16.0000

Matrix algebra

https://www.statmethods.net/advstats/matrix.html

Transpose: t()

\(\mathbf{X}^T\)

t(X)
#>      [,1] [,2] [,3] [,4]
#> [1,]    3    1 0.23   -2
#> [2,]    6    2 0.46   -4

Matrix multiplication: %*%

Note that X and Y have the same dimensions, so multiplying them straight away doesn’t work (incompatible dimensions)

## This produces an error because the dimensions are not compatible
X %*% Y
#> Error in X %*% Y: non-conformable arguments

Instead, we can transpose X and compute \(\mathbf{X}^T \cdot \mathbf{Y}\):

t(X) %*% Y
#>       [,1]  [,2]
#> [1,]  6.46 10.46
#> [2,] 12.92 20.92

## `crossprod()` is a shortcut for this operation, and is slightly faster
crossprod(X, Y)
#>       [,1]  [,2]
#> [1,]  6.46 10.46
#> [2,] 12.92 20.92

## `tcrossprod()` calculates X.Y^T
X %*% t(Y)
#>       [,1]   [,2]   [,3]   [,4]
#> [1,]  30.0  24.00  18.00  18.00
#> [2,]  10.0   8.00   6.00   6.00
#> [3,]   2.3   1.84   1.38   1.38
#> [4,] -20.0 -16.00 -12.00 -12.00
tcrossprod(X, Y)
#>       [,1]   [,2]   [,3]   [,4]
#> [1,]  30.0  24.00  18.00  18.00
#> [2,]  10.0   8.00   6.00   6.00
#> [3,]   2.3   1.84   1.38   1.38
#> [4,] -20.0 -16.00 -12.00 -12.00

The crossprod() and tcrossprod() functions also work with a single matrix as input, in which case they calculate \(\mathbf{X}^T\mathbf{X}\) and \(\mathbf{X}\mathbf{X}^T\), respectively:

## X^TX
crossprod(X)
#>         [,1]    [,2]
#> [1,] 14.0529 28.1058
#> [2,] 28.1058 56.2116
## XX^T
tcrossprod(X)
#>        [,1]   [,2]    [,3]  [,4]
#> [1,]  45.00  15.00  3.4500 -30.0
#> [2,]  15.00   5.00  1.1500 -10.0
#> [3,]   3.45   1.15  0.2645  -2.3
#> [4,] -30.00 -10.00 -2.3000  20.0

As already mentioned before, subsetting a matrix with [ returns a vector by default. Doing matrix multiplication with vectors in R can be ambiguous as R will “guess” itself whether the vector should be treated as a column- or row-vector. See ?`%*%` for details.

The example below illustrates some potentially unexpected results

## Selecting the first column of X, we would expect this to be treated as a column
## vector with dimensions 4 x 1...
X[, 1]
#> [1]  3.00  1.00  0.23 -2.00
dim(X[, 1])
#> NULL
## ...but in R vectors don't have dimensions

## Mathematically, multiplicating the first column of X with X is not possible
## (multiplying a 4x1 with a 4x2 matrix doesn't work)
## So we would expect the following code to throw an error...
X[, 1] %*% X
#>         [,1]    [,2]
#> [1,] 14.0529 28.1058
## ...but it doesn't and instead returns the result as if X[, 1] is a row-vector

To avoid confusion and when you intend to use single rows or vectors from a matrix as they were single-row or single-column matrices, it’s better to use drop = FALSE when subsetting.

X1 <- X[, 1, drop = FALSE]
## Now X1 does have dimensions
dim(X1)
#> [1] 4 1

## And the following code throws an error
X1 %*% X
#> Error in X1 %*% X: non-conformable arguments

You can also do matrix multiplication with 2 vectors, in which case again R will guess if they should be treated as column- or row-vectors.

(a <- seq_len(3))
#> [1] 1 2 3
(b <- a + 1)
#> [1] 2 3 4
a %*% b
#>      [,1]
#> [1,]   20

Inverse: solve()

Inverse: \(\mathbf{X}^{-1}\) (only for square matrices)

## Create new square matrix Z
(Z <- matrix(1:4, nrow = 2, ncol = 2))
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4

## Compute inverse
(Z_inv <- solve(Z))
#>      [,1] [,2]
#> [1,]   -2  1.5
#> [2,]    1 -0.5

## Check that it's indeed the inverse
Z %*% Z_inv  # the 2x2 identity matrix
#>      [,1] [,2]
#> [1,]    1    0
#> [2,]    0    1

Warning: DON’T USE X^-1 to compute a matrix inverse!

The notation might be confusing because of the way we mathematically describe the inverse of a matrix (\(\mathbf{X}^{-1}\)). In R, however, X^-1 computes element-wise inverses of each value.

Z^-1
#>      [,1]      [,2]
#> [1,]  1.0 0.3333333
#> [2,]  0.5 0.2500000

If you’re wondering why the name “solve” is used for the function to compute a matrix inverse, it’s because solve can do more than just that. See ?solve for details.

Rank of a matrix

Calculating the rank of the matrix \(\mathbf X\). There are several ways to do this in R. We’ll use the qr() function (see ?qr). An alternative would be to use rankMatrix from the Matrix package: Matrix::rankMatrix().

## Run `qr` and access the `rank` from the results with `$`
rank <- qr(X)$rank
rank
#> [1] 1

We see that the rank of X is lower than its number of columns and rows, meaning the matrix is not of full rank.

Considering the contents of X, can you see why this is?

Hint:

all(X[, 2] == 2 * X[, 1])
#> [1] TRUE

Session info

Session info

#> [1] "2021-05-11 14:31:59 UTC"
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2021-05-11                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                * version  date       lib source        
#>  AnnotationDbi            1.52.0   2020-10-27 [1] Bioconductor  
#>  AnnotationHub          * 2.22.1   2021-04-16 [1] Bioconductor  
#>  assertthat               0.2.1    2019-03-21 [1] CRAN (R 4.0.2)
#>  backports                1.2.1    2020-12-09 [1] CRAN (R 4.0.2)
#>  beachmat                 2.6.4    2020-12-20 [1] Bioconductor  
#>  beeswarm                 0.3.1    2021-03-07 [1] CRAN (R 4.0.2)
#>  Biobase                * 2.50.0   2020-10-27 [1] Bioconductor  
#>  BiocFileCache          * 1.14.0   2020-10-27 [1] Bioconductor  
#>  BiocGenerics           * 0.36.1   2021-04-16 [1] Bioconductor  
#>  BiocManager              1.30.15  2021-05-11 [1] CRAN (R 4.0.5)
#>  BiocNeighbors            1.8.2    2020-12-07 [1] Bioconductor  
#>  BiocParallel             1.24.1   2020-11-06 [1] Bioconductor  
#>  BiocSingular             1.6.0    2020-10-27 [1] Bioconductor  
#>  BiocVersion              3.12.0   2020-05-14 [1] Bioconductor  
#>  bit                      4.0.4    2020-08-04 [1] CRAN (R 4.0.2)
#>  bit64                    4.0.5    2020-08-30 [1] CRAN (R 4.0.2)
#>  bitops                   1.0-7    2021-04-24 [1] CRAN (R 4.0.2)
#>  blob                     1.2.1    2020-01-20 [1] CRAN (R 4.0.2)
#>  broom                    0.7.6    2021-04-05 [1] CRAN (R 4.0.2)
#>  bslib                    0.2.4    2021-01-25 [1] CRAN (R 4.0.2)
#>  cachem                   1.0.4    2021-02-13 [1] CRAN (R 4.0.2)
#>  CCA                    * 1.2.1    2021-03-01 [1] CRAN (R 4.0.2)
#>  cellranger               1.1.0    2016-07-27 [1] CRAN (R 4.0.2)
#>  cli                      2.5.0    2021-04-26 [1] CRAN (R 4.0.2)
#>  cluster                * 2.1.2    2021-04-17 [1] CRAN (R 4.0.2)
#>  colorspace               2.0-1    2021-05-04 [1] CRAN (R 4.0.2)
#>  crayon                   1.4.1    2021-02-08 [1] CRAN (R 4.0.2)
#>  curl                     4.3.1    2021-04-30 [1] CRAN (R 4.0.2)
#>  DBI                      1.1.1    2021-01-15 [1] CRAN (R 4.0.2)
#>  dbplyr                 * 2.1.1    2021-04-06 [1] CRAN (R 4.0.2)
#>  DelayedArray             0.16.3   2021-03-24 [1] Bioconductor  
#>  DelayedMatrixStats       1.12.3   2021-02-03 [1] Bioconductor  
#>  digest                   0.6.27   2020-10-24 [1] CRAN (R 4.0.2)
#>  dotCall64              * 1.0-1    2021-02-11 [1] CRAN (R 4.0.2)
#>  dplyr                  * 1.0.6    2021-05-05 [1] CRAN (R 4.0.2)
#>  ellipsis                 0.3.2    2021-04-29 [1] CRAN (R 4.0.2)
#>  evaluate                 0.14     2019-05-28 [1] CRAN (R 4.0.1)
#>  ExperimentHub          * 1.16.1   2021-04-16 [1] Bioconductor  
#>  fansi                    0.4.2    2021-01-15 [1] CRAN (R 4.0.2)
#>  farver                   2.1.0    2021-02-28 [1] CRAN (R 4.0.2)
#>  fastmap                  1.1.0    2021-01-25 [1] CRAN (R 4.0.2)
#>  fda                    * 5.1.9    2020-12-16 [1] CRAN (R 4.0.2)
#>  fds                    * 1.8      2018-10-31 [1] CRAN (R 4.0.2)
#>  fields                 * 11.6     2020-10-09 [1] CRAN (R 4.0.2)
#>  forcats                * 0.5.1    2021-01-27 [1] CRAN (R 4.0.2)
#>  fs                       1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
#>  generics                 0.1.0    2020-10-31 [1] CRAN (R 4.0.2)
#>  GenomeInfoDb           * 1.26.7   2021-04-08 [1] Bioconductor  
#>  GenomeInfoDbData         1.2.4    2021-05-11 [1] Bioconductor  
#>  GenomicRanges          * 1.42.0   2020-10-27 [1] Bioconductor  
#>  ggbeeswarm               0.6.0    2017-08-07 [1] CRAN (R 4.0.2)
#>  ggplot2                * 3.3.3    2020-12-30 [1] CRAN (R 4.0.2)
#>  glue                     1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
#>  gridExtra                2.3      2017-09-09 [1] CRAN (R 4.0.2)
#>  gtable                   0.3.0    2019-03-25 [1] CRAN (R 4.0.2)
#>  haven                    2.4.1    2021-04-23 [1] CRAN (R 4.0.2)
#>  hdrcde                   3.4      2021-01-18 [1] CRAN (R 4.0.2)
#>  highr                    0.9      2021-04-16 [1] CRAN (R 4.0.2)
#>  hms                      1.0.0    2021-01-13 [1] CRAN (R 4.0.2)
#>  htmltools                0.5.1.1  2021-01-22 [1] CRAN (R 4.0.2)
#>  httpuv                   1.6.1    2021-05-07 [1] CRAN (R 4.0.2)
#>  httr                     1.4.2    2020-07-20 [1] CRAN (R 4.0.2)
#>  interactiveDisplayBase   1.28.0   2020-10-27 [1] Bioconductor  
#>  IRanges                * 2.24.1   2020-12-12 [1] Bioconductor  
#>  irlba                    2.3.3    2019-02-05 [1] CRAN (R 4.0.2)
#>  jpeg                     0.1-8.1  2019-10-24 [1] CRAN (R 4.0.2)
#>  jquerylib                0.1.4    2021-04-26 [1] CRAN (R 4.0.2)
#>  jsonlite                 1.7.2    2020-12-09 [1] CRAN (R 4.0.2)
#>  KernSmooth               2.23-18  2020-10-29 [2] CRAN (R 4.0.5)
#>  knitr                    1.33     2021-04-24 [1] CRAN (R 4.0.2)
#>  ks                       1.12.0   2021-02-07 [1] CRAN (R 4.0.2)
#>  labeling                 0.4.2    2020-10-20 [1] CRAN (R 4.0.2)
#>  later                    1.2.0    2021-04-23 [1] CRAN (R 4.0.2)
#>  lattice                  0.20-41  2020-04-02 [2] CRAN (R 4.0.5)
#>  lifecycle                1.0.0    2021-02-15 [1] CRAN (R 4.0.2)
#>  lubridate                1.7.10   2021-02-26 [1] CRAN (R 4.0.2)
#>  magrittr                 2.0.1    2020-11-17 [1] CRAN (R 4.0.2)
#>  maps                     3.3.0    2018-04-03 [1] CRAN (R 4.0.2)
#>  MASS                   * 7.3-53.1 2021-02-12 [2] CRAN (R 4.0.5)
#>  Matrix                 * 1.3-2    2021-01-06 [2] CRAN (R 4.0.5)
#>  MatrixGenerics         * 1.2.1    2021-01-30 [1] Bioconductor  
#>  matrixStats            * 0.58.0   2021-01-29 [1] CRAN (R 4.0.2)
#>  mclust                   5.4.7    2020-11-20 [1] CRAN (R 4.0.2)
#>  memoise                  2.0.0    2021-01-26 [1] CRAN (R 4.0.2)
#>  mime                     0.10     2021-02-13 [1] CRAN (R 4.0.2)
#>  misc3d                   0.9-0    2020-09-06 [1] CRAN (R 4.0.2)
#>  modelr                   0.1.8    2020-05-19 [1] CRAN (R 4.0.2)
#>  munsell                  0.5.0    2018-06-12 [1] CRAN (R 4.0.2)
#>  muscData               * 1.4.0    2020-10-29 [1] Bioconductor  
#>  mvtnorm                  1.1-1    2020-06-09 [1] CRAN (R 4.0.2)
#>  pcaPP                  * 1.9-74   2021-04-23 [1] CRAN (R 4.0.2)
#>  pillar                   1.6.0    2021-04-13 [1] CRAN (R 4.0.2)
#>  pkgconfig                2.0.3    2019-09-22 [1] CRAN (R 4.0.2)
#>  plot3D                 * 1.3      2019-12-18 [1] CRAN (R 4.0.2)
#>  promises                 1.2.0.1  2021-02-11 [1] CRAN (R 4.0.2)
#>  ps                       1.6.0    2021-02-28 [1] CRAN (R 4.0.2)
#>  purrr                  * 0.3.4    2020-04-17 [1] CRAN (R 4.0.2)
#>  R6                       2.5.0    2020-10-28 [1] CRAN (R 4.0.2)
#>  rainbow                * 3.6      2019-01-29 [1] CRAN (R 4.0.2)
#>  rappdirs                 0.3.3    2021-01-31 [1] CRAN (R 4.0.2)
#>  Rcpp                     1.0.6    2021-01-15 [1] CRAN (R 4.0.2)
#>  RCurl                  * 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)
#>  readr                  * 1.4.0    2020-10-05 [1] CRAN (R 4.0.2)
#>  readxl                   1.3.1    2019-03-13 [1] CRAN (R 4.0.2)
#>  reprex                   2.0.0    2021-04-02 [1] CRAN (R 4.0.2)
#>  rlang                    0.4.11   2021-04-30 [1] CRAN (R 4.0.2)
#>  rmarkdown                2.8      2021-05-07 [1] CRAN (R 4.0.2)
#>  RSQLite                  2.2.7    2021-04-22 [1] CRAN (R 4.0.2)
#>  rstudioapi               0.13     2020-11-12 [1] CRAN (R 4.0.2)
#>  rsvd                     1.0.5    2021-04-16 [1] CRAN (R 4.0.2)
#>  rvest                    1.0.0    2021-03-09 [1] CRAN (R 4.0.2)
#>  S4Vectors              * 0.28.1   2020-12-09 [1] Bioconductor  
#>  sass                     0.3.1    2021-01-24 [1] CRAN (R 4.0.2)
#>  scales                   1.1.1    2020-05-11 [1] CRAN (R 4.0.2)
#>  scater                 * 1.18.6   2021-02-26 [1] Bioconductor  
#>  scuttle                  1.0.4    2020-12-17 [1] Bioconductor  
#>  sessioninfo              1.1.1    2018-11-05 [1] CRAN (R 4.0.2)
#>  shiny                    1.6.0    2021-01-25 [1] CRAN (R 4.0.2)
#>  SingleCellExperiment   * 1.12.0   2020-10-27 [1] Bioconductor  
#>  spam                   * 2.6-0    2020-12-14 [1] CRAN (R 4.0.2)
#>  sparseMatrixStats        1.2.1    2021-02-02 [1] Bioconductor  
#>  stringi                  1.6.1    2021-05-10 [1] CRAN (R 4.0.5)
#>  stringr                * 1.4.0    2019-02-10 [1] CRAN (R 4.0.2)
#>  SummarizedExperiment   * 1.20.0   2020-10-27 [1] Bioconductor  
#>  tibble                 * 3.1.1    2021-04-18 [1] CRAN (R 4.0.2)
#>  tidyr                  * 1.1.3    2021-03-03 [1] CRAN (R 4.0.2)
#>  tidyselect               1.1.1    2021-04-30 [1] CRAN (R 4.0.2)
#>  tidyverse              * 1.3.1    2021-04-15 [1] CRAN (R 4.0.2)
#>  tinytex                  0.31     2021-03-30 [1] CRAN (R 4.0.2)
#>  utf8                     1.2.1    2021-03-12 [1] CRAN (R 4.0.2)
#>  vctrs                    0.3.8    2021-04-29 [1] CRAN (R 4.0.2)
#>  vipor                    0.4.5    2017-03-22 [1] CRAN (R 4.0.2)
#>  viridis                  0.6.1    2021-05-11 [1] CRAN (R 4.0.5)
#>  viridisLite              0.4.0    2021-04-13 [1] CRAN (R 4.0.2)
#>  withr                    2.4.2    2021-04-18 [1] CRAN (R 4.0.2)
#>  xfun                     0.22     2021-03-11 [1] CRAN (R 4.0.2)
#>  xml2                     1.3.2    2020-04-23 [1] CRAN (R 4.0.2)
#>  xtable                   1.8-4    2019-04-21 [1] CRAN (R 4.0.2)
#>  XVector                  0.30.0   2020-10-28 [1] Bioconductor  
#>  yaml                     2.2.1    2020-02-01 [1] CRAN (R 4.0.2)
#>  zlibbioc                 1.36.0   2020-10-28 [1] Bioconductor  
#> 
#> [1] /Users/runner/work/_temp/Library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
---
title: "Working with matrices in R"
author: "Adapted by Milan Malfait"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output:
    html_document:
      code_download: true
      theme: cosmo
      toc: true
      toc_float: true
      highlight: tango
      number_sections: false
---

```{r setup, include=FALSE, cache=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

***

## Creating and manipulating matrices

We will create a $4 \times 2$ matrix, starting from a vector of values.

```{r create_matrix}
## Define the matrix X, filling matrix "byrow": 4 x 2
X <- matrix(c(3, 6, 1, 2, 0.23, 0.46, -2, -4), nrow = 4, byrow = TRUE)
X

## Alternative way to define the matrix, `byrow = FALSE` is the default
X <- matrix(c(3, 1, 0.23, -2, 6, 2, 0.46, -4), ncol = 2, byrow = FALSE)
X

# Creating a second matrix: 2 x 4, using the square of `X`
Y <- matrix(X^2, nrow = 2, ncol = 4)
Y

## Check that we really have a matrix
class(X)
```
Generally you can check the arguments of a function, e.g. `matrix`, in R with `?matrix`. 

The `dim` function tells you the __dimensions__ of the matrix, `nrow` and `ncol` give you the number of rows and columns, respectively.

```{r}
dim(X)  # nr. of rows, nr. of columns
nrow(X)
ncol(X)

dim(Y)
```


### Subsetting

Note that R sees a matrix as `X[row, column]`, so you can subset `X` with

```{r subset_matrix}
X[1, ] # first row of X
X[, 2] # second column of X
X[1:2, ] # first 2 rows of X
```

Note that, by default, subsetting a single row or column from a matrix __returns a vector__, not a column- or row-matrix as might be expected.
We can see this from the fact that the subset does not have dimensions anymore (an attribute only defined for matrices in R):

```{r}
dim(X[, 1]) # NULL = non-existent
class(X[, 1]) # not matrix but numeric vector
```

This can lead to unexpected behavior when doing __matrix multiplication__ (see further below).
To avoid this, we can set `drop = FALSE` in the subsetting brackets.

```{r}
## This creates a "column-vector"
X[, 1, drop = FALSE]
dim(X[, 1, drop = FALSE])
```


### Row- and column-binding

You can also create matrices by 'binding' together vectors (of the same length) column-wise or row-wise with `cbind()` and `rbind()`, respectively.

```{r}
# Create vectors from the rows of X
(x1 <- X[1, ])
(x2 <- X[2, ])
(x3 <- X[3, ])
(x4 <- X[4, ])

# Bind them back together row-wise: re-creates X
rbind(x1, x2, x3, x4)

# Bind them back together column-wise: (essentially the transpose of X)
cbind(x1, x2, x3, x4)
```

Binding together 2 matrices also works (but be mindful of their dimensions!).

```{r, error=TRUE}
# Row-binding two matrices
# X and Y don't have the same dimensions (4x2 vs. 2x4),
# so binding them directly doesn't work
rbind(X, Y)

# For rbind we need the same number of columns
rbind(X, Y[, 1:2]) # works

# For cbind, same number of rows
cbind(X[1:2, ], Y)
```

Note that R automatically creates row names or column names from the variable names when using `rbind()` and `cbind()`.

To add row and columns names to an existing matrix, use `rownames()`, `colnames()` or the more general `dimnames()`.

```{r}
rownames(X) <- c("row_1", "row_2", "row_3", "row_4")
colnames(X) <- c("col_1", "col_2")
X

## Using `NULL` removes the dimnames again
dimnames(X) <- NULL
X
```


### Diagonal elements and diagonal matrices: `diag()`

```{r}
## Get diagonal elements from X, returns vector
diag(X)

## Create diagonal matrix, vector as input, returns matrix
diag(c(1, 2, 3 ,4))
```

### Identity matrices

The `diag` function can also be used to create identity matrices with specific dimensions

```{r}
diag(1, nrow = 4, ncol = 4)  # 4 x 4
diag(1, nrow = 2, ncol = 4)  # 2 x 4
diag(1, nrow = 4, ncol = 2)  # 4 x 2
```



## Math operations

### Element-wise operations

#### Scalars and vectors

Scalar operations occur element-wise for matrices.

```{r}
X # for reference
X + 1
X - 100
2 * X
X / 3
X^2
```

Note that `^2` also occurs element-wise, *not* using matrix multiplication (see further below)!

You can also do operations with vectors, though you should be careful with the dimensions.
`R` will add the vector to each __column__ of the matrix.
When using a vector with a different length than the number of rows (elements in a column), the vector elements will be *recycled*.
When the longer object is not a multiple of the shorter, the operation still occurs but a *warning* is produced (technically a `message`).

```{r}
X
v4 <- c(1, 2, 3, 4)  # vector of length 4
X + v4

v2 <- c(1, 2)  # vector of length 2, will be recycled
X + v2

v3 <- c(1, 2, 3) # vector or length 3, also recycled with warning
X + v3

v5 <- c(1, 2, 3, 4, 5) # vector or length 5, also recycled with warning
X + v5
```

#### Two matrices

Define a second matrix `Y` with same dimensions as `X`.

```{r}
# We will generate a matrix Y with 2s and 4s at random places (using `sample`)
# Setting a seed allows reproducibility
set.seed(1)
Y <- matrix(sample(c(2, 4), size = nrow(X) * ncol(X), replace = TRUE), 
            nrow = nrow(X), ncol = ncol(X))
Y
```

Using the standard math operators occurs element-wise.

```{r}
X + Y
X - Y
X * Y
X / Y
X ^ Y
```


### Matrix algebra

<https://www.statmethods.net/advstats/matrix.html>

#### Transpose: `t()`

$\mathbf{X}^T$

```{r}
t(X)
```

#### Matrix multiplication: `%*%`

Note that `X` and `Y` have the same dimensions, so multiplying them straight away doesn't work (incompatible dimensions)

```{r, error=TRUE}
## This produces an error because the dimensions are not compatible
X %*% Y
```

Instead, we can transpose `X` and compute $\mathbf{X}^T \cdot \mathbf{Y}$:

```{r}
t(X) %*% Y

## `crossprod()` is a shortcut for this operation, and is slightly faster
crossprod(X, Y)

## `tcrossprod()` calculates X.Y^T
X %*% t(Y)
tcrossprod(X, Y)
```

The `crossprod()` and `tcrossprod()` functions also work with a single matrix as input, in which case they calculate $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}\mathbf{X}^T$, respectively:

```{r}
## X^TX
crossprod(X)
## XX^T
tcrossprod(X)
```

As already mentioned before, subsetting a matrix with `[` returns a vector by default.
Doing matrix multiplication with vectors in R can be ambiguous as R will "guess" itself whether the vector should be treated as a column- or row-vector.
See `` ?`%*%` `` for details.

The example below illustrates some potentially unexpected results

```{r, error=TRUE}
## Selecting the first column of X, we would expect this to be treated as a column
## vector with dimensions 4 x 1...
X[, 1]
dim(X[, 1])
## ...but in R vectors don't have dimensions

## Mathematically, multiplicating the first column of X with X is not possible
## (multiplying a 4x1 with a 4x2 matrix doesn't work)
## So we would expect the following code to throw an error...
X[, 1] %*% X
## ...but it doesn't and instead returns the result as if X[, 1] is a row-vector
```

To avoid confusion and when you intend to use single rows or vectors from a matrix as they were single-row or single-column matrices, it's better to use `drop = FALSE` when subsetting.

```{r, error=TRUE}
X1 <- X[, 1, drop = FALSE]
## Now X1 does have dimensions
dim(X1)

## And the following code throws an error
X1 %*% X
```

You can also do matrix multiplication with 2 vectors, in which case again R will guess if they should be treated as column- or row-vectors.

```{r}
(a <- seq_len(3))
(b <- a + 1)
a %*% b
```


#### Inverse: `solve()`

__Inverse__: $\mathbf{X}^{-1}$ (only for square matrices)

```{r}
## Create new square matrix Z
(Z <- matrix(1:4, nrow = 2, ncol = 2))

## Compute inverse
(Z_inv <- solve(Z))

## Check that it's indeed the inverse
Z %*% Z_inv  # the 2x2 identity matrix
```

__Warning:__ DON'T USE `X^-1` to compute a matrix inverse!

The notation might be confusing because of the way we mathematically describe the inverse of a matrix ($\mathbf{X}^{-1}$).
In R, however, `X^-1` computes __element-wise__ inverses of each value.

```{r}
Z^-1
```

If you're wondering why the name "`solve`" is used for the function to compute a matrix inverse, it's because `solve` can do more than just that.
See `?solve` for details.


## Rank of a matrix

Calculating the rank of the matrix $\mathbf X$.
There are several ways to do this in R.
We'll use the `qr()` function (see `?qr`).
An alternative would be to use `rankMatrix` from the [*Matrix*](https://cran.r-project.org/package=Matrix) package: `Matrix::rankMatrix()`.

```{r rank}
## Run `qr` and access the `rank` from the results with `$`
rank <- qr(X)$rank
rank
```

We see that the rank of `X` is lower than its number of columns and rows, meaning the matrix is [*not* of full rank](https://en.wikipedia.org/wiki/Rank_(linear_algebra)#Main_definitions).

Considering the contents of `X`, can you see why this is?

__Hint__: 

```{r}
all(X[, 2] == 2 * X[, 1])
```



## Session info {-}

<details><summary>Session info</summary>

```{r session_info, echo=FALSE, cache=FALSE}
Sys.time()
sessioninfo::session_info()
```

</details>

## [Home](https://statomics.github.io/HDA2020/) {-}
