Table of Contents
Special Matrices in ML
Matrix Cookbook
Diagonal Matrices
- Definition. Diagonal matrices are square matrices \(A\) with \(A_{ij} = 0 \text{, when } i \ne j.\)
- Properties:
- Diagonal matrices correspond to quadratic functions that are simple sums of squares, of the form:
$$q(x) = \sum_{i=1}^n \lambda_i x_i^2 = x^T \mathbf{diag}(\lambda) x.$$
- They are easy to invert (when all diagonal entries are nonzero): \((A^{-1})_{ii} = 1/A_{ii} \: \forall i \in [1, n]\)
- The pseudo-inverse is easy to compute: invert the nonzero diagonal entries and keep the zero diagonal entries as zero
- Matrix multiplication and powers are easy: products and powers of diagonal matrices are again diagonal, with \((A^k)_{ii} = A_{ii}^k\) (see the sketch after this list)
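A minimal NumPy sketch (illustrative, not from the original notes) of the properties above: entry-wise inversion, the pseudo-inverse that keeps zero diagonal entries at zero, and the sum-of-squares quadratic form:

```python
import numpy as np

# Diagonal matrix built from a vector of (possibly zero) diagonal entries.
lam = np.array([2.0, 5.0, 0.0])
A = np.diag(lam)

# Inverse (only valid when every diagonal entry is nonzero): (A^{-1})_{ii} = 1 / A_{ii}.
lam_nonzero = np.array([2.0, 5.0, 4.0])
assert np.allclose(np.diag(1.0 / lam_nonzero), np.linalg.inv(np.diag(lam_nonzero)))

# Pseudo-inverse: invert the nonzero entries, keep the zero entries as zero.
pinv_diag = np.where(lam != 0, 1.0 / np.where(lam != 0, lam, 1.0), 0.0)
assert np.allclose(np.diag(pinv_diag), np.linalg.pinv(A))

# Quadratic form q(x) = sum_i lam_i * x_i^2 = x^T diag(lam) x.
x = np.array([1.0, -2.0, 3.0])
assert np.isclose(x @ A @ x, np.sum(lam * x**2))
```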
Symmetric Matrices
- Definition. Symmetric matrices are square matrices that satisfy \(A_{ij} = A_{ji}\) for every pair \((i,j).\) The set of symmetric \((n \times n)\) matrices is denoted \(\mathbf{S}^n\). This set is a subspace of \(\mathbf{R}^{n \times n}\).
- Properties:
- They have a full set of orthonormal eigenvectors (even with repeated eigenvalues)
- The inverse (when it exists) is also symmetric
- All eigenvalues of a symmetric matrix are real
- \(A^TA\) is invertible iff columns of \(A\) are linearly independent
- Every symmetric matrix \(S\) can be diagonalized (factorized) with \(Q\) formed by the orthonormal eigenvectors \(v_i\) of \(S\) and \(\Lambda\) is a diagonal matrix holding all the eigenvalues:
$$\begin{aligned} S&=Q \Lambda Q^{T} \\ &= \lambda_{1} v_{1} v_{1}^{T}+\ldots+\lambda_{n} v_{n} v_{n}^{T} = \sum_{i=1}^n \lambda_i v_i v_i^T \end{aligned}$$
- One can obtain a symmetric matrix from any square matrix \(A\) by taking its symmetric part:
$$M = {\displaystyle {\tfrac {1}{2}}\left(A+A^{\textsf {T}}\right)}$$
- Examples:
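A small NumPy sketch (illustrative) that builds a symmetric matrix as \(\tfrac{1}{2}(A + A^T)\) and verifies the spectral decomposition with orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = 0.5 * (A + A.T)                      # symmetric part of an arbitrary square matrix

# np.linalg.eigh is specialized for symmetric matrices:
# real eigenvalues, orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(S)

assert np.allclose(Q.T @ Q, np.eye(4))              # orthonormal eigenvectors
assert np.allclose(S, Q @ np.diag(eigvals) @ Q.T)   # S = Q Lambda Q^T
assert np.allclose(S, sum(lam * np.outer(v, v)      # S = sum_i lambda_i v_i v_i^T
                          for lam, v in zip(eigvals, Q.T)))
```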
Skew-Symmetric Matrices
- Definition. Skew-symmetric matrices are square matrices that satisfy \(A^T = -A\); equivalently, \(A_{ij} = -A_{ji}\) for every pair \((i,j)\).
- Properties:
Covariance Matrices
- Definition.
Standard Form: $$\Sigma :=\mathrm {E} \left[\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)^{\rm {T}}\right]$$
Sample Form: $$ \Sigma := \dfrac{1}{m} \sum_{k=1}^m (x_k - \hat{x})(x_k - \hat{x})^T. $$
Matrix Form: $$ \Sigma := \dfrac{X^TX}{m} $$
valid only when (1) \(X\) holds the \(m\) samples in its rows and the \(n\) variables in its columns, and (2) \(X\) is centered (each column has mean \(0\)).
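As a quick illustration (a NumPy sketch with made-up data; variable names are mine), the sample-sum form and the matrix form \(X^TX/m\) agree once \(X\) is centered, and both match np.cov with the biased \(1/m\) normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 3                        # m samples (rows), n variables (columns)
X = rng.standard_normal((m, n)) @ np.array([[2.0, 0.3, 0.0],
                                            [0.0, 1.0, 0.5],
                                            [0.0, 0.0, 0.7]])

x_hat = X.mean(axis=0)               # sample mean of each variable
Xc = X - x_hat                       # centered data

# Sample form: (1/m) * sum_k (x_k - x_hat)(x_k - x_hat)^T
Sigma_sum = sum(np.outer(xk - x_hat, xk - x_hat) for xk in X) / m

# Matrix form: X^T X / m, valid only for centered X with samples in rows.
Sigma_mat = Xc.T @ Xc / m

assert np.allclose(Sigma_sum, Sigma_mat)
# np.cov uses the unbiased (m - 1) denominator by default; bias=True gives 1/m.
assert np.allclose(Sigma_mat, np.cov(X, rowvar=False, bias=True))
```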
- Properties:
- The sample covariance matrix allows one to find the variance along any direction in data space.
- The diagonal elements of \(\Sigma\) give the variance of each variable in the data.
- The trace of \(\Sigma\) gives the sum of all the variances.
- The matrix \(\Sigma\) is positive semi-definite, since the associated quadratic form \(u \rightarrow u^T \Sigma u\) is non-negative everywhere.
- It is symmetric.
- Every symmetric positive semi-definite matrix is a covariance matrix.
- The sample variance along direction \(u\) can be expressed as a quadratic form in \(u\):
\(\sigma^2(u) = \dfrac{1}{m} \sum_{k=1}^m [u^T(x_k-\hat{x})]^2 = u^T \Sigma u.\)
- The dimension of the matrix is \((n \times n)\), where \(n\) is the number of variables/features/columns.
- The inverse of this matrix, \({\displaystyle \Sigma ^{-1},}\) if it exists, is the inverse covariance matrix, also known as the concentration matrix or precision matrix.
- If a vector of \(n\) possibly correlated random variables is jointly normally distributed, or more generally elliptically distributed, then its probability density function can be expressed in terms of the covariance matrix.
- \(\Sigma =\mathrm {E} (\mathbf {XX^{\rm {T}}} )-{\boldsymbol {\mu }}{\boldsymbol {\mu }}^{\rm {T}}\).
- \({\displaystyle \operatorname {var} (\mathbf {AX} +\mathbf {a} )=\mathbf {A} \,\operatorname {var} (\mathbf {X} )\,\mathbf {A^{\rm {T}}} }\).
- \(\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )^{\rm {T}}\).
- \(\operatorname {cov} (\mathbf {X}_{1}+\mathbf {X}_{2},\mathbf {Y} )=\operatorname {cov} (\mathbf {X}_{1},\mathbf {Y} )+\operatorname {cov} (\mathbf {X}_{2},\mathbf {Y} )\).
- If \(p = q\), then \(\operatorname {var} (\mathbf {X} +\mathbf {Y} )=\operatorname {var} (\mathbf {X} )+\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )+\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )+\operatorname {var} (\mathbf {Y} )\).
- \(\operatorname {cov} (\mathbf {AX} +\mathbf {a} ,\mathbf {B} ^{\rm {T}}\mathbf {Y} +\mathbf {b} )=\mathbf {A} \,\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )\,\mathbf {B}\).
- If \({\displaystyle \mathbf {X} }\) and \({\displaystyle \mathbf {Y} }\) are independent (or, somewhat less restrictively, if every random variable in \({\displaystyle \mathbf {X} }\) is uncorrelated with every random variable in \({\displaystyle \mathbf {Y} }\)), then \({\displaystyle \operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\mathbf {0} }\).
- \(\operatorname {var} (\mathbf {b} ^{\rm {T}}\mathbf {X} )= \mathbf {b} ^{\rm {T}}\operatorname {cov} (\mathbf {X} )\mathbf {b} = \operatorname {cov} (\mathbf{b}^T\mathbf {X}, \mathbf{b}^T\mathbf {X} ) \geq 0.\)
This quantity is non-negative because it is a variance.
Equivalently, it equals \(\mathbf {b} ^{\rm {T}}\operatorname {var} (\mathbf {X} )\mathbf {b}\), since \(\operatorname{var}(\mathbf{X})\) and \(\operatorname{cov}(\mathbf{X})\) denote the same covariance matrix.
- An identity covariance matrix, \(\Sigma = I\), has variance \(= 1\) for all variables.
- A covariance matrix of the form \(\Sigma=\sigma^2I\) has variance \(= \sigma^2\) for all variables.
- A diagonal covariance matrix has variance \(\sigma_i^2\) for the \(i\)-th variable.
- When the mean \(\hat{x}\) is not known but estimated from the data, the denominator of the sample covariance matrix should be \((m-1)\) and not \(m\) (Bessel's correction).
where \(\mathbf {X}, \mathbf {X}_{1},\) and \(\mathbf {X}_{2}\) are random \((p\times 1)\) vectors, \({\displaystyle \mathbf {Y} }\) is a random \((q\times 1)\) vector, \({\displaystyle \mathbf {a} }\) is a \((q\times 1)\) vector, \({\displaystyle \mathbf {b} }\) is a \((p\times 1)\) vector, and \({\displaystyle \mathbf {A} }\) and \({\displaystyle \mathbf {B} }\) are \((q\times p)\) matrices of constants.
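Two of the identities above can be spot-checked numerically. The sketch below (illustrative data, not from the notes) uses sample covariances with the \(1/m\) convention, under which the relations hold exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 2_000
X = rng.standard_normal((m, 3)) @ rng.standard_normal((3, 3))   # rows are samples of X
mu = X.mean(axis=0)
var_X = np.cov(X, rowvar=False, bias=True)

# Sigma = E(X X^T) - mu mu^T  (sample version, exact with the 1/m convention)
assert np.allclose(var_X, (X.T @ X) / m - np.outer(mu, mu))

# var(A X + a) = A var(X) A^T
A = rng.standard_normal((2, 3))
a = np.array([1.0, -3.0])
Y = X @ A.T + a                                                  # each row: A x_k + a
var_Y = np.cov(Y, rowvar=False, bias=True)
assert np.allclose(var_Y, A @ var_X @ A.T)
```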
- \(\Sigma\) as a Linear Operator:
- Applied to one vector, the covariance matrix maps a linear combination \(c\) of the random variables \(\mathbf{X}\) onto a vector of covariances with those variables: \(\Sigma c = \operatorname{cov}(\mathbf{X}, c^T\mathbf{X})\).
- Treated as a bilinear form, it yields the covariance between the two linear combinations: \(d^T\Sigma c = \operatorname{cov}(d^T\mathbf{X}, c^T\mathbf{X})\).
- The variance of a linear combination is then \(c^T\Sigma c\) (its covariance with itself).
- The (pseudo-)inverse covariance matrix provides an inner product, \({\displaystyle \langle c-\mu \|\Sigma ^{+}\| c-\mu \rangle }\) which induces the Mahalanobis distance, a measure of the “unlikelihood” of \(c\).
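The operator view above, and the Mahalanobis distance, can be illustrated with a small NumPy sketch (data and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 3)) @ rng.standard_normal((3, 3))  # samples in rows
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False, bias=True)

c = np.array([1.0, -2.0, 0.5])
d = np.array([0.3, 0.0, 1.0])
y = X @ c                                # the combination c^T x, per sample
z = X @ d

# Sigma c = vector of covariances cov(X_i, c^T X)
cov_with_c = np.array([np.mean((X[:, i] - mu[i]) * (y - y.mean())) for i in range(3)])
assert np.allclose(Sigma @ c, cov_with_c)

# d^T Sigma c = cov(d^T X, c^T X)
assert np.isclose(d @ Sigma @ c, np.mean((z - z.mean()) * (y - y.mean())))

# c^T Sigma c = var(c^T X), the covariance of the combination with itself
assert np.isclose(c @ Sigma @ c, y.var())

# Mahalanobis distance of a point from the mean, via the (pseudo-)inverse covariance.
point = np.array([1.0, 2.0, -1.0])
diff = point - mu
mahalanobis = np.sqrt(diff @ np.linalg.pinv(Sigma) @ diff)
print(mahalanobis)
```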
- Applications [Examples]:
- The Whitening Transformation: allows one to completely decorrelate the data; equivalently, to find an optimal basis for representing the data in a compact way (a sketch follows this list).
- Portfolio Theory: The matrix of covariances among various assets’ returns is used to determine, under certain assumptions, the relative amounts of different assets that investors should (in a normative analysis) or are predicted to (in a positive analysis) choose to hold in a context of diversification.
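A minimal sketch of the whitening transformation mentioned above, assuming a full-rank covariance and using the eigendecomposition (one of several valid whitening choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5_000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                [1.0, 1.0, 0.0],
                                                [0.5, 0.2, 0.3]])
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / len(Xc)

# Eigendecomposition-based whitening: W = Lambda^{-1/2} Q^T, so that
# the transformed data Z = Xc W^T has identity covariance (fully decorrelated).
eigvals, Q = np.linalg.eigh(Sigma)
W = np.diag(1.0 / np.sqrt(eigvals)) @ Q.T
Z = Xc @ W.T

assert np.allclose(Z.T @ Z / len(Z), np.eye(3))
```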
Positive Semi-Definite Matrices
- Definition: A symmetric \({\displaystyle n\times n}\) real matrix \({\displaystyle M}\) is said to be positive semi-definite if the scalar \({\displaystyle z^{\textsf {T}}Mz}\) is non-negative for every column vector \({\displaystyle z}\) of \(n\) real numbers. Mathematically:
$$M \text { positive semi-definite } \Longleftrightarrow \quad x^{\top} M x \geq 0 \text { for all } x \in \mathbb{R}^{n}$$
- Properties:
- \(AA^T\) and \(A^TA\) are PSD
- \(M\) is positive definite if all pivots are \(> 0\)
- \(M\) is positive semi-definite if and only if all of its eigenvalues are non-negative.
- Covariance Matrices \(\Sigma\) are PSD
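A brief NumPy check (mine, not part of the original notes) of two of the properties above: \(A^TA\) is PSD, and semi-definiteness shows up as non-negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
M = A.T @ A                      # A^T A is always symmetric PSD

# All eigenvalues of a PSD matrix are >= 0 (allow a tiny numerical tolerance).
eigvals = np.linalg.eigvalsh(M)
assert np.all(eigvals >= -1e-10)

# Equivalently, x^T M x >= 0 for any x, since x^T A^T A x = ||Ax||^2.
x = rng.standard_normal(3)
assert x @ M @ x >= -1e-10
```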
Positive Definite Matrices
- Definition: A symmetric \({\displaystyle n\times n}\) real matrix \({\displaystyle M}\) is said to be positive definite if the scalar \({\displaystyle z^{\textsf {T}}Mz}\) is strictly positive for every non-zero column vector \({\displaystyle z}\) of \(n\) real numbers. Mathematically:
$$M \text { positive definite } \Longleftrightarrow x^{\top} M x>0 \text { for all } x \in \mathbb{R}^{n} \backslash \mathbf{0}$$
- Properties:
- \(M\) is positive definite if and only if all of its eigenvalues are positive
- The matrix \({\displaystyle M}\) is positive definite if and only if the bilinear form \({\displaystyle \langle z,w\rangle =z^{\textsf {T}}Mw}\) is positive definite
- A symmetric matrix \({\displaystyle M}\) is positive definite if and only if its quadratic form is a strictly convex function
- Every positive definite matrix is invertible and its inverse is also positive definite.
- \(M\) is positive definite if all of its pivots are \(> 0\)
- Covariance Matrices \(\Sigma\) are positive definite unless one variable is an exact linear function of the others. Conversely, every positive semi-definite matrix is the covariance matrix of some multivariate distribution
- Any quadratic function from \({\displaystyle \mathbb {R} ^{n}}\) to \({\displaystyle \mathbb {R} }\) can be written as \({\displaystyle x^{\textsf {T}}Mx+x^{\textsf {T}}b+c}\) where \({\displaystyle M}\) is a symmetric \({\displaystyle n\times n}\) matrix, \(b\) is a real \(n\)-vector, and \(c\) a real constant. This quadratic function is strictly convex, and hence has a unique finite global minimum, if and only if \({\displaystyle M}\) is positive definite.
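In practice, a common way to test positive definiteness (a sketch, not from the notes) is to attempt a Cholesky factorization, which exists exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(M: np.ndarray) -> bool:
    """Return True iff the symmetric matrix M is (numerically) positive definite."""
    try:
        np.linalg.cholesky(M)   # succeeds exactly when M is positive definite
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3  -> positive definite
B = np.array([[1.0,  2.0], [ 2.0, 1.0]])   # eigenvalues 3 and -1 -> not positive definite

assert is_positive_definite(A) and not is_positive_definite(B)
```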
Orthogonal Matrices
- Definition. Orthogonal matrices (called unitary in the complex case) are square matrices whose columns form an orthonormal basis.
- Properties:
- If \(U = [u_1, \ldots, u_n]\) is an orthogonal matrix, then
$$u_i^Tu_j = \left\{ \begin{array}{ll} 1 & \mbox{if } i=j, \\ 0 & \mbox{otherwise.} \end{array} \right. $$
- \(U^TU = UU^T = I_n\).
- Easy inverse:
$$U^{-1} = U^T$$
- Geometrically, orthogonal matrices correspond to rotations (around a point) or reflections (around a line passing through the origin).
i.e. they preserve length and angles!
- For all vectors \(x\),
\(\|Ux\|_2^2 = (Ux)^T(Ux) = x^TU^TUx = x^Tx = \|x\|_2^2.\) This is known as the rotational invariance of the Euclidean norm.
Thus, if we multiply \(x\) by an orthogonal matrix, the errors present in \(x\) will not be magnified. This behavior is very desirable for maintaining numerical stability.
- If \(x, y\) are two vectors with unit norm, then the angle \(\theta\) between them satisfies \(\cos \theta = x^Ty\),
while the angle \(\theta'\) between the rotated vectors \(x' = Ux, y' = Uy\) satisfies \(\cos \theta' = (x')^Ty'\).
Since \((Ux)^T(Uy) = x^T U^TU y = x^Ty,\) the angles are the same.
- Examples:
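A small NumPy sketch (illustrative) using a 2-D rotation matrix to verify \(U^TU = I\), the easy inverse, and the preservation of lengths and angles:

```python
import numpy as np

theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],     # 2-D rotation: an orthogonal matrix
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(U.T @ U, np.eye(2))             # U^T U = I
assert np.allclose(np.linalg.inv(U), U.T)          # easy inverse: U^{-1} = U^T

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # lengths preserved

cos_before = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
cos_after = (U @ x) @ (U @ y) / (np.linalg.norm(U @ x) * np.linalg.norm(U @ y))
assert np.isclose(cos_before, cos_after)                      # angles preserved
```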
Dyads
- Definition. A matrix \(A \in \mathbf{R}^{m \times n}\) is a dyad if it is of the form \(A = uv^T\) for some vectors \(u \in \mathbf{R}^m, v \in \mathbf{R}^n\). The dyad acts on an input vector \(x \in \mathbf{R}^n\) as follows:
$$ Ax = (uv^T) x = (v^Tx) u.$$
- Properties:
- The output always points in the same direction \(u\) in output space (\(\mathbf{R}^m\)), no matter what the input \(x\) is.
- The output is always a simple scaled version of \(u\).
- The amount of scaling depends on the vector \(v\), via the linear function \(x \rightarrow v^Tx\).
- Examples:
- Normalized dyads:
- We can always normalize the dyad by taking both \(u\) and \(v\) to have unit (Euclidean) norm, and using a scalar factor to capture their scale.
- That is, any dyad can be written in normalized form:
- \[A = uv^T = (\|u\|_2 \cdot \|v\|_2 ) \cdot \left(\dfrac{u}{\|u\|_2}\right) \left( \dfrac{v}{\|v\|_2}\right) ^T = \sigma \tilde{u}\tilde{v}^T,\]
- where \(\sigma > 0\), and \(\|\tilde{u}\|_2 = \|\tilde{v}\|_2 = 1.\)
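A short NumPy sketch (illustrative values) of a dyad acting on a vector, \(Ax = (v^Tx)u\), and of the normalized form \(\sigma \tilde{u}\tilde{v}^T\):

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])          # u in R^m
v = np.array([3.0, 0.0, 1.0, -2.0])     # v in R^n
A = np.outer(u, v)                      # dyad A = u v^T, shape (m, n), rank 1

x = np.array([1.0, 1.0, 2.0, 0.5])
# A x = (v^T x) u : the output is always a scaled copy of u.
assert np.allclose(A @ x, (v @ x) * u)

# Normalized form: A = sigma * u_tilde v_tilde^T with unit-norm factors.
sigma = np.linalg.norm(u) * np.linalg.norm(v)
u_tilde, v_tilde = u / np.linalg.norm(u), v / np.linalg.norm(v)
assert np.allclose(A, sigma * np.outer(u_tilde, v_tilde))
```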
- Symmetric dyads:
- Another important class of symmetric matrices is that of the form \(uu^T\), where \(u \in \mathbf{R}^n\).
The matrix has elements \(u_iu_j\), and is symmetric.
If \(\|u\|_2 = 1\), then the dyad is said to be normalized.
- \[uu^T = \left(\begin{array}{ccc} u_1^2 & u_1u_2 & u_1u_3 \\ u_1u_2 & u_2^2 & u_2u_3 \\ u_1u_3 & u_2u_3 & u_3^2 \end{array} \right)\]
- Properties:
- Symmetric dyads correspond to quadratic functions that are simply squared linear forms: \(q(x) = (u^Tx)^2\).
- When the vector \(u\) is normalized (unit), then
\(\mathbf{Tr}(uu^T) = \|u\|_2^2 = 1^2 = 1.\) This follows from the fact that the diagonal entries of a symmetric dyad are just \(u_i^2, \forall i \in [1, n]\), so the trace equals \(\sum_i u_i^2 = \|u\|_2^2\).
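A two-assert check (a sketch, with illustrative values) of the squared-linear-form and unit-trace properties of a normalized symmetric dyad:

```python
import numpy as np

u = np.array([0.6, 0.8, 0.0])           # unit-norm vector, ||u||_2 = 1
S = np.outer(u, u)                      # normalized symmetric dyad u u^T

x = np.array([1.0, -1.0, 2.0])
assert np.isclose(x @ S @ x, (u @ x) ** 2)   # q(x) = (u^T x)^2
assert np.isclose(np.trace(S), 1.0)          # Tr(u u^T) = ||u||_2^2 = 1
```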
Correlation matrix
- Definition.
- \[{\text{corr}}(\mathbf {X} )=\left({\text{diag}}(\Sigma )\right)^{-{\frac {1}{2}}}\,\Sigma \,\left({\text{diag}}(\Sigma )\right)^{-{\frac {1}{2}}}\]
- Properties:
- It is the matrix of “Pearson product-moment correlation coefficients” between each of the random variables in the random vector \({\displaystyle \mathbf {X} }\).
- The correlation matrix can be seen as the covariance matrix of the standardized random variables \({\displaystyle X_{i}/\sigma (X_{i})}\) for \({\displaystyle i=1,\dots ,n}\).
- Each element on the principal diagonal of a correlation matrix is the correlation of a random variable with itself, which always equals 1.
- Each off-diagonal element is between 1 and –1 inclusive.
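A short NumPy sketch (my own) of the definition above: converting a covariance matrix into a correlation matrix with the \(\left(\text{diag}(\Sigma)\right)^{-1/2}\) scaling, and checking it against np.corrcoef:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((1_000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                                [0.0, 2.0, 0.3],
                                                [0.0, 0.0, 0.8]])
Sigma = np.cov(X, rowvar=False)

# corr(X) = diag(Sigma)^{-1/2} * Sigma * diag(Sigma)^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
corr = D_inv_sqrt @ Sigma @ D_inv_sqrt

assert np.allclose(np.diag(corr), 1.0)          # unit diagonal
assert np.all(np.abs(corr) <= 1.0 + 1e-12)      # off-diagonals lie in [-1, 1]
assert np.allclose(corr, np.corrcoef(X, rowvar=False))
```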