Table of Contents
Special Matrices in ML
Matrix Cookbook
Diagonal Matrices
- Definition. Diagonal matrices are square matrices \(A\) with \(A_{ij} = 0 \text{, when } i \ne j.\)
- Properties:
- Diagonal matrices correspond to quadratic functions that are simple sums of squares, of the form:
$$q(x) = \sum_{i=1}^n \lambda_i x_i^2 = x^T \mathbf{diag}(\lambda) x.$$
- They are easy to invert (when all diagonal entries are nonzero): \((A^{-1})_{ii} = 1/A_{ii} \: \forall i \in [1, n]\)
- The pseudo-inverse is easy to compute: invert the nonzero diagonal entries and keep the zero diagonal entries as zero
- Matrix multiplication and powers are easy: products and powers of diagonal matrices are again diagonal, with \((A^k)_{ii} = A_{ii}^k\) (see the sketch after this list)
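A minimal NumPy sketch (illustrative, not from the original notes) of the properties above: entry-wise inversion, the pseudo-inverse that keeps zero diagonal entries at zero, and the sum-of-squares quadratic form:

```python
import numpy as np

# Diagonal matrix built from a vector of (possibly zero) diagonal entries.
lam = np.array([2.0, 5.0, 0.0])
A = np.diag(lam)

# Inverse (only valid when every diagonal entry is nonzero): (A^{-1})_{ii} = 1 / A_{ii}.
lam_nonzero = np.array([2.0, 5.0, 4.0])
assert np.allclose(np.diag(1.0 / lam_nonzero), np.linalg.inv(np.diag(lam_nonzero)))

# Pseudo-inverse: invert the nonzero entries, keep the zero entries as zero.
pinv_diag = np.where(lam != 0, 1.0 / np.where(lam != 0, lam, 1.0), 0.0)
assert np.allclose(np.diag(pinv_diag), np.linalg.pinv(A))

# Quadratic form q(x) = sum_i lam_i * x_i^2 = x^T diag(lam) x.
x = np.array([1.0, -2.0, 3.0])
assert np.isclose(x @ A @ x, np.sum(lam * x**2))
```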
Symmetric Matrices
- Definition. Symmetric matrices are square matrices that satisfy \(A_{ij} = A_{ji}\) for every pair \((i,j).\) The set of symmetric \((n \times n)\) matrices is denoted \(\mathbf{S}^n\). This set is a subspace of \(\mathbf{R}^{n \times n}\).
- Properties:
- They have a full set of orthonormal eigenvectors (even with repeated eigenvalues)
- The inverse (when it exists) is also symmetric
- All eigenvalues of a symmetric matrix are real
- \(A^TA\) is invertible iff columns of \(A\) are linearly independent
- Every symmetric matrix \(S\) can be diagonalized (factorized) with \(Q\) formed by the orthonormal eigenvectors \(v_i\) of \(S\) and \(\Lambda\) is a diagonal matrix holding all the eigenvalues:
$$\begin{aligned} S&=Q \Lambda Q^{T} \\ &= \lambda_{1} v_{1} v_{1}^{T}+\ldots+\lambda_{n} v_{n} v_{n}^{T} = \sum_{i=1}^n \lambda_i v_i v_i^T \end{aligned}$$
- One can obtain a symmetric matrix from any square matrix \(A\) by taking its symmetric part:
$$M = {\displaystyle {\tfrac {1}{2}}\left(A+A^{\textsf {T}}\right)}$$
- Examples:
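A small NumPy sketch (illustrative) that builds a symmetric matrix as \(\tfrac{1}{2}(A + A^T)\) and verifies the spectral decomposition with orthonormal eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
S = 0.5 * (A + A.T)                      # symmetric part of an arbitrary square matrix

# np.linalg.eigh is specialized for symmetric matrices:
# real eigenvalues, orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(S)

assert np.allclose(Q.T @ Q, np.eye(4))              # orthonormal eigenvectors
assert np.allclose(S, Q @ np.diag(eigvals) @ Q.T)   # S = Q Lambda Q^T
assert np.allclose(S, sum(lam * np.outer(v, v)      # S = sum_i lambda_i v_i v_i^T
                          for lam, v in zip(eigvals, Q.T)))
```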
Skew-Symmetric Matrices
- Definition. Skew-symmetric matrices are square matrices that satisfy \(A^T = -A\); equivalently, \(A_{ij} = -A_{ji}\) for every pair \((i,j)\).
- Properties:
Covariance Matrices
- Definition.
Standard Form: $$\Sigma :=\mathrm {E} \left[\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)\left(\mathbf {X} -\mathrm {E} [\mathbf {X} ]\right)^{\rm {T}}\right]$$
Sample Form: $$ \Sigma := \dfrac{1}{m} \sum_{k=1}^m (x_k - \hat{x})(x_k - \hat{x})^T. $$
Matrix Form: $$ \Sigma := \dfrac{X^TX}{m} $$
valid only when (1) \(X\) holds the \(m\) samples in its rows and the \(n\) variables in its columns, and (2) \(X\) is centered (each column has mean \(0\)).
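As a quick illustration (a NumPy sketch with made-up data; variable names are mine), the sample-sum form and the matrix form \(X^TX/m\) agree once \(X\) is centered, and both match np.cov with the biased \(1/m\) normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 500, 3                        # m samples (rows), n variables (columns)
X = rng.standard_normal((m, n)) @ np.array([[2.0, 0.3, 0.0],
                                            [0.0, 1.0, 0.5],
                                            [0.0, 0.0, 0.7]])

x_hat = X.mean(axis=0)               # sample mean of each variable
Xc = X - x_hat                       # centered data

# Sample form: (1/m) * sum_k (x_k - x_hat)(x_k - x_hat)^T
Sigma_sum = sum(np.outer(xk - x_hat, xk - x_hat) for xk in X) / m

# Matrix form: X^T X / m, valid only for centered X with samples in rows.
Sigma_mat = Xc.T @ Xc / m

assert np.allclose(Sigma_sum, Sigma_mat)
# np.cov uses the unbiased (m - 1) denominator by default; bias=True gives 1/m.
assert np.allclose(Sigma_mat, np.cov(X, rowvar=False, bias=True))
```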
- Properties:
- The sample covariance matrix allows one to find the variance along any direction in data space.
- The diagonal elements of \(\Sigma\) give the variance of each variable in the data.
- The trace of \(\Sigma\) gives the sum of all the variances.
- The matrix \(\Sigma\) is positive semi-definite, since the associated quadratic form \(u \rightarrow u^T \Sigma u\) is non-negative everywhere.
- It is symmetric.
- Every symmetric positive semi-definite matrix is a covariance matrix.
- The sample variance along direction \(u\) can be expressed as a quadratic form in \(u\):
\(\sigma^2(u) = \dfrac{1}{m} \sum_{k=1}^m [u^T(x_k-\hat{x})]^2 = u^T \Sigma u.\)
- The dimension of the matrix is \((n \times n)\), where \(n\) is the number of variables/features/columns.
- The inverse of this matrix, \({\displaystyle \Sigma ^{-1},}\) if it exists, is the inverse covariance matrix, also known as the concentration matrix or precision matrix.
- If a vector of \(n\) possibly correlated random variables is jointly normally distributed, or more generally elliptically distributed, then its probability density function can be expressed in terms of the covariance matrix.
- \(\Sigma =\mathrm {E} (\mathbf {XX^{\rm {T}}} )-{\boldsymbol {\mu }}{\boldsymbol {\mu }}^{\rm {T}}\).
- \({\displaystyle \operatorname {var} (\mathbf {AX} +\mathbf {a} )=\mathbf {A} \,\operatorname {var} (\mathbf {X} )\,\mathbf {A^{\rm {T}}} }\).
- \(\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )^{\rm {T}}\).
- \(\operatorname {cov} (\mathbf {X}_{1}+\mathbf {X}_{2},\mathbf {Y} )=\operatorname {cov} (\mathbf {X}_{1},\mathbf {Y} )+\operatorname {cov} (\mathbf {X}_{2},\mathbf {Y} )\).
- If \(p = q\), then \(\operatorname {var} (\mathbf {X} +\mathbf {Y} )=\operatorname {var} (\mathbf {X} )+\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )+\operatorname {cov} (\mathbf {Y} ,\mathbf {X} )+\operatorname {var} (\mathbf {Y} )\).
- \(\operatorname {cov} (\mathbf {AX} +\mathbf {a} ,\mathbf {B} ^{\rm {T}}\mathbf {Y} +\mathbf {b} )=\mathbf {A} \,\operatorname {cov} (\mathbf {X} ,\mathbf {Y} )\,\mathbf {B}\).
- If \({\displaystyle \mathbf {X} }\) and \({\displaystyle \mathbf {Y} }\) are independent (or, somewhat less restrictively, if every random variable in \({\displaystyle \mathbf {X} }\) is uncorrelated with every random variable in \({\displaystyle \mathbf {Y} }\)), then \({\displaystyle \operatorname {cov} (\mathbf {X} ,\mathbf {Y} )=\mathbf {0} }\).
- \(\operatorname {var} (\mathbf {b} ^{\rm {T}}\mathbf {X} )= \mathbf {b} ^{\rm {T}}\operatorname {cov} (\mathbf {X} )\mathbf {b} = \operatorname {cov} (\mathbf{b}^T\mathbf {X}, \mathbf{b}^T\mathbf {X} ) \geq 0.\)
This quantity is non-negative because it is a variance.
Equivalently, it equals \(\mathbf {b} ^{\rm {T}}\operatorname {var} (\mathbf {X} )\mathbf {b}\), since \(\operatorname{var}(\mathbf{X})\) and \(\operatorname{cov}(\mathbf{X})\) denote the same covariance matrix.
- An identity covariance matrix, \(\Sigma = I\), has variance \(= 1\) for all variables.
- A covariance matrix of the form \(\Sigma=\sigma^2I\) has variance \(= \sigma^2\) for all variables.
- A diagonal covariance matrix has variance \(\sigma_i^2\) for the \(i\)-th variable.
- When the mean \(\hat{x}\) is not known but estimated from the data, the denominator of the sample covariance matrix should be \((m-1)\) and not \(m\) (Bessel's correction).
where \(\mathbf {X}, \mathbf {X}_{1},\) and \(\mathbf {X}_{2}\) are random \((p\times 1)\) vectors, \({\displaystyle \mathbf {Y} }\) is a random \((q\times 1)\) vector, \({\displaystyle \mathbf {a} }\) is a \((q\times 1)\) vector, \({\displaystyle \mathbf {b} }\) is a \((p\times 1)\) vector, and \({\displaystyle \mathbf {A} }\) and \({\displaystyle \mathbf {B} }\) are \((q\times p)\) matrices of constants.
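Two of the identities above can be spot-checked numerically. The sketch below (illustrative data, not from the notes) uses sample covariances with the \(1/m\) convention, under which the relations hold exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
m = 2_000
X = rng.standard_normal((m, 3)) @ rng.standard_normal((3, 3))   # rows are samples of X
mu = X.mean(axis=0)
var_X = np.cov(X, rowvar=False, bias=True)

# Sigma = E(X X^T) - mu mu^T  (sample version, exact with the 1/m convention)
assert np.allclose(var_X, (X.T @ X) / m - np.outer(mu, mu))

# var(A X + a) = A var(X) A^T
A = rng.standard_normal((2, 3))
a = np.array([1.0, -3.0])
Y = X @ A.T + a                                                  # each row: A x_k + a
var_Y = np.cov(Y, rowvar=False, bias=True)
assert np.allclose(var_Y, A @ var_X @ A.T)
```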
- \(\Sigma\) as a Linear Operator:
- Applied to one vector, the covariance matrix maps a linear combination \(c\) of the random variables \(\mathbf{X}\) onto a vector of covariances with those variables: \(\Sigma c = \operatorname{cov}(\mathbf{X}, c^T\mathbf{X})\).
- Treated as a bilinear form, it yields the covariance between the two linear combinations: \(d^T\Sigma c = \operatorname{cov}(d^T\mathbf{X}, c^T\mathbf{X})\).
- The variance of a linear combination is then \(c^T\Sigma c\) (its covariance with itself).
- The (pseudo-)inverse covariance matrix provides an inner product, \({\displaystyle \langle c-\mu \|\Sigma ^{+}\| c-\mu \rangle }\) which induces the Mahalanobis distance, a measure of the “unlikelihood” of \(c\).
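The operator view above, and the Mahalanobis distance, can be illustrated with a small NumPy sketch (data and names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 3)) @ rng.standard_normal((3, 3))  # samples in rows
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False, bias=True)

c = np.array([1.0, -2.0, 0.5])
d = np.array([0.3, 0.0, 1.0])
y = X @ c                                # the combination c^T x, per sample
z = X @ d

# Sigma c = vector of covariances cov(X_i, c^T X)
cov_with_c = np.array([np.mean((X[:, i] - mu[i]) * (y - y.mean())) for i in range(3)])
assert np.allclose(Sigma @ c, cov_with_c)

# d^T Sigma c = cov(d^T X, c^T X)
assert np.isclose(d @ Sigma @ c, np.mean((z - z.mean()) * (y - y.mean())))

# c^T Sigma c = var(c^T X), the covariance of the combination with itself
assert np.isclose(c @ Sigma @ c, y.var())

# Mahalanobis distance of a point from the mean, via the (pseudo-)inverse covariance.
point = np.array([1.0, 2.0, -1.0])
diff = point - mu
mahalanobis = np.sqrt(diff @ np.linalg.pinv(Sigma) @ diff)
print(mahalanobis)
```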
- Applications [Examples]:
- The Whitening Transformation: allows one to completely decorrelate the data; equivalently, to find an optimal basis for representing the data in a compact way (a sketch follows this list).
- Portfolio Theory: The matrix of covariances among various assets’ returns is used to determine, under certain assumptions, the relative amounts of different assets that investors should (in a normative analysis) or are predicted to (in a positive analysis) choose to hold in a context of diversification.
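A minimal sketch of the whitening transformation mentioned above, assuming a full-rank covariance and using the eigendecomposition (one of several valid whitening choices):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5_000, 3)) @ np.array([[2.0, 0.0, 0.0],
                                                [1.0, 1.0, 0.0],
                                                [0.5, 0.2, 0.3]])
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / len(Xc)

# Eigendecomposition-based whitening: W = Lambda^{-1/2} Q^T, so that
# the transformed data Z = Xc W^T has identity covariance (fully decorrelated).
eigvals, Q = np.linalg.eigh(Sigma)
W = np.diag(1.0 / np.sqrt(eigvals)) @ Q.T
Z = Xc @ W.T

assert np.allclose(Z.T @ Z / len(Z), np.eye(3))
```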
Positive Semi-Definite Matrices
- Definition: A symmetric \({\displaystyle n\times n}\) real matrix \({\displaystyle M}\) is said to be positive semi-definite if the scalar \({\displaystyle z^{\textsf {T}}Mz}\) is non-negative for every column vector \({\displaystyle z}\) of \(n\) real numbers. Mathematically:
$$M \text { positive semi-definite } \Longleftrightarrow \quad x^{\top} M x \geq 0 \text { for all } x \in \mathbb{R}^{n}$$
- Properties:
- \(AA^T\) and \(A^TA\) are PSD
- \(M\) is positive definite if all pivots are \(> 0\)
- \(M\) is positive semi-definite if and only if all of its eigenvalues are non-negative.
- Covariance Matrices \(\Sigma\) are PSD
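A brief NumPy check (mine, not part of the original notes) of two of the properties above: \(A^TA\) is PSD, and semi-definiteness shows up as non-negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
M = A.T @ A                      # A^T A is always symmetric PSD

# All eigenvalues of a PSD matrix are >= 0 (allow a tiny numerical tolerance).
eigvals = np.linalg.eigvalsh(M)
assert np.all(eigvals >= -1e-10)

# Equivalently, x^T M x >= 0 for any x, since x^T A^T A x = ||Ax||^2.
x = rng.standard_normal(3)
assert x @ M @ x >= -1e-10
```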
Positive Definite Matrices
- Definition: A symmetric \({\displaystyle n\times n}\) real matrix \({\displaystyle M}\) is said to be positive definite if the scalar \({\displaystyle z^{\textsf {T}}Mz}\) is strictly positive for every non-zero column vector \({\displaystyle z}\) of \(n\) real numbers. Mathematically:
$$M \text { positive definite } \Longleftrightarrow x^{\top} M x>0 \text { for all } x \in \mathbb{R}^{n} \backslash \mathbf{0}$$
- Properties:
- \(M\) is positive definite if and only if all of its eigenvalues are positive
- The matrix \({\displaystyle M}\) is positive definite if and only if the bilinear form \({\displaystyle \langle z,w\rangle =z^{\textsf {T}}Mw}\) is positive definite
- A symmetric matrix \({\displaystyle M}\) is positive definite if and only if its quadratic form is a strictly convex function
- Every positive definite matrix is invertible and its inverse is also positive definite.
- \(M\) is positive definite if all of its pivots are \(> 0\)
- Covariance Matrices \(\Sigma\) are positive definite unless one variable is an exact linear function of the others. Conversely, every positive semi-definite matrix is the covariance matrix of some multivariate distribution
- Any quadratic function from \({\displaystyle \mathbb {R} ^{n}}\) to \({\displaystyle \mathbb {R} }\) can be written as \({\displaystyle x^{\textsf {T}}Mx+x^{\textsf {T}}b+c}\) where \({\displaystyle M}\) is a symmetric \({\displaystyle n\times n}\) matrix, \(b\) is a real \(n\)-vector, and \(c\) a real constant. This quadratic function is strictly convex, and hence has a unique finite global minimum, if and only if \({\displaystyle M}\) is positive definite.
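In practice, a common way to test positive definiteness (a sketch, not from the notes) is to attempt a Cholesky factorization, which exists exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(M: np.ndarray) -> bool:
    """Return True iff the symmetric matrix M is (numerically) positive definite."""
    try:
        np.linalg.cholesky(M)   # succeeds exactly when M is positive definite
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3  -> positive definite
B = np.array([[1.0,  2.0], [ 2.0, 1.0]])   # eigenvalues 3 and -1 -> not positive definite

assert is_positive_definite(A) and not is_positive_definite(B)
```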
Orthogonal Matrices
- Definition. Orthogonal matrices (called unitary in the complex case) are square matrices whose columns form an orthonormal basis.
- Properties:
- If \(U = [u_1, \ldots, u_n]\) is an orthogonal matrix, then
$$u_i^Tu_j = \left\{ \begin{array}{ll} 1 & \mbox{if } i=j, \\ 0 & \mbox{otherwise.} \end{array} \right. $$
- \(U^TU = UU^T = I_n\).
- Easy inverse:
$$U^{-1} = U^T$$
- Geometrically, orthogonal matrices correspond to rotations (around a point) or reflections (around a line passing through the origin).
i.e. they preserve length and angles!
- For all vectors \(x\),
\(\|Ux\|_2^2 = (Ux)^T(Ux) = x^TU^TUx = x^Tx = \|x\|_2^2.\) This is known as the rotational invariance of the Euclidean norm.
Thus, if we multiply \(x\) by an orthogonal matrix, the errors present in \(x\) will not be magnified. This behavior is very desirable for maintaining numerical stability.
- If \(x, y\) are two vectors with unit norm, then the angle \(\theta\) between them satisfies \(\cos \theta = x^Ty\),
while the angle \(\theta'\) between the rotated vectors \(x' = Ux, y' = Uy\) satisfies \(\cos \theta' = (x')^Ty'\).
Since \((Ux)^T(Uy) = x^T U^TU y = x^Ty,\) the angles are the same.
- Examples:
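A small NumPy sketch (illustrative) using a 2-D rotation matrix to verify \(U^TU = I\), the easy inverse, and the preservation of lengths and angles:

```python
import numpy as np

theta = np.pi / 6
U = np.array([[np.cos(theta), -np.sin(theta)],     # 2-D rotation: an orthogonal matrix
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(U.T @ U, np.eye(2))             # U^T U = I
assert np.allclose(np.linalg.inv(U), U.T)          # easy inverse: U^{-1} = U^T

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # lengths preserved

cos_before = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
cos_after = (U @ x) @ (U @ y) / (np.linalg.norm(U @ x) * np.linalg.norm(U @ y))
assert np.isclose(cos_before, cos_after)                      # angles preserved
```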
Dyads
- Definition. A matrix \(A \in \mathbf{R}^{m \times n}\) is a dyad if it is of the form \(A = uv^T\) for some vectors \(u \in \mathbf{R}^m, v \in \mathbf{R}^n\). The dyad acts on an input vector \(x \in \mathbf{R}^n\) as follows:
$$ Ax = (uv^T) x = (v^Tx) u.$$
- Properties:
- The output always points in the same direction \(u\) in output space (\(\mathbf{R}^m\)), no matter what the input \(x\) is.
- The output is always a simple scaled version of \(u\).
- The amount of scaling depends on the vector \(v\), via the linear function \(x \rightarrow v^Tx\).
- Examples:
- Normalized dyads:
- We can always normalize the dyad by taking both \(u\) and \(v\) to have unit (Euclidean) norm, and using a scalar factor to capture their scale.
- That is, any dyad can be written in normalized form:
- \[A = uv^T = (\|u\|_2 \cdot \|v\|_2 ) \cdot \left(\dfrac{u}{\|u\|_2}\right) \left( \dfrac{v}{\|v\|_2}\right) ^T = \sigma \tilde{u}\tilde{v}^T,\]
- where \(\sigma > 0\), and \(\|\tilde{u}\|_2 = \|\tilde{v}\|_2 = 1.\)
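A short NumPy sketch (illustrative values) of a dyad acting on a vector, \(Ax = (v^Tx)u\), and of the normalized form \(\sigma \tilde{u}\tilde{v}^T\):

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])          # u in R^m
v = np.array([3.0, 0.0, 1.0, -2.0])     # v in R^n
A = np.outer(u, v)                      # dyad A = u v^T, shape (m, n), rank 1

x = np.array([1.0, 1.0, 2.0, 0.5])
# A x = (v^T x) u : the output is always a scaled copy of u.
assert np.allclose(A @ x, (v @ x) * u)

# Normalized form: A = sigma * u_tilde v_tilde^T with unit-norm factors.
sigma = np.linalg.norm(u) * np.linalg.norm(v)
u_tilde, v_tilde = u / np.linalg.norm(u), v / np.linalg.norm(v)
assert np.allclose(A, sigma * np.outer(u_tilde, v_tilde))
```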
- Symmetric dyads:
- Another important class of symmetric matrices is that of the form \(uu^T\), where \(u \in \mathbf{R}^n\).
The matrix has elements \(u_iu_j\), and is symmetric.
If \(\|u\|_2 = 1\), then the dyad is said to be normalized.
- \[uu^T = \left(\begin{array}{ccc} u_1^2 & u_1u_2 & u_1u_3 \\ u_1u_2 & u_2^2 & u_2u_3 \\ u_1u_3 & u_2u_3 & u_3^2 \end{array} \right)\]
- Properties:
- Symmetric dyads correspond to quadratic functions that are simply squared linear forms: \(q(x) = (u^Tx)^2\).
- When the vector \(u\) is normalized (unit), then
\(\mathbf{Tr}(uu^T) = \|u\|_2^2 = 1^2 = 1.\) This follows from the fact that the diagonal entries of a symmetric dyad are just \(u_i^2, \forall i \in [1, n]\), so the trace equals \(\sum_i u_i^2 = \|u\|_2^2\).
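A two-assert check (a sketch, with illustrative values) of the squared-linear-form and unit-trace properties of a normalized symmetric dyad:

```python
import numpy as np

u = np.array([0.6, 0.8, 0.0])           # unit-norm vector, ||u||_2 = 1
S = np.outer(u, u)                      # normalized symmetric dyad u u^T

x = np.array([1.0, -1.0, 2.0])
assert np.isclose(x @ S @ x, (u @ x) ** 2)   # q(x) = (u^T x)^2
assert np.isclose(np.trace(S), 1.0)          # Tr(u u^T) = ||u||_2^2 = 1
```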
Correlation matrix
- Definition.
- \[{\text{corr}}(\mathbf {X} )=\left({\text{diag}}(\Sigma )\right)^{-{\frac {1}{2}}}\,\Sigma \,\left({\text{diag}}(\Sigma )\right)^{-{\frac {1}{2}}}\]
- Properties:
- It is the matrix of “Pearson product-moment correlation coefficients” between each of the random variables in the random vector \({\displaystyle \mathbf {X} }\).
- The correlation matrix can be seen as the covariance matrix of the standardized random variables \({\displaystyle X_{i}/\sigma (X_{i})}\) for \({\displaystyle i=1,\dots ,n}\).
- Each element on the principal diagonal of a correlation matrix is the correlation of a random variable with itself, which always equals 1.
- Each off-diagonal element is between 1 and –1 inclusive.
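A short NumPy sketch (my own) of the definition above: converting a covariance matrix into a correlation matrix with the \(\left(\text{diag}(\Sigma)\right)^{-1/2}\) scaling, and checking it against np.corrcoef:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((1_000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                                [0.0, 2.0, 0.3],
                                                [0.0, 0.0, 0.8]])
Sigma = np.cov(X, rowvar=False)

# corr(X) = diag(Sigma)^{-1/2} * Sigma * diag(Sigma)^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
corr = D_inv_sqrt @ Sigma @ D_inv_sqrt

assert np.allclose(np.diag(corr), 1.0)          # unit diagonal
assert np.all(np.abs(corr) <= 1.0 + 1e-12)      # off-diagonals lie in [-1, 1]
assert np.allclose(corr, np.corrcoef(X, rowvar=False))
```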