Table of Contents
- Linear Functions and Transformations, and Maps
- Linear Functions
- Affine Functions
- Equivalent Definitions of Linear Functions [Theorem]
- Vector Form (and the scalar product)
- Gradient of a Linear Function
- Gradient of an Affine Function
- Interpreting a and b
- First-order approximation of non-linear functions
- First-order Expansion of a function [Theorem]
Linear Functions and Transformations, and Maps
-
- Linear Functions:
- Linear functions are functions which preserve scaling and addition of the input argument.
-
Formally,
- A function f:Rn→R is linear if and only if f preserves scaling and addition of its arguments:
-
- for every x∈Rn, and α∈R, f(αx)=αf(x); and
-
- for every x1,x2∈Rn,f(x1+x2)=f(x1)+f(x2).
-
- Affine Functions:
- Affine functions are linear functions plus constant functions.
- Formally,
- A function f is affine if and only if the function ˜f:Rn→R with values ˜f(x)=f(x)−f(0) is linear. ♢
-
Equivalently,
- A map f:Rn→Rm is affine if and only if the map g:Rn→Rm with values g(x)=f(x)−f(0) is linear.
-
- Quadratic Function
- A function q:Rn→R is said to be a quadratic function if it can be expressed as
- q(x)=n∑i=1n∑j=1Aijxixj+2n∑i=1bixi+c,
- for numbers Aij,bi, and c,i,j∈1,…,n.
A quadratic function is thus an affine combination of the xi’s and all the “cross-products” xixj.
-
We observe that the coefficient of xixj is (Aij+Aji).
-
The function is said to be a quadratic form if there are no linear or constant terms in it: bi=0,c=0.
-
The Hessian of a quadratic function is always constant.
-
- Equivalent Definitions of Linear Functions [Theorem]:
- A map f:Rn→Rm is linear if and only if either one of the following conditions hold:
-
- f preserves scaling and addition of its arguments:
- for every x∈Rn, and α∈R,f(αx)=αf(x); and
- for every x1,x2∈Rn,f(x1+x2)=f(x1)+f(x2).
- f preserves scaling and addition of its arguments:
-
- f vanishes at the origin:
- f(0)=0, and
- It transforms any line segment ∈Rn into another segment ∈Rm:
∀x,y∈Rn,∀λ∈[0,1] : f(λx+(1−λ)y)=λf(x)+(1−λ)f(y).
- f is differentiable, vanishes at the origin, and the matrix of its derivatives is constant.
- There exist A∈Rm×n such that, ∀x∈Rn : f(x)=Ax.
- f vanishes at the origin:
- Vector Form (and the scalar product):
Theorem: Representation of affine function via the scalar product.
)Afunction\(f:Rn→R is affine if and only if it can be expressed via a scalar product:
)\( )\(f(x)=aTx+b ,
)forsomeuniquepair\((a,b), with a∈Rn and b∈R, given by ai=f(ei)−f(0), with ei )the\(i−th unit vector ∈Rn,i=1,…,n, and b=f(0).
The function is linear ⟺b=0.
The theorem shows that a vector can be seen as a (linear) function from the “input” space Rn to the “output” space R.
Both points of view (matrices as simple collections of numbers, or as linear functions) are useful.
-
Gradient of a Linear Function:
-
- Gradient of an Affine Function:
- The gradient of a function f:Rn→R at a point x, denoted ∇f(x), is the vector of first derivatives with respect to x1,…,xn.
-
When n=1 (there is only one input variable), the gradient is simply the derivative.
- An affine function f:Rn→R, with values f(x)=aTx+b has the gradient:
- ∇f(x)=a.
-
i.e. For all Affine Functions, the gradient is the constant vector a.
-
- Interpreting a and b:
-
- The b=f(0) is the constant term. For this reason, it is sometimes referred to as the bias, or intercept.
as it is the point where f intercepts the vertical axis if we were to plot the graph of the function.
- The b=f(0) is the constant term. For this reason, it is sometimes referred to as the bias, or intercept.
-
- The terms aj,j=1,…,n, which correspond to the gradient of f, give the coefficients of influence of xj on f.
For example, if a1>>a3, then the first component of x has much greater influence on the value of f(x) than the third.
- The terms aj,j=1,…,n, which correspond to the gradient of f, give the coefficients of influence of xj on f.
-
- First-order approximation of non-linear functions:
-
- One-dimensional case:
Consider a function of one variable f:R→R, and assume it is differentiable everywhere.
Then we can approximate the values function at a point x near a point x0 as follows:
- One-dimensional case:
- f(x)≃l(x):=f(x0)+f′(x0)(x−x0),
- )where\(f′(x) denotes the derivative of f at x.
-
- Multi-dimensional:
Let us approximate a differentiable function f:Rn→R by a linear function l, so that f and l coincide up and including to the first derivatives.
The approximate function l must be of the form:
- Multi-dimensional:
- l(x)=aTx+b,
- )where\(a∈Rn and b∈R.
-
The corresponding approximation l is called the first-order approximation to f at x0.
-
- Our condition that l coincides with f up and including to the first derivatives shows that we must have:
- ∇l(x)=a=∇f(x0),aTx0+b=f(x0),
- )where\(∇f(x0) is the gradient, of f at x0.
-
- First-order Expansion of a function [Theorem]:
- The first-order approximation of a differentiable function f at a point x0 is of the form:
- f(x)≈l(x)=f(x0)+∇f(x0)T(x−x0)
- where ∇f(x0)∈Rn is the gradient of f at x0.
Matrices
-
- Matrix Transpose:
- Aij=ATji,∀i,j∈F
- Properties:
- (AB)T=BTAT.
-
- Matrix-vector product:
- (Ax)i=n∑j=1Aijxj,i=1,…,m.
- Where the Matrix is ∈Rm×n and the vector is ∈Rm.
- Interpretations:
-
-
A linear combination of the columns of A:
Ax=(aT1x…aTmx)T .
where the columns of A are given by the vectors ai,i=1,…,n, so that A=(a1,…,an). -
Scalar Products of Rows of A with x:
Ax=∑ni=1xiai .
where the rows of A are given by the vectors aTi,i=1,…,m: A=(aT1…aTm)T.
-
-
- Left Product:
- If z∈Rm, then the notation zTA is the row vector of size n equal to the transpose of the column vector ATz∈Rn:
- (zTA)j=∑mi=1Aijzi,j=1,…,n.
-
- Matrix-matrix product:
- (AB)ij=∑nk=1AikBkj.
- where A∈Rm×n and B∈Rn×p, and the notation AB denotes the m×p matrix given above.
- Interpretations:
-
- Transforming the columns of B into Abi:
AB=A(b1…bn)=(Ab1…Abn) .
where the columns of B are given by the vectors bi,i=1,…,n, so that B=(b1,…,bn). - Transforming the Rows of A into aTiB:
AB=(aT1⋮aTm)B=(aT1B⋮aTmB).
where the rows of A are given by the vectors aTi,i=1,…,m: A=(aT1…aTm)T.
- Transforming the columns of B into Abi:
-
Block Matrix Products:
-
Outer Products:
-
- Trace:
- The trace of a square n×n matrix A, denoted by TrA, is the sum of its diagonal elements:
- TrA=∑ni=1Aii.
- Properties:
- TrA=TrAT.
- Tr(AB)=Tr(BA).
- Tr(XYZ)=Tr(ZXY)=Tr(YZX).
- tr(A+B)=tr(A)+tr(B).
- tr(cA)=ctr(A).
- tr(XTY)=tr(XYT)=tr(YTX)=tr(YXT)=∑i,jXijYij.
- \({\displaystyle \operatorname{tr} \left(X^{\mathrm {T} }Y\right)=\sum _{ij}(X\circ Y)_{ij}}\ \ \ \\) (The Hadamard product).
- Arbitrary permutations of the product of matrices is not allowed. Only, cyclic permutations are.
However, if products of three symmetric matrices are considered, any permutation is allowed.
- The trace of an idempotent matrix A, is the dimension of A.
- The trace of a nilpotent matrix is zero.
- If f(x)=(x−λ1)d1⋯(x−λk)dk is the characteristic polynomial of a matrix A, then tr(A)=d1λ1+⋯+dkλk.
- When both A and B are n×n, the trace of the (ring-theoretic) commutator of A and B vanishes: tr([A,B])=0; one can state this as “the trace is a map of Lie algebras GLn→k from operators to scalars”, as the commutator of scalars is trivial (it is an abelian Lie algebra).
- The trace of a projection matrix is the dimension of the target space. PX=X(XTX)−1XT⇒tr(PX)=rank(X)
-
- Scalar Product:
- ⟨A,B⟩:=Tr(ATB)=m∑i=1n∑j=1AijBij.
-
The above definition is Symmetric:
- ⟹⟨A,B⟩=Tr(ATB)=Tr(ATB)T=Tr(BTA)=⟨B,A⟩.
-
We can interpret the matrix scalar product as the vector scalar product between two long vectors of length mn each, obtained by stacking all the columns of A,B on top of each other.
- Special Matrices:
- Diagonal matrices: are square matrices A with Aij=0 when i≠j.
- Symmetric matrices: are square matrices that satisfy Aij=Ajifor every pair (i,j).
-
Triangular matrices: are square matrices that satisfy Aij=Ajifor every pair (i,j).
{F}}^{2}=|AR|{\rm {F}}^{2}=|RA|{\rm {F}}^{2}}foranyrotationmatrixR$$.-
Invariant under a unitary transformation for complex matrices.
-
‖.
-
{\displaystyle \|A+B\|_{\rm {F}}^{2}=\|A\|_{\rm {F}}^{2}+\|B\|_{\rm {F}}^{2}+2\langle A,B\rangle _{\mathrm {F} }}.
-
-
- l_{\infty,\infty} (Max Norm):
- \|A\|_{\max} = \max_{ij} |a_{ij}|.
- Properties:
- NOT Submultiplicative.
-
- The Spectral Norm:
- {\displaystyle \|A\|_{2}={\sqrt {\lambda _{\max }(A^{^{*}}A)}}=\sigma _{\max }(A)} = {\displaystyle \max_{\|x\|_2!=0}(\|Ax\|_2)/(\|x\|_2)}.
The spectral norm of a matrix {\displaystyle A} is the largest singular value of {\displaystyle A}. i.e. the square root of the largest eigenvalue of the positive-semidefinite matrix {\displaystyle A^{*}A}.
-
- The Spectral Radius of A \\) [denoted \(\rho(A)]:
- \lim_{r\rightarrow\infty}\|A^r\|^{1/r}=\rho(A).
- Properties:
-
Submultiplicative.
-
Satisfies, {\displaystyle \|A^{r}\|^{1/r}\geq \rho (A),}, where \rho(A) is the spectral radius of A.
-
It is an “induced vector-norm”.
-
-
Asynchronous:
-
Asynchronous:
-
Equivalence of Norms:
- Applications:
-
RMS Gain: Frobenius Norm.
-
Peak Gain: Spectral Norm.
-
Distance between Matrices: Frobenius Norm.
-
Direction of Maximal Variance: Spectral Norm.
-