Table of Contents



img


img


img

Linear Functions and Transformations, and Maps

  1. Linear Functions:
    Linear functions are functions which preserve scaling and addition of the input argument.

    Formally,

    A function f:RnR is linear if and only if f preserves scaling and addition of its arguments:
    • for every xRn, and αR, f(αx)=αf(x); and
    • for every x1,x2Rn,f(x1+x2)=f(x1)+f(x2).
  2. Affine Functions:
    Affine functions are linear functions plus constant functions.
    Formally,
    A function f is affine if and only if the function ˜f:RnR with values ˜f(x)=f(x)f(0) is linear.

    Equivalently,

    A map f:RnRm is affine if and only if the map g:RnRm with values g(x)=f(x)f(0) is linear.
  3. Quadratic Function
    A function q:RnR is said to be a quadratic function if it can be expressed as
    q(x)=ni=1nj=1Aijxixj+2ni=1bixi+c,
    for numbers Aij,bi, and c,i,j1,,n.

    A quadratic function is thus an affine combination of the  xi’s and all the “cross-products” xixj.

    We observe that the coefficient of xixj is (Aij+Aji).

    The function is said to be a quadratic form if there are no linear or constant terms in it: bi=0,c=0.

    The Hessian of a quadratic function is always constant.

  4. Equivalent Definitions of Linear Functions [Theorem]:
    A map f:RnRm is linear if and only if either one of the following conditions hold:
    • f preserves scaling and addition of its arguments:
      • for every xRn, and αR,f(αx)=αf(x); and
      • for every x1,x2Rn,f(x1+x2)=f(x1)+f(x2).
    • f vanishes at the origin:
      • f(0)=0, and
      • It transforms any line segment Rn into another segment Rm: x,yRn,λ[0,1] : f(λx+(1λ)y)=λf(x)+(1λ)f(y).
        • f is differentiable, vanishes at the origin, and the matrix of its derivatives is constant.
        • There exist ARm×n such that,  xRn : f(x)=Ax.
  5. Vector Form (and the scalar product):
    Theorem: Representation of affine function via the scalar product.
           )Afunction\(f:RnR is affine if and only if it can be expressed via a scalar product:
                                             )\(         )\(f(x)=aTx+b ,
           )forsomeuniquepair\((a,b), with aRn and bR, given by ai=f(ei)f(0), with ei         )the\(ith unit vector Rn,i=1,,n, and  b=f(0).

    The function is linear b=0.

    The theorem shows that a vector can be seen as a (linear) function from the “input” space Rn to the “output” space R.

    Both points of view (matrices as simple collections of numbers, or as linear functions) are useful.

  6. Gradient of a Linear Function:
    img

  7. Gradient of an Affine Function:
    The gradient of a function f:RnR at a point x, denoted f(x), is the vector of first derivatives with respect to x1,,xn.

    When n=1 (there is only one input variable), the gradient is simply the derivative.

    An affine function f:RnR, with values f(x)=aTx+b has the gradient:
    f(x)=a.

    i.e. For all Affine Functions, the gradient is the constant vector a.

  8. Interpreting a and b:
    • The b=f(0) is the constant term. For this reason, it is sometimes referred to as the bias, or intercept.

      as it is the point where f intercepts the vertical axis if we were to plot the graph of the function.

    • The terms aj,j=1,,n, which correspond to the gradient of f, give the coefficients of influence of xj on f.

      For example, if a1>>a3, then the first component of x has much greater influence on the value of f(x) than the third.

  9. First-order approximation of non-linear functions:
    • One-dimensional case:
      Consider a function of one variable f:RR, and assume it is differentiable everywhere.
      Then we can approximate the values function at a point x near a point x0 as follows:
    f(x)l(x):=f(x0)+f(x0)(xx0),
           )where\(f(x) denotes the derivative of f at x.
    • Multi-dimensional:
      Let us approximate a differentiable function f:RnR by a linear function l, so that f and l coincide up and including to the first derivatives.
      The approximate function l must be of the form:
    l(x)=aTx+b,
           )where\(aRn and bR.

    The corresponding approximation l is called the first-order approximation to f at x0.

    • Our condition that l coincides with f up and including to the first derivatives shows that we must have:
    l(x)=a=f(x0),aTx0+b=f(x0),
           )where\(f(x0) is the gradient, of f at x0.
  10. First-order Expansion of a function [Theorem]:
    The first-order approximation of a differentiable function f at a point x0 is of the form:
    f(x)l(x)=f(x0)+f(x0)T(xx0)
    where f(x0)Rn is the gradient of f at x0.

img


Matrices

  1. Matrix Transpose:
    Aij=ATji,i,jF
    • Properties:
      • (AB)T=BTAT.
  2. Matrix-vector product:
    (Ax)i=nj=1Aijxj,i=1,,m.
    Where the Matrix is Rm×n and the vector is Rm.
    Interpretations:
    1. A linear combination of the columns of A:
                                                   Ax=(aT1xaTmx)T .
      where the columns of A are given by the vectors ai,i=1,,n, so that A=(a1,,an).

    2. Scalar Products of Rows of A with x:
                                                   Ax=ni=1xiai .
      where the rows of A are given by the vectors aTi,i=1,,m: A=(aT1aTm)T.

  3. Left Product:
    If zRm, then the notation zTA is the row vector of size n equal to the transpose of the column vector ATzRn:
    (zTA)j=mi=1Aijzi,j=1,,n.
  4. Matrix-matrix product:
    (AB)ij=nk=1AikBkj.
    where ARm×n and BRn×p, and the notation AB denotes the m×p matrix given above.
    Interpretations:
    1. Transforming the columns of B into Abi:
                                                   AB=A(b1bn)=(Ab1Abn) .
      where the columns of B are given by the vectors bi,i=1,,n, so that B=(b1,,bn).
    2. Transforming the Rows of A into aTiB:
                                                   AB=(aT1aTm)B=(aT1BaTmB).
      where the rows of A are given by the vectors aTi,i=1,,m: A=(aT1aTm)T.
  5. Block Matrix Products:
    img

  6. Outer Products: img

  7. Trace:
    The trace of a square n×n matrix A, denoted by TrA, is the sum of its diagonal elements:
    TrA=ni=1Aii.
    • Properties:
      • TrA=TrAT.
      • Tr(AB)=Tr(BA).
      • Tr(XYZ)=Tr(ZXY)=Tr(YZX).
      • tr(A+B)=tr(A)+tr(B).
      • tr(cA)=ctr(A).
      • tr(XTY)=tr(XYT)=tr(YTX)=tr(YXT)=i,jXijYij.
      • \({\displaystyle \operatorname{tr} \left(X^{\mathrm {T} }Y\right)=\sum _{ij}(X\circ Y)_{ij}}\ \ \ \\) (The Hadamard product).
      • Arbitrary permutations of the product of matrices is not allowed. Only, cyclic permutations are.

        However, if products of three symmetric matrices are considered, any permutation is allowed.

      • The trace of an idempotent matrix A, is the dimension of A.
      • The trace of a nilpotent matrix is zero.
      • If f(x)=(xλ1)d1(xλk)dk is the characteristic polynomial of a matrix A, then tr(A)=d1λ1++dkλk.
      • When both A and B are n×n, the trace of the (ring-theoretic) commutator of A and B vanishes: tr([A,B])=0; one can state this as “the trace is a map of Lie algebras GLnk from operators to scalars”, as the commutator of scalars is trivial (it is an abelian Lie algebra).
      • The trace of a projection matrix is the dimension of the target space. PX=X(XTX)1XTtr(PX)=rank(X)
  8. Scalar Product:
    A,B:=Tr(ATB)=mi=1nj=1AijBij.

    The above definition is Symmetric:

    A,B=Tr(ATB)=Tr(ATB)T=Tr(BTA)=B,A.

    We can interpret the matrix scalar product as the vector scalar product between two long vectors of length mn each, obtained by stacking all the columns of A,B on top of each other.

  9. Special Matrices:
    • Diagonal matrices: are square matrices A with Aij=0 when ij.
    • Symmetric matrices: are square matrices that satisfy Aij=Ajifor every pair (i,j).
    • Triangular matrices: are square matrices that satisfy Aij=Ajifor every pair (i,j).
      {F}}^{2}=|AR|{\rm {F}}^{2}=|RA|{\rm {F}}^{2}}foranyrotationmatrixR$$.

      1. Invariant under a unitary transformation for complex matrices.

      2. .

      3. {\displaystyle \|A+B\|_{\rm {F}}^{2}=\|A\|_{\rm {F}}^{2}+\|B\|_{\rm {F}}^{2}+2\langle A,B\rangle _{\mathrm {F} }}.

  10. l_{\infty,\infty} (Max Norm):
    \|A\|_{\max} = \max_{ij} |a_{ij}|.
    • Properties:
      1. NOT Submultiplicative.
  11. The Spectral Norm:
    {\displaystyle \|A\|_{2}={\sqrt {\lambda _{\max }(A^{^{*}}A)}}=\sigma _{\max }(A)} = {\displaystyle \max_{\|x\|_2!=0}(\|Ax\|_2)/(\|x\|_2)}.

    The spectral norm of a matrix {\displaystyle A} is the largest singular value of {\displaystyle A}. i.e. the square root of the largest eigenvalue of the positive-semidefinite matrix {\displaystyle A^{*}A}.

    • The Spectral Radius of A \\) [denoted \(\rho(A)]:
      \lim_{r\rightarrow\infty}\|A^r\|^{1/r}=\rho(A).
    • Properties:
      1. Submultiplicative.

      2. Satisfies, {\displaystyle \|A^{r}\|^{1/r}\geq \rho (A),}, where \rho(A) is the spectral radius of A.

      3. It is an “induced vector-norm”.

  12. Asynchronous:

  13. Asynchronous:

  14. Equivalence of Norms:

  15. Applications:
    1. RMS Gain: Frobenius Norm.

    2. Peak Gain: Spectral Norm.

    3. Distance between Matrices: Frobenius Norm.

    4. Direction of Maximal Variance: Spectral Norm.