Table of Contents

Generalized Linear Models and Exponential Family Distributions (Blog!)
Logistic regression as a neural network (Blog!)
A very simple demo of interactive controls on Jupyter notebook - Interactive Linear Regression (Article+Code)

Regression


Linear Regression

Assume that the target \(y\) is the sum of a deterministic function \(f(x; \theta)\) and a normally distributed error \(\epsilon \sim \mathcal{N}\left(0, \sigma^{2}\right)\):

$$y = f(x; \theta) + \epsilon$$

Thus, \(y \sim \mathcal{N}\left(f(x; \theta), \sigma^{2}\right)\); equivalently, we assume a conditional distribution \(p(y\vert x)\) under which \(y \sim \mathcal{N}\left(f(x; \theta), \sigma^{2}\right)\).
- Notice that \(\epsilon = y - \hat{y}\) (with \(\hat{y} = f(x; \theta)\)), so the density of \(\epsilon\) is:

$$\begin{align} \epsilon &\sim \mathcal{N}\left(0, \sigma^{2}\right) \\ p(\epsilon) &= \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{\epsilon^{2}}{2 \sigma^{2}}} \\ &= \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{\left(y-\hat{y}\right)^{2}}{2 \sigma^{2}}} \end{align}$$
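As a quick numerical sanity check (a minimal sketch; the model \(f(x; \theta) = 2x + 1\) and \(\sigma = 0.5\) are assumptions chosen only for illustration), the residuals \(y - \hat{y}\) should indeed behave like draws from \(\mathcal{N}(0, \sigma^{2})\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

# Assumed deterministic model f(x; theta) = 2x + 1, chosen only for illustration.
x = rng.uniform(-3, 3, size=100_000)
y = 2 * x + 1 + rng.normal(0.0, sigma, size=x.shape)

eps = y - (2 * x + 1)              # residuals y - y_hat
print(eps.mean(), eps.std())       # ~0.0 and ~sigma, matching N(0, sigma^2)
```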

In linear regression, the equivalent setup is:
We assume that we are given data \(x_{1}, \ldots, x_{n}\) and outputs \(y_{1}, \ldots, y_{n}\) where \(x_{i} \in \mathbb{R}^{d}\) and \(y_{i} \in \mathbb{R}\) and that there is a distribution \(p(y \vert x)\) where \(y \sim \mathcal{N}\left(w^{\top} x, \sigma^{2}\right)\).

$$Y_{i} \vert \boldsymbol{\theta} \sim \mathcal{N}\left(h_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right), \sigma^{2}\right), \quad \text{where } h_{\boldsymbol{\theta}}\left(\mathbf{x}_{i}\right) = w^{\top} \mathbf{x}_{i}$$

In other words, we assume there is a true linear model with some true weight vector \(w\), and the observed values are scattered around it with error \(\epsilon \sim \mathcal{N}\left(0, \sigma^{2}\right)\).
Then we just want to obtain the maximum likelihood estimate:

$$\begin{aligned} p(Y \vert X, w) &=\prod_{i=1}^{n} p\left(y_{i} \vert x_{i}, w\right) \\ \log p(Y \vert X, w) &=\sum_{i=1}^{n}\left[-\frac{1}{2}\log \left(2 \pi \sigma^{2}\right)-\frac{1}{2 \sigma^{2}}\left(y_{i}-w^{\top} x_{i}\right)^{2}\right] \end{aligned}$$
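Since \(\sigma^{2}\) does not depend on \(w\), maximizing this log-likelihood is equivalent to minimizing \(\sum_{i}\left(y_{i}-w^{\top} x_{i}\right)^{2}\), i.e., ordinary least squares. A minimal NumPy sketch on synthetic data (names such as `w_true` and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 500, 3, 0.3

# Synthetic data drawn from the assumed model y = w^T x + eps.
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + rng.normal(0.0, sigma, size=n)

# Maximizing the Gaussian log-likelihood in w is the same as minimizing
# sum_i (y_i - w^T x_i)^2, so the MLE is the least-squares solution.
w_mle, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w_true, 3))
print(np.round(w_mle, 3))          # close to w_true
```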


Logistic Regression

In logistic regression, the errors are not directly observable, since we never observe the underlying probabilities, only the binary outcomes \(y \in \{0, 1\}\).

Latent Variable Interpretation:
Logistic regression can be understood as finding the \(\beta\) parameters that best fit:

$$y=\left\{\begin{array}{ll}{1} & {\beta_{0}+\beta_{1} x+\varepsilon>0} \\ {0} & {\text { else }}\end{array}\right.$$

where \(\varepsilon\) is an error term distributed according to the standard logistic distribution.
The associated latent variable is \(y' = \beta_{0} + \beta_{1} x + \varepsilon\). The error term \(\varepsilon\) is not observed, so \(y'\) is also unobservable, hence the term “latent” (the observed data are values of \(y\) and \(x\)). Unlike ordinary regression, the \(\beta\) parameters cannot be expressed by any closed-form formula of the \(y\) and \(x\) values in the observed data; instead, they must be found by an iterative search process, as in the sketch below.
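A minimal NumPy sketch of the latent-variable story (the \(\beta\) values are assumptions for illustration): sample \(\varepsilon\) from the standard logistic distribution, keep only the thresholded \(y\), and recover \(\beta\) by an iterative search, here plain gradient ascent on the Bernoulli log-likelihood. This works because \(P(y=1 \vert x) = P\left(\varepsilon > -(\beta_{0} + \beta_{1} x)\right) = \frac{1}{1+e^{-(\beta_{0}+\beta_{1} x)}}\).

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([-1.0, 2.0])            # assumed (beta_0, beta_1)

x = rng.normal(size=5000)
X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
eps = rng.logistic(size=x.shape)             # standard logistic error term
y = (X @ beta_true + eps > 0).astype(float)  # only y and x are "observed"

# Iterative search: gradient ascent on the Bernoulli log-likelihood.
beta = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ beta))      # P(y = 1 | x)
    beta += 0.1 * X.T @ (y - p) / len(y)     # gradient of mean log-likelihood
print(beta)                                   # approaches beta_true
```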

Notes:

LR, Minimizing the Error Function (Derivation):

[figure]
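In place of the missing figure, a sketch of the standard derivation (assuming labels \(y_{n} \in \{-1, +1\}\) and the model \(P(y \vert \mathbf{x}) = \theta\left(y\, \mathbf{w}^{\top} \mathbf{x}\right)\) with \(\theta(s) = \frac{1}{1+e^{-s}}\)): maximum likelihood leads to minimizing the in-sample cross-entropy error, whose gradient drives the iterative updates.

$$\begin{aligned} E_{\text{in}}(\mathbf{w}) &= \frac{1}{N} \sum_{n=1}^{N} \ln\left(1+e^{-y_{n} \mathbf{w}^{\top} \mathbf{x}_{n}}\right) \\ \nabla E_{\text{in}}(\mathbf{w}) &= -\frac{1}{N} \sum_{n=1}^{N} \frac{y_{n} \mathbf{x}_{n}}{1+e^{y_{n} \mathbf{w}^{\top} \mathbf{x}_{n}}} \end{aligned}$$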

Linear Classification and Regression, and Non-Linear Transformations:

[figure]

A Third Linear Model - Logistic Regression:

[figure]

Logistic Regression Algorithm:

[figure]
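In place of the missing figure, a minimal Python sketch of batch gradient descent on the \(E_{\text{in}}\) above (the learning rate and step count are illustrative defaults, not tuned values):

```python
import numpy as np

def logistic_regression(X, y, eta=0.1, steps=1000):
    """Batch gradient descent on E_in for labels y in {-1, +1}.

    Starts at w = 0 and repeatedly steps along -grad E_in(w).
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # grad E_in(w) = -(1/N) sum_n y_n x_n / (1 + exp(y_n w^T x_n))
        grad = -(y[:, None] * X / (1 + np.exp(y * (X @ w)))[:, None]).mean(axis=0)
        w -= eta * grad
    return w
```

With labels converted to \(\pm 1\), `w = logistic_regression(X, y)` plays the role of the learned weights; the stopping rule here is a fixed step budget rather than a gradient-norm threshold.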

Summary of Linear Models:

[figure]