MLE vs MAP Estimation


Maximum Likelihood Estimation (MLE)

  1. Maximum Likelihood Estimation:
    Likelihood in Parametric Models:

    Suppose we have a parametric model \(\{p(y ; \theta) \vert \theta \in \Theta\}\) and a sample \(\mathcal{D}=\left\{y_{1}, \ldots, y_{n}\right\}\):

    • The likelihood of parameter estimate \(\hat{\theta} \in \Theta\) for sample \(\mathcal{D}\) is:

      $$p(\mathcal{D} ; \hat{\theta})=\prod_{i=1}^{n} p\left(y_{i} ; \hat{\theta}\right)$$

    • In practice, we prefer to work with the log-likelihood:

      $$\log p(\mathcal{D} ; \hat{\theta})=\sum_{i=1}^{n} \log p\left(y_{i} ; \hat{\theta}\right)$$

      Because the logarithm is strictly increasing, the log-likelihood has the same maximizer as the likelihood, and sums are easier to work with than products.

    The likelihood is the probability (or probability density) of the observed data under the model, viewed as a function of the model's parameters.
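
    As a small illustration of the two formulas above, the sketch below assumes a Bernoulli model \(p(y ; \theta)=\theta^{y}(1-\theta)^{1-y}\) and a made-up sample; neither comes from the notes. It evaluates the likelihood and the log-likelihood on a grid of candidate values and checks that both are maximized at the same point.

```python
import math

def likelihood(data, theta):
    """Product of Bernoulli probabilities p(y; theta) over the sample."""
    out = 1.0
    for y in data:
        out *= theta if y == 1 else 1.0 - theta
    return out

def log_likelihood(data, theta):
    """Sum of log p(y; theta) over the sample."""
    return sum(math.log(theta if y == 1 else 1.0 - theta) for y in data)

# Hypothetical sample: 6 ones out of 8 observations.
data = [1, 0, 1, 1, 0, 1, 1, 1]

# Candidate parameter values 0.01, 0.02, ..., 0.99 (endpoints excluded so log is defined).
grid = [i / 100 for i in range(1, 100)]

best_by_likelihood = max(grid, key=lambda t: likelihood(data, t))
best_by_log_likelihood = max(grid, key=lambda t: log_likelihood(data, t))

print(best_by_likelihood, best_by_log_likelihood)
```

    Under these assumptions both searches pick the sample frequency of ones, 6/8, which is the grid point where either curve peaks.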

    MLE for Parametric Models:

    The maximum likelihood estimator (MLE) for \(\theta\) in the (parametric) model \(\{p(y ; \theta) \vert \theta \in \Theta\}\) is:

    $$\begin{aligned} \hat{\theta} &=\underset{\theta \in \Theta}{\arg \max } \log p(\mathcal{D} ; \theta) \\ &=\underset{\theta \in \Theta}{\arg \max } \sum_{i=1}^{n} \log p\left(y_{i} ; \theta\right) \end{aligned}$$

    In other words, you are finding the value of the parameter \(\theta\) under which the model assigns the observed data the highest probability, i.e. the value that makes the data most “likely” to occur.
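
    To make the \(\arg \max\) concrete, here is a minimal sketch under an assumed exponential model \(p(y ; \theta)=\theta e^{-\theta y}\) with a made-up sample (both are illustrative choices, not part of the notes). For this model, setting the derivative of the log-likelihood to zero gives \(\hat{\theta}=1 / \bar{y}\); the sketch compares that value with plain full-batch gradient ascent on the average log-likelihood, used here as a simple stand-in for a numerical optimizer.

```python
import math

def avg_log_likelihood(data, theta):
    """Average of log p(y; theta) = log(theta) - theta * y over the sample."""
    return sum(math.log(theta) - theta * y for y in data) / len(data)

# Hypothetical positive observations with sample mean 2.0.
data = [1.0, 3.0, 2.5, 0.5, 3.0]
mean_y = sum(data) / len(data)

# Closed form: d/dtheta [log(theta) - theta * mean_y] = 0  =>  theta = 1 / mean_y.
theta_closed = 1.0 / mean_y

# Numerical alternative: gradient ascent on the average log-likelihood.
# The starting point and step size are illustrative choices.
theta_num = 1.0
step = 0.1
for _ in range(200):
    grad = 1.0 / theta_num - mean_y  # derivative of the average log-likelihood
    theta_num += step * grad

print(theta_closed, theta_num)
```

    Both routes should land on essentially the same estimate here. The step size matters: too large a step could push the iterate to a non-positive value, where the log-likelihood is undefined.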

    • MLE Intuition:
      If I choose a hypothesis \(h\) under which the observed data is very plausible, then that hypothesis is considered very likely.
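    • Maximum likelihood can be viewed as empirical risk minimization: maximizing the log-likelihood is equivalent to minimizing the average loss \(-\log p\left(y_{i} ; \theta\right)\) over the sample.
    • Finding the MLE is an optimization problem.
    • For some model families, calculus gives a closed form for the MLE.
    • Otherwise, we can use numerical optimization methods we already know (e.g. SGD).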


    • Why maximize the natural log of the likelihood?
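
    One practical answer to this question is numerical: a product of many probabilities underflows in floating point, while the corresponding sum of logs stays finite. The sketch below shows this under an assumed Bernoulli model with a fixed, made-up sample of 5000 observations; the model, the sample, and the parameter value are illustrative assumptions.

```python
import math

theta = 0.3  # illustrative parameter value

# Hypothetical deterministic sample: 30% ones, 5000 observations in total.
data = [1 if i % 10 < 3 else 0 for i in range(5000)]

# Likelihood as a raw product: every factor is at most 0.7, so the product
# shrinks below the smallest positive double and becomes exactly 0.0.
likelihood = 1.0
for y in data:
    likelihood *= theta if y == 1 else 1.0 - theta

# Log-likelihood as a sum of logs: a large negative but finite number.
log_likelihood = sum(math.log(theta if y == 1 else 1.0 - theta) for y in data)

print(likelihood)
print(log_likelihood)
```

    Once the product has collapsed to zero, every candidate \(\theta\) looks equally bad and the maximizer cannot be recovered, whereas the log-likelihood still distinguishes between candidates.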

Maximum A Posteriori (MAP) Estimation