RNNs

Refer to this section on RNNs.

  1. Process Sequences (input/output shapes are sketched below this list):
    • One-to-One: "Vanilla" Neural Networks: fixed-size input -> fixed-size output
    • One-to-Many: Image Captioning: image -> seq of words
    • Many-to-One: Sentiment Classification: seq of words -> sentiment
    • Many-to-Many:
      Machine Translation: seq of words -> seq of words
    • (Synced) Many-to-Many:
      Frame-Level Video Classification: seq of frames -> seq of classes per frame
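    To make these input/output shapes concrete, a minimal NumPy sketch (the sizes T, D, C are made up for illustration):

    ```python
    import numpy as np

    T, D, C = 10, 64, 5      # assumed sizes: time steps, input dim, num classes

    x_one = np.zeros(D)      # a single fixed-size input
    x_seq = np.zeros((T, D)) # a length-T sequence of D-dim vectors

    # One-to-One:            x_one (D,)   -> scores (C,)
    # One-to-Many:           x_one (D,)   -> sequence (T, C)   e.g. caption words
    # Many-to-One:           x_seq (T, D) -> scores (C,)       e.g. sentiment
    # Many-to-Many:          x_seq (T, D) -> sequence (T', C)  T' may differ (translation)
    # (Synced) Many-to-Many: x_seq (T, D) -> one label per frame (T, C)
    ```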
  2. RNN Structure:
    We can process a sequence of vectors \(\vec{x}\) by applying a recurrence formula at every time step:
    \[h_t = f_W(h_{t-1}, x_t)\]
    where \(h_t\) is the new state, \(f_W\) is some function with weights \(W\), \(h_{t-1}\) is the old state, and \(x_t\) is the input vector at some time step \(t\).

    The same function and set of parameters (weights) are used at every time step.

  3. A Vanilla Architecture of an RNN:
    \[\begin{align} h_t &= f_W(h_{t-1}, x_t) \\ h_t &= \tanh(W_{hh}h_{t-1} + W_{xh}x_t) \\ y_t &= W_{hy}h_t \end{align}\]
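    As a concrete illustration (a minimal sketch, not from the original notes), one step of this vanilla RNN in NumPy, with weight names mirroring the equations and bias terms omitted as in the formulas:

    ```python
    import numpy as np

    def rnn_step(h_prev, x, Whh, Wxh, Why):
        """One vanilla RNN step: returns the new state h_t and the output y_t."""
        h = np.tanh(Whh @ h_prev + Wxh @ x)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        y = Why @ h                          # y_t = W_hy h_t
        return h, y
    ```

    Calling this in a loop over t with the same Whh, Wxh, Why is exactly the point above: one function, one set of weights, every time step.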
  4. The RNN Computational Graph:
    • Many-to-Many
    • One-to-Many
    • Seq-to-Seq
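    All of these graphs are the step above unrolled in a loop, reusing the same weights at every time step (so the gradient of the total loss sums contributions from all steps). A many-to-many sketch with random data, assuming the hypothetical sizes from the sketches above:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, D, H, C = 6, 8, 16, 4               # assumed sizes: steps, input, hidden, classes
    Wxh = rng.standard_normal((H, D)) * 0.01
    Whh = rng.standard_normal((H, H)) * 0.01
    Why = rng.standard_normal((C, H)) * 0.01
    xs = rng.standard_normal((T, D))

    h = np.zeros(H)                        # initial hidden state h_0
    ys = []
    for t in range(T):                     # unrolled graph: same weights at every step
        h = np.tanh(Whh @ h + Wxh @ xs[t])
        ys.append(Why @ h)                 # one output per step -> many-to-many
    ```

    A one-to-many graph feeds the input only at t = 0; a seq-to-seq graph is a many-to-one encoder (keep only the final h) followed by a one-to-many decoder initialized from it.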
  5. Example Architecture: Character-Level Language Model:
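    The model is trained on a corpus of text: at each step it receives the current character (one-hot encoded) and outputs scores for the next character. At test time we sample from the softmax over those scores and feed the sample back in as the next input. A toy sampling sketch with untrained weights (the 4-character vocabulary and all names are illustrative):

    ```python
    import numpy as np

    vocab = ['h', 'e', 'l', 'o']           # toy vocabulary, as in the classic "hello" example
    V, H = len(vocab), 16
    rng = np.random.default_rng(1)
    Wxh = rng.standard_normal((H, V)) * 0.01
    Whh = rng.standard_normal((H, H)) * 0.01
    Why = rng.standard_normal((V, H)) * 0.01

    def one_hot(i):
        x = np.zeros(V)
        x[i] = 1.0
        return x

    def sample(seed_ix, n):
        """Sample n characters, feeding each sampled character back as the next input."""
        h, ix, out = np.zeros(H), seed_ix, []
        for _ in range(n):
            h = np.tanh(Whh @ h + Wxh @ one_hot(ix))
            p = np.exp(Why @ h)
            p /= p.sum()                   # softmax over next-character scores
            ix = rng.choice(V, p=p)
            out.append(vocab[ix])
        return ''.join(out)

    print(sample(0, 10))                   # gibberish until the weights are trained
    ```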
  6. The Functional Form of a Vanilla RNN (Gradient Flow):
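    During backprop, every step multiplies the upstream gradient by \(W_{hh}^T\) (times the local tanh gradient, which is at most 1), so the gradient reaching \(h_0\) contains \(T\) such factors. Roughly: if the dominant eigenvalue/singular value of \(W_{hh}\) is above 1 the gradient explodes; below 1 it vanishes. A numeric sketch of just the repeated matrix factor:

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    H, T = 16, 50
    g = rng.standard_normal(H)             # some upstream gradient dL/dh_T

    for scale in [0.5, 1.5]:               # contractive vs. expansive recurrence
        Whh = rng.standard_normal((H, H))
        Whh *= scale / np.abs(np.linalg.eigvals(Whh)).max()  # set spectral radius
        grad = g.copy()
        for _ in range(T):                 # T backward steps: grad <- Whh.T @ grad
            grad = Whh.T @ grad
        print(scale, np.linalg.norm(grad)) # tiny (vanishing) vs. huge (exploding)
    ```

    This is the vanishing/exploding-gradient problem that motivates gradient clipping and the gated cells (LSTM/GRU) in the section below.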

Applications in CV

Coming Soon!


Implementations and Training (LSTMs and GRUs)

Coming Soon!