Table of Contents



FIRST

  1. Motivation:
    Combinations of consecutive words are hard to capture, model, or detect.

  2. Padding:

    After convolution, the rows and columns of the output tensor are either:

    • Equal to the rows/columns of the input tensor (“same” convolution). This keeps the output dimensionality intact.
    • Equal to the rows/columns of the input tensor minus the filter size plus one (“valid” or “narrow” convolution).
    • Equal to the rows/columns of the input tensor plus the filter size minus one (“wide” convolution).

    The three cases are compared in the sketch after this list.
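
    A minimal sketch (assuming PyTorch’s nn.Conv1d and toy sizes chosen only for illustration) of how the three padding choices change the output length:

        # Output length of a 1-D convolution under the three padding choices.
        import torch
        import torch.nn as nn

        seq_len, emb_dim, n_filters, k = 10, 8, 16, 3   # toy sizes
        x = torch.randn(1, emb_dim, seq_len)            # (batch, channels, length)

        narrow = nn.Conv1d(emb_dim, n_filters, k, padding=0)       # "valid"/"narrow"
        same   = nn.Conv1d(emb_dim, n_filters, k, padding=k // 2)  # "same" (odd k)
        wide   = nn.Conv1d(emb_dim, n_filters, k, padding=k - 1)   # "wide"

        print(narrow(x).shape[-1])  # seq_len - k + 1 = 8
        print(same(x).shape[-1])    # seq_len         = 10
        print(wide(x).shape[-1])    # seq_len + k - 1 = 12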

Striding:
Skip some of the convolution outputs to reduce the length of the extracted feature vector.
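
A minimal sketch (assuming PyTorch): with a stride of 2, only every other convolution output is kept, roughly halving the length of the feature sequence.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 10)                            # (batch, emb_dim, seq_len)
    strided = nn.Conv1d(8, 16, kernel_size=3, stride=2)  # skip every other output position
    print(strided(x).shape[-1])                          # (10 - 3) // 2 + 1 = 4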

Pooling:
Pooling is like convolution, but it calculates a reduction function (e.g. max or average) feature-wise instead of applying a learned filter.
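
A minimal sketch (assuming PyTorch): max pooling over time collapses the convolved sequence to one value per feature, regardless of the input length.

    import torch

    feats = torch.randn(1, 16, 8)       # (batch, n_filters, length) after convolution
    pooled = feats.max(dim=-1).values   # reduction (max) applied feature-wise
    print(pooled.shape)                 # torch.Size([1, 16])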

Stacking - Stacked Convolution:
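
A minimal sketch (assuming PyTorch) of stacked convolutions: each layer convolves the feature maps of the layer below, so with two kernel-size-3 layers a single output position covers five consecutive inputs.

    import torch
    import torch.nn as nn

    stack = nn.Sequential(
        nn.Conv1d(8, 16, kernel_size=3, padding=1),   # "same" padding keeps length
        nn.ReLU(),
        nn.Conv1d(16, 16, kernel_size=3, padding=1),  # receptive field grows to 5
        nn.ReLU(),
    )
    x = torch.randn(1, 8, 10)   # (batch, emb_dim, seq_len)
    print(stack(x).shape)       # torch.Size([1, 16, 10])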

Dilation - Dilated Convolution:
Gradually increase the stride of the filter at each successive layer (with no reduction in sequence length).
The final output vector can be used to predict the next target output. This is very useful when the problem being modeled requires a fixed-size output (e.g. in auto-regressive models).
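
A minimal sketch (assuming PyTorch): the dilation doubles at each layer, so the receptive field grows exponentially with depth while “same”-style padding keeps the sequence length unchanged.

    import torch
    import torch.nn as nn

    channels, k = 16, 3
    layers = []
    for dilation in (1, 2, 4):          # gap between filter taps doubles each layer
        layers += [
            nn.Conv1d(channels, channels, k,
                      dilation=dilation, padding=dilation),  # keeps length intact for k = 3
            nn.ReLU(),
        ]
    net = nn.Sequential(*layers)

    x = torch.randn(1, channels, 10)    # (batch, emb_dim, seq_len)
    print(net(x).shape)                 # torch.Size([1, 16, 10])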

Structured Convolution:
