Table of Contents

• Convolutional Neural Networks for Language (CMU)
• Text Classification (Oxford)

Introduction

  1. Text Classification Breakdown:
    We can think of text classification as a two-stage process:
    1. Representation: Process the text into some (fixed-size) representation -> How to learn \(\mathbf{x}'\).
    2. Classification: Classify the document given that representation -> How to learn \(p(c \vert \mathbf{x}')\).
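    Written as a pipeline (a schematic sketch only; \(g\), \(f\), and the softmax output are illustrative placeholders, not notation from the lectures):

    \[\mathbf{x}' = g(\text{document}), \qquad p(c \vert \mathbf{x}') = \mathrm{softmax}\big(f(\mathbf{x}')\big)_c\]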
  2. Representation:
    Bag of Words (BOW):
    • Pros:
      • Easy, no effort
    • Cons:
      • Variable size, ignores sentential structure, sparse representations
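
    A minimal sketch of a BOW count-vector representation, assuming a toy vocabulary and whitespace tokenization (both illustrative):

    ```python
    from collections import Counter

    def bag_of_words(tokens, vocab):
        """Map a token list to a |V|-dimensional count vector.
        Unknown words are simply dropped (one common choice)."""
        counts = Counter(tokens)
        vec = [0] * len(vocab)
        for word, count in counts.items():
            if word in vocab:
                vec[vocab[word]] = count
        return vec

    vocab = {"i": 0, "do": 1, "not": 2, "hate": 3, "this": 4, "movie": 5}
    print(bag_of_words("i do not hate this movie".split(), vocab))
    # [1, 1, 1, 1, 1, 1] -- for a realistic vocabulary this is mostly zeros (sparse)
    ```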

    Continuous BOW:

    • Pros:
      • Continuous (dense) representations
    • Cons:
      • Ignores word ordering
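
    A continuous-BOW sketch: sum (or average) word embeddings into one dense, fixed-size vector. NumPy here, with random vectors standing in for learned embeddings (all sizes illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = {"i": 0, "do": 1, "not": 2, "hate": 3, "this": 4, "movie": 5}
    E = rng.normal(size=(len(vocab), 4))  # stand-in for a learned embedding table

    def cbow(tokens):
        """Dense and fixed-size, but order-free: a sum over word vectors."""
        ids = [vocab[t] for t in tokens if t in vocab]
        return E[ids].sum(axis=0)

    x1 = cbow("do not hate this movie".split())
    x2 = cbow("hate not do movie this".split())
    print(np.allclose(x1, x2))  # True: word order is lost
    ```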

    Deep CBOW:

    • Pros:
      • Can learn feature combinations (e.g. “not” AND “hate”)
    • Cons:
      • Cannot capture word order (positional info) directly (e.g. “not hate”)
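
    A Deep-CBOW sketch, assuming PyTorch as the framework (layer sizes are arbitrary): the nonlinear layers on top of the summed embeddings are what let the model pick up feature combinations like “not” AND “hate”:

    ```python
    import torch
    import torch.nn as nn

    class DeepCBOW(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden, n_classes):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.mlp = nn.Sequential(
                nn.Linear(emb_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
                nn.Linear(hidden, n_classes),
            )

        def forward(self, token_ids):           # token_ids: (seq_len,)
            x = self.emb(token_ids).sum(dim=0)  # order is still discarded here
            return self.mlp(x)                  # scores for p(c | x')

    model = DeepCBOW(vocab_size=6, emb_dim=4, hidden=8, n_classes=2)
    logits = model(torch.tensor([0, 1, 2, 3, 4, 5]))
    ```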

    Bag of n-grams:

    • Pros:
      • Captures (some) combination features and word-ordering (e.g. “not hate”), works well
    • Cons:
      • Parameter explosion; no parameter sharing between similar words/n-grams
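
    A bag-of-n-grams sketch (plain Python; n_max and tokenization are illustrative). “not hate” becomes its own feature, so some local word order survives; the flip side is that the feature space grows roughly as \(\vert V \vert^n\), and “not hate” shares nothing with “don't hate”:

    ```python
    from collections import Counter

    def ngram_features(tokens, n_max=2):
        """Count all 1- to n_max-grams as separate features."""
        feats = Counter()
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                feats[" ".join(tokens[i:i + n])] += 1
        return feats

    print(ngram_features("i do not hate this movie".split()))
    # Counter({'i': 1, ..., 'not hate': 1, 'hate this': 1, ...})
    ```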
  3. CNNs for Text:
    Two main paradigms:
    1. Context-window modeling: for tagging etc., compute a representation of each word's surrounding context window, then tag the word from it.
    2. Sentence modeling: convolve over the sentence to extract n-gram features, then pool to combine them over the whole sentence (sketched below).
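
    A sentence-modeling sketch, again assuming PyTorch (all sizes arbitrary): a 1-D convolution acts as a learned n-gram detector, and max-pooling over time collapses the variable-length sentence into one fixed-size vector for classification:

    ```python
    import torch
    import torch.nn as nn

    class SentenceCNN(nn.Module):
        def __init__(self, vocab_size, emb_dim, n_filters, width, n_classes):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=width)  # n-gram detector
            self.out = nn.Linear(n_filters, n_classes)

        def forward(self, token_ids):                # (batch, seq_len)
            x = self.emb(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
            h = torch.relu(self.conv(x))             # (batch, n_filters, seq_len - width + 1)
            pooled = h.max(dim=2).values             # pool over the whole sentence
            return self.out(pooled)                  # scores for p(c | x')

    model = SentenceCNN(vocab_size=100, emb_dim=16, n_filters=8, width=3, n_classes=2)
    logits = model(torch.randint(0, 100, (1, 10)))   # works for any seq_len >= width
    ```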