ML System Design
ML System Design (NG) - Summary:
Problem: Spam Classification
Task: Build a spam classifier. Possible ways to spend your time to lower the system's error:
- Collect lots of data
- Develop sophisticated features based on email routing information (from the email header)
- Develop sophisticated features for the message body (e.g., should “deal” and “Deals” be treated as the same word?)
- Develop a sophisticated algorithm to detect misspellings (e.g., “Med1cine”, “M0rtgage”)
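The message-body and misspelling ideas above can be made concrete with a binary bag-of-words feature vector. This is a minimal sketch under stated assumptions: the vocabulary `VOCAB`, the digit-to-letter table `LEET`, and the helpers `normalize` and `email_features` are all hypothetical, and the plural stripping is a crude stand-in for real stemming.

```python
import re

# Hypothetical vocabulary; in practice it might be the n most frequent
# words in the training emails.
VOCAB = ["buy", "deal", "discount", "mortgage", "medicine", "now", "andrew"]

# Assumed digit-for-letter substitutions used to obfuscate words ("M0rtgage").
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s"})


def normalize(token: str) -> str:
    """Lowercase a token, undo simple digit substitutions, and strip a
    trailing plural 's' (a crude stand-in for real stemming)."""
    token = token.lower().translate(LEET)
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]  # "deals" -> "deal"
    return token


def email_features(body: str) -> list[int]:
    """Binary bag-of-words vector: x_j = 1 if VOCAB[j] occurs in the body."""
    tokens = {normalize(t) for t in re.findall(r"[A-Za-z0-9]+", body)}
    return [1 if word in tokens else 0 for word in VOCAB]


print(email_features("Great Deals! Buy M0rtgage now"))  # [1, 1, 0, 1, 0, 1, 0]
```

Lowercasing and stripping the plural maps “Deals” onto “deal”, and the digit table maps “M0rtgage” onto “mortgage”, so both land on the same feature. The shortcuts have costs: “news” becomes “new”, and genuine numbers get rewritten as letters. A real system would likely use a proper stemmer and a learned vocabulary.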
Recommended Approach:
- Start with a simple algorithm that you can implement quickly, then test it on your cross-validation data.
- Plot learning curves to decide whether more data, more features, etc. are likely to help.
- Error analysis: Manually examine the examples (in the cross-validation set) that your algorithm made errors on. See whether you can spot any systematic trend in the types of examples it misclassifies.
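The recommended loop (quick model, learning curve, error analysis) can be sketched in plain Python. This is a minimal illustration under stated assumptions: `train_keyword_model` is a hypothetical toy classifier, the dataset is invented only to exercise the code, and `categorize` stands in for the manual labeling a human would do during error analysis.

```python
from collections import Counter
from typing import Callable, Sequence

Predict = Callable[[str], int]


def train_keyword_model(emails: Sequence[str], labels: Sequence[int]) -> Predict:
    """Toy 'quick and dirty' classifier (hypothetical): predict spam (1) if an
    email contains any word seen only in spam training emails, else 0."""
    spam_words, ham_words = set(), set()
    for text, label in zip(emails, labels):
        (spam_words if label == 1 else ham_words).update(text.lower().split())
    spam_only = spam_words - ham_words
    return lambda text: int(any(w in spam_only for w in text.lower().split()))


def error_rate(predict: Predict, X: Sequence[str], y: Sequence[int]) -> float:
    """Fraction of examples that predict gets wrong."""
    return sum(predict(x) != label for x, label in zip(X, y)) / len(y)


def learning_curve(train, X_tr, y_tr, X_cv, y_cv, sizes):
    """For each training-set size m, train on the first m examples and record
    (m, training error on those m examples, cross-validation error)."""
    curve = []
    for m in sizes:
        predict = train(X_tr[:m], y_tr[:m])
        curve.append((m, error_rate(predict, X_tr[:m], y_tr[:m]),
                      error_rate(predict, X_cv, y_cv)))
    return curve


def error_analysis(predict, X_cv, y_cv, categorize) -> list[tuple[str, int]]:
    """Group misclassified cross-validation examples by a hand-assigned
    category and return (category, count) pairs, most common first."""
    misses = Counter(categorize(x) for x, label in zip(X_cv, y_cv)
                     if predict(x) != label)
    return misses.most_common()


# Tiny made-up dataset, only to exercise the functions above.
X_tr = ["cheap meds now", "meeting at noon", "buy replica watches",
        "lunch at noon", "cheap loans now", "project meeting notes"]
y_tr = [1, 0, 1, 0, 1, 0]
X_cv = ["replica watches sale", "noon meeting", "cheap loans today",
        "password reset required", "verify your account details"]
y_cv = [1, 0, 1, 1, 1]


def categorize(text: str) -> str:
    """Hypothetical labeling rule standing in for a human reviewer."""
    return "phishing" if "password" in text or "account" in text else "other"


curve = learning_curve(train_keyword_model, X_tr, y_tr, X_cv, y_cv, [2, 4, 6])
print(curve)  # [(2, 0.0, 0.6), (4, 0.0, 0.4), (6, 0.0, 0.4)]

model = train_keyword_model(X_tr, y_tr)
print(error_analysis(model, X_cv, y_cv, categorize))  # [('phishing', 2)]
```

In this toy run the training error stays at 0 while the cross-validation error levels off at 0.4. Every remaining miss is a phishing-style email whose words never appear in the training spam. That is the kind of systematic trend error analysis is meant to surface, and it points to which features to work on next. On real data you would plot the two error columns against m rather than print them.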
Notes: