
Modern neural networks are miscalibrated: they tend to be overconfident. We cannot interpret their softmax probabilities as the true probability of correctness or as a reliable measure of confidence.

Miscalibration: the discrepancy between model confidence and model accuracy.
Intuitively, if a model predicts with \(80\%\) confidence on 100 images, then about \(80\) of them should be classified correctly and the other \(20\) incorrectly.
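This gap between confidence and accuracy can be quantified with the expected calibration error (ECE), a standard metric not defined in the note itself: bin predictions by confidence and take the weighted average of each bin's \(|\text{accuracy} - \text{confidence}|\) gap. A minimal NumPy sketch with toy data:

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=10):
    """Weighted average of |accuracy - confidence| over confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > lo) & (confidence <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Toy case matching the text: 10 predictions at 80% confidence,
# exactly 8 of them correct -> no confidence/accuracy gap.
conf = np.full(10, 0.8)
corr = np.array([1] * 8 + [0] * 2)
print(expected_calibration_error(conf, corr))  # ~0: well calibrated
```

An overconfident model (say, 90% confidence but only 70% accuracy) would instead produce an ECE of about 0.2.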

Model Confidence: the predicted probability of correctness.
Calibrated Confidence (softmax scores) \(\hat{p}\): a score \(\hat{p}\) that represents a true probability.


Probability Calibration:
Predicted scores (model outputs) of many classifiers do not represent “true” probabilities.
They only satisfy the mathematical definition (conditions) of a probability distribution:

  1. Each “probability” lies between 0 and 1.
  2. The probabilities of an observation belonging to each possible class sum to 1.
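These two conditions are exactly what a softmax guarantees, which is why softmax outputs look like probabilities without necessarily being calibrated ones. A quick sketch:

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability; the result is unchanged
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

scores = softmax(np.array([[5.0, 1.0, -2.0], [0.1, 0.2, 0.3]]))
# Condition 1: every entry lies in [0, 1].
# Condition 2: each row sums to 1 (up to float rounding).
print(scores)
print(scores.sum(axis=-1))
```

Both conditions hold for any input logits, regardless of whether the underlying model is calibrated.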

Note: The samples used to fit the calibrator should not be the same samples used to fit the classifier, as this would introduce bias: the classifier performs better on its training data than on novel data, so fitting the calibrator on training-data outputs would yield a biased calibrator that maps to probabilities closer to 0 and 1 than it should.
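To illustrate the held-out split, here is a minimal Platt-style sigmoid calibrator (a hypothetical, fully synthetic sketch, not the note's prescribed method) fitted on a calibration split that is separate from the data used to evaluate it:

```python
import numpy as np

# Hypothetical sketch: simulate an overconfident classifier whose reported
# score pushes logits toward the extremes by a factor of 3.
rng = np.random.default_rng(42)
n = 4000
p_true = rng.uniform(0.05, 0.95, size=n)
y = (rng.random(n) < p_true).astype(float)
score = 1.0 / (1.0 + np.exp(-3.0 * np.log(p_true / (1 - p_true))))

# Held-out split: first half fits the calibrator, second half evaluates it.
cal, test = slice(0, n // 2), slice(n // 2, n)
z = np.log(score / (1 - score))  # classifier logits

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Platt scaling: fit a, b in sigmoid(a * z + b) by gradient descent on log loss.
a, b = 1.0, 0.0
for _ in range(2000):
    residual = sigmoid(a * z[cal] + b) - y[cal]
    a -= 0.1 * np.mean(residual * z[cal])
    b -= 0.1 * np.mean(residual)

def log_loss(y_true, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

p_cal = sigmoid(a * z[test] + b)
print("raw log loss:       ", log_loss(y[test], score[test]))
print("calibrated log loss:", log_loss(y[test], p_cal))  # lower after calibration
```

Libraries such as scikit-learn enforce the same principle automatically by cross-validated splitting inside `CalibratedClassifierCV`.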