This is a very shallow post that I use as a reference from other posts, just to make sure we are on the same page regarding the understanding and notation around Supervised Learning.
How are models “trained”?
Imagine we are given a dataset of input-output pairs1:

$$\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$$
1 Here
We are asked (quite unsurprisingly) to find a decent mapping2 between them, $f_\theta(x_i) \approx y_i$, controlled by a set of parameters3 $\theta$.
2 Fancy way of saying “a sequence of operations”
3 The more the merrier 🤪
To find the best parameters according to the data, we define an error metric (loss function) that we want to optimize. If working on regression it could be the MSE; if working on classification, it could be the cross-entropy, or any other for that matter:

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i), y_i\big)$$
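As a quick illustration (a minimal sketch, not from the original post), both losses are available out of the box in PyTorch; the tensors below are hypothetical stand-ins for model outputs and targets:

```python
import torch
from torch import nn

# Regression: mean squared error between predictions and targets
mse = nn.MSELoss()
preds = torch.randn(4, 1)      # hypothetical model outputs
targets = torch.randn(4, 1)    # hypothetical ground-truth values
print(mse(preds, targets))     # scalar loss

# Classification: cross-entropy between raw logits and integer class labels
ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)             # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 0])    # hypothetical class labels
print(ce(logits, labels))              # scalar loss
```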
Now we need to find a global optimum of $\mathcal{L}(\theta)$, which in general cannot be done analytically4.
4 Unless very carefully choosing
We usually start with “random” parameters $\theta_0$ and iteratively refine them with gradient descent:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t)$$

Where $\eta$ is the learning rate (the step size of each update).

The gradient $\nabla_\theta \mathcal{L}(\theta_t)$ points in the direction of steepest increase of the loss, so stepping against it decreases the loss a little at every iteration.
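To make the update rule concrete, here is a minimal sketch (not from the original post) of gradient descent on a one-parameter least-squares problem, with the gradient computed by hand on made-up data:

```python
# Toy problem: fit y ≈ theta * x by minimizing the MSE (hypothetical data)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]   # roughly y = 2x

theta = 0.0   # "random" initial parameter
eta = 0.01    # learning rate

for step in range(1000):
    # d/d(theta) of (1/N) * sum (theta*x - y)^2  =  (1/N) * sum 2*(theta*x - y)*x
    grad = sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta -= eta * grad   # step against the gradient

print(theta)   # ≈ 2, the least-squares slope
```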
What I explained previously falls within the MLE paradigm in probabilistic modelling.
Before understanding the differences, it is important to keep in mind Bayes' theorem:

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$$

Naming goes: $p(\theta \mid \mathcal{D})$ is the posterior, $p(\mathcal{D} \mid \theta)$ the likelihood, $p(\theta)$ the prior, and $p(\mathcal{D})$ the evidence.
Differences:
- MLE (Maximum Likelihood Estimation) finds the parameters that best explain the data, the ones that give the highest likelihood: $\theta_{\text{MLE}} = \arg\max_\theta \, p(\mathcal{D} \mid \theta)$.
- MAP (Maximum a Posteriori) works within the Bayesian framework and, instead of maximizing the likelihood, maximizes the posterior: $\theta_{\text{MAP}} = \arg\max_\theta \, p(\theta \mid \mathcal{D}) = \arg\max_\theta \, p(\mathcal{D} \mid \theta)\, p(\theta)$. For this, we need to assume some prior distribution over the parameters, $p(\theta)$.
Notice that the probability of the observed data (the evidence) does not depend on the model parameters, so we do not need to consider it for the MAP computation.
In addition, MAP can be linked to some regularization techniques, depending on the prior chosen; a standard example is sketched below.
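As a concrete illustration of that link (an assumption-laden sketch, not part of the original post): take a Gaussian likelihood $y_i \sim \mathcal{N}\big(f_\theta(x_i), \sigma^2\big)$ and, for MAP, a Gaussian prior $\theta \sim \mathcal{N}(0, \tau^2 I)$. Taking negative logarithms and dropping constants gives:

$$\begin{aligned}
\theta_{\text{MLE}} &= \arg\min_\theta \sum_{i=1}^{N} \big(y_i - f_\theta(x_i)\big)^2 && \text{(plain MSE)} \\
\theta_{\text{MAP}} &= \arg\min_\theta \sum_{i=1}^{N} \big(y_i - f_\theta(x_i)\big)^2 + \frac{\sigma^2}{\tau^2}\,\lVert \theta \rVert_2^2 && \text{(MSE + L2 regularization, i.e. weight decay)}
\end{aligned}$$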
PyTorch implementation
The previous logic is abstracted by deep learning libraries like so:
```python
for inputs, labels in training_loader:
    # Zero your gradients for every batch
    optimizer.zero_grad()

    # Make predictions for this batch
    outputs = model(inputs)

    # Compute the loss
    loss = loss_fn(outputs, labels)

    # Compute each parameter's gradient w.r.t. the loss
    loss.backward()

    # Update the parameters using their gradients
    optimizer.step()
```
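For completeness, a minimal setup that the loop above assumes; the specific choices below (a small feed-forward model, MSE loss, plain SGD, random data) are hypothetical, only the variable names come from the loop:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples with 10 features each, scalar targets
X = torch.randn(100, 10)
y = torch.randn(100, 1)
training_loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# A small model, a regression loss, and plain SGD as the optimizer
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
```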