# Loss Functions

In this post, I will summarize some common loss functions in machine learning. Some of the definitions come from Professor Liang's lecture notes for CS221 at Stanford.

A cheatsheet of loss functions can be found here.

According to Professor Liang, a loss function Loss$$(x, y, \mathbf w)$$ quantifies how unhappy you would be if you used $$\mathbf w$$ to make a prediction on $$x$$ when the correct output is $$y$$. It is the objective we want to minimize.

One thing to keep in mind is that loss functions are a topic of their own, largely independent of the prediction model: the same loss can be paired with different models, and the same model can be trained under different losses.

## Zero-one loss

$\text{Loss}_{0\text{-}1}(x, y, \mathbf w) = \mathbf{1}[f_{\mathbf w}(x) \ne y]$

Here $$\mathbf{1}[\cdot]$$ is the indicator function: the loss is 1 when the prediction $$f_{\mathbf w}(x)$$ disagrees with the label $$y$$, and 0 otherwise.
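As a minimal sketch, here is how this might look in NumPy, assuming a binary classifier $$f_{\mathbf w}(x) = \text{sign}(\mathbf w \cdot \phi(x))$$ with labels in $$\{+1, -1\}$$ (the predictor form is an assumption; the loss itself only needs a prediction and a label):

```python
import numpy as np

def zero_one_loss(w, phi_x, y):
    """Zero-one loss: 1 if the prediction disagrees with the label, else 0.

    Assumes a binary classifier f_w(x) = sign(w . phi(x)) with y in {+1, -1}.
    """
    prediction = np.sign(np.dot(w, phi_x))
    return float(prediction != y)
```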

## Squared loss (L2 loss)

First we define the residual. The residual is $$(\mathbf w \cdot \phi(x)) - y$$, the amount by which the prediction $$f_{\mathbf w}(x) = \mathbf w \cdot \phi(x)$$ overshoots the target $$y$$. Then we have

$\text{Loss}_{\text{squared}}(x, y, \mathbf w) = (f_{\mathbf w}(x) - y)^2$

Squared loss is used in Linear Regression.
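As a small sketch using the linear predictor $$f_{\mathbf w}(x) = \mathbf w \cdot \phi(x)$$ defined above (the function name is just illustrative):

```python
import numpy as np

def squared_loss(w, phi_x, y):
    """Squared (L2) loss: the residual w . phi(x) - y, squared."""
    residual = np.dot(w, phi_x) - y
    return residual ** 2
```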

## Absolute deviation loss (L1 loss)

$\text{Loss}_{\text{absdev}}(x, y, \mathbf w) = |f_{\mathbf w}(x) - y|$
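The same setup gives a one-line sketch of the L1 loss:

```python
import numpy as np

def absolute_deviation_loss(w, phi_x, y):
    """Absolute deviation (L1) loss: |w . phi(x) - y|."""
    return abs(np.dot(w, phi_x) - y)
```

Because L1 loss penalizes large residuals linearly rather than quadratically, it is less sensitive to outliers than squared loss.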

## Cross entropy loss

$\text{Loss}_{\text{cross-entropy}}(x, y, \mathbf w) = -\left[ y\log f_{\mathbf w}(x) + (1-y) \log (1-f_{\mathbf w}(x)) \right]$

Here $$y \in \{0, 1\}$$ and $$f_{\mathbf w}(x) \in (0, 1)$$ is a predicted probability. Note the leading minus sign: this is the negative log-likelihood, so minimizing the loss maximizes the likelihood.

Cross entropy is used in Logistic Regression.
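A minimal sketch, assuming a logistic-regression-style predictor $$f_{\mathbf w}(x) = \sigma(\mathbf w \cdot \phi(x))$$ that outputs a probability (the sigmoid form and the `eps` clipping are assumptions, added for numerical safety):

```python
import numpy as np

def cross_entropy_loss(w, phi_x, y, eps=1e-12):
    """Binary cross-entropy for a label y in {0, 1}.

    Assumes f_w(x) = sigmoid(w . phi(x)); eps keeps log() away from 0.
    """
    p = 1.0 / (1.0 + np.exp(-np.dot(w, phi_x)))  # sigmoid
    p = np.clip(p, eps, 1.0 - eps)               # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```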

## Hinge loss

$$\text{Loss}_{\text{hinge}}(x,y,\mathbf w) = \max(0, 1 - y\,f_{\mathbf w}(x)) = \begin{cases} \max(0, 1-f_{\mathbf w}(x)) & \text{if } y=+1\\ \max(0, 1+f_{\mathbf w}(x)) & \text{if } y=-1\end{cases}$$

Here the labels are $$y \in \{+1, -1\}$$ (not $$\{0, 1\}$$).

Hinge loss is used in SVMs.
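A minimal sketch, using the compact form $$\max(0, 1 - y\,f_{\mathbf w}(x))$$ with the linear score $$f_{\mathbf w}(x) = \mathbf w \cdot \phi(x)$$ (the function name is illustrative):

```python
import numpy as np

def hinge_loss(w, phi_x, y):
    """Hinge loss max(0, 1 - y * f_w(x)) for labels y in {+1, -1}."""
    return max(0.0, 1.0 - y * np.dot(w, phi_x))
```

Note that the loss reaches zero only once an example is classified correctly with a margin of at least 1; this margin requirement is what drives the max-margin behavior of SVMs.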