In a probabilistic model, given observed data $x$ and model parameters $\theta$, the likelihood function measures how likely the observed data is under those parameters:
\[L(\theta) = P(x | \theta)\]

Since likelihoods are often small numbers (fractions close to zero), working with them directly can cause numerical instability. Instead, we take the logarithm of the likelihood:
\[\log L(\theta) = \log P(x | \theta)\]

Most machine learning models minimize loss functions instead of maximizing likelihood. To turn the maximization problem into a minimization problem, we take the negative of the log-likelihood:
\[\text{NLL}(\theta) = - \log P(x | \theta)\]

For a dataset of $n$ independent binary observations, the negative log-likelihood becomes:

\[\text{NLL} = - \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]\]
where:

- $y_i$ is the true label (0 or 1)
- $p_i$ is the predicted probability that $y_i = 1$
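As a rough illustration, here is a minimal NumPy sketch of this sum (the labels and probabilities below are made-up example values, not from the text):

```python
import numpy as np

def binary_nll(y, p, eps=1e-12):
    """Negative log-likelihood of binary labels y under predicted probabilities p."""
    p = np.clip(p, eps, 1 - eps)        # avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical labels and predicted probabilities
y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_nll(y, p))                 # ≈ 1.20
```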
For $n$ observations $x_i$ drawn from a Gaussian model with mean $\mu$ and standard deviation $\sigma$, the negative log-likelihood is:

\[\text{NLL} = \sum_{i=1}^{n} \left[ \frac{(x_i - \mu)^2}{2\sigma^2} + \log \sigma + \frac{1}{2} \log(2\pi) \right]\]
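A corresponding sketch for the Gaussian case, again with made-up observations and parameters:

```python
import numpy as np

def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of observations x under a Normal(mu, sigma) model."""
    return np.sum((x - mu) ** 2 / (2 * sigma ** 2)
                  + np.log(sigma)
                  + 0.5 * np.log(2 * np.pi))

# Hypothetical data and parameters
x = np.array([1.2, 0.8, 1.5, 0.9])
print(gaussian_nll(x, mu=1.0, sigma=0.5))
```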
Cross-entropy measures how well a predicted distribution $q$ matches the true distribution $p$:

\[H(p, q) = - \sum_{i=1}^{C} p_i \log q_i\]
where:

- $C$ is the number of classes
- $p_i$ is the true probability of class $i$
- $q_i$ is the predicted probability of class $i$
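A small sketch of this definition for two discrete distributions (the distributions below are illustrative):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) for a true distribution p and a predicted distribution q."""
    q = np.clip(q, eps, 1.0)            # avoid log(0)
    return -np.sum(p * np.log(q))

p = np.array([0.0, 1.0, 0.0])           # true distribution (one-hot here)
q = np.array([0.1, 0.7, 0.2])           # predicted distribution
print(cross_entropy(p, q))              # -log(0.7) ≈ 0.357
```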
For binary classification, the cross-entropy loss for a single prediction is:

\[L(y, \hat{y}) = - \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]\]
where:

- $y$ is the true label (0 or 1)
- $\hat{y}$ is the predicted probability that $y = 1$
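Each example contributes one such term; as a quick sanity check, assuming scikit-learn is available, the hand-computed average should match `sklearn.metrics.log_loss` (the labels and probabilities are made-up):

```python
import numpy as np
from sklearn.metrics import log_loss

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for a single example."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical labels and predicted probabilities
y_true = np.array([1, 0, 1])
y_prob = np.array([0.8, 0.1, 0.6])

print(np.mean([bce(y, p) for y, p in zip(y_true, y_prob)]))  # ≈ 0.28
print(log_loss(y_true, y_prob))                              # should agree
```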
For multi-class classification with softmax outputs:
\[L(y, \hat{y}) = - \sum_{i=1}^{C} y_i \log(\hat{y}_i)\]

Since only one $y_i = 1$ (one-hot encoding), this simplifies to:
\[L(y, \hat{y}) = - \log(\hat{y}_{\text{correct}})\]

Cross-entropy is equivalent to the negative log-likelihood (NLL) when using softmax probabilities:
\[\text{Cross-Entropy Loss} = \text{Negative Log-Likelihood}\]
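To make the equivalence concrete, here is a sketch that computes the loss for one example directly from logits (the logits and target index are made-up), using the log-sum-exp trick for numerical stability:

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    shifted = logits - np.max(logits)                     # log-sum-exp trick
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]                             # NLL of the correct class

logits = np.array([2.0, 0.5, -1.0])                       # hypothetical model outputs
print(softmax_cross_entropy(logits, target=0))            # ≈ 0.24
```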
Perplexity is the exponentiation of the average negative log-likelihood of a sequence. For a given sequence of words $w_1, w_2, \ldots, w_N$, the perplexity is calculated as:

\[\text{PPL} = 2^{-\frac{1}{N} \sum_{i=1}^N \log_2 P(w_i \mid w_1, w_2, \ldots, w_{i-1})}\]
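A minimal sketch of this formula, assuming the per-token conditional probabilities $P(w_i \mid w_1, \ldots, w_{i-1})$ have already been obtained from some language model (the values below are illustrative):

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity from per-token conditional probabilities P(w_i | w_1..w_{i-1})."""
    avg_nll = -np.mean(np.log2(token_probs))   # average negative log2-likelihood
    return 2.0 ** avg_nll

# Hypothetical conditional probabilities for a 4-token sequence
probs = np.array([0.2, 0.1, 0.4, 0.25])
print(perplexity(probs))                       # ≈ 4.73
```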