Relative entropy (KL divergence)

A directed measure of discrepancy between two probability distributions, defined as the expectation (under the first distribution) of the log-likelihood ratio.

A relative entropy (Kullback–Leibler divergence) is an extended real number $D_{\mathrm{KL}}(P\|Q)$ associated to two probability measures $P$ and $Q$ on the same measurable space, defined (when $P$ is absolutely continuous with respect to $Q$) by

$$ D_{\mathrm{KL}}(P\|Q) \;=\; \int \log\!\Big(\frac{dP}{dQ}\Big)\,dP, $$

where $\frac{dP}{dQ}$ is the Radon–Nikodym derivative of $P$ with respect to $Q$ (see the Radon–Nikodym theorem). If $P$ is not absolutely continuous with respect to $Q$, one sets $D_{\mathrm{KL}}(P\|Q)=+\infty$.
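
When both measures have densities with respect to a common reference measure (for example Lebesgue measure on the real line), the integral above reduces to $\int p(x)\log\frac{p(x)}{q(x)}\,dx$. The following is a minimal numerical sketch, assuming two exponential distributions chosen purely for illustration:

```python
# Minimal sketch (illustrative, not from the text): approximate D_KL(P || Q)
# when P and Q have densities p and q, by numerically integrating
# p(x) * log(p(x) / q(x)) over the support of P.
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

p = expon(scale=1 / 2.0)   # P = Exponential with rate 2  (scale = 1/rate)
q = expon(scale=1 / 1.0)   # Q = Exponential with rate 1

def integrand(x):
    px, qx = p.pdf(x), q.pdf(x)
    return px * np.log(px / qx) if px > 0 else 0.0

kl, _ = quad(integrand, 0, np.inf)
print(kl)  # close to log(2) + 1/2 - 1 ≈ 0.1931 (closed form for exponentials)
```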

In the discrete case with mass functions $p, q$ on a countable set, this becomes

$$ D_{\mathrm{KL}}(P\|Q)=\sum_x p(x)\,\log\frac{p(x)}{q(x)}, $$

with the convention that terms with $p(x)=0$ contribute $0$, and any $x$ with $p(x)>0$ and $q(x)=0$ forces $D_{\mathrm{KL}}(P\|Q)=+\infty$. Relative entropy is always nonnegative (Gibbs' inequality, a consequence of Jensen's inequality), equals $0$ iff $P=Q$ (in the appropriate sense), and is not symmetric in general. It is related to other discrepancy notions such as the total variation distance (for example via Pinsker's inequality).
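
As a minimal sketch of the discrete formula and its conventions (the function and the example distributions below are illustrative, not from any particular library):

```python
import math

def kl_divergence(p, q):
    """Discrete D_KL(P || Q) for mass functions given as dicts mapping
    outcomes to probabilities. Terms with p(x) = 0 contribute 0; any outcome
    with p(x) > 0 and q(x) = 0 makes the divergence +infinity."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue                # convention: 0 * log(0 / q) = 0
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf         # P not absolutely continuous w.r.t. Q
        total += px * math.log(px / qx)
    return total

# Example: two distributions on {"a", "b", "c"}
P = {"a": 0.5, "b": 0.5, "c": 0.0}
Q = {"a": 0.25, "b": 0.5, "c": 0.25}
print(kl_divergence(P, Q))  # 0.5 * log(2) ≈ 0.3466
print(kl_divergence(Q, P))  # +inf, since Q("c") > 0 but P("c") = 0 (asymmetry)
```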

Examples:

  • If $P=\mathrm{Bernoulli}(p)$ and $Q=\mathrm{Bernoulli}(q)$ with $p,q\in(0,1)$, then $D_{\mathrm{KL}}(P\|Q)=p\log\frac{p}{q}+(1-p)\log\frac{1-p}{1-q}$.
  • If $P=\mathcal{N}(\mu_1,\sigma^2)$ and $Q=\mathcal{N}(\mu_2,\sigma^2)$ with the same variance $\sigma^2>0$, then $D_{\mathrm{KL}}(P\|Q)=\frac{(\mu_1-\mu_2)^2}{2\sigma^2}$. Both closed forms are evaluated numerically in the sketch below.
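
A quick numerical sanity check of the two closed forms above (a sketch; the helper names and parameter values are illustrative assumptions):

```python
import math

def kl_bernoulli(p, q):
    # Closed form for Bernoulli(p) vs Bernoulli(q), with p, q in (0, 1)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_gaussian_same_var(mu1, mu2, sigma):
    # Closed form for N(mu1, sigma^2) vs N(mu2, sigma^2), same variance
    return (mu1 - mu2) ** 2 / (2 * sigma ** 2)

print(kl_bernoulli(0.3, 0.5))             # ≈ 0.0823
print(kl_bernoulli(0.5, 0.3))             # ≈ 0.0872 (not symmetric)
print(kl_gaussian_same_var(0.0, 1.0, 1))  # 0.5
```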