Gibbs' inequality (lemma)

Relative entropy (KL divergence) is nonnegative: D(P‖Q) ≥ 0, with equality iff P = Q (a.s.).
Statement

Let $P$ and $Q$ be probability measures on the same measurable space $(\Omega,\mathcal F)$ with $P \ll Q$ (absolute continuity). Let $f=\frac{dP}{dQ}$ be the Radon–Nikodym derivative. Then the relative entropy satisfies

$$D(P\|Q) \;=\; \int_\Omega \log\!\Big(\frac{dP}{dQ}\Big)\,dP \;\ge\; 0.$$

Moreover, $D(P\|Q)=0$ if and only if $P=Q$ (equivalently, $f=1$ $Q$-a.s.).

In the discrete case, for probability vectors $(p_i)_i$ and $(q_i)_i$ with $p_i>0 \Rightarrow q_i>0$,

$$\sum_i p_i \log\frac{p_i}{q_i} \;\ge\; 0,$$

with equality iff $p_i=q_i$ for all $i$.
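The discrete statement is easy to check numerically. Here is a minimal sketch (the helper name `kl_divergence` is illustrative, not from the text):

```python
import math

def kl_divergence(p, q):
    """D(P||Q) = sum_i p_i * log(p_i / q_i).

    Requires q_i > 0 wherever p_i > 0 (absolute continuity);
    terms with p_i = 0 contribute 0, by the convention 0 log 0 = 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

print(kl_divergence(p, q) >= 0)   # True: Gibbs' inequality
print(kl_divergence(p, p) == 0.0) # True: equality when P = Q
```

Note that the sum is restricted to indices with $p_i>0$, matching the hypothesis $p_i>0 \Rightarrow q_i>0$ above.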

Key hypotheses and conclusions

Hypotheses

  • $P$ and $Q$ are probability measures on the same space.
  • Absolute continuity $P\ll Q$ (so $\frac{dP}{dQ}$ exists).
  • Integrability of $\log\!\big(\frac{dP}{dQ}\big)$ under $P$ (automatic in many finite/discrete settings).

Conclusions

  • Nonnegativity: $D(P\|Q)\ge 0$.
  • Rigidity of equality: $D(P\|Q)=0$ iff the two measures coincide.

Proof idea / significance

A common proof is a one-line application of Jensen's inequality to the convex function $x\mapsto x\log x$ (or to $x\mapsto -\log x$ after normalization). Another elementary route uses the pointwise inequality $\log x \le x-1$, applied to $x=\frac{dQ}{dP}$ on the set where $P$ has mass.
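In the discrete case, the second route can be written out in one display: since $\log x \le x-1$ for all $x>0$,

```latex
\sum_i p_i \log\frac{p_i}{q_i}
  = -\sum_{i:\,p_i>0} p_i \log\frac{q_i}{p_i}
  \;\ge\; -\sum_{i:\,p_i>0} p_i\Big(\frac{q_i}{p_i}-1\Big)
  = \sum_i p_i \;-\! \sum_{i:\,p_i>0} q_i
  \;\ge\; 1 - 1 = 0.
```

Equality in $\log x \le x-1$ holds only at $x=1$, which forces $q_i=p_i$ wherever $p_i>0$, recovering the rigidity statement.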

In statistical mechanics, Gibbs’ inequality underlies “maximum entropy” and free-energy variational principles; it is a measure-theoretic form of the statement that the Gibbs distribution minimizes free energy (or equivalently maximizes entropy subject to constraints).
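As a hedged numerical illustration of the variational principle (the function name `free_energy` and the sample energies are made up for this sketch): for energies $E_i$ and temperature $T$, the identity $F[p]-F[p^{\mathrm{Gibbs}}]=T\,D(p\|p^{\mathrm{Gibbs}})$ together with Gibbs' inequality shows the Gibbs distribution minimizes the free energy.

```python
import math

def free_energy(p, energies, T=1.0):
    """F[p] = <E>_p - T * H(p), with H the Shannon entropy (natural log)."""
    avg_E = sum(pi * Ei for pi, Ei in zip(p, energies))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_E - T * entropy

energies = [0.0, 1.0, 2.0]  # arbitrary sample energy levels
T = 1.0

# Gibbs distribution: p_i proportional to exp(-E_i / T).
weights = [math.exp(-E / T) for E in energies]
Z = sum(weights)
gibbs = [w / Z for w in weights]

# Any other distribution has larger free energy, since
# F[p] - F[gibbs] = T * D(p || gibbs) >= 0 by Gibbs' inequality.
other = [0.5, 0.3, 0.2]
print(free_energy(gibbs, energies, T) <= free_energy(other, energies, T))  # True
```

The gap between the two free energies is exactly $T$ times the relative entropy of `other` with respect to `gibbs`, which is the sense in which the lemma "underlies" the variational principle.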