Gibbs' inequality (lemma)
Definitions and notation
- Probability measures $P$, $Q$; expectation $\mathbb{E}_P[\cdot]$.
- Relative entropy (KL divergence) $D(P \,\|\, Q) = \int \log \frac{dP}{dQ} \, dP$.
- Shannon entropy $H(p) = -\sum_i p_i \log p_i$ (for the cross-entropy interpretation).
Statement
Let $P$ and $Q$ be probability measures on the same measurable space with $P \ll Q$ (absolute continuity). Let $f = \frac{dP}{dQ}$ be the Radon–Nikodym derivative. Then the relative entropy satisfies
$$D(P \,\|\, Q) = \int \log f \, dP \;\ge\; 0.$$
Moreover, $D(P \,\|\, Q) = 0$ if and only if $P = Q$ (equivalently, $f = 1$ $Q$-a.s.).
In the discrete case, for probability vectors $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ with $q_i > 0$ whenever $p_i > 0$,
$$-\sum_{i=1}^{n} p_i \log p_i \;\le\; -\sum_{i=1}^{n} p_i \log q_i,$$
with equality iff $p_i = q_i$ for all $i$.
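The discrete statement is easy to check numerically. The following is a minimal sketch (the function name `kl_divergence` and the example vectors are illustrative, not from the source):

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q).

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0
    contribute nothing (the convention 0 * log 0 = 0).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Nonnegativity (Gibbs' inequality): D(p || q) >= 0 ...
assert kl_divergence(p, q) >= 0
# ... with equality exactly when the two distributions coincide.
assert abs(kl_divergence(p, p)) < 1e-12
```

Equivalently, $D(p\|q) = \left(-\sum_i p_i \log q_i\right) - \left(-\sum_i p_i \log p_i\right)$ is the gap between cross-entropy and entropy, so its nonnegativity is exactly the displayed inequality.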
Key hypotheses and conclusions
Hypotheses
- $P$ and $Q$ are probability measures on the same space.
- Absolute continuity $P \ll Q$ (so $\frac{dP}{dQ}$ exists).
- Integrability of $\log \frac{dP}{dQ}$ under $P$ (automatic in many finite/discrete settings).
Conclusions
- Nonnegativity: $D(P \,\|\, Q) \ge 0$.
- Rigidity of equality: $D(P \,\|\, Q) = 0$ iff $P = Q$.
Proof idea / significance
A common proof is a one-line application of Jensen's inequality to the convex function $x \mapsto x \log x$ (or to $-\log$ after normalization). Another elementary route uses the pointwise inequality $\log x \le x - 1$, applied to $x = q_i / p_i$ on the set where $p$ has mass.
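Written out in the discrete case, the elementary route takes only a few lines (sums run over indices with $p_i > 0$):
$$\sum_i p_i \log \frac{q_i}{p_i} \;\le\; \sum_i p_i \left( \frac{q_i}{p_i} - 1 \right) \;=\; \sum_i q_i - \sum_i p_i \;\le\; 1 - 1 \;=\; 0,$$
which rearranges to $-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i$. Since $\log x = x - 1$ only at $x = 1$, equality forces $q_i = p_i$ wherever $p_i > 0$, and then $\sum_i q_i = 1$ forces $q_i = 0 = p_i$ elsewhere.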
In statistical mechanics, Gibbs’ inequality underlies “maximum entropy” and free-energy variational principles; it is a measure-theoretic form of the statement that the Gibbs distribution minimizes free energy (or equivalently maximizes entropy subject to constraints).
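One way to make the free-energy statement precise (a standard identity, sketched here with notation not in the source): for energies $E(x)$, inverse temperature $\beta = 1/(k_B T)$, and the Gibbs distribution $\rho_\beta(x) = e^{-\beta E(x)}/Z$ with $Z = \sum_x e^{-\beta E(x)}$, define $F(\rho) = \langle E \rangle_\rho - T S(\rho)$ with $S(\rho) = -k_B \sum_x \rho(x) \log \rho(x)$. Expanding the relative entropy gives
$$D(\rho \,\|\, \rho_\beta) \;=\; \sum_x \rho(x) \log \rho(x) + \beta \sum_x \rho(x) E(x) + \log Z \;=\; \beta \bigl( F(\rho) - F(\rho_\beta) \bigr),$$
where $F(\rho_\beta) = -k_B T \log Z$. Gibbs' inequality then says $F(\rho) \ge F(\rho_\beta)$ for every distribution $\rho$, with equality iff $\rho = \rho_\beta$: the Gibbs distribution is the unique free-energy minimizer.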