Jensen's inequality (lemma)

For a convex function $\varphi$, one has $\varphi(\mathbb E[X]) \le \mathbb E[\varphi(X)]$ whenever the expectations exist.

Definitions and notation

Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space, let $X:\Omega\to I$ be a random variable taking values in a convex set $I\subseteq \mathbb R^d$, and let $\varphi:I\to\mathbb R$ be a convex function.

Statement

Assume $X$ is integrable and $\varphi(X)$ is integrable, i.e.

  • $\mathbb E[\|X\|] < \infty$ and
  • $\mathbb E[|\varphi(X)|] < \infty$.

Then

$$\varphi(\mathbb E[X]) \le \mathbb E[\varphi(X)].$$

If $\varphi$ is concave, the inequality is reversed:

$$\varphi(\mathbb E[X]) \ge \mathbb E[\varphi(X)].$$
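
A minimal numerical sanity check, estimating both sides by Monte Carlo. The sketch below assumes NumPy; the Exp(1) distribution and the test functions $t^2$ and $\log t$ are arbitrary illustrative choices, not part of the lemma:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # illustrative choice of X (X > 0 a.s.)

# Convex case, phi(t) = t^2: phi(E[X]) should not exceed E[phi(X)].
print(x.mean() ** 2, "<=", (x ** 2).mean())      # ~1.0 <= ~2.0 for Exp(1)

# Concave case, phi(t) = log t: the inequality reverses.
print(np.log(x.mean()), ">=", np.log(x).mean())  # ~0.0 >= ~-0.577 for Exp(1)
```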

Key hypotheses and conclusions

Hypotheses

  • $X$ is a random variable with values in a convex set $I$.
  • $\varphi$ is convex on $I$.
  • The expectations $\mathbb E[X]$ and $\mathbb E[\varphi(X)]$ exist (finite).

Conclusions

  • Convexity lets the function be pulled outside the expectation at the cost of an inequality: $\varphi(\mathbb E[X]) \le \mathbb E[\varphi(X)]$.
  • If $\varphi$ is strictly convex, equality forces $X$ to be almost surely constant; see the worked instance below.
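
A classical one-dimensional instance, for real-valued $X$ with $\mathbb E[X^2]<\infty$: taking the strictly convex $\varphi(x)=x^2$, the inequality reads

$$\mathbb E[X]^2 \le \mathbb E[X^2], \qquad\text{i.e.}\qquad \operatorname{Var}(X) = \mathbb E[X^2] - \mathbb E[X]^2 \ge 0,$$

with equality exactly when $X$ is almost surely constant, matching the strict-convexity criterion above.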

Proof idea / significance

A standard proof uses supporting hyperplanes: convexity implies that at the point $\mathbb E[X]$ there exists a subgradient $g$ such that $\varphi(y) \ge \varphi(\mathbb E[X]) + g\cdot (y-\mathbb E[X])$ for all $y\in I$. Substitute $y=X(\omega)$ and take expectations; the linear term vanishes because $\mathbb E[X-\mathbb E[X]]=0$.
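
Written out, the argument is two lines: the supporting-hyperplane bound holds pointwise, and taking expectations kills the linear term:

$$\varphi(X) \ge \varphi(\mathbb E[X]) + g\cdot\bigl(X-\mathbb E[X]\bigr) \quad \text{a.s.,}$$

$$\mathbb E[\varphi(X)] \ge \varphi(\mathbb E[X]) + g\cdot\mathbb E\bigl[X-\mathbb E[X]\bigr] = \varphi(\mathbb E[X]).$$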

In statistical mechanics, Jensen’s inequality is a basic engine behind entropy and variational bounds, including exponential-moment bounds.
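
For example, applying the inequality to the convex function $x\mapsto e^{x}$ (the integrand $f$ here is generic, not tied to any particular model) yields the exponential-moment bound

$$\mathbb E\bigl[e^{f}\bigr] \ge e^{\mathbb E[f]}, \qquad\text{equivalently}\qquad \log\mathbb E\bigl[e^{f}\bigr] \ge \mathbb E[f],$$

which is the prototype of Gibbs-type variational estimates for free energies.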