Maximum entropy with constraints (Gibbs/exponential family)

Maximizing Shannon entropy under expectation constraints yields a Gibbs (exponential-family) distribution, with Lagrange multipliers fixed by the constraints.

Statement (maximum entropy principle)

Let $(X,\mathcal F,\mu)$ be a measurable space with a reference measure $\mu$, and let $f_1,\dots,f_k:X\to\mathbb R$ be measurable “constraint functions.” Fix target values $c_1,\dots,c_k\in\mathbb R$ and consider probability measures $P$ that are absolutely continuous w.r.t. $\mu$, with density $p=\frac{dP}{d\mu}$ satisfying

  • normalization: $\int p\,d\mu = 1$,
  • moment constraints: $\int f_i\,p\,d\mu = c_i$ for $i=1,\dots,k$.

Assume there exists at least one feasible $p$ with finite Shannon entropy $S(p)=-\int p\log p\,d\mu$.

If the entropy maximization problem has an interior maximizer (e.g. under standard regularity/feasibility conditions), then any maximizer has the Gibbs/exponential-family form

$$p^*(x)=\exp\!\Big(-\alpha-\sum_{i=1}^k \lambda_i f_i(x)\Big),$$

where $\alpha\in\mathbb R$ and the multipliers $\lambda_1,\dots,\lambda_k\in\mathbb R$ are chosen so that the constraints hold. Equivalently, writing $\alpha=\log Z(\lambda)$ for the normalization constant,

$$p^*(x)=\frac{\exp\!\big(-\sum_{i=1}^k \lambda_i f_i(x)\big)}{Z(\lambda)},\qquad Z(\lambda)=\int \exp\!\Big(-\sum_{i=1}^k \lambda_i f_i(x)\Big)\,d\mu.$$
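As an illustration on a finite set, the following Python sketch takes $X=\{1,\dots,6\}$ with counting measure as $\mu$, a single constraint function $f(x)=x$, and target mean $c=4.5$. These choices, the function names, and the bisection bracket are assumptions made for the example, not part of the statement above. Bisection is used because the Gibbs mean is strictly decreasing in the multiplier (its derivative is minus the variance of $f$).

```python
import math

def gibbs_density(lam, f_values):
    """p*(x) = exp(-lam * f(x)) / Z(lam) on a finite set with counting measure."""
    # Shift the exponent so the largest weight is exp(0); this avoids overflow
    # and cancels between numerator and Z.
    shift = min(lam * f for f in f_values)
    weights = [math.exp(-(lam * f) + shift) for f in f_values]
    Z = sum(weights)
    return [w / Z for w in weights]

def mean_under_gibbs(lam, f_values):
    p = gibbs_density(lam, f_values)
    return sum(pi * f for pi, f in zip(p, f_values))

def solve_multiplier(f_values, target, lo=-50.0, hi=50.0, tol=1e-12):
    """Find lam with E[f] = target by bisection (assumed bracket [lo, hi])."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_under_gibbs(mid, f_values) > target:
            lo = mid  # mean too large, so the multiplier must increase
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

faces = [1, 2, 3, 4, 5, 6]   # assumed example: a six-sided die
lam = solve_multiplier(faces, 4.5)
p_star = gibbs_density(lam, faces)

print("multiplier:", lam)
print("density:", p_star)
print("mean:", sum(p * x for p, x in zip(p_star, faces)))
```

Because the target mean 4.5 exceeds the uniform mean 3.5, the multiplier comes out negative in the sign convention $e^{-\lambda f}$, and the density increases with the face value.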

Key hypotheses

  • A feasible set exists: there is at least one density $p\ge 0$ with $\int p\,d\mu=1$ and $\int f_i p\,d\mu=c_i$.
  • Finite entropy is attainable: $S(p)>-\infty$ for some feasible $p$.
  • Regularity ensuring the optimizer is not on the boundary (so Lagrange multiplier calculus applies) and that $Z(\lambda)$ is finite at the solution.

Conclusions

  • Form of the maximizer: the maximum-entropy density is an exponential tilt of the reference $\mu$.
  • Uniqueness (typical): $S(p)$ is strictly concave on densities and the feasible set, being cut out by linear constraints, is convex, so a maximizer, when it exists, is unique up to $\mu$-null sets.
  • Thermodynamic interpretation: with a single constraint $f_1=H$ (energy), the maximizer is the Gibbs (canonical) distribution $p^*=e^{-\lambda_1 H}/Z(\lambda_1)$ with partition function $Z(\lambda_1)$; the multiplier $\lambda_1$ is the inverse temperature.
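The maximizing property can be checked numerically in the single-constraint (energy) case. The sketch below assumes four hypothetical energy levels and a hypothetical inverse temperature `beta`, builds the corresponding Gibbs distribution, and compares its entropy with nearby distributions that have the same normalization and the same mean energy. The perturbation directions are chosen by hand so that they preserve both constraints.

```python
import math

energies = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels H(x)
beta = 0.7                        # hypothetical inverse temperature

def entropy(p):
    """Shannon entropy with respect to counting measure."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def mean_energy(p):
    return sum(pi * e for pi, e in zip(p, energies))

# Gibbs distribution p*(x) = exp(-beta * H(x)) / Z(beta).
weights = [math.exp(-beta * e) for e in energies]
Z = sum(weights)
p_star = [w / Z for w in weights]
target = mean_energy(p_star)  # the constraint value c that p* satisfies

# Each direction d has sum(d) = 0 and sum(d * H) = 0, so p* + eps * d
# keeps both the normalization and the mean-energy constraint.
directions = [[1.0, -2.0, 1.0, 0.0], [0.0, 1.0, -2.0, 1.0]]

entropy_gaps = []
for d in directions:
    for eps in (-0.02, -0.01, 0.01, 0.02):
        q = [pi + eps * di for pi, di in zip(p_star, d)]
        assert min(q) > 0  # q is still a valid density
        assert abs(sum(q) - 1.0) < 1e-12
        assert abs(mean_energy(q) - target) < 1e-12
        entropy_gaps.append(entropy(p_star) - entropy(q))

print("S(p*) - S(q) for each feasible perturbation:", entropy_gaps)
```

Every gap is strictly positive: for a feasible $q$ the difference $S(p^*)-S(q)$ equals the KL divergence of $q$ from $p^*$, which vanishes only at $q=p^*$.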

Proof idea / significance

Use Lagrange multipliers for the constrained optimization of the strictly concave functional $S(p)$ over an affine slice of densities. Stationarity of

$$S(p) - \alpha\Big(\int p\,d\mu-1\Big)-\sum_i \lambda_i\Big(\int f_i p\,d\mu-c_i\Big)$$

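forces $\log p(x)$ to be an affine function of the constraint functions: varying $p$ pointwise gives $-\log p(x)-1-\alpha-\sum_i\lambda_i f_i(x)=0$, so $p(x)=\exp\big(-1-\alpha-\sum_i\lambda_i f_i(x)\big)$, which is the exponential form once the constant $1$ is absorbed into $\alpha$.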
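The stationarity condition can be verified by finite differences on a small finite set. The sketch below is only an illustration: the set, the two constraint functions $f_1(x)=x$ and $f_2(x)=x^2$, and the multiplier values are assumptions. With counting measure as $\mu$, the Lagrangian's normalization multiplier is taken to be $\log Z - 1$, which is where the constant from differentiating $-p\log p$ ends up.

```python
import math

f1 = [0.0, 1.0, 2.0, 3.0, 4.0]      # assumed constraint function f1(x) = x
f2 = [x * x for x in f1]            # assumed constraint function f2(x) = x^2
lam1, lam2 = 0.4, 0.1               # hypothetical multipliers

# Gibbs density for these multipliers, and the constraint values it satisfies.
weights = [math.exp(-lam1 * a - lam2 * b) for a, b in zip(f1, f2)]
Z = sum(weights)
p_star = [w / Z for w in weights]
c1 = sum(p * a for p, a in zip(p_star, f1))
c2 = sum(p * b for p, b in zip(p_star, f2))

# Normalization multiplier: the "-1" from d/dp(-p log p) is absorbed here.
alpha = math.log(Z) - 1.0

def lagrangian(p):
    entropy = -sum(pi * math.log(pi) for pi in p)
    norm_term = alpha * (sum(p) - 1.0)
    moment_terms = (lam1 * (sum(pi * a for pi, a in zip(p, f1)) - c1)
                    + lam2 * (sum(pi * b for pi, b in zip(p, f2)) - c2))
    return entropy - norm_term - moment_terms

def partial_derivative(p, i, h=1e-6):
    """Central finite difference of the Lagrangian in coordinate i."""
    up, down = list(p), list(p)
    up[i] += h
    down[i] -= h
    return (lagrangian(up) - lagrangian(down)) / (2 * h)

gradient = [partial_derivative(p_star, i) for i in range(len(p_star))]
print("gradient at p*:", gradient)
```

All components of the gradient vanish up to finite-difference error, confirming that the Gibbs density is a stationary point of the Lagrangian.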

Equivalently, when $\mu$ is a probability measure, $S(p)=-D_{\mathrm{KL}}(P\,\|\,\mu)$, so maximizing $S(p)$ subject to the constraints is the same as minimizing the KL divergence to the reference measure $\mu$ subject to the same constraints (a projection principle), linking equilibrium ensembles to variational principles.
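The identity behind this equivalence can be checked on a finite set. The sketch below uses an assumed four-point distribution `P` and compares two reference measures: the uniform probability measure, for which the entropy of the density equals minus the KL divergence, and counting measure, for which the two differ by the constant $\log n$. The helper names are hypothetical.

```python
import math

def entropy_relative_to(density, mu):
    """S(p) = -integral of p log p dmu on a finite set; mu[i] is the mass of point i."""
    return -sum(m * d * math.log(d) for d, m in zip(density, mu) if d > 0)

def kl_divergence(P, Q):
    """D_KL(P || Q) for probability vectors on the same finite set."""
    return sum(a * math.log(a / b) for a, b in zip(P, Q) if a > 0)

P = [0.5, 0.25, 0.125, 0.125]   # assumed example distribution
n = len(P)

# Reference measure = uniform probability measure: density is dP/dmu = n * P_i.
uniform = [1.0 / n] * n
density_wrt_uniform = [a / m for a, m in zip(P, uniform)]
S_uniform_ref = entropy_relative_to(density_wrt_uniform, uniform)

# Reference measure = counting measure: density is P_i itself.
counting = [1.0] * n
S_counting_ref = entropy_relative_to(P, counting)

kl = kl_divergence(P, uniform)

print("S w.r.t. uniform reference:", S_uniform_ref, " -KL:", -kl)
print("S w.r.t. counting measure:", S_counting_ref, " log n - KL:", math.log(n) - kl)
```

In both cases the entropy is a decreasing affine function of the KL divergence to the uniform distribution, so the maximizer of one is the minimizer of the other over any common feasible set.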