Entropy and multinomial coefficients

Approximations and bounds relating multinomial coefficients to Shannon entropy.
Multinomial coefficients have exponential growth rates governed by entropy. This connection underlies many results in information theory and large deviations, including the method of types.

Multinomial coefficient and empirical proportions

Let n_1,\dots,n_k be nonnegative integers with \sum_{i=1}^k n_i = n. The multinomial coefficient is

{n \choose n_1,\dots,n_k} \;=\; \frac{n!}{\prod_{i=1}^k n_i!},

where n! = n(n-1)\cdots 1 is the factorial.

Define proportions p_i = n_i/n (so \sum_i p_i = 1). The (natural-log) Shannon entropy is

H(p) = -\sum_{i=1}^k p_i \log p_i.

(Using base-2 logarithms instead divides all entropy expressions by \log 2; equivalently, e^{nH(p)} = 2^{nH_2(p)}, where H_2 denotes base-2 entropy.)
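The two definitions above are straightforward to compute exactly. A minimal sketch (the function names and example counts are illustrative, not from the source):

```python
import math

def multinomial(counts):
    """Exact multinomial coefficient n! / (n_1! * ... * n_k!)."""
    n = sum(counts)
    result = math.factorial(n)
    for c in counts:
        result //= math.factorial(c)  # always divides evenly
    return result

def entropy(p):
    """Natural-log Shannon entropy, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

counts = [3, 2, 1]                  # n = 6
n = sum(counts)
p = [c / n for c in counts]
print(multinomial(counts))          # 6! / (3! 2! 1!) = 60
print(entropy(p))                   # H(1/2, 1/3, 1/6) in nats
```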

Main asymptotic principle

Using Stirling's approximation, one obtains

\log {n \choose n_1,\dots,n_k} = n H(p) + O(\log n) \quad (n\to\infty),

as long as the number of categories k is fixed (and typically assuming each p_i is bounded away from 0 to avoid singular prefactors).

A more detailed approximation (when all p_i > 0) is

{n \choose n_1,\dots,n_k} \approx \frac{e^{nH(p)}}{(2\pi n)^{(k-1)/2}\sqrt{p_1 p_2 \cdots p_k}},

where “\approx” means the ratio of the two sides tends to 1 under standard regularity conditions.
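Both approximations can be checked numerically against the exact coefficient (computed in log space via the log-gamma function, to avoid overflow). The specific counts below are an illustrative choice, not from the source:

```python
import math

def log_multinomial(counts):
    """Log of the exact multinomial coefficient, via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(p):
    """Natural-log Shannon entropy."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

counts = [4000, 3000, 3000]         # n = 10000, p = (0.4, 0.3, 0.3)
n, k = sum(counts), len(counts)
p = [c / n for c in counts]

exact = log_multinomial(counts)
leading = n * entropy(p)            # crude estimate n H(p)
# refined estimate: n H(p) - ((k-1)/2) log(2 pi n) - (1/2) sum_i log p_i
refined = (leading
           - 0.5 * (k - 1) * math.log(2 * math.pi * n)
           - 0.5 * sum(math.log(pi) for pi in p))

print(leading - exact)              # gap of order log n
print(refined - exact)              # much smaller gap
```

The leading-order gap matches the O(\log n) correction term; the refined formula is accurate to o(1) here.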

Simple entropy bounds (polynomial accuracy)

There are widely used bounds that capture the correct exponential rate without tracking constants:

(n+1)^{-k}\, e^{nH(p)} \;\le\; {n \choose n_1,\dots,n_k} \;\le\; e^{nH(p)}.

These bounds are especially convenient when only exponential rates matter (e.g., in large deviation estimates).
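As a sanity check, the two-sided bound can be verified exhaustively for small n and k by enumerating all compositions of n into k nonnegative parts (a stars-and-bars enumeration; the helper names below are illustrative):

```python
import math
from itertools import combinations

def compositions(n, k):
    """All k-tuples of nonnegative integers summing to n (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        prev, parts = -1, []
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(n + k - 2 - prev)
        yield tuple(parts)

def bounds_hold(n, k):
    """Check (n+1)^{-k} e^{nH} <= multinomial <= e^{nH} for every composition."""
    for counts in compositions(n, k):
        coeff = math.factorial(n)
        for c in counts:
            coeff //= math.factorial(c)
        H = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
        upper = math.exp(n * H)
        lower = upper / (n + 1) ** k
        # small multiplicative slack for floating-point rounding in exp()
        if not (lower <= coeff <= upper * (1 + 1e-12)):
            return False
    return True

print(bounds_hold(12, 3))
```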

Proof idea for the upper bound

Apply the multinomial theorem with weights p_i:

1 = \Bigl(\sum_{i=1}^k p_i\Bigr)^n = \sum_{\sum n_i = n} {n \choose n_1,\dots,n_k} \prod_{i=1}^k p_i^{n_i}.

Every term is nonnegative, so for the particular (n_1,\dots,n_k),

{n \choose n_1,\dots,n_k} \prod_{i=1}^k p_i^{n_i} \le 1,

hence

{n \choose n_1,\dots,n_k} \le \prod_{i=1}^k p_i^{-n_i} = e^{-\sum_i n_i \log p_i} = e^{nH(p)},
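using n_i = n p_i in the last step. The two facts the proof rests on, that the weighted terms sum to exactly 1 and that each single term is therefore at most 1, can be checked directly for small cases (the weights and n below are an illustrative choice):

```python
import math

p = [0.5, 0.3, 0.2]   # illustrative weights; any probability vector works
n = 10

total = 0.0
for n1 in range(n + 1):
    for n2 in range(n + 1 - n1):
        n3 = n - n1 - n2
        coeff = math.factorial(n) // (
            math.factorial(n1) * math.factorial(n2) * math.factorial(n3))
        term = coeff * p[0]**n1 * p[1]**n2 * p[2]**n3
        # the single-term bound used in the proof of the upper bound
        assert term <= 1 + 1e-12
        total += term

print(total)   # the full sum equals 1 by the multinomial theorem
```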

Why this matters

  • The exponential size of a “type class” (all sequences with the same empirical proportions) is controlled by e^{nH(p)}, up to polynomial factors; this is the core counting step in the method of types.
  • Many probability bounds reduce to comparing e^{nH(p)} with likelihood terms, producing Kullback–Leibler divergence rates.