Method of types

Counting and probability estimates for sequences grouped by their empirical distribution.

The method of types is a set of counting and probability tools for sequences over a finite alphabet. It is foundational in information theory and large deviations: it turns questions about probabilities of empirical frequencies into entropy and divergence calculations.

It relies on the relationship between entropy and counting developed in the bounds below.

Types and type classes

Let $\mathcal{A}$ be a finite alphabet with $|\mathcal{A}|=m$, and let $x^n=(x_1,\dots,x_n)\in\mathcal{A}^n$.

  • The type (empirical distribution) of $x^n$ is the probability mass function $P_{x^n}$ on $\mathcal{A}$ defined by

    $$P_{x^n}(a)=\frac{1}{n}\,\#\{i: x_i=a\}.$$
  • For a type $P$ with denominator $n$ (i.e., all $nP(a)$ are integers), the type class is

    $$T(P)=\{x^n\in\mathcal{A}^n: P_{x^n}=P\}.$$
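
For a quick concrete check of these definitions, here is a minimal Python sketch (the helper name `empirical_type` is ours, purely illustrative):

```python
from collections import Counter

def empirical_type(seq, alphabet):
    """Return the type P_{x^n} of seq as a dict over the given alphabet."""
    n = len(seq)
    counts = Counter(seq)
    return {a: counts[a] / n for a in alphabet}

# The type of "aabca" over {a, b, c} is (3/5, 1/5, 1/5).
print(empirical_type("aabca", "abc"))  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```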

Counting: size of a type class

Write $n_a = nP(a)$ for each $a\in\mathcal{A}$. Then

$$|T(P)|=\frac{n!}{\prod_{a\in\mathcal{A}} n_a!},$$

a multinomial coefficient.
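
The formula can be evaluated exactly with integer arithmetic; a minimal sketch (the helper name `type_class_size` is ours):

```python
from math import factorial

def type_class_size(counts):
    """|T(P)| = n! / prod_a n_a!, with counts = [n_a for each symbol a]."""
    n = sum(counts)
    size = factorial(n)
    for n_a in counts:
        size //= factorial(n_a)  # each intermediate quotient is an integer
    return size

# n = 5 with counts (3, 1, 1): 5!/(3!·1!·1!) = 20 sequences in T(P).
print(type_class_size([3, 1, 1]))  # 20
```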

Let

$$H(P)=-\sum_{a\in\mathcal{A}} P(a)\log P(a)$$

(natural-log entropy). Then the type class size satisfies the standard bounds

$$(n+1)^{-m}\,e^{nH(P)} \;\le\; |T(P)| \;\le\; e^{nH(P)}.$$

In particular, $\log |T(P)| = nH(P) + O(\log n)$.
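
As a numerical sanity check of these bounds (on the same small example as above, not a proof):

```python
from math import exp, log, factorial

def entropy(P):
    """Natural-log entropy H(P), with the convention 0·log 0 = 0."""
    return -sum(p * log(p) for p in P if p > 0)

n, counts = 5, [3, 1, 1]
m = len(counts)
P = [c / n for c in counts]
size = factorial(n)
for c in counts:
    size //= factorial(c)                        # |T(P)| = 20

lower = (n + 1) ** (-m) * exp(n * entropy(P))    # ≈ 0.54
upper = exp(n * entropy(P))                      # ≈ 115.7
print(lower <= size <= upper)                    # True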

Counting: number of possible types

The number of distinct types on $\mathcal{A}$ with denominator $n$ is at most

$$(n+1)^m,$$

because each $P(a)$ must lie in $\{0,1/n,2/n,\dots,1\}$.
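
In fact the exact count is the stars-and-bars coefficient $\binom{n+m-1}{m-1}$ (the number of ways to choose nonnegative integer counts summing to $n$), which sits well below the polynomial bound; a quick comparison:

```python
from math import comb

n, m = 5, 3
exact = comb(n + m - 1, m - 1)  # compositions of n into m nonnegative parts
bound = (n + 1) ** m
print(exact, bound)  # 21 216
```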

Probability of a type class under an i.i.d. law

Let $Q$ be a distribution on $\mathcal{A}$, and let $X^n\sim Q^{\otimes n}$ be i.i.d. draws.

For any fixed type $P$ with denominator $n$, every sequence $x^n\in T(P)$ has the same probability:

$$Q^{\otimes n}(x^n)=\prod_{a\in\mathcal{A}} Q(a)^{nP(a)}.$$

Therefore,

$$Q^{\otimes n}(T(P)) = |T(P)| \prod_{a\in\mathcal{A}} Q(a)^{nP(a)}.$$
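
Since $Q^{\otimes n}(T(P))$ is exactly a multinomial probability, it has a closed form; a minimal sketch (the helper name `type_class_prob` is ours):

```python
from math import factorial, prod

def type_class_prob(counts, Q):
    """Exact Q^{⊗n}(T(P)): the multinomial probability of counts under Q."""
    n = sum(counts)
    size = factorial(n)
    for c in counts:
        size //= factorial(c)
    return size * prod(q ** c for q, c in zip(Q, counts))

# Five draws from Q = (1/2, 1/4, 1/4); type P with counts (3, 1, 1):
print(type_class_prob([3, 1, 1], [0.5, 0.25, 0.25]))  # 20 · (1/2)^3 · (1/4)^2 = 0.15625
```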

Introduce the (natural-log) Kullback–Leibler divergence

$$D(P\|Q)=\sum_{a\in\mathcal{A}} P(a)\log\frac{P(a)}{Q(a)}.$$

Since each $x^n\in T(P)$ has probability $\prod_{a} Q(a)^{nP(a)} = e^{-n(H(P)+D(P\|Q))}$, using $|T(P)|\approx e^{nH(P)}$ gives the characteristic exponential rate

$$Q^{\otimes n}(T(P)) \approx e^{-nD(P\|Q)}$$

(up to polynomial factors). A standard bound pair is

$$(n+1)^{-m}\,e^{-nD(P\|Q)} \;\le\; Q^{\otimes n}(T(P)) \;\le\; e^{-nD(P\|Q)}.$$
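
The bound pair can be checked on the same example (again a sanity check, not a proof):

```python
from math import exp, log, factorial, prod

def kl(P, Q):
    """Natural-log D(P||Q); assumes Q(a) > 0 wherever P(a) > 0."""
    return sum(p * log(p / q) for p, q in zip(P, Q) if p > 0)

n, counts, Q = 5, [3, 1, 1], [0.5, 0.25, 0.25]
m = len(counts)
P = [c / n for c in counts]
size = factorial(n)
for c in counts:
    size //= factorial(c)
prob = size * prod(q ** c for q, c in zip(Q, counts))         # 0.15625
D = kl(P, Q)                                                  # ≈ 0.0201
print((n + 1) ** (-m) * exp(-n * D) <= prob <= exp(-n * D))   # True
```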

Why this is useful

  • Converts empirical-frequency events into entropy/divergence exponents.
  • Provides finite-$n$ (non-asymptotic) bounds with only polynomial slack.
  • Serves as a discrete, combinatorial route to large deviation principles (e.g., the intuition behind Sanov-type results).