Total variation distance

A distance between two probability distributions, defined as the largest possible difference between the probabilities they assign to the same event.

The total variation distance is the quantity

$$d_{\mathrm{TV}}(P,Q)\;=\;\sup_{A\in\mathcal F}\,\bigl|P(A)-Q(A)\bigr|$$

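for two probability measures $P,Q$ on the same measurable space $(\Omega,\mathcal F)$, where the supremum ranges over all measurable sets $A\in\mathcal F$ (events). It measures the worst-case discrepancy between the probabilities that $P$ and $Q$ assign to a single event.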
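
On a finite outcome space the supremum can be evaluated literally, by enumerating every event. The Python sketch below does this by brute force; the function name `tv_by_events` is illustrative and not taken from any library. Because it visits all $2^n$ subsets it is only meant to make the definition concrete, not to be used on large spaces.

```python
from itertools import chain, combinations


def tv_by_events(p, q):
    """Total variation distance computed straight from the definition.

    p and q map outcomes of a finite space to probabilities; an outcome
    missing from one dict has probability 0 there. Every event A (every
    subset of the outcomes) is enumerated and the largest |P(A) - Q(A)|
    is returned, so the cost grows like 2**n.
    """
    outcomes = list(set(p) | set(q))
    # All subsets of the outcome space, from the empty event to the whole space.
    events = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    best = 0.0
    for event in events:
        p_mass = sum(p.get(w, 0.0) for w in event)
        q_mass = sum(q.get(w, 0.0) for w in event)
        best = max(best, abs(p_mass - q_mass))
    return best


P = {"a": 0.5, "b": 0.25, "c": 0.25}
Q = {"a": 0.25, "b": 0.25, "c": 0.5}
print(tv_by_events(P, Q))  # prints 0.25
```

Here the maximum $0.25$ is attained by $A=\{a\}$, the outcome where $P$ exceeds $Q$, but also by its complement $\{b,c\}$ and by $\{a,b\}$ and $\{c\}$, since adding or removing $b$ (where the two agree) does not change the gap. The maximizing event is therefore generally not unique.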

If $P$ and $Q$ are both absolutely continuous with respect to a common σ-finite measure $\mu$ (for example $\mu=P+Q$) and have densities $p=\frac{dP}{d\mu}$ and $q=\frac{dQ}{d\mu}$ (their Radon–Nikodym derivatives, which exist by the Radon–Nikodym theorem), then

$$d_{\mathrm{TV}}(P,Q)=\frac12\int_\Omega |p-q|\,d\mu.$$
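
The half-$L^1$ formula follows from the definition by locating the event that attains the supremum. Write $A^*=\{\omega: p(\omega)>q(\omega)\}$ and $x^{\pm}$ for the positive and negative parts of $x$. The first inequality holds because removing the points of $A$ where $p\le q$ and adding the remaining points of $A^*$ can only increase the integral of $p-q$:

$$
\begin{aligned}
P(A)-Q(A) &= \int_A (p-q)\,d\mu \;\le\; \int_{A^*} (p-q)\,d\mu \;=\; \int_\Omega (p-q)^+\,d\mu \quad\text{for every } A\in\mathcal F,\\
Q(A)-P(A) &= P(A^c)-Q(A^c) \;\le\; \int_\Omega (p-q)^+\,d\mu,\\
\int_\Omega (p-q)^+\,d\mu-\int_\Omega (p-q)^-\,d\mu &= P(\Omega)-Q(\Omega) \;=\; 0,\\
d_{\mathrm{TV}}(P,Q) \;=\; \int_\Omega (p-q)^+\,d\mu &= \frac12\int_\Omega \bigl((p-q)^+ + (p-q)^-\bigr)\,d\mu \;=\; \frac12\int_\Omega |p-q|\,d\mu.
\end{aligned}
$$

The supremum is thus attained at $A^*$. The factor $\tfrac12$ appears because the excess mass of $P$ over $Q$ on $A^*$ is exactly matched by the excess of $Q$ over $P$ on its complement.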

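Total variation is a strong notion of closeness of probability distributions, since a small $d_{\mathrm{TV}}(P,Q)$ bounds $|P(A)-Q(A)|$ uniformly over all events $A$. It can be controlled by the Kullback–Leibler divergence through Pinsker's inequality.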
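
Pinsker's inequality makes this kind of control quantitative: $d_{\mathrm{TV}}(P,Q)\le\sqrt{\tfrac12 D_{\mathrm{KL}}(P\,\|\,Q)}$, with the Kullback–Leibler divergence measured in nats. The Python sketch below checks the bound numerically for distributions on a finite set given as probability vectors of equal length. The helper names are illustrative, and `kl_divergence` assumes $q_i>0$ wherever $p_i>0$ (otherwise $D_{\mathrm{KL}}$ is infinite and the bound says nothing).

```python
import math


def tv_distance(p, q):
    """Half the L1 distance between two probability vectors of equal length."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """KL divergence D(P || Q) in nats; terms with p_i = 0 contribute 0.

    Assumes q_i > 0 whenever p_i > 0 (P absolutely continuous w.r.t. Q).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def pinsker_bound(p, q):
    """Upper bound on d_TV(P, Q) given by Pinsker's inequality."""
    return math.sqrt(0.5 * kl_divergence(p, q))


P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]
print(tv_distance(P, Q), pinsker_bound(P, Q))
```

For these vectors the total variation distance is $0.1$ (up to floating-point rounding) and the Pinsker bound is about $0.112$.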

Examples:

  • If $P=\mathrm{Bernoulli}(p)$ and $Q=\mathrm{Bernoulli}(q)$ on $\{0,1\}$, then $d_{\mathrm{TV}}(P,Q)=|p-q|$.
  • If $P$ and $Q$ have probability mass functions $(p_i)$ and $(q_i)$ on a finite set, then $d_{\mathrm{TV}}(P,Q)=\frac12\sum_i |p_i-q_i|$.
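
Both examples can be checked with a few lines of Python. The sketch below implements the finite-set formula for probability mass functions stored as dictionaries and applies it to two Bernoulli distributions; the names `tv_pmf` and `bernoulli` are illustrative. On $\{0,1\}$ each of the two outcomes contributes $|p-q|$, so the factor $\tfrac12$ leaves exactly $|p-q|$.

```python
def tv_pmf(p, q):
    """d_TV between two pmfs given as dicts {outcome: probability}.

    Outcomes missing from one dict are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def bernoulli(theta):
    """pmf of Bernoulli(theta) on {0, 1}."""
    return {0: 1.0 - theta, 1: theta}


# Bernoulli(0.75) vs Bernoulli(0.25): both outcomes differ by 0.5, halved sum is 0.5.
print(tv_pmf(bernoulli(0.75), bernoulli(0.25)))  # prints 0.5
```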