VWT: The Inspection Paradox as a Probability Transform
Introduction
The inspection paradox (also called length-biased sampling) is a statistical effect whereby the same underlying distribution looks different depending on who measures it. Specifically, it occurs when the observer of a probability distribution is themselves one of the individuals that the distribution counts, which biases their sampling of it towards larger values.
For a concrete example, consider waiting for a bus that comes, on average, once every 10 minutes. You might expect your average waiting time to be 10/2 = 5 minutes. In fact, this is only true if the buses arrive exactly 10 minutes apart.
Consider what happens if the interval between two bus arrivals has a 50-50 chance of being either exactly 5 minutes or exactly 15 minutes. The bus still comes on average once every 10 minutes. But since you arrive at the bus stop at a random time with respect to the bus schedule, you are 3 times more likely to land within a 15-minute interval than within a 5-minute interval, meaning the probabilities from your perspective are actually 25-75.
Your actual average waiting time is therefore:
\[25\%\times \frac{5}{2} + 75\%\times \frac{15}{2} = 6.25\text{ minutes}\]
To convince yourself of this discrepancy, imagine an extreme case: two buses arriving at the exact same moment. The average time between bus arrivals goes down, since we just observed a 0-second interval, but no passenger's actual waiting time was reduced in the process.
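If you'd rather see this numerically, here is a minimal Monte Carlo sketch in NumPy (the sample size and seed are arbitrary choices of this illustration) that simulates a long run of buses and passengers arriving at uniformly random times:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inter-arrival gaps: 5 or 15 minutes with equal probability (mean 10).
gaps = rng.choice([5.0, 15.0], size=100_000)
arrivals = np.cumsum(gaps)

# Passengers show up at uniformly random times and wait for the next bus.
t = rng.uniform(0, arrivals[-1], size=100_000)
wait = arrivals[np.searchsorted(arrivals, t)] - t

print(wait.mean())  # ~6.25 minutes, not 5
```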
Formal derivation
To formalize this phenomenon, we introduce the Value-Weighted Transform (VWT), a probability transform which maps a distribution supported on $\R^{+}$ from its original form to the form observed by one of the individuals it counts.
If $f_X$ is the density of a distribution with finite expectation supported on $\R^{+}$ with respect to some measure $\mu$, then $f_Y$, the density of the Value-Weighted Transform of this distribution, can be written as:
\[\forall x\in\R^{+}, f_Y(x) = \frac{x}{\E(X)}f_X(x)\]
It is easy to prove that:
\[\begin{align*} \forall n \in \N, && \E(Y^n) &= \int_{\R^{+}} x^n \frac{x}{\E(X)} f_X(x)\,d\mu(x) \\ &&&= \frac{\E(X^{n+1})}{\E(X)} \end{align*}\]
And in particular:
\[\begin{align*} \E(Y) &= \frac{\E(X^2)}{\E(X)} \\ & = \E(X) + \frac{\text{Var}(X)}{\E(X)} \end{align*}\]
This means that the expectation of $Y$ is always greater than or equal to the expectation of $X$, with equality only when $X$ is almost-surely constant (i.e. when there is no variability in the distribution). This formalizes the intuition behind the inspection paradox: when there is variability in the distribution, the individuals counted by it are more likely to observe larger values.
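As a quick empirical sanity check of this identity, here is a minimal NumPy sketch (the lognormal choice, sample size, and seed are arbitrary); it realizes the VWT by resampling values with probability proportional to their size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any positive distribution with finite mean works; lognormal is arbitrary.
x = rng.lognormal(mean=0.0, sigma=1.0, size=200_000)

# Size-biased (VWT) resample: pick values with probability
# proportional to their size.
y = rng.choice(x, size=200_000, p=x / x.sum())

print(y.mean())                       # empirical E(Y)
print(x.mean() + x.var() / x.mean())  # E(X) + Var(X)/E(X)
```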
We can also express the characteristic function of $Y$ in terms of the derivative of the characteristic function of $X$:
\[\begin{align*} \forall t \in \R, && \phi_Y(t) &= \E(e^{itY}) \\ &&&= \int_{\R^{+}} e^{itx} \frac{x}{\E(X)} f_X(x)\,d\mu(x) \\ &&&= \frac{1}{i\E(X)}\int_{\R^{+}}\frac{d}{dt}(e^{itx}) f_X(x)\,d\mu(x) \\ &&&= \frac{1}{i\E(X)}\frac{d}{dt}\E(e^{itX}) \\ &&&= \frac{1}{i\E(X)}\phi_X'(t) \end{align*}\]
where we can swap the derivative and the expectation (differentiation under the integral sign) because $X$ is non-negative, so $\abs{Xe^{itX}} = X$, which is integrable.
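Purely as an illustration (the exponential distribution, parameter values, and finite-difference step below are arbitrary choices), this identity can be checked numerically by comparing a Monte Carlo estimate of $\phi_Y(t)$ against a finite-difference estimate of $\phi_X'(t)/(i\E(X))$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, h = 2.0, 0.7, 1e-5

x = rng.exponential(scale=1 / lam, size=500_000)
y = rng.choice(x, size=500_000, p=x / x.sum())  # size-biased (VWT) sample

# Left side: characteristic function of Y, estimated directly.
phi_Y = np.mean(np.exp(1j * t * y))

# Right side: phi_X'(t) / (i E(X)), via a central finite difference.
phi_X = lambda s: np.mean(np.exp(1j * s * x))
dphi_X = (phi_X(t + h) - phi_X(t - h)) / (2 * h)

print(phi_Y, dphi_X / (1j * x.mean()))  # agree up to sampling noise
```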
Summary table
| Property | Value |
|---|---|
| Density | $f_Y(x) = \frac{x}{\E(X)}f_X(x)$ |
| $n$-th moment | $\E(Y^n) = \frac{\E(X^{n+1})}{\E(X)}$ |
| Expectation | $\E(Y) = \E(X) + \frac{\text{Var}(X)}{\E(X)}$ |
| Characteristic function | $\phi_Y(t) = \frac{1}{i\E(X)}\phi_X'(t)$ |
VWT of specific distributions
Exponential and Gamma distributions
If $X$ has an exponential distribution with parameter $\lambda$, then:
\[\begin{align*} \forall x \in \R^{+}, && f_Y(x) &= \frac{x}{\E(X)}f_X(x) \\ &&&= \frac{x}{1/\lambda}\lambda e^{-\lambda x} \\ &&&= \lambda^2 x e^{-\lambda x} \end{align*}\]
This is the density of a Gamma distribution with shape parameter 2 and rate parameter $\lambda$.
Since the exponential distribution is a special case of the Gamma distribution with shape parameter 1, this is a particular case of a more general result: the VWT of a Gamma distribution with shape parameter $\alpha$ and rate parameter $\lambda$ is a Gamma distribution with shape parameter $\alpha + 1$ and rate parameter $\lambda$.
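Here is a quick simulation of the exponential case (a sketch with arbitrary parameter choices; it uses the fact that Gamma(2, $\lambda$) is the sum of two independent Exponential($\lambda$) draws):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 1.5, 200_000

x = rng.exponential(scale=1 / lam, size=n)
y = rng.choice(x, size=n, p=x / x.sum())  # size-biased (VWT) sample

# Gamma(2, lam) is the sum of two independent Exponential(lam) draws.
g = rng.exponential(scale=1 / lam, size=n) + rng.exponential(scale=1 / lam, size=n)

print(y.mean(), g.mean())  # both ~ 2/lam ≈ 1.333
print(y.var(), g.var())    # both ~ 2/lam**2 ≈ 0.889
```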
Poisson distribution
If $X$ has a Poisson distribution with parameter $\lambda$, then:
\[\begin{align*} \forall k \geq 1, && f_Y(k) &= \frac{k}{\E(X)}f_X(k) \\ &&&= \frac{k}{\lambda}\frac{\lambda^k e^{-\lambda}}{k!} \\ &&&= e^{-\lambda}\frac{\lambda^{k-1}}{(k-1)!} \\ &&&= f_{X+1}(k) \end{align*}\]
while $f_Y(0) = 0 = f_{X+1}(0)$. Thus $Y$ has the same distribution as $X+1$ (a Poisson distribution with parameter $\lambda$ shifted by 1).
This result has an interesting “memoryless” interpretation. If, for example, the number of children per family were Poisson-distributed, then asking a random child how many siblings they have (not counting themselves) would give you the same distribution as asking a random family how many children they have. This is because in a Poisson process, conditioning on the presence of at least one increment (i.e. one child) in a time interval does not change the distribution of the number of other increments (i.e. siblings), due to its memoryless property.
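The same resampling trick illustrates the shift numerically (again a sketch with arbitrary parameters); the empirical distribution of the size-biased sample should match that of $X + 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 3.0, 200_000

x = rng.poisson(lam, size=n)
y = rng.choice(x, size=n, p=x / x.sum())  # size-biased (VWT) sample

# Empirical probability mass functions of Y and X + 1 for k = 0..9.
print(np.bincount(y, minlength=10)[:10] / n)
print(np.bincount(x + 1, minlength=10)[:10] / n)
```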
Uniform and Beta distributions
If $X$ has a uniform distribution on the interval $[0,1]$, then:
\[\begin{align*} \forall x \in [0,1], && f_Y(x) &= \frac{x}{\E(X)}f_X(x) \\ &&&= 2x \end{align*}\]
This is the same distribution as the maximum of two independent $\mathcal{U}([0,1])$ random variables.
It is also the density of a Beta distribution with shape parameters 2 and 1, a special case of a more general result: if $X$ has a Beta distribution with shape parameters $\alpha$ and $\beta$, then:
\[\begin{align*} \forall x \in [0,1], && f_Y(x) &= \frac{x}{\E(X)}f_X(x) \\ &&&= \frac{x}{\alpha/(\alpha + \beta)}\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1-x)^{\beta - 1} \\ &&&= \frac{\Gamma(\alpha + \beta + 1)}{\Gamma(\alpha + 1)\Gamma(\beta)}x^{\alpha}(1-x)^{\beta - 1} \end{align*}\]
This is the density of a Beta distribution with shape parameters $\alpha + 1$ and $\beta$.
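To see the uniform special case from above in action, the sketch below (illustration only; sample size and seed are arbitrary) compares the size-biased uniform sample with the maximum of two independent uniforms:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.uniform(0, 1, size=n)
y = rng.choice(x, size=n, p=x / x.sum())  # size-biased (VWT) sample

# Beta(2, 1) is also the law of the maximum of two independent uniforms.
m = np.maximum(rng.uniform(0, 1, size=n), rng.uniform(0, 1, size=n))

print(y.mean(), m.mean())  # both ~ 2/3
print(y.var(), m.var())    # both ~ 1/18 ≈ 0.056
```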
Summary table
| Distribution | VWT |
|---|---|
| Gamma($\alpha$, $\lambda$) | Gamma($\alpha + 1$, $\lambda$) |
| $\rightarrow$ Exponential($\lambda$) | Gamma(2, $\lambda$) |
| Poisson($\lambda$) | Poisson($\lambda$) + 1 |
| Beta($\alpha$, $\beta$) | Beta($\alpha + 1$, $\beta$) |
| $\rightarrow$ Uniform([0,1]) | Beta(2, 1) |