WTT: Bus waiting times as a probability transform

This article builds on a previous article on the Value-Weighted Transform.

Introduction

While the previous article simply described the Value-Weighted Transform and its properties, this one goes further by showing how combining it with another assumption yields a new transform with more real-life applications.

Let’s return to the bus waiting times example. The previous article on the VWT opened with it but did not go into detail: I simply stated that the expected waiting time of a passenger arriving during an interval of length $T$ is $\frac{T}{2}$.

The more formal assumption behind this statement is that, conditionally on the inter-bus interval a passenger arrives in, their arrival time is uniformly distributed within that interval. This is a modelling hypothesis, which becomes more reasonable as the inter-bus intervals get shorter relative to the time scale on which the passenger arrival rate varies. It fails, for example, for the overnight interval between the last bus of the day and the first bus of the next day, so the model is best applied to a short period with a reasonably stable arrival regime.

Where this uniformity assumption gets interesting is when we combine it with the fact that the distribution of interval lengths as seen by the passenger is given by the Value-Weighted Transform of the original distribution of interval lengths between buses (see the previous article for a detailed explanation of this fact).

Combining these two facts, we can define the Waiting Time Transform (WTT) as a probability transform which maps a distribution supported on $\R^{+}$ (representing the distribution of time between bus arrivals) to the distribution of waiting times of a passenger.

Symmetrical case

Before we give the formal derivation, I would like to highlight an interesting fact about this model: you can formulate it equivalently in a reversed context.

Imagine you go to a hotel and ask each guest how many days they have been staying at the hotel. Here, instead of measuring the time left to wait, you measure the time already waited. But since your arrival time is independent of the guests’ stays, we can still assume that you arrive uniformly at random within a given guest’s stay. The distribution of hotel stays as observed by you is still biased towards longer stays according to the VWT, thus the distribution of time already waited is given by the WTT of the distribution of lengths of stays.

Thus, since the WTT is reversible (as we will see later), asking random guests how long they have been staying (even if they do not know when they will leave) can be used to recover the distribution of lengths of stays at the hotel as long as the uniformity assumption holds.

Derivation

Using slightly informal notation, given a probability measure $\mu$, we define its WTT as a compound distribution:

\[WTT(\mu) = \mathcal{U}\left(\left[0, VWT(\mu)\right]\right)\]

So the WTT of a distribution is always a continuous distribution supported on $\R^{+}$.
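To make the compound definition concrete, here is a minimal sketch of how one could sample from the WTT of an empirical distribution. It is only an illustration under stated assumptions: the function name `sample_wtt` and the uniform example intervals are hypothetical choices, not part of the derivation.

```python
import numpy as np


def sample_wtt(x_samples, size, rng):
    """Draw approximate samples from the WTT of the empirical law of x_samples."""
    x = np.asarray(x_samples, dtype=float)
    # VWT step: pick each observed interval with probability proportional to its length.
    y = rng.choice(x, size=size, p=x / x.sum())
    # Uniformity step: arrive uniformly at random inside the chosen interval.
    return rng.random(size) * y


rng = np.random.default_rng(0)
# Hypothetical inter-bus intervals, uniform between 5 and 15 minutes.
x = rng.uniform(5.0, 15.0, size=100_000)
z = sample_wtt(x, size=100_000, rng=rng)
print(x.mean(), z.mean())
```

The two steps mirror the definition directly: a size-biased draw of the interval (the VWT), followed by a uniform draw inside it.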

We denote by $X$ a random variable representing the time between bus arrivals, by $Y$ a random variable representing the interval length as observed by a passenger (which follows the VWT of $X$’s distribution), and by $Z$ a random variable representing the waiting time of a passenger (which follows the WTT of $X$’s distribution). We assume that $X$ has a density $f_X$ with respect to some reference measure $\nu$ on $\R^{+}$ (and therefore so does $Y$, with $f_Y(y) = \frac{y}{\E(X)} f_X(y)$) and that $0 < \E(X) < +\infty$. Then we can write $Z$’s density with respect to the Lebesgue measure as:

\[\begin{align*} \forall z\in\R^{+}, && f_{Z}(z) &= \int_{\R^{+}} f_{Z|Y=y}(z) f_Y(y) \,d\nu(y) \\ &&&= \int_{\R^{+}} \frac{1}{y} \mathbf{1}_{z\in[0,y]} f_Y(y) \,d\nu(y) \\ &&&= \int_{z}^{+\infty} \frac{1}{y} f_Y(y) \,d\nu(y) \\ &&&= \int_{z}^{+\infty} \frac{1}{y} \frac{y}{\E(X)} f_X(y) \,d\nu(y)\quad [1] \\ &&&= \frac{1}{\E(X)} \int_{z}^{+\infty} f_X(y) \,d\nu(y) \\ &&&= \frac{S_X(z)}{\E(X)} \end{align*}\]

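where $S_X(z) := P(X > z)$ is the survival function of $X$. Strictly speaking, the last integral equals $P(X \geq z)$, but this differs from $S_X(z)$ for at most countably many $z$, so both define the same density.

[1] This is where the magic happens: the $1/y$ term arising from the uniformity assumption cancels out the $y$ term from the VWT, leaving a simple expression in terms of the survival function.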

This density is normalised due to the classic result that the expectation of a non-negative random variable can be written as the integral of its survival function over $\R^{+}$.

From this formula, we can derive the moments of $Z$ as a function of the moments of $X$:

\[\begin{align*} \forall n \in \N, && \E(Z^n) &= \int_{\R^{+}} z^n \frac{S_X(z)}{\E(X)} \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \int_{\R^{+}} z^n P(X > z) \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \int_{\R^{+}} z^n \E_X[\mathbf{1}_{X > z}] \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \E_X\left(\int_{0}^{X} z^n \,d\lambda(z)\right)\quad [1]\\ &&&= \frac{1}{\E(X)} \E\left(\frac{X^{n+1}}{n+1}\right) \\ &&&= \frac{\E(X^{n+1})}{(n+1)\E(X)} \end{align*}\]

[1] Here we use Tonelli’s theorem applied to the non-negative measurable function $(z, \omega) \mapsto z^n \mathbf{1}_{X(\omega) > z}$ to swap the expectation and the integral. Crucially, Tonelli’s theorem does not require the expectation to be finite, so the identity holds in $[0, +\infty]$ and $\E(Z^n)$ is finite if and only if $\E(X^{n+1})$ is finite.

In particular:

\[\begin{align*} \E(Z) &= \frac{\E(X^2)}{2\E(X)} \\ &= \frac{1}{2}\left(\E(X) + \frac{\text{Var}(X)}{\E(X)}\right) \\ &= \frac{\E(Y)}{2} \end{align*}\]

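This is the identity we implicitly used at the beginning of the VWT article: the expected wait is half the expected interval length as observed by the passenger.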
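The expectation formula can be checked with a small simulation of the bus process itself. The sketch below is an illustration only: the exponential intervals with a mean of 10 minutes and the sample sizes are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inter-bus intervals: exponential with a mean of 10 minutes.
intervals = rng.exponential(scale=10.0, size=200_000)
bus_times = np.cumsum(intervals)

# Passengers arrive uniformly over the simulated horizon, independently of the buses.
arrivals = rng.uniform(0.0, bus_times[-1], size=200_000)

# Each passenger waits for the first bus at or after their arrival time.
next_bus = bus_times[np.searchsorted(bus_times, arrivals, side="left")]
waits = next_bus - arrivals

empirical_mean = waits.mean()
# E(Z) = E(X^2) / (2 E(X)), evaluated on the simulated intervals.
theoretical_mean = np.mean(intervals**2) / (2.0 * np.mean(intervals))
print(empirical_mean, theoretical_mean)
```

With these intervals both numbers should be close to the mean interval length rather than half of it, which is the variance term of the formula at work.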

We can also express the characteristic function of $Z$ in terms of the characteristic function of $X$. The final expression holds for $t \neq 0$; at $t = 0$ we simply have $\phi_Z(0) = 1$:

\[\begin{align*} \forall t \in \R, && \phi_Z(t) &= \E(e^{itZ}) \\ &&&= \int_{\R^{+}} e^{itz} \frac{S_X(z)}{\E(X)} \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \int_{\R^{+}} e^{itz} P(X > z) \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \int_{\R^{+}} e^{itz} \E_X[\mathbf{1}_{X > z}] \,d\lambda(z) \\ &&&= \frac{1}{\E(X)} \E_X\left(\int_{0}^{X} e^{itz} \,d\lambda(z)\right)\quad [1] \\ &&&= \frac{1}{\E(X)} \E\left(\frac{e^{itX} - 1}{it}\right) \\ &&&= \frac{\phi_X(t) - 1}{it\E(X)} \end{align*}\]

[1] Here we use Fubini’s theorem applied to the integrable function $(z, \omega) \mapsto e^{itz} \mathbf{1}_{X(\omega) > z}$ to swap the expectation and the integral. The integrability condition is satisfied since $\abs{e^{itz} \mathbf{1}_{X(\omega) > z}} \leq \mathbf{1}_{X(\omega) > z}$ and $\E_X\left(\int_{\R^{+}} \mathbf{1}_{X > z} \,d\lambda(z)\right) = \E(X) < +\infty$.

Summary table

Property	Value
Density	$f_Z(z) = \frac{S_X(z)}{\E(X)}$
$n$-th moment	$\E(Z^n) = \frac{\E(X^{n+1})}{(n+1)\E(X)}$
Expectation	$\E(Z) = \frac{\E(X^2)}{2\E(X)}$
Characteristic function	$\phi_Z(t) = \frac{\phi_X(t) - 1}{it\E(X)}$

WTT of specific distributions

Dirac distribution

If $X$ is almost surely equal to some constant $c > 0$, then the density of $Z$ is given by:

\[\begin{align*} \forall z \in \R^{+}, && f_Z(z) &= \frac{S_X(z)}{\E(X)} \\ &&&= \frac{\mathbf{1}_{z < c}}{c} \end{align*}\]

Which is the density of a uniform distribution on $[0, c]$.

This case corresponds to a bus arriving perfectly on schedule every $c$ units of time.
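As a quick sanity check (a worked example added for illustration), the general moment formula agrees with the known moments of the uniform distribution on $[0, c]$ when $X = c$ almost surely:

\[\begin{align*} \forall n \in \N, && \E(Z^n) &= \frac{\E(X^{n+1})}{(n+1)\E(X)} \\ &&&= \frac{c^{n+1}}{(n+1)c} \\ &&&= \frac{c^n}{n+1} \\ &&&= \frac{1}{c}\int_{0}^{c} z^n \,d\lambda(z) \end{align*}\]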

Pareto distribution

If $X$ has a Pareto distribution with shape parameter $\alpha > 1$ and scale parameter $x_m > 0$, then the density of $Z$ is given by:

\[\begin{align*} \forall z \in \R^{+}, && f_Z(z) &= \frac{S_X(z)}{\E(X)} \\ &&&= \frac{1}{\E(X)}\left(\mathbf{1}_{z < x_m} + \mathbf{1}_{z \geq x_m} \left(\frac{x_m}{z}\right)^{\alpha}\right) \\ &&&= \frac{\alpha - 1}{\alpha x_m}\left(\mathbf{1}_{z < x_m} + \mathbf{1}_{z \geq x_m} \left(\frac{x_m}{z}\right)^{\alpha}\right) \\ &&&= \frac{\alpha - 1}{\alpha}\mathbf{1}_{z < x_m} \frac{1}{x_m} + \mathbf{1}_{z \geq x_m} \frac{\alpha - 1}{\alpha x_m} \left(\frac{x_m}{z}\right)^{\alpha} \\ &&&=\left(1 - \frac{1}{\alpha}\right)\mathbf{1}_{z < x_m} \frac{1}{x_m} + \frac{1}{\alpha}\mathbf{1}_{z \geq x_m} \frac{(\alpha - 1)x_m^{\alpha - 1}}{z^\alpha} \\ \end{align*}\]

Which is the density of a mixture of a uniform distribution on $[0, x_m]$ with weight $1 - \frac{1}{\alpha}$ and a Pareto distribution with shape parameter $\alpha - 1$ and scale parameter $x_m$ with weight $\frac{1}{\alpha}$.
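The mixture representation can be checked numerically. The sketch below is an illustration under stated assumptions: the parameter values are hypothetical, and it relies on the fact that the VWT of a Pareto($\alpha$, $x_m$) distribution is a Pareto($\alpha - 1$, $x_m$) distribution, which follows from multiplying the Pareto density by $y$ and renormalising.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, x_m = 3.0, 2.0  # hypothetical parameters, with alpha > 1
n = 200_000

# Route 1: Z = U * Y with Y ~ Pareto(alpha - 1, x_m), the VWT of Pareto(alpha, x_m).
# Inverse-CDF sampling: Y = x_m * V^(-1 / (alpha - 1)) with V uniform on (0, 1].
v = 1.0 - rng.random(n)
y = x_m * v ** (-1.0 / (alpha - 1.0))
z_direct = rng.random(n) * y

# Route 2: the mixture read off the density formula.
# Uniform([0, x_m]) with weight 1 - 1/alpha, Pareto(alpha - 1, x_m) with weight 1/alpha.
pick_uniform = rng.random(n) < 1.0 - 1.0 / alpha
w = 1.0 - rng.random(n)
pareto_part = x_m * w ** (-1.0 / (alpha - 1.0))
z_mixture = np.where(pick_uniform, rng.uniform(0.0, x_m, n), pareto_part)

# Both routes should put mass 1 - 1/alpha below x_m and share the same median.
frac_direct = np.mean(z_direct < x_m)
frac_mixture = np.mean(z_mixture < x_m)
median_direct = np.median(z_direct)
median_mixture = np.median(z_mixture)
print(frac_direct, frac_mixture, median_direct, median_mixture)
```

Quantile-based summaries are used here instead of means because the Pareto component is heavy-tailed, which makes sample means slow to converge.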

Exponential distribution

If $X$ has an exponential distribution with rate parameter $\lambda > 0$, then the density of $Z$ is given by:

\[\begin{align*} \forall z \in \R^{+}, && f_Z(z) &= \frac{S_X(z)}{\E(X)} \\ &&&= \lambda e^{-\lambda z} \end{align*}\]

Which is the density of an exponential distribution with rate parameter $\lambda$. So the exponential distribution is a fixed point of the WTT.
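The characteristic function formula gives the same conclusion. As a worked check (added for illustration), using $\phi_X(t) = \frac{\lambda}{\lambda - it}$ and $\E(X) = \frac{1}{\lambda}$, we get for $t \neq 0$:

\[\begin{align*} \phi_Z(t) &= \frac{\phi_X(t) - 1}{it\E(X)} \\ &= \frac{\lambda}{it} \cdot \frac{it}{\lambda - it} \\ &= \frac{\lambda}{\lambda - it} \\ &= \phi_X(t) \end{align*}\]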

Uniform distribution

If $X$ has a uniform distribution on $[0, c]$ for some $c > 0$, then the density of $Z$ is given by:

\[\begin{align*} \forall z \in [0, c], && f_Z(z) &= \frac{S_X(z)}{\E(X)} \\ &&&= \frac{1 - \frac{z}{c}}{\frac{c}{2}} \\ &&&= \frac{2}{c} \left(1 - \frac{z}{c}\right) \end{align*}\]

Which is the density of a Beta distribution with shape parameters 1 and 2 stretched to $[0, c]$. It is also the density of the minimum of two independent uniform random variables on $[0, c]$.
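The second characterisation can be verified in one line. Writing $U_1$ and $U_2$ for two independent uniform random variables on $[0, c]$ (notation introduced here for the check), we have for $z \in [0, c]$:

\[\begin{align*} P(\min(U_1, U_2) > z) &= P(U_1 > z)\,P(U_2 > z) \\ &= \left(1 - \frac{z}{c}\right)^2 \\ -\frac{d}{dz}\left(1 - \frac{z}{c}\right)^2 &= \frac{2}{c}\left(1 - \frac{z}{c}\right) \end{align*}\]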

Summary table

Distribution of $X$	Distribution of $Z$
Dirac($c$)	Uniform([0, $c$])
Pareto($\alpha$, $x_m$)	$\left(1 - \frac{1}{\alpha}\right)$ Uniform([0, $x_m$]) + $\frac{1}{\alpha}$ Pareto($\alpha - 1$, $x_m$)
Exponential($\lambda$)	Exponential($\lambda$)
Uniform([0, $c$])	Beta(1, 2) on [0, $c$]

Reversing the transform

The fact that $f_Z(z) = \frac{S_X(z)}{\E(X)}$ can be used to perform the WTT in reverse. If we assume that $X > 0$ almost surely, then $S_X(0) = 1$, so $f_Z(0) = \frac{1}{\E(X)}$ and $S_X(z) = \frac{f_Z(z)}{f_Z(0)}$. We can therefore recover $X$’s distribution from $Z$’s distribution by first estimating the density of $Z$, then using this formula to recover $X$’s survival function and therefore its distribution.

One useful property that allows us to check our assumptions is that the density of $Z$ must be non-increasing, since the survival function of $X$ is non-increasing. We could use this to add a constraint to our density estimation procedure (in which case a poor quality of fit would indicate that our assumptions are not satisfied) or to check the non-increasing property after performing an unconstrained density estimation.

In the first case, there exists a well-known estimator of a non-increasing density called the Grenander estimator (Grenander, 1956), which is the non-parametric maximum likelihood estimator of a non-increasing density. It is defined as the left derivative of the least concave majorant of the empirical distribution function of the data.

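One advantage of this estimator is that it is defined all the way down to $0$ and so provides an estimate of $f_Z(0)$, which we need to recover the constant $\E(X)$. The raw value at $0$ should be treated with caution, however: the Grenander estimator is known to be inconsistent at the boundary of the support, where it tends to overestimate the density.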
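The whole reversal procedure can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names `grenander` and `evaluate` are hypothetical, the test data are simulated from the uniform case above, and the density is read a short distance away from $0$ rather than at $0$ itself, which is a heuristic chosen for this sketch.

```python
import numpy as np


def grenander(samples):
    """Grenander estimate of a non-increasing density on [0, +inf).

    Assumes strictly positive samples. Returns (knots, heights): the estimate
    equals heights[k] on (knots[k], knots[k + 1]] and 0 beyond knots[-1].
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))  # x-coordinates of the ECDF vertices
    py = np.arange(n + 1) / n        # ECDF values at those vertices
    hull = [0]                       # vertex indices of the least concave majorant
    for i in range(1, n + 1):
        # Pop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (px[b] - px[a]) * (py[i] - py[a]) - (py[b] - py[a]) * (px[i] - px[a])
            if cross < 0:
                break
            hull.pop()
        hull.append(i)
    knots = px[hull]
    heights = np.diff(py[hull]) / np.diff(knots)  # slopes of the majorant
    return knots, heights


def evaluate(knots, heights, z):
    """Evaluate the step-function estimate at the points z."""
    z = np.asarray(z, dtype=float)
    idx = np.clip(np.searchsorted(knots, z, side="left") - 1, 0, heights.size - 1)
    return np.where(z > knots[-1], 0.0, heights[idx])


# Hypothetical check: X ~ Uniform([0, c]), so Z is the minimum of two uniforms.
rng = np.random.default_rng(0)
c, n = 10.0, 20_000
z_samples = np.minimum(rng.uniform(0.0, c, n), rng.uniform(0.0, c, n))

knots, heights = grenander(z_samples)
# Heuristic (an assumption of this sketch): read the density slightly away
# from 0 instead of using the raw value at 0.
z0 = z_samples.mean() * n ** (-1.0 / 3.0)
f0 = float(evaluate(knots, heights, z0))
mean_x_hat = 1.0 / f0                               # uses f_Z(0) = 1 / E(X)
grid = np.array([2.5, 5.0, 7.5])
survival_hat = evaluate(knots, heights, grid) / f0  # uses S_X(z) = f_Z(z) / f_Z(0)
print(mean_x_hat, survival_hat)
```

For this simulated example the true values are $\E(X) = 5$ and $S_X(z) = 1 - \frac{z}{10}$, so the printed estimates can be compared against them directly.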
