Let \(U_1, U_2, \dots\) be an infinite sequence of independent \(\mathrm{Uniform}[0, 1]\) random variables. Let \(N\) be the minimum index for which \[ U_1 + U_2 + \dots + U_N > 1 . \] What is the expected value \(\mathbf{E}\{N\}\)?
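Before the proof, here is a quick Monte Carlo sketch hinting at the answer (the helper name `sample_N` and the trial count are my own choices, not part of the problem):

```python
import random

def sample_N():
    """Number of Uniform[0, 1] draws needed for the running sum to exceed 1."""
    total, n = 0.0, 0
    while total <= 1:
        total += random.random()
        n += 1
    return n

trials = 10**6
print(sum(sample_N() for _ in range(trials)) / trials)  # hovers around 2.718... = e
```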

This problem was originally posed to me by some students of a course I was TA-ing, as part of a homework problem in another course they were taking. It didn’t take long for me to find a solution, but I was unhappy with how messy the proof was. At the time, my ex-labmate Xing Shi Cai was visiting my lab, and I relayed the problem to him. He cleaned up the messy bits and presented me with the slickest possible proof.

Firstly, note that \(N > i\) exactly when \(U_1 + \dots + U_i \le 1\), so by what is sometimes called the tail formula for expectations, we have that \[ \begin{aligned} \mathbf{E}\{N\} &= \sum_{i \ge 0} \mathbf{P}\{N > i\} \\ &= \sum_{i \ge 0} \mathbf{P}\{U_1 + \dots + U_i \le 1\} . \end{aligned} \] Rearranging, \[ \begin{aligned} \mathbf{P}\{U_1 + \dots + U_i \le 1\} &= \mathbf{P}\{U_1 + \dots + U_{i - 1} \le 1 - U_i\} \\ &= \mathbf{P}\{U_1 + \dots + U_{i - 1} \le U_i'\} , \end{aligned} \] where \(U_i' = 1 - U_i \sim \mathrm{Uniform}[0, 1]\), and \(U_i'\) is independent of \(U_1, \dots, U_{i - 1}\).
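As a sanity check on this identity, we can estimate both sides numerically; a minimal sketch, with arbitrary simulation parameters:

```python
import random

def sample_N():
    total, n = 0.0, 0
    while total <= 1:
        total += random.random()
        n += 1
    return n

trials = 10**5
samples = [sample_N() for _ in range(trials)]
for i in range(5):
    tail = sum(n > i for n in samples) / trials          # P{N > i}
    direct = sum(sum(random.random() for _ in range(i)) <= 1
                 for _ in range(trials)) / trials        # P{U_1 + ... + U_i <= 1}
    print(i, round(tail, 3), round(direct, 3))
```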

Consider the event \(\mathcal{E}_i\) that \[ U_i' = \max\{U_1, \dots, U_{i - 1}, U_i'\} . \]

Since \(U_1, \dots, U_{i - 1}, U_i'\) are independent and identically distributed (and ties occur with probability zero), each is equally likely to be the maximum, so \(\mathbf{P}\{\mathcal{E}_i\} = 1/i\). Moreover, conditionally upon \(\mathcal{E}_i\), the ratios \(U_1/U_i', \dots, U_{i - 1}/U_i'\) lie in the interval \([0, 1]\), and in fact each of these is independent and distributed as \(\mathrm{Uniform}[0, 1]\). On the other hand, conditionally upon \(\overline{\mathcal{E}_i}\), one of the other variables is larger than \(U_i'\), and in particular, \[ U_1 + \dots + U_{i - 1} > U_i' . \] Therefore, by the law of total probability, \[ \begin{aligned} \mathbf{P}\{U_1 + \dots + U_{i - 1} \le U_i'\} &= \mathbf{P}\{U_1 + \dots + U_{i - 1} \le U_i' \mid \mathcal{E}_i\} \mathbf{P}\{\mathcal{E}_i\} \\ &= \frac{1}{i} \mathbf{P}\left\{\frac{U_1}{U_i'} + \dots + \frac{U_{i - 1}}{U_i'} \le 1 \,\Big|\, \mathcal{E}_i\right\} \\ &= \frac{1}{i} \mathbf{P}\{U_1 + \dots + U_{i - 1} \le 1\} , \end{aligned} \] and upon iterating, we get the following very satisfying equation, \[ \mathbf{P}\{U_1 + \dots + U_i \le 1\} = \frac{1}{i!} . \] Substituting this into the formula for the expectation, we get the magical result \[ \mathbf{E}\{N\} = \sum_{i \ge 0} \frac{1}{i!} = e . \]
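Both ingredients of this step can also be checked by simulation. Here is a sketch for \(i = 4\) (the function name and trial count are mine): conditionally on \(\mathcal{E}_4\), the normalized sum should fall below \(1\) with probability \(\mathbf{P}\{U_1 + U_2 + U_3 \le 1\} = 1/6\).

```python
import random

def conditional_check(i=4, trials=10**6):
    """Estimate P{E_i} and P{U_1/U_i' + ... + U_{i-1}/U_i' <= 1 | E_i}."""
    occurs = within = 0
    for _ in range(trials):
        us = [random.random() for _ in range(i)]  # U_1, ..., U_{i-1}, U_i'
        if us[-1] == max(us):                     # the event E_i
            occurs += 1
            if sum(u / us[-1] for u in us[:-1]) <= 1:
                within += 1
    return occurs / trials, within / occurs

print(conditional_check())  # roughly (0.25, 0.1667), i.e. (1/4, 1/6)
```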

Incidentally, we’ve derived the entire distribution above, not just the expectation, \[ \mathbf{P}\{N > i\} = \frac{1}{i!} , \qquad \mathbf{P}\{N = i\} = \mathbf{P}\{N > i - 1\} - \mathbf{P}\{N > i\} = \frac{i - 1}{i!} . \]
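A short sketch comparing the empirical distribution of \(N\) against this formula (again, trial counts and names are my own):

```python
import math, random
from collections import Counter

def sample_N():
    total, n = 0.0, 0
    while total <= 1:
        total += random.random()
        n += 1
    return n

trials = 10**6
counts = Counter(sample_N() for _ in range(trials))
for i in range(2, 7):
    print(i, round(counts[i] / trials, 4), round((i - 1) / math.factorial(i), 4))
```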

Consider also the value of the first uniform sum exceeding \(1\), \[ S_N = U_1 + \dots + U_N . \] Deterministically, this quantity must be between \(1\) and \(2\). Intuitively, its expected value should be at most \(3/2\): the partial sum just before the final step is at most \(1\), and the last uniform adds only \(1/2\) on average.

Since \(N\) is a stopping time, by Wald’s lemma, we can easily nail the expectation of \(S_N\), \[ \mathbf{E}\{S_N\} = \mathbf{E}\{N\} \, \mathbf{E}\{U_1\} = e/2 \approx 1.359, \] which is indeed less than \(3/2\).
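Finally, a minimal simulation of \(S_N\) itself, consistent with both the deterministic bounds and Wald’s value (the helper name is mine):

```python
import random

def sample_S_N():
    """Run the partial sums until they first exceed 1; return that first sum S_N."""
    total = 0.0
    while total <= 1:
        total += random.random()
    return total

trials = 10**6
values = [sample_S_N() for _ in range(trials)]
print(min(values), max(values))  # always strictly between 1 and 2
print(sum(values) / trials)      # ~ e/2 = 1.359...
```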