Franklin Pezzuti Dyer

Some collected cool stats about the Wiener process

In this post I'd like to show off some fun calculations quantifying interesting properties of the Wiener process, which is probably the simplest continuous-time random walk.

I'm not going to talk about the low-level definition of a Wiener process and I'll take for granted a lot of its basic properties. These include the four properties from its Wikipedia page that uniquely characterize a Wiener process:

  1. $W_0 = 0$ almost surely
  2. $W$ has independent increments, so that $W_{t+\Delta t} - W_t$ is independent of $W_s$ for $s\in [0, t]$
  3. $W$ has Gaussian increments, so that $W_{t+\Delta t} - W_t \sim \mathcal{N}(0,\Delta t)$
  4. $W_t$ is almost surely a continuous function of $t$

Another property of the Wiener process that I won't prove is the formula for its quadratic variation: over an interval $[0,t]$, the quadratic variation of $W$ is almost surely equal to $t$.

Here are the highlights:

Interpolating between points on the Wiener process

A natural question: given the values of $W_s$ and $W_t$ for some $t > s > 0$, what distribution of values should we expect at the intermediate points in the interval $(s,t)$?

We can parametrize the most general version of this situation as follows. Let the values of the Wiener process at time $t$ and at time $t + \Delta t_1 + \Delta t_2$ be given, and let time $t + \Delta t_1$ be the time of interest. Rather than talking about the value of the Wiener process in absolute terms, let's consider the differences between its values at these points, letting $\Delta w_1$ and $\Delta w_2$ be the respective changes in the Wiener process over $[t,t+\Delta t_1]$ and $[t+\Delta t_1,t+\Delta t_1 + \Delta t_2]$. This means $\Delta w = \Delta w_1 + \Delta w_2$ is the total change over the interval $[t,t+\Delta t_1 + \Delta t_2]$.

We know that $\Delta w_1 \sim \mathcal{N}(0,\Delta t_1)$ and $\Delta w_2 \sim \mathcal{N}(0, \Delta t_2)$. If the values of $W_t$ and $W_{t+\Delta t_1 + \Delta t_2}$ are given, then their difference is $\Delta w_1 + \Delta w_2$. The value $W_{t+\Delta t_1}$ at the time of interest is simply $W_t + \Delta w_1$. So we can reframe this problem as finding the conditional distribution of $\Delta w_1$ given $\Delta w_1 + \Delta w_2$, i.e. given $\Delta w$.

The joint probability density function of $\Delta w_1$ and $\Delta w_2$ is just the product of their respective density functions, since they're independent:

$$f(\Delta w_1, \Delta w_2) = \frac{1}{2\pi\sqrt{\Delta t_1 \Delta t_2}}\exp\left(-\frac{\Delta w_1^2}{2\Delta t_1} - \frac{\Delta w_2^2}{2\Delta t_2}\right)$$

Performing a coordinate transformation, we can obtain the joint density function for $\Delta w_1$ and $\Delta w$. Note that it is not separable, since these two variables aren't independent:

$$f(\Delta w_1, \Delta w) = \frac{1}{2\pi\sqrt{\Delta t_1 \Delta t_2}}\exp\left(-\frac{\Delta w_1^2}{2\Delta t_1} - \frac{(\Delta w - \Delta w_1)^2}{2\Delta t_2}\right)$$

This allows us to calculate the conditional density of $\Delta w_1$ as follows:

$$f(\Delta w_1 \mid \Delta w) = \frac{f(\Delta w_1, \Delta w)}{f(\Delta w)} = \frac{1}{\sqrt{2\pi\left(\frac{1}{\Delta t_1} + \frac{1}{\Delta t_2}\right)^{-1}}}\exp\left(-\frac{\left(\Delta w_1 - \frac{\Delta t_1}{\Delta t}\Delta w\right)^2}{2\left(\frac{1}{\Delta t_1} + \frac{1}{\Delta t_2}\right)^{-1}}\right)$$

The reason I've written the result in this cumbersome form with all of the messy fractions is to make it clear that this is also a Gaussian distribution in $\Delta w_1$. From how I've written it, you can read off the mean of $\Delta w_1$ as $\Delta t_1 \Delta w/\Delta t$ and its variance as the harmonic sum (see harmonic mean) of $\Delta t_1$ and $\Delta t_2$.

These mean and variance values make intuitive sense, by the way! The mean of $(\Delta t_1/\Delta t) \Delta w$ tells us that the average value of the walk approaches its value at the rightmost endpoint of the interval in direct proportion to how far through the interval it is. For example, halfway between the left and right endpoints, its expectation is just the average of the values at the left and right endpoints. Note also that close to the endpoints, the variance is very small (as we would expect, since the walk has "less time" to deviate from the given fixed values), and in the middle of the interval it reaches its maximum value of a quarter of the interval's length.

This calculation is very useful if you ever want to simulate a Wiener process in code. The most naive way of implementing a Wiener process would be to generate its values on a fine mesh by repeatedly adding up normally distributed random variables with small variances. But this approach has a big drawback, namely that you have to specify the mesh size in advance. Using this conditional distribution, you can implement a "lazy" Wiener process that only evaluates values on-demand, interpolating between previously-generated values for consistency.
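As an illustration, here is a minimal sketch of such a lazy implementation in Python. The use of NumPy, the class name, and the interface are incidental choices for this example, not anything canonical; it just applies the conditional (bridge) distribution derived above.

```python
import numpy as np

class LazyWienerProcess:
    """Sample W_t on demand while staying consistent with earlier samples.

    Times inside the range already sampled are drawn from the conditional
    (Brownian-bridge) distribution derived above; times beyond the last
    sample use an ordinary Gaussian increment. Assumes t >= 0.
    """

    def __init__(self, rng=None):
        self.rng = rng or np.random.default_rng()
        self.samples = {0.0: 0.0}  # W_0 = 0 almost surely

    def __call__(self, t):
        if t in self.samples:
            return self.samples[t]
        times = sorted(self.samples)
        if t > times[-1]:
            # Past the last known point: plain Gaussian increment.
            s = times[-1]
            value = self.samples[s] + self.rng.normal(0.0, np.sqrt(t - s))
        else:
            # Bridge between the nearest known points s < t < u.
            s = max(x for x in times if x < t)
            u = min(x for x in times if x > t)
            dt1, dt = t - s, u - s
            dw = self.samples[u] - self.samples[s]
            mean = self.samples[s] + (dt1 / dt) * dw
            var = dt1 * (dt - dt1) / dt  # harmonic sum of dt1 and dt2
            value = mean + self.rng.normal(0.0, np.sqrt(var))
        self.samples[t] = value
        return value

# Values can be requested in any order and remain mutually consistent.
W = LazyWienerProcess()
print(W(1.0), W(0.5), W(0.25), W(2.0))
```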

Time for the walk to reach a threshold

Another natural question about the Wiener process is "how long before $W_t$ gets bigger than $w$?" This question is about the "first hitting times" for a Wiener process.

There's a really elegant and ingenious way of deriving the distribution of the random variable $T_w$ describing the time at which $W_t$ first reaches the value $w$. The key insight is called the reflection principle. By the symmetry of a Wiener process, at any time after a specific time $s$, there is exactly a $50\%$ chance that the random walk will be above $W_s$, and a $50\%$ chance that it will be below $W_s$. That is, for any $0 < s < t$, we have that

$$\mathbb P(W_t > W_s) = \mathbb P(W_t < W_s) = \frac{1}{2}$$

This has a useful consequence: if the random walk has attained the value $w$ at some time in the interval $[0,t)$, then there is a $50\%$ chance that $W_t > w$ and an equal chance that $W_t < w$. Note also that because of the (almost certain) continuity of $W_t$, by the Intermediate Value Theorem, $W_t$ cannot exceed $w$ without first equalling $w$ at some point. Thus

$$\mathbb P(W_t > w) = \frac{1}{2}\,\mathbb P(T_w < t)$$

implying that the probability of the random walk reaching $w$ somewhere during the interval $(0,t)$ is given by

$$\mathbb P(T_w < t) = 2\,\mathbb P(W_t > w) = \mathrm{erfc}\left(\frac{w}{\sqrt{2t}}\right)$$

where the last equality uses the fact that $W_t \sim \mathcal N(0,t)$.

Now let us return to our original question, which is not about the likelihood of reaching $w$ within a certain interval, but rather the distribution of the hitting time for the value $w$. These questions are closely related: the probability that the hitting time $T_w$ lies in the interval $(t, t+\epsilon)$ is precisely the probability that $w$ is achieved within $(0,t+\epsilon)$ but not within $(0,t)$. That is:

$$\mathbb P\big(T_w \in (t, t+\epsilon)\big) = \mathrm{erfc}\left(\frac{w}{\sqrt{2(t+\epsilon)}}\right) - \mathrm{erfc}\left(\frac{w}{\sqrt{2t}}\right)$$

Hence, the probability density function $f_w(t)$ for $T_w$ is given by

$$f_w(t) = \frac{\partial}{\partial t}\,\mathrm{erfc}\left(\frac{w}{\sqrt{2t}}\right) = \frac{w}{\sqrt{2\pi t^3}}\,e^{-w^2/2t}$$

This kind of distribution is called the Lévy distribution. It's interesting to note that this is a "heavy-tailed" distribution, and its mean diverges, so it doesn't make sense to talk about the "average hitting time" for a value $w$.
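If you want to sanity-check the hitting probability numerically, here's a rough Monte Carlo sketch in Python (the values of $w$, $t$, the grid size, and so on are arbitrary choices for the example). Because the paths are only simulated on a discrete grid, some crossings between grid points are missed, so the simulated value should come out slightly below the $\mathrm{erfc}$ prediction.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
w, t_max = 1.0, 4.0
n_paths, n_steps = 20000, 4000
dt = t_max / n_steps

# Simulate many discretized paths and check whether each reaches w by t_max.
increments = rng.normal(0.0, sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)
hit_fraction = (paths.max(axis=1) >= w).mean()

print("simulated P(T_w < t_max):", hit_fraction)
print("predicted P(T_w < t_max):", erfc(w / sqrt(2 * t_max)))
```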

Omitting zeroes on an interval

Using what we've calculated in the previous section, we're prepared to answer an interesting question about the zeroes of a Wiener process $W_t$: what is the probability that it will have no zeroes on the interval $[a,b]$, where $0 < a < b$?

Given the value $W_a = w$, this question is easy to answer. By symmetry and independent increments, the probability of the random walk failing to hit zero within $(b-a)$ time units starting from $w$ is equal to the probability of a Wiener random walk failing to surpass $|w|$ within $(b-a)$ time units when starting from a value of $0$. And we have already seen that this probability is equal to

$$1 - \mathrm{erfc}\left(\frac{|w|}{\sqrt{2(b-a)}}\right) = \mathrm{erf}\left(\frac{|w|}{\sqrt{2(b-a)}}\right)$$

But in the question we're asking we are not given the value of $W_a$. We'll have to integrate this expression over all possible values of $W_a$, weighted by their respective probabilities (recall that $W_a \sim \mathcal N(0,a)$):

$$\int_{-\infty}^{\infty} \frac{e^{-w^2/2a}}{\sqrt{2\pi a}}\,\mathrm{erf}\left(\frac{|w|}{\sqrt{2(b-a)}}\right)\mathrm dw$$

To evaluate this, let's use the technique of Feynman Integration. If we differentiate under the integral sign with respect to the parameter $b$, the integrand takes on a more familiar form:

$$\frac{\partial}{\partial b}\int_{-\infty}^{\infty} \frac{e^{-w^2/2a}}{\sqrt{2\pi a}}\,\mathrm{erf}\left(\frac{|w|}{\sqrt{2(b-a)}}\right)\mathrm dw = -\int_{-\infty}^{\infty} \frac{|w|}{2\pi\sqrt{a}\,(b-a)^{3/2}}\,\exp\left(-\frac{w^2 b}{2a(b-a)}\right)\mathrm dw$$

The integrand here is antidifferentiable, so the integral can be readily evaluated as

$$-\frac{1}{2\pi\sqrt{a}\,(b-a)^{3/2}}\cdot\frac{2a(b-a)}{b} = -\frac{\sqrt{a}}{\pi b\sqrt{b-a}}$$

Now, integrating with respect to $b$ and choosing a constant of integration such that the probability is one when $a=b$ (since the probability of $W_t$ having zeroes on $[a,a]$ vanishes), we obtain the final result:

$$1 - \frac{2}{\pi}\arctan\sqrt{\frac{b-a}{a}} = \frac{2}{\pi}\arcsin\sqrt{\frac{a}{b}}$$

There are a couple of "cute" things to notice about this. One of them is that this probability only depends on the ratio $b/a$, a fact which can actually be derived, without figuring out an exact formula for the probability, using the scaling properties of Wiener processes. Another cute fact is that on any interval $[a,2a]$ there is exactly a $50\%$ chance of zeroes occurring!
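For a quick numerical check of this formula, here's a rough Monte Carlo sketch on the interval $[a,b]=[1,2]$, where the predicted probability of no zeroes is exactly $1/2$ (the parameters are arbitrary choices for the example, and the discrete grid misses some crossings, so the estimate will come out slightly high):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 2.0
n_paths, n_steps = 20000, 2000

# Sample W_a ~ N(0, a), then simulate the path on [a, b] on a fine grid.
w_a = rng.normal(0.0, np.sqrt(a), size=(n_paths, 1))
increments = rng.normal(0.0, np.sqrt((b - a) / n_steps), size=(n_paths, n_steps))
paths = np.concatenate([w_a, w_a + np.cumsum(increments, axis=1)], axis=1)

# "No zero" is approximated by "no sign change on the grid".
signs = np.sign(paths)
no_zero = np.all(signs == signs[:, :1], axis=1)

print("simulated P(no zero on [a,b]):", no_zero.mean())
print("predicted (2/pi) arcsin(sqrt(a/b)):", 2 / np.pi * np.arcsin(np.sqrt(a / b)))
```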

First and last zeroes

There's a collection of interesting arcsine laws relating to the behaviors of the zeros and maxima of a Wiener process. They say that:

  1. The measure of the set $\{t\in [0,b] ~ | ~ W_t > 0\}$ is arcsine-distributed
  2. The last zero of $W_t$ on $[0,b]$ follows an arcsine distribution
  3. The value of $t\in [0,b]$ for which $W_t$ is greatest follows an arcsine distribution

The second law can be derived readily from our previous computation of the probability that $W_t$ is zero-free on $[a,b]$. Let's call this probability $p(a,b)$ for now. The probability that the last zero of $W_t$ on $[0,b]$ lies in the small interval $[a,a+\epsilon)$ is equal to $p(a+\epsilon,b) - p(a,b)$, that is, the probability that $W_t$ is zero-free on $[a+\epsilon, b]$ but not on $[a,b]$. Thus, the density function of the last zero of $W_t$ on $[0,b]$ is given by

$$\frac{\partial p}{\partial a}(a,b) = \frac{1}{\pi\sqrt{a(b-a)}}$$

This function has an elementary antiderivative as a function of $a$, so we can integrate to find a cumulative distribution function for the random variable $Z_b$ describing the last zero of $W_t$ on the interval $[0,b]$:

$$\mathbb P(Z_b \le a) = \int_0^a \frac{\mathrm ds}{\pi\sqrt{s(b-s)}} = \frac{2}{\pi}\arcsin\sqrt{\frac{a}{b}}$$

Hence why this law is dubbed an "arcsine law". The third arcsine law, describing the argument of the maximum value attained by $W_t$ on $[0,b]$, can be deduced from this statistic using a more advanced version of the reflection principle (check the Wikipedia page for a sketch). And I still haven't figured out how to prove the first law!
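The second law is also easy to check by simulation. The sketch below (a grid-based illustration with arbitrary parameters, so the last zero is only located to within one time step) estimates the CDF of the last zero on $[0,1]$ at a few points and compares it with $\frac{2}{\pi}\arcsin\sqrt{a/b}$:

```python
import numpy as np

rng = np.random.default_rng(2)
b, n_paths, n_steps = 1.0, 20000, 4000
dt = b / n_steps
t_grid = np.linspace(dt, b, n_steps)   # grid points dt, 2*dt, ..., b

increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

# Approximate the last zero by the left endpoint of the last sign change.
signs = np.sign(paths)
crossed = signs[:, :-1] != signs[:, 1:]
any_cross = crossed.any(axis=1)
last_idx = np.where(any_cross, n_steps - 2 - np.argmax(crossed[:, ::-1], axis=1), 0)
last_zero = t_grid[last_idx]

for a in (0.25, 0.5, 0.75):
    empirical = (last_zero <= a).mean()
    predicted = 2 / np.pi * np.arcsin(np.sqrt(a / b))
    print(a, empirical, predicted)
```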

Naturally, you might also ask about the smallest zero of the Wiener process $W_t$. Of course, the smallest nonnegative zero is at $t=0$, so to make this question nontrivial, we would have to ask about the smallest strictly positive zero.

But it turns out that there (almost certainly) is no smallest positive zero! To be exact: with probability one, there are infinitely many zeroes of $W_t$ in any interval $[0, b]$, no matter how small $b>0$ is.

This is quite easy to prove using a fact that we proved in the last section. We showed that on any interval $[a,b]$ with $0 < a < b$, the probability of $W_t$ having no zeroes on that interval is

$$\frac{2}{\pi}\arcsin\sqrt{\frac{a}{b}}$$

Of course, for any $b > a > 0$, the probability of $W_t$ having no zeroes on $(0,b]$ is upper-bounded by the probability of it having no zeroes on $[a,b]$. Thus we have that the probability of $W_t$ having no zeroes on $(0,b]$ is at most

$$\frac{2}{\pi}\arcsin\sqrt{\frac{a}{b}}$$

for every $a\in (0,b)$. This bound tends to $0$ as $a\to 0^+$, hence the probability of $W_t$ omitting zeroes on $(0,b]$ is precisely zero! And since this holds for every $b > 0$, applying it along a sequence $b_n\to 0^+$ shows that, almost surely, $W_t$ has infinitely many zeroes accumulating at $t=0$.
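Just to see the bound in action numerically, here is a tiny sketch (with arbitrary sample values of $a$ and $b=1$) showing how quickly it shrinks; for small $a$ it behaves like $\frac{2}{\pi}\sqrt{a}$:

```python
import numpy as np

# Bound on P(no zeroes on (0, 1]) for several small values of a,
# together with its small-a approximation (2/pi) * sqrt(a).
for a in (1e-2, 1e-4, 1e-6, 1e-8):
    print(a, 2 / np.pi * np.arcsin(np.sqrt(a)), 2 / np.pi * np.sqrt(a))
```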

Convergence rate of Riemann sums

Finally, one more question: when we use Riemann sums to approximate integrals of functions involving Wiener processes, how quickly can we expect them to converge?

For context, when we approximate the integral of a continuously differentiable function over a closed, bounded interval using upper, lower, left or right Riemann sums with $n$ subintervals, we can expect that our approximation will converge to the true value with an error shrinking at a rate of at least $\mathcal O(1/n)$. But with a function that is not continuously differentiable, there is no such guarantee. Mere continuity is enough to guarantee convergence of the Riemann sums, but not necessarily at a rate bounded by $\mathcal O(1/n)$. A classic example is the Weierstrass function, which is continuous but nowhere differentiable. The convergence of its Riemann sums is strictly slower than $\mathcal O(1/n)$, but it is Hölder continuous with exponent $1/2$, which means its Riemann sums can be proven to converge at a rate of at least $\mathcal O(1/\sqrt{n})$.

According to the Wikipedia page, a Wiener process is almost surely Hölder continuous with exponent $1/2 - \epsilon$ for any $\epsilon > 0$. (We won't prove this.) Naively, then, we can at least expect that the Riemann sums of a Wiener process would converge at a rate of at least $\mathcal O(1/\sqrt{n^{1-\epsilon}})$ for any $\epsilon > 0$. Similarly, if $f(x,y)$ is a smooth function $\mathbb R^2\to \mathbb R$, we can expect the Riemann sums for the integral $\int_0^1 f(t, W_t)\,\mathrm dt$ to converge at this rate as well. But can we hope for anything better?

Surprisingly, the answer is yes! We can actually say that the errors will converge at a rate of $\mathcal O(1/n^{1-\epsilon})$ almost surely (with probability one) for any $\epsilon > 0$, which is nearly as good as the Riemann sum convergence rate for a continuously differentiable function. (Actually, we can do a bit better than this, as we will see.) The "spikiness" of the Wiener process makes this argument difficult, but its random up-and-down fluctuations save the day, allowing us to argue that positive and negative errors will "cancel out" to a sufficient extent on average that we can guarantee the same convergence rate as we could with a continuously differentiable function.

To see why, let's consider left Riemann sums approximating the integral of $f(t,W_t)$ on $[0,1]$. By the smoothness of $f$, for small differences $\Delta t$, we have the following Taylor expansion about $t$:

$$f(t+\Delta t, W_{t+\Delta t}) = f(t, W_t) + \frac{\partial f}{\partial t}\,\Delta t + \frac{\partial f}{\partial W}\,\Delta W_t + \mathcal O\big(\Delta t^{3/2-\epsilon}\big)$$

where $\Delta W_t = W_{t+\Delta t} - W_t$, the partial derivatives are evaluated at $(t, W_t)$, and we have made use of the fact that $W_t$ is almost surely Hölder continuous with exponent $1/2-\epsilon$ to collapse the higher-order terms into the error term $\mathcal O(\Delta t^{3/2-\epsilon})$. This means that the error in the left Riemann sums can be bounded as follows:

$$\left|\int_0^1 f(t,W_t)\,\mathrm dt - \frac 1n\sum_{i=0}^{n-1} f\big(\tfrac in, W_{i/n}\big)\right| \le \frac{\lVert \partial f/\partial t\rVert_\infty}{n} + \frac 1n\left|\sum_{i=0}^{n-1} \frac{\partial f}{\partial W}\big(\tfrac in, W_{i/n}\big)\,\Delta W_{i/n}\right| + \mathcal O\big(n^{-3/2+\epsilon}\big)$$

where $\Delta W_{i/n} = W_{(i+1)/n} - W_{i/n}$ and $\lVert\cdot\rVert_\infty$ represents the supremum norm, so that $\lVert \partial f/\partial t\rVert_\infty$ is an upper bound on the partial derivative of $f$ with respect to $t$. In the second term of this upper bound, note that each $\Delta W_{i/n}$ is a normally distributed random variable with variance $1/n$, independent of $W_{i/n}$, hence each term $i$ of the sum has variance bounded by

$$\operatorname{Var}\left[\frac{\partial f}{\partial W}\big(\tfrac in, W_{i/n}\big)\,\Delta W_{i/n}\right] \le \frac 1n\,\Big\lVert \frac{\partial f}{\partial W}\Big\rVert_\infty^2$$

Summing all of these $n$ terms gives a normally distributed random variable with overall variance upper-bounded by $\lVert \partial f/\partial W\rVert_\infty^2$. Call this random variable $E_n$, so that we have

$$\left|\int_0^1 f(t,W_t)\,\mathrm dt - \frac 1n\sum_{i=0}^{n-1} f\big(\tfrac in, W_{i/n}\big)\right| \le \frac{\lVert \partial f/\partial t\rVert_\infty + |E_n|}{n} + \mathcal O\big(n^{-3/2+\epsilon}\big)$$

If $E_n$ is a sequence of normally distributed random variables with bounded variance, then it is not difficult to show that $E_n = \mathcal O(n^\epsilon)$ with probability one for any $\epsilon > 0$. With a bit more care, one can show that $E_n = \mathcal O(\sqrt{\log n})$ with probability one. This tells us that the errors in the left Riemann sums for this integral converge at a rate of $\mathcal O(\sqrt{\log n}/n)$ (or $\mathcal O(1/n^{1-\epsilon})$, if you prefer the weaker estimate) almost certainly.
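To see this rate empirically, here is a small Python experiment (the test function, grid sizes, and the use of the finest left sum as a stand-in for the true integral are all arbitrary choices for the sketch): it generates one fine-grained path and watches how the error of coarser left Riemann sums shrinks with $n$. The errors should drop roughly in proportion to $1/n$, much faster than $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda t, w: np.sin(w) + t * w     # an arbitrary smooth test function

N = 2 ** 20                            # fine reference grid on [0, 1]
t = np.linspace(0.0, 1.0, N + 1)
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), N))])
values = f(t, W)

reference = values[:-1].sum() / N      # left Riemann sum on the finest grid

for k in (4, 6, 8, 10, 12):
    n = 2 ** k
    step = N // n
    left_sum = values[:-1:step].sum() / n   # left Riemann sum on n subintervals
    print(n, abs(left_sum - reference))
```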

The key step in this proof is where we used the fact that variance is additive when adding independent Gaussians together. This distinguishes $W_t$ from other functions that are merely Hölder continuous. If $w(t)$ is a general (deterministic) function that is Hölder continuous with exponent $\alpha < 1$, then the most we can say about the sum of the differences $\Delta w(i/n)$ weighted by the values of $\partial f/\partial W$ is that it is $\mathcal O(n^{1-\alpha})$, which for exponent $1/2$ is $\mathcal O(\sqrt{n})$. But the "independent increments" property of a Wiener process guarantees that these differences almost surely cancel to a sufficient extent that this sum is only $\mathcal O(\sqrt{\log n})$.

