*2018 October 20*

In this post, I will list some common and useful **discrete distributions**, or different types of probability mass functions often used for discrete random variables. I will, however, neglect the **discrete uniform distribution**, in which \(S\) is finite and each of its elements are equally probable. Some examples of variables distributed in this way are fair coin flips or die rolls.

The first distribution to discuss is the **Bernoulli Distribution**. We say that \(X\) follows the Bernoulli Distribution with parameter \(p\), or \(X\sim \text{Bern}(p)\), if \(X=1\) with probability \(p\) and \(X=0\) with probability \(1-p\). We would then say that \(X\) is determined by a **Bernoulli Trial**.

The mean \(\mathbb E[X]\) and the variance \(\sigma_X^2\) are very simple in this case. We have that

\[\mathbb E [X]=0\cdot (1-p)+1\cdot p=p\]

and

\[\sigma_X^2=\mathbb E[X^2]-\mathbb E^2[X]=p-p^2=p(1-p)\]

There is little more to say about the Bernoulli Distribution, except that it is a “building block” of sorts of some more complicated distributions.

Consider, for example, the **Binomial Distribution**. If \(X\) follows the Binomial Distribution with parameters \(n\) and \(p\), or \(X\sim\text{Bin}(n,p)\), then \(X\) represents the number of successes if \(n\) independent Bernoulli Trials, each with parameter \(p\), were conducted. That is, if \(X_1,...,X_n\sim \text{Bern}(p)\), then \(X=X_1+...+X_n\sim\text{Bin}(n,p)\). We may calculate the mean of \(X\) easily:

\[\mathbb E[X]=\mathbb E[X_1+...+X_n]=\mathbb E[X_1]+...+\mathbb E[X_n]=np\]

Since the variance \(\sigma_X ^2\) is also additive over independent variables, one may similarly calculate that

\[\sigma_X^2=np(1-p)\]

Furthermore, we have that the PMF of \(X\) is

\[p_X(x)=\binom{n}{x}p^x(1-p)^{n-x}\]

because there are \(\binom{n}{x}\) ways to choose the \(x\) successes, a probability \(p^x\) of all of them succeeding, and a probability \((1-p)^{n-x}\) of the other \(n-x\) trials failing.

Consider also the counterpart of this distribution: the **Negative Binomial Distribution**. If \(X\) follows the Negative Binomial Distribution with parameters \(r\) and \(p\), or \(X\sim\text{NB}(r,p)\), then \(X\) represents the number of successes occurring in a string of independent Bernoulli Trials before \(r\) failures occur. The calculations here require a bit more work. Combinatorially, one may see that the probability of having \(x\) successes and \(r\) failures is

\[p_X(x)=\binom{r+x-1}{x}p^x (1-p)^r\]

Thus, we have that the expected number of successes is

\[\mathbb E[X]=\sum_{x=0}^\infty \binom{r+x-1}{x} x p^x (1-p)^r\]

Alternatively, one may avoid computing the exact value of this series by noticing that by the definition of the Negative Binomial Distribution, a variable \(X\) distributed thusly with parameters \(r\) and \(p\) can also be written as a sum of independent Negative Binomial Variables each with parameters \(r=1\) and \(p\). This is because conducting Bernoulli Trials until \(r\) failures occur and then counting the number of successes is equivalent to conducting \(r\) sequences of Bernoulli Trials, each of which terminates after a single failure, and then adding up the successes in each sequence of trials. Thus, we have that if \(X\sim \text{NB}(r,p)\) and \(Y\sim \text{NB}(1,p)\),

\[\mathbb E[X]=r\mathbb E[Y]\]

We may compute the value of \(\mathbb E[Y]\) easily:

\[\mathbb E[Y]=\sum_{y=0}^\infty y p^y (1-p)=\frac{p}{1-p}\]

Thus, we have that if \(X\sim\text{NB}(r,p)\), then

\[\mathbb E[X]=\frac{rp}{1-p}\]

We may compute the variance similarly by using the additive property of \(\sigma_X^2\) discussed in the previous blog post for independent random variables. If \(Y\sim \text{NB}(1,p)\), then we have that

\[\begin{align}
\sigma_Y^2
&=\mathbb E[Y^2]-\mathbb E^2[Y]\\
&=-\frac{p^2}{(1-p)^2}+\sum_{y=0}^\infty y^2 p^y (1-p)\\
&=-\frac{p^2}{(1-p)^2}+\frac{p(1+p)}{(1-p)^2}\\
&=\frac{p}{(1-p)^2}
\end{align}\]

Thus, we also have that if \(X\sim \text{NB}(r,p)\), then

\[\sigma_X^2=\frac{rp}{(1-p)^2}\]

Next up is the **Poisson Distribution**, which is used to model the number random, independent events occurring within a fixed interval of time. This distribution is also related to the Binomial Distribution, with the difference residing in the fact that the Poisson Distributions counts the number of times an event occurs in a continuous interval of time, while the Binomial Distribution counts the number of times an event (namely, the success of a Bernoulli Trial) occurs over a fixed number of discrete trials (however, the Poisson Distribution is still a discrete distribution, since \(X\) takes on only nonnegative integer values). The Poisson Distribution only takes one parameter \(\lambda\), which represents the average number of events occurring in the interval of time. If \(X\) follows a Poisson Distribution with parameter \(\lambda\), then we say that \(x\sim\text{Pois}(\lambda)\).

To find the PMF of a random variable \(X\) with \(X\sim \text{Pois}(\lambda)\), we may take the limit of a sequence of random variables following the Binomial Distribution to make the transition from a discrete sequence of trials to a continuous time interval composed of infinitely many infinitesimal “moments.” Consider the sequence of random variables \(X_1,X_2,...\) with \(X_n\sim\text{Bin}(n,\lambda/n)\), each of which has mean \(\lambda\) as desired (which is apparent from the earlier calculation of the mean of a Binomially Distributed random variable). We may then let the PMF of \(X\) equal the limit of the sequence of PMFs for \(X_1,X_2,...\). By using the previously calculated PMF for the Binomial Distribution, we have that

\[\begin{align}
p_X(x)
&=\lim_{n\to\infty} p_{X_n}(x)\\
&=\lim_{n\to\infty} \binom{n}{x}\bigg(\frac{\lambda}{n}\bigg)^x\bigg(1-\frac{\lambda}{n}\bigg)^{n-x}\\
&=\lim_{n\to\infty} \frac{n!}{x!(n-x)!}\bigg(\frac{\lambda}{n}\bigg)^x\bigg(1-\frac{\lambda}{n}\bigg)^{n-x}\\
&=\frac{\lambda^x}{x!}\lim_{n\to\infty} \frac{n(n-1)...(n-x+1)}{n^x}\bigg(1-\frac{\lambda}{n}\bigg)^{n-x}\\
&=\frac{\lambda^x}{x!}\bigg(\lim_{n\to\infty} \frac{n(n-1)...(n-x+1)}{n^x}\bigg)\bigg(\lim_{n\to\infty}\bigg(1-\frac{\lambda}{n}\bigg)^{n-x}\bigg)\\
&=\frac{\lambda^x}{x!}\lim_{n\to\infty}\bigg(1-\frac{\lambda}{n}\bigg)^{n-x}\\
&=\frac{\lambda^x e^{-\lambda}}{x!}
\end{align}\]

and we are left with our final result:

\[p_X(x)=\frac{\lambda^x e^{-\lambda}}{x!}\]

Because the mean was fixed at \(\lambda\) for each of the random variables \(X_1,X_2,...\), we have that

\[\mathbb E[X]=\lambda\]

We may also calculate the variance \(\sigma_X^2\) by considering the limit of the variances of the sequence of random variables previously considered. We have that

\[\sigma_{X_n}^2=\lambda\bigg(1-\frac{\lambda}{n}\bigg)\]

and so, by taking the limit as \(n\to\infty\), we have that

\[\sigma_X^2=\lambda\]

Interestingly, the variance for a random variable of this distribution is the same as its mean!

Now we shall consider the **Geometric Distribution**. If a random variable \(X\) follows the Geometric Distribution with parameter \(p\), or \(X\sim\text{Geom}(p)\), then \(X\) represents the number of Bernoulli Trials conducted before a success occurs (including the final success). We can easily see that

\[p_X(x)=p(1-p)^{x-1}\]

We may also easily calculate the mean as

\[\mathbb E[X]=\sum_{x=1}^\infty xp(1-p)^{x-1}=\frac{1}{p}\]

and the variance as

\[\begin{align}
\sigma_X^2
&=\mathbb E[X^2]-\mathbb E^2 [X]\\
&=-\frac{1}{p^2}+\sum_{x=1}^\infty x^2 p(1-p)^{x-1}\\
&=-\frac{1}{p^2}+\frac{p^2-3p+2}{p^2(1-p)}\\
&=\frac{1-p}{p^2}\\
\end{align}\]

An interesting fact about this distribution is that, as observed in the previous blog post, the expected value of \((1-p)^{-x}\) diverges to infinity.

The last discrtete distribution that we shall briefly consider is the **Hypergeometric Distribution**. We say that a random variable \(X\) follows the Hypergeometric Distribution with parameters \(N\), \(R\), and \(n\), or \(X\sim\text{H}(N,R,n)\), if \(X\) is the number of “successes” chosen if \(n\) items are sampled from a population of \(N\) items of which \(R\) are considered “successes.” This is similar to the Binomial distribution as well, but the chance of success decreases as each success is chosen, because the “trials” are not independent, with samples being chosen from a population without replacement. We may find \(p_X(x)\) as follows: in a population of \(N\) items with \(R\) “success” items, the number of ways to choose \(x\) “successes” and \(n-x\) “failures” is \(\binom{R}{x}\binom{N-R}{n-x}\). Since the total number of ways to choosen \(n\) items is \(\binom{N}{n}\), we have that

\[p_X(x)=\frac{\binom{R}{x}\binom{N-R}{n-x}}{\binom{N}{n}}\]

The mean and variance of a random variable \(X\) distributed Hypergeometrically are given by

\[\mathbb E[X]=\frac{nR}{N}\]

\[\sigma_X^2=\frac{nR}{N}\bigg(1-\frac{R}{N}\bigg)\frac{N-n}{N-1}\]

Because the derivations of these two equations are elementary but algebraically tedious, they are conveniently left as exercises for the reader (which I am sure the reader shall not chose to complete).

That finishes my list of discrete distributions! To conclude this blog post, I shall pose two problems to the reader. The first problem is as follows:

Suppose that \(X_1,X_2,...,X_n\) is a sequence of identically distributed and independent random variables, and that you know \(X_i \sim\text{Geom}(p)\) for each \(X_i\), but you do not know the value of \(p\). If you are given the value of each \(X_i\), with what certainty or accuracy can you guess the value of \(\mathbb E[X]\)? As you take more and more samples, that is, as \(n\to\infty\), how does the “error” or “uncertainty” of your guess behave asymptotically?

The second question is the following:

Suppose that you play a game in which you flip a fair coin each turn, and you win one dollar if the coin comes up heads, and you lose one dollar otherwise (negative dollar amounts are allowed, and are interpreted as debt). If \(X_n\) represents your net gain in dollars after playing \(n\) times, then what is \(\mathbb E[X_n]\)? What are \(\mathbb E[X_n^2]\) and \(\mathbb E[X_n^4]\)? Asymptotically, how does the number of times that you “break even” behave (that is, about how many of the variables \(X_1,X_2,...,X_n\) would you expect to equal zero for large \(n\))?

Cheers!