These are random variables:

  1. Does a coin flip come up heads?
  2. In the Covid-19 vaccine trial, how many vaccinated individuals will become symptomatic with Covid-19? How many in the placebo control group will?
  3. How many days until a light bulb burns out?
  4. How many fish will you catch tomorrow?

Discrete random variables

Bernoulli random variables.

Coin flips have two possible outcomes, heads or tails. Let \(p\) be the probability that it comes up heads. The coin flip is a Bernoulli(\(p\)) random variable. A fair coin is Bernoulli(.5).

library(purrr)
rbernoulli(1, p=.5) # the 1 indicates that we are only sampling 1 bernoulli random variable. 
## [1] TRUE
rbernoulli(1, p=.5)
## [1] FALSE
rbernoulli(1, p=.5)
## [1] FALSE
rbernoulli(1, p=.5)
## [1] TRUE

Binomial random variables.

Suppose that there are 20,100 people that receive a vaccine and 20,000 people that receive the placebo. Then, whether or not an individual becomes symptomatic is a Bernoulli(\(p_{vaccine}\)) or Bernoulli(\(p_{placebo}\)) random variable, depending on whether they received the vaccine or the placebo. If we presume that everyone's infection status is independent (e.g. they don't infect each other), then the total number of vaccinated individuals that become symptomatic is Binomial(\(size = 20100, p = p_{vaccine}\)) and the number of individuals that received the placebo that become symptomatic is Binomial(\(size = 20000, p=p_{placebo}\)). As such, a Binomial random variable is "the sum" of independent Bernoulli random variables with the same \(p\); there is a small sketch of this after the code below.

p_vaccine = .001
p_placebo = .01
rbinom(1, size = 20100, p = p_vaccine) # the 1 indicates that we are only sampling 1 binomial random variable. 
## [1] 25
# if we repeat the experiment, we get different results:
rbinom(1, 20100, p_vaccine); rbinom(1, 20100, p_vaccine); rbinom(1, 20100, p_vaccine); rbinom(1, 20100, p_vaccine)
## [1] 23
## [1] 20
## [1] 15
## [1] 19
rbinom(1, 20000, p_placebo); rbinom(1, 20000, p_placebo); rbinom(1, 20000, p_placebo); rbinom(1, 20000, p_placebo)
## [1] 182
## [1] 191
## [1] 194
## [1] 217
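
To see the "sum of independent Bernoullis" description in action, here is a small sketch (it reuses p_vaccine from above; the result should look a lot like the rbinom draws):

# One draw from Binomial(20100, p_vaccine), built "by hand" as the sum of
# 20,100 independent Bernoulli(p_vaccine) random variables.
sum(rbernoulli(20100, p = p_vaccine))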

Geometric random variable.

One way to model the failure of a light bulb is to imagine that it is memoryless: the probability that it fails on any given day is independent of what has happened so far. This means that whether it fails on day zero is Bernoulli(\(p\)), where TRUE denotes failure. And, if it has not yet failed before day \(t\), then whether it fails on day \(t\) is again Bernoulli(\(p\)). Light bulb failures don't need to be memoryless, but this is a surprisingly powerful and simple modeling assumption for many kinds of "waiting times," i.e. how long until something happens. Under this memoryless model, the day of failure is the first day in a sequence of independent Bernoulli(\(p\))'s to come up TRUE. This is called Geometric(\(p\)).

p = .0001
rgeom(1,p); rgeom(1,p); rgeom(1,p); rgeom(1,p)
## [1] 938
## [1] 2234
## [1] 3332
## [1] 16427
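
If you want to see the memoryless mechanism itself, here is a small sketch (the function name days_until_failure is just made up for illustration); its output has the same distribution as rgeom(1, p):

# Check each day, starting at day 0, whether the bulb fails
# (TRUE with probability p), and return the first day it does.
days_until_failure = function(p){
  day = 0
  while(!rbernoulli(1, p = p)) day = day + 1
  day
}
days_until_failure(p)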

Poisson random variable

You are going fishing on Lake Monona tomorrow. How many fish are you going to catch?

Denote the total number of fish in the lake as \(N\), and suppose each fish, independently, gets caught with the same probability \(p\). If you expect to catch 5 fish, then it should hold that \(5 = Np\). Why is that? This means that \(p = 5/N\).

The total number of fish that you will catch is Binomial(\(N,p\)). For really big \(N\) ("infinite"), this is essentially the same as another distribution, Poisson(5).

rpois(1,5); rpois(1,5); rpois(1,5); rpois(1,5)
## [1] 8
## [1] 9
## [1] 7
## [1] 5
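
You can check the approximation yourself. Here is a small sketch with a made-up lake size of \(N = 10^6\); the Binomial draws should look just like the Poisson draws:

# With a large (made-up) N, draws from Binomial(N, 5/N) are essentially
# indistinguishable from draws from Poisson(5).
N = 10^6
rbinom(5, size = N, prob = 5/N)   # compare these...
rpois(5, 5)                       # ...to these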

Let’s get lots of random variables…

Instead of putting 1 in the first argument to the functions, put how many replicates of the random variable you want.

rbernoulli(11, p=.5)
##  [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
rbinom(11, 20100, p_vaccine)
##  [1] 27 19 19 23 20 23 33 23 23 18 17
rbinom(11, 20000, p_placebo)
##  [1] 187 182 189 169 184 196 175 196 217 204 186
rgeom(11,p)
##  [1]  2550 27196 15049  2532 21803 16564 10990  8511  4730  5388   800
rpois(11,5)
##  [1] 10  6  2  4  8  1  6  9  2  8  4

Notation

Let \(X\) denote a random variable. We write \(X \sim\) Bernoulli\((p)\) if \(X\) has the distribution Bernoulli(\(p\)).

\[ X \sim Bernoulli(p)\] \[ X \sim Binomial(size, p)\] \[ X \sim Geometric(p)\] \[ X \sim Poisson(\lambda)\]

Other distributions

A number \(x\) is in the support of a random variable \(X\) if the probability that \(X=x\) is strictly greater than zero (i.e. \(P(X = x)>0\)). The support of a random variable \(X\) is the set of all such numbers \(x\).

The above doesn’t make much sense if you don’t pay attention to upper and lower case!

On this page, we have only discussed discrete random variables; this means that the support of each random variable has been a discrete set (e.g. the counting numbers).

There are lots of other "named distributions" for discrete random variables.

A distribution doesn't need to have a name! To make a discrete distribution, you must identify the support, then assign numbers (i.e. "probabilities") to these values. To ensure it is a proper probability distribution, the numbers must (1) be non-negative and (2) sum to one over the entire support. If those things hold, then you have a probability distribution!
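
For example, here is a small sketch of a no-name distribution (the support and probabilities are made up): support \(\{1, 2, 10\}\) with probabilities .2, .5, .3. They are non-negative and sum to one, so this is a perfectly good distribution, and sample() can simulate from it:

# A made-up discrete distribution: support {1, 2, 10} with probabilities
# .2, .5, .3. Non-negative and summing to one, so it is a valid distribution.
support = c(1, 2, 10)
probs   = c(.2, .5, .3)
sample(support, size = 11, replace = TRUE, prob = probs)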

Continuous random variables

The random variables above are all “discrete random variables” because \(X\) always had a “discrete set of possible outcomes” (e.g. \(\{0,1,2,...\}\)). This section covers “continuous random variables”. These random variables are a little trickier.

Normal or Gaussian random variables

This is probably the most well-known distribution. It "looks like this":

Here is one sample:

rnorm(1)
## [1] -1.820305

Here are 10 samples:

rnorm(10)
##  [1]  1.77112178  0.65419963  0.84909582 -0.90090371  0.56023938 -0.12526369
##  [7]  0.06027276  0.03943334  0.59796321 -0.45016766

The plot is showing the “density” of the standard normal distribution (mean zero, variance one). This density is the function \[f(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}. \] We write \(X \sim N(0,1)\) if \(X\) has that distribution; the zero and the one represent the mean and variance. If you simulate a million \(N(0,1)\) random variables and make their histogram, it looks like the density:
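
Here is a sketch of how you could make that picture yourself (one million draws and 100 histogram bins are just choices):

# Simulate a million N(0,1) random variables, histogram them on the density
# scale, and overlay the density f(x) = exp(-x^2/2)/sqrt(2*pi).
x = rnorm(10^6)
hist(x, breaks = 100, freq = FALSE)
curve(exp(-x^2/2)/sqrt(2*pi), add = TRUE, col = "red")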

This distribution comes up a lot in classical statistics because when you average a lot of random variables, then subtract the mean and divide by the standard deviation of that average, you get a random variable whose distribution is approximately \(N(0,1)\). It is very handy, because we can make \(Z\)-tables that give us numbers like 1.96 to make confidence intervals and hypothesis tests (perhaps you've heard of these things and also have bad dreams about the number 1.96 like I do!).
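
Here is a small sketch of that averaging phenomenon (the choice of Exponential(1) variables, 100 per average, and 10,000 replicates is all made up); the histogram should look like the standard normal density:

# Each replicate averages 100 Exponential(1) draws (mean 1, sd 1), then
# standardizes that average: subtract its mean (1) and divide by its
# standard deviation (1/sqrt(100)). The histogram looks bell-shaped.
standardized_averages = replicate(10^4, {
  x = rexp(100)
  (mean(x) - 1) / (1 / sqrt(100))
})
hist(standardized_averages, breaks = 50)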

However, in this class we will use Monte Carlo simulation (next chapter), a much more powerful computational tool, which means we don't have much use for this distribution! Wild! Right? No more bad dreams about 1.96!

Exponential distribution

This is a continuous distribution that is often used to model "waiting times". It is similar to the geometric distribution, except that it is continuous. Here are ten samples:

rexp(10)
##  [1] 0.6007976 2.0746924 2.2280653 2.7327622 3.3554383 0.7547070 2.5338734
##  [8] 0.2597949 0.5099658 0.1341623

Here is a histogram of one million:

x = rexp(10^6)
hist(x, breaks = 100)

There are lots of other named distributions.

Here is a list of a bunch of continuous distributions that have names.

We make more complicated models by mixing these simple distributions together.

In a lot of settings, we want to think about a random variable that is a function of multiple other random variables. This can get really complicated. Thankfully, the next section shows an easy way to handle these situations.

Social network models

In my research, I study social networks. In the most basic type of social network data, you get to see who is friends with whom, and that's it! In my favorite probability models for social networks, each pair of people \(i\) and \(j\) become friends, independently, with some probability \(p_{ij}\). Then, we apply all sorts of wild functions to these networks (e.g. compute eigenvectors, if you know what those are). It is really difficult to handle these situations without the Monte Carlo tools in the next chapter.
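
Here is a small sketch of that basic model with made-up numbers (5 people and a constant \(p_{ij} = .3\) for every pair); A[i,j] = 1 means that \(i\) and \(j\) are friends:

# Each pair (i,j) becomes friends independently with probability p_ij
# (here, a made-up constant .3 for every pair). The friendships are stored
# in a symmetric adjacency matrix A.
n = 5
p_ij = .3
A = matrix(0, n, n)
for(i in 1:(n-1)) for(j in (i+1):n){
  A[i,j] = A[j,i] = rbinom(1, size = 1, prob = p_ij)
}
A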

Random variables can be formed by taking a function of another random variable. This is what happens in the pricing of “options” in finance.

Binomial asset pricing model

One area of finance is “option pricing.” Suppose that \(X\) is the price of GME stock in one month from today. An “option” pays you \(f(X)\) for some function \(f\).

For example, suppose GME costs $120 today and the function is \[f(X) = \left\{\begin{array}{ll} 120-X & \mbox{if $X<120$} \\ 0 & \mbox{if $X \ge 120$.} \end{array}\right.\] This is often referred to as a put option. It is like giving you the option, in one month, to buy GME at whatever the price is then and sell it at today's price of $120. If the price goes up, you would not use that option and you would make zero (but not lose any money). Otherwise, you would make \(120-X\). In effect, you are betting that GME will go down.

Suppose you are a bank and someone wants to purchase this put option. You need to determine the price. What would be the fair price to charge them? We will refer back to this example in the next section on Monte Carlo. In that section, we will need a model for the asset price \(X\).

One of the simplest models for \(X\) is the Binomial Asset Pricing Model which says that at every time step (e.g. every minute), the price goes up by one penny with probability \(p\), or down by one penny with probability \(1-p\). In this example, both \(X\) and \(f(X)\) are random variables.
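
Here is a small sketch of one simulated "month" under this model (the number of time steps, \(p = .5\), and the $120 starting price are all just illustrative); it produces one draw of \(X\) and the corresponding put payoff \(f(X)\):

# Binomial Asset Pricing Model sketch: start at $120 and, at each of
# 30*24*60 one-minute steps, move up one penny with probability p_up,
# otherwise down one penny. Then evaluate the put payoff f(X) from above.
p_up  = .5
steps = 30*24*60                              # one (made-up) month of minutes
moves = ifelse(rbernoulli(steps, p = p_up), .01, -.01)
X = 120 + sum(moves)                          # the price in one month
f = function(X) ifelse(X < 120, 120 - X, 0)   # the put payoff
f(X)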

What might be some problems with the Binomial Asset Pricing model? Let’s critique it.

Election models

Will the D or R presidential candidate win Wisconsin in 2024? What is the distribution of this random variable \(W\), where \(W = -1\) if D and \(W = 1\) if R? What about Michigan \(M \in \{-1,1\}\)? What if we wanted to model both Wisconsin and Michigan together \((W,M)\)? If you knew that \(M = 1\), would this tell you anything about \(W\)? Why? How do we model this?

One thing you could do is model the proportion of votes for R (only considering the two main candidates) in Wisconsin. Perhaps you consider this to be \(W_p \sim N(1/2,.05^2)\), i.e. mean 1/2 and standard deviation .05. Then, \(W = 1\) if \(W_p > .5\) and \(W=-1\) if \(W_p<.5\). This helps because we can do the same thing for \(M\) and \(M_p\). Moreover, we can model \((W_p, M_p)\) as correlated via the multivariate normal distribution. To simulate these, you need to specify both the mean "vector" and a covariance matrix \(\Sigma \in \mathbb{R}^{2 \times 2}\)! The diagonal entries \(\Sigma_{1,1}\) and \(\Sigma_{2,2}\) specify the variances, and the off-diagonal entry \(\Sigma_{1,2}\) specifies the covariance (kinda like correlation).

library(MASS)
Mean_Vector = c(.5,.5)
Covariance_Matrix = matrix(
  c(.05^2,.04^2,.04^2,.05^2), 
  nrow = 2)
Covariance_Matrix
##        [,1]   [,2]
## [1,] 0.0025 0.0016
## [2,] 0.0016 0.0025
WMp = mvrnorm(n = 1000, 
              mu = Mean_Vector, 
              Sigma = Covariance_Matrix)
plot(WMp, xlab = "Wisconsin proportion", ylab = "Michigan proportion")
lines(c(.5,.5), c(-10,10), col = "red")
lines(c(-10,10),c(.5,.5), col = "red")

What region of this plot corresponds to \(W=-1\) and \(M=+1\)? Does it make sense that there are fewer points in the top left compared to the top right?