# 8 Random Numbers

In this lesson we learn about generating “random” numbers in R.

## 8.1 Generating “Random” Numbers

The word “random” is in quotes above because R, like any other software, cannot generate truly random numbers, although it gets very close. Software uses an algorithm to generate a sequence of numbers whose properties approximate those of a truly random sequence. Such an algorithm is called a pseudorandom number generator. Its output is not *truly* random because it is completely determined by a relatively small set of initial values, called the seed.
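A minimal sketch of what "completely determined by the seed" means in practice. It uses base R's `set.seed()`. The seed value 123 is an arbitrary choice for illustration.

```r
# Seeding the generator makes the "random" sequence reproducible.
set.seed(123)            # 123 is an arbitrary seed chosen for illustration
first_run <- runif(3)    # three pseudorandom numbers

set.seed(123)            # reset the generator to the same seed
second_run <- runif(3)   # the generator replays the same sequence

identical(first_run, second_run)  # TRUE: same seed, same numbers
```

Setting a seed at the top of a script is a common way to make simulations repeatable for other readers.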

### 8.1.1 Sampling from a vector

Below we will sample from several vectors, and emulate tossing a coin, rolling a die, playing the lottery, etc.

`[1] 0`

` [1] 0 0 1 1 0 0 1 0 1 0`

`[1] 2`

` [1] 1 6 1 6 1 1 1 4 2 1`

` [1] 7 15 1 12 6 5 3 7 4 3`

`[1] 22 2 34 5 43 6`

`[1] 12 49 50 15 7 40`

`[1] "King of Spades"`
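The code that produced the output above is not shown. The sketch below is one way such draws might be written with base R's `sample()`. The coding of the coin as 0 and 1, the lottery range of 1 to 50, and all variable names are assumptions made for illustration, and the results will differ from run to run.

```r
# Toss a coin once; coding tails as 0 and heads as 1 is an assumption
coin_once <- sample(c(0, 1), size = 1)

# Toss the coin 10 times; replace = TRUE lets the same face come up again
coin_ten <- sample(c(0, 1), size = 10, replace = TRUE)

# Roll a six-sided die once, then 10 times
die_once <- sample(1:6, size = 1)
die_ten  <- sample(1:6, size = 10, replace = TRUE)

# Draw 6 distinct lottery numbers (sampling without replacement is the default);
# the range 1 to 50 is an assumption
lottery <- sample(1:50, size = 6)

# Build a 52-card deck and draw one card from it
ranks <- c("Ace", 2:10, "Jack", "Queen", "King")
suits <- c("Clubs", "Diamonds", "Hearts", "Spades")
deck  <- paste(rep(ranks, times = 4), "of", rep(suits, each = 13))
card  <- sample(deck, size = 1)
```

The key design choice is the `replace` argument: coin tosses and die rolls can repeat, so they need `replace = TRUE`, while lottery numbers and cards are drawn without replacement.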

### 8.1.2 Concept: Randomness & Distributions

In computational statistics, and in R, random numbers are described by a distribution, which is a function specifying the probability that a random number falls within some range. If the random number is continuous, this function is called a “probability density function”; if the random number is discrete, the term is “probability mass function”.
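As a small illustration of the difference, the sketch below computes a probability from a density by integrating it over a range, and from a mass function by summing it over discrete values. It uses base R's `dnorm()`, `pnorm()`, `dpois()` and `ppois()`. The range from -1 to 1 and the rate `lambda = 3` are arbitrary choices for illustration.

```r
# Continuous case: integrate the standard normal density between -1 and 1
p_normal <- integrate(dnorm, lower = -1, upper = 1)$value

# The same probability from the cumulative distribution function
p_normal_cdf <- pnorm(1) - pnorm(-1)

# Discrete case: sum the Poisson mass function over the counts 0, 1 and 2
# (lambda = 3 is an arbitrary rate chosen for illustration)
p_poisson <- sum(dpois(0:2, lambda = 3))

# The same probability from the cumulative distribution function
p_poisson_cdf <- ppois(2, lambda = 3)

c(p_normal, p_normal_cdf)    # the two values agree up to numerical error
c(p_poisson, p_poisson_cdf)  # the two values agree up to numerical error
```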

If you want to learn more about probability density functions, you can run the code below in RStudio, which will produce a Shiny app displaying the probability density functions for the Normal, Poisson and Beta distributions. Here’s the link if you want the raw app

### 8.1.3 Sampling from a Distribution

How do you choose a random number in R? As a language for statistical analysis, R has a comprehensive library of functions for generating random numbers from various statistical distributions.

Distribution | R Function |
---|---|
Uniform | runif |
Normal | rnorm |
Student’s t | rt |
F | rf |
chi-squared | rchisq |
Exponential | rexp |
Log normal | rlnorm |
Beta | rbeta |
Binomial | rbinom |
Negative Binomial | rnbinom |
Poisson | rpois |
Gamma | rgamma |
Weibull | rweibull |
Cauchy | rcauchy |
Multinomial | rmultinom |
Geometric | rgeom |
?Distributions | full list |

**The Uniform Distribution**

If you want to generate a decimal number where any value (including fractional values) between a stated minimum and maximum is equally likely, the runif() function is what you are looking for. This function generates values from the Uniform distribution. Here’s how to generate one random number between 0 and 1:

`[1] 0.9159742`

Of course, when you run this, you’ll get a different number, but it will definitely be between 0 and 1. You won’t get the values 0 or 1 exactly, either.

To generate several values at once, specify the number of values you want as the first argument to runif(). Here’s how to generate 10 values between 0 and 1:

```
[1] 0.9945982 0.9423607 0.4861354 0.2834595 0.2515457 0.5032552 0.4969662
[8] 0.3184458 0.9622228 0.6340994
```
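A sketch of the calls described above, using base R's `runif(n, min, max)`. The variable names and the bounds 10 and 20 in the last line are illustrative assumptions, and your numbers will differ from the output shown.

```r
# One value between 0 and 1 (min = 0 and max = 1 are the defaults)
one_value <- runif(1)

# Ten values between 0 and 1
ten_values <- runif(10)

# Five values between 10 and 20 (bounds chosen only for illustration)
custom_range <- runif(5, min = 10, max = 20)
```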

**The Normal Distribution**

$$\varphi(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^{2}},\quad z\in\mathbb{R}$$

To generate numbers from a normal distribution, use rnorm().

`[1] -1.138608`

`[1] 1.367827179 1.329564791 0.336472797 0.006892838 -0.455468738`

`[1] 0.08299672`

`[1] 4.667343`

`[1] -0.4667063 9.2337872 2.4872527 1.6287116 14.1312977`

```
[1] 12.3502131 9.6674139 -17.3221952 9.0020941 19.7603173
[6] 14.1386892 19.1232216 29.8373220 21.6910851 4.9126298
[11] 17.0418018 8.0158373 4.6192921 -18.5575866 2.1035315
[16] 14.8781464 31.6803254 15.0069461 16.2021020 0.3409679
[21] 11.6265471 -10.7823754 14.8522682 16.9676878 11.8551392
[26] 17.0073352 13.1168103 17.6046236 28.4246363 21.1236284
[31] 10.3266396 -1.1444896 14.1805782 5.9976476 24.9349310
[36] -6.0708094 5.8424821 14.2200837 8.4826346 3.9384888
[41] 6.9527893 16.2953610 18.9517198 16.6021263 32.7348352
[46] 21.7349757 12.8770973 3.4022991 39.1914013 16.7741550
[51] 3.1567966 11.8649208 6.7560670 7.2529578 0.6649666
[56] 11.1684534 13.1916024 -0.7754212 -22.3315213 7.4512535
[61] 10.2951783 15.9427377 10.5913517 14.1339889 -0.9777217
[66] 17.1117526 17.1888873 12.5165107 23.5727444 14.0446847
[71] 12.6436427 12.6804390 14.3693058 20.6012391 14.5219040
[ reached getOption("max.print") -- omitted 25 entries ]
```
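The calls behind the output above are not shown. The sketch below shows how `rnorm(n, mean, sd)` might be used to get draws of this kind. Every mean and standard deviation here is a hypothetical value chosen for illustration, not the one used for the original output.

```r
# One draw from the standard normal (mean = 0 and sd = 1 are the defaults)
z_one <- rnorm(1)

# Five draws from the standard normal
z_five <- rnorm(5)

# One draw centred on 5 (hypothetical mean)
shifted <- rnorm(1, mean = 5)

# Five draws with a hypothetical mean of 5 and standard deviation of 5
wide <- rnorm(5, mean = 5, sd = 5)

# One hundred draws with a hypothetical mean of 10 and standard deviation of 10
many <- rnorm(100, mean = 10, sd = 10)

# The sample mean and standard deviation should land near 10 and 10
c(mean(many), sd(many))
```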

**The Poisson Distribution**

To generate numbers from a Poisson distribution, use rpois(). The Poisson distribution is popular for modeling the number of times an event occurs in an interval of time or space. It may be useful to model events such as:

- The number of meteors greater than 1 meter in diameter that strike Earth in a year
- The number of occurrences of the DNA sequence “ACGT” in a gene
- The number of patients arriving in an emergency room between 11 and 12 pm

In probability theory, a Poisson process is a stochastic process that counts the number of independent events in a given time interval. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

$p(x;\lambda)=\frac{e^{-\lambda}\lambda^{x}}{x!}\quad\text{for } x=0,1,2,\ldots$

` [1] 1 2 1 1 0 1 0 1 0 1`

` [1] 7 4 6 3 6 6 6 1 4 4`

` [1] 0 0 1 0 3 0 0 0 0 1`

` [1] 7 1 6 3 7 3 6 5 6 3`
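The calls that produced the output above are not shown. The sketch below shows how `rpois(n, lambda)` might be used, where `lambda` is the average number of events per interval. The rates 1 and 5 and the variable names are assumptions chosen for illustration.

```r
# Ten counts when events are rare: on average 1 event per interval
low_rate <- rpois(10, lambda = 1)

# Ten counts when events are more frequent: on average 5 per interval
high_rate <- rpois(10, lambda = 5)

# For a Poisson distribution the mean and the variance both equal lambda,
# so in a large sample both estimates should land near 5
big_sample <- rpois(10000, lambda = 5)
c(mean(big_sample), var(big_sample))
```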

## 8.2 Wrap-up Exercise

Let’s say I have a categorical variable that can take the values A, B, C and D. How can I generate 10000 random data points and control the frequency of each value, for example A = 10%, B = 20%, C = 65% and D = 5%? Any ideas how to do this? Don’t fret if you have none. I didn’t either when I first tried to solve it, but working through it helped me a great deal to practice the skills I had learned and see how they can be useful.

### 8.2.1 Solution 1: Elegant and quickest

```
x
A B C D
1014 2073 6427 486
```

```
x
A B C D
0.1014 0.2073 0.6427 0.0486
```
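The code for this solution is not shown. Output of this shape could come from a single `sample()` call with the `prob` argument, followed by `table()` and `prop.table()`. The sketch below assumes that approach, and the variable names `categories` and `target` are hypothetical.

```r
categories <- c("A", "B", "C", "D")
target     <- c(0.10, 0.20, 0.65, 0.05)   # desired share of each category

# Draw 10000 values with replacement, weighting each category by its target share
x <- sample(categories, size = 10000, replace = TRUE, prob = target)

table(x)              # counts per category
prop.table(table(x))  # proportions, close to but not exactly the targets
```

Because each draw is random, the proportions only approximate the targets, as in the output above.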

### 8.2.2 Solution 2: Clever, but dirty

```
[1] "C" "C" "C" "C" "B" "C" "B" "C" "C" "C" "B" "C" "C" "C" "A" "C" "C"
[18] "C" "C" "C" "D" "C" "C" "C" "C" "B" "D" "D" "C" "C" "B" "B" "C" "C"
[35] "B" "B" "B" "B" "A" "B" "C" "C" "C" "A" "C" "B" "B" "C" "C" "C" "C"
[52] "D" "C" "B" "C" "C" "C" "C" "C" "C" "B" "A" "C" "B" "B" "C" "D" "C"
[69] "C" "C" "C" "C" "C" "B" "C"
[ reached getOption("max.print") -- omitted 925 entries ]
```

```
[1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
[18] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
[35] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
[52] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
[69] "A" "A" "A" "A" "A" "A" "A"
[ reached getOption("max.print") -- omitted 9925 entries ]
```

```
x
A B C D
0.10 0.20 0.65 0.05
```
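The code for this solution is not shown. The exact proportions in the last output suggest that the vector was built with a fixed count for each category and then shuffled. The sketch below assumes that reading, using `rep()` and `sample()`, and its variable names are hypothetical.

```r
categories <- c("A", "B", "C", "D")
counts     <- c(1000, 2000, 6500, 500)   # 10%, 20%, 65% and 5% of 10000

# Build the vector in order: all the A's first, then the B's, C's and D's
ordered_x <- rep(categories, times = counts)

# Shuffle it: sample() without a size argument returns a random permutation
x <- sample(ordered_x)

prop.table(table(x))  # exactly 0.10, 0.20, 0.65 and 0.05
```

The trade-off is that the frequencies are fixed rather than random, so only the order of the values varies between runs.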

### 8.2.3 Solution 3: Brute force (reversed thinking?)

```
x
A B C D
0.0967 0.1901 0.6653 0.0479
```
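The code for this solution is not shown, so the following is only one possible reading of "brute force". It draws uniform numbers with `runif()` and assigns each one to a category by cutting the interval from 0 to 1 at the cumulative targets. The breakpoints and variable names are assumptions made for illustration.

```r
# 10000 uniform draws between 0 and 1
u <- runif(10000)

# Cut [0, 1] into four bins whose widths are 0.10, 0.20, 0.65 and 0.05
x <- cut(u,
         breaks = c(0, 0.10, 0.30, 0.95, 1),
         labels = c("A", "B", "C", "D"),
         include.lowest = TRUE)

prop.table(table(x))  # proportions, close to but not exactly the targets
```

A uniform draw falls in a bin with probability equal to the bin's width, which is why the bin widths play the role of the target frequencies.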