forked from dereksonderegger/444
-
Notifications
You must be signed in to change notification settings - Fork 0
/
06_FlowControl.Rmd
678 lines (556 loc) · 26 KB
/
06_FlowControl.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
---
output:
pdf_document: default
html_document: default
---
# Flow Control
```{r, echo=FALSE}
# Un-attach any packages that happen to already be loaded. In general this is unecessary
# but is important for the creation of the book to not have package namespaces
# fighting unexpectedly.
pkgs = names(sessionInfo()$otherPkgs)
if( length(pkgs > 0)){
pkgs = paste('package:', pkgs, sep = "")
for( i in 1:length(pkgs)){
detach(pkgs[i], character.only = TRUE, force=TRUE)
}
}
knitr::opts_chunk$set(cache=FALSE)
```
```{r, message=FALSE, warning=FALSE, results='hide'}
suppressPackageStartupMessages({
library(tidyverse, quietly = TRUE) # loading ggplot2 and dplyr
})
```
As always, there is a [Video Lecture](https://youtu.be/bPZKP_yYFLo) that accompanies this chapter.
Often it is necessary to write scripts that perform different action depending on the data or to automate a task that must be repeated many times. To address these issues we will introduce the `if` statement and its closely related cousin `if else`. To address repeated tasks we will define two types of loops, a `while` loop and a `for` loop.
## Logical Expressions
The most common logical expressions are the numerical expressions `<`, `<=`, `==`, `!=`, `>=`, `>`. These are the usual logical comparisons from mathematics, with `!=` being the *not equal* comparison. For any logical value or vector of values, the `!` flips the logical values.
```{r}
df <- data.frame(A=1:6, B=5:10)
df
df %>% mutate(`A==3?` = A == 3,
`A<=3?` = A <= 3,
`A!=3?` = A != 3,
`Flip Previous` = ! `A!=3?` )
```
I find that it is preferable to write logical comparisons using `<` or `<=` rather than the "greater than" versions because the number line is read left to right, so it is much easier to have the smaller value on the left.
```{r}
df %>% mutate( `A < B` = A < B)
```
If we have two (or more) vectors of logical values, we can do two *pairwise* operations. The "and" operator `&` will result in a TRUE value if all elements are TRUE. The "or" operator will result in a TRUE value if either the left hand side or right hand side is TRUE.
```{r}
df %>% mutate(C = A>=0, D = A<=5) %>%
mutate( result1_and = C & D, # C and D both true
result2_and = A>=0 & A<=5, # directly calculated
result3_and = 0 <= A & A<=5, # more readable 0 <= A <= 5
result4_or = A<=0 | 5<=A) # A not in (0,5) range
```
Next we can summarize a vector of logical values using `any()`, `all()`, and `which()`. These functions do exactly what you would expect them to do.
```{r}
any(6:10 <= 7 ) # Should return TRUE because there are two TRUE results
all(6:10 <= 7 ) # Should return FALSE because there is at least one FALSE result
which( 6:10 <= 7) # return the indices of the TRUE values
```
Finally, I often need to figure out if a character string is in some set of values.
```{r}
df <- data.frame( Type = rep(c('A','B','C','D'), each=2), Value=rnorm(8) )
df
# df %>% filter( Type == 'A' | Type == 'B' )
df %>% filter( Type %in% c('A','B') ) # Only rows with Type == 'A' or Type =='B'
```
## Decision statements
### In `dplyr` wrangling
A very common task within a data wrangling pipeline is to create a new column that recodes information in another column. Consider the following data frame that has name, gender, and political party affiliation of six individuals. In this example, we've coded male/female as 1/0 and political party as 1,2,3 for democratic, republican, and independent.
```{r}
people <- data.frame(
name = c('Barack','Michelle', 'George', 'Laura', 'Bernie', 'Deborah'),
gender = c(1,0,1,0,1,0),
party = c(1,1,2,2,3,3)
)
people
```
The command `ifelse()` works quite well within a `dplyr::mutate()` command and
it responds correctly to vectors. The syntax is
`ifelse( logical.expression, TrueValue, FalseValue )`. The return object of this
function is a vector that is appropriately made up of the TRUE and FALSE results.
For example, there we take the column `gender` and for each row test to see if the
value is `0`. If it is, the resulting element is `Female` otherwise it is `Male`.
```{r}
people <- people %>%
mutate( gender2 = ifelse( gender == 0, 'Female', 'Male') )
people
```
To do something similar for the case where we have 3 or more categories, we could
use the `ifelse()` command repeatedly to address each category level separately.
However because the `ifelse` command is limited to just two cases, it would be
nice if there was a generalization to multiple categories. The
`dplyr::case_when` function is that generalization. The syntax is
`case_when( logicalExpression1~Value1, logicalExpression2~Value2, ... )`.
We can have as many `LogicalExpression ~ Value` pairs as we want.
```{r}
people <- people %>%
mutate( party2 = case_when( party == 1 ~ 'Democratic',
party == 2 ~ 'Republican',
party == 3 ~ 'Independent',
TRUE ~ 'None Stated' ) )
people
```
Often the last case is a catch all case where the logical expression will
ALWAYS evaluate to TRUE and this is the value for all other input.
As another alternative to the problem of recoding factor levels, we could use
the command `forcats::fct_recode()` function. See the Factors chapter in this
book for more information about factors.
### General `if else`
While programming, I often need to perform expressions that are more complicated
than what the `ifelse()` command can do. Remember, the `ifelse()` command only
can produce a vector. For any logic that is more complected that simply producing
an output vector, we'll need a more flexible approach. The downside is that it
is slightly more complicated to use and *does not* automatically handle vector
inputs for the comparison.
The general format of an `if` or an `if else` is presented below:.
```{r, eval=FALSE}
# Simplest version
if( logical.test ){ # The logical test must be a SINGLE TRUE/FALSE value
expression # The expression could be many lines of code
}
# Including the optional else
if( logical.test ){
expression
}else{
expression
}
```
where the else part is optional.
Suppose that I have a piece of code that generates a random variable from the
Binomial distribution with one sample (essentially just flipping a coin) but
I'd like to label it head or tails instead of one or zero.
The test expression inside the `if()` is evaluated and if it is true, then the
subsequent statement is executed. If the test expression is false, the next
statement is skipped. The way the R language is defined, only the first statement
after the if statement is executed (or skipped) depending on the test expression.
If we want multiple statements to be executed (or skipped), we will wrap those
expressions in curly brackets `{ }`. I find it easier to follow the `if else`
logic when I see the curly brackets so I use them even when there is only one
expression to be executed. Also notice that the RStudio editor indents the code
that might be skipped to try help give you a hint that it will be conditionally
evaluated.
```{r}
# Flip the coin, and we get a 0 or 1
result <- rbinom(n=1, size=1, prob=0.5)
result
# convert the 0/1 to Tail/Head
if( result == 0 ){
result <- 'Tail'
print(" in the if statement, got a Tail! ")
}else{
result <- 'Head'
print("In the else part!")
}
result
```
Run this code several times until you get both cases several times. Notice that
in the Environment tab in RStudio, the value of the variable `result` changes as
you execute the code repeatedly.
To provide a more statistically interesting example of when we might use an
`if else` statement, consider the calculation of a p-value in a 1-sample t-test
with a two-sided alternative. Recall that the calculation was:
* If the test statistic t is negative, then p-value = $2*P\left(T_{df} \le t \right)$
* If the test statistic t is positive, then p-value = $2*P\left(T_{df} \ge t \right)$.
```{r}
# create some fake data
n <- 20 # suppose this had a sample size of 20
x <- rnorm(n, mean=2, sd=1)
# testing H0: mu = 0 vs Ha: mu =/= 0
t <- ( mean(x) - 0 ) / ( sd(x)/sqrt(n) )
df <- n-1
if( t < 0 ){
p.value <- 2 * pt(t, df)
}else{
p.value <- 2 * (1 - pt(t, df))
}
# print the resulting p-value
p.value
```
This sort of logic is necessary for the calculation of p-values and so something
similar is found somewhere inside the `t.test()` function.
Finally we can nest `if else` statements together to allow you to write code
that has many different execution routes.
```{r}
# randomly grab a number between 0,5 and round it up to 1,2, ..., 5
birth.order <- ceiling( runif(1, 0,5) )
if( birth.order == 1 ){
print('The first child had more rules to follow')
}else if( birth.order == 2 ){
print('The second child was ignored')
}else if( birth.order == 3 ){
print('The third child was spoiled')
}else{
# if birth.order is anything other than 1, 2 or 3
print('No more unfounded generalizations!')
}
```
## Loops
It is often desirable to write code that does the same thing over and over,
relieving you of the burden of repetitive tasks. To do this we'll need a way
to tell the computer to repeat some section of code over and over. However we'll
usually want something small to change each time through the loop and some way
to tell the computer how many times to run the loop or when to stop repeating.
### `while` Loops
The basic form of a `while` loop is as follows:
```{r, eval=FALSE}
# while loop with multiple lines to be repeated
while( logical.test ){
expression1 # multiple lines of R code
expression2
}
```
The computer will first evaluate the test expression. If it is true, it will
execute the code once. It will then evaluate the test expression again to see if
it is still true, and if so it will execute the code section a third time. The
computer will continue with this process until the test expression finally
evaluates as false.
```{r}
x <- 1
while( x < 100 ){
print( paste("In loop and x is now:", x) ) # print out current value of x
x <- 2*x
}
```
It is very common to forget to update the variable used in the test expression. In that case the test expression will never be false and the computer will never stop. This unfortunate situation is called an *infinite loop*.
```{r, eval=FALSE}
# Example of an infinite loop! Do not Run!
x <- 1
while( x < 10 ){
print(x)
}
```
### `for` Loops
Often we know ahead of time exactly how many times we should go through the loop.
We could use a `while` loop, but there is also a second construct called a `for`
loop that is quite useful.
The format of a for loop is as follows:
```{r, eval=FALSE}
for( index in vector ){
expression1
expression2
}
```
where the `index` variable will take on each value in `vector` in succession and
the next statement will be evaluated. As always, the statement can be multiple
statements wrapped in curly brackets {}.
```{r}
for( i in 1:5 ){
print( paste("In the loop and current value is i =", i) )
}
```
What is happening is that `i` starts out as the first element of the vector
`c(1,2,3,4,5)`, in this case, `i` starts out as 1. After `i` is assigned, the
statements in the curly brackets are then evaluated. Once we get to the end of
those statements, i is reassigned to the next element of the vector
`c(1,2,3,4,5)`. This process is repeated until `i` has been assigned to each
element of the given vector. It is somewhat traditional to use `i` and `j`
and the index variables, but they could be anything.
While the recipe above is the minimal definition of a `for` loop, there is
often a bit more set up to create a result vector or data frame that will store
the steps of the `for` loop.
```{r, eval=FALSE}
N <- 10
result <- NULL # Make a place to store each step of the for loop
for( i in 1:N ){
# Perhaps some code that calculates something
result[i] <- # something
}
```
We can use this loop to calculate the first $10$ elements of the Fibonacci
sequence. Recall that the Fibonacci sequence is defined by
$F_{i}=F_{i-1}+F_{i-2}$ where $F_{1}=0$ and $F_{2}=1$.
```{r}
N <- 10 # How many Fibonacci numbers to create
F <- rep(0,2) # initialize a vector of zeros
F[1] <- 0 # F[1] should be zero
F[2] <- 1 # F[2] should be 1
print(F) # Show the value of F before the loop
for( i in 3:N ){
F[i] <- F[i-1] + F[i-2] # define based on the prior two values
print(F) # show F at each step of the loop
}
```
For a more statistical case where we might want to perform a loop, we can
consider the creation of the bootstrap estimate of a sampling distribution.
The bootstrap distribution is created by repeatedly re-sampling with replacement
from our original sample data, running the analysis for each re-sample, and then
saving the statistic of interest.
```{r, ForLoopExample, cache=TRUE, message=FALSE, warning=FALSE, fig.height=3}
library(dplyr)
library(ggplot2)
# bootstrap from the trees dataset.
SampDist <- data.frame(xbar=NULL) # Make a data frame to store the means
for( i in 1:1000 ){
## Do some stuff
boot.data <- trees %>% dplyr::sample_frac(replace=TRUE)
boot.stat <- boot.data %>% dplyr::summarise(xbar=mean(Height)) # 1x1 data frame
## Save the result as a new row in the output data frame
SampDist <- rbind( SampDist, boot.stat )
}
# Check out the structure of the result
str(SampDist)
# Plot the output
ggplot(SampDist, aes(x=xbar)) +
geom_histogram( binwidth=0.25) +
labs(title='Trees Data: Bootstrap distribution of xbar')
```
### `mosaic::do()` loops
Many times when using a `for` loop, we want to save some quantity for each pass
through the `for` loop. Because this is such a common tasks, the `mosaic::do()`
function automates the creation of the output data frame and the saving each
repetition. This function is intended to hide the coding steps that often trips
up new programmers.
```{r, DoLoopExample, cache=TRUE, message=FALSE, warning=FALSE, fig.height=3}
# Same Loop
SampDist <- mosaic::do(1000) * {
trees %>% dplyr::sample_frac(replace=TRUE) %>%
dplyr::summarise(xbar=mean(Height)) %>% # 1x1 data frame
pull(xbar) # Scalar
}
# Structure of the SampDist object
str(SampDist)
# Plot the output
ggplot(SampDist, aes(x=result)) +
geom_histogram( binwidth=0.25) +
labs(title='Trees Data: Bootstrap distribution of xbar')
```
## Functions
It is very important to be able to define a piece of programming logic that is
repeated often. For example, I don't want to have to always program the
mathematical code for calculating the sample variance of a vector of data.
Instead I just want to call a function that does everything for me and I don't
have to worry about the details.
While hiding the computational details is nice, fundamentally writing functions
allows us to think about our problems at a higher layer of abstraction. For
example, most scientists just want to run a t-test on their data and get the
appropriate p-value out; they want to focus on their problem and not how to
calculate what the appropriate degrees of freedom are.
Another statistical example where functions are important is a bootstrap data
analysis where we need to define a function that calculates whatever statistic
the research cares about.
The format for defining your own function is
```{r, eval=FALSE}
function.name <- function(arg1, arg2, arg3){
statement1
statement2
}
```
where `arg1` is the first argument passed to the function and `arg2` is the second.
To illustrate how to define your own function, we will define a variance
calculating function.
```{r}
# define my function
my.var <- function(x){
n <- length(x) # calculate sample size
xbar <- mean(x) # calculate sample mean
SSE <- sum( (x-xbar)^2 ) # calculate sum of squared error
v <- SSE / ( n - 1 ) # "average" squared error
return(v) # result of function is v
}
```
```{r}
# create a vector that I wish to calculate the variance of
test.vector <- c(1,2,2,4,5)
# calculate the variance using my function
calculated.var <- my.var( test.vector )
calculated.var
```
Notice that even though I defined my function using `x` as my vector of data, and
passed my function something named `test.vector`, R does the appropriate renaming.
If my function doesn't modify its input arguments, then R just passes a pointer to
the inputs to avoid copying large amounts of data when you call a function. If
your function modifies its input, then R will take the input data, copy it, and
then pass that new copy to the function. This means that a function cannot modify
its arguments. In Computer Science parlance, R does not allow for procedural side
effects. Think of the variable `x` as a placeholder, with it being replaced by
whatever gets passed into the function.
When I call a function, it might cause something to happen (e.g. draw a plot); or,
it might do some calculations, the result of which is returned by the function,
and we might want to save that. Inside a function, if I want the result of some
calculation saved, I return the result as the output of the function. The way I
specify to do this is via the `return` statement. (Actually R doesn't completely
require this. But the alternative method is less intuitive and I strongly
recommend using the `return()` statement for readability.)
By writing a function, I can use the same chunk of code repeatedly. This means
that I can do all my tedious calculations inside the function and just call the
function whenever I want and happily ignore the details. Consider the function
`t.test()` which we have used to do all the calculations in a t-test. We could
write a similar function using the following code:
```{r}
# define my function
one.sample.t.test <- function(input.data, mu0){
n <- length(input.data)
xbar <- mean(input.data)
s <- sd(input.data)
t <- (xbar - mu0)/(s / sqrt(n))
if( t < 0 ){
p.value <- 2 * pt(t, df=n-1)
}else{
p.value <- 2 * (1-pt(t, df=n-1))
}
# we haven't addressed how to print things in a organized
# fashion, the following is ugly, but works...
# Notice that this function returns a character string
# with the necessary information in the string.
return( paste('t =', round(t, digits=3), ' and p.value =', round(p.value, 3)) )
}
```
```{r}
# create a vector that I wish apply a one-sample t-test on.
test.data <- c(1,2,2,4,5,4,3,2,3,2,4,5,6)
one.sample.t.test( test.data, mu0=2 )
```
Nearly every function we use to do data analysis is written in a similar fashion.
Somebody decided it would be convenient to have a function that did an ANOVA
analysis and they wrote something that is similar to the above function, but is
a bit grander in scope. Even if you don't end up writing any of your own functions,
knowing how will help you understand why certain functions you use are designed
the way they are.
## Exercises {#Exercises_FlowControl}
1. I've created a dataset about presidential candidates for the 2020 US election
and it is available on the github website for my STA 141
```{r, message=FALSE}
prez <- readr::read_csv('https://raw.githubusercontent.com/dereksonderegger/444/master/data-raw/Prez_Candidate_Birthdays')
prez
```
a) Re-code the Gender column to have Male and Female levels. Similarly convert
the party variable to be Democratic or Republican. *You may write this using*
*a `for()` loop with an `if(){ ... }else{...}` structure nested inside, or simply*
*using a `mutate()` statement with the `ifelse()` command inside. I believe*
*the second option is MUCH easier.*
b) Bernie Sanders was registered as an Independent up until his 2016 presidential
run. Change his political party value into 'Independent'.
2. The $Uniform\left(a,b\right)$ distribution is defined on x $\in [a,b]$ and
represents a random variable that takes on any value of between `a` and `b`
with equal probability. Technically since there are an infinite number of
values between `a` and `b`, each value has a probability of 0 of being selected
and I should say each interval of width $d$ has equal probability. It has the
density function
$$f\left(x\right)=\begin{cases}
\frac{1}{b-a} & \;\;\;\;a\le x\le b\\
0 & \;\;\;\;\textrm{otherwise}
\end{cases}$$
The R function `dunif()` evaluates this density function for the above
defined values of x, a, and b. Somewhere in that function, there is a chunk
of code that evaluates the density for arbitrary values of $x$. Run this code
a few times and notice sometimes the result is $0$ and sometimes it is
$1/(10-4)=0.16666667$.
```{r}
a <- 4 # The min and max values we will use for this example
b <- 10 # Could be anything, but we need to pick something
x <- runif(n=1, 0,10) # one random value between 0 and 10
# what is value of f(x) at the randomly selected x value?
dunif(x, a, b)
```
We will write a sequence of statements that utilizes if statements to
appropriately calculate the density of `x`, assuming that `a`, `b` , and `x`
are given to you, but your code won't know if `x` is between `a` and `b`.
That is, your code needs to figure out if it is and give either `1/(b-a)`
or `0`.
a. We could write a set of `if else` statements.
```{r, eval=FALSE}
a <- 4
b <- 10
x <- runif(n=1, 0,10) # one random value between 0 and 10
if( x < a ){
result <- ???? # Replace ???? with something appropriate!
}else if( x <= b ){
result <- ????
}else{
result <- ????
}
print(paste('x=',round(x,digits=3), ' result=', round(result,digits=3)))
```
Replace the `????` with the appropriate value, either 0 or
$1/\left(b-a\right)$. Run the code repeatedly until you are certain that
it is calculating the correct density value.
b. We could perform the logical comparison all in one comparison. Recall
that we can use `&` to mean “and” and `|` to mean “or”. In the following
two code chunks, replace the `???` with either `&` or `|` to make the
appropriate result.
i.
```{r, eval=FALSE}
x <- runif(n=1, 0,10) # one random value between 0 and 10
if( (a<=x) ??? (x<=b) ){
result <- 1/(b-a)
}else{
result <- 0
}
print(paste('x=',round(x,digits=3), ' result=', round(result,digits=3)))
```
ii.
```{r, eval=FALSE}
x <- runif(n=1, 0,10) # one random value between 0 and 10
if( (x<a) ??? (b<x) ){
result <- 0
}else{
result <- 1/(b-a)
}
print(paste('x=',round(x,digits=3), ' result=', round(result,digits=3)))
```
iii.
```{r, eval=FALSE}
x <- runif(n=1, 0,10) # one random value between 0 and 10
result <- ifelse( a<=x & x<=b, ???, ??? )
print(paste('x=',round(x,digits=3), ' result=', round(result,digits=3)))
```
3. I often want to repeat some section of code some number of times. For example,
I might want to create a bunch plots that compare the density of a t-distribution
with specified degrees of freedom to a standard normal distribution.
```{r, fig.height=3, message=FALSE, warning=FALSE}
library(ggplot2)
df <- 5
N <- 1000
x.grid <- seq(-3, 3, length=N)
data <- data.frame(
x = c(x.grid, x.grid),
y = c(dnorm(x.grid), dt(x.grid, df)),
type = c( rep('Normal',N), rep('T',N) ) )
# make a nice graph
myplot <- ggplot(data, aes(x=x, y=y, color=type, linetype=type)) +
geom_line() +
labs(title = paste('Std Normal vs t with', df, 'degrees of freedom'))
# actually print the nice graph we made
print(myplot)
```
a) Use a `for` loop to create similar graphs for degrees of freedom
$2,3,4,\dots,29,30$.
b) In retrospect, perhaps we didn't need to produce all of those. Rewrite
your loop so that we only produce graphs for
$\left\{ 2,3,4,5,10,15,20,25,30\right\}$ degrees of freedom.
*Hint: you can just modify the vector in the `for` statement to include*
*the desired degrees of freedom.*
4. It is very common to run simulation studies to estimate probabilities that
are difficult to work out. In this exercise we will investigate a gambling
question that sparked much of the fundamental mathematics behind the study
of [probability](http://homepages.wmich.edu/~mackey/Teaching/145/probHist.html).
The game is to roll a pair of 6-sided dice 24 times. If a "double-sixes" comes
up on any of the 24 rolls, the player wins. What is the probability of winning?
a) We can simulate rolling two 6-sided dice using the `sample()` function
with the `replace=TRUE` option. Read the help file on `sample()` to see
how to sample from the numbers $1,2,\dots,6$. Sum those two die rolls
and save it as `throw`.
b) Write a `for{}` loop that wraps your code from part (a) and then tests
if any of the throws of dice summed to 12. Read the help files for the
functions `any()` and `all()`. Your code should look something
like the following:
```{r, eval=FALSE}
throws <- NULL
for( i in 1:24 ){
throws[i] <- ?????? # Your part (a) code goes here
}
game <- any( throws == 12 ) # Gives a TRUE/FALSE value
```
c) Wrap all of your code from part (b) in *another* `for(){}` loop that you run
10,000 times. Save the result of each game in a `games` vector that will
have 10,000 elements that are either TRUE/FALSE depending on the outcome
of that game. You'll need initiallize the `games` vector to NULL and
modify your part (b) code to save the result into some location in the
vector `games`.
d) Finally, calculate win percentage by taking the average of the `games` vector.