# Chapter 7 Functions and Programming

We have been working with a wide variety of R functions, from simple functions such as `mean()` and `sd()` to more complex functions such as `ggplot()` and `apply()`. Gaining a better understanding of existing functions and the ability to write our own functions dramatically increases what we can do with R. Learning about R’s programming capabilities is an important step in gaining facility with functions. Note that in this chapter we will use a statistical test (the paired t test) to motivate our development of a function. While an understanding of the t test may make the chapter a bit more fun, you don’t need any knowledge of the statistical theory behind the test. Rather, you should focus on how the need for the test motivates the use of a function and other programming techniques.

## 7.1 R Functions

Data on the yield (pounds per acre) of two types of corn seeds (regular and kiln dried) were collected. Each of the 11 plots of land was split into two subplots, and one of the subplots was planted in regular corn while the other was planted in kiln dried corn. These data were analyzed in a famous paper authored by William Gosset. Here are the data.

```
u.corn = "https://www.finley-lab.com/files/data/corn.csv"
corn = read.csv(u.corn, header=TRUE)
corn
```

```
##    regular kiln_dried
## 1     1903       2009
## 2     1935       1915
## 3     1910       2011
## 4     2496       2463
## 5     2108       2180
## 6     1961       1925
## 7     2060       2122
## 8     1444       1482
## 9     1612       1542
## 10    1316       1443
## 11    1511       1535
```

A paired t test, or a confidence interval for the mean difference, may be used to assess the difference in yield between the two varieties. Of course R has a function `t.test` that performs a paired t test and computes a confidence interval, but we will perform the test without using that function. We will focus for now on testing the hypotheses \(H_0\colon \mu_d = 0\) versus \(H_a\colon \mu_d \neq 0\) and on a two-sided confidence interval for \(\mu_d\). Here \(\mu_d\) represents the population mean difference.

The paired \(t\) statistic is defined by \[\begin{equation} t = \frac{\overline d}{S_d/\sqrt{n}} \end{equation}\]

where \(\overline d\) is the mean of the differences, \(S_d\) is the standard deviation of the differences, and \(n\) is the sample size. The p-value is twice the area to the right of \(|t_{\text{obs}}|\), where \(t_{\text{obs}}\) is the observed \(t\) statistic, and a confidence interval is given by \[\begin{equation} \overline d \pm t^* (S_d/\sqrt{n}). \end{equation}\]

Here \(t^*\) is an appropriate quantile of a \(t\) distribution with \(n-1\) degrees of freedom.

```
dbar <- mean(corn$kiln_dried - corn$regular)
n <- length(corn$regular)
S_d <- sd(corn$kiln_dried - corn$regular)
t_obs <- dbar/(S_d/sqrt(n))
t_obs
```

`## [1] 1.69`

```
pval <- 2*(1 - pt(abs(t_obs), n-1))
pval
```

`## [1] 0.1218`

```
margin <- qt(0.975, n-1)*(S_d/sqrt(n))
lcl <- dbar - margin
ucl <- dbar + margin
lcl
```

`## [1] -10.73`

`ucl`

`## [1] 78.18`
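As a cross-check, the hand computation can be compared with R's built-in `t.test()`. The sketch below re-enters the yields from the table above so that it runs on its own; the object names `regular`, `kiln_dried`, `d`, and `tt` are chosen here purely for illustration.

```r
# Yields copied from the table above (pounds per acre)
regular    <- c(1903, 1935, 1910, 2496, 2108, 1961, 2060, 1444, 1612, 1316, 1511)
kiln_dried <- c(2009, 1915, 2011, 2463, 2180, 1925, 2122, 1482, 1542, 1443, 1535)

# Hand computation, following the formulas above
d     <- kiln_dried - regular
n     <- length(d)
t_obs <- mean(d) / (sd(d) / sqrt(n))
pval  <- 2 * (1 - pt(abs(t_obs), n - 1))

# Built-in paired t test on the same data
tt <- t.test(kiln_dried, regular, paired = TRUE)

# The two approaches should agree up to floating point error
all.equal(unname(tt$statistic), t_obs)
all.equal(tt$p.value, pval)
tt$conf.int
```

Agreement here only confirms that the arithmetic was carried out correctly; it says nothing about whether a paired t test is the right analysis for the data.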

With a few lines of R code we have calculated the t statistic, the p-value, and the confidence interval. Since paired t tests are pretty common, however, it would be helpful to automate this procedure. One obvious reason is to save time and effort, but another important reason is to avoid mistakes. It would be easy to make a mistake (e.g., using \(n\) instead of \(n-1\) as the degrees of freedom) when repeating the above computations.

Here is a first basic function which automates the computation.

```
paired_t <- function(x1, x2){
  n <- length(x1)
  dbar <- mean(x1 - x2)
  s_d <- sd(x1 - x2)
  tstat <- dbar/(s_d/sqrt(n))
  pval <- 2*(1 - pt(abs(tstat), n-1))
  margin <- qt(0.975, n-1)*s_d/sqrt(n)
  lcl <- dbar - margin
  ucl <- dbar + margin
  return(list(tstat = tstat, pval = pval, lcl=lcl, ucl=ucl))
}
```

And here is the function in action

`paired_t(x1 = corn$kiln_dried, x2 = corn$regular)`

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.1218
##
## $lcl
## [1] -10.73
##
## $ucl
## [1] 78.18
```

An explanation and comments on the function are in order.

- `paired_t <- function(x1, x2)` assigns a function of two variables, `x1` and `x2`, to an R object called `paired_t`.
- The *compound expression*, i.e., the code that makes up the body of the function, is enclosed in curly braces `{}`.
- `return(list(tstat = tstat, pval = pval, lcl=lcl, ucl=ucl))` indicates the object(s) returned by the function. In this case the function returns a list with four components.
- The body of the function basically mimics the computations required to compute the t statistic, the p-value, and the confidence interval.
- Several objects such as `n` and `dbar` are created in the body of the function. These objects are NOT available outside the function. We will discuss this further when we cover environments and scope in R.
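The last point can be seen in a small sketch. The function `center_demo` and the object `dbar_demo` are hypothetical names used only for this illustration, and the example assumes no object called `dbar_demo` already exists in the workspace.

```r
# A hypothetical helper, defined only for this illustration
center_demo <- function(x) {
  dbar_demo <- mean(x)   # created inside the function body
  x - dbar_demo
}

centered <- center_demo(c(2, 4, 6))
centered

# The local object does not exist once the function has returned
exists("dbar_demo")
```

The call returns the centered values, while `exists("dbar_demo")` reports `FALSE` because `dbar_demo` lived only inside the function call.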

Our function has automated the basic calculations. But it is still somewhat limited in usefulness. For example, it only computes a 95% confidence interval, while a user may want a different confidence level. And the function only performs a two-sided test, while a user may want a one-sided procedure. We modify the function slightly to allow the user to specify the confidence level next.

```
paired_t <- function(x1, x2, cl = 0.95){
  n <- length(x1)
  dbar <- mean(x1 - x2)
  s_d <- sd(x1 - x2)
  tstat <- dbar/(s_d/sqrt(n))
  pval <- 2*(1 - pt(abs(tstat), n-1))
  pctile <- 1 - (1 - cl)/2
  margin <- qt(pctile, n-1)*s_d/sqrt(n)
  lcl <- dbar - margin
  ucl <- dbar + margin
  return(list(tstat = tstat, pval = pval, lcl = lcl, ucl=ucl))
}
```

`paired_t(corn$kiln_dried, corn$regular)`

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.1218
##
## $lcl
## [1] -10.73
##
## $ucl
## [1] 78.18
```

`paired_t(corn$kiln_dried, corn$regular, cl = 0.99)`

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.1218
##
## $lcl
## [1] -29.5
##
## $ucl
## [1] 96.96
```

Two things to note. First, arguments do not have to be named. So

`paired_t(corn$kiln_dried, corn$regular)`

and

`paired_t(x1 = corn$kiln_dried, x2 = corn$regular)`

are equivalent. But we need to be careful if we do not name arguments because then we have to know the ordering of the arguments in the function declaration.

Second, in the declaration of the function, the third argument `cl` was given a default value of `0.95`. If a user does not specify a value for `cl` it will silently be set to `0.95`. But of course a user can override this, as we did in

`paired_t(corn$kiln_dried, corn$regular, cl = 0.99)`

### 7.1.1 Practice Problem

Like all things in R, getting the hang of writing functions just requires practice. Create a simple function called `FtoK` that is given a temperature in Fahrenheit and converts it to Kelvin using the following formula

\[ K = (F - 32) \times \frac{5}{9} + 273.15 \]

You should get the following if your function is correct

`FtoK(80)`

`## [1] 299.8`

### 7.1.2 Creating Functions

Creating very short functions at the command prompt is a reasonable strategy. For longer functions, one option is to write the function in a script and then submit the whole function. Alternatively, a function can be written in any text editor, saved as a plain text file (possibly with a `.r` extension), and then read into R using the `source()` function.
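Here is a minimal sketch of the `source()` workflow. To keep the example self-contained it writes the function definition to a temporary file from within R; in practice the file would be created in a text editor. The file name and the function `add_one` are made up for this illustration.

```r
# Write a small function definition to a temporary plain text file.
# The file and the function `add_one` are made up for this example.
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "add_one <- function(x){",
  "  x + 1",
  "}"
), script_file)

# Read the file into R; afterwards the function is available
source(script_file)
add_one(41)

unlink(script_file)  # remove the temporary file
```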

## 7.2 Programming: Conditional Statements

The `paired_t` function is somewhat useful, but could be improved in several ways. For example, consider the following:

`paired_t(1:5, 1:4)`

```
## Warning in x1 - x2: longer object length is not a
## multiple of shorter object length
## Warning in x1 - x2: longer object length is not a
## multiple of shorter object length
```

```
## $tstat
## [1] 1
##
## $pval
## [1] 0.3739
##
## $lcl
## [1] -1.421
##
## $ucl
## [1] 3.021
```

The user-specified data had different numbers of observations in `x1` and `x2`, so of course they cannot be analyzed with a paired t test. Rather than stopping and letting the user know that this is a problem, the function continued and produced meaningless output.

Also, the function as written only allows testing against a two-sided alternative hypothesis, and it would be good to allow one-sided alternatives.

First we will address some checks on arguments specified by the user. For this we will use an `if()` function and a `stop()` function.

```
paired_t <- function(x1, x2, cl = 0.95){
  if(length(x1) != length(x2)){
    stop("The input vectors must have the same length")
  }
  n <- length(x1)
  dbar <- mean(x1 - x2)
  s_d <- sd(x1 - x2)
  tstat <- dbar/(s_d/sqrt(n))
  pval <- 2*(1 - pt(abs(tstat), n-1))
  pctile <- 1 - (1 - cl)/2
  margin <- qt(pctile, n-1)*s_d/sqrt(n)
  lcl <- dbar - margin
  ucl <- dbar + margin
  return(list(tstat = tstat, pval = pval, lcl = lcl, ucl=ucl))
}
paired_t(1:5, 1:4)
```

`## Error in paired_t(1:5, 1:4): The input vectors must have the same length`

The argument to the `if()` function is evaluated. If the argument returns `TRUE` the ensuing code is executed. Otherwise, the ensuing code is skipped and the rest of the function is evaluated. If a `stop()` function is executed, the function is exited and the argument of `stop()` is printed as an error message.
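The sketch below isolates this behavior. The function `check_lengths` is a hypothetical stand-in that repeats only the length check, and `tryCatch()` is used so that the error message can be captured and inspected instead of halting the script.

```r
# A hypothetical checker that mirrors the length test in paired_t()
check_lengths <- function(x1, x2){
  if(length(x1) != length(x2)){
    stop("The input vectors must have the same length")
  }
  "lengths match"   # reached only when the condition is FALSE
}

check_lengths(1:5, 2:6)

# tryCatch() lets us capture the error message instead of halting
msg <- tryCatch(check_lengths(1:5, 1:4),
                error = function(e) conditionMessage(e))
msg
```

When the lengths agree, the body of the `if()` is skipped and the last line of the function is returned. When they differ, `stop()` exits the function immediately, so the last line is never reached.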

To better understand and use `if()` statements, we need to understand comparison operators and logical operators.

### 7.2.1 Comparison and Logical Operators

We have made use of some of the comparison operators in R. These include

- Equal: `==`
- Not equal: `!=`
- Greater than: `>`
- Less than: `<`
- Greater than or equal to: `>=`
- Less than or equal to: `<=`

Special care needs to be taken with the `==` and `!=` operators because of how numbers are represented on computers; see Section 7.3.

There are also three logical operators, with two variants of the “and” operator and the “or” operator.

- and: Either `&` or `&&`
- or: Either `|` or `||`
- not: `!`

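The “double” operators `&&` and `||` are meant for single logical values, such as the condition of an `if()` statement, whereas the “single” operators `&` and `|` compare element by element. The output below comes from a version of R in which `&&` and `||` examined only the first element of each vector and issued a warning; since R 4.3.0, supplying a vector of length greater than one to `&&` or `||` is an error.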

`c(FALSE, TRUE, FALSE) || c(TRUE, FALSE, FALSE)`

```
## Warning in c(FALSE, TRUE, FALSE) || c(TRUE, FALSE,
## FALSE): 'length(x) = 3 > 1' in coercion to 'logical(1)'
## Warning in c(FALSE, TRUE, FALSE) || c(TRUE, FALSE,
## FALSE): 'length(x) = 3 > 1' in coercion to 'logical(1)'
```

`## [1] TRUE`

`c(FALSE, TRUE, FALSE) | c(TRUE, FALSE, FALSE)`

`## [1] TRUE TRUE FALSE`

`c(FALSE, TRUE, FALSE) && c(TRUE, TRUE, FALSE)`

```
## Warning in c(FALSE, TRUE, FALSE) && c(TRUE, TRUE,
## FALSE): 'length(x) = 3 > 1' in coercion to 'logical(1)'
```

`## [1] FALSE`

`c(FALSE, TRUE, FALSE) & c(TRUE, TRUE, FALSE)`

`## [1] FALSE TRUE FALSE`
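When a condition for `if()` starts from a vector, one common pattern is to compare element by element and then collapse the result with `all()` or `any()`. The sketch below shows that pattern, along with the fact that `&&` and `||` stop evaluating as soon as the answer is known. The vector `x` and the object `inside` are made up for the illustration.

```r
x <- c(0.2, 0.5, 1.3)

# Elementwise comparison, then collapse to a single TRUE/FALSE
inside <- x > 0 & x < 1
inside
all(inside)   # are all values strictly between 0 and 1?
any(inside)   # is at least one value strictly between 0 and 1?

# && and || stop as soon as the answer is known, so the
# right-hand side below is never evaluated
FALSE && stop("this is never reached")
TRUE  || stop("this is never reached")
```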

We can use the logical operators to check whether a user-specified confidence level is between 0 and 1.

```
paired_t <- function(x1, x2, cl = 0.95){
  if(length(x1) != length(x2)){
    stop("The input vectors must have the same length")
  }
  if(cl <= 0 || cl >= 1){
    stop("The confidence level must be between 0 and 1")
  }
  n <- length(x1)
  dbar <- mean(x1 - x2)
  s_d <- sd(x1 - x2)
  tstat <- dbar/(s_d/sqrt(n))
  pval <- 2*(1 - pt(abs(tstat), n-1))
  pctile <- 1 - (1 - cl)/2
  margin <- qt(pctile, n-1)*s_d/sqrt(n)
  lcl <- dbar - margin
  ucl <- dbar + margin
  return(list(tstat = tstat, pval = pval, lcl = lcl, ucl=ucl))
}
paired_t(1:5, 2:6, cl=15)
```

`## Error in paired_t(1:5, 2:6, cl = 15): The confidence level must be between 0 and 1`

### 7.2.2 If else statements

The `if()` statement we have used so far has the form

```
if (condition) {
expression
}
```

Often we want to evaluate one expression if the condition is true, and evaluate a different expression if the condition is false. That is accomplished by the `if else` statement. Here we determine whether a number is positive, negative, or zero.

```
Sign <- function(x){
  if(x < 0){
    print("the number is negative")
  }else if(x > 0){
    print("the number is positive")
  }else{
    print("the number is zero")
  }
}
Sign(3)
```

`## [1] "the number is positive"`

`Sign(-3)`

`## [1] "the number is negative"`

`Sign(0)`

`## [1] "the number is zero"`

Notice the “different expression” for the first `if` statement was itself an `if` statement.

Next we modify the `paired_t` function to allow two-sided and one-sided alternatives.

```
paired_t <- function(x1, x2, cl = 0.95, alternative="not.equal"){
  if(length(x1) != length(x2)){
    stop("The input vectors must be of the same length")
  }
  if(cl <= 0 || cl >= 1){
    stop("The confidence level must be between 0 and 1")
  }
  n <- length(x1)
  dbar <- mean(x1 - x2)
  s_d <- sd(x1 - x2)
  tstat <- dbar/(s_d/sqrt(n))
  if(alternative == "not.equal"){
    pval <- 2*(1 - pt(abs(tstat), n-1))
    pctile <- 1 - (1 - cl)/2
    margin <- qt(pctile, n-1)*s_d/sqrt(n)
    lcl <- dbar - margin
    ucl <- dbar + margin
  }else if(alternative == "greater"){
    pval <- 1 - pt(tstat, n-1)
    margin <- qt(cl, n-1)*s_d/sqrt(n)
    lcl <- dbar - margin
    ucl <- Inf
  }else if(alternative == "less"){
    pval <- pt(tstat, n-1)
    margin <- qt(cl, n-1)*s_d/sqrt(n)
    lcl <- -Inf
    ucl <- dbar + margin
  }
  return(list(tstat = tstat, pval = pval, lcl=lcl, ucl=ucl))
}
paired_t(corn$kiln_dried, corn$regular)
```

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.1218
##
## $lcl
## [1] -10.73
##
## $ucl
## [1] 78.18
```

`paired_t(corn$kiln_dried, corn$regular, alternative = "less")`

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.9391
##
## $lcl
## [1] -Inf
##
## $ucl
## [1] 69.89
```

`paired_t(corn$kiln_dried, corn$regular, alternative = "greater")`

```
## $tstat
## [1] 1.69
##
## $pval
## [1] 0.06091
##
## $lcl
## [1] -2.434
##
## $ucl
## [1] Inf
```

## 7.3 Computer Arithmetic

Like most software, R does not perform exact arithmetic. Rather, R follows the IEEE 754 floating point standards. This can have profound effects on how computational algorithms are implemented, but is also important when considering things like comparisons.

Note first that computer arithmetic does not follow some of the rules of ordinary arithmetic. For example, addition is not associative.

`2^-30`

`## [1] 0.0000000009313`

`2^-30 + (2^30 - 2^30)`

`## [1] 0.0000000009313`

`(2^-30 + 2^30) - 2^30`

`## [1] 0`

Computer arithmetic is not exact.

`1.5 - 1.4`

`## [1] 0.1`

`1.5 - 1.4 == 0.1`

`## [1] FALSE`

`(1.5 - 1.4) - 0.1`

`## [1] 0.00000000000000008327`

So for example an `if` statement that uses an equality test may not give the expected answer. One way to avoid this problem is to test “near equality” using `all.equal()`. The function takes as arguments two objects to be compared, and a tolerance. If the objects are within the tolerance of each other, the function returns `TRUE`; otherwise it returns a character string describing the difference, not `FALSE`. The tolerance has a default value of about \(1.5\times 10^{-8}\), which works well in many cases.

`all.equal((1.5 - 1.4), 0.1)`

`## [1] TRUE`
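Because `all.equal()` reports a difference as a character string rather than as `FALSE`, its result is usually wrapped in `isTRUE()` before being used as the condition of an `if` statement. A minimal sketch, with `a` and `b` as illustrative names:

```r
a <- 1.5 - 1.4
b <- 0.1

# When the values differ by more than the tolerance, all.equal()
# returns a character description rather than FALSE
all.equal(a, 0.2)

# Wrapping the call in isTRUE() gives a single TRUE or FALSE
if(isTRUE(all.equal(a, b))){
  print("a and b are equal up to the tolerance")
}
isTRUE(all.equal(a, 0.2))
```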

## 7.4 Loops

Loops are an important component of any programming language, including R. Vectorized calculations and functions such as `apply()` make loops a bit less central to R than to many other languages, but an understanding of the three looping structures in R is still quite important.

We will investigate loops in the context of computing what is sometimes called the “machine epsilon.” Because of the inexact representation of numbers in R (and other languages), R sometimes cannot distinguish between the numbers `1` and `1 + x` for small values of `x`. The smallest value of `x` such that `1` and `1 + x` are not declared equal is the machine epsilon.

`1 == 1+10^-4`

`## [1] FALSE`

`1 == 1 + 10^-50`

`## [1] TRUE`

Clearly the machine epsilon is somewhere between \(10^{-4}\) and \(10^{-50}\). How can we find its value exactly? Since floating point numbers use a binary representation, we know that the machine epsilon will be equal to \(1/2^k\) for some value of \(k\). So to find the machine epsilon, we can keep testing whether \(1\) and \(1+1/2^k\) are equal, until we find a value \(k\) where the two are equal. Then the machine epsilon will be \(1/2^{k-1}\), since it is the smallest value for which the two are NOT equal.

`1 == 1+1/2`

`## [1] FALSE`

`1 == 1 + 1/2^2`

`## [1] FALSE`

`1 == 1 + 1/2^3`

`## [1] FALSE`

Testing by hand like this gets tedious quickly. A loop can automate the process. We will do this with two R loop types, `repeat` and `while`.

### 7.4.1 A Repeat Loop

A `repeat` loop just repeats a given expression over and over again until a `break` statement is encountered.

```
k <- 1
repeat{
  if(1 == 1+1/2^k){
    break
  }else{
    k <- k+1
  }
}
k
```

`## [1] 53`

`1/2^(k-1)`

`## [1] 0.000000000000000222`

This code initializes `k` at 1. The body of the loop first checks whether \(1\) and \(1+1/2^k\) are equal. If they are equal, the `break` statement is executed and control is transferred outside the loop. If they are not equal, `k` is increased by 1, and we return to the top of the body of the loop.

### 7.4.2 A While Loop

A `while` loop has the form

```
while (condition) {
expression
}
```

As long as the `condition` is `TRUE` the `expression` is evaluated. Once the `condition` is `FALSE` control is transferred outside the loop.

```
k <- 1
while(1 != 1+1/2^k){
  k <- k+1
}
k
```

`## [1] 53`

`1/2^(k-1)`

`## [1] 0.000000000000000222`
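R also stores this quantity in the built-in list `.Machine`, which gives a way to check the loop. The sketch below repeats the `while` loop so that it runs on its own; the final comparison is expected to hold on a typical machine using IEEE 754 double precision.

```r
# Recompute k with the while loop from above
k <- 1
while(1 != 1 + 1/2^k){
  k <- k + 1
}

# Compare with the value R reports for the machine
1/2^(k-1)
.Machine$double.eps
1/2^(k-1) == .Machine$double.eps
```

An exact `==` comparison is reasonable here because both values are exact powers of two.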

### 7.4.3 A For Loop

A `for` loop has the form

```
for (variable in vector) {
expression
}
```

The `for` loop sets the `variable` equal to each element of the `vector` in succession, and evaluates the `expression` each time. Here are two different ways to use a `for` loop to calculate the sum of the elements in a vector.

```
x <- 1:10
S <- 0
for(i in 1:length(x)){
  S <- S + x[i]
}
S
```

`## [1] 55`

```
S = 0
for(value in x){
  S = S + value
}
S
```

`## [1] 55`

In the first case we loop over the positions of the vector elements, while in the second case we loop over the elements themselves.
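One caveat about looping over positions with `1:length(x)`: if the vector happens to be empty, `1:0` counts down from 1 to 0 and the body runs twice. The base R function `seq_along()` avoids this. The sketch below counts the iterations in each case; the names `empty`, `count_colon`, and `count_seq` are illustrative.

```r
empty <- numeric(0)

1:length(empty)     # counts down from 1 to 0
seq_along(empty)    # an empty sequence

# Count how many times each loop body runs
count_colon <- 0
for(i in 1:length(empty)){
  count_colon <- count_colon + 1
}
count_seq <- 0
for(i in seq_along(empty)){
  count_seq <- count_seq + 1
}
c(count_colon, count_seq)
```

For vectors that are known to be non-empty, as in the examples above, the two forms behave identically.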

### 7.4.4 Practice Problem

Often when students initially learn about `if()` statements and `for()` loops, they use them more often than is necessary, leading to overcomplicated and often slower code. Many simple `if()` statements can be replaced by logical subsetting and vectorization. Using the ideas from above, and what you learned in Chapters 4 and 6, rewrite the following `if()` statement and `for()` loop using logical subsetting.

```
a <- rnorm(1000)
for (i in 1:length(a)) {
  if (a[i] > 1) {
    a[i] <- 0
  }
}
```

## 7.5 Efficiency Considerations

In many contexts R and modern computers are fast enough that the user does not need to worry about writing efficient code. Still, there are a few simple ways to write efficient code that are easy enough, and provide enough speed-up, that they are worth following as often as possible. The R function `system.time()` reports how long a set of R code takes to execute, and we will use this function to compare different ways to accomplish objectives in R.

### 7.5.1 Growing Objects

Consider two ways to create a sequence of integers from 1 to n, implemented in functions `f1` and `f2`.

- Start with a zero-length vector, and let it grow:

```
f1 <- function(n){
  x <- numeric(0)
  for(i in 1:n){
    x <- c(x,i)
  }
  x
}
```

- Start with a vector of length \(n\) and fill in the values:

```
f2 <- function(n){
  x <- numeric(n)
  for(i in 1:n){
    x[i] <- i
  }
  x
}
```

Here are the two functions in action, with \(n = 100000\).

```
n <- 100000
system.time(f1(n))
```

```
## user system elapsed
## 11.955 0.044 12.031
```

`system.time(f2(n))`

```
## user system elapsed
## 0.007 0.000 0.007
```

It is *much* more efficient to start with a full-length vector and then fill in values.^{42}

Of course another way to create a vector of integers from 1 to n is to use `1:n`. Let’s see how fast this is.

`system.time(1:n)`

```
## user system elapsed
## 0.000 0.000 0.005
```

```
n <- 1000000
system.time(1:n)
```

```
## user system elapsed
## 0 0 0
```

For \(n=100000\) this is so fast the system time is very close to zero. Even when \(n\) is 1000000 the time is very small. So another important lesson is to use built-in R functions whenever possible, since they have had substantial development focused on efficiency, correctness, etc.
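Before comparing speeds, it is worth confirming that the approaches produce the same values. The sketch below does this on a small `n`. The functions are renamed `grow_vec` and `fill_vec` (hypothetical names) so that the block runs on its own, and no timings are shown because they vary from machine to machine.

```r
# Same ideas as f1() and f2(), renamed so this block runs on its own
grow_vec <- function(n){
  x <- numeric(0)
  for(i in 1:n){
    x <- c(x, i)      # a new, longer vector is created each time
  }
  x
}
fill_vec <- function(n){
  x <- numeric(n)     # allocated once
  for(i in 1:n){
    x[i] <- i
  }
  x
}

n_small <- 1000
identical(grow_vec(n_small), fill_vec(n_small))

# 1:n stores integers while the loops above build doubles, so
# compare values with all.equal() rather than identical()
all.equal(fill_vec(n_small), 1:n_small)
```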

### 7.5.2 Vectorization

Next consider calculating the sum of the squared entries in each column of a matrix. For example with the matrix \(M\),

\[ M = \left(\begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array}\right), \]

the sums would be \(1^2 + 4^2 = 17\), \(2^2 + 5^2 = 29\), and \(3^2 + 6^2 = 45\). One possibility is to have an outer loop that traverses the columns and an inner loop that traverses the rows within a column, squaring the entries and adding them together.

```
test_matrix <- matrix(1:6, byrow=TRUE, nrow=2)
test_matrix
```

```
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```

```
ss1 <- function(M){
  n <- dim(M)[1]
  m <- dim(M)[2]
  out <- rep(0,m)
  for(j in 1:m){
    for(i in 1:n){
      out[j] <- out[j] + M[i,j]^2
    }
  }
  return(out)
}
ss1(test_matrix)
```

`## [1] 17 29 45`

Another possibility eliminates the inner loop, using the `sum()` function to compute the sum of the squared entries in the column directly.

```
ss2 <- function(M){
  m <- dim(M)[2]
  out <- numeric(m)
  for(j in 1:m){
    out[j] <- sum(M[,j]^2)
  }
  return(out)
}
ss2(test_matrix)
```

`## [1] 17 29 45`

A third possibility uses the `colSums()` function.

```
ss3 <- function(M){
  out <- colSums(M^2)
  return(out)
}
ss3(test_matrix)
```

`## [1] 17 29 45`

Here is a speed comparison, using a \(1000\times 10000\) matrix.

```
mm <- matrix(1:10000000, byrow=TRUE, nrow=1000)
system.time(ss1(mm))
```

```
## user system elapsed
## 0.822 0.000 0.823
```

`system.time(ss2(mm))`

```
## user system elapsed
## 0.075 0.000 0.075
```

`system.time(ss3(mm))`

```
## user system elapsed
## 0.246 0.024 0.271
```

`rm(mm)`
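Yet another way to write the same computation, offered only as a sketch and not timed here, uses `apply()` with an anonymous function applied to each column. The object name `ss_apply` is illustrative.

```r
test_matrix <- matrix(1:6, byrow = TRUE, nrow = 2)

# Apply an anonymous function to each column (MARGIN = 2)
ss_apply <- apply(test_matrix, 2, function(column) sum(column^2))
ss_apply

# Agrees with the colSums() approach
all.equal(ss_apply, colSums(test_matrix^2))
```

Which version is fastest depends on the size of the matrix and the machine, so it is worth checking with `system.time()` rather than assuming.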

## 7.6 More on Functions

Understanding functions deeply requires a careful study of R’s scoping rules, as well as a good understanding of environments in R. That’s beyond the scope of this book, but we will briefly discuss some issues that are most salient. For a more in-depth treatment, see “Advanced R” by Hadley Wickham, especially the chapters on functions and environments.

## 7.7 Calling Functions

When using a function, the function’s arguments can be specified in three ways:

- By the full name of the argument.
- By the position of the argument.
- By a partial name of the argument.

```
tmp_function <- function(first.arg, second.arg, third.arg, fourth.arg){
  return(c(first.arg, second.arg, third.arg, fourth.arg))
}
tmp_function(34, 15, third.arg = 11, fou = 99)
```

`## [1] 34 15 11 99`

Positional matching of arguments is convenient, but should be used carefully, and probably limited to the first few, and most commonly used, arguments in a function. Partial matching also has pitfalls. A partially specified argument must unambiguously match exactly one argument, a requirement that is not met below.

```
tmp_function <- function(first.arg, fourth.arg){
  return(c(first.arg, fourth.arg))
}
tmp_function(1, f=2)
```

`## Error in tmp_function(1, f = 2): argument 2 matches multiple formal arguments`

### 7.7.1 The `...` argument

In defining a function, a special argument denoted by `...` can be used. It is sometimes called the “ellipsis” argument, the “three dot” argument, or the “dot dot dot” argument. The R language definition (https://cran.r-project.org/doc/manuals/r-release/R-lang.html) describes the argument in this way:

> The special type of argument `...` can contain any number of supplied arguments. It is used for a variety of purposes. It allows you to write a function that takes an arbitrary number of arguments. It can be used to absorb some arguments into an intermediate function which can then be extracted by functions called subsequently.

Consider for example the `sum()` function.

`sum(1:5)`

`## [1] 15`

`sum(1:5, c(3,4,90))`

`## [1] 112`

`sum(1,2,3,c(3,4,90), 1:5)`

`## [1] 118`

Think about writing such a function. There is no way to predict in advance the number of arguments a user might specify. So the function is defined with `...` as the first argument:

`sum`

`## function (..., na.rm = FALSE) .Primitive("sum")`

This is true of many commonly used functions in R, such as `c()`.
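A user-written function can take an arbitrary number of arguments in the same way. The sketch below defines a hypothetical function `range_all` that gathers everything passed through `...` with `c()` and reports the smallest and largest values. As with `sum()`, any argument that comes after `...` must be given by its full name.

```r
# A hypothetical function that accepts any number of numeric vectors
# and reports the range of all the values taken together
range_all <- function(..., na.rm = FALSE){
  values <- c(...)          # combine everything passed through ...
  c(min = min(values, na.rm = na.rm),
    max = max(values, na.rm = na.rm))
}

range_all(1:5)
range_all(1:5, c(3, 4, 90), -2)
range_all(c(1, NA), 7, na.rm = TRUE)
```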

Next, consider a function that calls another function in its body. For example, suppose that a collaborator always supplies comma delimited files that have five lines of description, followed by a line containing variable names, followed by the data. You are tired of having to specify `skip = 5`, `header = TRUE`, and `sep = ","` to `read.table()` and want to create a function `my.read()` which uses these as defaults.

```
my.read <- function(file, header=TRUE, sep = ",", skip = 5, ...){
  read.table(file = file, header = header, sep = sep, skip = skip, ...)
}
```

The `...` in the definition of `my.read()` allows the user to specify other arguments, for example, `stringsAsFactors = FALSE`. These will be passed on to the `read.table()` function. In fact, that is how `read.csv()` is defined.

`read.csv`

```
## function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
## fill = TRUE, comment.char = "", ...)
## read.table(file = file, header = header, sep = sep, quote = quote,
## dec = dec, fill = fill, comment.char = comment.char, ...)
## <bytecode: 0x55e88eb175c0>
## <environment: namespace:utils>
```
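To see `my.read()` and the pass-through of `...` in action, the sketch below writes a small made-up file in the collaborator's format to a temporary location and reads it twice, the second time passing `colClasses` (an argument of `read.table()`, not of `my.read()`) through `...`. The file contents and the names `demo_file`, `dat`, and `dat_chr` are invented for the illustration.

```r
my.read <- function(file, header = TRUE, sep = ",", skip = 5, ...){
  read.table(file = file, header = header, sep = sep, skip = skip, ...)
}

# A made-up file in the collaborator's format
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Description line 1", "Description line 2", "Description line 3",
  "Description line 4", "Description line 5",
  "plot,yield",
  "1,1903",
  "2,1935"
), demo_file)

dat <- my.read(demo_file)
dat

# colClasses is not an argument of my.read(); it travels through ...
dat_chr <- my.read(demo_file, colClasses = "character")
sapply(dat_chr, class)

unlink(demo_file)  # remove the temporary file
```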

### 7.7.2 Lazy Evaluation

Arguments to R functions are not evaluated until they are needed, sometimes called *lazy* evaluation.

```
f <- function(a,b){
  print(a^2)
  print(a^3)
  print(a*b)
}
f(a=3, b=2)
```

```
## [1] 9
## [1] 27
## [1] 6
```

`f(a=3)`

```
## [1] 9
## [1] 27
```

`## Error in print(a * b): argument "b" is missing, with no default`

The first call specified both of the arguments `a` and `b`, and produced the expected output. In the second call the argument `b` was not specified. Since it was not needed until the third `print` statement, R happily executed the first two `print` statements, and only reported an error in the third statement, when `b` was needed to compute `a*b`.

```
f <- function(a,b = a^3){
  return(a*b)
}
f(2)
```

`## [1] 16`

`f(2,10)`

`## [1] 20`

In the first call, since `b` was not specified, it was computed as `a^3`. In the second call, `b` was specified, and the specified value was used.
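Lazy evaluation also means that an argument which is supplied but never used is never evaluated. In the sketch below, the hypothetical function `square_first` ignores its second argument, so an expression that would otherwise raise an error causes no trouble.

```r
# Hypothetical function: `b` appears in the argument list but the
# body never uses it
square_first <- function(a, b){
  a^2
}

# The expression supplied for b would raise an error if it were
# evaluated, but it never is
square_first(3, stop("b was evaluated"))
```

This behavior is convenient, but it can also hide mistakes: a problem with an argument only surfaces at the point where the function first uses it.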

42. Roughly speaking, the first option (growing the vector, as in `f1`) is slower because each time the vector is increased in size, R must allocate memory for a new, larger vector and copy the existing values into it.