# Chapter 4 Data Structures

A data structure is a format for organizing and storing data. The structure is designed so that data can be accessed and worked with in specific ways. Statistical software and programming languages have methods (or functions) designed to operate on different kinds of data structures.

This chapter’s focus is on data structures. To help initial understanding, the data in this chapter will be relatively modest in size and complexity. The ideas and methods, however, generalize to larger and more complex data sets.

The base data structures in R are vectors, matrices, arrays, data frames, and lists. The first three (vectors, matrices, and arrays) require all elements to be of the same type, i.e., homogeneous, e.g., all numeric or all character. Data frames and lists allow elements to be of different types, i.e., heterogeneous; e.g., some columns of a data frame may be numeric while other columns may be character. These base structures can also be organized by their dimensionality, i.e., 1-dimensional, 2-dimensional, or N-dimensional, as shown in Table 4.1.

| Dimension | Homogeneous | Heterogeneous |
|---|---|---|
| 1 | Atomic vector | List |
| 2 | Matrix | Data frame |
| N | Array | |
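To make the table concrete, here is a minimal sketch that builds one toy object for each cell (the names `vec`, `mat`, `arr`, `lst`, and `dfr` are hypothetical and not used elsewhere in the chapter) and uses `dim()` to show its dimensionality.

```r
vec <- c(1, 2, 3)                            # 1-d, homogeneous: atomic vector
mat <- matrix(1:6, nrow = 2)                 # 2-d, homogeneous: matrix
arr <- array(1:12, dim = c(2, 3, 2))         # N-d, homogeneous: array
lst <- list(1, "a", TRUE)                    # 1-d, heterogeneous: list
dfr <- data.frame(x = 1:2, y = c("a", "b"))  # 2-d, heterogeneous: data frame

dim(vec)     # NULL, a plain vector has no dim attribute
dim(mat)     # 2 3
dim(arr)     # 2 3 2
dim(dfr)     # 2 2
length(lst)  # 3

# A homogeneous structure coerces mixed input to a single type
typeof(matrix(c(1, "a"), nrow = 1))  # "character"
```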

R has no scalar (0-dimensional) types. Individual numbers or strings are actually vectors of length one.

An efficient way to understand what comprises a given object is to use the `str()` function. `str()` is short for structure and prints a compact, human-readable description of any R data structure. For example, in the code below, we confirm that what we might think of as a scalar value is actually a vector of length one.

```
a <- 1
str(a)
```

`## num 1`

`is.vector(a)`

`## [1] TRUE`

`length(a)`

`## [1] 1`

Here we assigned `a` the scalar value one. The call `str(a)` prints `num 1`, which says `a` is numeric of length one. Then, just to be sure, we used the function `is.vector()` to test whether `a` is in fact a vector. Then, just for fun, we asked for the length of `a`, which again returns one. There is a set of similar logical tests for the other base data structures, e.g., `is.matrix()`, `is.array()`, `is.data.frame()`, and `is.list()`. These will all come in handy as we encounter different R objects.
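A short sketch of those tests on two hypothetical toy objects, a small matrix `mat` and a one-column data frame `dfr`. Note that a matrix is not a "vector" by `is.vector()`'s definition, because it carries a `dim` attribute.

```r
mat <- matrix(1:6, nrow = 2)
dfr <- data.frame(x = 1:3)

is.vector(mat)      # FALSE, the dim attribute disqualifies it
is.matrix(mat)      # TRUE
is.array(mat)       # TRUE, a matrix is a 2-d array
is.data.frame(dfr)  # TRUE
is.list(dfr)        # TRUE, a data frame is built on a list
```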

## 4.1 Vectors

Think of a vector^{23} as a structure to represent one variable in a data set. For example, a vector might hold the weights, in pounds, of 7 people in a data set. Another vector might hold the genders of those 7 people. The `c()` function in R is useful for creating (small) vectors and for modifying existing vectors. Think of `c` as standing for “combine”.

```
weight <- c(123, 157, 205, 199, 223, 140, 105)
weight
```

`## [1] 123 157 205 199 223 140 105`

```
gender <- c("female", "female", "male", "female", "male",
            "male", "female")
gender
```

```
## [1] "female" "female" "male" "female" "male"
## [6] "male" "female"
```

Notice that elements of a vector are separated by commas when using the `c()` function to create a vector. Also notice that character values are placed inside quotation marks.

The `c()` function can also be used to add to an existing vector. For example, if an eighth person, a male weighing 194 pounds, were added to the data set, the existing vectors could be modified as follows.

```
weight <- c(weight, 194)
gender <- c(gender, "male")
weight
```

`## [1] 123 157 205 199 223 140 105 194`

`gender`

```
## [1] "female" "female" "male" "female" "male"
## [6] "male" "female" "male"
```
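Because `c()` simply concatenates its arguments in order, the same call can join two whole vectors or put a new value at the front rather than the end. A small sketch with hypothetical vectors `x` and `y`:

```r
x <- c(1, 2, 3)
y <- c(10, 20)

c(x, y)    # 1 2 3 10 20, combining two vectors
c(0, x)    # 0 1 2 3, adding an element at the front
c(x, 4:5)  # 1 2 3 4 5
```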

### 4.1.1 Types, Conversion, Coercion

Clearly it is important to distinguish between different types of vectors. For example, it makes sense to ask R to calculate the mean of the weights stored in `weight`, but it does not make sense to ask R to compute the mean of the genders stored in `gender`. Vectors in R may have one of six different “types”: character, double, integer, logical, complex, and raw. We will not encounter the complex and raw types in everyday data analysis, and so we focus on the first four data types.

`character`: consists of letters or words. Our vector `gender` is a character vector because it holds the gender of each person in our data set.

`typeof(gender)`

`## [1] "character"`

`double`: a numeric object that can be an integer or non-integer value (e.g., 10, 4.2). Our vector `weight` is a double vector.

`typeof(weight)`

`## [1] "double"`

`integer`: a numeric object that can only be an integer. It may be surprising to see that the `weight` vector is of type `double`, even though its values are all integers. By default, R creates a double type vector when numeric values are given via the `c()` function. We can create an integer vector of weights by placing the letter `L` after each of the numbers when we place them in the vector:

```
weight.int <- c(123L, 157L, 205L, 199L, 223L, 140L, 105L, 194L)
typeof(weight.int)
```

`## [1] "integer"`

`logical`: used to represent variables that can take values `TRUE` or `FALSE`. To illustrate logical vectors, imagine that each of the eight people in the data set was asked whether they were taking blood pressure medication, and the responses were coded as `TRUE` if the person answered yes, and `FALSE` if the person answered no.

```
bp <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
bp
```

`## [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE`

`typeof(bp)`

`## [1] "logical"`

When it makes sense, it is possible to convert vectors to a different type. Consider the following examples.

```
weight.int <- as.integer(weight)
weight.int
```

`## [1] 123 157 205 199 223 140 105 194`

`typeof(weight.int)`

`## [1] "integer"`

```
weight.char <- as.character(weight)
weight.char
```

`## [1] "123" "157" "205" "199" "223" "140" "105" "194"`

```
bp.double <- as.double(bp)
bp.double
```

`## [1] 1 1 0 1 0 0 1 1`

`gender.oops <- as.double(gender)`

`## Warning: NAs introduced by coercion`

`gender.oops`

`## [1] NA NA NA NA NA NA NA NA`

`sum(bp)`

`## [1] 5`

The integer version of `weight` doesn’t look any different, but it is stored differently, which can be important both for computational efficiency and for interfacing with other languages such as `C++`. For our purposes, however, we will not worry about the distinction between integer and double types. Converting `weight` to character goes as expected: the character representations of the numbers replace the numbers themselves. Converting the logical vector `bp` to double is pretty straightforward too: `FALSE` is converted to zero, and `TRUE` is converted to one. Now think about converting the character vector `gender` to a numeric double vector. It’s not at all clear how to represent “female” and “male” as numbers. In fact, in this case R creates a double vector (with a warning) in which each element is set to `NA`, which is the representation of missing data.^{24} Finally, consider the code `sum(bp)`. Here `bp` is a logical vector, but when R sees that we are asking to sum this logical vector, it automatically converts it to a numeric vector and then adds the zeros and ones representing `FALSE` and `TRUE`.
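The sketch below checks two of the claims in that paragraph on toy data (the three-element `gender` is a shortened, hypothetical stand-in for the chapter's vector). It also shows a handy consequence of logical-to-numeric conversion: `mean()` of a logical vector is the proportion of `TRUE` values.

```r
gender <- c("female", "female", "male")
bp <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

# Failed coercion yields a double vector of NAs (the warning is suppressed here)
gender.oops <- suppressWarnings(as.double(gender))
typeof(gender.oops)  # "double"
is.na(gender.oops)   # TRUE TRUE TRUE

# Arithmetic on a logical vector treats TRUE as 1 and FALSE as 0
sum(bp)   # 5, the number of TRUE values
mean(bp)  # 0.625, the proportion of TRUE values
```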

R also has functions to test whether a vector is of a particular type.

`is.double(weight)`

`## [1] TRUE`

`is.character(weight)`

`## [1] FALSE`

`is.integer(weight.int)`

`## [1] TRUE`

`is.logical(bp)`

`## [1] TRUE`
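When the integer/double distinction does not matter, `is.numeric()` is often the more convenient test, since it returns `TRUE` for both numeric types. A brief sketch:

```r
weight <- c(123, 157, 205, 199, 223, 140, 105, 194)
weight.int <- as.integer(weight)

is.numeric(weight)       # TRUE
is.numeric(weight.int)   # TRUE, both numeric types pass
is.double(weight.int)    # FALSE
is.numeric(c("1", "2"))  # FALSE, digits stored as text are character
```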

#### 4.1.1.1 Coercion

Consider the following examples.

```
xx <- c(1, 2, 3, TRUE)
xx
```

`## [1] 1 2 3 1`

```
yy <- c(1, 2, 3, "dog")
yy
```

`## [1] "1" "2" "3" "dog"`

```
zz <- c(TRUE, FALSE, "cat")
zz
```

`## [1] "TRUE" "FALSE" "cat"`

`weight + bp`

`## [1] 124 158 205 200 223 140 106 195`

Vectors in R can only contain elements of one type. If more than one type is included in a `c()` function, R silently *coerces* the vector to be of one type. The examples illustrate the hierarchy: if any element is a character, then the whole vector is character. If some elements are numeric (either integer or double) and other elements are logical, the whole vector is numeric. Note what happened when R was asked to add the numeric vector `weight` to the logical vector `bp`. The logical vector was silently coerced to be numeric, so that `FALSE` became zero and `TRUE` became one, and then the two numeric vectors were added.
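The full ordering R uses for these base types is logical < integer < double < character: the result takes the "highest" type present. A minimal sketch using `typeof()` to reveal each step:

```r
typeof(c(TRUE, 5L))  # "integer":   logical < integer
typeof(c(5L, 2.5))   # "double":    integer < double
typeof(c(2.5, "a"))  # "character": double  < character
c(TRUE, 2.5, "a")    # "TRUE" "2.5" "a", every element becomes a string
```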

### 4.1.2 Accessing Specific Elements of Vectors

To access and possibly change specific elements of vectors, refer to the position of the element in square brackets. For example, `weight[4]` refers to the fourth element of the vector `weight`. Note that R starts the numbering of elements at 1, i.e., the first element of a vector `x` is `x[1]`.

`weight`

`## [1] 123 157 205 199 223 140 105 194`

`weight[5]`

`## [1] 223`

`weight[1:3]`

`## [1] 123 157 205`

`length(weight)`

`## [1] 8`

`weight[length(weight)]`

`## [1] 194`

`weight[]`

`## [1] 123 157 205 199 223 140 105 194`

```
weight[3] <- 202
weight
```

`## [1] 123 157 202 199 223 140 105 194`

Note that including nothing in the square brackets results in the whole vector being returned.

Negative numbers in the square brackets tell R to omit the corresponding value. And a zero as a subscript returns nothing (more precisely, it returns a length zero vector of the appropriate type).

`weight[-3]`

`## [1] 123 157 199 223 140 105 194`

`weight[-length(weight)]`

`## [1] 123 157 202 199 223 140 105`

```
lessWeight <- weight[-c(1,3,5)]
lessWeight
```

`## [1] 157 199 140 105 194`

`weight[0]`

`## numeric(0)`

`weight[c(0,2,1)]`

`## [1] 157 123`

`weight[c(-1, 2)]`

`## Error in weight[c(-1, 2)]: only 0's may be mixed with negative subscripts`

Note that mixing zero and other nonzero subscripts is allowed, but mixing negative and positive subscripts is not allowed.
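Subscripts work the same way on the left side of an assignment, so several elements can be changed at once. One behavior worth knowing: assigning to a position past the end lengthens the vector and fills any gap with `NA`. A sketch on a hypothetical vector `x`:

```r
x <- c(10, 20, 30, 40)

x[c(1, 4)] <- c(0, 99)  # replace two elements at once
x                       # 0 20 30 99

x[6] <- 5               # assigning past the end extends the vector
x                       # 0 20 30 99 NA 5
```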

What about the (usual) case where we don’t know the positions of the elements we want? For example, we might want the weights of all females in the data. Later we will learn how to subset using logical indices, which is a very powerful way to access desired elements of a vector.

### 4.1.3 Practice Problem

A bad programming practice that often plagues beginners is *hardcoding*. Consider the following simple vector containing data on the number of tree species found at ten different sites.

`tree.sp <- c(10, 13, 15, 8, 2, 9, 10, 20, 9, 11)`

Suppose we are interested in the second to last value of the data set. Since we know there are ten values in the data set, we might do this as follows:

`tree.sp[10 - 1]`

`## [1] 9`

This is an example of *hardcoding*. But what if we attempt to use the same code on a second vector of tree species data that only has six sites?

```
tree.sp <- c(8, 4, 3, 2, 19, 3)
tree.sp[10 - 1]
```

`## [1] NA`

That’s clearly not what we want. Fix this code so we can always extract the second to last value in the vector, regardless of the length of the vector.

## 4.2 Factors

Categorical variables can be represented as character vectors. In many cases this simple representation is sufficient. Consider, however, two other categorical variables, one representing age via categories `youth`, `young adult`, `middle age`, `senior`, and another representing income via categories `lower`, `middle`, and `upper`. Suppose that for the small health data set, all the people are either middle aged or senior citizens. If we just represented the variable via a character vector, there would be no way to know that there are two other categories, representing youth and young adults, which happen not to be present in the data set. And for the income variable, the character vector representation does not explicitly indicate that there is an ordering of the levels.

Factors in R provide a more sophisticated way to represent categorical variables. Factors explicitly contain all possible levels, and allow ordering of levels.

```
age <- c("middle age", "senior", "middle age", "senior",
         "senior", "senior", "senior", "middle age")
income <- c("lower", "lower", "upper", "middle", "upper",
            "lower", "lower", "middle")
age
```

```
## [1] "middle age" "senior" "middle age" "senior"
## [5] "senior" "senior" "senior" "middle age"
```

`income`

```
## [1] "lower" "lower" "upper" "middle" "upper"
## [6] "lower" "lower" "middle"
```

```
age <- factor(age, levels = c("youth", "young adult", "middle age",
                              "senior"))
age
```

```
## [1] middle age senior middle age senior
## [5] senior senior senior middle age
## Levels: youth young adult middle age senior
```

```
income <- factor(income, levels = c("lower", "middle", "upper"),
                 ordered = TRUE)
income
```

```
## [1] lower lower upper middle upper lower lower
## [8] middle
## Levels: lower < middle < upper
```

In the factor version of `age` the levels are explicitly listed, so it is clear that the two included levels are not all the possible levels. And in the factor version of `income`, the ordering is explicit.

In many cases the character vector representation of a categorical variable is sufficient and easier to work with. In this book, factors will not be used extensively. It is important to note that versions of R prior to 4.0.0 by default created a factor when character data were read in or placed in a data frame, so sometimes it is necessary to use the argument `stringsAsFactors = FALSE` to explicitly tell R not to do this. This is shown later in the chapter when data frames are introduced.

## 4.3 Missing Data, Infinity, etc.

Most real-world data sets have variables where some observations are missing. In a longitudinal study participants may drop out. In a survey, participants may decide not to respond to certain questions. Statistical software should be able to represent missing data and to analyze data sets in which some data are missing.

In R, the value `NA` is used for a missing data value. Since missing values may occur in numeric, character, and other types of data, and since R requires that a vector contain only elements of one type, there are different types of `NA` values. Usually R determines the appropriate type of `NA` value automatically. It is worth noting that the default type for `NA` is logical, and that `NA` is NOT the same as the character string `"NA"`.
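Base R also provides explicitly typed missing-value constants, which can be useful when the type of an `NA` needs to be fixed in advance. A brief sketch:

```r
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# Combined with other values, NA takes on the vector's type
typeof(c(1.5, NA))     # "double"
typeof(c("a", NA))     # "character"
```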

```
missingCharacter <- c("dog", "cat", NA, "pig", NA, "horse")
missingCharacter
```

`## [1] "dog" "cat" NA "pig" NA "horse"`

`is.na(missingCharacter)`

`## [1] FALSE FALSE TRUE FALSE TRUE FALSE`

```
missingCharacter <- c(missingCharacter, "NA")
missingCharacter
```

```
## [1] "dog" "cat" NA "pig" NA "horse"
## [7] "NA"
```

`is.na(missingCharacter)`

`## [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE`

```
allMissing <- c(NA, NA, NA)
typeof(allMissing)
```

`## [1] "logical"`

How should missing data be treated in computations, such as finding the mean or standard deviation of a variable? One possibility is to return `NA`. Another is to remove the missing value(s) and then perform the computation.

`mean(c(1,2,3,NA,5))`

`## [1] NA`

`mean(c(1,2,3,NA,5), na.rm=TRUE)`

`## [1] 2.75`

As this example shows, the default behavior for the `mean()` function is to return `NA`. If removing the missing values before computing the mean is desired, set the argument `na.rm` to `TRUE`. Different R functions have different default behaviors, and there are other possible actions. Consulting the help for a function provides the details.
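Many other summary functions in base R, such as `sum()` and `max()`, accept the same `na.rm` argument. A short sketch, which also counts the missing values and drops them by hand with `is.na()`:

```r
x <- c(1, 2, 3, NA, 5)

sum(x)                # NA
sum(x, na.rm = TRUE)  # 11
max(x, na.rm = TRUE)  # 5
sum(is.na(x))         # 1, the number of missing values
mean(x[!is.na(x)])    # 2.75, dropping NAs by indexing
```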

### 4.3.1 Practice Problem

Collecting data is often a messy process resulting in multiple errors in the data. Consider the following small vector representing the weights of 10 adults in pounds.

`my.weights <- c(150, 138, 289, 239, 12, 103, 310, 200, 218, 178)`

As far as I know, it’s not possible for an adult to weigh 12 pounds, so that is most likely an error. Change this value to `NA`, and then find the standard deviation of the weights after removing the `NA` value.

### 4.3.2 Infinity and NaN

What happens if R code requests division by zero, or results in a number that is too large to be represented? Here are some examples.

```
x <- 0:4
x
```

`## [1] 0 1 2 3 4`

`1/x`

`## [1] Inf 1.0000 0.5000 0.3333 0.2500`

`x/x`

`## [1] NaN 1 1 1 1`

```
y <- c(10, 1000, 10000)
2^y
```

`## [1] 1.024e+03 1.072e+301 Inf`

`Inf` and `-Inf` represent infinity and negative infinity (and numbers that are too large in magnitude to be represented as floating point numbers). `NaN` represents the result of a calculation where the result is undefined, such as dividing zero by zero. All of these are common to a variety of programming languages, including R.
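Base R has test functions for each of these special values. One subtlety the sketch below illustrates: `is.na()` is also `TRUE` for `NaN`, while `is.nan()` is `TRUE` only for `NaN`.

```r
z <- c(c(1, -1, 0) / 0, NA)  # Inf -Inf NaN NA

is.finite(z)    # FALSE FALSE FALSE FALSE
is.infinite(z)  # TRUE  TRUE  FALSE FALSE
is.nan(z)       # FALSE FALSE TRUE  FALSE
is.na(z)        # FALSE FALSE TRUE  TRUE
```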

## 4.4 Data Frames

Commonly, data is rectangular in form, with variables as columns and cases as rows. Continuing with the (contrived) data on weight, gender, and blood pressure medication, each of those variables would be a column of the data set, and each person’s measurements would be a row. In R, such data are represented as a *data frame*.

```
healthData <- data.frame(Weight = weight, Gender = gender,
                         bp.meds = bp,
                         stringsAsFactors = FALSE)
healthData
```

```
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
```

`names(healthData)`

`## [1] "Weight" "Gender" "bp.meds"`

`colnames(healthData)`

`## [1] "Weight" "Gender" "bp.meds"`

```
names(healthData) <- c("Wt", "Gdr", "bp")
healthData
```

```
## Wt Gdr bp
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
```

`rownames(healthData)`

`## [1] "1" "2" "3" "4" "5" "6" "7" "8"`

`names(healthData) <- c("Weight", "Gender", "bp.meds")`

The `data.frame` function can be used to create a data frame (although it’s more common to read a data frame into R from an external file, something that will be introduced later). The names of the variables in the data frame are given as arguments, as are the vectors of data that make up the variables’ values. The argument `stringsAsFactors=FALSE` asks R not to convert character vectors into factors. As of version 4.0.0, R no longer automatically converts character vectors into factors. Before that version, however, R would automatically convert strings to factors (i.e., the default was `stringsAsFactors = TRUE`), so to avoid confusion we will typically include `stringsAsFactors=FALSE` throughout most of the book. Names of the columns (variables) can be extracted and set via either `names` or `colnames`. In the example, the variable names are changed to `Wt`, `Gdr`, and `bp` and then changed back to the original `Weight`, `Gender`, and `bp.meds` in this way. Rows can be named also. In this case, since specific row names were not provided, the default row names `"1"`, `"2"`, etc., are used.
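As a sketch of the remaining points, the snippet below assigns row names with `rownames<-` and inspects a small data frame's shape. The row labels `"p1"` to `"p3"` are hypothetical, and `dfr` is a toy frame rather than `healthData`.

```r
dfr <- data.frame(Weight = c(123, 157, 202),
                  Gender = c("female", "female", "male"),
                  stringsAsFactors = FALSE)
rownames(dfr) <- c("p1", "p2", "p3")  # hypothetical row labels

dim(dfr)   # 3 2 (rows, columns)
nrow(dfr)  # 3
ncol(dfr)  # 2
str(dfr)   # compact summary of each column's type
```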

In the next example, a built-in data set called `mtcars` is made available by the `data` function, and then the first and last six rows are displayed using `head` and `tail`.

```
data(mtcars)
head(mtcars)
```

```
## mpg cyl disp hp drat wt qsec
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02
## Valiant 18.1 6 225 105 2.76 3.460 20.22
## vs am gear carb
## Mazda RX4 0 1 4 4
## Mazda RX4 Wag 0 1 4 4
## Datsun 710 1 1 4 1
## Hornet 4 Drive 1 0 3 1
## Hornet Sportabout 0 0 3 2
## Valiant 1 0 3 1
```

`tail(mtcars)`

```
## mpg cyl disp hp drat wt qsec vs
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1
## am gear carb
## Porsche 914-2 1 5 2
## Lotus Europa 1 5 2
## Ford Pantera L 1 5 4
## Ferrari Dino 1 5 6
## Maserati Bora 1 5 8
## Volvo 142E 1 4 2
```

Note that the `mtcars` data frame does have non-default row names, which give the make and model of the cars.
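`head()` and `tail()` show six rows by default, but the `n` argument changes that. The sketch below also checks the overall size of `mtcars` and pulls out its first few row names.

```r
data(mtcars)
dim(mtcars)            # 32 11
head(mtcars, n = 3)    # only the first three rows
rownames(mtcars)[1:3]  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
```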

### 4.4.1 Accessing Specific Elements of Data Frames

Data frames are two-dimensional, so to access a specific element (or elements) we need to specify both the row and column.

`mtcars[1, 4]`

`## [1] 110`

`mtcars[1:3, 3]`

`## [1] 160 160 108`

`mtcars[1:3, 2:3]`

```
## cyl disp
## Mazda RX4 6 160
## Mazda RX4 Wag 6 160
## Datsun 710 4 108
```

`mtcars[, 1]`

```
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
## [11] 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9
## [21] 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
```

Note that `mtcars[,1]` returns ALL elements in the first column. This agrees with the behavior for vectors, where leaving a subscript out of the square brackets tells R to return all values. In this case we are telling R to return all rows and the first column.
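Rows and columns can also be selected by name instead of position. Also note that selecting a single column with `[, 1]` simplifies the result to a plain vector; the `drop = FALSE` argument keeps it a one-column data frame. A brief sketch:

```r
data(mtcars)
mtcars["Datsun 710", "hp"]        # 93, row and column selected by name
mtcars[1:2, c("mpg", "cyl")]      # a two-column data frame
class(mtcars[, 1])                # "numeric", a single column is simplified
class(mtcars[, 1, drop = FALSE])  # "data.frame", drop = FALSE keeps the frame
```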

For a data frame there is another way to access specific columns, using the `$` notation.

`mtcars$mpg`

```
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2
## [11] 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9
## [21] 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
```

`mtcars$cyl`

```
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8
## [26] 4 4 4 8 6 8 4
```

`mpg`

`## Error in eval(expr, envir, enclos): object 'mpg' not found`

`cyl`

`## Error in eval(expr, envir, enclos): object 'cyl' not found`

`weight`

`## [1] 123 157 202 199 223 140 105 194`

Notice that typing the variable name, such as `mpg`, without the name of the data frame (and a dollar sign) as a prefix, does not work. This is sensible. There may be several data frames that have variables named `mpg`, and just typing `mpg` doesn’t provide enough information to know which is desired. But if there is a vector named `mpg` that was created outside a data frame, it will be retrieved when `mpg` is typed. This is why typing `weight` does work: `weight` was created outside of a data frame, although it was later incorporated into the `healthData` data frame.
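A related point: `$` takes the column name literally, so it cannot use a name stored in a variable. Double brackets with a character string can. In the sketch below, `col` is a hypothetical variable holding a column name.

```r
data(mtcars)
mtcars[["mpg"]][1:3]  # 21.0 21.0 22.8, same values as mtcars$mpg

col <- "hp"           # column name stored in a variable
mtcars[[col]][1:3]    # 110 110 93
is.null(mtcars$col)   # TRUE: $ looks for a column literally named "col"
```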

## 4.5 Lists

The third main data structure we will work with is a list. Technically a list is a vector, but one in which elements can be of different types. For example, a list may have one element that is a vector, one element that is a data frame, and another element that is a function. Consider designing a function that fits a simple linear regression model to two quantitative variables. We might want that function to compute and return several things, such as

- The fitted slope and intercept (a numeric vector with two components)
- The residuals (a numeric vector with \(n\) components, where \(n\) is the number of data points)
- Fitted values for the data (a numeric vector with \(n\) components, where \(n\) is the number of data points)
- The names of the dependent and independent variables (a character vector with two components)

In fact R has a function, `lm`, which does this (and much more).

```
mpgHpLinMod <- lm(mpg ~ hp, data = mtcars)
mode(mpgHpLinMod)
```

`## [1] "list"`

`names(mpgHpLinMod)`

```
## [1] "coefficients" "residuals" "effects"
## [4] "rank" "fitted.values" "assign"
## [7] "qr" "df.residual" "xlevels"
## [10] "call" "terms" "model"
```

`mpgHpLinMod$coefficients`

```
## (Intercept) hp
## 30.09886 -0.06823
```

`mpgHpLinMod$residuals`

```
## Mazda RX4 Mazda RX4 Wag
## -1.59375 -1.59375
## Datsun 710 Hornet 4 Drive
## -0.95363 -1.19375
## Hornet Sportabout Valiant
## 0.54109 -4.83489
## Duster 360 Merc 240D
## 0.91707 -1.46871
## Merc 230 Merc 280
## -0.81717 -2.50678
## Merc 280C Merc 450SE
## -3.90678 -1.41777
## Merc 450SL Merc 450SLC
## -0.51777 -2.61777
## Cadillac Fleetwood Lincoln Continental
## -5.71206 -5.02978
## Chrysler Imperial Fiat 128
## 0.29364 6.80421
## Honda Civic Toyota Corolla
## 3.84901 8.23598
## Toyota Corona Dodge Challenger
## -1.98072 -4.36462
## AMC Javelin Camaro Z28
## -4.66462 -0.08293
## Pontiac Firebird Fiat X1-9
## 1.04109 1.70421
## Porsche 914-2 Lotus Europa
## 2.10991 8.01093
## Ford Pantera L Ferrari Dino
## 3.71340 1.54109
## Maserati Bora Volvo 142E
## 7.75761 -1.26198
```

The `lm` function returns a list (which in the code above has been assigned to the object `mpgHpLinMod`).^{25} One component of the list is the length 2 vector of coefficients, while another component is the length 32 vector of residuals. The code also illustrates that named components of a list can be accessed using the dollar sign notation, as with data frames.
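A quick sketch confirming that the fitted model is a list underneath (with class `"lm"` layered on top). It also shows the extractor function `coef()`, which returns the same values as the `$coefficients` component.

```r
fit <- lm(mpg ~ hp, data = mtcars)
is.list(fit)           # TRUE
class(fit)             # "lm"
length(fit$residuals)  # 32, one residual per car
coef(fit)              # extractor function, same values as fit$coefficients
```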

The `list` function is used to create lists.

```
temporaryList <- list(first = weight, second = healthData,
                      pickle = list(a = 1:10, b = healthData))
temporaryList
```

```
## $first
## [1] 123 157 202 199 223 140 105 194
##
## $second
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
##
## $pickle
## $pickle$a
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $pickle$b
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
```

Here, for illustration, I assembled a list to hold some of the R data structures we have been working with in this chapter. The first list element, named `first`, holds the `weight` vector we created in Section 4.1; the second list element, named `second`, holds the `healthData` data frame; and the third list element, named `pickle`, holds a list with elements named `a` and `b` that hold a vector of values 1 through 10 and another copy of the `healthData` data frame, respectively. As this example shows, a list can contain another list.
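List elements do not have to be named. A named element can also be added after the list is created by assigning to it with `$`. A small sketch with a hypothetical list `lst`:

```r
lst <- list(1:3, "a", TRUE)  # elements need not be named
length(lst)                  # 3
lst$extra <- c(2.5, 3.5)     # add a new, named element
names(lst)                   # "" "" "" "extra"
str(lst)                     # compact view of each element
```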

### 4.5.1 Accessing Specific Elements of Lists

We have already seen that the dollar sign notation works for lists. In addition, the square bracket subsetting notation can be used, with an added, somewhat subtle wrinkle: we can use either single or double square brackets.

`temporaryList$first`

`## [1] 123 157 202 199 223 140 105 194`

`mode(temporaryList$first)`

`## [1] "numeric"`

`temporaryList[[1]]`

`## [1] 123 157 202 199 223 140 105 194`

`mode(temporaryList[[1]])`

`## [1] "numeric"`

`temporaryList[1]`

```
## $first
## [1] 123 157 202 199 223 140 105 194
```

`mode(temporaryList[1])`

`## [1] "list"`

Note the dollar sign and double bracket notation return a numeric vector, while the single bracket notation returns a list. Notice also the difference in results below.

`temporaryList[c(1,2)]`

```
## $first
## [1] 123 157 202 199 223 140 105 194
##
## $second
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
```

`temporaryList[[c(1,2)]]`

`## [1] 157`

The single bracket form returns the first and second elements of the list, while the double bracket form returns the second element in the first element of the list. Generally, do not put a vector of indices or names in a double bracket; you will likely get unexpected results. See, for example, the results below.^{26}

`temporaryList[[c(1,2,3)]]`

`## Error in temporaryList[[c(1, 2, 3)]]: recursive indexing failed at level 2`

So, in summary, there are two main differences between using the single bracket `[]` and the double bracket `[[]]`. First, the single bracket returns a list that holds the object(s) at the given indices or names, whereas the double bracket returns the actual object held at the given index or name. Second, a single bracket can be used to access a range of list elements and will return a list, whereas a double bracket can only access a single element and will return the object held at that index.
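The summary can be checked on a small, hypothetical nested list `tl`, which mimics the shape of `temporaryList`. The sketch also shows that chaining `[[ ]]` (or `$`) is the clear way to reach an element inside a nested list, rather than passing a vector of indices to a single `[[ ]]`.

```r
tl <- list(first = c(1, 2),
           pickle = list(a = 1:10, b = "x"))

class(tl["first"])        # "list", single bracket keeps the list wrapper
class(tl[["first"]])      # "numeric", double bracket extracts the element
tl[["pickle"]][["a"]][3]  # 3, chaining [[ ]] to reach into a nested list
tl$pickle$a[3]            # 3, the same element via $
```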

## 4.6 Comparison and logical operators

*Comparison operators* are binary operators that test a comparative condition between the operands and return a logical value to indicate the test result. We often use comparison operators to gain access to only part of an R object that passes some logical test. You’re likely already familiar with many comparison operators.

The basic idea of comparison operators is quite simple. We have a logical test (e.g., which weights are greater than 200) and want to determine which values in a vector (or some other R object) pass the test. When we apply a comparison operator, the results are logical values that indicate whether the specific element in the vector passes the test (`TRUE`) or not (`FALSE`).

Let’s walk through the comparison operators available in R. We’ll present the operator and its definition, followed by an example using the `weight` and `gender` vectors created in Section 4.1. First, let’s recall the values held in these vectors.

`weight`

`## [1] 123 157 202 199 223 140 105 194`

`gender`

```
## [1] "female" "female" "male" "female" "male"
## [6] "male" "female" "male"
```

`==`, the equality operator: the “double equals sign” tests whether operands are equal. Below we perform a logical test to determine which `gender` vector elements equal `"male"`.

`gender == "male"`

`## [1] FALSE FALSE TRUE FALSE TRUE TRUE FALSE TRUE`

Not surprisingly, the third, fifth, sixth, and eighth elements return `TRUE` and all other elements return `FALSE`. Notice we’re using the `==` sign, not the `=` sign. Mixing up the comparison operator `==` and the assignment operator `=` is a common error.

`!=`, the inequality operator: tests whether operands are not equal, and is thus the inverse of `==`. We see this by testing which `gender` vector elements do not equal `"male"`.

`gender != "male"`

`## [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE`

`<`, `<=`, `>`, `>=`: the less than, less than or equal to, greater than, and greater than or equal to operators, respectively. Using the `weight` vector, we determine which elements are greater than 194 and then which are greater than or equal to 194.

`weight > 194`

`## [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE`

`weight >= 194`

`## [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE`

Suppose we want to know which `weight` vector elements are greater than 194 *and* less than 210. Answering this question requires two comparison operators, i.e., `>` and `<`. In such cases, logical operators are used to combine multiple comparison operations into a single logical statement. We consider the following *logical operators*: “and”, “or”, “xor”, and “negation”.

Importantly, in order of operation, comparison operators precede logical operators. The `Syntax` manual page (i.e., run `?Syntax` on the Console) lists R operators’ order of operation, where you’ll notice the comparison operators are listed before the logical operators in the precedence groups under the Details section.
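The precedence rule can be checked directly. In the sketch below, `a` and `b` are hypothetical names: one combined comparison is written without parentheses and once with them, and the results are identical. The same holds for `!`, which sits below the comparison operators in `?Syntax`, so it applies to the whole comparison.

```r
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)

a <- weight > 194 & weight < 210
b <- (weight > 194) & (weight < 210)
identical(a, b)                            # TRUE

# ! has lower precedence than >, so it negates the whole comparison
identical(!weight > 194, !(weight > 194))  # TRUE
```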

Let’s walk through each of the logical operators:

`&`, the “and” operator: a comparison using the `&` operator returns `TRUE` when both operands are `TRUE` and `FALSE` otherwise. The `&` operator works elementwise on operand vectors. Consider the following example.

`weight < 210`

`## [1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE`

`weight > 194`

`## [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE`

`weight < 210 & weight > 194`

`## [1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE`

First we show the results of `weight < 210` and `weight > 194` separately. When combining the comparison operations using the `&` operator, R first performs `weight < 210` and `weight > 194`, then applies `&` elementwise on the logical vector operands. The elementwise `&` returns `TRUE` when the element in the `weight < 210` vector is `TRUE` and the element in the `weight > 194` vector is `TRUE`. The key point to remember is that `&` returns `TRUE` only if both operands are `TRUE`.

`|`, the “or” operator: A comparison using the `|` operator returns `TRUE` if at least one operand is `TRUE` and `FALSE` otherwise. Like the `&` operator, the `|` operator works element by element. Let’s use the same example as before, but now we’ll identify individuals with a weight less than 210 or a weight greater than 194.

`weight < 210 | weight > 194`

`## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE`

Not surprisingly, this operation returns `TRUE` for all elements, because every element of `weight` is either greater than 194 or less than 210.

`xor`, the “exclusive or” operator: A comparison using the `xor` operator returns `TRUE` if exactly one of the operands is `TRUE` and `FALSE` otherwise.

`xor(weight < 210, weight > 194)`

`## [1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE`

While we can imagine cases where this operator would be handy, we’ve never found the occasion to use it in our own code.
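One way to see what “exclusive” means is to rebuild `xor()` from `|`, `&`, and `!`. An element is `TRUE` when at least one operand is `TRUE` but not both. The sketch below recreates `weight` from the values printed earlier. The names `a`, `b`, and `by.hand` are chosen only for this illustration.

```
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)
a <- weight < 210
b <- weight > 194

# "At least one operand is TRUE" and "not both operands are TRUE"
by.hand <- (a | b) & !(a & b)

identical(by.hand, xor(a, b))  # TRUE
```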

`!`, the “negation” or “not” operator: The exclamation point `!` (called “bang” in programmers’ slang) reverses a logical value, i.e., `!TRUE` is `FALSE` and `!FALSE` is `TRUE`. The code below returns `TRUE` for weight values not greater than 194. The parentheses are not required, but they emphasize the order of operation.

`!(weight > 194)`

`## [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE`
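Negating a comparison is the same as using the opposite comparison. The short check below uses the `weight` values printed earlier. It confirms that “not greater than 194” and “less than or equal to 194” select the same elements.

```
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)

# !(weight > 194) and weight <= 194 give the same logical vector
identical(!(weight > 194), weight <= 194)  # TRUE
```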

There are “double” variants of `&` and `|`, namely `&&` and `||`. Unlike `&` and `|`, these operators do not work element by element; they are designed for operands of length one. Older versions of R used only the first element of a longer operand. Since R 4.3.0, giving `&&` or `||` an operand of length greater than one is an error. There are a few cases where `&&` and `||` are useful when writing conditional statements in functions (see, e.g., Chapter 7); however, we’ll generally not use them in this book.
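A typical use of `&&` is inside an `if()` condition, where each operand is a single `TRUE` or `FALSE`. `&&` evaluates its right-hand operand only when the left-hand operand is `TRUE`. As a result, the check below stops as soon as one test fails. The function name `is.positive.number` is hypothetical and chosen only for this sketch.

```
# Hypothetical helper: returns TRUE only for a single positive number
is.positive.number <- function(x) {
  # && stops at the first FALSE, so x > 0 is evaluated only when
  # x is numeric and has length one
  if (is.numeric(x) && length(x) == 1 && x > 0) {
    TRUE
  } else {
    FALSE
  }
}

is.positive.number(5)        # TRUE
is.positive.number(-2)       # FALSE
is.positive.number("five")   # FALSE
is.positive.number(c(1, 2))  # FALSE
```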

### 4.6.1 The `%in%` Operator

Suppose we want to identify the `weight` vector elements equal to 123, 199, or 140. We can do this using the equality operator `==` and the `|` operator as follows.

`weight`

`## [1] 123 157 202 199 223 140 105 194`

`weight == 123 | weight == 199 | weight == 140`

`## [1] TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE`

However, this is a little clunky, involves a lot of typing, and makes code hard to read. Luckily, R has the “in” operator, `%in%`, which accomplishes this task in a more intuitive and easy-to-read manner.

`weight %in% c(123, 199, 140)`

`## [1] TRUE FALSE FALSE TRUE FALSE TRUE FALSE FALSE`

In the spirit of writing efficient and reproducible code, we’ll use the `%in%` operator throughout the book.
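As a sanity check, the sketch below confirms that `%in%` gives the same logical vector as the chain of `==` and `|` comparisons. The help page for `match()` (run `?match`) explains that `%in%` is built on `match()`. The last two lines mirror that description. `match()` returns the position of each `weight` value in the lookup vector, or 0 when there is no match because of `nomatch = 0`. The name `targets` is chosen only for illustration.

```
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)
targets <- c(123, 199, 140)

chained <- weight == 123 | weight == 199 | weight == 140
with.in <- weight %in% targets

identical(chained, with.in)  # TRUE

# Positions of each weight value in targets, 0 when absent
match(weight, targets, nomatch = 0)  # 1 0 0 2 0 3 0 0
identical(match(weight, targets, nomatch = 0) > 0, with.in)  # TRUE
```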

Comparison and logical operators are invaluable for identifying subsets of data that meet specified conditions. The next section explores how comparison and logical operators facilitate subsetting vectors, data frames, and lists.

## 4.7 Subsetting with Logical Vectors

Consider the `healthData` data frame. How can we access only those weights which are more than 200? How can we access the genders of those whose weights are more than 200? How can we compute the mean weight of males and the mean weight of females? Or consider the `mtcars` data frame. How can we obtain the miles per gallon for all six-cylinder cars? Both of these data sets are small enough that extracting the values by hand would not be too onerous. For larger or more complex data sets, however, doing so would be very difficult or impossible in a reasonable amount of time, and would likely introduce errors.

R has a powerful method for solving these sorts of problems using a variant of the subsetting methods that we have already learned. When given a logical vector in square brackets, R returns the values corresponding to `TRUE`.

To begin, focus on the `weight` and `gender` vectors created in Section 4.1.

The R code `weight > 200` returns `TRUE` for each value of `weight` that is more than 200, and `FALSE` for each value of `weight` that is less than or equal to 200. Similarly, `gender == "female"` returns `TRUE` or `FALSE` depending on whether an element of `gender` is equal to `"female"`.

`weight`

`## [1] 123 157 202 199 223 140 105 194`

`weight > 200`

`## [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE`

`gender[weight > 200]`

`## [1] "male" "male"`

`weight[weight > 200]`

`## [1] 202 223`

`gender == "female"`

`## [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE`

`weight[gender == "female"]`

`## [1] 123 157 199 105`

Consider the lines of R code one by one.

- `weight` instructs R to display the values in the vector `weight`.
- `weight > 200` instructs R to check whether each value in `weight` is greater than 200, and to return `TRUE` if so and `FALSE` otherwise.
- The next line, `gender[weight > 200]`, does two things. First, inside the square brackets, it does the same thing as the second line, namely, returning `TRUE` or `FALSE` depending on whether a value of `weight` is or is not greater than 200. Second, each element of `gender` is matched with the corresponding `TRUE` or `FALSE` value, and is returned if and only if the corresponding value is `TRUE`. For example, the first value of `gender` is `gender[1]`. Since the first `TRUE` or `FALSE` value is `FALSE`, the first value of `gender` is not returned. Only the third and fifth values of `gender`, both of which happen to be `"male"`, are returned. In short, this line returns the genders of those people whose weight is over 200 pounds.
- The fourth line of code, `weight[weight > 200]`, again begins by returning `TRUE` or `FALSE` depending on whether elements of `weight` are larger than 200. Then the elements of `weight` corresponding to `TRUE` values are returned. So this line returns the weights of those people whose weights are more than 200 pounds.
- The fifth line returns `TRUE` or `FALSE` depending on whether elements of `gender` are equal to `"female"` or not.
- The sixth line returns the weights of those whose gender is `"female"`.
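The same pattern answers another question posed at the start of this section: the mean weight of males and of females. First, subset `weight` with a logical vector built from `gender`. Then pass the result to `mean()`. The two vectors are recreated here from the values printed above.

```
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)
gender <- c("female", "female", "male", "female",
            "male", "male", "female", "male")

mean(weight[gender == "male"])    # 189.75
mean(weight[gender == "female"])  # 146
```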

### 4.7.1 Modifying or Creating Objects via Subsetting

The results of subsetting can be assigned to a new (or existing) R object, and subsetting on the left side of an assignment is a common way to modify an existing R object.

`weight`

`## [1] 123 157 202 199 223 140 105 194`

```
light.weight <- weight[weight < 200]
light.weight
```

`## [1] 123 157 199 140 105 194`

```
x <- 1:10
x
```

`## [1] 1 2 3 4 5 6 7 8 9 10`

```
x[x < 5] <- 0
x
```

`## [1] 0 0 0 0 5 6 7 8 9 10`

```
y <- -3:9
y
```

`## [1] -3 -2 -1 0 1 2 3 4 5 6 7 8 9`

```
y[y < 0] <- NA
y
```

`## [1] NA NA NA 0 1 2 3 4 5 6 7 8 9`

```
rm(x)
rm(y)
```
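Subset assignment can also be applied to a copy of an existing vector, which leaves the original untouched. The sketch below caps every weight above 200 at 200 in a new vector. The name `capped.weight` and the cap of 200 are chosen only for illustration.

```
weight <- c(123, 157, 202, 199, 223, 140, 105, 194)

# Copy first so that the original weight vector is not modified
capped.weight <- weight
capped.weight[capped.weight > 200] <- 200

capped.weight  # 123 157 200 199 200 140 105 194
weight         # 123 157 202 199 223 140 105 194
```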

### 4.7.2 Logical Subsetting and Data Frames

First consider the small and simple `healthData` data frame.

`healthData`

```
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 3 202 male FALSE
## 4 199 female TRUE
## 5 223 male FALSE
## 6 140 male FALSE
## 7 105 female TRUE
## 8 194 male TRUE
```

`healthData$Weight[healthData$Gender == "male"]`

`## [1] 202 223 140 194`

`healthData[healthData$Gender == "female", ]`

```
## Weight Gender bp.meds
## 1 123 female TRUE
## 2 157 female TRUE
## 4 199 female TRUE
## 7 105 female TRUE
```

`healthData[healthData$Weight > 190, 2:3]`

```
## Gender bp.meds
## 3 male FALSE
## 4 female TRUE
## 5 male FALSE
## 8 male TRUE
```

The first example is really just subsetting a vector, since the `$` notation extracts a single column of the data frame as a vector. The last two examples return subsets of the whole data frame. In each case, the logical vector subsets the rows of the data frame. The second example chooses the rows where the gender is female. The third chooses the rows where the weight is more than 190. Note also that the column specification (after the comma) is left blank in the second example, telling R to return all the columns. In the third example, the second and third columns are requested explicitly.
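The `mtcars` question from the start of this section, the miles per gallon of all six-cylinder cars, follows the same pattern. `mtcars` ships with R, so the sketch below runs as is. Both lines select the same values: the first uses `$` notation, and the second uses row and column indices. The name `six.cyl.mpg` is chosen only for illustration.

```
# mpg values for the rows where cyl equals 6, using $ notation
six.cyl.mpg <- mtcars$mpg[mtcars$cyl == 6]
six.cyl.mpg  # 21.0 21.0 21.4 18.1 19.2 17.8 19.7

# The same subset: the logical vector selects rows, "mpg" selects the column
identical(six.cyl.mpg, mtcars[mtcars$cyl == 6, "mpg"])  # TRUE
```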

Next consider the much larger and more complex `WorldBank` data frame. Recall that the `str` function displays the “structure” of an R object. Here is a look at the structure of several R objects.

`str(mtcars)`

```
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
```

`str(temporaryList)`

```
## List of 3
## $ first : num [1:8] 123 157 202 199 223 140 105 194
## $ second:'data.frame': 8 obs. of 3 variables:
## ..$ Weight : num [1:8] 123 157 202 199 223 140 105 194
## ..$ Gender : chr [1:8] "female" "female" "male" "female" ...
## ..$ bp.meds: logi [1:8] TRUE TRUE FALSE TRUE FALSE FALSE ...
## $ pickle:List of 2
## ..$ a: int [1:10] 1 2 3 4 5 6 7 8 9 10
## ..$ b:'data.frame': 8 obs. of 3 variables:
## .. ..$ Weight : num [1:8] 123 157 202 199 223 140 105 194
## .. ..$ Gender : chr [1:8] "female" "female" "male" "female" ...
## .. ..$ bp.meds: logi [1:8] TRUE TRUE FALSE TRUE FALSE FALSE ...
```

`str(WorldBank)`

```
## 'data.frame': 11880 obs. of 15 variables:
## $ iso2c : chr "AD" "AD" "AD" "AD" ...
## $ country : chr "Andorra" "Andorra" "Andorra" "Andorra" ...
## $ year : int 1978 1979 1977 2007 1976 2011 2012 2008 1980 1972 ...
## $ fertility.rate : num NA NA NA 1.18 NA NA NA 1.25 NA NA ...
## $ life.expectancy : num NA NA NA NA NA NA NA NA NA NA ...
## $ population : num 33746 34819 32769 81292 31781 ...
## $ GDP.per.capita.Current.USD : num 9128 11820 7751 39923 7152 ...
## $ X15.to.25.yr.female.literacy: num NA NA NA NA NA NA NA NA NA NA ...
## $ iso3c : chr "AND" "AND" "AND" "AND" ...
## $ region : chr "Europe & Central Asia (all income levels)" "Europe & Central Asia (all income levels)" "Europe & Central Asia (all income levels)" "Europe & Central Asia (all income levels)" ...
## $ capital : chr "Andorra la Vella" "Andorra la Vella" "Andorra la Vella" "Andorra la Vella" ...
## $ longitude : num 1.52 1.52 1.52 1.52 1.52 ...
## $ latitude : num 42.5 42.5 42.5 42.5 42.5 ...
## $ income : chr "High income: nonOECD" "High income: nonOECD" "High income: nonOECD" "High income: nonOECD" ...
## $ lending : chr "Not classified" "Not classified" "Not classified" "Not classified" ...
```

First we see that `mtcars` is a data frame which has 32 observations (rows) on each of 11 variables (columns). The output lists the names of the variables, their type (in this case, all numeric), and the first few values of each variable.

Second we see that `temporaryList` is a list with three components. Each of the components is described separately, with the first few values again given.

Third we examine the structure of `WorldBank`. It is a data frame with 11880 observations on each of 15 variables. Some of these are character variables, some are numeric, and one (`year`) is integer. Looking at the first few values we see that some variables have missing values.

Consider creating a data frame which only has the observations from one year, say 1971. That’s relatively easy. Just choose rows for which `year` is equal to 1971.

```
WorldBank1971 <- WorldBank[WorldBank$year == 1971, ]
dim(WorldBank1971)
```

`## [1] 216 15`

The `dim` function returns the dimensions of a data frame, i.e., the number of rows and the number of columns. From `dim` we see that there are 216 cases from 1971.

Next, how can we create a data frame which contains only data from 1971 and only cases with no missing values in the fertility rate variable? R has a built-in function `is.na` which returns `TRUE` if the observation is missing and `FALSE` otherwise. And `!is.na` returns the negation, i.e., it returns `FALSE` if the observation is missing and `TRUE` if the observation is not missing.

`WorldBank1971$fertility.rate[1:25]`

```
## [1] NA 6.512 7.671 3.517 4.933 3.118 7.264 3.104
## [9] NA 2.200 2.961 2.788 4.479 2.260 2.775 2.949
## [17] 6.942 2.210 6.657 2.100 6.293 7.329 6.786 NA
## [25] 5.771
```

`!is.na(WorldBank1971$fertility.rate[1:25])`

```
## [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [9] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [17] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [25] TRUE
```

```
WorldBank1971 <- WorldBank1971[!is.na(WorldBank1971$fertility.rate), ]
dim(WorldBank1971)
```

`## [1] 193 15`

From `dim` we see that there are 193 cases from 1971 with non-missing fertility rate data.
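The two steps above can also be done in a single subset by joining the conditions with `&`, as in Section 4.6. `WorldBank` may not be available in every session, so the sketch uses a small hypothetical data frame, `toyBank`. Its column names mimic `WorldBank`, but its values are invented for illustration. With the real data, the analogous line would be `WorldBank[WorldBank$year == 1971 & !is.na(WorldBank$fertility.rate), ]`.

```
# Hypothetical stand-in for WorldBank; the values are invented for illustration
toyBank <- data.frame(year = c(1971, 1971, 1972, 1971),
                      fertility.rate = c(NA, 6.5, 7.1, 3.5))

# Keep rows from 1971 whose fertility rate is not missing
toyBank1971 <- toyBank[toyBank$year == 1971 & !is.na(toyBank$fertility.rate), ]
toyBank1971
dim(toyBank1971)  # 2 2
```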

Return attention now to the original `WorldBank` data frame, which has data not only from 1971. How can we extract only those cases (rows) which have NO missing data? Consider the following simple example:

```
temporaryDataFrame <- data.frame(V1 = c(1, 2, 3, 4, NA),
                                 V2 = c(NA, 1, 4, 5, NA),
                                 V3 = c(1, 2, 3, 5, 7))
temporaryDataFrame
```

```
## V1 V2 V3
## 1 1 NA 1
## 2 2 1 2
## 3 3 4 3
## 4 4 5 5
## 5 NA NA 7
```

`is.na(temporaryDataFrame)`

```
## V1 V2 V3
## [1,] FALSE TRUE FALSE
## [2,] FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE
## [5,] TRUE TRUE FALSE
```

`rowSums(is.na(temporaryDataFrame))`

`## [1] 1 0 0 0 2`

First notice that `is.na` tests each element of a data frame for missingness. Also recall that if R is asked to sum a logical vector, it first converts the logical vector to numeric and then computes the sum, which effectively counts the number of elements in the logical vector which are `TRUE`. The `rowSums` function computes the sum of each row. So `rowSums(is.na(temporaryDataFrame))` returns a vector with as many elements as there are rows in the data frame. If an element is zero, the corresponding row has no missing values. If an element is greater than zero, the value is the number of variables which are missing in that row. This gives a simple method to return all the cases which have no missing data.

`dim(WorldBank)`

`## [1] 11880 15`

```
WorldBankComplete <- WorldBank[rowSums(is.na(WorldBank)) == 0, ]
dim(WorldBankComplete)
```

`## [1] 564 15`

Out of the 11880 rows in the original data frame, only 564 have no missing observations!
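R's `complete.cases()` function, in the `stats` package that is attached by default, returns `TRUE` for each row with no missing values. It therefore offers another way to write the same subset. The sketch recreates `temporaryDataFrame` and checks that both approaches keep the same rows. The names `by.rowSums` and `by.complete` are chosen only for illustration.

```
temporaryDataFrame <- data.frame(V1 = c(1, 2, 3, 4, NA),
                                 V2 = c(NA, 1, 4, 5, NA),
                                 V3 = c(1, 2, 3, 5, 7))

# Rows with zero missing values, via rowSums() and via complete.cases()
by.rowSums <- temporaryDataFrame[rowSums(is.na(temporaryDataFrame)) == 0, ]
by.complete <- temporaryDataFrame[complete.cases(temporaryDataFrame), ]

identical(by.rowSums, by.complete)  # TRUE
rownames(by.complete)               # "2" "3" "4"
```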

## 4.8 Patterned Data

Sometimes it is useful to generate all the integers from 1 through 20, to generate a sequence of 100 points equally spaced between 0 and 1, etc. The R functions `seq()` and `rep()`, as well as the “colon operator” `:`, help to generate such sequences.

The colon operator generates a sequence of values with increments of \(1\) or \(-1\).

`1:10`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`-5:3`

`## [1] -5 -4 -3 -2 -1 0 1 2 3`

`10:4`

`## [1] 10 9 8 7 6 5 4`

`pi:7`

`## [1] 3.142 4.142 5.142 6.142`

The `seq()` function generates either a sequence of pre-specified length or a sequence with pre-specified increments.

`seq(from = 0, to = 1, length = 11)`

`## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0`

`seq(from = 1, to = 5, by = 1/3)`

```
## [1] 1.000 1.333 1.667 2.000 2.333 2.667 3.000 3.333
## [9] 3.667 4.000 4.333 4.667 5.000
```

`seq(from = 3, to = -1, length = 10)`

```
## [1] 3.0000 2.5556 2.1111 1.6667 1.2222 0.7778
## [7] 0.3333 -0.1111 -0.5556 -1.0000
```
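The two tasks mentioned at the start of this section can be sketched as follows. The examples above use the argument name `length`, which R partially matches to `seq()`'s `length.out` argument; the sketch spells out `length.out`. The name `unit.points` is chosen only for illustration.

```
# 100 equally spaced points between 0 and 1
unit.points <- seq(from = 0, to = 1, length.out = 100)
length(unit.points)  # 100

# Consecutive differences all equal 1/99, up to floating point error
all.equal(diff(unit.points), rep(1/99, 99))  # TRUE

# The integers 1 through 20, two ways
identical(1:20, seq_len(20))  # TRUE
```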

The `rep()` function replicates the values in a given vector.

`rep(c(1,2,4), length = 9)`

`## [1] 1 2 4 1 2 4 1 2 4`

`rep(c(1,2,4), times = 3)`

`## [1] 1 2 4 1 2 4 1 2 4`

`rep(c("a", "b", "c"), times = c(3, 2, 7))`

`## [1] "a" "a" "a" "b" "b" "c" "c" "c" "c" "c" "c" "c"`
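Besides `times` and `length`, `rep()` accepts an `each` argument. It repeats each element in turn before moving on to the next one. The sketch below contrasts `each` with `times` and combines the two.

```
rep(c(1, 2, 4), times = 3)            # 1 2 4 1 2 4 1 2 4
rep(c(1, 2, 4), each = 3)             # 1 1 1 2 2 2 4 4 4
rep(c(1, 2, 4), each = 2, times = 2)  # 1 1 2 2 4 4 1 1 2 2 4 4
```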

### 4.8.1 Practice Problem

Often when using R you will want to simulate data from a specific probability distribution (e.g., normal/Gaussian, binomial, Poisson). R has a vast suite of functions for working with statistical distributions. The functions that generate values from a distribution have names beginning with an “r” followed by an abbreviation of the distribution's name. For example, to simulate from the three distributions mentioned above, we can use the functions `rnorm()`, `rbinom()`, and `rpois()`.

Use the `rnorm()` function to generate 10,000 values from the standard normal distribution (the normal distribution with mean = 0 and variance = 1). Consult the help page for `rnorm()` if you need to. Save these values in a vector named `sim.vals`. Then use the `hist()` function to draw a histogram of the simulated data. Do the data look like they follow a normal distribution?

## 4.9 Exercises

**Exercise 3** Learning objectives: create, subset, and manipulate vector contents and attributes; summarize vector data using R `table()` and other functions; generate basic graphics using vector data.

**Exercise 4** Learning objectives: use functions to describe data frame characteristics; summarize and generate basic graphics for variables held in data frames; apply the subset function with logical operators; illustrate `NA`, `NaN`, `Inf`, and other special values; recognize the implications of using floating point arithmetic with logical operators.

**Exercise 5** Learning objectives: practice with lists, data frames, and associated functions; summarize variables held in lists and data frames; work with R’s linear regression `lm()` function output; review logical subsetting of vectors for partitioning and assigning of new values; generate and visualize data from mathematical functions.

Technically the objects described in this section are “atomic” vectors (all elements of the same type), since lists, to be described below, also are actually vectors. This will not be an important issue, and the shorter term vector will be used for atomic vectors below.↩︎

Missing data will be discussed in more detail later in the chapter.↩︎

The `mode` function returns the type or storage mode of an object.↩︎

Try this example using only single brackets; it will return a list holding elements `first`, `second`, and `pickle`.↩︎