Chapter 6 Vectors

6.1 Introduction

A vector is the most basic data structure in R. It is a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all of them. Vectors are generally created using c() (the combine, or concatenate, function), although other functions can be used depending on the type of vector being created.

6.2 Numeric Vector

We will create a numeric vector using c(), but you can use any function that creates a sequence of numbers. After that, we will use is.vector() to check whether it is a vector and class() to check its data type.
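A minimal sketch of these steps, assuming the vector is stored in a variable named x (a hypothetical name) and holds the values shown in the output that follows:

```r
# Create a numeric vector with c()
x <- c(1, 2, 3)
x             # print the vector
is.vector(x)  # check that x is a vector
class(x)      # check the data type
```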

## [1] 1 2 3
## [1] TRUE
## [1] "numeric"

Let us look at other ways to create a sequence of numbers. We leave it as an exercise to the reader to understand the functions using help.
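The text does not name the functions it has in mind. As an assumption, the sketch below uses the colon operator, rep() and seq(), which are common ways to build such sequences; the variable names s1, s2 and s3 are hypothetical:

```r
s1 <- 1:10        # colon operator: whole numbers from 1 to 10
s2 <- rep(1, 5)   # repeat the value 1 five times
s3 <- seq(1, 10)  # sequence from 1 to 10 in steps of 1
s1
s2
s3
```

The documentation for any of these can be opened with help(), for example help(seq) or help(rep).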

##  [1]  1  2  3  4  5  6  7  8  9 10
## [1] 1 1 1 1 1
##  [1]  1  2  3  4  5  6  7  8  9 10

6.3 Integer Vector

Creating an integer vector is similar to creating a numeric vector, except that we need to instruct R to treat the data as integer rather than numeric (double). We will use the same methods we used for creating numeric vectors. To specify that a number is of type integer, we suffix it with L.
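A minimal sketch of the L suffix in use, assuming hypothetical variable names and the same sequence functions (colon operator, rep() and seq()) as one plausible choice of "the same methods":

```r
y <- c(1L, 2L, 3L)   # the L suffix marks each number as an integer
y
class(y)

i1 <- 1L:10L         # colon operator with integer end points
i2 <- rep(1L, 5)     # repeat the integer 1 five times
i3 <- seq(1L, 10L)   # integer sequence from 1 to 10
i1
i2
i3
```

Without the suffix, c(1, 2, 3) is stored as numeric (double), even though it prints the same way.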

## [1] 1 2 3
## [1] "integer"
##  [1]  1  2  3  4  5  6  7  8  9 10
## [1] 1 1 1 1 1
##  [1]  1  2  3  4  5  6  7  8  9 10

6.4 Character Vector

A character vector may contain single characters, words or groups of words. Each element must be enclosed in single or double quotation marks.
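A minimal sketch, assuming a hypothetical variable named greetings; one element uses double quotation marks and the other single ones, to show that both are accepted:

```r
greetings <- c("hello", 'good morning')
greetings          # R prints both elements with double quotation marks
class(greetings)
```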

## [1] "hello"        "good morning"
## [1] "character"

6.5 Logical Vector

A logical vector contains the values TRUE and FALSE, in any combination.
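A minimal sketch, assuming a hypothetical variable named flags that holds the values shown in the output that follows:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
flags
class(flags)
```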

## [1]  TRUE FALSE  TRUE  TRUE FALSE
## [1] "logical"

In fact, you can create an integer vector and coerce it to type logical.
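A minimal sketch of this coercion using as.logical(), assuming hypothetical variable names bits and bits_logical:

```r
bits <- c(0L, 1L, 0L, 1L, 0L, 1L)   # integer vector of zeros and ones
bits
bits_logical <- as.logical(bits)    # 0 becomes FALSE, non-zero becomes TRUE
bits_logical
class(bits_logical)
```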

## [1] 0 1 0 1 0 1
## [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [1] "logical"

6.6 Naming Vector Elements

It is possible to name the elements of a vector. The advantage of naming elements is that we can later use these names to access them. Use names() to view or set the names of a vector. You can specify the names while creating the vector or add them later.
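Both approaches can be sketched as follows. The vectors and names used here (scores, ages, "ann" and so on) are hypothetical and chosen only for illustration:

```r
# Name the elements while creating the vector
scores <- c(score1 = 8, score2 = 6, score3 = 9)
names(scores)

# Or add the names later with names()
ages <- c(25, 31, 47)
names(ages) <- c("ann", "bob", "cara")
ages

# Access an element by its name
ages["bob"]
```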

6.7 Vector Coercion

Vectors are homogeneous, i.e. all the elements of a vector must be of the same type. If we try to create a vector by combining different data types, the elements will be coerced to the most flexible type present. The order in which coercion occurs is described below.

The character data type is the most flexible, while the logical data type is the least flexible. If you combine any other data type with character, all the elements will be coerced to type character. In the absence of character data, if any element is numeric (double), all the elements will be coerced to numeric. Finally, if the data includes neither character nor numeric values, i.e. only integer and logical values, all the elements will be coerced to integer.
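The first two rules can be checked with a short sketch; the vectors v1 and v2 are hypothetical examples:

```r
# Character combined with other types: everything becomes character
v1 <- c(1.5, "a", TRUE)
v1
class(v1)

# Numeric combined with integer and logical (no character):
# everything becomes numeric, and TRUE is stored as 1
v2 <- c(1.5, 2L, TRUE)
v2
class(v2)
```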

6.7.3 Case 3: Integer and Logical
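A minimal sketch of this case, assuming a hypothetical vector v3 that mixes integer and logical values:

```r
v3 <- c(1L, TRUE, 5L, FALSE)
v3          # TRUE becomes 1 and FALSE becomes 0
class(v3)
```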

To summarize, coercion proceeds in the following order, from the least to the most flexible type: logical, integer, numeric, character.

6.8 Vector Operations

In this section, we look at simple operations that can be performed on vectors in R. Remember that the nature of the operations depends upon the type of data. Below are a few examples:
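As a sketch of such operations on vectors of the same length, the example below uses hypothetical vectors a and b for arithmetic, and paste() for character data, where arithmetic does not apply:

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

a + b   # element-wise addition
a * b   # element-wise multiplication
b / a   # element-wise division

# Character vectors cannot be added; paste() joins them element by element
first <- c("good", "hello")
second <- c("morning", "world")
paste(first, second)
```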

6.8.2 Case 2: Vectors of different length

In the previous case, the lengths of the vectors, i.e. the number of elements in each, were the same. What happens if the lengths are unequal? In such cases, the shorter vector is recycled to match the length of the longer vector. The example below should make this concept clear:
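A minimal sketch of recycling, assuming two hypothetical vectors of lengths 6 and 2:

```r
long_vec <- c(1, 2, 3, 4, 5, 6)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20, 10, 20
recycled_sum <- long_vec + short_vec
recycled_sum
```

When the length of the longer vector is not a multiple of the length of the shorter one, R still recycles but also issues a warning.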

6.9 Missing Data

Missing data is a reality. No matter how careful you are in collecting data for your analysis, chances are high that you will end up with some missing data. In R, missing values are represented by NA. In this section, we will focus on the following:

  • test for missing data
  • remove missing data
  • exclude missing data from analysis

6.9.1 Detect missing data

We first create a vector with missing values. After that, we will use is.na() to test whether the data contains missing values. is.na() returns a logical vector of the same length as the vector being tested, with TRUE marking the missing values. Another function that can be used for detecting missing values is complete.cases(), which returns TRUE for the values that are not missing. Below is an example:
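A minimal sketch of both functions, assuming a hypothetical vector v with two missing values:

```r
v <- c(4, NA, 7, NA, 1)

missing_flags <- is.na(v)            # TRUE where a value is missing
missing_flags

complete_flags <- complete.cases(v)  # TRUE where a value is present
complete_flags

sum(is.na(v))                        # number of missing values
```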

6.9.2 Omit missing data

In the presence of missing data, many computations in R, such as sum() and mean(), return NA. To avoid this, we might want to remove the missing data before doing any computation. na.omit() removes all missing values from the data. Let us look at an example:
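A minimal sketch, assuming a hypothetical vector v with two missing values:

```r
v <- c(4, NA, 7, NA, 1)

total_with_na <- sum(v)   # the NA values propagate to the result
total_with_na

cleaned <- na.omit(v)     # drop the missing values
cleaned
sum(cleaned)              # the sum is now computed on 4, 7 and 1
```

Besides the remaining values, na.omit() also records the positions of the removed elements as an attribute, which is why the printed result has a few extra lines.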

6.9.3 Exclude missing data

To exclude missing values from a computation, set the na.rm argument to TRUE in functions that support it, such as sum() and mean().
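A minimal sketch, assuming a hypothetical vector v with two missing values:

```r
v <- c(4, NA, 7, NA, 1)

mean(v)                            # NA, because of the missing values
mean_no_na <- mean(v, na.rm = TRUE)
mean_no_na                         # mean of 4, 7 and 1
sum_no_na <- sum(v, na.rm = TRUE)
sum_no_na                          # sum of 4, 7 and 1
```

Unlike na.omit(), this leaves the vector itself unchanged; the missing values are skipped only for that one call.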

6.10 Index Vectors

One of the most important steps in data analysis is selecting a subset of data from a bigger data set. Indexing helps in retrieving individual values or a set of values that meet specific criteria. In this section, we look at various ways of indexing/subsetting vectors.

[] is the index operator in R. We can use various expressions within [] to subset data. In R, index positions begin at 1 and not 0. To begin with, let us look at values in different index positions:
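A minimal sketch, assuming the vector is stored in a hypothetical variable x and typed in by hand to match the values shown in the output that follows:

```r
x <- c(1L, 9L, 7L, 6L, 10L, 2L, 8L, 4L, 5L, 3L)
x
x[3]   # value in the third position
x[7]   # value in the seventh position
```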

##  [1]  1  9  7  6 10  2  8  4  5  3
## [1] 7
## [1] 8

6.10.1 Out of range index

##  [1]  3  4  8 10  9  2  5  7  1  6
## integer(0)
## [1] 10
## [1] NA

In the first case, we specified the index as 0 and in the second case we used the index 11, which is greater than the length of the vector. R returns an empty vector in the first case and NA in the second case.
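These cases can be sketched as follows, assuming the integer vector from the output above is stored in a hypothetical variable x. The call to length() is an assumption about how the value 10 in that output was obtained:

```r
x <- c(3L, 4L, 8L, 10L, 9L, 2L, 5L, 7L, 1L, 6L)

x[0]        # index 0: an empty integer vector
length(x)   # number of elements in x
x[11]       # index beyond the length of x: NA
```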

6.10.2 Negative Index

Using a negative index returns the vector without the value in that index position; the original vector is not modified. Unlike in some other languages, a negative index does not count backwards from the end of the vector. Let us look at an example to understand how a negative index works in R:
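A minimal sketch, assuming the vector is stored in a hypothetical variable x and typed in by hand to match the values shown in the output that follows:

```r
x <- c(1L, 6L, 2L, 10L, 9L, 7L, 5L, 8L, 3L, 4L)
x
x[-3]   # every element except the third (the value 2)
x[-7]   # every element except the seventh (the value 5)
```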

##  [1]  1  6  2 10  9  7  5  8  3  4
## [1]  1  6 10  9  7  5  8  3  4
## [1]  1  6  2 10  9  7  8  3  4

6.10.3 Subset Multiple Elements

If we do not specify anything within [], all the elements in the vector will be returned. We can specify the index elements using any expression that generates a sequence of integers. Let us look at a few examples:
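A minimal sketch, assuming the vector is stored in a hypothetical variable x and typed in by hand to match the values shown in the output that follows:

```r
x <- c(4L, 3L, 5L, 1L, 8L, 10L, 9L, 7L, 6L, 2L)
x
x[]       # empty index: all elements
x[1:5]    # positions 1 to 5
x[5:10]   # positions 5 to 10
```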

##  [1]  4  3  5  1  8 10  9  7  6  2
##  [1]  4  3  5  1  8 10  9  7  6  2
## [1] 4 3 5 1 8
## [1]  8 10  9  7  6  2

If you are using the colon to generate the index positions, you have to specify both the starting and the ending position; otherwise, R will return an error.
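A minimal sketch of this rule, assuming a hypothetical vector x. Because an open-ended range such as x[5:] is a syntax error, it is passed to parse() as text inside try() so that the rest of the example still runs:

```r
x <- c(4L, 3L, 5L, 1L, 8L, 10L, 9L, 7L, 6L, 2L)

# Parsing "x[5:]" fails because the end position is missing
result <- try(parse(text = "x[5:]"), silent = TRUE)
inherits(result, "try-error")

# Use length() to supply the end position instead
from_fifth <- x[5:length(x)]
from_fifth
```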

What if we want elements that are not in consecutive positions, unlike in the last example? In such cases, we have to create a vector of index positions using c() and use it to extract elements from the original vector. Below is an example:
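A minimal sketch, assuming the vector is stored in a hypothetical variable x. The index positions are assumptions chosen to be consistent with the output that follows:

```r
x <- c(7L, 5L, 9L, 1L, 4L, 10L, 2L, 8L, 6L, 3L)
x
x[c(2, 5, 7)]            # second, fifth and seventh elements
x[c(1, 2, 3, 4, 6, 9)]   # a longer, non-consecutive selection
```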

##  [1]  7  5  9  1  4 10  2  8  6  3
## [1] 5 4 2
## [1]  7  5  9  1 10  6

6.10.4 Subset Named Vectors

Vectors can be subset using the names of their elements. When using names for subsetting, ensure that they are enclosed in single or double quotation marks; otherwise, R will return an error. Let us look at a few examples:
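A minimal sketch, assuming the named vector is stored in a hypothetical variable scores and created with the names and values shown in the output that follows:

```r
scores <- c(score1 = 8, score2 = 6, score3 = 9)
scores
scores["score2"]                # one element, selected by name
scores[c("score1", "score3")]   # several elements, selected by name
```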

## score1 score2 score3 
##      8      6      9
## score2 
##      6
## score1 score3 
##      8      9

6.10.5 Subset using logical values

Logical values can be used to subset vectors. They are not very flexible but can be used for simple indexing. In all of the examples below, the logical vectors are recycled to match the length of the vector from which we subset data:
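A minimal sketch, assuming the integer vector is stored in a hypothetical variable x and typed in by hand to match the values shown in the output that follows:

```r
x <- c(7L, 8L, 4L, 2L, 10L, 5L, 3L, 1L, 9L, 6L)
x
x[TRUE]             # TRUE is recycled: every element is kept
x[FALSE]            # FALSE is recycled: no element is kept
x[c(TRUE, FALSE)]   # keep positions 1, 3, 5, 7, 9
x[c(FALSE, TRUE)]   # keep positions 2, 4, 6, 8, 10
```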

##  [1]  7  8  4  2 10  5  3  1  9  6
##  [1]  7  8  4  2 10  5  3  1  9  6
## integer(0)
## [1]  7  4 10  3  9
## [1] 8 2 5 1 6

6.10.6 Subset using logical expressions

Logical expressions can be used to extract elements that meet specific criteria. This method is the most flexible and useful, as we can combine multiple conditions using comparison and logical operators. Before we use logical expressions, let us spend some time understanding comparison and logical operators, as we will be using them extensively hereafter.

6.10.6.1 Comparison Operators

When you create an expression using a comparison operator, the output is always a logical value i.e. TRUE or FALSE. Let us see how we can use comparison operators to subset data:
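A minimal sketch, assuming the vector is stored in a hypothetical variable x and each comparison is made against the value 5, which is consistent with the output that follows. Each comparison is shown first on its own and then inside [] to subset x:

```r
x <- c(3L, 10L, 4L, 6L, 8L, 9L, 1L, 2L, 7L, 5L)
x

x > 5        # greater than
x[x > 5]
x >= 5       # greater than or equal to
x[x >= 5]
x < 5        # less than
x[x < 5]
x <= 5       # less than or equal to
x[x <= 5]
x == 5       # equal to
x[x == 5]
x != 5       # not equal to
x[x != 5]
```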

##  [1]  3 10  4  6  8  9  1  2  7  5
##  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE
## [1] 10  6  8  9  7
##  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
## [1] 10  6  8  9  7  5
##  [1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [1] 3 4 1 2
##  [1]  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
## [1] 3 4 1 2 5
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
## [1] 5
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
## [1]  3 10  4  6  8  9  1  2  7

6.10.6.2 Logical Operators

Let us combine comparison and logical operators to create expressions and use them to subset vectors:

##  [1]  9 10  7  4  8  1  6  2  5  3
## [1] 9 7 4 1 6 2 5 3
## [1] 10  4  8  1  6  2  5  3
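As a further sketch, the three logical operators & (and), | (or) and ! (not) can be applied to the vector from the output above, stored here in a hypothetical variable x. The conditions are illustrative and are not necessarily the ones that produced that output:

```r
x <- c(9L, 10L, 7L, 4L, 8L, 1L, 6L, 2L, 5L, 3L)

both <- x[x > 3 & x < 8]            # both conditions must hold
both
either <- x[x < 3 | x > 8]          # at least one condition must hold
either
negated <- x[!(x > 5)]              # elements for which x > 5 is FALSE
negated
even_big <- x[x %% 2 == 0 & x > 4]  # even values greater than 4
even_big
```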