Chapter 6 Vectors in R
6.1 Introduction
A vector is the most basic data structure in R. It is a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all of them. Vectors are generally created using c() (the combine, or concatenate, function), although other methods can be used depending on the type of vector being created.
6.2 Numeric Vector
We will create a numeric vector using c(), but you can use any function that creates a sequence of numbers. After that, we will use is.vector() to check whether it is a vector and class() to check the data type.
# create a numeric vector
num_vect <- c(1, 2, 3)
# display the vector
num_vect
## [1] 1 2 3
# check if it is a vector
is.vector(num_vect)
## [1] TRUE
# check data type
class(num_vect)
## [1] "numeric"
Let us look at other ways to create a sequence of numbers. We leave it as an exercise for the reader to explore these functions using help().
# using colon
vect1 <- 1:10
vect1
## [1] 1 2 3 4 5 6 7 8 9 10
# using rep
vect2 <- rep(1, 5)
vect2
## [1] 1 1 1 1 1
# using seq
vect3 <- seq(10)
vect3
## [1] 1 2 3 4 5 6 7 8 9 10
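These functions do not all return the same storage type. As a quick check, the sketch below uses typeof(), which is not used elsewhere in this chapter, to compare them.

```r
# c() and rep() with plain numbers give doubles
typeof(c(1, 2, 3))   # "double"
typeof(rep(1, 5))    # "double"

# the colon operator and seq() with a single argument give integers
typeof(1:10)         # "integer"
typeof(seq(10))      # "integer"

# class() reports "numeric" for doubles and "integer" for integers
class(c(1, 2, 3))    # "numeric"
class(1:10)          # "integer"
```

For most arithmetic the difference does not matter, but it explains why class() may report "integer" for a sequence created with the colon operator.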
6.3 Integer Vector
Creating an integer vector is similar to creating a numeric vector, except that we need to instruct R to treat the data as integer and not numeric (double). We will use the same methods we used for creating numeric vectors. To specify that the data is of type integer, we suffix the number with L.
# integer vector
int_vect <- c(1L, 2L, 3L)
int_vect
## [1] 1 2 3
# check data type
class(int_vect)
## [1] "integer"
# using colon
vect1 <- 1L:10L
vect1
## [1] 1 2 3 4 5 6 7 8 9 10
# using rep
vect2 <- rep(1L, 5)
vect2
## [1] 1 1 1 1 1
# using seq
vect3 <- seq(10L)
vect3
## [1] 1 2 3 4 5 6 7 8 9 10
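An existing numeric vector can also be converted to integer after the fact. The following is a minimal sketch using is.integer() and as.integer(); the variable names are illustrative.

```r
# a plain number is stored as a double; the L suffix makes it an integer
is.integer(1)     # FALSE
is.integer(1L)    # TRUE

# convert an existing numeric vector to integer
num_vect <- c(1, 2, 3)
int_vect <- as.integer(num_vect)
class(int_vect)   # "integer"

# as.integer() truncates toward zero; it does not round
as.integer(c(2.9, -2.9))   # 2 -2
```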
6.4 Character Vector
A character vector may contain a single character, a word or a group of words. The elements must be enclosed in single or double quotation marks.
# character vector
greetings <- c("hello", "good morning")
greetings
## [1] "hello" "good morning"
# check data type
class(greetings)
## [1] "character"
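The sketch below shows that single and double quotation marks are interchangeable, and that the length of a character vector counts elements rather than characters. The functions nchar() and toupper() are base R functions not otherwise covered in this chapter.

```r
# single and double quotes produce the same string
identical("hello", 'hello')   # TRUE

greetings <- c("hello", 'good morning')

# number of elements in the vector
length(greetings)   # 2

# number of characters in each element (spaces count)
nchar(greetings)    # 5 12

# functions on character vectors work element by element
toupper(greetings)  # "HELLO" "GOOD MORNING"
```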
6.5 Logical Vector
A vector of logical values will contain either TRUE or FALSE or both.
# logical vector
vect_logic <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
vect_logic
## [1] TRUE FALSE TRUE TRUE FALSE
# check data type
class(vect_logic)
## [1] "logical"
In fact, you can create an integer vector and coerce it to type logical: 0 becomes FALSE and any non-zero value becomes TRUE.
# integer vector
int_vect <- rep(0L:1L, 3)
int_vect
## [1] 0 1 0 1 0 1
# coerce to logical vector
log_vect <- as.logical(int_vect)
log_vect
## [1] FALSE TRUE FALSE TRUE FALSE TRUE
# check data type
class(log_vect)
## [1] "logical"
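The coercion also works in the other direction: TRUE is treated as 1 and FALSE as 0. This makes it easy to count or average logical values, as the following sketch suggests.

```r
vect_logic <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# logical to integer
as.integer(vect_logic)   # 1 0 1 1 0

# number of TRUE values
sum(vect_logic)          # 3

# proportion of TRUE values
mean(vect_logic)         # 0.6

# any non-zero integer becomes TRUE
as.logical(c(0L, 1L, 5L))   # FALSE TRUE TRUE
```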
6.6 Naming Vector Elements
It is possible to name the elements of a vector. The advantage of naming vector elements is that we can later use these names to access the elements. Use names() to specify the names of a vector. You can specify the names while creating the vector or add them later.
6.6.1 Method 1: Create vector and add names later
# create vector and add names later
vect1 <- c(1, 2, 3)
# name the elements of the vector
names(vect1) <- c("One", "Two", "Three")
# call vect1
vect1
##   One   Two Three
##     1     2     3
6.6.2 Method 2: Specify names while creating vector
# specify names while creating vector
vect2 <- c(John = 1, Jack = 2, Jill = 3, Jovial = 4)
# call vect2
vect2
##   John   Jack   Jill Jovial
##      1      2      3      4
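Once a vector has names, names() can also be used to read or modify them. The following is a small sketch; the vector scores and the replacement name are made up for illustration.

```r
scores <- c(John = 1, Jack = 2, Jill = 3)

# read the names
names(scores)            # "John" "Jack" "Jill"

# access an element by its name
scores["Jill"]

# rename a single element
names(scores)[2] <- "Jake"
names(scores)            # "John" "Jake" "Jill"

# drop the names and keep only the values
unname(scores)           # 1 2 3
```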
6.7 Vector Coercion
Vectors are homogeneous, i.e. all the elements of a vector must be of the same type. If we try to create a vector by combining different data types, the elements are coerced to the most flexible type among them. The character data type is the most flexible while the logical data type is the least flexible. If you combine any other data type with character, all the elements are coerced to type character. In the absence of character data, all elements are coerced to numeric. Finally, if the data includes neither character nor numeric types, all the elements are coerced to integer.
6.7.1 Case 1: Different Data Types
# vector of different data types
vect1 <- c(1, 1L, 'one', TRUE)
# call vect1
vect1
## [1] "1" "1" "one" "TRUE"
# check data type
class(vect1)
## [1] "character"
6.7.2 Case 2: Numeric, Integer and Logical
# vector of different data types
vect1 <- c(1, 1L, TRUE)
# call vect1
vect1
## [1] 1 1 1
# check data type
class(vect1)
## [1] "numeric"
6.7.3 Case 3: Integer and Logical
# vector of different data types
vect1 <- c(1L, TRUE)
# call vect1
vect1
## [1] 1 1
# check data type
class(vect1)
## [1] "integer"
To summarize, coercion takes place in the following order, from least to most flexible: logical, integer, numeric, character.
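The coercion order can be checked pairwise with typeof(). The sketch below also shows explicit coercion in the opposite direction with as.numeric(), where values that cannot be converted become NA; the vector mixed is illustrative, and suppressWarnings() is used only to hide the warning R would otherwise print.

```r
# implicit coercion when combining types
typeof(c(TRUE, 1L))    # "integer"
typeof(c(1L, 2.5))     # "double"
typeof(c(2.5, "a"))    # "character"

# explicit coercion from character to numeric
mixed <- c("1", "2", "three")
nums <- suppressWarnings(as.numeric(mixed))
nums                   # 1 2 NA
```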
6.8 Vector Operations
In this section, we look at simple operations that can be performed on vectors in R. Remember that the nature of the operations depends upon the type of data. Below are a few examples:
6.8.1 Case 1: Vectors of same length
# create two vectors
vect1 <- c(1, 3, 8, 4)
vect2 <- c(2, 7, 1, 9)
# addition
vect1 + vect2
## [1] 3 10 9 13
# subtraction
vect1 - vect2
## [1] -1 -4 7 -5
# multiplication
vect1 * vect2
## [1] 2 21 8 36
# division
vect1 / vect2
## [1] 0.5000000 0.4285714 8.0000000 0.4444444
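Other operators work element by element in the same way. The following sketch uses exponentiation, the modulo operator %%, integer division %/% and a comparison; the names x and y are illustrative.

```r
x <- c(1, 3, 8, 4)
y <- c(2, 7, 1, 9)

# exponentiation
x ^ 2      # 1 9 64 16

# remainder after division by 3
x %% 3     # 1 0 2 1

# integer division by 2
y %/% 2    # 1 3 0 4

# element-wise comparison
x > y      # FALSE FALSE TRUE FALSE

# functions such as sum() reduce the vector to a single value
sum(x)     # 16
```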
6.8.2 Case 2: Vectors of different length
In the previous case, the length, i.e. the number of elements, of the two vectors was the same. What happens if the lengths of the vectors are unequal? In such cases, the shorter vector is recycled to match the length of the longer vector. The example below illustrates this:
# create two vectors
vect1 <- c(2, 7)
vect2 <- c(1, 8, 5, 2)
# addition
vect1 + vect2
## [1] 3 15 7 9
# subtraction
vect1 - vect2
## [1] 1 -1 -3 5
# multiplication
vect1 * vect2
## [1] 2 56 10 14
# division
vect1 / vect2
## [1] 2.000 0.875 0.400 3.500
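Recycling can be made explicit with rep_len(), which repeats a vector until it reaches a given length. The sketch below reproduces the addition above step by step and then shows the case where the longer length is not a multiple of the shorter one; the variable names are illustrative, and suppressWarnings() only hides the warning R issues in that case.

```r
short_vect <- c(2, 7)
long_vect  <- c(1, 8, 5, 2)

# what R does implicitly: repeat the shorter vector to the longer length
recycled <- rep_len(short_vect, length(long_vect))
recycled               # 2 7 2 7
recycled + long_vect   # 3 15 7 9, same as short_vect + long_vect

# if the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
odd_sum <- suppressWarnings(c(1, 2, 3) + c(10, 20))
odd_sum                # 11 22 13
```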
6.9 Missing Data
Missing data is a reality. No matter how careful you are in collecting data for your analysis, chances are high that you will end up with some missing values. In R, missing values are represented by NA. In this section, we will focus on the following:
- test for missing data
- remove missing data
- exclude missing data from analysis
6.9.1 Detect missing data
We first create a vector with missing values. After that, we will use is.na() to test whether the data contains missing values. is.na() returns a logical vector of the same length as the vector being tested. Another function that can be used for detecting missing values is complete.cases(). Below is an example:
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# use is.na
is.na(vect1)
## [1] FALSE FALSE TRUE FALSE FALSE
# use complete.cases
complete.cases(vect1)
## [1] TRUE TRUE FALSE TRUE TRUE
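The logical vector returned by is.na() can be summarised to answer common questions, such as whether any values are missing, how many, and where. The following is a small sketch using the base R functions anyNA(), sum() and which(); the name vect_na is illustrative.

```r
vect_na <- c(1, 3, NA, 5, 2)

# is anything missing?
anyNA(vect_na)          # TRUE

# how many values are missing?
sum(is.na(vect_na))     # 1

# at which positions?
which(is.na(vect_na))   # 3

# comparing with == does not detect missing values:
# every comparison with NA is itself NA
vect_na == NA
```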
6.9.2 Omit missing data
In the presence of missing data, many computations in R, such as mean() or sum(), return NA. To avoid this, we might want to remove the missing data before doing any computation. na.omit() removes all missing values from the data. Let us look at an example:
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# call vect1
vect1
## [1] 1 3 NA 5 2
# omit missing values
na.omit(vect1)
## [1] 1 3 5 2
## attr(,"na.action")
## [1] 3
## attr(,"class")
## [1] "omit"
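The result of na.omit() carries extra attributes that record which positions were dropped. If only the plain values are needed, one option is to subset with is.na() instead, as in the sketch below; the variable names are illustrative.

```r
vect_na <- c(1, 3, NA, 5, 2)

# keep the elements that are not missing
clean1 <- vect_na[!is.na(vect_na)]
clean1                      # 1 3 5 2

# alternatively, strip the attributes from the na.omit() result
clean2 <- as.vector(na.omit(vect_na))

# both approaches give the same plain vector
identical(clean1, clean2)   # TRUE
```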
6.9.3 Exclude missing data
To exclude missing values from computations, use the na.rm argument and set it to TRUE.
# vector with missing values
vect1 <- c(1, 3, NA, 5, 2)
# compute mean
mean(vect1)
## [1] NA
# compute mean by excluding missing value
mean(vect1, na.rm = TRUE)
## [1] 2.75
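Several other base R summary functions accept the same na.rm argument. The sketch below applies it to sum(), min(), max() and median() on an illustrative vector.

```r
vect_na <- c(1, 3, NA, 5, 2)

# without na.rm the result is NA
sum(vect_na)                     # NA

# with na.rm = TRUE the missing value is ignored
sum(vect_na, na.rm = TRUE)       # 11
min(vect_na, na.rm = TRUE)       # 1
max(vect_na, na.rm = TRUE)       # 5
median(vect_na, na.rm = TRUE)    # 2.5
```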
6.10 Index Vectors
One of the most important steps in data analysis is selecting a subset of data from a bigger data set. Indexing helps in retrieving individual values or a set of values that meet specific criteria. In this section, we look at various ways of indexing/subsetting vectors.
[] is the index operator in R. We can use various expressions within [] to subset data. In R, index positions begin at 1 and not 0. To begin with, let us look at values in different index positions:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 2 5 10 6 8 3 4 1 9 7
# return third element
vect1[3]
## [1] 10
# return seventh element
vect1[7]
## [1] 4
6.10.1 Out of range index
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 8 10 4 9 3 6 1 7 5 2
# return value at index 0
vect1[0]
## integer(0)
# length of the vector
length(vect1)
## [1] 10
# out of range index
vect1[11]
## [1] NA
In the first case, we specified the index as 0, and in the second case we used the index 11, which is greater than the length of the vector. R returns an empty vector in the first case and NA in the second case.
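Because an out-of-range index returns NA rather than an error, it can be worth checking an index against length() first. The sketch below shows one such check, and also what happens when a value is assigned beyond the end of a vector; the names vect and idx are illustrative.

```r
vect <- c(10, 20, 30)
idx <- 5

# check that the index is within range before using it
in_range <- idx >= 1 && idx <= length(vect)
in_range              # FALSE

# zeros in an index vector are dropped, out-of-range positions give NA
picked <- vect[c(1, 0, 7)]
picked                # 10 NA

# assigning beyond the end extends the vector and fills the gap with NA
vect[5] <- 50
vect                  # 10 20 30 NA 50
```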
6.10.2 Negative Index
Using a negative index drops the value at that index position from the result; the original vector is not modified. Unlike in some other languages, a negative index does not count backwards from the end of the vector. Let us look at an example to understand how negative indexing works in R:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 5 2 1 8 7 3 6 9 4 10
# drop third element
vect1[-3]
## [1] 5 2 8 7 3 6 9 4 10
# drop seventh element
vect1[-7]
## [1] 5 2 1 8 7 3 9 4 10
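A few related cases are sketched below: dropping several positions at once, reaching the last element (which a negative index does not do in R), and the error raised when positive and negative indices are mixed. The vector vect is illustrative, and try() is used only so that the error does not stop the script.

```r
vect <- c(5, 2, 1, 8, 7)

# drop the first and third elements
vect[-c(1, 3)]        # 2 8 7

# the original vector is unchanged
length(vect)          # 5

# the last element: use length() or tail()
vect[length(vect)]    # 7
tail(vect, 1)         # 7

# positive and negative indices cannot be mixed
mixed <- try(vect[c(-1, 2)], silent = TRUE)
inherits(mixed, "try-error")   # TRUE

# to remove an element for good, assign the result back
vect <- vect[-2]
vect                  # 5 1 8 7
```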
6.10.3 Subset Multiple Elements
If we do not specify anything within [], all the elements in the vector will be returned. We can specify the index positions using any expression that generates a sequence of integers. Let us look at a few examples:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 7 1 6 4 10 9 2 3 8 5
# return all elements
vect1[]
## [1] 7 1 6 4 10 9 2 3 8 5
# return first 5 values
vect1[1:5]
## [1] 7 1 6 4 10
# return all values from the 5th position
end <- length(vect1)
vect1[5:end]
## [1] 10 9 2 3 8 5
If you use the colon to generate the index positions, you have to specify both the starting and ending positions; otherwise, R will return an error.
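When only one end of the range is known, the base R functions head() and tail() are an alternative to computing the end position by hand. The following is a minimal sketch on an illustrative vector.

```r
vect <- c(7, 1, 6, 4, 10, 9, 2, 3, 8, 5)

# first five values, same as vect[1:5]
head(vect, 5)     # 7 1 6 4 10

# everything from the 5th position, i.e. drop the first four
tail(vect, -4)    # 10 9 2 3 8 5

# last three values
tail(vect, 3)     # 3 8 5
```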
What if we want elements that are not in a sequence, unlike in the last example? In such cases, we have to create a vector of positions using c() and use it to extract elements from the original vector. Below is an example:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 8 2 5 1 10 3 4 7 6 9
# extract 2nd, 5th and 7th element
select <- c(2, 5, 7)
vect1[select]
## [1] 2 10 4
# extract elements in position 1 to 4, 6 and 9
select <- c(1:4, 6, 9)
vect1[select]
## [1] 8 2 5 1 3 6
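The positions in an index vector need not be increasing or unique, so the same mechanism can reorder or repeat elements. The sketch below illustrates this with the base R functions order(), rev() and seq_along(); the vector vect is made up for illustration.

```r
vect <- c(8, 2, 5, 1, 10)

# positions in any order
vect[c(5, 1)]                 # 10 8

# the same position more than once
vect[c(2, 2, 2)]              # 2 2 2

# sort by indexing with the ordering of the values
vect[order(vect)]             # 1 2 5 8 10

# reverse the vector
vect[rev(seq_along(vect))]    # 10 1 5 2 8
```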
6.10.4 Subset Named Vectors
Vectors can be subset using the names of their elements. When using names for subsetting, ensure that they are enclosed in single or double quotation marks; otherwise, R will return an error. Let us look at a few examples:
vect1 <- c(score1 = 8, score2 = 6, score3 = 9)
vect1
## score1 score2 score3
##      8      6      9
# extract score2
vect1['score2']
## score2
##      6
# extract score1 and score3
vect1[c('score1', 'score3')]
## score1 score3
##      8      9
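Two details are worth knowing when subsetting by name: [[ ]] returns the value without its name, and a name that does not exist yields NA rather than an error. The sketch below uses an illustrative vector scores and a deliberately absent name, "score4".

```r
scores <- c(score1 = 8, score2 = 6, score3 = 9)

# [[ ]] returns a single value without its name
scores[["score2"]]             # 6

# a name that does not exist returns NA
missing_score <- scores["score4"]
is.na(missing_score)

# check whether a name exists before using it
"score4" %in% names(scores)    # FALSE
```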
6.10.5 Subset using logical values
Logical values can be used to subset vectors. They are not very flexible but can be used for simple indexing. In all of the below examples, the logical vectors are recycled to match the length of the vector from which we subset data:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 9 10 8 2 3 5 1 4 6 7
# returns all values
vect1[TRUE]
## [1] 9 10 8 2 3 5 1 4 6 7
# empty vector
vect1[FALSE]
## integer(0)
# values in odd positions
vect1[c(TRUE, FALSE)]
## [1] 9 8 3 1 6
# values in even positions
vect1[c(FALSE, TRUE)]
## [1] 10 2 5 4 7
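The recycling can be made visible by building the full-length logical vector explicitly. The sketch below does this with rep(), and uses which() to convert a logical index into the equivalent positions; the names vect and keep are illustrative.

```r
vect <- c(9, 10, 8, 2, 3, 5, 1, 4, 6, 7)

# every third value, starting from the first
vect[c(TRUE, FALSE, FALSE)]    # 9 2 1 7

# the recycled index for the odd positions, written out in full
keep <- rep(c(TRUE, FALSE), length.out = length(vect))
keep

# which() converts the logical index into positions
which(keep)                    # 1 3 5 7 9

# logical and positional indexing give the same result
identical(vect[keep], vect[which(keep)])   # TRUE
```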
6.10.6 Subset using logical expressions
Logical expressions can be used to extract elements that meet specific criteria. This method is most flexible and useful as we can combine multiple conditions using relational and logical operators. Before we use logical expressions, let us spend some time understanding comparison and logical operators as we will be using them extensively hereafter.
6.10.6.1 Comparison Operators
When you create an expression using a comparison operator, the output is always a logical value, i.e. TRUE or FALSE. Let us see how we can use comparison operators to subset data:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 8 1 3 9 10 6 2 5 4 7
# return elements greater than 5
vect1 > 5
## [1] TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
vect1[vect1 > 5]
## [1] 8 9 10 6 7
# return elements greater than or equal to 5
vect1 >= 5
## [1] TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
vect1[vect1 >= 5]
## [1] 8 9 10 6 5 7
# return elements less than 5
vect1 < 5
## [1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
vect1[vect1 < 5]
## [1] 1 3 2 4
# return elements less than or equal to 5
vect1 <= 5
## [1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
vect1[vect1 <= 5]
## [1] 1 3 2 5 4
# return elements equal to 5
vect1 == 5
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
vect1[vect1 == 5]
## [1] 5
# return elements not equal to 5
vect1 != 5
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
vect1[vect1 != 5]
## [1] 8 1 3 9 10 6 2 4 7
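The logical vector produced by a comparison can also be used to count matches or find their positions, not only to extract values. The sketch below uses sum(), which() and the %in% operator (for matching against a set of values) on a fixed, illustrative vector.

```r
vect <- c(8, 1, 3, 9, 10, 6, 2, 5, 4, 7)

# how many elements are greater than 5?
sum(vect > 5)                # 5

# at which positions?
which(vect > 5)              # 1 4 5 6 10

# elements that match any value in a set
vect[vect %in% c(1, 5, 9)]   # 1 9 5
```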
6.10.7 Logical Operators
Let us combine comparison and logical operators to create expressions and use them to subset vectors:
# random sample of 10 values
vect1 <- sample(10)
vect1
## [1] 8 6 3 7 4 5 10 9 2 1
# return all elements less than 8 or divisible by 3
vect1[(vect1 < 8 | (vect1 %% 3 == 0))]
## [1] 6 3 7 4 5 9 2 1
# return all elements less than 7 or divisible by 2
vect1[(vect1 < 7 | (vect1 %% 2 == 0))]
## [1] 8 6 3 4 5 10 2 1
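The examples above use only the OR operator |. The sketch below adds the AND operator & and the NOT operator !, along with any() and all(), which reduce a logical vector to a single value; the vector vect is a fixed, illustrative copy of the values shown above.

```r
vect <- c(8, 6, 3, 7, 4, 5, 10, 9, 2, 1)

# AND: greater than 3 and less than 8
vect[vect > 3 & vect < 8]     # 6 7 4 5

# NOT: elements that are not divisible by 2
vect[!(vect %% 2 == 0)]       # 3 7 5 9 1

# OR: less than 3 or greater than 8
vect[vect < 3 | vect > 8]     # 10 9 2 1

# does any element, or do all elements, satisfy a condition?
any(vect > 9)                 # TRUE
all(vect > 0)                 # TRUE
```

Note that & and | work element by element, whereas && and || are meant for single logical values, for example inside if().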