Chapter 7 Dataframes in R
7.1 Introduction
In this chapter, we will learn to:
- create dataframes
- select columns
- select rows
- use utility functions
7.2 Create dataframes
Use data.frame to create dataframes. Below is the function syntax:
args(data.frame)
## function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,
## fix.empty.names = TRUE, stringsAsFactors = FALSE)
## NULL
Data frames are essentially lists whose elements (the columns) all have the same length. Because they are lists, they are heterogeneous: each column can hold a different type. Let us create a dataframe:
name <- c('John', 'Jack', 'Jill')
age <- c(29, 25, 27)
graduate <- c(TRUE, TRUE, FALSE)
students <- data.frame(name, age, graduate)
students
## name age graduate
## 1 John 29 TRUE
## 2 Jack 25 TRUE
## 3 Jill 27 FALSE
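Because a data frame is a list of equal-length columns, ordinary list functions work on it directly. The sketch below rebuilds the students data frame so that it runs on its own and checks this claim. The calls to is.list, lengths and sapply are added for illustration and are not part of the original example; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Rebuild the example data frame so the block is self-contained
students <- data.frame(
  name = c("John", "Jack", "Jill"),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A data frame is a list under the hood
is.list(students)
## [1] TRUE

# length() counts the list elements, i.e. the columns
length(students)
## [1] 3

# Every column has the same number of elements
all(lengths(students) == nrow(students))
## [1] TRUE

# The columns differ in type: character, numeric and logical
column_types <- sapply(students, class)
```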
7.3 Basic Information
class(students)
## [1] "data.frame"
names(students)
## [1] "name" "age" "graduate"
colnames(students)
## [1] "name" "age" "graduate"
str(students)
## 'data.frame': 3 obs. of 3 variables:
## $ name : chr "John" "Jack" "Jill"
## $ age : num 29 25 27
## $ graduate: logi TRUE TRUE FALSE
dim(students)
## [1] 3 3
nrow(students)
## [1] 3
ncol(students)
## [1] 3
7.4 Select Columns
7.4.1 Single Column
A single column can be selected with any of the following operators:
- []
- [[]]
- $
# using [
students[1]
## name
## 1 John
## 2 Jack
## 3 Jill
# using [[
students[[1]]
## [1] "John" "Jack" "Jill"
# using $
students$name
## [1] "John" "Jack" "Jill"
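The three operators differ in what they return, which the printed output above already hints at: [ gives back a one-column data frame, while [[ and $ give back the column itself as a vector. A minimal sketch that makes the difference explicit, assuming the same students data as above (rebuilt here so the block runs on its own):

```r
students <- data.frame(
  name = c("John", "Jack", "Jill"),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# [ keeps the data frame structure
class(students[1])
## [1] "data.frame"

# [[ and $ extract the column as a plain vector
class(students[[1]])
## [1] "character"
class(students$name)
## [1] "character"

# [[ also accepts a column name and then matches $
identical(students[["name"]], students$name)
## [1] TRUE
```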
7.4.2 Multiple Columns
students[, 1:3]
## name age graduate
## 1 John 29 TRUE
## 2 Jack 25 TRUE
## 3 Jill 27 FALSE
students[, c(1, 3)]
## name graduate
## 1 John TRUE
## 2 Jack TRUE
## 3 Jill FALSE
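Columns can also be selected by name instead of position, which tends to be more robust when the column order changes. The sketch below is an illustration under the same students data (rebuilt so the block is self-contained). It also shows one thing to watch for: selecting a single column with the [, j] form returns a vector unless drop = FALSE is given.

```r
students <- data.frame(
  name = c("John", "Jack", "Jill"),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Selecting by name gives the same result as selecting by position
by_name <- students[, c("name", "graduate")]
by_position <- students[, c(1, 3)]
identical(by_name, by_position)
## [1] TRUE

# A single column selected with [, j] is simplified to a vector ...
class(students[, 1])
## [1] "character"

# ... unless drop = FALSE asks R to keep the data frame
class(students[, 1, drop = FALSE])
## [1] "data.frame"
```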
7.5 Select Rows
# single row
students[1, ]
## name age graduate
## 1 John 29 TRUE
# multiple rows
students[c(1, 3), ]
## name age graduate
## 1 John 29 TRUE
## 3 Jill 27 FALSE
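Rows can be selected with a logical condition as well as by position. The following sketch is an addition to the examples above, using the same students data (rebuilt so the block runs on its own); the threshold of 26 is an arbitrary value chosen for illustration.

```r
students <- data.frame(
  name = c("John", "Jack", "Jill"),
  age = c(29, 25, 27),
  graduate = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep the rows where the condition is TRUE
older <- students[students$age > 26, ]
older
##   name age graduate
## 1 John  29     TRUE
## 3 Jill  27    FALSE

# Combine a row condition with a column selection
students[students$graduate, "name"]
## [1] "John" "Jack"
```

Note that the original row names (1 and 3) are preserved in the result.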
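In versions of R before 4.0.0, the name column would have been coerced to type factor. This happened because the stringsAsFactors argument of data.frame defaulted to TRUE. Since R 4.0.0 the default is FALSE, as the args(data.frame) output in section 7.2 shows, which is why str(students) reports name as chr. If you work with an older version of R and do not want strings treated as factors, set the argument to FALSE explicitly: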
students <- data.frame(name, age, graduate, stringsAsFactors = FALSE)
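To see what the stringsAsFactors argument changes, the sketch below builds the same column twice, once with each setting, and compares the results. The object names as_factor and as_char are hypothetical and used only for this illustration.

```r
name <- c("John", "Jack", "Jill")

# Strings converted to a factor
as_factor <- data.frame(name, stringsAsFactors = TRUE)
class(as_factor$name)
## [1] "factor"

# Factor levels are the unique values, sorted alphabetically
levels(as_factor$name)
## [1] "Jack" "Jill" "John"

# Strings kept as character
as_char <- data.frame(name, stringsAsFactors = FALSE)
class(as_char$name)
## [1] "character"
```

Setting the argument explicitly makes the result independent of the R version in use.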