Learning Objectives
- Describe what a vector is.
- Create vectors of different data types.
- Use indexing to subset and modify specific portions of vectors.
- Understand how to use vectorized functions to avoid loop operations.
Suggested readings
- Chapter 20 of “R for Data Science”, by Garrett Grolemund and Hadley Wickham
- Chapter 5.1 of “Hands-On Programming with R”, by Garrett Grolemund
So far we’ve only dealt with objects that contain one value (e.g. x <- 1), but R actually stores those values in a vector of length one:
x <- 1
length(x)
#> [1] 1
is.vector(x)
#> [1] TRUE
A vector is a basic data structure in R. All elements in a vector must have the same type.
The most basic way of creating a vector is to use the c() function (“c” is for “concatenate”):
x <- c(1, 2, 3)
length(x)
#> [1] 3
As we saw in the loops lesson, you can also create vectors of sequences using the : operator or the seq() function:
seq(1, 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
1:5
#> [1] 1 2 3 4 5
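The seq() function can also take a step size or a target length. The short sketch below uses the by and length.out arguments of base R's seq(); the variable names are only illustrative:

```r
# Sequence from 0 to 1 in steps of 0.25
by_steps <- seq(0, 1, by = 0.25)
by_steps
#> [1] 0.00 0.25 0.50 0.75 1.00

# Five evenly spaced values between 0 and 100
evenly_spaced <- seq(0, 100, length.out = 5)
evenly_spaced
#> [1]   0  25  50  75 100
```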
You can also create a vector by using the rep() function, which replicates the same value n times:
y <- rep(5, 10) # The number 5 ten times
z <- rep(10, 5) # The number 10 five times
y
#> [1] 5 5 5 5 5 5 5 5 5 5
z
#> [1] 10 10 10 10 10
In fact, you can use the rep() function to create longer vectors made up of repeated vectors:
rep(c(1, 2), 3) # Repeat the vector c(1, 2) three times
#> [1] 1 2 1 2 1 2
If you add the each argument, rep() will repeat each element in the vector:
rep(c(1, 2), each = 3) # Repeat each element of the vector c(1, 2) three times
#> [1] 1 1 1 2 2 2
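As a further sketch, the times argument of rep() can itself be a vector (one count per element), and each can be combined with times. The variable names here are illustrative only:

```r
# Repeat each element a different number of times:
# the value 1 three times, then the value 2 once
uneven <- rep(c(1, 2), times = c(3, 1))
uneven
#> [1] 1 1 1 2

# Combine each and times: repeat each element twice,
# then repeat that whole pattern twice
pattern <- rep(c(1, 2), each = 2, times = 2)
pattern
#> [1] 1 1 2 2 1 1 2 2
```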
You can see how long a vector is using the length() function:
length(y)
#> [1] 10
length(z)
#> [1] 5
Each element in a vector must have the same type. If you mix types in a vector, R will coerce all the elements to the most flexible type present: logicals become numbers, and anything mixed with a character becomes a character.
If a vector has a single character element, R makes everything a character:
c(1, 2, "3")
#> [1] "1" "2" "3"
c(TRUE, FALSE, "TRUE")
#> [1] "TRUE" "FALSE" "TRUE"
If a vector has numeric and logical elements, R makes everything a number:
c(1, 2, TRUE, FALSE)
#> [1] 1 2 1 0
If a vector has integers and floats, R makes everything a float:
c(1L, 2, pi)
#> [1] 1.000000 2.000000 3.141593
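One way to check these coercion rules yourself is to ask for the type of the result with typeof(). This is a small sketch; the variable names are illustrative:

```r
type_logical   <- typeof(c(TRUE, FALSE))    # Logical only
type_integer   <- typeof(c(TRUE, 1L))       # Logical + integer
type_double    <- typeof(c(1L, 2.5))        # Integer + double
type_character <- typeof(c(1L, 2.5, "a"))   # Anything + character

c(type_logical, type_integer, type_double, type_character)
#> [1] "logical"   "integer"   "double"    "character"
```

Each step to the right is a more flexible type, and c() always settles on the most flexible type among its inputs.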
You can empty a vector by assigning NULL to it (the name x still exists but now holds NULL; use rm(x) to remove the object entirely):
x <- seq(1, 10)
x
#> [1] 1 2 3 4 5 6 7 8 9 10
x <- NULL
x
#> NULL
As we saw in the loops lesson, you can create a vector of integers using the : operator or the seq() function:
1:10
#> [1] 1 2 3 4 5 6 7 8 9 10
seq(1, 10)
#> [1] 1 2 3 4 5 6 7 8 9 10
Numeric vectors don’t all have to be integers though - they can be any number:
v <- c(pi, 7, 42, 365)
v
#> [1] 3.141593 7.000000 42.000000 365.000000
typeof(v)
#> [1] "double"
R has many built-in functions that are designed to give summary information about numeric vectors. Note that these functions take a vector of numbers and return a single value. Here are some common ones:
Function | Description | Example |
---|---|---|
mean(x) | Mean of values in x | mean(c(1,2,3,4,5)) returns 3 |
median(x) | Median of values in x | median(c(1,2,2,4,5)) returns 2 |
max(x) | Max element in x | max(c(1,2,3,4,5)) returns 5 |
min(x) | Min element in x | min(c(1,2,3,4,5)) returns 1 |
sum(x) | Sums the elements in x | sum(c(1,2,3,4,5)) returns 15 |
prod(x) | Product of the elements in x | prod(c(1,2,3,4,5)) returns 120 |
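Here is a short sketch of these summary functions applied to a stored vector. The scores data is made up for illustration. It also shows the na.rm argument, which these base R functions accept for skipping missing values:

```r
scores <- c(82, 95, 67, 74, 90)  # Hypothetical test scores

mean(scores)
#> [1] 81.6
median(scores)
#> [1] 82
sum(scores)
#> [1] 408

# A single missing value makes the summary missing too
scores_with_na <- c(scores, NA)
mean(scores_with_na)
#> [1] NA

# Setting na.rm = TRUE drops missing values before summarizing
mean(scores_with_na, na.rm = TRUE)
#> [1] 81.6
```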
Character vectors are vectors where each element is a string:
stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')
stringVector
#> [1] "oh" "what" "a" "beautiful" "morning"
typeof(stringVector)
#> [1] "character"
As we’ll see in the next lesson on strings, you can “collapse” a character vector into a single string using the str_c() function from the stringr library:
library(stringr)
str_c(stringVector, collapse = ' ')
#> [1] "oh what a beautiful morning"
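If stringr is not installed, base R's paste() function offers a similar collapse argument. This is a sketch of an alternative, not the approach this lesson relies on:

```r
stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')

# paste() with collapse joins all elements into one string
collapsed <- paste(stringVector, collapse = ' ')
collapsed
#> [1] "oh what a beautiful morning"
```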
Logical vectors contain only TRUE or FALSE elements:
logicalVector <- c(rep(TRUE, 3), rep(FALSE, 3))
logicalVector
#> [1] TRUE TRUE TRUE FALSE FALSE FALSE
If you add a numeric type to a logical vector, the logical elements will be converted to either a 1 for TRUE or 0 for FALSE:
c(logicalVector, 42)
#> [1] 1 1 1 0 0 0 42
Warning: If you add a character type to a logical vector, the logical elements will be converted to strings of "TRUE" and "FALSE". So even though they may still look like logical types, they aren’t:
y <- c(logicalVector, 'string')
y
#> [1] "TRUE" "TRUE" "TRUE" "FALSE" "FALSE" "FALSE" "string"
typeof(y)
#> [1] "character"
If you want to check if two vectors are identical (in that they contain all the same elements), you can’t use the typical == operator by itself. The reason is that the == operator works element-wise, so it returns a logical vector:
x <- c(1,2,3)
y <- c(1,2,3)
x == y
#> [1] TRUE TRUE TRUE
Instead of getting one TRUE, you get a vector of TRUEs, because the individual elements are indeed equal. To check whether all the elements in the two vectors are equal, wrap the comparison inside the all() function:
all(x == y)
#> [1] TRUE
Keep in mind that there are really two steps going on here: 1) x == y creates a logical vector of TRUEs and FALSEs based on element-wise comparisons, and 2) the all() function checks whether all of the values in that logical vector are TRUE.
You can also use the all() function to check whether other types of conditions are TRUE for all elements in two vectors:
a <- c(1,2,3)
b <- -1*c(1,2,3)
all(a > b)
#> [1] TRUE
In contrast to the all() function, the any() function will return TRUE if any of the elements in a vector are TRUE:
a <- c(1,2,3)
b <- c(-1,2,-3)
a == b
#> [1] FALSE TRUE FALSE
any(a == b)
#> [1] TRUE
For most situations, the all() function works just fine for comparing vectors, but it only compares the elements in the vectors, not their attributes. In some situations, you might also want to check if the attributes of the vectors, such as their names and data types, are also the same. In this case, you should use the identical() function.
names(x) <- c('a', 'b', 'c')
names(y) <- c('one', 'two', 'three')
all(x == y) # Only compares the elements
#> [1] TRUE
identical(x, y) # Also compares the **names** of the elements
#> [1] FALSE
Notice that for the identical() function, you don’t need to add a conditional statement - you just provide it the two vectors you want to compare. This is because identical() by definition checks whether two things are the same.
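The same difference shows up with data types and lengths, not just names. In the sketch below (variable names are illustrative), all() with == reports a match in cases where identical() does not:

```r
ints <- 1:3          # Integer vector
dbls <- c(1, 2, 3)   # Double vector with the same values

all(ints == dbls)    # Values match element-wise
#> [1] TRUE
identical(ints, dbls)  # Types differ (integer vs. double)
#> [1] FALSE
identical(as.numeric(ints), dbls)  # Same type and same values
#> [1] TRUE

# Different lengths: == silently repeats the shorter vector
short <- c(1, 2)
long <- c(1, 2, 1, 2)
all(short == long)
#> [1] TRUE
identical(short, long)
#> [1] FALSE
```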
You can access elements from a vector using brackets [] and indices inside the brackets. You can use integer indices (probably the most common way), character indices (by naming each element), and logical indices.
Vector indices start from 1 (this is important - most programming languages start from 0):
x <- seq(1, 10)
x[1] # Returns the first element
#> [1] 1
x[3] # Returns the third element
#> [1] 3
x[length(x)] # Returns the last element
#> [1] 10
You can access multiple elements by using a vector of indices inside the brackets:
x[c(1:3)] # Returns the first three elements
#> [1] 1 2 3
x[c(2, 7)] # Returns the 2nd and 7th elements
#> [1] 2 7
You can also use negative integers to remove elements, which returns all elements except those specified:
x[-1] # Returns everything except the first element
#> [1] 2 3 4 5 6 7 8 9 10
x[-c(2, 7)] # Returns everything except the 2nd and 7th elements
#> [1] 1 3 4 5 6 8 9 10
But you cannot mix positive and negative integers while indexing:
x[c(-2, 7)]
#> Error in x[c(-2, 7)]: only 0's may be mixed with negative subscripts
If you try to use a float as an index, the fractional part is dropped (the value is truncated, not rounded):
x[3.1415] # Returns the 3rd element
#> [1] 3
x[3.9999] # Still returns the 3rd element
#> [1] 3
You can name the elements in a vector and then use those names to access elements. To create a named vector, use the names() function. The vector of names must be the same length as the vector itself:
x <- seq(5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
x
#> a b c d e
#> 1 2 3 4 5
You can also create a named vector by putting the names directly in the c() function:
x <- c('a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, 'e' = 5)
x
#> a b c d e
#> 1 2 3 4 5
Once your vector has names, you can then use those names as indices:
x['a'] # Returns the first element
#> a
#> 1
x[c('a', 'c')] # Returns the 1st and 3rd elements
#> a c
#> 1 3
When using a logical vector for indexing, the elements at the positions where the logical vector is TRUE are returned. This is helpful for filtering vectors based on conditions:
x <- seq(1, 10)
x > 5 # Create logical vector
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
x[x > 5] # Put logical vector in brackets to keep only the elements where it is TRUE
#> [1] 6 7 8 9 10
You can also use the which() function to find the numeric indices for which a condition is TRUE, and then use those indices to select elements:
which(x < 5) # Returns indices of TRUE elements
#> [1] 1 2 3 4
x[which(x < 5)] # Use which to select elements based on a condition
#> [1] 1 2 3 4
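The same three kinds of indices can also be used on the left side of an assignment to modify parts of a vector. A minimal sketch, with values chosen only for illustration:

```r
x <- seq(1, 10)

x[1] <- 100                # Replace the first element
x[c(2, 3)] <- c(20, 30)    # Replace the 2nd and 3rd elements
x[x < 10] <- 0             # Replace every element less than 10 with 0
x
#> [1] 100  20  30   0   0   0   0   0   0  10
```

Only the selected positions change; every other element keeps its value.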
Most base functions in R are “vectorized”, meaning that when you give them a vector, they perform the operation on each element in the vector.
When you perform arithmetic operations on vectors, they are executed on an element-by-element basis:
x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
# Addition
x1 + x2 # Returns (1+4, 2+5, 3+6)
#> [1] 5 7 9
# Subtraction
x1 - x2 # Returns (1-4, 2-5, 3-6)
#> [1] -3 -3 -3
# Multiplication
x1 * x2 # Returns (1*4, 2*5, 3*6)
#> [1] 4 10 18
# Division
x1 / x2 # Returns (1/4, 2/5, 3/6)
#> [1] 0.25 0.40 0.50
When performing vectorized operations, the vectors need to have the same length, or one of the vectors needs to be a single-value vector:
# Careful! Mismatched lengths will only give you a warning, but will still return a value:
x1 <- c(1, 2, 3)
x2 <- c(4, 5)
x1 + x2
#> [1] 5 7 7
What R does in these cases is recycle (repeat) the shorter vector until it matches the length of the longer one, so in the above case the last value is 3 + 4.
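To make the recycling explicit, the sketch below builds the repeated vector by hand with base R's rep_len() and shows that it gives the same result. It also shows that R gives no warning at all when the longer length is an exact multiple of the shorter one:

```r
x1 <- c(1, 2, 3)
x2 <- c(4, 5)

# What R effectively does to the shorter vector
recycled <- rep_len(x2, length(x1))
recycled
#> [1] 4 5 4

manual_sum <- x1 + recycled
manual_sum
#> [1] 5 7 7

# Lengths 4 and 2: recycled silently, with no warning
silent_sum <- c(1, 2, 3, 4) + c(10, 20)
silent_sum
#> [1] 11 22 13 24
```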
If you have a single-value vector, R will add it to every element of the other vector:
x1 <- c(1, 2, 3)
x2 <- c(4)
x1 + x2
#> [1] 5 6 7
You can reorder the arrangement of elements in a vector by using the sort() function:
a <- c(2, 4, 6, 3, 1, 5)
sort(a)
#> [1] 1 2 3 4 5 6
sort(a, decreasing = TRUE)
#> [1] 6 5 4 3 2 1
To get the index values of the sorted order, use the order() function:
order(a)
#> [1] 5 1 4 2 6 3
These indices tell us that the first value in the sorted arrangement of vector a is element number 5 (which is a 1), the second value is element number 1 (which is a 2), and so on. If you use order() as the indices to the vector, you’ll get the sorted vector:
a[order(a)] # Same as sort(a)
#> [1] 1 2 3 4 5 6
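Where order() is especially handy is sorting one vector by the values in another, which sort() alone cannot do. A sketch with made-up data:

```r
students <- c('Ana', 'Ben', 'Cy', 'Dee')  # Hypothetical names
scores <- c(88, 95, 70, 91)               # Hypothetical scores, same order

# Indices of scores from highest to lowest
ranking <- order(scores, decreasing = TRUE)
ranking
#> [1] 2 4 1 3

# Use those indices to reorder the names
ranked_students <- students[ranking]
ranked_students
#> [1] "Ben" "Dee" "Ana" "Cy"
```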
As we saw in the loops lesson, you can use a loop to perform an operation on each element in a vector. For example, the following loop gets the decimal part of each element in a vector of floats:
x <- c(3.1415, 1.618, 2.718)
remainder <- c()
for (i in x) {
remainder <- c(remainder, i %% 1)
}
remainder
#> [1] 0.1415 0.6180 0.7180
You could achieve the same thing by just performing the operation inside the loop (the i %% 1 bit) on the whole vector:
remainder <- x %% 1
remainder
#> [1] 0.1415 0.6180 0.7180
In many cases, using a vector can save you a whole lot of code (and time!) by avoiding loops entirely!
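As a quick check, the sketch below confirms that the loop and the vectorized version give the same result, and shows a few other base R functions that work on whole vectors in the same way. The variable names are illustrative:

```r
x <- c(3.1415, 1.618, 2.718)

# Loop version
remainder_loop <- c()
for (i in x) {
  remainder_loop <- c(remainder_loop, i %% 1)
}

# Vectorized version
remainder_vec <- x %% 1

all.equal(remainder_loop, remainder_vec)
#> [1] TRUE

# Other vectorized functions need no loop either
round(x, 1)
#> [1] 3.1 1.6 2.7
sqrt(c(1, 4, 9))
#> [1] 1 2 3
nchar(c('oh', 'what', 'a'))
#> [1] 2 4 1
```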