Learning Objectives

  • Describe what a vector is.
  • Create vectors of different data types.
  • Use indexing to subset and modify specific portions of vectors.
  • Understand how to use vectorized functions to avoid loop operations.

Suggested readings

  • Chapter 20 of “R for Data Science”, by Garrett Grolemund and Hadley Wickham
  • Chapter 5.1 of “Hands-On Programming with R”, by Garrett Grolemund

So far we’ve only dealt with objects that contain one value (e.g. x <- 1), but R actually stores those values in a vector of length one:

x <- 1
length(x)
#> [1] 1
is.vector(x)
#> [1] TRUE

A vector is a basic data structure in R. All elements in a vector must have the same type.

Vector basics

Creating vectors

The most basic way of creating a vector is to use the c() function (“c” is for “concatenate”):

x <- c(1, 2, 3)
length(x)
#> [1] 3

As we saw in the loops lesson, you can also create vectors of sequences using the : operator or the seq() function:

seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10
1:5
#> [1] 1 2 3 4 5
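
As a small illustrative sketch (the variable names here are made up for the example), seq() also accepts a by argument for the step size and a length.out argument for the number of values, which helps when you need something other than consecutive integers:

```r
# Step by 0.25 from 0 to 1
quarters <- seq(0, 1, by = 0.25)
quarters
#> [1] 0.00 0.25 0.50 0.75 1.00

# A negative step counts down
countdown <- seq(10, 1, by = -3)
countdown
#> [1] 10  7  4  1

# Ask for 5 evenly spaced values instead of giving a step size
five_points <- seq(0, 100, length.out = 5)
five_points
#> [1]   0  25  50  75 100
```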

You can also create a vector by using the rep() function, which replicates the same value n times:

y <- rep(5, 10) # The number 5 ten times
z <- rep(10, 5) # The number 10 five times
y
#>  [1] 5 5 5 5 5 5 5 5 5 5
z
#> [1] 10 10 10 10 10

In fact, you can use the rep() function to create longer vectors made up of repeated vectors:

rep(c(1, 2), 3) # Repeat the vector c(1, 2) three times
#> [1] 1 2 1 2 1 2

If you add the each argument, rep() will repeat each element in the vector:

rep(c(1, 2), each = 3) # Repeat each element of the vector c(1, 2) three times
#> [1] 1 1 1 2 2 2
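
The following sketch (with hypothetical variable names) shows a few other ways rep() can be called: a vector of counts for times, each combined with times, and length.out to cap the result length:

```r
# Repeat "a" three times and "b" once
letters_rep <- rep(c("a", "b"), times = c(3, 1))
letters_rep
#> [1] "a" "a" "a" "b"

# Repeat each element twice, then repeat that whole pattern twice
pattern <- rep(c(1, 2), each = 2, times = 2)
pattern
#> [1] 1 1 2 2 1 1 2 2

# Keep repeating c(1, 2) until the result has 5 elements
capped <- rep(c(1, 2), length.out = 5)
capped
#> [1] 1 2 1 2 1
```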

You can see how long a vector is using the length() function:

length(y)
#> [1] 10
length(z)
#> [1] 5

Vector coercion

Each element in a vector must have the same type. If you mix types in a vector, R will coerce all the elements to the most flexible type present, following the order logical < integer < double < character.

If a vector has a single character element, R makes everything a character:

c(1, 2, "3")
#> [1] "1" "2" "3"
c(TRUE, FALSE, "TRUE")
#> [1] "TRUE"  "FALSE" "TRUE"

If a vector has numeric and logical elements, R makes everything a number:

c(1, 2, TRUE, FALSE)
#> [1] 1 2 1 0

If a vector has integers and floats (called “doubles” in R), R makes everything a double:

c(1L, 2, pi)
#> [1] 1.000000 2.000000 3.141593
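
One way to check what coercion did is to call typeof() on the result. The sketch below (variable names are invented for illustration) does that for several mixes, and also shows explicit conversion with as.numeric():

```r
# typeof() reveals the type R settled on after coercion
type_lgl_int <- typeof(c(TRUE, 1L))   # logical + integer
type_lgl_int
#> [1] "integer"
type_int_dbl <- typeof(c(1L, 2.5))    # integer + double
type_int_dbl
#> [1] "double"
type_dbl_chr <- typeof(c(1.5, "a"))   # double + character
type_dbl_chr
#> [1] "character"

# You can also convert explicitly, e.g. strings that hold numbers
nums <- as.numeric(c("1", "2", "3"))
nums
#> [1] 1 2 3

# Because TRUE becomes 1 and FALSE becomes 0, sum() counts the TRUEs
n_true <- sum(c(TRUE, TRUE, FALSE))
n_true
#> [1] 2
```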

Deleting vectors

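You can discard a vector’s contents by assigning NULL to it (the name still exists afterward, holding NULL; use rm() to remove the object entirely):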

x <- seq(1, 10)
x
#>  [1]  1  2  3  4  5  6  7  8  9 10
x <- NULL
x
#> NULL
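
As a short sketch of the difference between assigning NULL and removing an object with rm() (temp_vec and the other names are hypothetical, chosen only for this example):

```r
temp_vec <- seq(1, 10)

# Assigning NULL empties the object, but the name is still defined
temp_vec <- NULL
exists_after_null <- exists("temp_vec")
exists_after_null
#> [1] TRUE
len_after_null <- length(temp_vec)
len_after_null
#> [1] 0

# rm() removes the name itself from the environment
rm(temp_vec)
exists_after_rm <- exists("temp_vec")
exists_after_rm
#> [1] FALSE
```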

Numeric vectors

As we saw in the loops lesson, you can create a vector of integers using the : operator or the seq() function:

1:10
#>  [1]  1  2  3  4  5  6  7  8  9 10
seq(1, 10)
#>  [1]  1  2  3  4  5  6  7  8  9 10

Numeric vectors don’t all have to be integers though - they can be any number:

v <- c(pi, 7, 42, 365)
v
#> [1]   3.141593   7.000000  42.000000 365.000000
typeof(v)
#> [1] "double"

R has many built-in functions that are designed to give summary information about numeric vectors. Note that these functions take a vector of numbers and return a single value. Here are some common ones:

Function  | Description                  | Example
mean(x)   | Mean of values in x          | mean(c(1,2,3,4,5)) returns 3
median(x) | Median of values in x        | median(c(1,2,2,4,5)) returns 2
max(x)    | Max element in x             | max(c(1,2,3,4,5)) returns 5
min(x)    | Min element in x             | min(c(1,2,3,4,5)) returns 1
sum(x)    | Sums the elements in x       | sum(c(1,2,3,4,5)) returns 15
prod(x)   | Product of the elements in x | prod(c(1,2,3,4,5)) returns 120
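
Here is a brief sketch applying these functions to one vector (the data and names are made up). It also illustrates one thing the table does not cover: if the vector contains a missing value (NA), these functions return NA unless you pass na.rm = TRUE:

```r
v <- c(2, 4, 4, 10)
mean(v)
#> [1] 5
median(v)
#> [1] 4
sum(v)
#> [1] 20
prod(v)
#> [1] 320

# A missing value makes the summary NA unless it is removed first
scores <- c(90, 85, NA, 70)
mean_with_na <- mean(scores)
mean_with_na
#> [1] NA
mean_no_na <- mean(scores, na.rm = TRUE)
mean_no_na
#> [1] 81.66667
```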

Character vectors

Character vectors are vectors where each element is a string:

stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')
stringVector
#> [1] "oh"        "what"      "a"         "beautiful" "morning"
typeof(stringVector)
#> [1] "character"

As we’ll see in the next lesson on strings, you can “collapse” a character vector into a single string using the str_c() function from the stringr package:

library(stringr)
str_c(stringVector, collapse = ' ')
#> [1] "oh what a beautiful morning"

Logical vectors

Logical vectors contain only TRUE or FALSE elements:

logicalVector <- c(rep(TRUE, 3), rep(FALSE, 3))
logicalVector
#> [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

If you add a numeric type to a logical vector, the logical elements will be converted to either a 1 for TRUE or 0 for FALSE:

c(logicalVector, 42)
#> [1]  1  1  1  0  0  0 42

Warning: If you add a character type to a logical vector, the logical elements will be converted to strings of "TRUE" and "FALSE". So even though they may still look like logical types, they aren’t:

y <- c(logicalVector, 'string')
y
#> [1] "TRUE"   "TRUE"   "TRUE"   "FALSE"  "FALSE"  "FALSE"  "string"
typeof(y)
#> [1] "character"

Comparing vectors

If you want to check if two vectors are identical (in that they contain all the same elements), you can’t use the typical == operator by itself. This is because the == operator works element-wise, so it returns a logical vector:

x <- c(1,2,3)
y <- c(1,2,3)
x == y
#> [1] TRUE TRUE TRUE

Instead of getting one TRUE, you get a vector of TRUEs, because the individual elements are indeed equal. To compare if all the elements in the two vectors are identical, wrap the comparison inside the all() function:

all(x == y)
#> [1] TRUE

Keep in mind that there are really two steps going on here: 1) x == y creates a logical vector of TRUEs and FALSEs based on element-wise comparisons, and 2) the all() function checks whether all of the values in that logical vector are TRUE.

You can also use the all() function to compare if other types of conditions are all TRUE for all elements in two vectors:

a <- c(1,2,3)
b <- -1*c(1,2,3)
all(a > b)
#> [1] TRUE

In contrast to the all() function, the any() function will return TRUE if any of the elements in a vector are TRUE:

a <- c(1,2,3)
b <- c(-1,2,-3)
a == b
#> [1] FALSE  TRUE FALSE
any(a == b)
#> [1] TRUE

For most situations, the all() function works just fine for comparing vectors, but it only compares the elements in the vectors, not their attributes. In some situations, you might also want to check whether the attributes of the vectors, such as their names and data types, are the same. In this case, you should use the identical() function.

names(x) <- c('a', 'b', 'c')
names(y) <- c('one', 'two', 'three')
all(x == y) # Only compares the elements
#> [1] TRUE
identical(x, y) # Also compares the names of the elements
#> [1] FALSE

Notice that for the identical() function, you don’t need to add a conditional statement - you just provide it the two vectors you want to compare. This is because identical() by definition is comparing if two things are the same.
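
To illustrate the point about data types, here is a small sketch (with invented variable names) where two vectors hold the same values but have different types. It also shows a pitfall of all() with ==: if the vectors have different lengths, the shorter one is repeated before comparing, so the check can pass even though the vectors are clearly not the same:

```r
int_vec <- 1:3          # integer
dbl_vec <- c(1, 2, 3)   # double

all(int_vec == dbl_vec)                   # Values match element-wise
#> [1] TRUE
identical(int_vec, dbl_vec)               # Types differ
#> [1] FALSE
identical(as.numeric(int_vec), dbl_vec)   # Same type and values
#> [1] TRUE

# Different lengths: c(1, 2) is repeated to match the longer vector
short_vec <- c(1, 2)
long_vec <- c(1, 2, 1, 2)
all(short_vec == long_vec)
#> [1] TRUE
identical(short_vec, long_vec)
#> [1] FALSE
```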

Accessing elements in a vector

You can access elements from a vector using brackets [] and indices inside the brackets. You can use integer indices (probably the most common way), character indices (by naming each element), and logical indices.

Using integer indices

Vector indices start from 1 (this is important - most programming languages start from 0):

x <- seq(1, 10)
x[1] # Returns the first element
#> [1] 1
x[3] # Returns the third element
#> [1] 3
x[length(x)] # Returns the last element
#> [1] 10

You can access multiple elements by using a vector of indices inside the brackets:

x[c(1:3)]  # Returns the first three elements
#> [1] 1 2 3
x[c(2, 7)] # Returns the 2nd and 7th elements
#> [1] 2 7

You can also use negative integers to remove elements, which returns all elements except those specified:

x[-1] # Returns everything except the first element
#> [1]  2  3  4  5  6  7  8  9 10
x[-c(2, 7)] # Returns everything except the 2nd and 7th elements
#> [1]  1  3  4  5  6  8  9 10

But you cannot mix positive and negative integers while indexing:

x[c(-2, 7)]
#> Error in x[c(-2, 7)]: only 0's may be mixed with negative subscripts

If you try to use a float as an index, R truncates it toward zero (it drops the decimal part):

x[3.1415] # Returns the 3rd element
#> [1] 3
x[3.9999] # Still returns the 3rd element
#> [1] 3

Using character indices

You can name the elements in a vector and then use those names to access elements. To create a named vector, use the names() function:

x <- seq(5)
names(x) <- c('a', 'b', 'c', 'd', 'e') # One name per element
x
#> a b c d e 
#> 1 2 3 4 5

You can also create a named vector by putting the names directly in the c() function:

x <- c('a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, 'e' = 5)
x
#> a b c d e 
#> 1 2 3 4 5

Once your vector has names, you can then use those names as indices:

x['a'] # Returns the first element
#> a 
#> 1
x[c('a', 'c')] # Returns the 1st and 3rd elements
#> a c 
#> 1 3

Using logical indices

When you index with a logical vector, R returns the elements at the positions where the logical vector is TRUE. This is helpful for filtering vectors based on conditions:

x <- seq(1, 10)
x > 5 # Create logical vector
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x > 5] # Put logical vector in brackets to keep only the elements where it is TRUE
#> [1]  6  7  8  9 10

You can also use the which() function to find the numeric indices for which a condition is TRUE, and then use those indices to select elements:

which(x < 5) # Returns indices of TRUE elements
#> [1] 1 2 3 4
x[which(x < 5)] # Use which to select elements based on a condition
#> [1] 1 2 3 4
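
All three kinds of indices can also be used on the left side of an assignment to modify parts of a vector. The sketch below uses made-up values and names to show integer, logical, and character indices being used this way:

```r
x <- seq(1, 10)

x[1] <- 100                # Replace the first element
x[c(2, 3)] <- c(20, 30)    # Replace the 2nd and 3rd elements
x[x > 50] <- 0             # Replace every element greater than 50
x
#>  [1]  0 20 30  4  5  6  7  8  9 10

# Named elements can be modified by name
prices <- c('apple' = 1, 'pear' = 2)
prices['pear'] <- 5
prices
#> apple  pear 
#>     1     5
```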

Vectorized operations

Most base functions in R are “vectorized”, meaning that when you give them a vector, they perform the operation on each element in the vector.

Arithmetic operations

When you perform arithmetic operations on vectors, they are executed on an element-by-element basis:

x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
# Addition
x1 + x2 # Returns (1+4, 2+5, 3+6)
#> [1] 5 7 9
# Subtraction
x1 - x2 # Returns (1-4, 2-5, 3-6)
#> [1] -3 -3 -3
# Multiplication
x1 * x2 # Returns (1*4, 2*5, 3*6)
#> [1]  4 10 18
# Division
x1 / x2 # Returns (1/4, 2/5, 3/6)
#> [1] 0.25 0.40 0.50

When performing vectorized operations, the vectors should have the same length, or one of the vectors should be a single-value vector:

# Careful! Mismatched lengths will only give you a warning, but will still return a value:
x1 <- c(1, 2, 3)
x2 <- c(4, 5)
x1 + x2
#> [1] 5 7 7

What R does in these cases is repeat the shorter vector, so in the above case the last value is 3 + 4.
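
The sketch below (names invented for the example) shows two sides of this repeating behavior. When the longer length is an exact multiple of the shorter one, R repeats the shorter vector silently, with no warning at all. When it is not a multiple, R issues a warning, which is captured here with tryCatch() so the message can be inspected (the text shown assumes an English locale):

```r
# Length 4 and length 2: the shorter vector is repeated with no warning
long <- c(1, 2, 3, 4)
short <- c(10, 20)
recycled <- long + short   # (1+10, 2+20, 3+10, 4+20)
recycled
#> [1] 11 22 13 24

# Length 3 and length 2: capture the warning message instead of the result
msg <- tryCatch(
    c(1, 2, 3) + c(4, 5),
    warning = function(w) conditionMessage(w)
)
msg
#> [1] "longer object length is not a multiple of shorter object length"
```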

If you have a single value vector, R will add it element-wise:

x1 <- c(1, 2, 3)
x2 <- c(4)
x1 + x2
#> [1] 5 6 7

Sorting

You can reorder the arrangement of elements in a vector by using the sort() function:

a <- c(2, 4, 6, 3, 1, 5)
sort(a)
#> [1] 1 2 3 4 5 6
sort(a, decreasing = TRUE)
#> [1] 6 5 4 3 2 1

To get the index values of the sorted order, use the order() function:

order(a)
#> [1] 5 1 4 2 6 3

These indices tell us that the first value in the sorted arrangement of vector a is element number 5 (which is a 1), the second value is element number 1 (which is a 2), and so on. If you use order() as the indices to the vector, you’ll get the sorted vector:

a[order(a)] # Same as sort(a)
#> [1] 1 2 3 4 5 6
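
A common reason to use order() rather than sort() is to arrange one vector according to the values in another. A minimal sketch with made-up data:

```r
people <- c("Ann", "Bob", "Cat")
ages <- c(35, 22, 41)

# Arrange the names from youngest to oldest
youngest_first <- people[order(ages)]
youngest_first
#> [1] "Bob" "Ann" "Cat"

# Arrange the names from oldest to youngest
oldest_first <- people[order(ages, decreasing = TRUE)]
oldest_first
#> [1] "Cat" "Ann" "Bob"
```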

Tips

Use vectors instead of a loop

As we saw in the loops lesson, you can use a loop to perform an operation on each element in a vector. For example, the following loop gets the decimal part of each element in a vector of floats:

x <- c(3.1415, 1.618, 2.718)
remainder <- c()
for (i in x) {
    remainder <- c(remainder, i %% 1)
}
remainder
#> [1] 0.1415 0.6180 0.7180

You could achieve the same thing by just performing the operation inside the loop (the i %% 1 bit) on the whole vector:

remainder <- x %% 1
remainder
#> [1] 0.1415 0.6180 0.7180

In many cases, using a vector can save you a whole lot of code (and time!) by avoiding loops entirely!
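
As one more sketch of the same idea, here is a temperature conversion written both ways (the data and variable names are invented for illustration). The loop version fills in a result vector one element at a time, while the vectorized version applies the formula to the whole vector at once, and both give the same answer:

```r
temps_f <- c(32, 68, 212)

# Loop version: create an empty result vector, then fill it in
temps_c_loop <- numeric(length(temps_f))
for (i in seq_along(temps_f)) {
    temps_c_loop[i] <- (temps_f[i] - 32) * 5 / 9
}

# Vectorized version: one line, no loop
temps_c_vec <- (temps_f - 32) * 5 / 9
temps_c_vec
#> [1]   0  20 100

# Both approaches produce the same values
all.equal(temps_c_loop, temps_c_vec)
#> [1] TRUE
```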


Page sources:

Some content on this page has been modified from other courses, including:


EMSE 4571: Intro to Programming for Analytics (Spring 2022)
Thursdays | 12:45 - 3:15 PM EST | Tompkins 208 | Dr. John Paul Helveston | jph@gwu.edu
LICENSE: CC-BY-SA