Learning Objectives
- Describe what a vector is.
- Create vectors of different data types.
- Use indexing to subset and modify specific portions of vectors.
- Understand how to use vectorized functions to avoid loop operations.
Suggested readings
- Chapter 20 of “R for Data Science”, by Hadley Wickham and Garrett Grolemund
- Chapter 5.1 of “Hands-On Programming with R”, by Garrett Grolemund
So far we’ve only dealt with objects that contain one value (e.g. x <- 1), but R actually stores those values in a vector of length one:
x <- 1
length(x)
## [1] 1
is.vector(x)
## [1] TRUE
A vector is a basic data structure in R. All elements in a vector must have the same type.
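Because a single value is already a vector of length one, the usual vector functions work on it. The following is a minimal sketch that uses typeof() to show the type R assigns to a few single values. The values themselves are arbitrary examples.

```r
# Each single value below is a vector of length one
is.vector("hello")   # TRUE
length(TRUE)         # 1

# typeof() reports the data type stored in the vector
typeof(1)            # "double"    (numbers are doubles by default)
typeof(1L)           # "integer"   (the L suffix makes an integer)
typeof("hello")      # "character"
typeof(TRUE)         # "logical"
```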
The most basic way of creating a vector is to use the c() function (“c” is for “concatenate”):
x <- c(1, 2, 3)
length(x)
## [1] 3
As we saw in the loops lesson, you can also create vectors of sequences using the : operator or the seq() function:
seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
1:5
## [1] 1 2 3 4 5
You can also create a vector by using the rep() function, which replicates the same value n times:
y <- rep(5, 10) # The number 5 ten times
z <- rep(10, 5) # The number 10 five times
y
##  [1] 5 5 5 5 5 5 5 5 5 5
z
## [1] 10 10 10 10 10
In fact, you can use the rep() function to create longer vectors made up of repeated vectors:
rep(c(1, 2), 3) # Repeat the vector c(1, 2) three times
## [1] 1 2 1 2 1 2
If you add the each argument, rep() will repeat each element in the vector:
rep(c(1, 2), each = 3) # Repeat each element of the vector c(1, 2) three times
## [1] 1 1 1 2 2 2
You can see how long a vector is using the length() function:
length(y)
## [1] 10
length(z)
## [1] 5
Each element in a vector must have the same type. If you mix types in a vector, R will coerce all the elements to the most flexible type present, where the types rank from least to most flexible as logical, integer, double, character.
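You can check which type R settles on with typeof(). The following is a minimal sketch in which each pair mixes two types and the result is the more flexible of the two.

```r
# Mixing two types: R converts both to the more flexible one
typeof(c(TRUE, 1L))   # "integer"   (logical -> integer)
typeof(c(1L, 2.5))    # "double"    (integer -> double)
typeof(c(2.5, "a"))   # "character" (double -> character)
typeof(c(TRUE, "a"))  # "character" (logical -> character)
```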
If a vector has even a single character element, R makes everything a character:
c(1, 2, "3")
## [1] "1" "2" "3"
c(TRUE, FALSE, "TRUE")
## [1] "TRUE"  "FALSE" "TRUE"
If a vector has numeric and logical elements, R makes everything a number:
c(1, 2, TRUE, FALSE)
## [1] 1 2 1 0
If a vector has integers and doubles (floating-point numbers), R makes everything a double:
c(1L, 2, pi)
## [1] 1.000000 2.000000 3.141593
You can empty a vector by assigning NULL to it (to remove the variable entirely, use rm()):
x <- seq(1, 10)
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x <- NULL
x
## NULL
As we saw in the loops lesson, you can create a vector of integers using the : operator or the seq() function:
1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
Numeric vectors don’t have to contain only integers, though; they can hold any number:
v <- c(pi, 7, 42, 365)
v
## [1]   3.141593   7.000000  42.000000 365.000000
typeof(v)
## [1] "double"
R has many built-in functions that give summary information about numeric vectors. These functions take a vector of numbers and return a single value. Here are some common ones:
| Function | Description | Example |
|---|---|---|
| mean(x) | Mean of the values in x | mean(c(1,2,3,4,5)) returns 3 |
| median(x) | Median of the values in x | median(c(1,2,2,4,5)) returns 2 |
| max(x) | Largest element in x | max(c(1,2,3,4,5)) returns 5 |
| min(x) | Smallest element in x | min(c(1,2,3,4,5)) returns 1 |
| sum(x) | Sum of the elements in x | sum(c(1,2,3,4,5)) returns 15 |
| prod(x) | Product of the elements in x | prod(c(1,2,3,4,5)) returns 120 |
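These functions accept any numeric vector, including one stored in a variable. The following is a minimal sketch that reuses the values of v from above, redefined here so the snippet runs on its own. The functions range() and length() are two more base R functions that are not listed in the table. Note that range() returns two values rather than one.

```r
v <- c(pi, 7, 42, 365)

sum(v)      # total of all elements
max(v)      # 365
min(v)      # 3.141593 (pi)
range(v)    # min and max together, as a length-2 vector
length(v)   # 4
mean(v)     # same as sum(v) / length(v)
```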
Character vectors are vectors where each element is a string:
stringVector <- c('oh', 'what', 'a', 'beautiful', 'morning')
stringVector
## [1] "oh"        "what"      "a"         "beautiful" "morning"
typeof(stringVector)
## [1] "character"
As we’ll see in the next lesson on strings, you can “collapse” a character vector into a single string using the str_c() function from the stringr package:
library(stringr)
str_c(stringVector, collapse = ' ')
## [1] "oh what a beautiful morning"
Logical vectors contain only TRUE or FALSE elements:
logicalVector <- c(rep(TRUE, 3), rep(FALSE, 3))
logicalVector
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
If you add a numeric type to a logical vector, the logical elements will be converted to either a 1 for TRUE or 0 for FALSE:
c(logicalVector, 42)
## [1]  1  1  1  0  0  0 42
Warning: If you add a character type to a logical vector, the logical elements will be converted to strings of "TRUE" and "FALSE". So even though they may still look like logical types, they aren’t:
y <- c(logicalVector, 'string')
y
## [1] "TRUE"   "TRUE"   "TRUE"   "FALSE"  "FALSE"  "FALSE"  "string"
typeof(y)
## [1] "character"
If you want to check whether two vectors are identical (in that they contain all the same elements), you can’t use the == operator by itself. This is because == is applied element-wise, so it returns a logical vector:
x <- c(1,2,3)
y <- c(1,2,3)
x == y
## [1] TRUE TRUE TRUE
Instead of getting one TRUE, you get a vector of TRUEs, because the individual elements are indeed equal. To check whether all the elements in the two vectors are equal, wrap the comparison inside the all() function:
all(x == y)
## [1] TRUE
Keep in mind that there are really two steps going on here: 1) x == y creates a logical vector of TRUE and FALSE values based on element-wise comparisons, and 2) the all() function checks whether all of the values in that logical vector are TRUE.
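You can make the two steps explicit by saving the intermediate logical vector. The following is a minimal sketch with one differing element. The variable name `same` is used only for illustration.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

same <- x == y   # step 1: element-wise comparison
same             # TRUE TRUE FALSE
all(same)        # step 2: FALSE, because one element differs
```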
You can also use all() to check whether other kinds of conditions hold for every pair of elements in two vectors:
a <- c(1,2,3)
b <- -1*c(1,2,3)
all(a > b)
## [1] TRUE
In contrast to the all() function, the any() function will return TRUE if any of the elements in a vector are TRUE:
a <- c(1,2,3)
b <- c(-1,2,-3)
a == b
## [1] FALSE  TRUE FALSE
any(a == b)
## [1] TRUE
For most situations, the all() function works just fine for comparing vectors, but it only compares the values of the elements, not other properties of the vectors. In some situations, you might also want to check whether the vectors’ attributes (such as their names) and data types are the same. In this case, you should use the identical() function.
names(x) <- c('a', 'b', 'c')
names(y) <- c('one', 'two', 'three')
all(x == y) # Only compares the elements
## [1] TRUE
identical(x, y) # Also compares the names of the elements
## [1] FALSE
Notice that with identical() you don’t need a comparison operator like ==; you just give it the two vectors you want to compare. This is because identical() itself tests whether two objects are exactly the same.
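Here are two more cases where all() and identical() disagree, even without names. The following is a minimal sketch. In the first case, the vectors differ only in type: one is double and the other is integer. In the second case, == recycles the shorter vector without any warning, because 4 is a multiple of 2. As a result, all() reports TRUE even though the vectors have different lengths.

```r
a <- c(1, 2, 3)      # doubles
b <- c(1L, 2L, 3L)   # integers
all(a == b)          # TRUE: the values are equal
identical(a, b)      # FALSE: the types differ

long  <- c(1, 2, 1, 2)
short <- c(1, 2)
all(long == short)      # TRUE: short is recycled to c(1, 2, 1, 2)
identical(long, short)  # FALSE: the lengths differ
```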
You can access elements from a vector using brackets [] and indices inside the brackets. You can use integer indices (probably the most common way), character indices (by naming each element), and logical indices.
Vector indices start from 1 (this is important: many other programming languages, such as Python and C, start from 0):
x <- seq(1, 10)
x[1] # Returns the first element
## [1] 1
x[3] # Returns the third element
## [1] 3
x[length(x)] # Returns the last element
## [1] 10
You can access multiple elements by using a vector of indices inside the brackets:
x[1:3]     # Returns the first three elements
## [1] 1 2 3
x[c(2, 7)] # Returns the 2nd and 7th elements
## [1] 2 7
You can also use negative integers to remove elements, which returns all elements except those specified:
x[-1] # Returns everything except the first element
## [1]  2  3  4  5  6  7  8  9 10
x[-c(2, 7)] # Returns everything except the 2nd and 7th elements
## [1]  1  3  4  5  6  8  9 10
But you cannot mix positive and negative integers while indexing:
x[c(-2, 7)]
## Error in x[c(-2, 7)]: only 0's may be mixed with negative subscripts
If you use a non-integer number (a double such as 3.14) as an index, R truncates it toward zero, so a positive index is rounded down to the nearest integer:
x[3.1415] # Returns the 3rd element
## [1] 3
x[3.9999] # Still returns the 3rd element
## [1] 3
You can name the elements in a vector and then use those names to access elements. To create a named vector, use the names() function:
x <- seq(5)
names(x) <- c('a', 'b', 'c', 'd', 'e') # One name per element
x
## a b c d e
## 1 2 3 4 5
You can also create a named vector by putting the names directly in the c() function:
x <- c('a' = 1, 'b' = 2, 'c' = 3, 'd' = 4, 'e' = 5)
x
## a b c d e
## 1 2 3 4 5
Once your vector has names, you can then use those names as indices:
x['a'] # Returns the first element
## a
## 1
x[c('a', 'c')] # Returns the 1st and 3rd elements
## a c
## 1 3
When using a logical vector for indexing, R returns the elements at the positions where the logical vector is TRUE. This is helpful for filtering vectors based on conditions:
x <- seq(1, 10)
x > 5 # Create logical vector
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x > 5] # Put the logical vector in brackets to keep only the elements where it is TRUE
## [1]  6  7  8  9 10
You can also use the which() function to find the numeric indices for which a condition is TRUE, and then use those indices to select elements:
which(x < 5) # Returns indices of TRUE elements
## [1] 1 2 3 4
x[which(x < 5)] # Use which to select elements based on a condition
## [1] 1 2 3 4
Most base functions in R are “vectorized”, meaning that when you give them a vector, they perform the operation on each element in the vector.
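For example, math and string functions such as sqrt(), round(), toupper(), and nchar() return one result per input element, so no loop is needed. The following is a minimal sketch; the input values are arbitrary.

```r
x <- c(1, 4, 9, 16)
sqrt(x)                 # 1 2 3 4

y <- c(3.14159, 2.71828)
round(y, 2)             # 3.14 2.72

words <- c('oh', 'what', 'a')
toupper(words)          # "OH" "WHAT" "A"
nchar(words)            # 2 4 1
```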
When you perform arithmetic operations on vectors, they are executed on an element-by-element basis:
x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
# Addition
x1 + x2 # Returns (1+4, 2+5, 3+6)
## [1] 5 7 9
# Subtraction
x1 - x2 # Returns (1-4, 2-5, 3-6)
## [1] -3 -3 -3
# Multiplication
x1 * x2 # Returns (1*4, 2*5, 3*6)
## [1]  4 10 18
# Division
x1 / x2 # Returns (1/4, 2/5, 3/6)
## [1] 0.25 0.40 0.50
When performing vectorized operations, the vectors should have the same length, or one of the vectors should be a single-value vector:
# Careful! Mismatched lengths give you at most a warning, and R still returns a value:
x1 <- c(1, 2, 3)
x2 <- c(4, 5)
x1 + x2
## Warning in x1 + x2: longer object length is not a multiple of shorter object length
## [1] 5 7 7
What R does in these cases is repeat ("recycle") the shorter vector, so in the above case the last value is 3 + 4.
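The warning only appears when the longer length is not a multiple of the shorter one. When it is a multiple, R recycles the shorter vector silently, which can hide mistakes. The following is a minimal sketch of that silent case.

```r
x1 <- c(1, 2, 3, 4)
x2 <- c(10, 20)

# Length 4 is a multiple of length 2, so there is no warning:
# x2 is recycled as c(10, 20, 10, 20)
x1 + x2   # 11 22 13 24
```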
If one of the vectors has a single value, R recycles it, so that value is added to every element:
x1 <- c(1, 2, 3)
x2 <- c(4)
x1 + x2
## [1] 5 6 7
You can reorder the elements in a vector by using the sort() function:
a <- c(2, 4, 6, 3, 1, 5)
sort(a)
## [1] 1 2 3 4 5 6
sort(a, decreasing = TRUE)
## [1] 6 5 4 3 2 1
To get the index values of the sorted order, use the order() function:
order(a)
## [1] 5 1 4 2 6 3
These indices tell us that the first value in the sorted arrangement of vector a is element number 5 (which is a 1), the second value is element number 1 (which is a 2), and so on. If you use order() as the indices to the vector, you’ll get the sorted vector:
a[order(a)] # Same as sort(a)
## [1] 1 2 3 4 5 6
As we saw in the loops lesson, you can use a loop to perform an operation on each element in a vector. For example, the following loop gets the fractional (decimal) part of each element in a vector of doubles:
x <- c(3.1415, 1.618, 2.718)
remainder <- c()
for (i in x) {
    remainder <- c(remainder, i %% 1)
}
remainder
## [1] 0.1415 0.6180 0.7180
You could achieve the same thing by performing the operation inside the loop (the i %% 1 bit) on the whole vector at once:
remainder <- x %% 1
remainder
## [1] 0.1415 0.6180 0.7180
In many cases, using vectorized operations can save you a whole lot of code (and time!) by avoiding loops entirely.
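If you do need a loop, a common pattern is to create the output vector at its final length first and then fill it by index, rather than growing it with c() on every iteration. The following is a minimal sketch that uses the base R functions vector() and seq_along(). It also shows indexing used to modify elements. For this particular task, the vectorized x %% 1 is still the simpler choice.

```r
x <- c(3.1415, 1.618, 2.718)

# Allocate a double vector of the right length up front
remainder <- vector("double", length(x))

# seq_along(x) gives the indices 1, 2, 3
for (i in seq_along(x)) {
    remainder[i] <- x[i] %% 1   # overwrite element i in place
}

all.equal(remainder, x %% 1)   # TRUE: same result as the vectorized version
```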
Page sources:
Some content on this page has been modified from other courses, including: