Learning Objectives
- Know the basic syntax of custom functions in R.
- Understand how function arguments work in R.
- Understand the different types of return statements in R.
- Create your own custom functions.
- Understand the concept of helper functions.
- Understand the distinction between local and global variables.
Suggested Readings
- Chapter 19 of “R for Data Science”, by Garrett Grolemund and Hadley Wickham
- Chapters 2.4 - 2.7 of “Hands-On Programming with R”, by Garrett Grolemund
We already know how to use built-in functions like sum(), round(), sqrt(), etc., and we can access other functions by installing external packages. But often there just isn’t a function out there that does what you need. Fortunately, you can write your own!
Here’s the syntax that you use to create a function:
FNAME <- function(ARG1, ARG2, ETC) {
STATEMENT1
STATEMENT2
return(VALUE)
}
This creates a function named FNAME, which has arguments ARG1, ARG2, etc. Whenever the function is called, R executes the statements within the curly braces {}, and then returns the VALUE inside the return() statement.
There are a lot of different pieces to making a function. The way I like to remember how they all go together is to read the following English sentence:
“function name” is a function of () that does…
Each piece of the above sentence corresponds with a piece of code for writing a function:
“function name” | is a | function | of () | that does… |
---|---|---|---|---|
FNAME | <- | function | (ARG1, ARG2, ETC) | {} |
All the commands your function will execute go inside the {}.
For example, here’s the function mySqrt(n), which returns the square root of n:
“function name” | is a | function | of () | that does… |
---|---|---|---|---|
mySqrt | <- | function | (n) | { return(n^0.5) } |
And here’s mySqrt(n) written in the typical format:
mySqrt <- function(n) {
return(n^0.5)
}
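Once a function is defined, you call it just like a built-in function. As a quick sketch (the input values here are only illustrative), here is mySqrt() in use:

```r
mySqrt <- function(n) {
    return(n^0.5)
}

mySqrt(16) # Square root of 16 is 4
mySqrt(9)  # Square root of 9 is 3
```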
Here’s a function with one argument:
square <- function(x) {
y <- x^2
return(y)
}
square(2)
#> [1] 4
square(8)
#> [1] 64
Here’s a function with multiple arguments:
sumTwoValues <- function(x, y) {
value <- x + y
return(value)
}
sumTwoValues(2, 3)
#> [1] 5
sumTwoValues(3, 4)
#> [1] 7
Functions don’t always have to take arguments. For example:
doSomething <- function() {
cat("Carpe diem!") # The cat() function prints whatever's inside it to the console
}
doSomething()
#> Carpe diem!
Default arguments:
Sometimes, a function has a parameter that has a natural default. We can specify that default value in the function definition, then choose whether or not to include it in the function call:
f <- function(x, y=10) {
return(x + y)
}
f(5) # 15
#> [1] 15
f(5, 1) # 6
#> [1] 6
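Arguments can also be supplied by name, in which case their order in the call no longer matters. The sketch below uses a hypothetical function power() (it is not part of the examples above) to show default, positional, and named arguments side by side:

```r
# Hypothetical example: raise x to a power, squaring by default
power <- function(x, exponent = 2) {
    return(x^exponent)
}

power(3)                   # Uses the default exponent: 9
power(3, 3)                # Overrides the default by position: 27
power(exponent = 3, x = 2) # Named arguments can go in any order: 8
```

Naming arguments is especially helpful when a function has several of them and the order is hard to remember.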
The return() statement
Here’s a basic example of using return() to return a value:
isPositive <- function(x) {
return(x > 0)
}
isPositive(5) # TRUE
#> [1] TRUE
isPositive(-5) # FALSE
#> [1] FALSE
isPositive(0) # FALSE
#> [1] FALSE
The return() statement ends the function immediately:
isPositive <- function(x) {
cat("Hello!") # Runs
return(x > 0)
cat("Goodbye!") # Does not run ("dead code")
}
x <- isPositive(5) # Prints Hello, then assigns TRUE to x
#> Hello!
x
#> [1] TRUE
Notice that in the above example, the cat("Goodbye!") statement never runs.
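Because return() ends the function immediately, it can also be used on purpose to exit early once the answer is known. Here is a minimal sketch using a hypothetical function absoluteValue() (R already provides abs(); this version is only for illustration):

```r
# Hypothetical example: return early when x is negative
absoluteValue <- function(x) {
    if (x < 0) {
        return(-x) # The function stops here when x is negative
    }
    return(x)      # Only reached when x is zero or positive
}

absoluteValue(-3) # 3
absoluteValue(4)  # 4
```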
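If you don’t include a return() statement, R will return the value of the last statement evaluated by default (don’t do this). Only the last statement counts: in the second function below, x + 42 is computed and then discarded.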
f <- function(x) {
x + 42
}
f(5)
#> [1] 47
f <- function(x) {
x + 42
x + 7
}
f(5)
#> [1] 12
The cat() statement
The cat() function (short for “concatenate”) prints whatever arguments it is given to the console. The arguments can be of mixed types; cat() converts each one to text and prints them separated by spaces:
printX <- function(x) {
cat("The value of x provided is", x)
}
printX(7)
#> The value of x provided is 7
printX(42)
#> The value of x provided is 42
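Note that cat() does not add a line break on its own. The sketch below, using a hypothetical function describe(), mixes character and numeric arguments and ends the line explicitly with "\n":

```r
# Hypothetical example: cat() with mixed argument types
describe <- function(name, age) {
    # cat() separates its arguments with spaces; "\n" ends the line
    cat(name, "is", age, "years old\n")
}

describe("Ada", 36)
#> Ada is 36 years old
```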
Mixing up return() and cat() is a common early mistake. For example:
cubed <- function(x) {
cat(x^3)
}
cubed(2) # Seems to work
#> 8
2*cubed(2) # Expected 16...didn't work
#> 8
#> numeric(0)
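This fails because cat() only prints the value; the function itself returns NULL, and 2 * NULL evaluates to numeric(0). Here’s a correct version: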
cubed <- function(x) {
return(x^3) # That's better!
}
cubed(2) # Works!
#> [1] 8
2*cubed(2) # Works!
#> [1] 16
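One way to see the difference directly is to store each function’s result in a variable and inspect it. In this sketch, cubedCat() and cubedReturn() are hypothetical names for the two versions above:

```r
# Hypothetical names for the printing and returning versions
cubedCat <- function(x) {
    cat(x^3, "\n") # Prints the value, but the function returns NULL
}
cubedReturn <- function(x) {
    return(x^3)    # Returns the value without printing anything
}

printed <- cubedCat(2)     # Prints 8 to the console
returned <- cubedReturn(2) # Prints nothing

is.null(printed) # TRUE: nothing useful was stored
returned         # 8: the value is available for later calculations
```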
It is often useful to break down more complicated problems into smaller “helper functions”. These helpers can be called in other functions. Here’s an example of using the helper functions square() and squareRoot() to compute the hypotenuse of a right triangle:
square <- function(x) {
return(x^2)
}
squareRoot <- function(x) {
return(x^0.5)
}
hypotenuse <- function(a, b) {
return(squareRoot(square(a) + square(b)))
}
a = 3
b = 4
hypotenuse(a, b)
#> [1] 5
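Helpers can be stacked: a function that uses helpers can itself become a helper for something else. As a sketch, the hypothetical function distance() below (not part of the original example) reuses hypotenuse() to find the distance between two points:

```r
square <- function(x) {
    return(x^2)
}
squareRoot <- function(x) {
    return(x^0.5)
}
hypotenuse <- function(a, b) {
    return(squareRoot(square(a) + square(b)))
}

# Hypothetical example: the distance between (x1, y1) and (x2, y2)
# is the hypotenuse of a right triangle with legs (x2 - x1) and (y2 - y1)
distance <- function(x1, y1, x2, y2) {
    return(hypotenuse(x2 - x1, y2 - y1))
}

distance(0, 0, 3, 4) # 5
distance(1, 1, 4, 5) # 5
```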
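Variables created inside a function (including its arguments) are called “local” variables and will NOT be created in the working environment. They can only be used locally within the function. For example, the variable smaller below exists only while minSquared() is running: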
minSquared <- function(x, y) {
smaller = min(x, y)
return(smaller^2)
}
minSquared(3, 4)
#> [1] 9
minSquared(4, 3)
#> [1] 9
If you try to use a local variable in the global environment, you’ll get an error:
square <- function(x) {
y <- x^2
return(y)
}
y
#> Error in eval(expr, envir, enclos): object 'y' not found
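Local variables are also kept separate from global variables that happen to share the same name. In the sketch below, a global y is defined first (the value 100 is arbitrary), and calling square() leaves it untouched:

```r
y <- 100 # A global variable that happens to be named y

square <- function(x) {
    y <- x^2 # This y is local; the global y is not modified
    return(y)
}

square(3)
#> [1] 9
y
#> [1] 100
```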
“Global” variables are those in the global environment. These will show up in the “Environment” pane in RStudio. You can use these inside functions, but this is BAD practice, because the function’s result then depends on something defined outside of it. Here’s an example (don’t do this!):
printN <- function() {
cat(n) # n is not local -- so it is global (bad idea!!!)
}
printN() # Error: object 'n' not found, because n isn't defined yet
n = 5 # Define n in the global environment
printN()
#> 5
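A safer pattern is to pass the value in as an argument, so the function depends only on what it is given. Here is a minimal sketch of printN() rewritten that way:

```r
printN <- function(n) {
    cat(n) # n is now an argument, so it is local to the function
}

printN(5)
#> 5
```

This version works the same way no matter what variables exist in the global environment.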
One particularly useful function is almostEqual():
almostEqual <- function(d1, d2) {
epsilon = 0.00001
return(abs(d1-d2) <= epsilon)
}
This is useful when comparing numbers that are stored as floating-point values, which often carry tiny rounding errors. For example, let’s do some simple addition:
x <- 0.1 + 0.2
x
#> [1] 0.3
If we compared x to 0.3, we would expect the result to be TRUE, right?
x == 0.3
#> [1] FALSE
What went wrong here? Well, what looks like a value of 0.3 is actually a floating-point number that is very slightly larger than 0.3:
print(x, digits = 20)
#> [1] 0.30000000000000004441
By default, R prints only 7 significant digits, so these extra digits are hidden. They are the result of small rounding errors that occur because computers can only store most decimal numbers, such as 0.1 and 0.2, approximately in binary.
This is where almostEqual() comes in handy:
almostEqual(x, 0.3)
#> [1] TRUE
It treats two numbers as equal whenever they differ by no more than a small tolerance (epsilon, here 0.00001), so tiny rounding errors are ignored. This will come in handy in your homework problems where you might get unexpected results.
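The same issue shows up in other calculations. The sketch below repeats the definition of almostEqual() so it runs on its own, then tries it on a square root. The last line uses base R’s all.equal(), which offers a similar tolerance-based comparison; it is shown here only as a point of comparison and is not part of the approach above.

```r
almostEqual <- function(d1, d2) {
    epsilon = 0.00001
    return(abs(d1 - d2) <= epsilon)
}

y <- sqrt(2)^2 # Mathematically 2, but stored with a tiny rounding error

y == 2                  # FALSE
almostEqual(y, 2)       # TRUE
almostEqual(1, 1.001)   # FALSE: the difference is larger than epsilon
isTRUE(all.equal(y, 2)) # TRUE: base R's tolerance-based comparison
```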
Page sources:
Some content on this page has been modified from other courses, including: