Learning Objectives

  • Know the basic syntax of custom functions in R.
  • Understand how function arguments work in R.
  • Understand the different types of return statements in R.
  • Create your own custom functions.
  • Understand the concept of helper functions.
  • Understand the distinction between local and global variables.

Suggested Readings

  • Chapter 19 of “R for Data Science”, by Hadley Wickham and Garrett Grolemund
  • Chapters 2.4 - 2.7 of “Hands-On Programming with R”, by Garrett Grolemund

We already know how to use built-in functions like sum(), round(), sqrt(), etc. And we can access other functions by installing external packages. But many times there just isn’t a function out there to do what you need. Fortunately, you can write your own!

Basic syntax

Here’s the syntax that you use to create a function:

FNAME <- function(ARG1, ARG2, ETC) {
  STATEMENT1
  STATEMENT2
  return(VALUE)
}

What this does is create a function with the name FNAME, which has arguments ARG1, ARG2, etc. Whenever the function is called, R executes the statements within the curly braces {}, and then returns the VALUE inside the return() statement.

There are a lot of different pieces to making a function. The way I like to remember how they all go together is to read the following English sentence:

“function name” is a function of () that does…

Each piece of the above sentence corresponds with a piece of code for writing a function:

“function name”   is a function of ()             that does…
FNAME             <- function (ARG1, ARG2, ETC)   {}

All the commands your function will execute go in the {}.

For example, here’s the function mySqrt(n), which returns the square root of n:

“function name”   is a function of ()   that does…
mySqrt            <- function (n)       { return(n^0.5) }

And here’s mySqrt(n) written in the typical format:

mySqrt <- function(n) {
    return(n^0.5)
}

Arguments

Here’s a function with one argument:

square <- function(x) {
  y <- x^2
  return(y)
}
square(2)
#> [1] 4
square(8)
#> [1] 64

Here’s a function with multiple arguments:

sumTwoValues <- function(x, y) {
  value <- x + y
  return(value)
}
sumTwoValues(2, 3)
#> [1] 5
sumTwoValues(3, 4)
#> [1] 7

Functions don’t always have to take arguments. For example:

doSomething <- function() {
    cat("Carpe diem!") # The cat() function prints whatever's inside it to the console
}
doSomething()
#> Carpe diem!

Default arguments:

Sometimes, a function has a parameter that has a natural default. We can specify that default value in the function definition, then choose whether or not to include it in the function call:

f <- function(x, y=10) {
    return(x + y)
}
f(5)     # 15
#> [1] 15
f(5, 1)  # 6
#> [1] 6
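
Arguments can also be matched by name rather than by position. The sketch below uses a hypothetical function power() (not part of the original notes) to illustrate this; when arguments are named, their order in the call no longer matters.

```r
# Hypothetical example: raise `base` to `exponent`, which defaults to 2
power <- function(base, exponent = 2) {
    return(base^exponent)
}
power(3)                       # Positional; uses the default exponent of 2
#> [1] 9
power(3, 3)                    # Positional; overrides the default
#> [1] 27
power(exponent = 3, base = 2)  # Named; order doesn't matter
#> [1] 8
```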

The return() statement

Here’s a basic example of using return() to return a value:

isPositive <- function(x) {
    return (x > 0)
}
isPositive(5)  # TRUE
#> [1] TRUE
isPositive(-5) # FALSE
#> [1] FALSE
isPositive(0)  # FALSE
#> [1] FALSE

The return() statement ends the function immediately:

isPositive <- function(x) {
    cat("Hello!")   # Runs
    return(x > 0)
    cat("Goodbye!") # Does not run ("dead code")
}
x <- isPositive(5)  # Prints Hello, then assigns TRUE to x
#> Hello!
x
#> [1] TRUE

Notice that in the above example, the cat("Goodbye!") statement is ignored.
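
Because return() ends the function immediately, it can also be used to exit early when some condition is met. Here is a minimal sketch, assuming a hypothetical function safeDivide() and an if statement, which these notes have not introduced yet:

```r
# Hypothetical example: divide x by y, but give back NA when y is zero
safeDivide <- function(x, y) {
    if (y == 0) {
        return(NA)  # Exit early; the line below never runs in this case
    }
    return(x / y)
}
safeDivide(6, 3)
#> [1] 2
safeDivide(6, 0)
#> [1] NA
```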

If you don’t include a return() statement, R will return the value of the last evaluated expression by default (don’t do this):

f <- function(x) {
    x + 42
}
f(5)
#> [1] 47
f <- function(x) {
    x + 42
    x + 7
}
f(5)
#> [1] 12
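
One reason to prefer an explicit return() is that the last statement may not behave the way you expect. In this sketch (the function names are made up for illustration), the last statement is an assignment, so the value is returned invisibly and nothing prints at the console:

```r
addImplicit <- function(x) {
    y <- x + 42     # Last statement is an assignment
}
addImplicit(5)      # Prints nothing: the value is returned invisibly
z <- addImplicit(5)
z                   # The value was still returned
#> [1] 47

addExplicit <- function(x) {
    y <- x + 42
    return(y)       # Explicit return: the result prints as usual
}
addExplicit(5)
#> [1] 47
```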

The cat() statement

The cat() function (short for “concatenate”) prints whatever arguments it is given to the console. The arguments can be of mixed types; cat() converts them all to text and joins them into a single string:

printX <- function(x) {
  cat("The value of x provided is", x)
}
printX(7)
#> The value of x provided is 7
printX(42)
#> The value of x provided is 42
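
By default, cat() separates its arguments with a single space and does not end the line. As a small sketch (printPoint() is a hypothetical function), the sep argument controls the separator and "\n" adds a line break:

```r
# Hypothetical example: print a point with no extra spaces around the values
printPoint <- function(x, y) {
    cat("The point is (", x, ", ", y, ")\n", sep = "")
}
printPoint(1, 2)
#> The point is (1, 2)
printPoint(2.5, TRUE)   # Mixed types are converted to text
#> The point is (2.5, TRUE)
```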

Mixing up return() and cat() is a common early mistake. For example:

cubed <- function(x) {
    cat(x^3)
}
cubed(2)   # Seems to work
#> 8
2*cubed(2) # Expected 16...didn't work
#> 8
#> numeric(0)

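The problem is that cat() only prints its arguments; the value it gives back is NULL, so 2*cubed(2) computes 2*NULL, which is numeric(0). Here’s a correct version that returns the value instead: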

cubed <- function(x) {
    return(x^3) # That's better!
}
cubed(2)   # Works!
#> [1] 8
2*cubed(2) # Works!
#> [1] 16
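
One way to see the difference is to store what each function gives back. In this sketch the two versions are renamed cubedCat() and cubedReturn() (names chosen here for illustration) so they can be compared side by side:

```r
cubedCat <- function(x) {
    cat(x^3, "\n")   # Prints the value, but the function gives back NULL
}
cubedReturn <- function(x) {
    return(x^3)      # Gives back the value without printing anything
}
a <- cubedCat(2)     # Prints 8 as a side effect
#> 8
is.null(a)           # Nothing useful was stored
#> [1] TRUE
b <- cubedReturn(2)  # Prints nothing
b
#> [1] 8
```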

Helper functions

It is often useful to break down more complicated problems into smaller “helper functions”. These helpers can be called in other functions. Here’s an example of using the helper functions square() and squareRoot() to compute the hypotenuse of a right triangle:

square <- function(x) {
   return(x^2)
}

squareRoot <- function(x) {
   return(x^0.5)
}

hypotenuse <- function(a, b) {
   return(squareRoot(square(a) + square(b)))
}

a <- 3
b <- 4
hypotenuse(a, b)
#> [1] 5
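
The same pattern applies whenever one calculation is reused inside another. As a further sketch, with hypothetical functions circleArea() and ringArea() that are not part of the original notes, a helper computes the area of a circle and a second function calls it twice:

```r
# Helper: area of a circle with radius r
circleArea <- function(r) {
    return(pi * r^2)
}

# Area of a ring: the outer circle minus the inner circle
ringArea <- function(outer, inner) {
    return(circleArea(outer) - circleArea(inner))
}

ringArea(2, 1)  # 4*pi - 1*pi = 3*pi
#> [1] 9.424778
```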

Local vs. global variables

All variables created inside a function are called “local” variables and will NOT be created in the global environment. They can only be used locally within the function. For example:

minSquared <- function(x, y) {
    smaller <- min(x, y)
    return(smaller^2)
}
minSquared(3, 4)
#> [1] 9
minSquared(4, 3)
#> [1] 9
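
A local variable is also kept separate from any global variable that happens to share its name. In this sketch, a global variable named smaller (added here purely for illustration) is left untouched when the function runs:

```r
minSquared <- function(x, y) {
    smaller <- min(x, y)   # Local variable; exists only while the function runs
    return(smaller^2)
}
smaller <- 100             # Global variable with the same name
minSquared(3, 4)
#> [1] 9
smaller                    # Unchanged by the function call
#> [1] 100
```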

If you try to access a local variable from the global environment, you’ll get an error:

square <- function(x) {
  y <- x^2
  return(y)
}
y
#> Error in eval(expr, envir, enclos): object 'y' not found

“Global” variables are those in the global environment. These will show up in the “Environment” pane in RStudio. You can access these inside functions, but this is BAD practice. Here’s an example (Don’t do this!):

printN <- function() {
    cat(n)  # n is not local -- so it is global (bad idea!!!)
}
printN() # Error: object 'n' not found (n isn't defined yet)
n <- 5 # Define n in the global environment
printN()
#> 5
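
The usual alternative is to pass the value in as an argument. The sketch below, using made-up names multiplier, scaleGlobal(), and scaleWithArg(), shows one reason globals cause trouble: the same call can give different results depending on what happens to be in the global environment.

```r
multiplier <- 2

# Relies on a global variable (bad idea)
scaleGlobal <- function(x) {
    return(x * multiplier)
}

# Takes everything it needs as arguments (better)
scaleWithArg <- function(x, multiplier) {
    return(x * multiplier)
}

scaleGlobal(10)
#> [1] 20
multiplier <- 3      # Someone changes the global variable...
scaleGlobal(10)      # ...and the same call now gives a different answer
#> [1] 30
scaleWithArg(10, 2)  # Depends only on its arguments
#> [1] 20
```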

Tips

One particularly useful function is almostEqual():

almostEqual <- function(d1, d2) {
    epsilon <- 0.00001
    return(abs(d1-d2) <= epsilon)
}

This is useful when comparing floating-point numbers, which often carry tiny rounding errors in their last digits. For example, let’s do some simple addition:

x <- 0.1 + 0.2
x
#> [1] 0.3

If we compared x to 0.3, we would expect the result to be TRUE, right?

x == 0.3
#> [1] FALSE

What went wrong here? Well, what looks like a value of 0.3 is actually a float that is very slightly larger than 0.3:

print(x, digits = 20)
#> [1] 0.30000000000000004441

By default, R prints only 7 significant digits, so this tiny error is hidden. It arises because values like 0.1 and 0.2 cannot be represented exactly in binary floating point, so small rounding errors creep into calculations.

This is where almostEqual() comes in handy:

almostEqual(x, 0.3)
#> [1] TRUE

It treats two numbers as equal when the absolute difference between them is at most epsilon (0.00001 here), so any difference smaller than that is ignored. This will come in handy in your homework problems where you might get unexpected results.
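
If a fixed tolerance is too strict or too loose for a particular problem, one option is to make epsilon a default argument. This is only a sketch of a possible variation, not the version used in the homework; it also shows base R's all.equal(), which performs a similar tolerance-based comparison:

```r
# Variation on almostEqual() where the tolerance can be overridden
almostEqual <- function(d1, d2, epsilon = 0.00001) {
    return(abs(d1 - d2) <= epsilon)
}

x <- 0.1 + 0.2
almostEqual(x, 0.3)                        # Uses the default tolerance
#> [1] TRUE
almostEqual(1.001, 1.002)                  # Differ by about 0.001: too far apart
#> [1] FALSE
almostEqual(1.001, 1.002, epsilon = 0.01)  # Looser tolerance
#> [1] TRUE

# Base R alternative: wrap all.equal() in isTRUE() to get a single TRUE/FALSE
isTRUE(all.equal(x, 0.3))
#> [1] TRUE
```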


Page sources:

Some content on this page has been modified from other courses, including:


EMSE 4571: Intro to Programming for Analytics (Spring 2022)
Thursdays | 12:45 - 3:15 PM EST | Tompkins 208 | Dr. John Paul Helveston | jph@gwu.edu
LICENSE: CC-BY-SA