Learning Objectives
- Know some common functions in R.
- Know how R handles function arguments and named arguments.
- Know how to install, load, and use functions from external R packages.
- Practice programming with functions using the TurtleGraphics package.
Suggested Readings
- Chapter 3.5 of Danielle Navarro’s book “Learning Statistics With R”
You can do a lot with the basic operators like +
,
-
, and *
, but to do more advanced calculations
you’re going to need to start using functions.1
R has a lot of very useful built-in functions. For example, if I
wanted to take the square root of 225, I could use R’s built-in square
root function sqrt()
:
sqrt(225)
#> [1] 15
Here the letters sqrt
are short for “square root,” and
the value inside the ()
is the “argument” to the function.
In the example above, the value 225
is the “argument”.
Keep in mind that not all functions have (or require) arguments:
date() # Returns the current date and time
#> [1] "Wed Apr 12 17:33:14 2023"
(the date above is the date this page was last built)
Some functions have more than one argument. For example, the
round()
function can be used to round some value to the
nearest integer or to a specified decimal place:
round(3.14165) # Rounds to the nearest integer
#> [1] 3
round(3.14165, 2) # Rounds to the 2nd decimal place
#> [1] 3.14
Not all arguments are mandatory. With the round()
function, the decimal place is an optional input - if nothing
is provided, the function will round to the nearest integer by
default.
In the case of round()
, it’s not too hard to remember
which argument comes first and which one comes second. But it starts to
get very difficult once you start using complicated functions that have
lots of arguments. Fortunately, most R functions use argument
names to make your life a little easier. For the
round()
function, for example, the number that needs to be
rounded is specified using the x
argument, and the number
of decimal points that you want it rounded to is specified using the
digits
argument, like this:
round(x = 3.1415, digits = 2)
#> [1] 3.14
Notice that the first time I called the round()
function
I didn’t actually specify the digits
argument at all, and
yet R somehow knew that this meant it should round to the nearest whole
number. How did that happen? The answer is that the digits
argument has a default value of 0
, meaning
that if you decide not to specify a value for digits
then R
will act as if you had typed digits = 0
.
This is quite handy: most of the time when you want to round a number
you want to round it to the nearest whole number, and it would
be pretty annoying to have to specify the digits
argument
every single time. On the other hand, sometimes you actually do want to
round to something other than the nearest whole number, and it would be
even more annoying if R didn’t allow this! Thus, by having
digits = 0
as the default value, we get the best of both
worlds.
Not sure what a function does, how many arguments it has, or what the
argument names are? Ask R for help by typing ?
and then the
function name, and R will return some documentation about it. For
example, type ?round()
into the console and R will return
information about how to use the round()
function.
In the same way that R allows us to put multiple operations together
into a longer command (like 1 + 2 * 4
for instance), it
also lets us put functions together and even combine functions with
operators if we so desire. For example, the following is a perfectly
legitimate command:
round(sqrt(7), digits = 2)
#> [1] 2.65
When R executes this command, starts out by calculating the value of
sqrt(7)
, which produces an intermediate value of
2.645751
. The command then simplifies to
round(2.645751, digits = 2)
, which rounds the value to
2.65
.
R has LOTS of functions. Many of the basic math functions are somewhat self-explanatory, but it can be hard to remember the specific function name. Below is a reference table of some frequently used math functions.
Function | Description | Example input | Example output |
---|---|---|---|
round(x, digits=0) |
Round x to the digits decimal place |
round(3.1415, digits=2) |
3.14 |
floor(x) |
Round x down the nearest integer |
floor(3.1415) |
3 |
ceiling(x) |
Round x up the nearest integer |
ceiling(3.1415) |
4 |
abs() |
Absolute value | abs(-42) |
42 |
min() |
Minimum value | min(1, 2, 3) |
1 |
max() |
Maximum value | max(1, 2, 3) |
3 |
sqrt() |
Square root | sqrt(64) |
8 |
exp() |
Exponential | exp(0) |
1 |
log() |
Natural log | log(1) |
0 |
factorial() |
Factorial | factorial(5) |
120 |
You will often need to check the data type of objects and convert them to other types. To handle this, use these patterns:
x
: is.______()
x
: as.______()
In each of these patterns, replace “______
” with:
character
logical
numeric
/ double
/
integer
You can convert an object from one type to another using
as.______()
, replacing “______
” with a data
type:
Convert numeric types:
as.numeric("3.1415")
#> [1] 3.1415
as.double("3.1415")
#> [1] 3.1415
as.integer("3.1415")
#> [1] 3
Convert non-numeric types:
as.character(3.1415)
#> [1] "3.1415"
as.logical(3.1415)
#> [1] TRUE
A few notes to keep in mind:
When converting from a numeric to a
logical, as.logical()
will always return
TRUE
for any numeric value other than 0
, for
which it returns FALSE
.
as.logical(7)
#> [1] TRUE
as.logical(0)
#> [1] FALSE
The reverse is also true
as.numeric(TRUE)
#> [1] 1
as.numeric(FALSE)
#> [1] 0
Not everything can be converted. For example, if you try to
coerce a character that contains letters into a number, R will return
NA
, because it doesn’t know what number to choose:
as.numeric('foo')
#> [1] NA
The as.integer()
function behaves the same as
floor()
:
as.integer(3.14)
#> [1] 3
as.integer(3.99)
#> [1] 3
Similar to the as.______()
format, you can check if an
object is a specific data type using is.______()
, replacing
“______
” with a data type.
Checking numeric types:
is.numeric(3.1415)
#> [1] TRUE
is.double(3.1415)
#> [1] TRUE
is.integer(3.1415)
#> [1] FALSE
Checking non-numeric types:
is.character(3.1415)
#> [1] FALSE
is.logical(3.1415)
#> [1] FALSE
One thing you’ll notice is that is.integer()
often gives
you a surprising result. For example, why did is.integer(7)
return FALSE
?. Well, this is because numbers are
doubles by default in R, so even though 7
looks like an integer, R thinks it’s a double.
The safer way to check if a number is an integer in value is to compare it against itself converted into an integer:
7 == as.integer(7)
#> [1] TRUE
When you start R, it only loads the “Base R” functions
(e.g. sqrt()
, round()
, etc.), but there are
thousands and thousands of additional functions stored in external
packages.
To install a package, use the install.packages()
function. Make sure you put the package name in quotes:
install.packages("packagename") # This works
install.packages(packagename) # This doesn't work
Just like most software, you only need to install a package once.
After installing a package, you can’t immediately use the functions
that the package contains. This is because when you start up R only the
“base” functions are loaded. If you want R to also load the functions
inside a package, you have to load that package, which you do
with the library()
function. In contrast to the
install.packages()
function, you don’t need quotes around
the package name to load it:
library("packagename") # This works
library(packagename) # This also works
Here’s a helpful image to keep the two ideas of installing vs loading separate:
As an example, try installing the Wikifacts package, by Keith McNulty:
install.packages("wikifacts") # Remember - you only have to do this once!
Now that you have the package installed on your computer, try loading
it using library(wikifacts)
, then trying using some of it’s
functions:
library(wikifacts) # Load the library
wiki_randomfact()
#> [1] "Did you know that on January 27 in 945 – The brothers Stephen and Constantine Lekapenos, having deposed their father as Byzantine emperor a few weeks earlier, were themselves overthrown by Constantine VII, their co-emperor. (Courtesy of Wikipedia)"
wiki_didyouknow()
#> [1] "Did you know that when Pehr Kalm explored North America in the mid-1700s, he reported being served apple dumplings (pictured) at every meal? (Courtesy of Wikipedia)"
In case you’re wondering, the only thing this package does is generate messages containing random facts from Wikipedia.
Sometimes you may only want to use a single function from a library without having to load the whole thing. To do so, use this recipe:
packagename::functionname()
Here I use the name of the package followed by
::
to tell R that I’m looking for a function that is in
that package. For example, if I didn’t want to load the whole
wikifacts library but still wanted to use the
wiki_randomfact()
function, I could do this:
wikifacts::wiki_randomfact()
#> [1] "Did you know that on December 12 in 1936 – Generalissimo Chiang Kai-shek (pictured) of the Republic of China was kidnapped by Marshal Zhang Xueliang, a former warlord of Manchuria. (Courtesy of Wikipedia)"
Where this is particularly handy is when two packages have a function
with the same name. If you load both library, R might not know which
function to use. In those cases, it’s best to also provide the
package name. For example, let’s say there was a
package called apples and another called
bananas, and each had a function named
fruitName()
. If I wanted to use each of them in my code, I
would need to specify the package names like this:
apples::fruitName()
bananas::fruitName()
Turtle graphics is a classic teaching tool in computer science, originally invented in the 1960s and re-implemented over and over again in different programming languages.
In R, there is a similar package called TurtleGraphics. To get started, install the package (remember, you only need to do this once on your computer):
install.packages('TurtleGraphics')
Once installed, load the package (remember, you have to load this every time you restart R to use the package!):
library(TurtleGraphics)
Here’s the idea. You have a turtle, and she lives in a nice warm
terrarium. The terrarium is 100 x 100 units in size, where the
lower-left corner is at the (x, y)
position of
(0, 0)
. When you call turtle_init()
, the
turtle is initially positioned in the center of the terrarium at
(50, 50)
:
turtle_init()
You can move the turtle using a variety of movement functions (see
?turtle_move()
), and she will leave a trail where ever she
goes. For example, you can move her 10 units forward from her starting
position:
turtle_init()
turtle_forward(distance = 10)
You can also make the turtle jump to a new position (without drawing
a line) by using the turtle_setpos(x, y)
, where
(x, y)
is a coordinate within the 100 x 100 terrarium:
turtle_init()
turtle_setpos(x=10, y=10)
Simple enough, right? But what if I want my turtle to draw a more
complicated shape? Let’s say I want her to draw a hexagon. There are six
sides to the hexagon, so the most natural way to write code for this is
to write a for
loop that loops over the sides (don’t worry
if this doesn’t make sense yet - we’ll get to loops in week 5!). At each iteration within the
loop, I’ll have the turtle walk forwards, and then turn 60 degrees to
the left. Here’s what happens:
turtle_init()
for (side in 1:6) {
turtle_forward(distance = 10)
turtle_left(angle = 60)
}
Cool! As you draw more complex shapes, you can speed up the process
by wrapping your turtle commands inside the turtle_do({})
function. This will skip the animations of the turtle moving and will
jump straight to the final position. For example, here’s the hexagon
again without animations:
turtle_init()
turtle_do({
for (side in 1:6) {
turtle_forward(distance = 10)
turtle_left(angle = 60)
}
})
Page sources:
Some content on this page has been modified from other courses, including:
Technically speaking, operators are functions in R: the
addition operator +
is a convenient way of calling the
addition function '+'()
. Thus 10+20
is
equivalent to the function call '+'(20, 30)
. Not
surprisingly, no-one ever uses this version.↩︎