class: middle, inverse .leftcol30[ <center> <img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/blob/master/images/logo.png?raw=true" width=250> </center> ] .rightcol70[ # Week 1: .fancy[Getting Started] ###
EMSE 4571 / 6571: Intro to Programming for Analytics ###
John Paul Helveston ###
January 16, 2025 ] --- class: center, inverse, middle # Two rules: -- ## 1) Be Present -- ## 2) Celebrate Mistakes --- class: inverse, middle # Week 1: .fancy[Getting Started] ### 1. Course orientation ### BREAK ### 2. Getting started with R & RStudio ### 3. Operators & data types ### 4. Preview of HW 1 --- class: inverse, middle # Week 1: .fancy[Getting Started] ### 1. .orange[Course orientation] ### BREAK ### 2. Getting started with R & RStudio ### 3. Operators & data types ### 4. Preview of HW 1 --- # Meet your instructor! .leftcol30[.circle[ <img src="https://www.jhelvy.com/images/lab/john_helveston_circle.png" width="300"> ]] .rightcol70[ ### John Helveston, Ph.D. .font80[ Assistant Professor, Engineering Management & Systems Engineering - 2016-2018 Postdoc at [Institute for Sustainable Energy](https://www.bu.edu/ise/), Boston University - 2016 PhD in Engineering & Public Policy at Carnegie Mellon University - 2015 MS in Engineering & Public Policy at Carnegie Mellon University - 2010 BS in Engineering Science & Mechanics at Virginia Tech - Website: [www.jhelvy.com](http://www.jhelvy.com/) ]] --- # Meet your tutors! .leftcol30[.circle[ <img src="images/pingfan_hu.jpg" width="300"> ]] .rightcol70[ ### **Pingfan Hu** - Graduate Teaching Assistant (GTA) - 2nd Year PhD student in EMSE ] --- # Meet your tutors! .leftcol30[.circle[ <img src="images/lola_nurullaeva.jpg" width="300"> ]] .rightcol70[ ### **Lola Nurullaeva** - Learning Assistant (LA) - EMSE Senior & P4A / EDA alumni ] --- # Course orientation -- ##
Everything you need will be on the course website: ### https://p4a.seas.gwu.edu/2025-Spring/ --- ##
Course is broken into **two chunks**: ### 1. Programming (before Spring Break) ### 2. Analytics (after Spring Break) -- <br> ### In the fall, you'll take EMSE 4572 / 6572: Exploratory Data Analysis ### [Fall 2024 Project Showcase](https://eda.seas.gwu.edu/showcase.html#fall-2024) --- # Learning Objectives <br> ## After this class, you will know how to... ### ...write
code to solve medium-sized tasks. ### ...pro-actively test and debug code. ### ...import, export, manipulate, and visualize data. --- # Attendance / Participation (7%) <br> ### **Attendance will be taken** and will be part of your participation grade --- # Homeworks (45% of grade) <br> -- ##
Every week (13 total, lowest dropped) -- ##
Due 11:59pm Tues. before class --- # Late submissions <br> ### - **3** late days - use them anytime, no questions asked ### - After that, 50% off for up to 24 hours after deadline, 0% afterwards ### - Contact me for special cases --- # Quizzes (18% of grade) <br> -- ##
In class (almost) every other week (10 total, drop lowest 2) -- ##
~10 minutes (1-3 questions) -- > **Why quiz at all?** There's a phenomenon called the "retrieval effect" - basically, you have to _practice_ remembering things, otherwise your brain won't remember them (details in the book ["Make It Stick: The Science of Successful Learning"](https://www.hup.harvard.edu/catalog.php?isbn=9780674729018)). --- # Exams (30% of grade) <br> ##
Midterm (weeks 1 - 7) on March 06 ##
Final (weeks 1 - 14) on May 08 --- # .center[Grades] <br> Component | Weight | Notes ---------------------------|--------|---------------- Participation / Attendance | 7% | Homeworks & Readings (13x) | 45% | Lowest 1 dropped Quizzes (7x) | 18% | Lowest 2 dropped Midterm Exam | 10% | Final Exam | 20% | --- # .center[Alternative Minimum Grade (AMG)] - Designed for those who struggle early but work hard to succeed in 2nd half. - Highest possible grade is "C" <br> Course Component | Weight -------------------|---- Best 10 Homeworks | 40% Best 4 Quizzes | 10% Midterm Exam | 10% Final Exam | 40% --- # Typing Bonus Challenge - Earn a **1% bonus to your final grade** by beating Professor Helveston in a speed typing challenge - Challenge held on https://monkeytype.com/ <br> (Yes, I'm serious...see rules [here](https://p4a.seas.gwu.edu/2025-Spring/syllabus.html#typing-bonus-challenge)) --- # Course policies -- .leftcol35[ - ## BE ON TIME - ## BE NICE - ## BE HONEST - ## DON'T CHEAT ] -- .rightcol65[ ## **Don't copy-paste others' code!** ] --- class: center # [AI Policy](https://p4a.seas.gwu.edu/2025-Spring/syllabus.html#use-of-chatgpt-and-other-ai-tools) ### ([Demo](https://chat.openai.com/)) <br> .leftcol[ ## Assignments 1-7:<br>**Not permitted** ] .rightcol[ ## Assignments 8-13:<br>**Permitted, with caveats** ] --- # How to succeed in this class -- ##
Participate during class! -- ##
Start assignments early and **read carefully**! -- ##
Get sleep and take breaks often! -- ##
Ask for help! --- # Getting Help -- ##
Use [Slack](https://emse-p4a-s25.slack.com/) to ask questions. -- ##
Meet with your tutors -- ##
[Schedule a call](https://jhelvy.appointlet.com/b/professor-helveston) w/Prof. Helveston -- ##
[GW Coders](http://gwcoders.github.io/) --- #
[Course Software](https://p4a.seas.gwu.edu/2025-Spring/software.html) <br> -- ##
[Slack](https://emse-p4a-s25.slack.com/): Install app & **turn notifications on**! -- ##
[R](https://cloud.r-project.org/) & [RStudio](https://posit.co/download/rstudio-desktop/): Install both. -- ##
[RStudio Cloud](https://posit.cloud/plans/free): A (free) web-based version of RStudio. --- class: inverse, center # .fancy[Intermission] ##
Install [course software](https://p4a.seas.gwu.edu/2025-Spring/software.html) if you haven't
−
+
05
:
00
--- class: inverse, middle # Week 1: .fancy[Getting Started] ### 1. Course orientation ### BREAK ### 2. .orange[Getting started with R & RStudio] ### 3. Operators & data types ### 4. Preview of HW 1 --- class: center ## What is
? ([Read a brief history here](https://bookdown.org/rdpeng/rprogdatascience/history-and-overview-of-r.html)) Chambers creates "S" (1976, Bell Labs)<br> Ross & Robert create "R" (1991, U. of Auckland) .cols3[ ## .center[[John Chambers](https://en.wikipedia.org/wiki/John_Chambers_(statistician)] <center> <img src="https://datascience.stanford.edu/sites/g/files/sbiybj25376/files/styles/medium_square/public/media/image/john-chambers_0.png?h=78b7a964&itok=8v_bN9yZ" width=280> </center> ] .cols3[ ## .center[[Ross Ihaka](https://en.wikipedia.org/wiki/Ross_Ihaka)] <center> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Ross_Ihaka_%285189180796%29.jpg/1280px-Ross_Ihaka_%285189180796%29.jpg" width=100%> </center> ] .cols3[ ## .center[[Robert Gentleman](https://en.wikipedia.org/wiki/Robert_Gentleman_(statistician)] <center> <img src="https://upload.wikimedia.org/wikipedia/commons/8/83/Robert_Gentleman_on_R_Consortium.jpg" width=250> </center> ] --- # Wait, why aren't we using Python? .leftcol[ - [Python](https://en.wikipedia.org/wiki/Python_(programming_language) is a general purpose language developed by [**Guido van Rossum**](https://en.wikipedia.org/wiki/Guido_van_Rossum), a computer scientist. - Unlike R, Python was not originally developed for data analysis. - Both languages are extremely useful, and **you should probably learn python too**. ### The vast majority of concepts we'll learn apply to python ] .rightcol[ <center> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Guido-portrait-2014-drc.jpg/1280px-Guido-portrait-2014-drc.jpg" width=100%> </center> ] --- class: center, middle ## What is RStudio? .leftcol[ ##
<center> <img src="images/engine.jpg" width=500> </center> ] .rightcol[ ##
Studio <center> <img src="images/dashboard.jpg" width=600> </center> ] --- class: center # RStudio Orientation .leftcol[ ## Open this <center> <img src="images/rstudio_ball.png" width=300> </center> ] .rightcol[ ## Not this <center> <img src="images/Rlogo.png" width=400> </center> ] --- # RStudio Orientation .leftcol70[ <center> <img src="images/rstudio_panes.png" width=650> </center> ] .rightcol30[ - Know the boxes - Customize the layout - Customize the look - [Extra themes](https://www.garrickadenbuie.com/project/rsthemes/) ] --- # Your first conveRsation ### Write stuff in the console, then press "enter" -- Example: **addition** ``` r 3 + 4 ``` ``` #> [1] 7 ``` -- Example: **error** ``` r 3 + "4" ``` ``` #> Error in 3 + "4": non-numeric argument to binary operator ``` --- # Storing values ### Use the "`<-`" symbol to assign _values_ to _objects_ -- ``` r x <- 40 x ``` ``` #> [1] 40 ``` -- ``` r x + 2 ``` ``` #> [1] 42 ``` --- # Storing values ### If you overwrite an object, R "forgets" the old value -- Example: ``` r x <- 42 x ``` ``` #> [1] 42 ``` -- ``` r x <- 50 x ``` ``` #> [1] 50 ``` --- # Storing values <br> ### You can also use the `=` symbol to assign values ``` r x = 50 x ``` ``` #> [1] 50 ``` ### ...but you should use `<-` --- # Storing values -- .leftcol[ ### **Pro tip 1**: ### Shortcut for `<-` symbol .left[ |OS | Shortcut |:--|:-------- |mac | `option` + `-` |windows | `alt` + `-` ] (see [here](https://support.posit.co/hc/en-us/articles/200711853-Keyboard-Shortcuts-in-the-RStudio-IDE) for more shortcuts) ] -- .rightcol[ ### **Pro tip 2**: ### Always surround `<-` with spaces Example: ``` r x<-2 ``` Does this mean `x <- 2` or `x < -2`? ] --- # Storing values ### You can store more than just numbers -- ``` r x <- "If you want to view paradise" y <- "simply look around and view it" ``` -- ``` r x ``` ``` #> [1] "If you want to view paradise" ``` ``` r y ``` ``` #> [1] "simply look around and view it" ``` --- .leftcol[ ## R ignores **extra space** ``` r x <- 2 y <- 3 z <- 4 ``` Check: ``` r x ``` ``` #> [1] 2 ``` ``` r y ``` ``` #> [1] 3 ``` ``` r z ``` ``` #> [1] 4 ``` ] -- .rightcol[ ## R cares about **casing** ``` r number <- 2 Number <- 3 numbeR <- 4 ``` Check: ``` r number ``` ``` #> [1] 2 ``` ``` r Number ``` ``` #> [1] 3 ``` ``` r numbeR ``` ``` #> [1] 4 ``` ] --- # Use `#` for comments ### R ignores everything after the `#` symbol Example: ``` r speed <- 42 # This is mph, not km/h speed ``` ``` #> [1] 42 ``` --- # Use meaningful variable names -- **Example**: You are recording the speed of a car in mph -- **Poor** variable name: ``` r x <- 42 ``` -- **Good** variable name: ``` r speed <- 42 ``` -- **Even better** variable name: ``` r speed_mph <- 42 ``` --- class: center # Use standard casing styles <center> <img src="images/horst_casing.jpg" width=700> </center> Art by [Allison Horst](https://github.com/allisonhorst/stats-illustrations) --- # Use standard casing styles .leftcol60[ <img src="images/horst_casing.jpg" width=600> Art by [Allison Horst](https://github.com/allisonhorst/stats-illustrations) ] .rightcol40[ I recommend using one of these: - `snake_case_uses_underscores` - `camelCaseUsesCaps` Example: ``` r days_in_week <- 7 monthsInYear <- 12 ``` ] --- ## The workspace .leftcol[ View all the current objects: ``` r objects() ``` ``` #> [1] "class" "days_in_week" "from" "input" "monthsInYear" "number" "numbeR" "Number" "output_file" "path_notes" "path_pdf" "path_slides" "proc" "render_args" #> [15] "render_fn" "root" "self_contained" "speed" "speed_mph" "to" "x" "y" "z" ``` ] -- .rightcol[ Remove an object by name: ``` r rm(class) objects() ``` ``` #> [1] "days_in_week" "from" "input" "monthsInYear" "number" "numbeR" "Number" "output_file" "path_notes" "path_pdf" "path_slides" "proc" "render_args" "render_fn" #> [15] "root" "self_contained" "speed" "speed_mph" "to" "x" "y" "z" ``` ] --- # View prior code in history pane <img src="images/rstudio_panes.png" width=500> -- # Use "up" arrow see previous code --- # Staying organized -- ## 1) Save your code in .R files > ### File > New File > R Script -- ## 2) Keep work in R Project files > ### File > New Project... --- class: inverse
−
+
10
:
00
.leftcol[.font80[ ## Your turn ### A. Practice getting organized 1. Open RStudio and create a new R project called `week1`. 2. Create a new R script and save it as `practice.R`. 3. Open the `practice.R` file and write your answers to these questions in it. ]] .rightcol[.font80[ ### B. Creating & working with objects 1) Create objects to store the values in this table: | City | Area<br>(sq mi) | Population<br>(thousands) | |----------------------|----------------|------------------------| | San Francisco, CA | 47 | 884 | | Chicago, IL | 228 | 2,716 | | Washington, DC | 61 | 694 | 2) Using the objects you created, answer the following questions: - Which city has the highest density? - How many _more_ people would need to live in DC for it to have the same population density as San Francisco? ]] --- class: inverse, middle # Week 1: .fancy[Getting Started] ### 1. Course orientation ### BREAK ### 2. Getting started with R & RStudio ### 3. .orange[Operators & data types] ### 4. Preview of HW 1 --- # R as a calculator .leftcol[ ## Basic operators: ### - Addition: `+` ### - Subtraction: `-` ### - Multiplication: `*` ### - Division: `/` ] -- .rightcol[ ## Other important operators: ### - Power: `^` ### - Integer Division: `%/%` ### - Modulus: `%%` ] --- # Integer division: `%/%` Integer division drops the remainder from regular division -- ``` r 4 / 3 # Regular division ``` ``` #> [1] 1.333333 ``` ``` r 4 %/% 3 # Integer division ``` ``` #> [1] 1 ``` --- # Integer division: `%/%` Integer division drops the remainder from regular division -- What will this return? ``` r 4 %/% 4 ``` -- ``` #> [1] 1 ``` -- What will this return? ``` r 4 %/% 5 ``` -- ``` #> [1] 0 ``` --- # Modulus operator: `%%` Modulus returns the _remainder_ after doing division -- ``` r 5 %% 3 ``` ``` #> [1] 2 ``` -- ``` r 3.1415 %% 3 ``` ``` #> [1] 0.1415 ``` --- # Modulus operator: `%%` Modulus returns the _remainder_ after doing division -- What will this return? ``` r 4 %% 4 ``` -- ``` #> [1] 0 ``` -- What will this return? ``` r 4 %% 5 ``` -- ``` #> [1] 4 ``` --- ## Odds and evens with `n %% 2` -- .leftcol[ If `n %% 2` is `0`, `n` is **EVEN** ``` r 10 %% 2 ``` ``` #> [1] 0 ``` ``` r 12 %% 2 ``` ``` #> [1] 0 ``` Also works with negative numbers! ``` r -42 %% 2 ``` ``` #> [1] 0 ``` ] -- .rightcol[ If `n %% 2` is `1`, `n` is **ODD** ``` r 1 %% 2 ``` ``` #> [1] 1 ``` ``` r 13 %% 2 ``` ``` #> [1] 1 ``` Also works with negative numbers! ``` r -47 %% 2 ``` ``` #> [1] 1 ``` ] --- ## Number "chopping" with 10s -- .leftcol[ The mod operator (`%%`) "chops" a number and returns everything to the _right_ ``` r 123456 %% 1 ``` ``` #> [1] 0 ``` ``` r 123456 %% 10 ``` ``` #> [1] 6 ``` ``` r 123456 %% 100 ``` ``` #> [1] 56 ``` ] -- .rightcol[ Integer division (`%/%`) "chops" a number and returns everything to the _left_ ``` r 123456 %/% 1 ``` ``` #> [1] 123456 ``` ``` r 123456 %/% 10 ``` ``` #> [1] 12345 ``` ``` r 123456 %/% 100 ``` ``` #> [1] 1234 ``` ] --- ## Number "chopping" with 10s - `%%` returns everything to the _right_ (`"chop" ->`) - `%/%` returns everything to the _left_ (`<- "chop"`) - The "chop" point is always just to the _right_ of the chopping digit <br> <div style="width:1000px"> <table class="table table-condensed"> <thead> <tr class="header"> <th>Example</th> <th>“Chop” point</th> <th></th> </tr> </thead> <tbody> <tr class="odd"> <td><code>1234 %% 1</code></td> <td><code>1234 |</code></td> <td>Right of the <code>1</code>’s digit</td> </tr> <tr class="even"> <td><code>1234 %% 10</code></td> <td><code>123 | 4</code></td> <td>Right of the <code>10</code>’s digit</td> </tr> <tr class="odd"> <td><code>1234 %% 100</code></td> <td><code>12 | 34</code></td> <td>Right of the <code>100</code>’s digit</td> </tr> <tr class="even"> <td><code>1234 %% 1000</code></td> <td><code>1 | 234</code></td> <td>Right of the <code>1,000</code>’s digit</td> </tr> <tr class="odd"> <td><code>1234 %% 10000</code></td> <td><code>| 1234</code></td> <td>Right of the <code>10,000</code>’s digit</td> </tr> </tbody> </table> </div> --- # Comparing things: **Relational operators** -- .leftcol[ ### Compare if condition is `TRUE` or `FALSE` using: - Less than: `<` - Less than or equal to : `<=` - Greater than or equal to: `>=` - Greater than: `>` - Equal: `==` - Not equal: `!=` ] -- .rightcol[.code60[ ``` r 2 < 2 ``` ``` #> [1] FALSE ``` ``` r 2 <= 2 ``` ``` #> [1] TRUE ``` ``` r (2 + 2) == 4 ``` ``` #> [1] TRUE ``` ``` r (2 + 2) != 4 ``` ``` #> [1] FALSE ``` ``` r "penguin" == "penguin" ``` ``` #> [1] TRUE ``` ]] --- # Comparing things: **Logical operators** <br> ### Make multiple comparisons with: ### - And: `&` ### - Or: `|` ### - Not: `!` --- # Comparing things: **Logical operators** .leftcol[ With "and" (`&`), every part must be `TRUE`, otherwise the whole statement is `FALSE`: ``` r (2 == 2) & (3 == 3) ``` ``` #> [1] TRUE ``` ``` r (2 == 2) & (2 == 3) ``` ``` #> [1] FALSE ``` ] -- .rightcol[ With "or" (`|`), if _any_ part is `TRUE`, the whole statement is `TRUE`: ``` r (2 == 2) | (3 == 3) ``` ``` #> [1] TRUE ``` ``` r (2 == 2) | (2 == 3) ``` ``` #> [1] TRUE ``` ] --- # Comparing things: **Logical operators** The "not" (`!`) symbol produces the _opposite_ statement: -- ``` r ! (2 == 2) ``` -- ``` #> [1] FALSE ``` -- ``` r ! (2 == 2) | (3 == 3) ``` -- ``` #> [1] TRUE ``` -- ``` r ! ((2 == 2) | (3 == 3)) ``` -- ``` #> [1] FALSE ``` --- # Comparing things: **Logical operators** ### Order precedence for logical operators: `! > & > |` -- .leftcol[ ``` r TRUE | FALSE & FALSE ``` ``` #> [1] TRUE ``` ``` r (TRUE | FALSE) & FALSE ``` ``` #> [1] FALSE ``` ] -- .rightcol[ ``` r ! TRUE | TRUE ``` ``` #> [1] TRUE ``` ``` r ! (TRUE | TRUE) ``` ``` #> [1] FALSE ``` ] --- # Comparing things: **Logical operators** .leftcol[ ### **Pro tip**: Use parentheses ``` r ! 3 == 5 # Confusing ``` ``` #> [1] TRUE ``` ``` r ! (3 == 5) # Less confusing ``` ``` #> [1] TRUE ``` ] --- ## Other important points .leftcol[ ### R follows BEDMAS: 1. **B**rackets 2. **E**xponents 3. **D**ivision 4. **M**ultiplication 5. **A**ddition 6. **S**ubtraction ] -- .rightcol[ ### **Pro tip**: Use parentheses ``` r 1 + 2 * 4 # Confusing ``` ``` #> [1] 9 ``` ``` r 1 + (2 * 4) # Less confusing ``` ``` #> [1] 9 ``` ] --- class: inverse
−
+
08
:
00
# Your turn Consider the following objects: ``` r w <- TRUE x <- FALSE y <- TRUE ``` Write code to answer the following questions: 1. Fill in _relational_ operators to make the following statement return `TRUE`: `! (w __ x) & ! (y __ x)` 2. Fill in _logical_ operators to make this statement return `FALSE`: `! (w __ x) | (y __ x)` --- # Data Types |Type | Description | Example |:---------|:----------------------------|:--------- |`double` | Numbers w/decimals (aka "float") | `3.14` |`integer` | Numbers w/out decimals | `42` |`character` | Text (aka "string") | `"this is some text"` |`logical` | Used for comparing objects | `TRUE`, `FALSE` --- ## Use `typeof()` to find the type ``` r typeof(2) ``` ``` #> [1] "double" ``` ``` r typeof("hello") ``` ``` #> [1] "character" ``` ``` r typeof(TRUE) ``` ``` #> [1] "logical" ``` --- # Numeric types (there are 2) -- .leftcol[ ## Integers ### No decimals (e.g. `7`) ] -- .rightcol[ ## Doubles (aka "float") ### Decimals (e.g. `7.0`) ] --- ## In R, numbers are "doubles" by default -- ``` r typeof(3) ``` ``` #> [1] "double" ``` ### R assumes that `3` is really `3.0` -- ### Make it an integer by adding `L`: ``` r typeof(3L) ``` ``` #> [1] "integer" ``` --- # Character types -- ### Use single or double quotes around anything: ``` r typeof('hello') ``` ``` #> [1] "character" ``` ``` r typeof("3") ``` ``` #> [1] "character" ``` -- Use single / double quotes if the string _contains_ a quote symbol: ``` r typeof("don't") ``` ``` #> [1] "character" ``` --- # Logical types -- .leftcol[ Logical data only have two values:<br>`TRUE` or `FALSE` ``` r typeof(TRUE) ``` ``` #> [1] "logical" ``` ``` r typeof(FALSE) ``` ``` #> [1] "logical" ``` ] -- .rightcol[ Note that these have to be in all caps,<br>and **not** in quotes: ``` r typeof('TRUE') ``` ``` #> [1] "character" ``` ``` r typeof(True) ``` ``` #> Error: object 'True' not found ``` ] --- # Logical types Use to answer questions about logical statements. Example: Is `1` greater than `2`? ``` r 1 > 2 ``` ``` #> [1] FALSE ``` -- Example: Is `2` greater than `1`? ``` r 1 < 2 ``` ``` #> [1] TRUE ``` --- ## Special values -- `Inf`: Infinity (_or really big numbers_) ``` r 1/0 ``` ``` #> [1] Inf ``` -- `NaN`: Not a Number ``` r 0/0 ``` ``` #> [1] NaN ``` -- `NA`: Not available (_value is missing_) -- `NULL`: no value whatsoever --- class: inverse # Your turn
−
+
05
:
00
## Will these return `TRUE` or `FALSE`? ## (try to answer first, then run the code to check) - `! typeof('3') == typeof(3)` - `(typeof(7) != typeof("FALSE")) | FALSE` - `! (typeof(TRUE) == typeof(FALSE)) & FALSE` --- class: inverse, middle # Week 1: .fancy[Getting Started] ### 1. Course orientation ### BREAK ### 2. Getting started with R & RStudio ### 3. Operators & data types ### 4. .orange[Preview of HW 1] --- class: center, middle # Go to the [schedule](https://p4a.seas.gwu.edu/2025-Spring/schedule.html) # ...and read carefully!