class: middle, inverse .leftcol30[ <center> <img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250> </center> ] .rightcol70[ # Week 8: .fancy[Python in R] ###
EMSE 4571: Intro to Programming for Analytics ###
John Paul Helveston ###
March 03, 2022 ] --- class: inverse # Quiz 5
.leftcol[ ## Go to `#class` channel in Slack for quiz link ## Open RStudio first! ## Rules: - You may use your notes and RStudio - You may **not** use any other resources (e.g. the internet, your classmates, etc.) ] .rightcol[ <br> <center> <img src="https://github.com/emse-p4a-gwu/2022-Spring/raw/main/images/quiz_doge.png" width="400"> </center> ] --- class: center, middle, inverse # .fancy[R tip of the week]: # `styler` --- # Install `styler` package <br> ```r install.packages("styler") ``` <br> Go to .red[Addins] menu, search for .red["style"], select .red["Style active file"] --- class: inverse, middle # Week 8: .fancy[Python in R] ### 1. Getting started ### 2. Python basics ### 3. Functions & methods ### 4. Loops & lists ### BREAK ### 5. Strings --- class: inverse, middle # Week 8: .fancy[Python in R] ### 1. .orange[Getting started] ### 2. Python basics ### 3. Functions & methods ### 4. Loops & lists ### BREAK ### 5. Strings --- class: center, middle ## Why Python? <center> <img src="images/languages.png" width=750> </center> <!-- https://towardsdatascience.com/predicting-the-future-popularity-of-programming-languages-4f28c80bd36f --> [image source](https://insights.stackoverflow.com/trends?tags=r%2Cpython%2Cjava%2Cphp%2Cc%23%2Cc%2B%2B%2Cruby) --- class: center, middle .leftcol[ # .center[
] <center> <img src="images/swiss_army_knife.png" width=350> </center> ] .rightcol[ # .center[
] <center> <img src="images/data_analysis.jpg" width=400> </center> ]

---

.code90[

## Install the `reticulate` library

```r
install.packages("reticulate")
```

(Only do this once)

]

--

<br>

.code90[

## Load the `reticulate` library

```r
library(reticulate)
```

(Do this every time you use the package)

]

---

## Do you have Python on your computer?

If not, you may see the following message pop up:

```r
Would you like to install Miniconda? [Y/n]:
```

My recommendation: type `y` and press `enter`

---

## Starting Python

Open a Python REPL ("**R**ead-**E**val-**P**rint **L**oop"):

```r
repl_python()
```

--

You should see the `>>>` symbol in the console. This means you're now using Python! (Remember, the R console has only one `>` symbol).

--

**You want to use Python 3, not Python 2**

Above the `>>>` symbols, it should say `"Python 3...."`

---

## Exiting Python (but we just got started?)

If you want to get back to good ol' R, just type the command `exit` into the Python console:

```r
exit
```

(Note that you type `exit` and not `exit()` with parentheses).

---

# Open a Python script

> File --> New File --> Python Script

--

<br>

When you run code from a Python script, R automatically opens a Python REPL

---
class: inverse, middle

# Week 8: .fancy[Python in R]

### 1. Getting started
### 2. .orange[Python basics]
### 3. Functions & methods
### 4. Loops & lists
### BREAK
### 5. Strings

---

# Operators

<style type="text/css">
.remark-slide table{
  width: 100%;
}
</style>

--

.leftcol[.left[

## Arithmetic operators

Operator         | R       | Python
-----------------|---------|-----------
Integer division | `%/%`   | `//`
Modulus          | `%%`    | `%`
Powers           | `^`     | `**`

]]

--

.rightcol[.left[

## Logical operators

Operator  | R         | Python
----------|-----------|-----------
And       | `&`       | `and`; `&`
Or        | `\|`      | `or`; `\|`
Not       | `!`       | `not`

In Python, `&` and `|` are bitwise operators that also work on `True` / `False`; there is no `!` operator (only `!=`), so use `not`.

You can do this in Python:

```python
(3 == 3) and (4 == 4)
```

```
#> True
```

]]

---

## Variable assignment

Python only uses the `=` symbol to make assignments (no `<-`):

```python
value = 3
value
```

```
#> 3
```

---

## Data types

Same data types as R, but with more "Computer Science-y" names:

.leftcol60[.left[

Description          | R            | Python
---------------------|--------------|-----------
numeric (w/decimal)  | `double`     | `float`
integer              | `integer`    | `int`
character            | `character`  | `str`
logical              | `logical`    | `bool`

]]

---

## Data types

Three important distinctions:

.left[

Data type     | R                   | Python
--------------|---------------------|-----------
Logical       | `TRUE` or `FALSE`   | `True` or `False`
Numbers       | `double` by default | `int` by default (unless it has a decimal)
Nothing       | `NULL`              | `None`

]

---

.center[**Get type**]

.leftcol[

.center[**R**: `typeof()`]

```r
typeof(3.14)
```

```
#> [1] "double"
```

```r
typeof(3L)
```

```
#> [1] "integer"
```

```r
typeof("3")
```

```
#> [1] "character"
```

```r
typeof(TRUE)
```

```
#> [1] "logical"
```

]

.rightcol[

.center[**Python**: `type()`]

```python
type(3.14)
```

```
#> <class 'float'>
```

```python
type(3)
```

```
#> <class 'int'>
```

```python
type("3")
```

```
#> <class 'str'>
```

```python
type(True)
```

```
#> <class 'bool'>
```

]

---

.center[**Check type**]

.leftcol[

.center[**R**: `is.______()`]

```r
is.double(3.14)
```

```
#> [1] TRUE
```

```r
is.integer(3L)
```

```
#> [1] TRUE
```

```r
is.character("3")
```

```
#> [1] TRUE
```

```r
is.logical(TRUE)
```

```
#> [1] TRUE
```

]

.rightcol[
.center[**Python**: `type() == type`] ```python type(3.14) == float ``` ``` #> True ``` ```python type(3) == int ``` ``` #> True ``` ```python type("3") == str ``` ``` #> True ``` ```python type(True) == bool ``` ``` #> True ``` ] --- .center[**Convert type**] .leftcol[ .center[**R**: `as.______()`] ```r as.double("3") ``` ``` #> [1] 3 ``` ```r as.integer(3.14) ``` ``` #> [1] 3 ``` ```r as.character(3.14) ``` ``` #> [1] "3.14" ``` ```r as.logical(3.14) ``` ``` #> [1] TRUE ``` ] .rightcol[ .center[**Python**: `______()`] ```python float("3") ``` ``` #> 3.0 ``` ```python int(3.14) ``` ``` #> 3 ``` ```python str(3.14) ``` ``` #> '3.14' ``` ```python bool(3.14) ``` ``` #> True ``` ] --- ## Quick practice
Write Python code to do the following:

1. Create an object `x` that stores the value `"123"`
2. Create an object `y` that is `x` converted to an integer
3. Write code to confirm that `y` is indeed an integer
4. Write a logical statement to determine if `y` is odd or even

---
class: inverse, middle

# Week 8: .fancy[Python in R]

### 1. Getting started
### 2. Python basics
### 3. .orange[Functions & methods]
### 4. Loops & lists
### BREAK
### 5. Strings

---

# Python and R have many similar functions

--

.leftcol[

.center[**R**]

```r
abs(-1)
```

```
#> [1] 1
```

```r
round(3.14)
```

```
#> [1] 3
```

```r
round(3.14, 1)
```

```
#> [1] 3.1
```

]

.rightcol[

.center[**Python**]

```python
abs(-1)
```

```
#> 1
```

```python
round(3.14)
```

```
#> 3
```

```python
round(3.14, 1)
```

```
#> 3.1
```

]

---

# Writing functions

.leftcol[

.center[**R**]

```r
isEven <- function(n) {
    if (n %% 2 == 0) {
        return(TRUE)
    }
    return(FALSE)
}
```

]

--

.rightcol[

.center[**Python**]

```python
def isEven(n):
    if (n % 2 == 0):
        return(True)
    return(False)
```

Note:

- Functions start with `def`
- Use `:` and indentation instead of `{}`
- **Indentation must be consistent; the convention is 4 spaces!**

]

---

# Writing test functions

.leftcol[

.center[**R**]

```r
test_isEven <- function() {
    cat("Testing isEven(n)...")
    stopifnot(isEven(2) == TRUE)
    stopifnot(isEven(1) == FALSE)
    cat("Passed!")
}
```

]

--

.rightcol[

.center[**Python**]

```python
def test_isEven():
    print("Testing isEven(n)...")
    assert(isEven(2) == True)
    assert(isEven(1) == False)
    print("Passed!")
```

Note:

- Use `print()` instead of `cat()`
- Use `assert()` instead of `stopifnot()`

]

---

# Python Methods

Python objects have "methods": special functions that _belong_ to certain object classes.
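
A minimal sketch of the difference between a function and a method (the names `s`, `n`, `length`, `shouted`, and `int_has_upper` are only for illustration): a function takes the object as an argument, while a method is called on the object with a dot and only exists for certain classes.

```python
s = "foo"
n = 3

# A function takes the object as an argument
length = len(s)

# A method is called on the object itself, using a dot
shouted = s.upper()   # upper() belongs to the str class

# Integers have no upper() method, so this raises an AttributeError
try:
    n.upper()
    int_has_upper = True
except AttributeError:
    int_has_upper = False

print(length, shouted, int_has_upper)
```

```
#> 3 FOO False
```
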
-- Example: Make a string upper case .leftcol[ .center[**R**] Use `str_to_upper()` function ```r s <- "foo" stringr::str_to_upper(s) ``` ``` #> [1] "FOO" ``` ] -- .rightcol[ .center[**Python**] Use `upper()` _method_ ```python s = "foo" s.upper() ``` ``` #> 'FOO' ``` ] --- # Python Methods See all the available methods with `dir` function: ```python s = "foo" dir(s) ``` ``` #> ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'] ``` --- class: inverse, middle, center # R-Python magic --- # R-Python magic You can source a Python script from R, then use the Python function in R! -- Inside your `notes-blank.py` file, you have the following function defined: ```python def isEven(n): if (n % 2 == 0): return(True) return(False) ``` -- Open your `notes.R` file and _source_ the `notes-blank.py` file: ```r reticulate::source_python('notes-blank.py') ``` -- Magically, the function `isEven(n)` now works inside R! --- class: inverse
# Your turn

Write the following two functions in Python code:

1. `hypotenuse(a, b)`: Returns the hypotenuse of a right triangle whose two legs have lengths `a` and `b`.
2. `isRightTriangle(a, b, c)`: Returns `True` if the triangle formed by the lines of length `a`, `b`, and `c` is a right triangle and `False` otherwise. **Hint**: you may not know which value (`a`, `b`, or `c`) is the hypotenuse.

---
class: inverse, middle

# Week 8: .fancy[Python in R]

### 1. Getting started
### 2. Python basics
### 3. Functions & methods
### 4. .orange[Loops & lists]
### BREAK
### 5. Strings

---

## `for` loops

.leftcol[

.center[**R**]

```r
for (i in seq(1, 5, 2)) {
    cat(i, '\n')
}
```

```
#> 1
#> 3
#> 5
```

]

--

.rightcol[

.center[**Python**]

```python
for i in range(1, 5, 2):
    print(i)
```

```
#> 1
#> 3
```

Notes:

- `range()` leaves out the stopping number
- No `()` needed around the loop condition in the `for` line

]

---

## `while` loops

.leftcol[

.center[**R**]

```r
i <- 1
while (i <= 5) {
    print(i)
    i <- i + 2
}
```

```
#> [1] 1
#> [1] 3
#> [1] 5
```

]

--

.rightcol[

.center[**Python**]

```python
i = 1
while i <= 5:
    print(i)
    i += 2
```

```
#> 1
#> 3
#> 5
```

Notes:

- Could also use `i = i + 2` to increment
- No `()` needed around the loop condition in the `while` line

]

---

# Python lists

These are **not** the same as R vectors! (They're equivalent to R lists)

--

Universal list creator: `[]`

```python
[1, 2, 3]
```

```
#> [1, 2, 3]
```

--

Lists can store different types

```python
[1, "foo", True]
```

```
#> [1, 'foo', True]
```

---

## Adding and removing items

--

.leftcol[

Add items with `list.append()`

```python
x = [1, 2, 3]
x.append(7)
x
```

```
#> [1, 2, 3, 7]
```

**Note**: You don't have to overwrite `x`,<br>
i.e. don't do this: `x = x.append(7)`<br>
(`append()` changes `x` in place and returns `None`)

]

--

.rightcol[

Remove items (by value) with `list.remove()`

```python
x = [1, 2, 3]
x.remove(3)
x
```

```
#> [1, 2]
```

]

---

## Sorting lists

```python
x = [1, 5, 3]
```

--

.leftcol[

Sorting that returns a new object

```python
sorted(x)
```

```
#> [1, 3, 5]
```

```python
sorted(x, reverse = True)
```

```
#> [5, 3, 1]
```

```python
x
```

```
#> [1, 5, 3]
```

]

--

.rightcol[

Sort the object `x` _without_ creating a new object

```python
x.sort()
x
```

```
#> [1, 3, 5]
```

]

---

# Slicing lists with `[]`

```python
x = ['A', 'list', 'of', 'words']
```

--

.leftcol[.code80[

Indices start at 0:

```python
x[0] # Returns the first element
```

```
#> 'A'
```

```python
x[3] # Returns the fourth element
```

```
#> 'words'
```

```python
x[len(x)-1] # Returns the last element
```

```
#> 'words'
```

]]

--

.rightcol[.code80[

Slicing with a range of indices (`start:stop`):

```python
x[0:3] # Returns the first 3 elements
```

```
#> ['A', 'list', 'of']
```

]]

---

# Negative indices slice from the end

```python
x = ['A', 'list', 'of', 'words']
```

--

.leftcol[.code80[

Negative indices count back from the end:

```python
x[-1] # Returns the last element
```

```
#> 'words'
```

```python
x[-2] # Returns 2nd-to-last element
```

```
#> 'of'
```

```python
x[-len(x)] # Returns first element
```

```
#> 'A'
```

]]

--

.rightcol[.code80[

Slicing with a range of negative indices:

```python
x[-3:-1] # Returns middle 2 elements
```

```
#> ['list', 'of']
```

]]

---

## Note on 0 indexing

```python
x = ["A", "B", "C", "D", "E"]
```

--

List items sit _between_ fence posts.

```python
index: 0     1     2     3     4     5
       |     |     |     |     |     |
item:  | "A" | "B" | "C" | "D" | "E" |
       |     |     |     |     |     |
```

--

You slice at the _fence post_ number to get elements _between_ the posts.

.leftcol[

```python
x[0:1]
```

```
#> ['A']
```

]

.rightcol[

```python
x[0:3]
```

```
#> ['A', 'B', 'C']
```

]

---
class: inverse
# Your turn Write the following two functions in Python code: 1. `factorial(n)`: Returns the factorial of `n`, e.g. `3! = 3*2*1 = 6`. Note that `0` is a special case, and `0! = 1`. Assume `n >= 0`. 2. `nthHighestValue(n, x)`: Returns the nth highest value in a list of numbers. For example, if `x = [5, 1, 3]`, then `nthHighestValue(1, x)` should return `5`, because `5` is the 1st highest value in `x`, and `nthHighestValue(2, x)` should return `3` because it's the 2nd highest value in `x`. Assume that `n <= len(x)`. --- class: inverse, center # .fancy[Break]
--- class: inverse, middle # Week 8: .fancy[Python in R] ### 1. Getting started ### 2. Python basics ### 3. Functions & methods ### 4. Loops & lists ### BREAK ### 5. .orange[Strings] --- ## Doing "math" with strings -- .leftcol[ Concatenation: .center[**R**] ```r paste("foo", "bar", sep = "") ``` ``` #> [1] "foobar" ``` ] -- .rightcol[ <br> .center[**Python**] ```python "foo" + "bar" ``` ``` #> 'foobar' ``` ] -- .leftcol[ Repetition: .center[**R**] ```r str_dup("foo", 3) ``` ``` #> [1] "foofoofoo" ``` ] -- .rightcol[ <br> .center[**Python**] ```python "foo" * 3 ``` ``` #> 'foofoofoo' ``` ] --- ## Using word commands with strings .leftcol[ Sub-string detection: .center[**R**] ```r str_detect('Apple', 'App') ``` ``` #> [1] TRUE ``` ] -- .rightcol[ <br> .center[**Python**] ```python 'App' in 'Apple' ``` ``` #> True ``` ] --- class: inverse, middle, center ## Most string manipulation is done with _methods_ -- .leftcol[ .center[**R**] ```r str_function(s) ``` ] -- .rightcol[ .center[**Python**] ```python s.method() ``` ] --- ## Case conversion -- .leftcol[ .center[**R**] ```r s <- "A longer string" str_to_upper(s) ``` ``` #> [1] "A LONGER STRING" ``` ```r str_to_lower(s) ``` ``` #> [1] "a longer string" ``` ```r str_to_title(s) ``` ``` #> [1] "A Longer String" ``` ] -- .rightcol[ .center[**Python**] ```python s = "A longer string" s.upper() ``` ``` #> 'A LONGER STRING' ``` ```python s.lower() ``` ``` #> 'a longer string' ``` ```python s.title() ``` ``` #> 'A Longer String' ``` ] --- ## Trimming white space -- .leftcol[ .center[**R**] ```r s <- " A string with space " str_trim(s) ``` ``` #> [1] "A string with space" ``` ] -- .rightcol[ .center[**Python**] ```python s = " A string with space " s.strip() ``` ``` #> 'A string with space' ``` ] --- ## Replacing strings -- .leftcol[ .center[**R**] ```r s <- "Hello world" str_replace(s, "o", "a") ``` ``` #> [1] "Hella world" ``` ```r str_replace_all(s, "o", "a") ``` ``` #> [1] "Hella warld" ``` ] -- .rightcol[ 
.center[**Python**] ```python s = "Hello world" s.replace("o", "a") ``` ``` #> 'Hella warld' ``` ] --- ## Merge a vector / list of strings together -- .leftcol[ .center[**R**] ```r s <- c("Hello", "world") paste(s, collapse = "") ``` ``` #> [1] "Helloworld" ``` ] -- .rightcol[ .center[**Python**] ```python s = ["Hello", "world"] "".join(s) ``` ``` #> 'Helloworld' ``` ] --- ## Python has some super handy string methods -- Detect if string contains only numbers: .leftcol[ .center[**R**] R doesn't have a function for this...<br>here's one way to do it: ```r s <- "42" ! is.na(as.numeric(s)) ``` ``` #> [1] TRUE ``` ] -- .rightcol[ .center[**Python**] ```python s = "42" s.isnumeric() ``` ``` #> True ``` ] --- ## Getting sub-strings with `[]` -- .leftcol[ .center[**R**] ```r s <- "Apple" str_sub(s, 1, 3) ``` ``` #> [1] "App" ``` ] -- .rightcol[ .center[**Python**] ```python s = "Apple" s[0:3] ``` ``` #> 'App' ``` Notes: - Indexing is the same as lists ] --- ## Getting sub-string indices -- .leftcol[ .center[**R**] ```r s <- "Apple" str_locate(s, "pp") ``` ``` #> start end #> [1,] 2 3 ``` ] -- .rightcol[ .center[**Python**] ```python s = "Apple" s.index("pp") ``` ``` #> 1 ``` Note: - Only returns the starting index ] --- ## String splitting Both languages return a list: -- .leftcol[ .center[**R**] ```r s <- "Apple" str_split(s, "pp") ``` ``` #> [[1]] #> [1] "A" "le" ``` ] -- .rightcol[ .center[**Python**] ```python s = "Apple" s.split("pp") ``` ``` #> ['A', 'le'] ``` ] --- ## Python can only split individual strings -- .leftcol[ R can split vectors of strings ```r s <- c("Apple", "Snapple") str_split(s, "pp") ``` ``` #> [[1]] #> [1] "A" "le" #> #> [[2]] #> [1] "Sna" "le" ``` ] -- .rightcol[ .center[**Python**] ```python s = ["Apple", "Snapple"] s.split("pp") ``` ``` #> Error in py_call_impl(callable, dots$args, dots$keywords): AttributeError: 'list' object has no attribute 'split' ``` ] --- ## Need **numpy** package for this in Python ```python import numpy as np s = 
np.array(["Apple", "Snapple"]) np.char.split(s, "pp") ``` ``` #> array([list(['A', 'le']), list(['Sna', 'le'])], dtype=object) ``` -- <br> You'll need to install **numpy** to use this: ```r py_install("numpy") ``` --- class: inverse
# Your turn .font80[ Write the following two functions in Python code: 1. `sortString(s)`: Takes a string `s` and returns back an alphabetically sorted string. **Hint**: Use `list(s)` to break a string into a list of letters. - `sortString("cba") == "abc"` - `sortString("abedhg") == "abdegh"` - `sortString("AbacBc") == "ABabcc"` 2. `areAnagrams(s1, s2)`: Takes two strings, `s1` and `s2`, and returns `True` if the strings are [anagrams](https://en.wikipedia.org/wiki/Anagram), and `False` otherwise. **Treat lower and upper case as the same letters**. - `areAnagrams("", "") == True` - `areAnagrams("aabbccdd", "bbccddee") == False` - `areAnagrams("TomMarvoloRiddle", "IAmLordVoldemort") == True` ] --- # [HW 8](https://p4a.seas.gwu.edu/2022-Spring/hw8-python.html) I suggest starting with `reticulate::repl_python()` to work in Python from RStudio. -- - Submit your "hw8.py" file to the autograder - it will (hopefully) work