Strings

.leftcol30[
<center>
<img src="https://github.com/emse-p4a-gwu/emse-p4a-gwu.github.io/raw/master/images/p4a_hex_sticker.png" width=250>
</center>
]

### <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M496 128v16a8 8 0 0 1-8 8h-24v12c0 6.627-5.373 12-12 12H60c-6.627 0-12-5.373-12-12v-12H24a8 8 0 0 1-8-8v-16a8 8 0 0 1 4.941-7.392l232-88a7.996 7.996 0 0 1 6.118 0l232 88A8 8 0 0 1 496 128zm-24 304H40c-13.255 0-24 10.745-24 24v16a8 8 0 0 0 8 8h464a8 8 0 0 0 8-8v-16c0-13.255-10.745-24-24-24zM96 192v192H60c-6.627 0-12 5.373-12 12v20h416v-20c0-6.627-5.373-12-12-12h-36V192h-64v192h-64V192h-64v192h-64V192H96z"/></svg> EMSE 4571: Intro to Programming for Analytics
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M224 256c70.7 0 128-57.3 128-128S294.7 0 224 0 96 57.3 96 128s57.3 128 128 128zm89.6 32h-16.7c-22.2 10.2-46.9 16-72.9 16s-50.6-5.8-72.9-16h-16.7C60.2 288 0 348.2 0 422.4V464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48v-41.6c0-74.2-60.2-134.4-134.4-134.4z"/></svg> John Paul Helveston
### <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:white;overflow:visible;position:relative;"><path d="M0 464c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V192H0v272zm320-196c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM192 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12h-40c-6.6 0-12-5.4-12-12v-40zM64 268c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zm0 128c0-6.6 5.4-12 12-12h40c6.6 0 12 5.4 12 12v40c0 6.6-5.4 12-12 12H76c-6.6 0-12-5.4-12-12v-40zM400 64h-48V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H160V16c0-8.8-7.2-16-16-16h-32c-8.8 0-16 7.2-16 16v48H48C21.5 64 0 85.5 0 112v48h448v-48c0-26.5-21.5-48-48-48z"/></svg> February 24, 2022

---

# Quiz 4

## Open RStudio first!

## Rules:

- You may use your notes and RStudio
- You may **not** use any other resources (e.g. the internet, your classmates, etc.)

.rightcol[
<br>
<center>
<img src="https://github.com/emse-p4a-gwu/2022-Spring/raw/main/images/quiz_doge.png" width="400">
</center>
]

---

# New HW policies (after Spring Break)

<br>

## - Up to 3 days late for full credit
## - Up to 1 week late for 50% credit
## - More than 1 week late = no credit

---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

# Week 7: .fancy[Strings]

### 1. .orange[Making strings]
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

## Install the `stringr` library

```r
install.packages("stringr")
```

(Only do this once... and you already did this in HW 2)

<br>

## Load the `stringr` library

```r
library(stringr)
```

(Do this every time you use the package)

---

## Make a string with<br>'single' or "double" quotes

```r
cat("This is a string")
```

```
#> This is a string
```

```r
cat('This is a string')
```

```
#> This is a string
```

## Use them where<br>it makes sense

Use double quotes when `'` is in the string

```r
cat("It's great!")
```

```
#> It's great!
```

Use single quotes when `"` is in the string

```r
cat('I said, "Hello"')
```

```
#> I said, "Hello"
```

---

# What if a string has both `'` and `"` symbols?

Example: `It's nice to say, "Hello"`

```r
cat("It's nice to say, "Hello"")
```

```
#> Error: <text>:1:25: unexpected symbol
#> 1: cat("It's nice to say, "Hello
#>                             ^
```

```r
cat('It's nice to say, "Hello"')
```

```
#> Error: <text>:1:9: unexpected symbol
#> 1: cat('It's
#>             ^
```

---

# "Escaping" to the rescue!

### Use the backslash (`\`) to "escape" a quote or other special character

```r
cat("It's nice to say, \"Hello\"") # Wrapped in double quotes: escape the inner "
```

```
#> It's nice to say, "Hello"
```

```r
cat('It\'s nice to say, "Hello"') # Wrapped in single quotes: escape the inner '
```

```
#> It's nice to say, "Hello"
```

### Commonly escaped symbols:

```r
cat('This\nthat') # New line: \n
```

```
#> This
#> that
```

```r
cat('This\tthat') # Tab space: \t
```

```
#> This	that
```

```r
cat('This\\that') # Backslash: \\
```

```
#> This\that
```
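A short extra sketch (not part of the original slides): each escape sequence is stored as a single character, and R 4.0.0 or later also accepts raw strings, `r"(...)"`, where nothing needs escaping.

```r
library(stringr)

# "\n" is typed as two characters but stored as ONE newline character
x <- "This\nthat"
str_length(x)  # 9: 4 + 1 (newline) + 4

# print() shows the escaped form; writeLines() shows the rendered text
y <- "It's nice to say, \"Hello\""
print(y)
writeLines(y)

# Raw strings (R >= 4.0.0) take everything between r"( and )" literally
z <- r"(It's nice to say, "Hello")"
z == y  # TRUE
```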

---

## **String constants**: Sets of common strings

```r
letters
```

```
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

```r
LETTERS
```

```
#>  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
```

---

## **String constants**: Sets of common strings

```r
month.name
```

```
#>  [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"      "August"    "September" "October"   "November"  "December"
```

```r
month.abb
```

```
#>  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
```

---

## The **stringr** library has a few _longer_ string constants:
### `fruit`, `words`, `sentences`

```r
length(fruit)
```

```
#> [1] 80
```

```r
fruit[1:4]
```

```
#> [1] "apple"   "apricot" "avocado" "banana"
```

```r
length(words)
```

```
#> [1] 980
```

```r
words[1:4]
```

```
#> [1] "a"        "able"     "about"    "absolute"
```

```r
length(sentences)
```

```
#> [1] 720
```

```r
sentences[1:4]
```

```
#> [1] "The birch canoe slid on the smooth planks."  "Glue the sheet to the dark blue background." "It's easy to tell the depth of a well."      "These days a chicken leg is a rare dish."
```

---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. .orange[Case conversion & substrings]
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

## Case conversion & substrings

|Function         |  Description                            |
|:----------------|:----------------------------------------|
|`str_to_lower()` | converts a string to lower case         |
|`str_to_upper()` | converts a string to upper case         |
|`str_to_title()` | converts a string to title case         |
|`str_length()`   | counts the number of characters         |
|`str_sub()`      | extracts (or replaces) a substring      |
|`str_locate()`   | returns start and end indices of a match|
|`str_dup()`      | repeats a string                        |

---

# Case conversion

```r
x <- "Want to hear a joke about paper? Never mind, it's tearable."
```

```r
str_to_lower(x)
```

```
#> [1] "want to hear a joke about paper? never mind, it's tearable."
```

```r
str_to_upper(x)
```

```
#> [1] "WANT TO HEAR A JOKE ABOUT PAPER? NEVER MIND, IT'S TEARABLE."
```

```r
str_to_title(x)
```

```
#> [1] "Want To Hear A Joke About Paper? Never Mind, It's Tearable."
```

---

# Comparing strings

Case matters:

```r
a <- "Apples"
b <- "apples"
a == b
```

```
#> [1] FALSE
```

To compare only the text and ignore case, convert both strings to the same case _before_ comparing:

```r
str_to_lower(a) == str_to_lower(b)
```

```
#> [1] TRUE
```

```r
str_to_upper(a) == str_to_upper(b)
```

```
#> [1] TRUE
```
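If you compare strings case-insensitively often, you could wrap the idea in a small function. `sameText()` below is a hypothetical helper for illustration, not a stringr function:

```r
library(stringr)

# Hypothetical helper (not part of stringr): compare text while ignoring case
sameText <- function(a, b) {
  str_to_lower(a) == str_to_lower(b)
}

sameText("Apples", "apples")  # TRUE
sameText("Apples", "pears")   # FALSE

# Works element-wise on vectors, like the stringr function it wraps
sameText(c("Apples", "PEARS"), c("apples", "pears"))  # TRUE TRUE
```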

---

# Get the number of characters in a string

The `length()` function returns the _vector_ length:

```r
length("hello world")
```

```
#> [1] 1
```

To get the number of characters, use `str_length()`:

```r
str_length("hello world")
```

```
#> [1] 11
```

```r
str_length(" ") # Spaces count
```

```
#> [1] 1
```

```r
str_length("")  # Empty string
```

```
#> [1] 0
```

---

## Access characters by their index with `str_sub()`

Indices start at 1:

```r
str_sub("Apple", 1, 3)
```

```
#> [1] "App"
```

Negative numbers count backwards from the end:

```r
str_sub("Apple", -3, -1)
```

```
#> [1] "ple"
```

Replace part of a string by assigning to `str_sub()`:

```r
x <- 'abcdef'
str_sub(x, 1, 3) <- 'ABC'
x
```

```
#> [1] "ABCdef"
```
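A couple more `str_sub()` behaviors worth knowing (an added sketch): `end` defaults to the last character, and the replacement text can be a different length than the substring it replaces.

```r
library(stringr)

# Leave out `end` to go all the way to the last character
str_sub("Apple", 2)   # "pple"
str_sub("Apple", -1)  # "e" (just the last character)

# The replacement can be shorter (or longer) than the part it replaces
x <- 'abcdef'
str_sub(x, 1, 3) <- 'Z'
x  # "Zdef"
```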

---

## Get the indices of substrings

Extract the substring `"Good"` from the following string:

```r
x <- 'thisIsGoodPractice'
```

**1)**: Use `str_locate()` to get<br>the **start** and **end** indices:

```r
indices <- str_locate(x, 'Good')
indices
```

```
#>      start end
#> [1,]     7  10
```

**2)**: Use `str_sub()` to get the substring:

```r
str_sub(x, indices[1], indices[2])
```

```
#> [1] "Good"
```
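Two optional refinements (added, not from the original slides): `str_locate()` returns a matrix with `start` and `end` columns, which you can index by name, and `str_sub()` also accepts that two-column matrix directly. If the pattern is missing, both indices are `NA`.

```r
library(stringr)

x <- 'thisIsGoodPractice'
indices <- str_locate(x, 'Good')

# Refer to the matrix columns by name for readability
str_sub(x, indices[, "start"], indices[, "end"])  # "Good"

# str_sub() also accepts the two-column matrix directly
str_sub(x, indices)  # "Good"

# No match: both indices are NA
str_locate(x, 'Bad')
```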

---

# Repeat a string with `str_dup()`

```r
str_dup("holla", 3)
```

```
#> [1] "hollahollaholla"
```

Note the difference with `rep()`:

```r
rep("holla", 3)
```

```
#> [1] "holla" "holla" "holla"
```
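An added sketch: `str_dup()` is vectorized over `times` too, and `rep()` gives the same result as `str_dup()` once you glue the copies together with `paste(collapse = "")` (`paste()` is covered later in this lecture).

```r
library(stringr)

# `times` is vectorized: each result gets its own number of copies
str_dup("ab", 1:3)  # "ab" "abab" "ababab"

# rep() makes separate copies; collapse = "" glues them into one string
paste(rep("holla", 3), collapse = "") == str_dup("holla", 3)  # TRUE
```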

---

# `stringr` functions work on vectors

```r
x <- c("apples", "oranges")
x
```

```
#> [1] "apples"  "oranges"
```

Get the first 3 letters in each string:

```r
str_sub(x, 1, 3)
```

```
#> [1] "app" "ora"
```

Repeat each string twice:

```r
str_dup(x, 2)
```

```
#> [1] "applesapples"   "orangesoranges"
```

---

# Quick practice:

Create this string object:

```r
x <- 'thisIsGoodPractice'
```

Then use **stringr** functions to transform `x` into the following strings:

- `'thisIsGood'`
- `'practice'`
- `'GOOD'`
- `'thisthisthis'`
- `'GOODGOODGOOD'`

**Hint**: You'll need these:

- `str_to_lower()`
- `str_to_upper()`
- `str_locate()`
- `str_sub()`
- `str_dup()`

---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. .orange[Padding, splitting, & merging]

### BREAK

### 4. Detecting & replacing

---

## Padding, splitting, & merging

|Function         |  Description                            |
|:----------------|:----------------------------------------|
|`str_trim()`     | removes leading and trailing whitespace |
|`str_pad()`      | pads a string to a fixed width          |
|`paste()`        | concatenates strings (base R)           |
|`str_split()`    | splits a string into pieces             |

---

# Remove excess white space with `str_trim()`

```r
x <- "         aStringWithSpace        "
x
```

```
#> [1] "         aStringWithSpace        "
```

```r
str_trim(x) # Trims both sides by default
```

```
#> [1] "aStringWithSpace"
```

```r
str_trim(x, side = "left") # Only trim left side
```

```
#> [1] "aStringWithSpace        "
```

```r
str_trim(x, side = "right") # Only trim right side
```

```
#> [1] "         aStringWithSpace"
```
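`str_trim()` only touches the ends of a string. stringr also has `str_squish()`, which trims the ends _and_ collapses runs of inner whitespace into a single space (an added example):

```r
library(stringr)

x <- "   too    many   spaces   "

str_trim(x)    # "too    many   spaces" (inner spaces untouched)
str_squish(x)  # "too many spaces" (inner runs collapsed to one space)
```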

---

## Add white space (or other characters) with `str_pad()`

```r
x <- "hello"
x
```

```
#> [1] "hello"
```

```r
str_pad(x, width = 10) # Inserts pad on left by default
```

```
#> [1] "     hello"
```

```r
str_pad(x, width = 10, side = "both") # Pad both sides
```

```
#> [1] "  hello   "
```

```r
str_pad(x, width = 10, side = "both", pad = '*') # Specify the pad
```

```
#> [1] "**hello***"
```
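A common real-world use of `str_pad()` (added sketch): zero-padding numbers such as IDs to a fixed width. Numbers are converted to strings first, and `str_pad()` never shortens a string that is already wider than `width`.

```r
library(stringr)

# Zero-pad numbers to width 3 (numbers are converted to strings first)
str_pad(c(1, 25, 300), width = 3, pad = "0")  # "001" "025" "300"

# Pad on the right with a custom character
str_pad("hello", width = 8, side = "right", pad = ".")  # "hello..."

# Strings already wider than `width` are returned unchanged
str_pad("hello", width = 3)  # "hello"
```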

---

# Combine strings into one string with `paste()`

```r
paste('x', 'y', 'z')
```

```
#> [1] "x y z"
```

Control the separator with the `sep` argument (default is `" "`):

```r
paste('x', 'y', 'z', sep = "-")
```

```
#> [1] "x-y-z"
```
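Two related base R tricks (an added sketch): `paste0()` is shorthand for `paste(..., sep = "")`, and `paste()` works element-wise on vectors, recycling shorter inputs.

```r
# paste0() is shorthand for paste(..., sep = "")
paste0('x', 'y', 'z')  # "xyz"

# paste() is vectorized: "file" is recycled to match 1:3
paste("file", 1:3, sep = "_")  # "file_1" "file_2" "file_3"
```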

---

# Combine strings into one string with `paste()`

Note the difference with _vectors_ of strings:

```r
paste(c('x', 'y', 'z'))
```

```
#> [1] "x" "y" "z"
```

To make a single string from a vector of strings, use `collapse`:

```r
paste(c('x', 'y', 'z'), collapse = "")
```

```
#> [1] "xyz"
```
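`sep` and `collapse` can be used together: `sep` joins the inputs element by element, then `collapse` joins those results into one string. stringr's `str_c()` takes the same two arguments, but its default `sep` is `""` (added sketch):

```r
library(stringr)

# sep pairs up elements; collapse then joins the pairs into one string
paste(c("a", "b"), c("1", "2"), sep = "-", collapse = "+")  # "a-1+b-2"

# stringr's str_c() uses the same arguments, with sep = "" by default
str_c(c('x', 'y', 'z'), collapse = "")  # "xyz"
```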

---

## Split a string into multiple strings with `str_split()`

```r
x <- 'This string has spaces-and-dashes'
x
```

```
#> [1] "This string has spaces-and-dashes"
```

```r
str_split(x, " ") # Split on the spaces
```

```
#> [[1]]
#> [1] "This"              "string"            "has"               "spaces-and-dashes"
```

```r
str_split(x, "-") # Split on the dashes
```

```
#> [[1]]
#> [1] "This string has spaces" "and"                    "dashes"
```
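If you only want the first few pieces, `str_split()` has an `n` argument that caps the number of pieces; the last piece keeps everything that's left (added sketch; the `[[1]]` pulls the vector out of the list that `str_split()` returns):

```r
library(stringr)

x <- 'This string has spaces-and-dashes'

# n = 2: split only at the first space; the rest stays together
str_split(x, " ", n = 2)[[1]]  # "This" "string has spaces-and-dashes"
```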

---

## What's with the `[[1]]` thing?

`str_split()` returns a `list` with one character vector per input string:

```r
x <- c('babble', 'scrabblebabble')
str_split(x, 'bb')
```

```
#> [[1]]
#> [1] "ba" "le"
#> 
#> [[2]]
#> [1] "scra" "leba" "le"
```

If you're only splitting one string, add `[[1]]` to get the first vector:

```r
str_split('hooray', 'oo')[[1]]
```

```
#> [1] "h"   "ray"
```
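When you split several strings at once, two base R functions help with the resulting list (added sketch): `lengths()` counts the pieces from each string, and `unlist()` flattens everything into one vector.

```r
library(stringr)

x <- c('babble', 'scrabblebabble')
pieces <- str_split(x, 'bb')

lengths(pieces)  # 2 3 (number of pieces from each string)
unlist(pieces)   # "ba" "le" "scra" "leba" "le"
```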

---

# Common splits (**memorize these!**)

Splitting on `""` breaks a string into _characters_:

```r
str_split("apples", "")[[1]]
```

```
#> [1] "a" "p" "p" "l" "e" "s"
```

Splitting on `" "` breaks a _sentence_ into words:

```r
x <- "If you want to view paradise, simply look around and view it"
str_split(x, " ")[[1]]
```

```
#>  [1] "If"        "you"       "want"      "to"        "view"      "paradise," "simply"    "look"      "around"    "and"       "view"      "it"
```
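Splitting on `" "` makes simple word counts easy. A minimal sketch, assuming words are separated by single spaces (punctuation stays attached, e.g. `"paradise,"`). The name `wordsInX` is just illustrative, chosen so stringr's `words` isn't overwritten:

```r
library(stringr)

x <- "If you want to view paradise, simply look around and view it"
wordsInX <- str_split(x, " ")[[1]]  # illustrative name, not a stringr object

length(wordsInX)         # 12 words
sum(wordsInX == "view")  # "view" appears 2 times
```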

---

## Quick practice:

Create the following objects:

```r
x <- 'this_is_good_practice'
y <- c('hello', 'world')
```

Use `stringr` functions to transform `x` and `y` into the following:

- `"hello world"`
- `"***hello world***"`
- `c("this", "is", "good", "practice")`
- `"this is good practice"`
- `"hello world, this is good practice"`

**Hint**: You'll need these:

- `str_trim()`
- `str_pad()`
- `paste()`
- `str_split()`

---

## Your turn

1) `reverseString(s)`: Write a function that returns the string `s` in reverse order.

- `reverseString("aWordWithCaps") == "spaChtiWdroWa"`
- `reverseString("abcde") == "edcba"`
- `reverseString("") == ""`

2) `isPalindrome(s)`: Write a function that returns `TRUE` if the string `s` is a [palindrome](https://en.wikipedia.org/wiki/Palindrome) and `FALSE` otherwise.

- `isPalindrome("abcba") == TRUE`
- `isPalindrome("abcb") == FALSE`
- `isPalindrome("321123") == TRUE`

---

# .fancy[Break]

---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. .orange[Detecting & replacing]

---

## Detecting & replacing

|Function         |  Description                            |
|:----------------|:----------------------------------------|
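|`str_sort()`     | sorts a string vector alphabetically    |
|`str_order()`    | returns the indices that sort a vector  |
|`str_detect()`   | detects whether a pattern is in a string|
|`str_count()`    | counts how many times a pattern appears |
|`str_replace()`  | replaces the first match of a pattern   |
|`str_replace_all()` | replaces all matches of a pattern    |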

---

## Sort string vectors alphabetically with `str_sort()`

```r
x <- c('Y', 'M', 'C', 'A')
x
```

```
#> [1] "Y" "M" "C" "A"
```

```r
str_sort(x)
```

```
#> [1] "A" "C" "M" "Y"
```

```r
str_sort(x, decreasing = TRUE)
```

```
#> [1] "Y" "M" "C" "A"
```
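`str_order()` (from the table above) returns the _positions_ that would sort a vector, so indexing with it reproduces `str_sort()`. Also note that numbers stored as strings sort character by character unless you set `numeric = TRUE` (added sketch; the `numeric` argument assumes a reasonably recent stringr):

```r
library(stringr)

x <- c('Y', 'M', 'C', 'A')

# str_order() returns the positions that would sort x
str_order(x)                             # 4 3 2 1
identical(x[str_order(x)], str_sort(x))  # TRUE

# Numbers stored as strings sort character by character by default
str_sort(c("10", "9", "2"))                  # "10" "2" "9"
str_sort(c("10", "9", "2"), numeric = TRUE)  # "2" "9" "10"
```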

---

### Detect pattern in string: `str_detect(string, pattern)`

```r
tenFruit <- fruit[1:10]
tenFruit
```

```
#>  [1] "apple"        "apricot"      "avocado"      "banana"       "bell pepper"  "bilberry"     "blackberry"   "blackcurrant" "blood orange" "blueberry"
```

```r
str_detect(tenFruit, "berry")
```

```
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE
```

How many strings in the vector contain `"berry"`?

```r
sum(str_detect(tenFruit, "berry"))
```

```
#> [1] 3
```
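Since `str_detect()` returns a logical vector, you can use it to subset the matches. `str_subset()` does both steps at once, and `negate = TRUE` flips the result (added sketch; `negate` requires stringr 1.4.0 or later):

```r
library(stringr)

tenFruit <- fruit[1:10]

# Use the logical vector to keep only the matches...
tenFruit[str_detect(tenFruit, "berry")]  # "bilberry" "blackberry" "blueberry"

# ...or let str_subset() do both steps at once
str_subset(tenFruit, "berry")

# negate = TRUE flips TRUE/FALSE, so this counts the non-matches
sum(str_detect(tenFruit, "berry", negate = TRUE))  # 7
```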

---

## Count number of times pattern appears in string

`str_count(string, pattern)`

Example:

```r
x <- c("apple", "banana", "pear")
str_count(x, "a")
```

```
#> [1] 1 3 1
```

Note the difference with `str_detect()`:

```r
str_detect(x, "a")
```

```
#> [1] TRUE TRUE TRUE
```
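One way to see how the two functions relate (added sketch): a string contains a pattern exactly when it has at least one match, so `str_detect()` gives the same result as `str_count() > 0`.

```r
library(stringr)

x <- c("apple", "banana", "pear")

# "Contains the pattern" is the same as "has at least one match"
identical(str_detect(x, "a"), str_count(x, "a") > 0)  # TRUE

# Total number of "a"s across the whole vector: 1 + 3 + 1
sum(str_count(x, "a"))  # 5
```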

---

## Detect if string _starts_ with pattern

Which fruits **start** with "a"?

```r
fiveFruit <- fruit[1:5]
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

**Wrong**:

```r
str_detect(fiveFruit, "a")
```

```
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```

**Right**:

```r
str_detect(fiveFruit, "^a")
```

```
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

---

# Detect if string _ends_ with pattern

Which fruits **end** with an "e"?

```r
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

**Wrong**:

```r
str_detect(fiveFruit, "e")
```

```
#> [1]  TRUE FALSE FALSE FALSE  TRUE
```

**Right**:

```r
str_detect(fiveFruit, "e$")
```

```
#> [1]  TRUE FALSE FALSE FALSE FALSE
```

---

## Remember:

### If you _start_ with power (`^`), you'll _end_ up with money (`$`).

```r
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

```r
str_detect(fiveFruit, "^a") # Start with power (^)
```

```
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

```r
str_detect(fiveFruit, "e$") # End with money ($)
```

```
#> [1]  TRUE FALSE FALSE FALSE FALSE
```
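stringr 1.4.0 and later also provide `str_starts()` and `str_ends()`, which do the same job as the `^` and `$` anchors. Using both anchors together matches the _whole_ string (added sketch):

```r
library(stringr)

fiveFruit <- fruit[1:5]

# Equivalent to the ^ and $ patterns (requires stringr >= 1.4.0)
identical(str_starts(fiveFruit, "a"), str_detect(fiveFruit, "^a"))  # TRUE
identical(str_ends(fiveFruit, "e"), str_detect(fiveFruit, "e$"))    # TRUE

# ^ and $ together: the string must be exactly "apple"
str_detect(fiveFruit, "^apple$")  # TRUE FALSE FALSE FALSE FALSE
```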

---

# Quick practice:

```r
fruit[1:5]
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

Use `stringr` functions to answer the following questions about the `fruit` vector:

1. How many fruits contain the string `"rr"`?
2. Which fruits end with the string `"fruit"`?
3. Which fruits contain more than one `"o"` character?

**Hint**: You'll need to use `str_detect()` and `str_count()`

---

# Replace matched patterns with a new string

`str_replace(string, pattern, replacement)`

```r
x <- c("apple", "pear", "banana")
```

```r
str_replace(x, "a", "-") # Only replaces the first match
```

```
#> [1] "-pple"  "pe-r"   "b-nana"
```

```r
str_replace_all(x, "a", "-") # Replaces all matches
```

```
#> [1] "-pple"  "pe-r"   "b-n-n-"
```
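One gotcha (added sketch): the `pattern` is a regular expression, so some characters have special meanings. For example, `.` matches _any_ character. Wrap the pattern in `fixed()` or escape it to match it literally, and use `str_remove_all()` as a shortcut for replacing with `""`:

```r
library(stringr)

y <- "a.b.c"

# "." is a regex wildcard, so EVERY character gets replaced
str_replace_all(y, ".", "-")         # "-----"

# Match a literal dot with fixed() or the escaped regex "\\."
str_replace_all(y, fixed("."), "-")  # "a-b-c"
str_replace_all(y, "\\.", "-")       # "a-b-c"

# str_remove_all() is shorthand for replacing matches with ""
str_remove_all(y, fixed("."))        # "abc"
```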

---

### Quick practice redux

```r
x <- 'this_is_good_practice'
```

Convert `x` into: `"this is good practice"`

We did this earlier:

```r
paste(str_split(x, "_")[[1]], collapse = " ")
```

```
#> [1] "this is good practice"
```

But now we can do this!

```r
str_replace_all(x, "_", " ")
```

```
#> [1] "this is good practice"
```

---

## Your turn

1) `sortString(s)`: Write the function `sortString(s)` that takes a string `s` and returns a string with the same characters sorted alphabetically.

- `sortString("cba") == "abc"`
- `sortString("abedhg") == "abdegh"`
- `sortString("AbacBc") == "aAbBcc"`

2) `areAnagrams(s1, s2)`: Write the function `areAnagrams(s1, s2)` that takes two strings, `s1` and `s2`, and returns `TRUE` if the strings are [anagrams](https://en.wikipedia.org/wiki/Anagram), and `FALSE` otherwise. **Treat lower and upper case as the same letters**.

- `areAnagrams("", "") == TRUE`
- `areAnagrams("aabbccdd", "bbccddee") == FALSE`
- `areAnagrams("TomMarvoloRiddle", "IAmLordVoldemort") == TRUE`

---

### Homeworks

- Deadline to submit homeworks 1 - 7: **March 10**

### Midterm Review

- We'll hold a one-hour review via Zoom next week.