Strings


# Week 7: .fancy[Strings]

### EMSE 4571: Intro to Programming for Analytics
### John Paul Helveston
### March 02, 2023


---

# Quiz 4

## Write your name on the quiz!

## Rules:

- Work alone; no outside help of any kind is allowed.
- No calculators, no notes, no books, no computers, no phones.


---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

# Week 7: .fancy[Strings]

### 1. .orange[Making strings]
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

## Install the `stringr` library

```r
install.packages("stringr")
```

(Only do this once...and you already did this in HW 2)


<br>

## Load the `stringr` library

```r
library(stringr)
```

(Do this every time you use the package)


---

## Make a string with 'single' or "double" quotes

```r
cat("This is a string")
```

```
#> This is a string
```

```r
cat('This is a string')
```

```
#> This is a string
```

---

## Use single vs. double quotes where it makes sense

Use double quotes when `'` is in the string

```r
cat("It's great!")
```

```
#> It's great!
```

Use single quotes when `"` is in the string

```r
cat('I said, "Hello"')
```

```
#> I said, "Hello"
```

---

# What if a string has both `'` and `"` symbols?

Example: `It's nice to say, "Hello"`

```r
cat("It's nice to say, "Hello"")
```

```
#> Error: <text>:1:25: unexpected symbol
#> 1: cat("It's nice to say, "Hello
#>                             ^
```


```r
cat('It's nice to say, "Hello"')
```

```
#> Error: <text>:1:9: unexpected symbol
#> 1: cat('It's
#>             ^
```


---

# "Escaping" to the rescue!

### Use the `\` symbol to "escape" a literal symbol

```r
cat("It's nice to say, \"Hello\"") # Double quote
```

```
#> It's nice to say, "Hello"
```

```r
cat('It\'s nice to say, "Hello"') # Single quote
```

```
#> It's nice to say, "Hello"
```

---

## Commonly escaped symbols

New line: `\n`

```r
cat('This\nthat') 
```

```
#> This
#> that
```

Tab space: `\t`

```r
cat('This\tthat') 
```

```
#> This	that
```


Backslash: `\\`

```r
cat('This\\that')
```

```
#> This\that
```


---

## **String constants**: Sets of common strings

```r
letters
```

```
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

```r
LETTERS
```

```
#>  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
```

---

## **String constants**: Sets of common strings

```r
month.name
```

```
#>  [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"      "August"    "September" "October"   "November"  "December"
```

```r
month.abb
```

```
#>  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
```

---

## The **stringr** library has a few _longer_ string constants:
### `fruit`, `words`, `sentences`

```r
length(fruit)
```

```
#> [1] 80
```

```r
fruit[1:4]
```

```
#> [1] "apple"   "apricot" "avocado" "banana"
```

```r
length(words)
```

```
#> [1] 980
```

```r
words[1:4]
```

```
#> [1] "a"        "able"     "about"    "absolute"
```


```r
length(sentences)
```

```
#> [1] 720
```

```r
sentences[1:4]
```

```
#> [1] "The birch canoe slid on the smooth planks."  "Glue the sheet to the dark blue background." "It's easy to tell the depth of a well."      "These days a chicken leg is a rare dish."
```


---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. .orange[Case conversion & substrings]
### 3. Padding, splitting, & merging

### BREAK

### 4. Detecting & replacing

---

## Case conversion & substrings

|Function         |  Description                            |
|:----------------|:----------------------------------------|
|`str_to_lower()` | converts string to lower case           |
|`str_to_upper()` | converts string to upper case           |
|`str_to_title()` | converts string to title case           |
|`str_length()`   | number of characters                    |
|`str_sub()`      | extracts substrings                     |
|`str_locate()`   | returns indices of substrings           |
|`str_dup()`      | repeats a string                        |

---

# Case conversion

```r
x <- "Want to hear a joke about paper? Never mind, it's tearable."
```

```r
str_to_lower(x)
```

```
#> [1] "want to hear a joke about paper? never mind, it's tearable."
```

```r
str_to_upper(x)
```

```
#> [1] "WANT TO HEAR A JOKE ABOUT PAPER? NEVER MIND, IT'S TEARABLE."
```

```r
str_to_title(x)
```

```
#> [1] "Want To Hear A Joke About Paper? Never Mind, It's Tearable."
```

---

# Comparing strings

Case matters:

```r
a <- "Apples"
b <- "apples"
a == b
```

```
#> [1] FALSE
```


Convert case _before_ comparing if you want to ignore case:

```r
str_to_lower(a) == str_to_lower(b)
```

```
#> [1] TRUE
```

```r
str_to_upper(a) == str_to_upper(b)
```

```
#> [1] TRUE
```
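
If you compare strings this way often, one option is to wrap the conversion in a small function. The name `sameText()` is hypothetical, not part of **stringr**:

```r
library(stringr)

# Hypothetical helper: compare two strings while ignoring case
sameText <- function(a, b) {
    return(str_to_lower(a) == str_to_lower(b))
}

sameText("Apples", "apples")  # TRUE: same letters, different case
sameText("Apples", "Apple")   # FALSE: different letters
```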


---

# Get the number of characters in a string

The `length()` function returns the _vector_ length:

```r
length("hello world")
```

```
#> [1] 1
```


To get the number of characters, use `str_length()`:

```r
str_length("hello world")
```

```
#> [1] 11
```

```r
str_length(" ") # Spaces count
```

```
#> [1] 1
```

```r
str_length("")  # Empty string
```

```
#> [1] 0
```


---

## Access characters by their index with `str_sub()`

Indices start at 1:

```r
str_sub("Apple", 1, 3)
```

```
#> [1] "App"
```

Negative numbers count backwards from end:

```r
str_sub("Apple", -3, -1)
```

```
#> [1] "ple"
```


Modify a string with `str_sub()`:

```r
x <- 'abcdef'
str_sub(x, 1, 3) <- 'ABC'
x
```

```
#> [1] "ABCdef"
```


---

## Get the indices of substrings

Extract the substring `"Good"` from the following string:

```r
x <- 'thisIsGoodPractice'
```

**1)**: Use `str_locate()` to get<br>the **start** and **end** indices:

```r
indices <- str_locate(x, 'Good')
indices
```

```
#>      start end
#> [1,]     7  10
```


**2)**: Use `str_sub()` to get the substring:

```r
str_sub(x, indices[1], indices[2])
```

```
#> [1] "Good"
```
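
Two details worth knowing about these two steps, sketched here with illustrative variable names (`found`, `noMatch`, `notFound`): `str_sub()` can take the `str_locate()` matrix directly, and a pattern that is not found gives `NA` indices.

```r
library(stringr)

x <- 'thisIsGoodPractice'

# Pass the start/end matrix from str_locate() straight to str_sub()
found <- str_sub(x, str_locate(x, 'Good'))  # "Good"

# If the pattern is not in the string, both indices are NA...
noMatch <- str_locate(x, 'Bad')

# ...and extracting with NA indices returns NA
notFound <- str_sub(x, noMatch[1], noMatch[2])
```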


---

# Repeat a string with `str_dup()`

```r
str_dup("holla", 3)
```

```
#> [1] "hollahollaholla"
```

Note the difference with `rep()`:

```r
rep("holla", 3)
```

```
#> [1] "holla" "holla" "holla"
```

---

# `stringr` functions work on vectors

```r
x <- c("apples", "oranges")
x
```

```
#> [1] "apples"  "oranges"
```

Get the first 3 letters in each string:

```r
str_sub(x, 1, 3)
```

```
#> [1] "app" "ora"
```


Repeat each string twice:

```r
str_dup(x, 2)
```

```
#> [1] "applesapples"   "orangesoranges"
```


---

# Quick practice

Create this string object:

```r
x <- 'thisIsGoodPractice'
```

Then use **stringr** functions to transform `x` into the following strings:

- `'thisIsGood'`
- `'practice'`
- `'GOOD'`
- `'thisthisthis'`
- `'GOODGOODGOOD'`


**Hint**: You'll need these:

- `str_to_lower()`
- `str_to_upper()`
- `str_locate()`
- `str_sub()`
- `str_dup()`


---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. .orange[Padding, splitting, & merging]

### BREAK

### 4. Detecting & replacing

---

## Padding, splitting, & merging

|Function         |  Description                            |
|:----------------|:----------------------------------------|
|`str_trim()`     | removes leading and trailing whitespace |
|`str_pad()`      | pads a string                           |
|`paste()`        | string concatenation                    |
|`str_split()`    | split a string into a vector            |

---

# Remove excess white space with `str_trim()`

```r
x <- "         aStringWithSpace        "
x
```

```
#> [1] "         aStringWithSpace        "
```

```r
str_trim(x) # Trims both sides by default
```

```
#> [1] "aStringWithSpace"
```

```r
str_trim(x, side = "left") # Only trim left side
```

```
#> [1] "aStringWithSpace        "
```

```r
str_trim(x, side = "right") # Only trim right side
```

```
#> [1] "         aStringWithSpace"
```

---

## Add white space (or other characters) with `str_pad()`

```r
x <- "hello"
x
```

```
#> [1] "hello"
```

```r
str_pad(x, width = 10) # Inserts pad on left by default
```

```
#> [1] "     hello"
```

```r
str_pad(x, width = 10, side = "both") # Pad both sides
```

```
#> [1] "  hello   "
```

```r
str_pad(x, width = 10, side = "both", pad = '*') # Specify the pad
```

```
#> [1] "**hello***"
```

---

# Combine strings into one string with `paste()`

```r
paste('x', 'y', 'z')
```

```
#> [1] "x y z"
```

Control separation with the `sep` argument (default is `" "`):

```r
paste('x', 'y', 'z', sep = "-")
```

```
#> [1] "x-y-z"
```

---

# Combine strings into one string with `paste()`

Note the difference with _vectors_ of strings:

```r
paste(c('x', 'y', 'z'))
```

```
#> [1] "x" "y" "z"
```

To make a single string from a vector of strings, use `collapse`:

```r
paste(c('x', 'y', 'z'), collapse = "")
```

```
#> [1] "xyz"
```
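
The `sep` and `collapse` arguments can also be used together. In this sketch (with illustrative vectors `first` and `second`), `sep` joins the vectors element by element, then `collapse` merges the results into one string:

```r
first <- c('a', 'b', 'c')
second <- c('x', 'y', 'z')

# sep goes between the paired elements: still a vector of 3 strings
pairwise <- paste(first, second, sep = "-")  # "a-x" "b-y" "c-z"

# collapse then merges that vector into a single string
joined <- paste(first, second, sep = "-", collapse = ", ")  # "a-x, b-y, c-z"
```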

---

## Split a string into multiple strings with `str_split()`

```r
x <- 'This string has spaces-and-dashes'
x
```

```
#> [1] "This string has spaces-and-dashes"
```

```r
str_split(x, " ") # Split on the spaces
```

```
#> [[1]]
#> [1] "This"              "string"            "has"               "spaces-and-dashes"
```

```r
str_split(x, "-") # Split on the dashes
```

```
#> [[1]]
#> [1] "This string has spaces" "and"                    "dashes"
```

---

## What's with the `[[1]]` thing?

`str_split()` returns a `list` of vectors, one for each input string:

```r
x <- c('babble', 'scrabblebabble')
str_split(x, 'bb')
```

```
#> [[1]]
#> [1] "ba" "le"
#> 
#> [[2]]
#> [1] "scra" "leba" "le"
```

If you're only splitting one string, add `[[1]]` to get the first vector:

```r
str_split('hooray', 'oo')[[1]]
```

```
#> [1] "h"   "ray"
```
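
When you split more than one string, you can pull out any element of the list with `[[ ]]`, or use the base R function `lengths()` to count the pieces in each one. A minimal sketch with made-up strings:

```r
library(stringr)

x <- c('one two three', 'four five')
pieces <- str_split(x, ' ')

pieces[[2]]      # "four" "five": the pieces of the second string
lengths(pieces)  # 3 2: number of pieces in each string
```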

---

# Common splits (**memorize these!**)

Splitting on `""` breaks a string into _characters_:

```r
str_split("apples", "")[[1]]
```

```
#> [1] "a" "p" "p" "l" "e" "s"
```

Splitting on `" "` breaks a _sentence_ into words:

```r
x <- "If you want to view paradise, simply look around and view it"
str_split(x, " ")[[1]]
```

```
#>  [1] "If"        "you"       "want"      "to"        "view"      "paradise," "simply"    "look"      "around"    "and"       "view"      "it"
```

---

## Quick practice:

Create the following objects:

```r
x <- 'this_is_good_practice'
y <- c('hello', 'world')
```

Use `stringr` functions and `paste()` to transform `x` and `y` into the following:

- `"hello world"`
- `"***hello world***"`
- `c("this", "is", "good", "practice")`
- `"this is good practice"`
- `"hello world, this is good practice"`


**Hint**: You'll need these:

- `str_trim()`
- `str_pad()`
- `paste()`
- `str_split()`


---

## Your turn

1) `reverseString(s)`

Write a function that returns the string `s` in reverse order.

- `reverseString("aWordWithCaps") == "spaChtiWdroWa"`
- `reverseString("abcde") == "edcba"`
- `reverseString("") == ""`

2) `isPalindrome(s)`

Write a function that returns `TRUE` if the string `s` is a [Palindrome](https://en.wikipedia.org/wiki/Palindrome) and `FALSE` otherwise.

- `isPalindrome("abcba") == TRUE`
- `isPalindrome("abcb") == FALSE`
- `isPalindrome("321123") == TRUE`

---

# .fancy[Break]

---

# Week 7: .fancy[Strings]

### 1. Making strings
### 2. Case conversion & substrings
### 3. Padding, splitting, & merging

### BREAK

### 4. .orange[Detecting & replacing]

---

## Detecting & replacing

|Function         |  Description                            |
|:----------------|:----------------------------------------|
|`str_sort()`     | sorts a vector of strings alphabetically |
|`str_order()`    | gets the indices that would sort a vector |
|`str_detect()`   | detects a pattern in a string            |
|`str_replace()`  | replaces a pattern in a string           |

---

## Sort string vectors alphabetically with `str_sort()`

```r
x <- c('Y', 'M', 'C', 'A')
x
```

```
#> [1] "Y" "M" "C" "A"
```

```r
str_sort(x)
```

```
#> [1] "A" "C" "M" "Y"
```

```r
str_sort(x, decreasing = TRUE)
```

```
#> [1] "Y" "M" "C" "A"
```
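
`str_order()` is the companion to `str_sort()`: instead of the sorted strings, it returns the _indices_ that would sort them. A short sketch (the name `idx` is only for illustration):

```r
library(stringr)

x <- c('Y', 'M', 'C', 'A')

idx <- str_order(x)  # 4 3 2 1: "A" is at index 4, "C" at index 3, ...
x[idx]               # "A" "C" "M" "Y": same result as str_sort(x)
```

This is handy when you want to reorder a second vector to match the sorted order of the first.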

---

### Detect pattern in string: `str_detect(string, pattern)`

```r
tenFruit <- fruit[1:10]
tenFruit
```

```
#>  [1] "apple"        "apricot"      "avocado"      "banana"       "bell pepper"  "bilberry"     "blackberry"   "blackcurrant" "blood orange" "blueberry"
```

```r
str_detect(tenFruit, "berry")
```

```
#>  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE
```

How many strings in the vector contain `"berry"`?

```r
sum(str_detect(tenFruit, "berry"))
```

```
#> [1] 3
```
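
Because `str_detect()` returns a logical vector, it can also be used inside `[ ]` to keep only the matching strings. In this sketch the ten fruits are typed out so the example runs on its own:

```r
library(stringr)

# Same values as fruit[1:10]
tenFruit <- c("apple", "apricot", "avocado", "banana", "bell pepper",
              "bilberry", "blackberry", "blackcurrant", "blood orange",
              "blueberry")

# Keep only the strings where the pattern was detected
hasBerry <- str_detect(tenFruit, "berry")
tenFruit[hasBerry]  # "bilberry" "blackberry" "blueberry"
```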

---

## Count number of times pattern appears in string

`str_count(string, pattern)`

Example:

```r
x <- c("apple", "banana", "pear")
str_count(x, "a")
```

```
#> [1] 1 3 1
```

Note the difference with `str_detect()`:

```r
str_detect(x, "a")
```

```
#> [1] TRUE TRUE TRUE
```

---

## Detect if string _starts_ with pattern

Which fruits **start** with "a"?

```r
fiveFruit <- fruit[1:5]
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

**Wrong**:

```r
str_detect(fiveFruit, "a")
```

```
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```


**Right**:

```r
str_detect(fiveFruit, "^a")
```

```
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```


---

# Detect if string _ends_ with pattern

Which fruits **end** with an "e"?

```r
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

**Wrong**:

```r
str_detect(fiveFruit, "e")
```

```
#> [1]  TRUE FALSE FALSE FALSE  TRUE
```


**Right**:

```r
str_detect(fiveFruit, "e$")
```

```
#> [1]  TRUE FALSE FALSE FALSE FALSE
```


---

## Remember:

### If you _start_ with power (`^`), you'll _end_ up with money (`$`).

```r
fiveFruit
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

```r
str_detect(fiveFruit, "^a") # Start with power (^)
```

```
#> [1]  TRUE  TRUE  TRUE FALSE FALSE
```

```r
str_detect(fiveFruit, "e$") # End with money ($)
```

```
#> [1]  TRUE FALSE FALSE FALSE FALSE
```
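
The two checks can be combined with `&` to find strings that both start _and_ end with given letters. A sketch, with the five fruits typed out so it runs on its own:

```r
library(stringr)

# Same values as fruit[1:5]
fiveFruit <- c("apple", "apricot", "avocado", "banana", "bell pepper")

startsA <- str_detect(fiveFruit, "^a")  # Start with power (^)
endsE   <- str_detect(fiveFruit, "e$")  # End with money ($)

# Keep the fruits where both conditions are TRUE
fiveFruit[startsA & endsE]  # "apple"
```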

---

# Quick practice:

```r
fruit[1:5]
```

```
#> [1] "apple"       "apricot"     "avocado"     "banana"      "bell pepper"
```

Use `stringr` functions to answer the following questions about the `fruit` vector:

1. How many fruits contain the string `"rr"`?
2. Which fruits end with the string `"fruit"`?
3. Which fruits contain more than one `"o"` character?

**Hint**: You'll need to use `str_detect()` and `str_count()`

---

# Replace matched strings with new string

`str_replace(string, pattern, replacement)`

```r
x <- c("apple", "pear", "banana")
```

```r
str_replace(x, "a", "-") # Only replaces the first match
```

```
#> [1] "-pple"  "pe-r"   "b-nana"
```

```r
str_replace_all(x, "a", "-") # Replaces all matches
```

```
#> [1] "-pple"  "pe-r"   "b-n-n-"
```
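
One thing to watch for: the pattern is treated as a regular expression, so symbols like `.` have a special meaning (just as `^` and `$` do). The sketch below shows two ways to match a literal period, either by escaping it or by wrapping the pattern in the **stringr** function `fixed()`:

```r
library(stringr)

x <- "a.b.c"

str_replace_all(x, ".", "-")         # "-----": "." matches ANY character
str_replace_all(x, "\\.", "-")       # "a-b-c": escaped, matches only periods
str_replace_all(x, fixed("."), "-")  # "a-b-c": fixed() matches the exact text
```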

---

### Quick practice redux

```r
x <- 'this_is_good_practice'
```

Convert `x` into: `"this is good practice"`

We did this earlier:

```r
paste(str_split(x, "_")[[1]], collapse = " ")
```

```
#> [1] "this is good practice"
```

But now we can do this!

```r
str_replace_all(x, "_", " ")
```

```
#> [1] "this is good practice"
```

---

## Your turn

1) `sortString(s)`: Write the function `sortString(s)` that takes a string `s` and returns an alphabetically sorted string.

- `sortString("cba") == "abc"`
- `sortString("abedhg") == "abdegh"`
- `sortString("AbacBc") == "aAbBcc"`

2) `areAnagrams(s1, s2)`: Write the function `areAnagrams(s1, s2)` that takes two strings, `s1` and `s2`, and returns `TRUE` if the strings are [anagrams](https://en.wikipedia.org/wiki/Anagram), and `FALSE` otherwise. **Treat lower and upper case as the same letters**.

- `areAnagrams("", "") == TRUE`
- `areAnagrams("aabbccdd", "bbccddee") == FALSE`
- `areAnagrams("TomMarvoloRiddle", "IAmLordVoldemort") == TRUE`


---

### [Homework 7](https://p4a.seas.gwu.edu/2023-Spring/hw/7-strings.html)

- Deadline extended to the Wednesday after the midterm: **March 14**

### Midterm

- In class next week. 
- 100 minutes (finish by 2:25pm). 
- You can bring a single 8.5 x 11 sheet of paper (front & back) with anything on it.
- You must turn in your note sheet with your exam (I'll give it back after grading).