Homework 8 - Data Frames
Due: Mar 26 by 11:59pm
Weight: This assignment is worth 4% of your final grade.
Purpose: The purposes of this assignment are:
- To practice creating data frames in R.
- To practice merging and slicing data frames in R.
Assessment: Each question indicates the % of the assignment grade, summing to 100%. The credit for each question will be assigned as follows:
- 0% for not attempting a response.
- 50% for attempting the question but with major errors.
- 75% for attempting the question but with minor errors.
- 100% for correctly answering the question.
The reflection portion is always worth 10% and graded for completion.
Rules:
- Problems marked SOLO may not be worked on with other classmates, though you may consult instructors for help.
- For problems marked COLLABORATIVE, you may work in groups of up to 3 students who are in this course this semester. You may not split up the work: everyone must work on every problem. You may not simply copy code from one another; work through each problem together, then submit your own solution.
Readings
The readings from the last week will serve as a helpful reference as you complete this assignment. You can review them here:
1) Staying organized [SOLO, 5%]
Download and use this template for your assignment. Inside the “hw8” folder, open and edit the R script called hw8.R and fill out your name, GW netID, and the names of anyone you worked with on this assignment.
Using good style
For this assignment, you must use good style to receive full credit. Follow the best practices described in this style guide.
2) Inspect package data [SOLO, 15%]
Write R code to install the dslabs package from CRAN, then write code to load the package. Using some of the techniques we saw in class, write code to preview and inspect the movielens data frame, which becomes available when you load the package. For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What is this dataset about?
- How many observations are in the data frame?
- What is the original source of the data?
- What type of data is each variable?
- What are the years of the earliest and most recent observations in the data set?
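As a reminder of the kind of inspection code this question has in mind, here is a minimal sketch. It deliberately uses R's built-in mtcars data frame rather than movielens, so it illustrates the techniques and is not an answer; the package name in the commented lines is a placeholder.

```r
# Illustration only: inspects the built-in mtcars data frame, NOT movielens.
# Installing and loading a package follows this pattern
# ("somepackage" is a placeholder name, so these lines are commented out):
# install.packages("somepackage")
# library(somepackage)

head(mtcars)                         # preview the first six rows
dim(mtcars)                          # number of rows and columns

n_obs <- nrow(mtcars)                # number of observations (rows)
n_obs

str(mtcars)                          # compact overview: each variable and its type
col_types <- sapply(mtcars, class)   # class of every column, as a named vector
col_types

mpg_range <- range(mtcars$mpg)       # smallest and largest value of one variable
mpg_range

# ?mtcars                            # the help page describes a dataset and its source
```

For data that ships with a package, the help page (`?name_of_dataset`) is usually the place to look for a description of the data and its original source.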
3) Answer questions about the data [COLLABORATIVE, 25%]
For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What are the minimum, mean, and maximum ratings in the data set?
- How many observations received the maximum rating?
- What percentage of total observations received the maximum rating?
- What is the title of the observation with the longest title (in terms of the number of letters in the title)?
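The questions above rely on a few common patterns: summary statistics on a column, counting rows that meet a condition, and measuring string length. The sketch below shows those patterns on a small data frame invented for illustration; the `films` object and its values are made up and have nothing to do with movielens.

```r
# Toy data invented for illustration (not taken from movielens)
films <- data.frame(
  title  = c("Up", "Heat", "Casablanca", "Alien", "Jaws"),
  rating = c(4.0, 5.0, 5.0, 3.5, 4.5),
  stringsAsFactors = FALSE
)

# Summary statistics for a numeric column
rating_min  <- min(films$rating)
rating_mean <- mean(films$rating)
rating_max  <- max(films$rating)

# A logical comparison yields TRUE/FALSE; sum() counts the TRUEs
n_max <- sum(films$rating == rating_max)

# Share of all rows that meet the condition, as a percentage
pct_max <- 100 * n_max / nrow(films)

# nchar() counts every character in a string (spaces included);
# which.max() returns the position of the first largest value
title_lengths <- nchar(films$title)
longest_title <- films$title[which.max(title_lengths)]
```

Note that `nchar()` counts spaces and punctuation as well as letters, and that `which.max()` returns only the first match when there is a tie, so check whether either matters for your data.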
4) Loading and inspecting external data [SOLO, 20%]
Write R code to read in the prisoners2019.csv file located in the data folder. Store the object as df. Write some code to preview and inspect the df data frame using some of the techniques we saw in class. For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- What do you think this dataset is about?
- How many observations are in the data frame?
- What type of data is each variable?
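As a refresher on reading a CSV file into a data frame, here is a minimal, self-contained sketch. It writes a tiny made-up CSV to a temporary file and reads it back, so the file contents, the `toy` object, and its column names are all assumptions for illustration and not the assignment's data.

```r
# Write a small, made-up CSV to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("region,population", "A,100", "B,250", "C,75"), csv_path)

# For a file inside a project subfolder, a relative path can be built with
# file.path(), e.g. file.path("data", "example.csv") (hypothetical file name)
toy <- read.csv(csv_path, stringsAsFactors = FALSE)

head(toy)                 # preview the first rows
n_rows <- nrow(toy)       # number of observations
n_rows
str(toy)                  # variables and their types
sapply(toy, class)        # class of each column
```

Relative paths like the one in the comment are resolved from the working directory, which is the project folder when you open the assignment through its R project file.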
5) Answer questions about the data [COLLABORATIVE, 25%]
For each of the following questions, write code to find your answer and leave a detailed response in a comment:
- Which states have the highest and lowest total prison population?
- Which states have the highest and lowest total prison population as a percentage of the total state population?
- According to the 2020 U.S. Census, only 12.4% of the U.S. population is black, but some states have imprisoned more black people than people of any other race. Which states fit this description?
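Questions like these usually come down to three moves: finding the row with the largest or smallest value, computing a new column as a percentage, and filtering rows by comparing columns. The sketch below shows each move on invented data; the `regions` data frame and every column name in it are hypothetical and will not match the columns in the assignment's file.

```r
# Toy data invented for illustration; all column names are hypothetical
regions <- data.frame(
  region        = c("A", "B", "C"),
  population    = c(1000, 5000, 2000),
  group_x_count = c(10, 40, 30),
  group_y_count = c(20, 35, 10),
  stringsAsFactors = FALSE
)

# New column built from existing columns
regions$total_count <- regions$group_x_count + regions$group_y_count

# Row with the largest / smallest value of a column
highest_total <- regions$region[which.max(regions$total_count)]
lowest_total  <- regions$region[which.min(regions$total_count)]

# New column expressed as a percentage of another column
regions$pct_of_population <- 100 * regions$total_count / regions$population
highest_pct <- regions$region[which.max(regions$pct_of_population)]
lowest_pct  <- regions$region[which.min(regions$pct_of_population)]

# Keep only the rows where one column exceeds another
x_largest <- regions$region[regions$group_x_count > regions$group_y_count]
```

With more than two groups, the comparison in the last step needs to hold against every other group's column, for example by combining several conditions with `&`.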
6) Read and reflect [SOLO, 10%]
Read and reflect on the following readings to preview what we will be covering next:
Afterwards, reflect on what you’ve learned while going through these readings and exercises. Is there anything that jumped out at you? Anything you found particularly interesting or confusing?
In a comment (#) in your .R file, write at least a paragraph about your thoughts, and include at least one question. This can be on what you’ve learned and any questions or points of confusion you have about what we’ve covered thus far. This can be related to this assignment, next week’s readings, things going on in the world that remind you of something from class, etc. If there’s anything that jumped out at you, write it down.
Some prompts you may want to try in your reflection:
- “I used to think ______, now I think ______ 🤔”
- Discuss some of the key insights or things you found interesting in the readings or recent class periods.
- Connect the course content to your own work or projects you’re working on.
Submit
Create a zip file of all the files in your R project folder for this assignment, then submit your zip file on the corresponding assignment submission on Blackboard.