Homework 10 - Data Visualization

Due: Apr 09 by 11:59pm

Weight: This assignment is worth 4% of your final grade.

Purpose: The purposes of this assignment are:

  • To practice exploring and data frames in R using the dplyr package
  • To practice generating charts using the ggplot2 package

Assessment: Each question indicates the % of the assignment grade, summing to 100%. The credit for each question will be assigned as follows:

  • 0% for not attempting a response.
  • 50% for attempting the question but with major errors.
  • 75% for attempting the question but with minor errors.
  • 100% for correctly answering the question.

The reflection portion is always worth 10% and graded for completion.

Rules:

  • Problems marked SOLO may not be worked on with other classmates, though you may consult instructors for help.
  • For problems marked COLLABORATIVE, you may work in groups of up to 3 students who are in this course this semester. You may not split up the work – everyone must work on every problem. And you may not simply copy any code but rather truly work together and submit your own solutions.

Readings

The readings from the last week will serve as a helpful reference as you complete this assignment. You can review them here:

Using AI tools

On this assignment, you are encouraged to use ChatGPT and other AI tools (e.g. Github Copilot). But don’t just blindly copy-paste code. The code provided by these tools is not perfect, and you will likely need to modify it to get the correct solution. If you do use an AI tools, you must include the prompt(s) you used (using a comment with #) to recieve full credit. If you had to change anything to your prompt to get better results, write that down too in your code with a comment. Learn to use tools like ChatGPT as a learning assistant - a tool to help you accomplish the task - rather than just a solutions manual. One version of using it makes you a better and more efficient coder, the other robs you of that.

1) Staying organized [5%]

Download and use this template for your assignment. Inside the “hw10” folder, open and edit the R script called hw10.R and fill out your name, GW netID, and the names of anyone you worked with on this assignment.

2) Choose and load some data [5%]

For this assignment, you will need to find a dataset of your choosing and create three summary visualizations. To keep things manageable, choose one of the following datasets from the following libraries. Note that to load any of these data frames, all you need to do is install and load the package.

dplyr:

  • storms
  • starwars

ggplot2:

  • diamonds
  • economics
  • midwest
  • mpg
  • msleep
  • txhousing

dslabs:

  • gapminder
  • movielens
  • murders
  • stars

3) Inspect your data [10%]

Once you’ve chosen a data set, open your hw10.R file and begin exploring the data (be sure to load the package that contains the dataset at the top of your file). Write some code in code chunks to preview and summarize the data frame using some of the methods we’ve used in class. You should be able to quickly get an understanding of what variables are included and their nature. Consider the following questions in your exploration (you don’t have to write out answers to these questions - just write code to help you answer them by previewing the data in different ways):

  • What is the total size of the data frame?
  • What type of data is each variable (numeric, character, logical, date)?
  • Do any variables have missing values? Why might that be?
  • What are the “boundaries” of each period of observation:
    • For numeric variables, what are the min and max values?
    • For character variables, what are the unique values in the variable?
    • For date variables, what time period do the observations in these data frames span?

Do not brush this step off - the more thoroughly you inspect your dataset, the easier (and better) you data exploration will be. This will be absolutely critical for making your charts. Make sure you take the time to develop an understanding of the variables in your dataset as it is nearly impossible to imagine what different charts might be worth creating otherwise.

4) Make charts [40%]

Now that you have a basic understanding of the dataset, make some charts to explore the variables in the data and their potential relationships. You may use base R plotting functions or the ggplot2 package to make your figures, but you must make at least two different types of figures, including:

  1. A scatterplot of involving at least two variables.
  2. A bar chart involving at least one variable.

You can choose to plot whichever variables you wish, but you must be able to interpret the results of your chart.

5) Interpret your charts [15%]

Below the code for each of your charts, write a description and interpretation of your chart in a comment. Make sure you address at least the following questions:

  1. Describe what variables you are plotting and why.
  2. Describe the primary relationship / trend / information you hope the reader will gain from your visualization.

6) Save your charts [15%]

At the bottom of your hw10.R file, write code to save each of your charts in the plots folder. Save them as .png files.

7) Read and reflect [SOLO, 10%]

Read and reflect on the following readings to preview what we will be covering next:

Afterwards, in a comment (#) in your .R file, write a short reflection on what you’ve learned and any questions or points of confusion you have about what we’ve covered thus far. This can just few a few sentences related to this assignment, next week’s readings, things going on in the world that remind you something from class, etc. If there’s anything that jumped out at you, write it down.

Submit

Create a zip file of all the files in your R project folder for this assignment, then submit your zip file on the corresponding assignment submission on Blackboard.