Due: Sunday, 01-Dec. at 8pm
Rules: This entire assignment is SOLO. You may not work with other classmates, though you may consult instructors for help.
Overview: For this assignment, you will need to find a dataset of your choosing and create a short report that contains a description of the dataset as well as a few key insights about the data, including at least two summary visualizations. You will publish your report online at http://rpubs.com/. Before beginning the assignment, be sure to have reviewed the lessons on Data Frames, Data Wrangling, Data Visualization and Communicating Information.
You will need to select a dataset for your project. To keep things manageable, you must choose one of the following datasets from the following libraries. Note that to load any of these data frames, all you need to do is load the library:
Once you’ve chosen a dataset, start setting up your analysis environment by following these steps:
admissions dataset in the dslabs package, an example title might be: “Summary of the admissions dataset from the
.R file and save it in your project folder as explore.R - we’ll use this file to explore your dataset later.
Now that your environment is set up, open your explore.R file and begin exploring your dataset. Be sure to load the library that contains the dataset.
Write some code to preview and summarize the dataset using some of the methods we’ve seen in class and in the lessons on data frames and data wrangling. You should be able to quickly get an understanding of what variables are included in the data frame and their nature. Consider the following questions in your exploration:
Do not brush this step off - the more thoroughly you inspect your dataset, the easier (and better) your data exploration will be. This will be important for step 4, and absolutely critical for step 5. Take the time to develop an understanding of the variables in your dataset; without it, it is nearly impossible to imagine which summary plots might be worth creating.
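As a rough illustration of this kind of exploration, the sketch below previews and summarizes a data frame with a few base R functions. It uses the built-in mtcars data frame purely as a stand-in (an assumption, not one of the assignment datasets); in your own explore.R you would load your chosen library and substitute your dataset's name.

```r
# Stand-in dataset (assumption): mtcars ships with base R.
# For the assignment you would instead load your library, e.g. library(dslabs),
# and assign your chosen data frame to df.
df <- mtcars

# Size of the data frame: number of rows (observations) and columns (variables).
dim(df)

# Variable names and the class of each variable.
names(df)
sapply(df, class)

# Compact overview of the structure, followed by the first few rows.
str(df)
head(df)

# Per-variable summary statistics (min, quartiles, mean, max for numeric columns).
summary(df)

# Number of missing values in each column.
colSums(is.na(df))
```

Running these lines one at a time in the console is usually enough to see what each variable measures and whether any columns need cleaning before plotting.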
Open your “hw6.Rmd” file and write a paragraph describing your dataset in the section labeled # Data description. Include the following:
Now that you have a basic understanding of the dataset, make some plots to explore the variables in the data and their potential relationships. You may use base R plotting functions or the ggplot2 library to make your figures, but you must make at least two figures, including:
You can plot whichever variables you wish, but you must be able to interpret the results of your plot. I recommend that you first make your plots in your explore.R file and iterate on your code until the plots are in their final form. Then copy the code for your final plots over to the appropriate code chunks in your “hw6.Rmd” file.
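The iteration described above might look something like the following sketch in explore.R. It uses base R plotting functions and, again, the built-in mtcars data frame as a stand-in (an assumption); the two plot types shown are only examples and are not meant to indicate which figure types the assignment requires.

```r
# Stand-in dataset (assumption): replace with your chosen data frame.
df <- mtcars

# Example figure 1: distribution of a single variable (miles per gallon).
# hist() draws the plot and also returns the bin counts, stored here in h.
h <- hist(df$mpg,
          main = "Distribution of fuel economy",
          xlab = "Miles per gallon",
          col  = "grey80")

# Example figure 2: relationship between two variables (weight vs. mpg).
plot(df$wt, df$mpg,
     main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch  = 19)
```

Adjusting titles, axis labels, and variable choices here first keeps the code chunks in the report clean, since only the final version needs to be copied over.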
Below each figure, write a description and interpretation of your plot. Make sure you address at least the following questions:
Once you have completed your analysis, compile the “hw6.Rmd” file by opening it in RStudio and clicking the “Knit” button. In the upper-right of the window that opens showing your compiled hw6.html file, click the “Publish” button and publish your file to http://rpubs.com/. Create an account with RPubs if you have not already. Once published, you can update your report by editing your “hw6.Rmd” file, clicking the “Knit” button again, and then clicking the “Republish” button in the upper-right of the RStudio preview window.
For a visual overview of the publishing process, refer to the YouTube video in the lesson on Communicating Information.
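If you prefer the console to the knit button, the compile step can also be sketched in R. The helper below, knit_report, is a hypothetical name introduced only for illustration; it assumes the rmarkdown package is installed and simply wraps rmarkdown::render(). Publishing to RPubs is still done with the “Publish” button as described above.

```r
# Hypothetical helper (not part of the assignment): knit an R Markdown file
# from the console instead of clicking the knit button.
knit_report <- function(path = "hw6.Rmd") {
  # Bail out early if the file is missing.
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  # Bail out if the rmarkdown package is not available.
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    message("The rmarkdown package is not installed.")
    return(NULL)
  }
  # render() compiles the document and returns the path of the output file.
  rmarkdown::render(path)
}

# Returns NULL with a message if hw6.Rmd is not in the working directory.
out <- knit_report("hw6.Rmd")
```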
After publishing your report, create a zip file of all files in your R project folder for this assignment and submit the zip file on Blackboard by the deadline. Include a link to your published report on RPubs in your Blackboard submission.