Assignment 3
PS 3780 Data Literacy & Visualization, Fall 2019
Due Date: Thursday, October 17, 2019 at 11:59 p.m.
Please write complete sentences to answer these questions, and include the R commands you
have used, in a single .pdf file (use the “save as” function in most word processors). Be sure to
include your name, your teammate’s name (if any), and the assignment number.
Submit the file to Carmen by the due date.
Part I: Getting the Data
Find the health-wealth.csv file on Carmen. It contains the variables examined by Hans
Rosling in his talk: per-capita GDP (in 2010 dollars), life expectancy, total population,
and region for every country in the world in 2010. Load the dataset into a data frame in
R.
1. Summarize the variables in the dataset. What were the median per-capita GDP
and the median life expectancy worldwide in 2010? (1 pt)
2. Create a new variable for each country’s GDP by multiplying per-capita GDP by
total population. What is the standard deviation of this variable? (1 pt)
3. Find the calculated GDPs of the United States and Niger (1 pt).
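The following is a minimal sketch of the Part I workflow in R. It assumes the CSV columns are named `pcGDP`, `life.expectancy`, `total.population`, `region` and `region.number`, which are the names the hints below use. It also assumes a country-name column called `country`. That last name is a guess, so check `names()` on the real file. The small data frame is made up so the block runs on its own. Its values are not the real dataset. With the real file you would use `read.csv()` instead.

```r
# Real data (file name from the assignment):
# hw <- read.csv("health-wealth.csv", stringsAsFactors = FALSE)

# Made-up stand-in so this sketch runs on its own; NOT the real values.
# The `country` column name and the region.number codes are assumptions.
hw <- data.frame(
  country          = c("Country A", "Country B", "Country C", "Country D"),
  pcGDP            = c(1000, 5000, 20000, 45000),
  life.expectancy  = c(55, 65, 75, 80),
  total.population = c(1e7, 5e7, 2e7, 3e8),
  region           = c("Latin America & the Caribbean", "East Asia & the Pacific",
                       "Europe & Central Asia", "North America"),
  region.number    = c(3, 1, 2, 5),
  stringsAsFactors = FALSE
)

# Q1: summary of every variable, plus the two medians asked for
summary(hw)
median(hw$pcGDP, na.rm = TRUE)
median(hw$life.expectancy, na.rm = TRUE)

# Q2: total GDP = per-capita GDP x population, and its standard deviation
hw$GDP <- hw$pcGDP * hw$total.population
sd(hw$GDP, na.rm = TRUE)

# Q3: look up the computed GDP of specific countries by name
hw[hw$country %in% c("Country A", "Country D"), c("country", "GDP")]
```

With the real data, you would replace the placeholder names in the last line with the country names exactly as they are spelled in the file.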
Part II: Visualization
Design a Hans Rosling-style bubble graph, with the log of per-capita GDP on the x axis,
life expectancy on the y axis, dot size scaled to population size, and color keyed
to region. Label both the x and y axes, and add a straight line fitted with the lm()
command through the cloud of data points to indicate the central tendency. Be creative
in designing the graph. Take a screenshot of the final graph and write a paragraph
describing what the graph reveals about the relationships among the four variables
(4 pts).
A few very important hints to keep in mind:
• Rather than creating new variables, you can use mathematical expressions to transform
existing variables inside most commands. For example, if you want to
plot life expectancy against the log of per-capita GDP rather than per-capita GDP itself, you can just
type plot(log(pcGDP), life.expectancy), provided the data frame has been attached with attach().
• Rosling’s bubbles are not directly proportional to population. You should start
with cex=total.population/100000000 and tweak the numbers until the relative
sizes of the bubbles are what you’re looking for.
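The sketch below applies the two hints above. It uses a made-up data frame with the column names assumed from the hints. `log()` is applied inside `plot()` instead of being stored as a new column. Bubble size starts from `cex = total.population / 100000000`, as suggested. The straight central-tendency line comes from passing an `lm()` fit to `abline()`. The `sqrt()` variant in the comment is one common way to make bubble area, rather than bubble width, track population. Treat it as an option to try, not as Rosling's exact scaling.

```r
# Made-up stand-in data (NOT the real health-wealth.csv values).
hw <- data.frame(
  pcGDP            = c(800, 3000, 9000, 30000, 48000),
  life.expectancy  = c(52, 63, 72, 79, 78),
  total.population = c(1.5e7, 1.3e9, 2e8, 6e7, 3.1e8)
)

# log() is applied inside plot(), so no new variable is needed.
# cex starts at population / 1e8 as the hint suggests; tweak the divisor.
plot(log(hw$pcGDP), hw$life.expectancy,
     cex  = hw$total.population / 100000000,
     pch  = 21, bg = "grey80",
     xlab = "log(per-capita GDP, 2010 dollars)",
     ylab = "Life expectancy (years)")
# Alternative to try: cex = sqrt(hw$total.population / 1e7)

# Regress life expectancy on log per-capita GDP and draw the fitted line.
fit <- lm(life.expectancy ~ log(pcGDP), data = hw)
abline(fit, col = "red", lwd = 2)
coef(fit)  # intercept and slope of the fitted line
```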
In addition to these, you may or may not find the following hints to be of interest,
depending on how you design your plot.
• If you want to tweak axes beyond what plot() allows you to do, use plot(x,
y, axes=FALSE) and then use the box() and axis() commands to customize tick
mark locations and labels, color, size, etc. Type ?box and ?axis for more details.
• In addition to region, the dataset contains region.number, with values from
1–7. An efficient way to assign colors is to create a vector of seven colors, either
by hand or using the brewer.pal() command from the RColorBrewer package,
and then use the value of region.number as an index value—e.g., plot(x,
y, col=my.colors[region.number]). Load the RColorBrewer library and type
?brewer.pal for details on the different palettes available, or type colors() to
list the built-in system colors.
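The following is a minimal sketch of the two optional hints above: custom axes drawn with `box()` and `axis()`, and colors chosen by indexing a seven-color vector with `region.number`. The data are made up and the colors are hand-picked. The tick positions and dollar labels are arbitrary choices, included only to show how to place ticks at log values while labeling them in dollars.

```r
# Made-up stand-in data (NOT the real values); region.number runs 1-7.
hw <- data.frame(
  pcGDP           = c(600, 1500, 4000, 9000, 15000, 35000, 48000),
  life.expectancy = c(50, 58, 66, 73, 75, 81, 78),
  region.number   = 1:7
)

# Seven hand-picked colors; element i is used where region.number == i.
# (With RColorBrewer loaded, brewer.pal(7, "Set1") is one alternative.)
my.colors <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
               "#FF7F00", "#A65628", "#F781BF")

# Suppress the default axes, then draw custom ones.
plot(log(hw$pcGDP), hw$life.expectancy,
     axes = FALSE, pch = 19, cex = 2,
     col  = my.colors[hw$region.number],
     xlab = "Per-capita GDP (2010 dollars, log scale)",
     ylab = "Life expectancy (years)")
box()
# Ticks sit at log(dollars) positions but are labeled in dollars.
gdp.ticks <- c(500, 1000, 5000, 10000, 50000)
axis(1, at = log(gdp.ticks),
     labels = c("$500", "$1,000", "$5,000", "$10,000", "$50,000"))
axis(2, at = seq(50, 85, by = 5), las = 1)  # las = 1: horizontal labels
```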
Part III: Identifying Points
While your plot window is still open and displaying your health-wealth plot, use the
identify() command to enable clicking on data points to display the associated country
name. Try different values for the cex= argument until you find a type size that’s to
your liking. Click to identify three countries in the plot that you have reason to believe
are more interesting than the others. Take a screenshot of the final graph with those
three names showing, and explain why you chose these three countries (1 pt).
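Here is a hedged sketch of the `identify()` step. It assumes a country-name column called `country` (a guess) and uses made-up data. `identify()` only works in an interactive session, so the block guards it with `interactive()`. When run non-interactively, for example with Rscript, it falls back to labeling hand-chosen rows with `text()`. The `labels`, `n` and `cex` arguments are passed directly to `identify()`.

```r
# Made-up stand-in data (NOT the real values); `country` column name assumed.
hw <- data.frame(
  country         = c("Country A", "Country B", "Country C", "Country D"),
  pcGDP           = c(700, 4000, 12000, 45000),
  life.expectancy = c(48, 68, 74, 79),
  stringsAsFactors = FALSE
)

x <- log(hw$pcGDP)
y <- hw$life.expectancy
plot(x, y, xlab = "log(per-capita GDP)", ylab = "Life expectancy (years)")

if (interactive()) {
  # Click up to three points; stopping early depends on the device
  # (e.g., right-click or Esc). Returns row indices of the clicked points.
  picked <- identify(x, y, labels = hw$country, n = 3, cex = 0.8)
} else {
  # Non-interactive fallback: label hand-chosen rows with text().
  picked <- c(1, 4)
  text(x[picked], y[picked], labels = hw$country[picked],
       pos = 3, cex = 0.8)
}
hw$country[picked]  # names of the labeled countries
```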
Part IV: Regional Patterns
Stick with the Hans Rosling dataset, but narrow your focus to 5 of the 7 regions in the
data: East Asia & the Pacific, Europe & Central Asia, Latin America & the Caribbean,
Middle East & North Africa, and North America (the region.number variable is the corresponding
regional index). Draw the same plot that you did for Part II with this subset of data, and
add a horizontal line to the plot to indicate the median life expectancy of the data. Take
a screenshot of the graph and include it in the PDF (2 pts).
1. Begin by subsetting the Rosling data to include only the 5 regions listed above,
using the subset() function.
2. Use the abline() command to draw a line denoting the median life expectancy on
the plot.
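The following is a minimal sketch of the two Part IV steps using made-up data. The two regions not named above are given hypothetical placeholder names. `subset()` keeps the five listed regions. Life expectancy is on the y axis, so a line marking its median is horizontal and is drawn with `abline(h = ...)`. By contrast, `abline(v = ...)` would mark a value on the x axis.

```r
# Made-up stand-in data (NOT the real values). "Other region 1/2" are
# hypothetical placeholders for the two regions not listed in Part IV.
hw <- data.frame(
  pcGDP            = c(5000, 30000, 8000, 12000, 48000, 900, 1200),
  life.expectancy  = c(74, 80, 73, 72, 79, 55, 64),
  total.population = c(1.3e9, 8e7, 2e8, 8e7, 3.1e8, 1.5e7, 1.2e9),
  region = c("East Asia & the Pacific", "Europe & Central Asia",
             "Latin America & the Caribbean", "Middle East & North Africa",
             "North America", "Other region 1", "Other region 2"),
  stringsAsFactors = FALSE
)

keep <- c("East Asia & the Pacific", "Europe & Central Asia",
          "Latin America & the Caribbean", "Middle East & North Africa",
          "North America")

# Step 1: keep only rows whose region is one of the five listed.
hw5 <- subset(hw, region %in% keep)

plot(log(hw5$pcGDP), hw5$life.expectancy,
     cex  = sqrt(hw5$total.population / 1e7),
     xlab = "log(per-capita GDP, 2010 dollars)",
     ylab = "Life expectancy (years)")
abline(lm(life.expectancy ~ log(pcGDP), data = hw5))

# Step 2: median life expectancy as a dashed horizontal line.
med.le <- median(hw5$life.expectancy, na.rm = TRUE)
abline(h = med.le, lty = 2, col = "blue")
```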
Part V: 2016 Presidential Race in Ohio
You have been hired by the Columbus Dispatch to provide a graphic for their story about
the 2016 Presidential race in Ohio. Download the “ohio_polls.csv” file from Carmen and
import the data into R. Create a figure that depicts the competitiveness of the 2016
Presidential election in Ohio in the months leading up to election day. Use the various
customization options available for the plot() command to create a plot worthy of
publication in a newspaper. At a minimum, you should do the following:
1. Create a new variable in your data frame to measure the competitiveness of the
election for a given time period. The competitiveness of the election can be thought
of as the difference between the levels of support for Clinton and for Trump (1 pt).
2. Plot competitiveness over time and add a lowess curve to the plot to show the
general trend (1 pt).
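The sketch below shows one way to handle Part V, under stated assumptions. The column names `end.date`, `clinton` and `trump` are hypothetical, since the handout does not give the layout of ohio_polls.csv. The poll numbers are made up and are not real 2016 Ohio results. Competitiveness is measured as the signed margin, Clinton minus Trump. Using `abs()` of that margin would measure closeness regardless of who leads. `lowess()` is given numeric day counts, and `lines()` draws the resulting trend on the date axis.

```r
# Real data (file name from the assignment; column names below are guesses):
# polls <- read.csv("ohio_polls.csv", stringsAsFactors = FALSE)

# Made-up stand-in polls (NOT real 2016 Ohio numbers).
polls <- data.frame(
  end.date = c("2016-08-01", "2016-08-20", "2016-09-10",
               "2016-09-30", "2016-10-15", "2016-11-01"),
  clinton  = c(46, 44, 42, 43, 45, 44),
  trump    = c(42, 43, 44, 43, 44, 46),
  stringsAsFactors = FALSE
)

# Convert text dates to Date so the x axis is a real time scale.
polls$end.date <- as.Date(polls$end.date)

# Competitiveness: Clinton's support minus Trump's (percentage points).
# Positive = Clinton ahead, negative = Trump ahead, near 0 = close race.
polls$margin <- polls$clinton - polls$trump

plot(polls$end.date, polls$margin, pch = 19, col = "grey40",
     xlab = "Poll end date", ylab = "Clinton minus Trump (points)",
     main = "Ohio 2016: polling margin over time")
abline(h = 0, lty = 3)  # reference line for a tied race

# lowess() on numeric x (days since 1970-01-01); those are the same
# user coordinates the Date axis uses, so lines() aligns correctly.
trend <- lowess(as.numeric(polls$end.date), polls$margin)
lines(trend, col = "red", lwd = 2)
```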