Date: 2019-10-30 10:18

Assignment 3

PS 3780 Data Literacy & Visualization, Fall 2019

Due Date: Thursday, October 17, 2019 at 11:59 p.m.

Please answer these questions in complete sentences and include the R commands you used in a single .pdf file (use the “save as” function in most word processors). Be sure to include your name, your teammate’s name if you have one, and the assignment number. Submit the file to Carmen by the due date.

Part I: Getting the Data

Find the health-wealth.csv file on Carmen. It contains the variables examined by Hans Rosling in his talk: per-capita GDP (in 2010 dollars), life expectancy, total population, and region for every country in the world in 2010. Load the dataset into a data frame in R.

1. Summarize the variables in the dataset. What were the median per-capita GDP and the median life expectancy worldwide in 2010 (1 pt)?
2. Create a new variable for each country’s GDP by multiplying per-capita GDP by total population. What is the standard deviation of this variable (1 pt)?
3. Find the calculated GDPs of the United States and Niger (1 pt).
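The three steps above can be sketched in R roughly as follows. This is only an illustration under assumptions: the toy data frame stands in for health-wealth.csv, every number in it is a made-up placeholder, and the `country` column name is hypothetical (the column names `pcGDP`, `life.expectancy`, and `total.population` are taken from the hints later in the assignment).

```r
# Toy stand-in for health-wealth.csv; all values are made-up placeholders.
# With the real file you would instead run: hw <- read.csv("health-wealth.csv")
hw <- data.frame(
  country          = c("A", "B", "C", "D", "E"),   # hypothetical column name
  pcGDP            = c(500, 2000, 8000, 20000, 45000),
  life.expectancy  = c(55, 62, 70, 76, 80),
  total.population = c(1.5e7, 1.2e8, 4.0e7, 6.0e7, 3.0e8),
  stringsAsFactors = FALSE
)

# Step 1: summarize every variable, then pull out the two medians
summary(hw)
med.gdp  <- median(hw$pcGDP, na.rm = TRUE)
med.life <- median(hw$life.expectancy, na.rm = TRUE)

# Step 2: total GDP = per-capita GDP times population, then its spread
hw$GDP <- hw$pcGDP * hw$total.population
sd.gdp <- sd(hw$GDP, na.rm = TRUE)

# Step 3: look up specific countries by name (placeholders "A" and "E" here)
hw[hw$country %in% c("A", "E"), c("country", "GDP")]
```

With the real data, the last line would filter on the actual country names as they are spelled in the file.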

Part II: Visualization

Design a Hans Rosling-style bubble graph, with the log of per-capita GDP on the X axis, life expectancy on the Y axis, dot size proportional to population size, and color keyed to region. Make sure to label both the X and Y axes and to add a straight line through the cloud of data points, fitted with the lm() command, to indicate the central tendency. In designing the graph, be creative. Take a screenshot of the final graph and write a paragraph describing what the graph shows about the relationships among the four variables (4 pts).

A few very important hints to keep in mind:

• Rather than creating new variables, you can use mathematical expressions to transform existing variables inside most commands. For example, to plot life expectancy against the log of GDP rather than GDP itself, you can just type plot(log(pcGDP), life.expectancy).
• Rosling’s bubbles are not directly proportional to population. Start with cex=total.population/100000000 and tweak the numbers until the relative sizes of the bubbles are what you’re looking for.

In addition to these, you may or may not find the following hints to be of interest, depending on how you design your plot.

• If you want to tweak the axes beyond what plot() allows, use plot(x, y, axes=FALSE) and then use the box() and axis() commands to customize tick mark locations and labels, color, size, etc. Type ?box and ?axis for more details.
• In addition to region, the dataset contains region.number, with values from 1 to 7. An efficient way to assign colors is to create a vector of seven colors, either by hand or using the brewer.pal() command from the RColorBrewer package, and then use the value of region.number as an index, e.g., plot(x, y, col=my.colors[region.number]). Load the RColorBrewer library and type ?brewer.pal for details on the available palettes, or type colors() to list the built-in system colors.
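One way the hints above might fit together is sketched below. It is not the expected answer, only a minimal illustration: the data frame is a toy stand-in with made-up placeholder values, the colors come from base R's rainbow() rather than RColorBrewer so that no extra package is needed, and the tick positions and axis labels are arbitrary choices.

```r
# Toy stand-in data; all values are placeholders, not the Carmen file.
hw <- data.frame(
  pcGDP            = c(500, 1200, 2000, 8000, 20000, 30000, 45000),
  life.expectancy  = c(55, 58, 62, 70, 76, 78, 80),
  total.population = c(1.5e7, 9.0e8, 1.2e8, 4.0e7, 6.0e7, 2.0e7, 3.0e8),
  region.number    = 1:7
)

# One color per region, indexed by region.number
my.colors <- rainbow(7)

# Straight-line fit of life expectancy on log per-capita GDP
fit <- lm(life.expectancy ~ log(pcGDP), data = hw)

# Transform inside the call; draw the axes by hand afterwards
plot(log(hw$pcGDP), hw$life.expectancy,
     cex  = hw$total.population / 1e8,   # suggested starting point; tweak to taste
     pch  = 19,
     col  = my.colors[hw$region.number],
     axes = FALSE,
     xlab = "Per-capita GDP (2010 dollars, log scale)",
     ylab = "Life expectancy")
box()
tick.values <- c(500, 5000, 50000)       # arbitrary tick positions in dollars
axis(1, at = log(tick.values), labels = tick.values)
axis(2)

# Line from the lm() fit; x is already on the log scale, so it plots straight
abline(fit, lwd = 2)
```

Labeling the X axis ticks in dollars while plotting on the log scale keeps the axis readable; whether that is preferable to plain log units is a design choice.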

Part III: Identifying Points

While your plot window is still open and displaying your health-wealth plot, use the identify() command to enable clicking on data points to display the associated country name. Try different values for the cex= argument until you find a type size that’s to your liking. Click to identify three countries in the plot that you have reason to believe are more interesting than the others. Take a screenshot of the final graph with those three names showing and justify why you chose those three countries (1 pt).
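A minimal sketch of the click-to-label step is shown below, assuming a hypothetical `country` column holds the names and using made-up placeholder values. identify() only responds to clicks on an interactive screen device, so the call is guarded with interactive().

```r
# Toy stand-in data; values and the `country` column name are placeholders.
hw <- data.frame(
  country          = c("A", "B", "C", "D"),
  pcGDP            = c(500, 2000, 8000, 45000),
  life.expectancy  = c(55, 62, 70, 80),
  stringsAsFactors = FALSE
)

x <- log(hw$pcGDP)
y <- hw$life.expectancy

plot(x, y,
     xlab = "Log of per-capita GDP",
     ylab = "Life expectancy")

# identify() must be given the same x and y that were plotted.
# n = 3 stops after three clicks; cex sets the label type size.
if (interactive()) {
  picked <- identify(x, y, labels = hw$country, n = 3, cex = 0.8)
  hw$country[picked]   # names of the countries that were clicked
}
```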

Part IV: Regional Patterns

Continue with the Hans Rosling dataset, but narrow your focus to 5 of the 7 regions in the data: East Asia & the Pacific, Europe & Central Asia, Latin America & the Caribbean, Middle East & North Africa, and North America (the region.num variable is the corresponding regional index). Draw the same plot that you did for Part II with this subset of the data and add a horizontal line (life expectancy is on the Y axis) to indicate the median life expectancy of the subset. Take a screenshot of the graph and include it in the PDF (2 pts).

1. Begin by subsetting the Rosling data to include only the 5 regions listed above, using the subset() function.
2. Use the abline() command to draw a line denoting the median life expectancy on the plot.
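The two steps above could look something like the sketch below. The data are made-up placeholders, and the region labels are copied from the assignment text, so their exact spelling in the real file is an assumption. Since life expectancy sits on the Y axis in the Part II layout, the median is drawn here with abline(h = ...).

```r
# Toy stand-in data; values are placeholders, region spellings are assumed.
hw <- data.frame(
  region = c("East Asia & the Pacific", "Europe & Central Asia",
             "Latin America & the Caribbean", "Middle East & North Africa",
             "North America", "Other 1", "Other 2"),
  pcGDP            = c(9000, 25000, 8000, 7000, 45000, 1500, 1200),
  life.expectancy  = c(74, 76, 73, 72, 79, 65, 56),
  total.population = c(2.0e8, 8.0e7, 6.0e7, 4.0e7, 3.0e8, 1.0e8, 5.0e7),
  stringsAsFactors = FALSE
)

keep <- c("East Asia & the Pacific", "Europe & Central Asia",
          "Latin America & the Caribbean", "Middle East & North Africa",
          "North America")

# Step 1: keep only the rows whose region is one of the five listed
hw5 <- subset(hw, region %in% keep)

# Median life expectancy within the subset, not the full data
med.life <- median(hw5$life.expectancy, na.rm = TRUE)

plot(log(hw5$pcGDP), hw5$life.expectancy,
     cex  = hw5$total.population / 1e8,
     pch  = 19,
     xlab = "Log of per-capita GDP",
     ylab = "Life expectancy")

# Step 2: horizontal dashed line at the median life expectancy
abline(h = med.life, lty = 2)
```

If the regional index variable is used instead of the labels, the same subset() call works with the five matching index values in place of `keep`.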

Part V: 2016 Presidential Race in Ohio

You have been hired by the Columbus Dispatch to provide a graphic for their story about the 2016 Presidential race in Ohio. Download the “ohio_polls.csv” file from Carmen and import the data into R. Create a figure that depicts the competitiveness of the 2016 Presidential election in Ohio in the months leading up to election day. Use the various customization options available for the plot() command to create a plot worthy of publication in a newspaper. Specifically, you should do at least the following:

1. Create a new variable in your data frame to measure the competitiveness of the election for a given time period. The competitiveness of the election can be thought of as the difference in levels of support for Clinton and for Trump (1 pt).
2. Plot competitiveness over time and add a lowess curve to the plot to show the general trend (1 pt).
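As a rough illustration of these two steps, the sketch below computes the margin and overlays a lowess trend. The column names `date`, `clinton`, and `trump` are hypothetical, and the numbers are made-up placeholders rather than real polls; the actual ohio_polls.csv may be laid out differently.

```r
# Toy stand-in for ohio_polls.csv; column names and values are placeholders.
polls <- data.frame(
  date    = as.Date(c("2016-06-01", "2016-07-01", "2016-08-01",
                      "2016-09-01", "2016-10-01", "2016-11-01")),
  clinton = c(44, 43, 45, 42, 44, 43),
  trump   = c(41, 43, 42, 44, 43, 45)
)

# Step 1: competitiveness as the difference in support (Clinton minus Trump);
# values near zero indicate a close race
polls$margin <- polls$clinton - polls$trump

# Step 2: plot the margin over time
plot(polls$date, polls$margin,
     pch  = 19,
     xlab = "Poll date",
     ylab = "Clinton minus Trump (percentage points)",
     main = "Ohio 2016: margin between the candidates")
abline(h = 0, lty = 3)                 # reference line for a tied race

# lowess() needs numeric x, so the dates are converted to day counts
trend <- lowess(as.numeric(polls$date), polls$margin)
lines(trend, lwd = 2)
```

Using the signed difference shows which candidate leads; taking abs() of the margin would instead measure closeness alone.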
