Date: 2019-10-30 10:19

Assignment 4

PS 3780 Data Literacy & Visualization, Fall 2019

Due Date: Thursday, October 31, 2019 at 11:59 p.m.

Please answer these questions in complete sentences and include the R commands you
used in a single .pdf file (use the “save as” function in most word processors). Be
sure to include your name, your teammate’s name if you have one, and the assignment
number. Submit the file to Carmen by the due date.

Part I: Hans Rosling Boxplot

Find the health-wealth.csv file on Carmen (it contains the variables examined by
Hans Rosling in his talk: per-capita GDP, life expectancy, total population, and region
for every country in the world in 2010, which we used before) and load it into R. Produce
a figure of five boxplots showing the variation in life expectancy across 5 of the 7 regions
in the data: East Asia & the Pacific, Europe & Central Asia, Latin America &
the Caribbean, Middle East & North Africa, and North America (make sure to display
the region names below the horizontal axis). Describe the variation across regions in
terms of the mean, maximum, minimum, and quartiles (3 pts).

Some hints:

1. Begin by subsetting the Rosling data to include only the 5 regions listed above,
using the subset() function. Create a new data frame with the subsetted data,
and use this new data frame to create your boxplot.

2. Create a boxplot using the function boxplot(). Use the "~" symbol to divide the
boxplots up by region number, e.g. "life.expectancy ~ region".

3. You can add your own labels to the x and y axes by setting "axes=FALSE" in
the plot command (boxplot() in this case) and then drawing the x and y axes
with the axis() command. The most important steps here are to specify the axis
(either 1 or 2), to list the positions of the tick marks (at=c(1,2,3,etc.)), and then to
list the labels by name (labels=c(), with the names of the regions in the
parentheses).

4. Your final boxplot should have 5 boxes, one for each region, and each
box should be labeled by region name (not number) along the x-axis. Your graph
should also have a descriptive title and informative labels for the x and y axes.
Finally, it should include a horizontal line indicating the median life expectancy
across the 5 regions included in the plot.
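
The four hints can be sketched roughly as follows. This is only an illustration on
made-up data: the data frame rosling, its numeric region codes (1 to 5 for the five
regions of interest, 6 and 7 for the other two), and the shortened axis labels are all
assumptions, because the actual layout of health-wealth.csv is not shown here.

```r
# Made-up stand-in for health-wealth.csv (real column names and codes may differ).
set.seed(3780)
rosling <- data.frame(
  region          = rep(1:7, each = 10),   # assumed numeric region codes
  life.expectancy = round(rnorm(70, mean = 72, sd = 5), 1)
)

# Assumed mapping: codes 1-5 are the five regions named in the assignment.
keep.codes   <- 1:5
region.names <- c("E. Asia &\nPacific", "Europe &\nC. Asia",
                  "Latin Am. &\nCaribbean", "Mid. East &\nN. Africa",
                  "North\nAmerica")

# Hint 1: keep only the five regions, in a new data frame.
rosling5 <- subset(rosling, region %in% keep.codes)

# Hints 2-3: one box per region, with the default axes switched off.
boxplot(life.expectancy ~ region, data = rosling5, axes = FALSE,
        main = "Life expectancy by region, 2010 (made-up data)",
        xlab = "Region", ylab = "Life expectancy (years)")
axis(1, at = 1:5, labels = region.names, cex.axis = 0.7)  # region names, not numbers
axis(2)                                                    # default y-axis
box()

# Hint 4: horizontal line at the median over all countries in the five regions.
overall.median <- median(rosling5$life.expectancy)
abline(h = overall.median, lty = 2, col = "red")
```

With the real file, only the read.csv() call, the region codes in keep.codes, and the
column names would need to change.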

Part II: API and World Bank

Use the World Bank API to extract female life expectancy data. Display the data for all
countries from 1970 to 2015, and highlight the United States and the rest of the world in
different colors. Make sure that the plot has axis labels and a title, and write one
short paragraph describing the plot (4 pts).

Some hints:

1. This assignment follows Lecture 16a fairly closely.

2. Use the WDI() command from the WDI package to query the World Bank API, and set
indicator = "SE.SCH.LIFE.FE" in the parentheses. You can also restrict the range of
years by setting "start = " and "end = ".

3. Use the xyplot() command from the lattice package to display variation across
countries and over time.

4. Before calling xyplot(), set up a custom color scheme in which the United States
is assigned a different color from the other countries.
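
One possible shape for this workflow is sketched below. The helper fetch_indicator()
is a hypothetical wrapper that is defined but not called, since it needs the WDI
package and an internet connection; the plot itself uses made-up data with assumed
column names (country, year, value) so that the coloring step can be run anywhere.

```r
library(lattice)  # ships with standard R installations

# Hypothetical helper (not called here): download the indicator named in hint 2.
fetch_indicator <- function(indicator = "SE.SCH.LIFE.FE", start = 1970, end = 2015) {
  if (!requireNamespace("WDI", quietly = TRUE)) {
    stop("Please install the WDI package first.")
  }
  WDI::WDI(country = "all", indicator = indicator, start = start, end = end)
}

# Made-up long-format data standing in for the download: one row per country-year.
set.seed(16)
toy <- expand.grid(year = 1970:2015,
                   country = c("Brazil", "Japan", "Nigeria", "United States"),
                   stringsAsFactors = FALSE)
toy$value <- 60 + 0.3 * (toy$year - 1970) + rnorm(nrow(toy))

# Hint 4: build the colors first, one per country, in factor-level order.
toy$country <- factor(toy$country)
line.cols <- ifelse(levels(toy$country) == "United States", "red", "grey70")

# Hint 3: one line per country over time; groups = picks up the colors by level.
p <- xyplot(value ~ year, data = toy, groups = country, type = "l",
            col = line.cols,
            xlab = "Year", ylab = "Indicator value (made-up)",
            main = "United States (red) vs. other countries (grey)")
print(p)
```

With downloaded data, the value column would carry the indicator's own name rather
than value, and the rows may need sorting by year within country before drawing lines.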

Part III: Dreamland

Sam Quinones, the author of Dreamland: The True Tale of America’s Opiate Epidemic,
is coming to Ohio State to discuss his book and the topic of opiate addiction. News of
your mad data visualization skills has spread, and you have been asked to come up with
graphics for the poster that will be used to advertise the event.

Download the drug poisoning mortality data from Carmen and read it into R. Geographically
link it to the county name data from the maps package in R. Then create two
county-level maps of drug poisoning mortality in the United States, one for 2004 and the
other for 2014.

Some hints:

1. This part follows Lecture 14c fairly closely.

2. R sometimes reads data in as factors rather than text. Factors are vectors of integer
values with a corresponding set of character values to use when the factor
is displayed. They are also incredibly confusing because they look like text but
they don’t act like text. This dataset has variables that will be read in as factors
unless you use the stringsAsFactors = FALSE argument in your read.csv()
command. We highly recommend doing so.

3. The variable of interest has the annoyingly long name Estimated Age-adjusted
Death Rate, 16 Categories (in ranges). It also, as the name implies, contains
ranges (0-2, 2.1-4, etc.) rather than actual numbers. Because you’ll want to plot
colors later, create a new variable in the dataset that takes a value of 1 when
the age-adjusted death rate is 0-2, 2 when it’s 2.1-4, and so on.

4. The death rates are measured as deaths per 100,000 population.

5. You’ll need to specify a color scheme for your map, and very few of the palettes
in RColorBrewer can handle 16 colors, which is what you’ll need if you want to
represent all of the categories in the data. You probably want your color scheme
to be a gradient from a lighter color to a darker color. The best way to do this is
to pick a lighter color and a darker color from an online color-to-hex converter and
then use colorRampPalette() to generate the gradient from one to the other.
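
Hints 2, 3, and 5 might be combined along these lines. The CSV text below is a made-up
extract, not the Carmen file: the FIPS and Year columns, the short name rate.range, and
the two hex colors are assumptions chosen for illustration. The recoding ranks each
range by its lower bound, so it only yields codes 1 to 16 when all 16 ranges occur in
the data.

```r
# Made-up extract standing in for the Carmen file; the real layout may differ.
csv.lines <- c(
  'FIPS,Year,"Estimated Age-adjusted Death Rate, 16 Categories (in ranges)"',
  '39049,2014,2.1-4',
  '39009,2014,0-2',
  '39145,2014,4.1-6',
  '39049,2004,2.1-4'
)

# Hint 2: read the ranges as text, not as factors.
deaths <- read.csv(text = paste(csv.lines, collapse = "\n"),
                   stringsAsFactors = FALSE)
names(deaths)[3] <- "rate.range"   # shorter, hypothetical column name

# Hint 3: take the first number in each range as its lower bound,
# then number the ranges from lowest to highest (0-2 -> 1, 2.1-4 -> 2, ...).
lower <- as.numeric(sub("^[^0-9]*([0-9.]+).*$", "\\1", deaths$rate.range))
deaths$rate.cat <- match(lower, sort(unique(lower)))

# Hint 5: a 16-step gradient from a light to a dark color (hex values are arbitrary).
pal <- colorRampPalette(c("#FEE5D9", "#67000D"))(16)
deaths$color <- pal[deaths$rate.cat]

deaths
```

Computing the codes on the full dataset, before splitting it by year, keeps the same
code attached to the same range in both maps.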

The final product of this part will be two maps of the United States, one for drug poisoning
mortality in 2004 and one for drug poisoning mortality in 2014. Each map should color
each county by its drug poisoning mortality rate for the relevant year. Also write a short
paragraph explaining what the different colors indicate in the two maps (5 pts).
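
A rough sketch of the mapping step follows, under several assumptions: the data have
already been recoded into a numeric rate.cat column, counties are identified by a
numeric fips column, and the linking is done through the county.fips table that comes
with the maps package. The column names and both helper functions are hypothetical.
draw_county_map() is defined but not called, since it needs the maps package; the
matching logic is checked on made-up records instead.

```r
# Made-up county records; fips, year, and rate.cat are hypothetical column names.
deaths <- data.frame(
  fips     = c(39049L, 39009L, 39049L, 39009L),
  year     = c(2004L, 2004L, 2014L, 2014L),
  rate.cat = c(3L, 5L, 8L, 12L)
)
pal <- colorRampPalette(c("#FEE5D9", "#67000D"))(16)

# Return one color per polygon, in the order of the supplied FIPS codes.
# Counties missing from the data get NA and are left unfilled.
colors_for_year <- function(d, yr, polygon.fips, pal) {
  d.yr <- d[d$year == yr, ]
  pal[d.yr$rate.cat[match(polygon.fips, d.yr$fips)]]
}

# Hypothetical wrapper (not called here): needs the maps package installed.
draw_county_map <- function(d, yr, pal) {
  if (!requireNamespace("maps", quietly = TRUE)) {
    stop("Please install the maps package first.")
  }
  fips.env <- new.env()
  utils::data("county.fips", package = "maps", envir = fips.env)
  cols <- colors_for_year(d, yr, fips.env$county.fips$fips, pal)
  maps::map("county", col = cols, fill = TRUE, resolution = 0, lty = 0)
  title(main = paste("Drug poisoning mortality by county,", yr))
}

# Check the matching step on three polygon codes, one of them absent from the data.
matched <- colors_for_year(deaths, 2014, c(39009L, 39049L, 1001L), pal)
matched
```

Calling draw_county_map(deaths, 2004, pal) and draw_county_map(deaths, 2014, pal) on
the full data would then give the two maps; a legend() call tying each color to its
range would help with the written explanation.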
