Department of Statistics

STATS 782 Statistical Computing

Assignment 3 (2020.FC)

Total: 50 marks Due: 2:00 pm NZST, Friday 29 May 2020

1. Please read these instructions carefully. Further instructions might be posted on the class webpage.

2. Upload your soft copy (assignment source) to Canvas: the file should end in .Rmd, or possibly .R or .Rnw. The marker may run or knit your R code, so include your name and ID in all files. The file names should contain your UPI. RMarkdown is strongly recommended.

3. Upload your .pdf to Canvas as well. Note the time difference between countries.

4. Coversheet: please make sure you do one of the following, otherwise your assignment will not be marked:

(a) Sign the Cover Sheet and combine it with your assignment document (pdf or Word) into a single file before submission, OR

(b) Type or write the following at the beginning of your assignment: your name (as it appears in Canvas), your UPI, and the following statement: “I have read the declaration on the cover sheet and confirm my agreement with it.”

5. Include everything in your report: R code (tidied up), outputs (including error/warning messages), and your explanations (if any). Please comment on almost all of your output, especially parts that need human interpretation, otherwise marks will be deducted. That is, you need to convince the marker that you understand what the data or solution is saying.

6. Print some intermediate results to show how your code works step by step, if not obvious. Comment your code if appropriate, e.g., for functions, blocks of code, and key variables.

7. Type help.start() when you open R. You need to use the online help to find details and functions that may not be covered directly in the coursebook. This requires maturity; we cannot cover everything in class or the coursebook.

8. Your mark for this assignment will depend on getting the right answer, the elegance/efficiency of your approach, and the tidiness and documentation of your code/report. The R Tidyverse Style Guide or R Google Style Guide is recommended. Marks (up to 7) will be deducted for messy code, etc.

9. This PDF file may contain colour that is important to see.

1. [16 marks] The Ministry of Health of New Zealand provides daily updates of the status of COVID-19 cases in the country. The basic data consists of the date of report and the number of probable and confirmed COVID-19 cases reported that day. The data reported on April 19 is provided in the file covid19-apr19.csv.

(a) The ministry published the following plot on April 26 showing the total of reported cases per day (confirmed + probable):

Re-create the graphic using R as closely as sensible. Start with the same basic type of plot in R, then adjust colors, line widths and labels. Finally address the axes. If there are any visual differences, describe them and explain which version you think is better, yours or the original, and why. Note that there are small differences between the available data and the plot.¹ [4 marks]

(b) In addition to the plot above, the ministry also publishes a plot of all cases known up to a given date:

Re-create the graphic using R. Discuss any drawbacks of the rendition of the graphic. [4 marks]

(c) Change the graphic from (a) so that it distinguishes probable from confirmed cases. Explain your decisions and which comparisons can be directly performed visually in the plot. Give at least one example of a comparison which cannot be done using this plot. [4 marks]

¹ The dataset file is more detailed in that it counts actual cases filed on the reported day whereas the daily report plots count new cases known at a given time of that day which may include cases filed earlier.


(d) We can modify the plot from (c) such that we can directly compare the relative proportion of confirmed cases to total each day while keeping the modifications to a minimum as follows:

[Figure: "Proportion of confirmed cases". x-axis: Date of report (ticks Mar 01, Mar 15, Apr 01, Apr 15); y-axis: Proportion (in %).]

Re-create that plot type. Did you have to sacrifice information that was available in (c) but is no longer visible? If so, what was it? Interpret the resulting plot. [4 marks]
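
The quantity shown in (d) is each day's confirmed count divided by that day's total (confirmed + probable). A minimal sketch of that calculation in R follows. The column names `confirmed` and `probable` and all counts are made up for illustration, since the layout of covid19-apr19.csv is not shown here, and the stacked bar chart is only one plausible reading of the plot type.

```r
# Made-up daily counts; the column names are assumptions, not the real file's.
cases <- data.frame(
  date      = as.Date(c("2020-03-28", "2020-03-29", "2020-03-30", "2020-03-31")),
  confirmed = c(60, 45, 0, 48),
  probable  = c(20, 15, 0, 12)
)

total <- cases$confirmed + cases$probable

# Percentage of confirmed cases per day; a day with no cases gives NA
# instead of a division by zero.
prop_confirmed <- ifelse(total > 0, 100 * cases$confirmed / total, NA)

# Stacked bars that always sum to 100%: confirmed share at the bottom,
# probable share on top. Days with NA are simply left empty.
shares <- rbind(confirmed = prop_confirmed, probable = 100 - prop_confirmed)
barplot(shares, names.arg = format(cases$date, "%b %d"),
        xlab = "Date of report", ylab = "Proportion (in %)",
        main = "Proportion of confirmed cases")
```

Because every bar is rescaled to 100%, the absolute number of cases per day can no longer be read from the plot.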


2. [11 marks] Consider the following plot illustrating an optical illusion:

The plot is composed of squares that are all aligned at the same y coordinate, although our eyes make us believe that the lines are not straight. Each row is shifted by 1/4 square relative to the adjacent rows, but the direction changes every two steps.

(a) Re-create the plot using R. [5 marks]

(b) Create a function taking n as a parameter which determines how many rows of squares there will be. Run it for values of 9, 11 and 15. [3 marks]

(c) Enhance the function from (b) by adding an argument cols which is a vector of the two colours to be used to fill the boxes. Call it with f(n=11, cols=c("red","yellow")) and show the resulting plot. Does the effect still work? [3 marks]
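
The shifting rule described above (1/4 square per row, reversing direction every two steps) can be sketched in R as a repeating offset pattern 0, 1/4, 1/2, 1/4. This is only an illustration under assumptions: the original figure is not available here, so the starting phase, the number of squares per row (`n_sq`, a hypothetical extra argument), the default colours and the grey borders are all guesses.

```r
# Horizontal offset (in units of one square) for each of n rows:
# shift by 1/4 per row, reversing direction every two steps.
row_offsets <- function(n) {
  pattern <- c(0, 0.25, 0.5, 0.25)  # assumed starting phase
  pattern[(seq_len(n) - 1) %% 4 + 1]
}

# Draw n rows of alternating squares; cols holds the two fill colours.
# n_sq (squares visible per row) is an assumption, not part of the task.
f <- function(n, cols = c("black", "white"), n_sq = 10) {
  plot.new()
  plot.window(xlim = c(0, n_sq), ylim = c(0, n), asp = 1)
  off <- row_offsets(n)
  for (i in seq_len(n)) {
    # Left edges of the squares; one extra square on the left so that a
    # shifted row still covers the whole plotting region.
    x <- seq(-1, n_sq) + off[i]
    fill <- cols[(seq_along(x) %% 2) + 1]
    rect(x, i - 1, x + 1, i, col = fill, border = "grey50")
  }
  invisible(off)
}

row_offsets(9)
```

With these definitions, a call such as f(n = 11, cols = c("red", "yellow")) draws the two-colour variant asked for in (c).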


3. [23 marks] The dataset temp-cities.csv contains the daily low and high temperatures for seven cities in the world over the last 20 years.

(a) Read the dataset and restrict it to the subset as follows: city Auckland and records from the year 2019. Create one plot which shows both the lows and highs for every day of the year 2019 in Auckland. Use blue colour for the lows and red colour for the highs. [4 marks]

(b) Based on the 2019 Auckland subset, compute the weekly average for both lows and highs respectively. For this purpose the first week is the first 7 days in 2019, the second week is the next 7 days, etc. Superimpose the averages over the plot obtained in (a). [4 marks]

(c) Create a matrix of plots such that each plot shows all the data for one city. Make sure that it is possible to compare values between the plots. Justify the layout you used. The purpose of this plot is exploratory data analysis, not presentation, so you do not need to worry about removing axes that are superfluous or labels (other than the city) at this point. Do you see any obvious issues in the data? [4 marks]

(d) Plot a matrix of scatterplots of highs vs lows for each city. Describe what you can learn from the plots. Do you see any technical issues with the data? [3 marks]

(e) Compute the average low and high temperature for each city and week of the year. This is similar to (b), but you want to average over the years as well, i.e., the average for the first week² will be computed from temperatures on 1-7 January of all the years 2000, ..., 2019. Do not worry about special handling of leap years.

Plot the results. How can you interpret the resulting shapes? [4 marks]

(f) Take the plot from (e) and improve it by removing superfluous axes and margins. Use axes only along the outer edge left, bottom and right of the entire matrix as illustrated in Figure 1. [4 marks]

² If you don’t want to split years by hand (which you can), you may find as.POSIXlt(date)$yday useful.
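
As a hedged illustration of the hint in the footnote above, the sketch below turns the day of the year into a week index and averages over it. The temperatures are synthetic and the variable names (`dates`, `high`, `week`) are made up; the real column names in temp-cities.csv are not shown here.

```r
# Synthetic daily highs for two non-leap years; real data would come from
# temp-cities.csv, whose column names are not shown in this handout.
dates <- seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day")
high  <- 20 + 5 * cos(2 * pi * as.POSIXlt(dates)$yday / 365)

# yday is 0 on 1 January, so integer division by 7 gives week 1 for
# 1-7 January, week 2 for 8-14 January, and so on, in every year.
week <- as.POSIXlt(dates)$yday %/% 7 + 1

# Mean per week of the year, pooled across all years present.
weekly_high <- tapply(high, week, mean)
head(weekly_high)
```

Under this indexing, week 53 contains only the leftover day at the end of the year (two days in a leap year), so its average rests on far fewer observations than the other weeks.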

Figure 1: Weekly average temperature lows and highs for 2000-2019 in 7 world cities.


