Date: 2019-12-18 10:54

Applications of Data Science and Statistical Modelling

Assignment 4

29/11/2019

The dataset SubstationRPD.RData contains the real power delivered (kW) in each 10-minute period of every
day during June and July, for 410 substations in the southwest of Wales, UK. The aim of this assignment is
to understand how the power demand changes throughout the day, to identify any weekly or monthly patterns
that are present, and to use this information to fit a GAM which allows us to predict future demands. Note
that in order to fit a GAM you will need to have the mgcv package installed.

1. [3 marks] Produce summaries of the dataset SubstationRPD.RData and produce histograms showing
the distributions of real power delivered for the 410 substations. Comment on the distributions of
real power delivered, and on any variation in those distributions between substations. (For example, you
could choose specific 10-minute intervals, say the 10-minute window after midnight, and plot the
distribution of the power demand across the substations, or look at average daily demands, maximum
daily demands, and so on.)

2. [3 marks] For each substation, calculate the average demand for each 10-minute period (that is, you
should average over the days) and then plot these on the same plot, using a different colour for each
substation. Add a thick, black line showing the overall mean demand across all of the substations.
Comment on the variability in patterns between substations. Does the overall mean seem a reasonable
summary of all the data? (Hint: Since we are plotting 410 separate curves, you might want to suppress
the legend, which can be done using the ggplot option 'theme(legend.position = "none")'.)

[Figure, panel "All days": Average Daily Demand (0 to 400) against Time (00:00 to 23:50).]
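The averaging and plotting described in Question 2 can be sketched in base R as follows. This is only an illustration on synthetic data: the layout of SubstationRPD.RData is not given here, so the data frame `demo` and its columns (`substation`, `day`, `minute.int`, `power`) are assumptions, and `matplot` is used in place of ggplot to keep the example free of extra packages.

```r
# Synthetic stand-in for the real data (the layout of SubstationRPD.RData is assumed).
set.seed(1)
n.sub <- 5; n.day <- 10; n.int <- 144
demo <- expand.grid(minute.int = 1:n.int, day = 1:n.day, substation = 1:n.sub)
demo$power <- 50 * demo$substation +
  20 * sin(2 * pi * demo$minute.int / n.int) + rnorm(nrow(demo), sd = 5)

# Average over days: one row per 10-minute interval, one column per substation.
avg <- tapply(demo$power, list(demo$minute.int, demo$substation), mean)

# Overall mean across substations for each interval.
overall <- rowMeans(avg)

# One thin coloured line per substation, then a thick black line for the overall mean.
matplot(1:n.int, avg, type = "l", lty = 1, col = rainbow(n.sub),
        xlab = "10-minute interval", ylab = "Average daily demand")
lines(1:n.int, overall, lwd = 3, col = "black")
```

With 410 real substations the same matrix would have 410 columns, and the spread of the coloured lines around the black one gives a first impression of how representative the overall mean is.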

3. [3 marks] Split your plot in Question 2 into four separate plots representing: 1) All days, 2) Weekdays,
3) Saturdays and 4) Sundays. Are there any differences in patterns between days? (Hint: You might
find the 'weekdays' function useful.)
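As a small illustration of the day classification needed for Question 3, the sketch below labels a week of dates as Weekday, Saturday or Sunday. The variable names are hypothetical. Because `weekdays()` returns names in the current locale, the grouping here is built from `format(, "%u")` instead, which is an alternative to the hinted function rather than the assignment's prescribed approach.

```r
dates <- seq(as.Date("2012-06-01"), as.Date("2012-06-07"), by = "days")

# format(, "%u") gives the ISO weekday number (1 = Monday, ..., 7 = Sunday),
# which does not depend on the locale, unlike the names returned by weekdays().
dow <- as.integer(format(dates, "%u"))

day.type <- ifelse(dow <= 5, "Weekday", ifelse(dow == 6, "Saturday", "Sunday"))
data.frame(Date = dates, name = weekdays(dates), day.type = day.type)
```

The resulting `day.type` column can then be used to subset the data, or as a faceting variable, to produce the four panels.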

Now that we understand how the demand changes throughout the day, and have identified some seasonal
patterns, the next step is to fit a GAM to our data:

4. [2 marks] First, reformat the SubstationRPD.RData dataset so that each row holds the demand averaged
over all substations for a single day. That is, each row corresponds to one day, and each column
holds the average demand (across all substations) for the corresponding 10-minute period.


5. [10 marks] Add a column with the day of the month, and another one with the month of the year. Note
that you can access these using the following R code:

as.numeric(substr(Date,9,10)) # day

as.numeric(substr(Date,6,7)) # month

Next, collapse the data so that the previously calculated mean power demands are in a single column, instead
of separate columns. By this point you should have a dataset similar to the following:

# A tibble: 6 x 6
# Groups:   Date, weekdays [1]
  Date       weekdays minute.int  mean   day month
  <date>     <chr>         <dbl> <dbl> <dbl> <dbl>
1 2012-06-01 Friday            1  56.7     1     6
2 2012-06-01 Friday            2  57.0     1     6
3 2012-06-01 Friday            3  56.6     1     6
4 2012-06-01 Friday            4  55.7     1     6
5 2012-06-01 Friday            5  55.5     1     6
6 2012-06-01 Friday            6  54.9     1     6
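One possible way to get from the one-row-per-day table to this long layout is sketched below in base R. The wide table `wide` and its column names `X1` to `X144` are hypothetical and filled with random numbers; the real column names will depend on how the earlier reformatting step was done.

```r
set.seed(2)
# Hypothetical wide table: one row per day, columns X1..X144 of average demand.
dates <- seq(as.Date("2012-06-01"), as.Date("2012-06-03"), by = "days")
wide <- data.frame(Date = dates, matrix(runif(3 * 144, 40, 80), nrow = 3))

# Day of month and month of year, taken from the "YYYY-MM-DD" text of the date.
wide$day   <- as.numeric(substr(as.character(wide$Date), 9, 10))
wide$month <- as.numeric(substr(as.character(wide$Date), 6, 7))

# Collapse the 144 interval columns into a single 'mean' column.
# t() puts each day's 144 values in one column, so as.vector() reads day by day.
int.cols <- paste0("X", 1:144)
long <- data.frame(
  Date       = rep(wide$Date, each = 144),
  minute.int = rep(1:144, times = nrow(wide)),
  mean       = as.vector(t(as.matrix(wide[, int.cols]))),
  day        = rep(wide$day, each = 144),
  month      = rep(wide$month, each = 144)
)
long$weekdays <- weekdays(long$Date)
head(long)
```

The same reshaping could also be done with a tidyverse pivot; the base R version is shown only because it needs no additional packages.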

Fit and plot a GAM which accounts for the underlying seasonal patterns in demand. You should decide which
seasonal patterns are appropriate to include: daily (use the minute.int column in the above dataset), weekly
(use the day column in the above dataset), or monthly (use the month column in the above dataset). Comment
on the fit of the model. What are the (effective) degrees of freedom, and what does this tell us about the
complexity of the model that has been fit?
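A minimal sketch of fitting such a model with mgcv is given below. It runs on synthetic data, not the substation data, and the modelling choices are assumptions made for illustration: a cyclic cubic spline for the within-day pattern, a numeric day-of-week column `dow` (not part of the dataset shown above) for the weekly pattern, and the basis sizes `k`.

```r
library(mgcv)

set.seed(3)
# Synthetic long-format data (not the real substation data).
dates <- seq(as.Date("2012-06-01"), as.Date("2012-07-20"), by = "days")
dat <- data.frame(Date = rep(dates, each = 144),
                  minute.int = rep(1:144, times = length(dates)))
dat$dow <- as.integer(format(dat$Date, "%u"))   # 1 = Monday, ..., 7 = Sunday
dat$mean <- 60 + 15 * sin(2 * pi * dat$minute.int / 144) -
  5 * (dat$dow >= 6) + rnorm(nrow(dat), sd = 2)

# Cyclic cubic spline for the daily cycle, low-rank smooth for the day of the week.
# The two knots give the end points at which the cyclic smooth joins up.
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(dow, k = 5),
           data = dat, method = "REML",
           knots = list(minute.int = c(0.5, 144.5)))

summary(fit)          # the edf column reports effective degrees of freedom per smooth
sum(fit$edf)          # total effective degrees of freedom, including the intercept
plot(fit, pages = 1)  # estimated smooth for each term
```

An effective degrees of freedom close to 1 for a term indicates a nearly linear effect, while a value close to the basis dimension suggests the smooth is using almost all of the flexibility it was allowed.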

6. [4 marks] Choose an appropriate model with which to predict the demand for the 21st to the 28th of July.
Take the daily average demand, and produce a plot showing these mean predictions against time. You
can use the following code to create a new dataset for the prediction. Note that, depending on how you
named the columns of your dataset, you might have to modify the column names in the following code:

# Prediction grid: all 144 ten-minute intervals for each day from 21 to 28 July 2012.
# 'each = 144' keeps every day paired with every interval.
pred.dates <- seq(as.Date("2012-07-21"), as.Date("2012-07-28"), "days")
new.data <- data.frame(minute.int = rep(1:144, 8),
                       day = rep(21:28, each = 144),
                       month = rep(7, 1152),
                       Date = rep(pred.dates, each = 144))
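The prediction and daily averaging step could look roughly like the sketch below. It is self-contained and therefore fits a small GAM to synthetic training data first; the model formula, the data frames `train` and `grid`, and the column `pred` are all assumptions for illustration. Note that predicting for days 21 to 28 from a model trained on earlier days extrapolates the smooth in `day`, so the results should be read with caution.

```r
library(mgcv)

set.seed(4)
# Synthetic training data (placeholder for the real averaged demands).
train <- expand.grid(minute.int = 1:144, day = 1:20)
train$mean <- 60 + 15 * sin(2 * pi * train$minute.int / 144) +
  0.2 * train$day + rnorm(nrow(train), sd = 2)
fit <- gam(mean ~ s(minute.int, bs = "cc", k = 20) + s(day, k = 5),
           data = train, method = "REML")

# Full prediction grid: every 10-minute interval on each day from 21 to 28.
grid <- expand.grid(minute.int = 1:144, day = 21:28)
grid$Date <- as.Date("2012-07-01") + grid$day - 1
grid$pred <- as.numeric(predict(fit, newdata = grid))

# Average the 144 interval predictions within each day.
daily <- aggregate(pred ~ Date, data = grid, FUN = mean)
plot(daily$Date, daily$pred, type = "b",
     xlab = "Date", ylab = "Mean predicted demand (kW)")
```

Calling `predict` with `se.fit = TRUE` would additionally return standard errors, which can be used to add uncertainty bands to the plot.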

All the exercises should be solved using R. A pdf document with your answers, (commented)
R code and its outputs/plots should be submitted via ELE by Noon (12pm), 18th December.
Note that late submissions will be penalised.
