Date: 2019-04-03 10:02

REGRESSION MODELLING

(STAT2008/STAT4038/STAT6014/STAT6038)

Assignment 1 for Semester 1, 2019

INSTRUCTIONS:

This assignment is worth 15% of your overall marks for this course.

Please submit your assignment on Wattle. When uploading to Wattle you must submit the following, combined into a single document:

1. Your assignment/report in a pdf document.

2. An ‘.R’ file containing the R code you have used for the assignment. Failure to upload the R code will result in a penalty.

Assignments should be typed. Your assignment may include some carefully edited computer output (e.g. graphs, tables) showing the results of your data analysis and a discussion of these results, as well as some carefully selected code. Please be selective about what you present and only include as many pages and as much computer output as necessary to justify your solution. It is important to be concise in your discussion of the results. Clearly label each part of your report with the part of the question that it refers to.

Unless otherwise advised, use a significance level of 5%.

Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e. more than 10 pages including graphs and tables. You may include an appendix that is in addition to the above page limits; however the appendix will not be assessed. It will only be used if there is some question about what you have actually done.

You may ask me (Abhinav Mehta) questions about this assignment up to 24 hours before the submission time. This will allow me enough time to respond to your questions.

Late submissions will attract a penalty of 5% of your mark for each day of delay. No assignments will be accepted 10 days beyond the due date.

Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but must have my permission by no later than 24 hours before the submission date. If you are granted an extension and submit your assignment after the extended deadline, then the late submission penalty will still apply.

Assignment 1 - Sem 1, 2019 Page 1 of 3

Question 1 [50 Marks]

Data on eruptions of the Old Faithful geyser in October 1980 were collected and stored in a .csv file, ‘oldfaithful’. The variables are the duration in seconds of the current eruption and the interval time in minutes to the next eruption. Data were not collected between approximately midnight and 6 AM. It is suspected that Duration is associated with Interval.

(a) [5 marks] Conduct an exploratory data analysis to assess whether the two variables are associated. Is there a statistically significant correlation between the variables? Use the cor.test() function to conduct a suitable hypothesis test. Clearly specify the hypotheses you are testing and present and interpret the results.
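
A minimal sketch of the workflow in part (a). The ‘oldfaithful’ file is not reproduced here, so R's built-in faithful dataset serves as a stand-in; it is a different set of Old Faithful measurements, so its numbers will not match the assignment data. The column names Duration and Interval, and the file name in the comment, are assumptions.

```r
# Stand-in data: built-in `faithful` (eruptions in minutes, waiting in minutes).
# With the assignment file one would use something like
# read.csv("oldfaithful.csv"); that file name and its column names are assumed.
geyser <- data.frame(Duration = faithful$eruptions * 60,  # minutes -> seconds
                     Interval = faithful$waiting)

# Exploratory look: scatter plot and numerical summaries
plot(Interval ~ Duration, data = geyser,
     xlab = "Duration (seconds)", ylab = "Interval (minutes)")
print(summary(geyser))

# Two-sided Pearson test of H0: rho = 0 against H1: rho != 0
ct <- cor.test(geyser$Duration, geyser$Interval,
               alternative = "two.sided", method = "pearson")
print(ct)
```

The printed result reports the sample correlation, the t statistic with its degrees of freedom (n - 2), the p-value and a 95% confidence interval for the correlation.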

(b) [20 marks] Fit a simple linear regression (SLR) model with Interval as the response variable and Duration as the predictor. Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals, a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.
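
The four diagnostic plots in part (b) could be produced along the following lines. This is a sketch on the built-in faithful data as a stand-in for the assignment file; the column names and the 2p/n reference line for leverage are assumptions (the latter is a common rule of thumb, not something the assignment specifies).

```r
# Stand-in data (assumed column names), not the assignment file
geyser <- data.frame(Duration = faithful$eruptions * 60,
                     Interval = faithful$waiting)
fit <- lm(Interval ~ Duration, data = geyser)

# Residuals against fitted values
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(residuals(fit))
qqline(residuals(fit))

# Bar plot of leverages (diagonal of the hat matrix)
lev <- hatvalues(fit)
barplot(lev, ylab = "Leverage")
abline(h = 2 * mean(lev), lty = 2)  # mean leverage is p/n, so this is 2p/n

# Bar plot of Cook's distances
cd <- cooks.distance(fit)
barplot(cd, ylab = "Cook's distance")
```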

(c) [10 marks] Produce the ANOVA (Analysis of Variance) table for the SLR model and interpret the results of the F-test. What is the coefficient of determination for this model and how should you interpret this summary measure?
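
As a sketch of part (c), the ANOVA table comes from anova(), and the coefficient of determination can be recovered from its sums of squares as the regression sum of squares divided by the total. The data are again the built-in faithful stand-in with assumed column names.

```r
# Stand-in data (assumed column names), not the assignment file
geyser <- data.frame(Duration = faithful$eruptions * 60,
                     Interval = faithful$waiting)
fit <- lm(Interval ~ Duration, data = geyser)

# ANOVA table: rows are the regression (Duration) and the residuals
aov_tab <- anova(fit)
print(aov_tab)

# Coefficient of determination from the sums of squares: SSR / (SSR + SSE)
ss <- aov_tab[["Sum Sq"]]
r2 <- ss[1] / sum(ss)
print(r2)
```

The same value is available directly as summary(fit)$r.squared.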

(d) [10 marks] What are the estimated coefficients of the SLR model in part (b) and the standard errors associated with these coefficients? Interpret the values of these estimated coefficients and perform t-tests to test whether or not these coefficients differ significantly from zero. What do you conclude as a result of these t-tests?
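
For part (d), the coefficient table from summary() holds the estimates, standard errors, t statistics and two-sided p-values. The sketch below also recomputes the slope's t-test by hand to show where those numbers come from; it uses the built-in faithful data as a stand-in, with assumed column names.

```r
# Stand-in data (assumed column names), not the assignment file
geyser <- data.frame(Duration = faithful$eruptions * 60,
                     Interval = faithful$waiting)
fit <- lm(Interval ~ Duration, data = geyser)

# Columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
print(coefs)

# t-test of H0: slope = 0, computed by hand
t_slope <- coefs["Duration", "Estimate"] / coefs["Duration", "Std. Error"]
p_slope <- 2 * pt(abs(t_slope), df = df.residual(fit), lower.tail = FALSE)

# 95% confidence intervals for both coefficients
print(confint(fit, level = 0.95))
```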

(e) [5 marks] If an eruption lasts for 120 seconds, what interval of time before the next eruption does your model predict? Construct an appropriate interval estimate for the length of this interval.
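
A sketch of part (e) using predict(). It assumes the question concerns a single future eruption, so it requests a prediction interval; interval = "confidence" would instead give an interval for the mean response at that duration. The data are the built-in faithful stand-in with assumed column names.

```r
# Stand-in data (assumed column names), not the assignment file
geyser <- data.frame(Duration = faithful$eruptions * 60,
                     Interval = faithful$waiting)
fit <- lm(Interval ~ Duration, data = geyser)

# Predicted interval to the next eruption after a 120-second eruption,
# with a 95% prediction interval for one new observation
new_obs <- data.frame(Duration = 120)
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
print(pred)  # columns: fit, lwr, upr
```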

Question 2 [50 Marks]

On March 1, 1984, the Wall Street Journal published a survey of television advertisements conducted by Video Board Test, Inc., a New York ad-testing company that interviewed 4000 adults. These respondents were regular product users who were asked to cite a commercial they had seen for that product category in the past week. In this case, the response is the number of millions of retained impressions per week (return). The predictor (spend) is the amount of money (in $ millions) spent by the firm on advertising. The data are available on Wattle in a .csv file called advertising.

(a) [10 marks] Is there a linear association between the two variables? You may want to experiment with some transformations, like the natural log (log()) and the square root transformation (sqrt()), applied to one or both of your variables to assess the linear association. Make a choice of transformed variables at this stage and provide justification for this choice.
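
One way to compare candidate transformations in part (a) is to look at scatter plots and correlations side by side. The sketch below runs on simulated placeholder data, not the advertising data, so it says nothing about which transformation suits the real file. The response is named retained here rather than return, since return is also the name of a base R function; both that name and the simulation settings are assumptions.

```r
set.seed(1)

# Simulated placeholder data, NOT the advertising data from the assignment
spend <- runif(20, min = 1, max = 200)
retained <- exp(1 + 0.6 * log(spend) + rnorm(20, sd = 0.3))
ads <- data.frame(spend = spend, retained = retained)

# Correlation under each candidate pair of transformations
cors <- c(raw       = cor(ads$spend, ads$retained),
          sqrt_sqrt = cor(sqrt(ads$spend), sqrt(ads$retained)),
          log_log   = cor(log(ads$spend), log(ads$retained)))
print(round(cors, 3))

# Scatter plots side by side to judge linearity by eye
par(mfrow = c(1, 3))
plot(retained ~ spend, data = ads, main = "Untransformed")
plot(sqrt(retained) ~ sqrt(spend), data = ads, main = "Square root")
plot(log(retained) ~ log(spend), data = ads, main = "Natural log")
par(mfrow = c(1, 1))
```

A higher correlation alone does not settle the choice; the plots and the later residual diagnostics also matter.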

(b) [15 marks] With your chosen transformations, fit a simple linear regression (SLR) model. Construct a plot of the residuals against the fitted values, a normal Q-Q plot of the residuals, a bar plot of the leverages for each observation and a bar plot of Cook’s distances for each observation. Use these plots (and other means) to comment on the model assumptions and on any unusual data points.

(c) [10 marks] Produce the ANOVA (Analysis of Variance) table for the SLR model and interpret the results of the F-test. What is the coefficient of determination for this model and how should you interpret this summary measure?

(d) [15 marks] Based on the model fit in part (b), write the mathematical expression for the regression model in the original untransformed variables. Interpret the effect of the coefficients on the response variable. In particular, for every $1 million increase in spending, how much of an increase in retained impressions is expected, based on your chosen model fit?
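
As an illustration of the back-transformation asked for in part (d), suppose, purely as an assumption, that the natural log was applied to both variables. Other choices lead to different expressions. The symbols \beta_0, \beta_1 and \varepsilon are notation introduced here.

```latex
\log(\text{return}) = \beta_0 + \beta_1 \log(\text{spend}) + \varepsilon
\quad\Longrightarrow\quad
\text{return} = e^{\beta_0}\,\text{spend}^{\beta_1}\,e^{\varepsilon}

% Effect of raising spend from s to s + 1 (in $ millions), all else equal:
\frac{\text{return}(s+1)}{\text{return}(s)} = \left(\frac{s+1}{s}\right)^{\beta_1}
```

Under this assumed model the effect of an extra $1 million is multiplicative and depends on the current level of spending s, so it is not a constant number of impressions.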
