3AS/3AS4: Applied Statistics

Assignment 1

Date: October 5, 2022

To be submitted by 5pm, November 03, 2022

1. In the data set igfdata.csv, measurements on age, sex and insulin-like growth factor (igf) for a group of people are available. The data set can be downloaded from Canvas. The original source is: J. Clin. Endocrinol. Metab. 78(3): 744–752, March 1994. Each row in the data set corresponds to one individual. You need to download the file to your computer in a suitable folder of your choice. Then start RStudio and set the folder containing the data as your working directory from the “Session” menu. Finally, import the data in R using the following:

igfdata <- read.csv("igfdata.csv", header = TRUE)

For all the following questions, include your R code, plots, and output in your solution.

(a) Make a suitable plot for the distribution of igf and discuss your findings.

(b) Compare the igf for males and females using boxplots. Discuss your findings.

(c) Make a scatterplot of igf against age. Comment on how igf changes with age.

(d) Fit a simple linear regression to predict igf using age. Is age a significant variable in this regression? Justify your answer.

(e) Report the mean and standard deviation of igf for males and females separately.

(f) Using linear regression or otherwise, check whether there is a significant difference in mean igf for males and females. Use level of significance α = 0.05.

(g) Consider the subset of the data with age less than or equal to 15 years. For this subset of people, use linear regression with age and sex as predictors to predict igf. Comment on the significance of the variables age and sex.

(h) Use a residual plot to check whether the linearity assumption is violated and, if so, use an alternative model to fit the data.

(i) Fit a similar model to predict igf for the subset of people with age greater than 15 years.

(j) Is there a significant difference in the models for people with age less than or equal to 15 years and for people with age greater than 15 years?
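The R functions typically involved in parts (a) to (h) can be sketched as follows. This is only a rough outline under stated assumptions, not a model solution: the data frame `demo` is a synthetic stand-in with arbitrary values, and the column names `age`, `sex` and `igf` are assumed rather than taken from igfdata.csv, so check `names(igfdata)` before adapting it.

```r
# Synthetic stand-in data. Column names (age, sex, igf) are ASSUMED and the
# values are arbitrary; they are not the real igf measurements.
set.seed(42)
n <- 60
demo <- data.frame(
  age = seq(2, 40, length.out = n),
  sex = factor(rep(c("female", "male"), length.out = n))
)
demo$igf <- 300 + 5 * demo$age + rnorm(n, sd = 50)

# (a)-(c): distribution, comparison by sex, relationship with age
hist(demo$igf, main = "Distribution of igf", xlab = "igf")
boxplot(igf ~ sex, data = demo, ylab = "igf")
plot(igf ~ age, data = demo)

# (d): simple linear regression; the coefficient table holds t tests and p-values
fit_age <- lm(igf ~ age, data = demo)
print(summary(fit_age)$coefficients)

# (e): mean and standard deviation of igf within each sex
print(tapply(demo$igf, demo$sex, mean))
print(tapply(demo$igf, demo$sex, sd))

# (g)-(h): subset with age <= 15, two predictors, residuals against fitted values
young <- subset(demo, age <= 15)
fit_young <- lm(igf ~ age + sex, data = young)
plot(fitted(fit_young), resid(fit_young),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

With the real data, replace `demo` by `igfdata` and adjust the column names and factor coding to match the file.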

2. I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \varepsilon$.

(a) Suppose that the true relationship between X and Y is linear, i.e. $Y = \beta_0 + \beta_1 X + \varepsilon$. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

(b) Answer (a) using test rather than training RSS.

(c) Suppose that the true relationship between X and Y is not linear, but we don’t know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

(d) Answer (c) using test rather than training RSS.
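A small simulation can help build intuition for the quantities in Question 2 before writing the justification. The sketch below assumes a linear truth with arbitrarily chosen coefficients and noise level (none of these values come from the assignment), fits both models to one training sample, and prints the training and test RSS for each. It is a single simulated draw, not a proof; changing the seed or the true relationship lets you explore parts (c) and (d) in the same way.

```r
set.seed(1)
n <- 100

# Training and test samples from an ASSUMED linear truth (coefficients are arbitrary)
x_train <- rnorm(n)
train <- data.frame(x = x_train, y = 1 + 2 * x_train + rnorm(n))
x_test <- rnorm(n)
test <- data.frame(x = x_test, y = 1 + 2 * x_test + rnorm(n))

# Linear fit and cubic fit on the same training data
fit_lin <- lm(y ~ x, data = train)
fit_cub <- lm(y ~ x + I(x^2) + I(x^3), data = train)

# Residual sum of squares of a fitted model evaluated on a given data set
rss <- function(fit, data) sum((data$y - predict(fit, newdata = data))^2)

rss_train <- c(linear = rss(fit_lin, train), cubic = rss(fit_cub, train))
rss_test <- c(linear = rss(fit_lin, test), cubic = rss(fit_cub, test))

print(rss_train)
print(rss_test)
```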

3. Consider a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ for $i = 1, \ldots, n$. Assume $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the least squares estimators of $\beta_0$ and $\beta_1$, and let $\hat{y}_0$ be the predicted value of y for a new observation $x = x_0$.

(a) Show that $\hat{y}_0$ is a linear estimator, that is, $\hat{y}_0 = \sum_{i=1}^{n} c_i y_i$ for some constants $c_i$ depending on $x_1, \ldots, x_n$.

(b) Derive the bias and the variance of $\hat{y}_0$.

(c) Compare the bias and the variance of $\hat{y}_0$ if the true model is $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$.
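A derivation for part (a) can be sanity-checked numerically. The sketch below uses simulated x values, a hypothetical new point `x0`, and an arbitrary simulated response (all assumptions for illustration). It builds a weight vector from the design matrix alone, using the standard matrix form of least squares, and compares the weighted sum of the responses with the prediction returned by `lm`. It does not replace the algebraic argument the question asks for.

```r
set.seed(3)
n <- 20
x <- runif(n, 0, 10)
x0 <- 4.5                        # hypothetical new observation
X <- cbind(1, x)                 # design matrix with an intercept column

# Weights c_1, ..., c_n computed from x and x0 only; y is not used here
c_vec <- drop(c(1, x0) %*% solve(crossprod(X)) %*% t(X))

# Arbitrary simulated response, then the usual least squares prediction at x0
y <- 2 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
y0_hat <- unname(predict(fit, newdata = data.frame(x = x0)))

# The two numbers should agree up to floating-point error
print(c(weighted_sum = sum(c_vec * y), lm_prediction = y0_hat))
```

Because `c_vec` is computed before `y` is generated, the same weights apply to any response vector observed at these x values.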
