Date: 2023-03-23 09:08

STAT 4620/5620 WINTER 2023

Assignment 4: Due Thursday March 23 2023

1. The following questions explore the fundamentals of nonparametric statistics:

(a) [3] Describe smoothing and give two examples of popular smoothers.

(b) [2] Consider the generalized additive model (GAM) framework. What is the most significant departure from the GLM framework?

(c) [3] Explain how model estimation proceeds for GAMs.

(d) [4] Suppose that you find yourself in a situation where both a GLM and a GAM initially seem appropriate for your data. Explain the criteria you would use to determine which of the two methods to recommend.

2. This question re-examines the Hubble data.

(a) [6] Fit the model:

V_i = f(D_i) + ε_i

to the Hubble data, where f is a smooth function and the ε_i are i.i.d. N(0, σ²). Does a straight line model appear to be most appropriate? How would you interpret the best fitting model?

(b) [4] Examine appropriate residual plots and refit the model with more appropriate distributional assumptions. How are your conclusions from part (a) modified?
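
As an illustration of the model form in question 2, the sketch below fits y_i = f(x_i) + ε_i with a Nadaraya-Watson kernel smoother, written in plain Python rather than the R tools an analysis of the Hubble data would normally use. Everything in it is an assumption made for illustration: the data are synthetic (a straight line plus Gaussian noise, not the Hubble measurements), and the function names gaussian_kernel and kernel_smooth, the slope, the noise level and the bandwidth are hypothetical choices.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel weight for a scaled distance u."""
    return math.exp(-0.5 * u * u)


def kernel_smooth(x_train, y_train, x_eval, bandwidth):
    """Nadaraya-Watson estimate of f at each point of x_eval.

    Each fitted value is a weighted average of y_train, with weights
    that decay as the training x moves away from the evaluation point.
    """
    fitted = []
    for x0 in x_eval:
        weights = [gaussian_kernel((x - x0) / bandwidth) for x in x_train]
        total = sum(weights)
        fitted.append(sum(w * y for w, y in zip(weights, y_train)) / total)
    return fitted


# Synthetic data (hypothetical values): a linear trend plus N(0, sigma^2) noise.
rng = random.Random(0)
x = [i / 4 for i in range(1, 81)]                    # 0.25, 0.50, ..., 20.0
y = [70.0 * xi + rng.gauss(0.0, 100.0) for xi in x]

grid = [2.0, 6.0, 10.0, 14.0, 18.0]
smooth = kernel_smooth(x, y, grid, bandwidth=2.0)
for g, s in zip(grid, smooth):
    print(f"x = {g:5.1f}   smoothed y = {s:8.1f}")
```

A smaller bandwidth follows the data more closely; a larger one pulls the fit towards the overall mean of y. If the smoothed values on the grid rise at a roughly constant rate, a straight line is a plausible description of these synthetic data.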

3. Read and provide a one-page summary of the lme4 documentation. [5]

4. The data frame Gun (library nlme) is from a trial examining methods for firing naval guns. Two firing methods were compared, each used by a number of teams of 3 gunners; the gunners in each team were matched to have similar physique (Slight, Average, Heavy). The response variable rounds is rounds fired per minute, and there are 3 explanatory factor variables: Physique (levels Slight, Average and Heavy); Method (levels M1 and M2); and Team with 9 levels. The main interest is in determining which method and/or physique results in the highest firing rate and in quantifying team-to-team variability in firing rate.

(a) [2] Identify which factors should be treated as random and which as fixed, in the analysis of these data.

(b) [4] Write out a suitable mixed model as a starting point for the analysis of these data.

(c) [6] Analyse the data using lme in order to answer the main questions of interest and report your conclusions.
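
Team-to-team variability in question 4 is what a random team effect quantifies. As a minimal sketch, and not what lme does internally (lme fits by ML or REML), the Python code below computes the classical ANOVA (method-of-moments) estimates of the between-group and within-group variances for a balanced one-way layout. The data are synthetic, the fixed effects of Method and Physique are ignored, and the function name variance_components and all numeric settings are hypothetical.

```python
import random
import statistics


def variance_components(groups):
    """ANOVA estimates (between-group variance, within-group variance).

    groups is a list of equal-length lists, one list per group.
    A negative between-group estimate is truncated at zero.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    group_means = [statistics.fmean(g) for g in groups]
    grand_mean = statistics.fmean(group_means)
    # Mean square between groups and mean square within groups.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (k * (n - 1))
    return max(0.0, (msb - msw) / n), msw


# Synthetic data (hypothetical sizes and standard deviations).
rng = random.Random(1)
teams = []
for _ in range(9):
    team_effect = rng.gauss(0.0, 1.0)          # random team effect
    teams.append([20.0 + team_effect + rng.gauss(0.0, 1.5) for _ in range(4)])

between, within = variance_components(teams)
print(f"between-team variance estimate: {between:.2f}")
print(f"within-team variance estimate:  {within:.2f}")
```

The ratio between / (between + within) gives a rough intraclass correlation, that is, the share of the total variance attributable to differences between groups.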

5. The Carseats dataset from the R package ISLR is a simulated dataset of carseat sales at 400 different stores. Full information on the variables in this dataset can be found using help(Carseats) after loading the package.

(a) [4] Create a new factor variable for the Carseats data representing whether or not Sales is greater than 8. Randomly split the dataset into a testing set and a training set. On the training set, grow a classification tree using the R rpart package to classify whether a store had high carseat sales or not (Hint: Remove the Sales variable). Report the classification accuracy you obtained on the testing set and on the training set.

(b) [4] Prune the tree you grew in part (a). Report the pruned tree’s classification accuracy on the testing set and on the training set. Why might pruning have improved the classification accuracy on the testing set? Why might it have reduced accuracy on the training set?

(c) [4] Grow a random forest using the randomForest package in the same way as you grew the tree. Is performance on the testing set better than that of the classification trees? Why might that be the case?

(d) [4] Briefly outline the similarities and differences between CARTs and random forests.
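
The workflow in question 5(a), thresholding a response into a binary label, splitting at random into training and testing sets, fitting a tree and reporting accuracy on both, can be sketched in plain Python as follows. This is an illustration only and not the rpart algorithm: the "tree" is a single-split stump chosen by Gini impurity, the data are synthetic (a made-up price variable and a made-up sales relationship, not the Carseats data), and every function name and numeric setting is a hypothetical choice.

```python
import random


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of rows and return (train, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(xs, labels):
    """One-split tree: (threshold, label if x <= threshold, label otherwise)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, xs):
    t, left_label, right_label = stump
    return [left_label if x <= t else right_label for x in xs]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Synthetic data (hypothetical): sales fall with price, plus noise.
rng = random.Random(42)
rows = []
for _ in range(200):
    price = rng.uniform(50.0, 150.0)
    sales = 14.0 - 0.06 * price + rng.gauss(0.0, 1.0)
    rows.append((price, int(sales > 8)))       # 1 means "high sales"

train, test = train_test_split(rows, test_fraction=0.3, seed=1)
stump = fit_stump([p for p, _ in train], [h for _, h in train])
train_acc = accuracy(predict_stump(stump, [p for p, _ in train]), [h for _, h in train])
test_acc = accuracy(predict_stump(stump, [p for p, _ in test]), [h for _, h in test])
print(f"threshold = {stump[0]:.1f}")
print(f"training accuracy = {train_acc:.2f}, testing accuracy = {test_acc:.2f}")
```

The response used to build the label (sales here) is left out of the predictors, which mirrors the hint in part (a): keeping it would let the classifier recover the label exactly.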

