MA308: Statistical Calculation and Software

Assignment 3 (Dec 24, 2019 - Jan 02, 2020)

3.1 For the “weightgain” dataset from the HSAUR3 package, the data arise from an experiment to study the gain in weight of rats fed on four different diets, distinguished by amount of protein (low and high) and by source of protein (beef and cereal). Ten rats are randomized to each of the four treatments and the weight gain in grams is recorded. The question of interest is how diet affects weight gain.

(a) Summarize the main features of the data by calculating group means and standard deviations; use the plotmeans() function in the gplots package to produce an interaction plot of group means and their confidence intervals.
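
A minimal R sketch for (a), assuming the weightgain data frame from HSAUR3 with columns source, type and weightgain:

    library(HSAUR3)   # weightgain data
    library(gplots)   # plotmeans()

    data("weightgain", package = "HSAUR3")

    # cell means and standard deviations for the four source x type groups
    aggregate(weightgain ~ source + type, data = weightgain,
              FUN = function(x) c(mean = mean(x), sd = sd(x)))

    # group means with 95% confidence intervals
    plotmeans(weightgain ~ interaction(source, type, sep = " "),
              data = weightgain, xlab = "Treatment",
              ylab = "Weight gain (g)", connect = FALSE)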

(b) Use the interaction2wt() function in the HH package to produce a plot of both main effects and two-way interactions for any factorial design of any order. Explain whether there exists an interaction between source and type.
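
One possible sketch for (b), assuming the HH package is installed and weightgain is loaded as in the sketch under (a):

    library(HH)   # interaction2wt()

    interaction2wt(weightgain ~ source * type, data = weightgain)
    # roughly parallel traces in the source:type panels suggest little interaction;
    # clearly non-parallel or crossing traces suggest an interaction is present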

(c) Carry out a two-way factorial ANOVA analysis with and without interaction terms respectively, and explain the corresponding results.
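
A sketch for (c) using base R's aov():

    fit_int  <- aov(weightgain ~ source * type, data = weightgain)   # with interaction
    fit_main <- aov(weightgain ~ source + type, data = weightgain)   # main effects only

    summary(fit_int)    # tests for source, type and the source:type interaction
    summary(fit_main)   # tests for the two main effects only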

(d) What are the assumptions that our data need to satisfy when we implement one-way ANOVA? Now, if we use one-way ANOVA to examine the difference in weightgain between different sources of protein, are these assumptions satisfied?
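
For reference, one-way ANOVA assumes independent observations, approximately normal errors and equal variances across groups; a sketch of one way the last two could be checked (not the only choice of tests):

    fit1 <- aov(weightgain ~ source, data = weightgain)

    shapiro.test(residuals(fit1))                           # normality of the residuals
    bartlett.test(weightgain ~ source, data = weightgain)   # homogeneity of variance
    # independence follows from the randomized design rather than from a test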

(e) Carry out the permutation-test version of the two-way factorial ANOVA analysis of weightgain ~ source * type with the lmPerm package, and compare the result with that in 3.1(c).
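
A sketch for (e), assuming lmPerm's aovp() is available:

    library(lmPerm)   # aovp(): permutation version of aov()

    set.seed(1)
    fit_perm <- aovp(weightgain ~ source * type, data = weightgain, perm = "Prob")
    summary(fit_perm)   # permutation p-values, to be compared with the F-test p-values in (c)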

3.2 For the “planets” dataset from the HSAUR3 package:

(a) Apply complete linkage and average linkage hierarchical clustering to the planets data. Compare the results with the K-means (K=3) clustering results in the lecture notes.
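
A sketch for (a), assuming the planets data frame from HSAUR3 with variables mass, period and eccen; standardizing before clustering is an assumption here, not a requirement of the question:

    library(HSAUR3)
    data("planets", package = "HSAUR3")

    X <- scale(planets)   # standardize mass, period, eccen
    d <- dist(X)          # Euclidean distances

    hc_complete <- hclust(d, method = "complete")
    hc_average  <- hclust(d, method = "average")

    cl_complete <- cutree(hc_complete, k = 3)
    cl_average  <- cutree(hc_average,  k = 3)

    set.seed(1)
    cl_kmeans <- kmeans(X, centers = 3)$cluster

    table(cl_complete, cl_kmeans)   # cross-tabulate the partitions to compare them
    table(cl_average,  cl_kmeans)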


(b) Construct a three-dimensional drop-line scatterplot of the planets data in which the points are labelled with a suitable cluster label; the K-means (K=3) method can be used for clustering.
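
A sketch for (b); scatterplot3d is one package whose type = "h" option draws drop lines, and the point labels come from K-means as suggested (planets loaded as in the sketch under (a)):

    library(scatterplot3d)

    set.seed(1)
    cl <- kmeans(scale(planets), centers = 3)$cluster

    scatterplot3d(planets$mass, planets$period, planets$eccen,
                  type = "h", color = cl, pch = as.character(cl),
                  xlab = "mass", ylab = "period", zlab = "eccen")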

(c) Write an R function to fit a parametric model based on a two-component normal mixture model for the eccen variable in the planets data. (Hint: refer to the “Mixture distribution estimation” section in Chapter 6.)
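
A sketch for (c): direct maximum-likelihood fitting of a two-component normal mixture with optim(); an EM iteration as in the lecture notes would work equally well. The parameter names (p, mu1, sd1, mu2, sd2) are illustrative.

    fit_mix2 <- function(x) {
      # negative log-likelihood of p*N(mu1, sd1^2) + (1-p)*N(mu2, sd2^2),
      # with the weight on a logit scale and the sds on a log scale
      negloglik <- function(par) {
        p <- plogis(par[1])
        -sum(log(p * dnorm(x, par[2], exp(par[3])) +
                 (1 - p) * dnorm(x, par[4], exp(par[5]))))
      }
      start <- c(0, mean(x) - sd(x), log(sd(x)), mean(x) + sd(x), log(sd(x)))
      opt <- optim(start, negloglik, method = "BFGS")
      c(p   = plogis(opt$par[1]),
        mu1 = opt$par[2], sd1 = exp(opt$par[3]),
        mu2 = opt$par[4], sd2 = exp(opt$par[5]))
    }

    fit_mix2(planets$eccen)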

(d) In fact, the mclust package offers high-level functionality for estimating mixture models; apply Mclust to estimate a normal mixture model for the eccen variable in the planets data. Compare the result with that in 3.2(c).
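
A sketch for (d); G = 2 forces two components so the fit is directly comparable with (c), although Mclust can also choose the number of components itself:

    library(mclust)

    mc <- Mclust(planets$eccen, G = 2)
    summary(mc, parameters = TRUE)   # mixing proportions, means and variances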

(e) Implement principal component analysis on the planets data, and find the coefficients for the first two principal components and the principal component scores for each planet.
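
A sketch for (e); scaling the variables before the PCA is an assumption here:

    pca <- prcomp(planets, scale. = TRUE)

    pca$rotation[, 1:2]   # coefficients (loadings) of the first two principal components
    pca$x[, 1:2]          # principal component scores for each planet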

(f) Apply K-means (K=3) clustering to the first two principal components of the planets data. Compare the clustering result with that based on the original data mentioned in 3.2(a).
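
A sketch for (f), reusing the PCA from (e) and the complete-linkage partition from (a) for the comparison:

    pca <- prcomp(planets, scale. = TRUE)   # as in (e)

    set.seed(1)
    cl_pc   <- kmeans(pca$x[, 1:2], centers = 3)$cluster
    cl_orig <- cutree(hclust(dist(scale(planets)), method = "complete"), k = 3)   # as in (a)

    table(cl_pc, cl_orig)   # agreement between the two partitions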

3.3 For the “Default” dataset from the ISLR package, we consider how to predict default for any given value of balance and income. In particular, we will now compute estimates for the standard errors of the income and balance logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using the standard formula for computing the standard errors in the glm() function. Do not forget to set a random seed before beginning your analysis.

(a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income and balance in a multiple logistic regression model that uses both predictors.
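
A sketch for (a):

    library(ISLR)   # Default data

    set.seed(1)
    glm_fit <- glm(default ~ income + balance, data = Default, family = binomial)
    summary(glm_fit)$coefficients   # the "Std. Error" column gives the formula-based SEs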

(b) Write a function, boot.fn(), that takes as input the Default data set as well as an index of the observations, and that outputs the coefficient estimates for income and balance in the multiple logistic regression model.
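
A sketch for (b):

    boot.fn <- function(data, index) {
      # refit the logistic regression on the observations selected by index
      coef(glm(default ~ income + balance, data = data,
               family = binomial, subset = index))
    }

    boot.fn(Default, 1:nrow(Default))   # full-sample estimates as a sanity check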

(c) Use the boot() function together with your boot.fn() function to estimate the standard errors of the logistic regression coefficients for income and balance.
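
A sketch for (c); R = 1000 bootstrap replicates is an arbitrary but common choice:

    library(boot)

    set.seed(1)
    boot(Default, boot.fn, R = 1000)   # "std. error" column: bootstrap SEs of the coefficients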


(d) Comment on the estimated standard errors obtained using the glm() function and using your bootstrap function.

3.4 For the “Default” dataset from the ISLR package, we consider how to predict default for any given value of balance and income.

(a) Split the sample set into a training set (70%) and a validation set (30%). Fit a multiple logistic regression model (default ~ balance + income) using only the training observations. Obtain a prediction of default status for each individual in the validation set by computing the posterior probability of default for that individual, and classifying the individual to the default category if the posterior probability is greater than 0.5. Compute the validation set error, which is the fraction of the observations in the validation set that are misclassified.

[10 points]
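
A sketch for (a); the 70/30 split below is random, so the error depends on the seed:

    library(ISLR)

    set.seed(1)
    n     <- nrow(Default)
    train <- sample(n, size = round(0.7 * n))   # 70% training indices

    fit  <- glm(default ~ balance + income, data = Default,
                family = binomial, subset = train)

    prob <- predict(fit, newdata = Default[-train, ], type = "response")
    pred <- ifelse(prob > 0.5, "Yes", "No")

    mean(pred != Default$default[-train])   # validation-set misclassification rate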

(b) Apply Classical Decision Tree and Conditional Inference Tree on the Default dataset. Use the plotcp() function to plot the cross-validated error against the complexity parameter and choose the most appropriate tree size.
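
A sketch for (b), using rpart for the classical tree and party's ctree() for the conditional inference tree; it reuses the training split from (a):

    library(rpart)   # classical decision tree, plotcp()
    library(party)   # ctree()

    set.seed(1)
    train <- sample(nrow(Default), round(0.7 * nrow(Default)))   # same split as in (a)

    tree_cls <- rpart(default ~ balance + income, data = Default[train, ],
                      method = "class")
    plotcp(tree_cls)   # cross-validated error against the complexity parameter cp

    # prune at the cp with the smallest cross-validated error
    best_cp <- tree_cls$cptable[which.min(tree_cls$cptable[, "xerror"]), "CP"]
    tree_pruned <- prune(tree_cls, cp = best_cp)

    tree_ci <- ctree(default ~ balance + income, data = Default[train, ])
    plot(tree_ci)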

(c) Write down the algorithm for a random forest, which involves sampling cases and variables to create a large number of decision trees. Implement the random forest algorithm based on traditional decision trees and conditional inference trees, respectively. Use the random forest models built to classify the validation sample and compare the predictive accuracy of the two models.
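
A sketch for (c): randomForest() grows a forest of traditional CART-style trees, while party's cforest() grows a forest of conditional inference trees; ntree = 500 is an arbitrary choice:

    library(randomForest)   # forest of traditional decision trees
    library(party)          # cforest(): forest of conditional inference trees

    set.seed(1)
    train <- sample(nrow(Default), round(0.7 * nrow(Default)))   # same split as in (a)

    rf_cls <- randomForest(default ~ balance + income, data = Default[train, ],
                           ntree = 500)
    rf_ci  <- cforest(default ~ balance + income, data = Default[train, ],
                      controls = cforest_unbiased(ntree = 500))

    pred_cls <- predict(rf_cls, newdata = Default[-train, ])
    pred_ci  <- predict(rf_ci,  newdata = Default[-train, ], OOB = FALSE, type = "response")

    mean(pred_cls == Default$default[-train])   # accuracy, traditional forest
    mean(pred_ci  == Default$default[-train])   # accuracy, conditional forest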

(d) Fit a support vector machine classifier to the Default dataset. Use the tune.svm() function to choose a combination of gamma and cost that may lead to a more effective model. Compare the sensitivity, specificity, positive predictive power and negative predictive power of the SVM, random forest and logistic regression classifiers.
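
A sketch for (d) with e1071; the gamma/cost grids are illustrative, and the four measures are computed from the confusion table by hand, treating "Yes" as the positive class:

    library(e1071)   # svm(), tune.svm()

    set.seed(1)
    train <- sample(nrow(Default), round(0.7 * nrow(Default)))   # same split as in (a)

    tuned <- tune.svm(default ~ balance + income, data = Default[train, ],
                      gamma = 10^(-3:1), cost = 10^(-1:2))
    svm_fit <- tuned$best.model

    svm_pred <- predict(svm_fit, newdata = Default[-train, ])
    tab <- table(Predicted = svm_pred, Actual = Default$default[-train])

    c(sensitivity = tab["Yes", "Yes"] / sum(tab[, "Yes"]),
      specificity = tab["No",  "No"]  / sum(tab[, "No"]),
      ppv         = tab["Yes", "Yes"] / sum(tab["Yes", ]),
      npv         = tab["No",  "No"]  / sum(tab["No",  ]))
    # repeat the same table-based calculation for the random-forest and
    # logistic-regression predictions from (a) and (c) to compare the classifiers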

