Date: 2019-02-27 08:58

STA 138 Winter 2019

Homework 5 - Due Friday, Feb 22nd

Book Portion (does not require R)

Note: This may be hand written or typed. Answers

should be clearly marked. Please put your name in

the upper right corner.

1. A study is trying to predict if someone will get the flu shot

or not, with the following dataset:

Column 1: shot (Y ): If the subject got a flu shot (y = 1),

or not (y = 0)

Column 2: age (X1): The age of the subject in years.

Column 3: aware (X2): The health awareness score, where

a higher score indicates a higher level of awareness.

Column 4: gender (X3): M or F

The estimated regression function is:

logit(π̂) = -1.1772 + 0.0728 X1 - 0.0990 X2 + 0.4340 X3,M

(a) Interpret the exponential of the β associated with

awareness score.

(b) Interpret the exponential of the β associated with gender.

(c) Estimate the probability that a male subject aged 50

with awareness score 60 would not get a flu shot.

(d) Estimate the odds that a female subject aged 30 with

awareness score 50 would get the flu shot.
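
Parts (c) and (d) both start from the linear predictor: exponentiating it gives the odds, and odds / (1 + odds) gives the probability. The following is a minimal Python sketch of that arithmetic, for illustration only (this portion of the assignment requires no software). It assumes the coefficient signs shown in the table in problem 2 and a coding of 1 for male, 0 for female; the function names are hypothetical.

```python
import math

# Estimated coefficients; signs follow the coefficient table in problem 2.
COEF = {"intercept": -1.1772, "age": 0.0728, "aware": -0.0990, "genderM": 0.4340}

def linear_predictor(age, aware, male):
    """Log-odds of getting the flu shot (male is 1 for M, 0 for F; assumed coding)."""
    return (COEF["intercept"] + COEF["age"] * age
            + COEF["aware"] * aware + COEF["genderM"] * male)

def odds_of_shot(age, aware, male):
    """Odds = exp(log-odds)."""
    return math.exp(linear_predictor(age, aware, male))

def prob_of_shot(age, aware, male):
    """Inverse logit: probability = odds / (1 + odds)."""
    odds = odds_of_shot(age, aware, male)
    return odds / (1.0 + odds)

# Part (c) asks for the probability of NOT getting the shot, i.e. 1 - P(Y = 1).
p_no_shot = 1.0 - prob_of_shot(age=50, aware=60, male=1)
# Part (d) asks for odds rather than a probability.
odds_female = odds_of_shot(age=30, aware=50, male=0)
print(round(p_no_shot, 4), round(odds_female, 4))
```

Note that part (c) needs the complement of the fitted probability, while part (d) stops at the odds and never applies the inverse logit.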

2. Continue with problem 1. The estimated standard errors

for the β coefficients follow:

              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)    -1.1772       2.9824   -0.3947     0.6931
age             0.0728       0.0304    2.3959     0.0166
aware          -0.0990       0.0335   -2.9567     0.0031
genderM         0.4340       0.5218    0.8317     0.4056

(a) Based on the above, with an α of 0.05, does it appear

that gender was a significant predictor for the probability

of getting the flu shot? Explain your answer.

(b) Which coefficient appears to be the most useful in predicting

if a subject gets the flu shot? Explain your

answer.

(c) Find the 95% corrected confidence interval for the β

associated with age, assuming you are making g = 3

confidence intervals.

(d) What does your interval from (c) suggest about retaining

or removing the age variable from the model?

Explain your answer.
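
The z values, p-values, and the corrected interval in this problem all come from an estimate and its standard error. The sketch below, in Python for illustration only, computes a Wald z statistic, its two-sided p-value, and a confidence interval. It assumes that "corrected" means a Bonferroni adjustment, so the multiplier is the standard normal quantile at 1 - α/(2g); the function name `wald_summary` is hypothetical.

```python
from statistics import NormalDist

def wald_summary(estimate, std_error, alpha=0.05, g=1):
    """Wald z statistic, two-sided p-value, and a (1 - alpha) interval.

    With g > 1 the interval uses the Bonferroni multiplier z(1 - alpha / (2g)),
    which is an assumption about what "corrected" means here.
    """
    std_normal = NormalDist()
    z = estimate / std_error
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    multiplier = std_normal.inv_cdf(1.0 - alpha / (2.0 * g))
    half_width = multiplier * std_error
    return z, p_value, (estimate - half_width, estimate + half_width)

# Age row from the table above: estimate 0.0728, standard error 0.0304.
z_age, p_age, ci_age = wald_summary(0.0728, 0.0304, alpha=0.05, g=3)
print(round(z_age, 4), round(p_age, 4), ci_age)
```

Because the table reports rounded estimates and standard errors, the recomputed z value can differ from the printed one in the third decimal place.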

3. Continue with problem 1. The error matrix for this model

is (where the cutoff used was 0.50):

               Predicted: y = 0   Predicted: y = 1
Truth: y = 0                130                  5
Truth: y = 1                 18                  6

(a) Estimate the sensitivity, specificity, and overall error

rate.

(b) The 95% confidence interval for AUC is :

(0.7308,0.9139). Do you believe that the model

is predicting Y = 1 well? Explain your answer.

(c) Explain why you might be interested in AUC over the

error matrix.

(d) Explain why a standardized residual with a value over

3 may be concerning.
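
Sensitivity, specificity, and the overall error rate in part (a) are simple ratios of the four cells of the error matrix. A minimal Python sketch, for illustration only, using the counts given in this problem; the function name and the tn/fp/fn/tp argument names are hypothetical labels for the four cells.

```python
def error_matrix_rates(tn, fp, fn, tp):
    """Return (sensitivity, specificity, overall error rate) from cell counts.

    tn: truth 0, predicted 0    fp: truth 0, predicted 1
    fn: truth 1, predicted 0    tp: truth 1, predicted 1
    """
    total = tn + fp + fn + tp
    sensitivity = tp / (tp + fn)      # share of true y = 1 predicted as 1
    specificity = tn / (tn + fp)      # share of true y = 0 predicted as 0
    error_rate = (fp + fn) / total    # share of all subjects misclassified
    return sensitivity, specificity, error_rate

# Counts from the error matrix in problem 3 (cutoff 0.50).
sens, spec, err = error_matrix_rates(tn=130, fp=5, fn=18, tp=6)
print(round(sens, 4), round(spec, 4), round(err, 4))
```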

4. A study was performed to examine what affects the probability

of using birth control in women.

Column 1: con (Y): Whether the subject uses birth control,

where Y = 1 indicates they do, and Y = 0

indicates they do not.

Column 2: age (X1): The age of the subject in years.

Column 3: edu (X2): The level of education of the subject,

with A (advanced), G (graduate or above), M

(high school), L (below high school).

Column 4: working (X3): N (they are not working) or Y

(they are working).

The purpose of the study was to examine contraceptive use

in married women.

The estimated coefficients (β’s) and their standard errors

are:

              Estimate   Std. Error
(Intercept)     0.3392       0.5364
X1             -0.0095       0.0151
X2,G            0.8300       0.2964
X2,L           -0.7679       0.4669
X2,M           -0.1119       0.3370
X3,Y           -0.0320       0.2888

(a) Write down the model for each of the categories corresponding

to X2. This should give four models.

(b) Estimate the probability that a subject with an advanced

degree who is not working and is age 30 uses

birth control.

(c) Interpret the value exp(0.8300) in terms of the problem.

(d) The log-likelihood for the model that includes all X

variables is: -195.3582 and the log-likelihood for the

model which includes only X1 and X2 is: -195.3644.

Use these to test to see if X3 can be dropped from

the model. State the null, alternative, test-statistic,

p-value, and conclusion.
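
The test in part (d) compares two nested models through their log-likelihoods. A minimal Python sketch, for illustration only: the statistic is -2 times (reduced log-likelihood minus full log-likelihood), and because X3 contributes a single coefficient (X3,Y) it is compared to a chi-square distribution with 1 degree of freedom. For 1 degree of freedom the upper tail probability equals erfc(sqrt(statistic / 2)). The function name is hypothetical.

```python
import math

def lr_test_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and chi-square(1) p-value for nested models."""
    stat = -2.0 * (loglik_reduced - loglik_full)
    if stat < 0:
        raise ValueError("the reduced model cannot have the larger log-likelihood")
    # P(chi-square with 1 df > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Log-likelihoods from part (d): full model, then the model without X3.
stat, p_value = lr_test_one_df(-195.3582, -195.3644)
print(round(stat, 4), round(p_value, 4))
```

The erfc shortcut only applies when exactly one coefficient is dropped; dropping a factor with several levels would need a chi-square tail with more degrees of freedom.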

5. Continue with problem 4. The estimated, corrected 95%

confidence intervals for the model with X1 and X2 in it

follow:

                 Lower     Upper
β1 (age)       -0.0474    0.0280
β2,G (edu)      0.0938    1.5791
β2,L (edu)     -1.9989    0.3666
β2,M (edu)     -0.9586    0.7315

(a) Does this suggest a significant difference in the odds

of success for education level L vs. A? Explain your

answer.

(b) Does this suggest a significant difference in the odds

of success for education level G vs. A? Explain your

answer.

(c) What would adding an interaction term between age

and education level do? What would be the practical

effect, in other words?

(d) What would your recommendation for the final model

for this data be? Explain your answer.
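
An interval for a β translates into an interval for the corresponding odds ratio by exponentiating both endpoints, and "no difference in odds" corresponds to β = 0 (odds ratio 1). A minimal Python sketch, for illustration only, using the intervals listed in this problem. It assumes education level A is the baseline, since no coefficient for A appears; the helper names are hypothetical.

```python
import math

# Corrected 95% intervals for the model with X1 and X2, as listed in this problem.
INTERVALS = {
    "age": (-0.0474, 0.0280),
    "edu G vs A": (0.0938, 1.5791),
    "edu L vs A": (-1.9989, 0.3666),
    "edu M vs A": (-0.9586, 0.7315),
}

def odds_ratio_interval(lower, upper):
    """Exponentiate a beta interval to get the odds-ratio interval."""
    return math.exp(lower), math.exp(upper)

def excludes_no_effect(lower, upper):
    """True when the beta interval excludes 0 (odds-ratio interval excludes 1)."""
    return lower > 0 or upper < 0

for name, (lo, hi) in INTERVALS.items():
    or_lo, or_hi = odds_ratio_interval(lo, hi)
    print(f"{name}: OR interval ({or_lo:.3f}, {or_hi:.3f}), "
          f"excludes 1: {excludes_no_effect(lo, hi)}")
```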

6. Continue with problem 4. Assume we are using the model

with both X1 and X2 in it.

(a) The five-number summary for the standardized residuals

is below:

    Min   First Quartile    Median   Third Quartile      Max
-2.0220          -0.9455   -0.0008           0.9874   2.1746

Does this suggest there may be outliers in the data?

Explain why or why not.

(b) The error matrix with the cutoff of 0.50 follows:

             Predicted: Y=0   Predicted: Y=1
Truth: Y=0               63               68
Truth: Y=1               50              119

Estimate the sensitivity, specificity, and overall error

rate.

(c) The error matrix with the cutoff of 0.70 follows:

             Predicted: Y=0   Predicted: Y=1
Truth: Y=0              108               23
Truth: Y=1              130               39

Estimate the sensitivity, specificity, and overall error

rate.

(d) Which cutoff would you suggest using, and why?
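
The two error matrices in this problem differ only in the cutoff applied to the fitted probabilities. The Python sketch below, for illustration only, shows how one set of fitted probabilities yields different matrices at cutoffs 0.50 and 0.70. The eight observations are hypothetical and are not taken from the study, and predicting 1 when the probability is at least the cutoff is an assumed convention.

```python
def error_matrix(y_true, probs, cutoff):
    """Return counts (tn, fp, fn, tp), predicting 1 when prob >= cutoff."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= cutoff else 0
        if y == 0 and pred == 0:
            tn += 1
        elif y == 0 and pred == 1:
            fp += 1
        elif y == 1 and pred == 0:
            fn += 1
        else:
            tp += 1
    return tn, fp, fn, tp

# Hypothetical outcomes and fitted probabilities, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
probs = [0.20, 0.55, 0.65, 0.60, 0.75, 0.90, 0.45, 0.72]

for cutoff in (0.50, 0.70):
    print(cutoff, error_matrix(y_true, probs, cutoff))
```

Raising the cutoff moves observations from the "predicted 1" column to the "predicted 0" column, which trades sensitivity for specificity.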

7. Answer the following questions as True or False:

(a) In logistic regression, the larger the value of DFbeta,

the more influential the corresponding row of your data

was.

(b) In logistic regression, the intercept does not always

have a practical interpretation.

(c) In logistic regression, the larger the absolute value of

βi, the more the corresponding X affects π̂.

R Portion (requires some use of R)

Note: You do not have to use R Markdown to turn

in the homework, but the homework must be turned

in in a reasonable format. The answers to the questions

should be in the body of the homework, and the

code used to obtain those answers should be in an appendix.

There should be no code in the body of the

homework. You can accomplish this in R, Word, LaTeX,

Google Docs, etc. This portion should be printed

out and turned in with the hand-written portion.

I. Online under “Files” you will find the dataset

internet.csv, which has the following columns:

Column 1. Newbie: 1 if the subject identified themselves

as “new to the Internet”, 0 otherwise.

Column 2. Age: The age of the subject

Column 3. Gender: 1 indicates the subject was male, 0

indicates female.

Column 4. Educational.Attainment: With levels

“High School”, “College”, “Masters”, and

“Doctoral”.

Column 5. score: The corresponding score for the

Educational.Attainment column, where 1

= High School, 2 = College, 3 = Masters,

and 4 = Doctoral.

The goal is to predict whether someone considers themselves

as “new to the Internet”.

(a) Fit a logistic regression model with Newbie as your

response variable, and Age, Gender, and score as

your explanatory variables. Write down the estimated

logistic regression function.

(b) Interpret the value of exp β associated with Age in

terms of the problem.

(c) Interpret the value of exp β associated with Gender

in terms of the problem.

(d) Interpret the value of exp β associated with score

in terms of the problem.

II. Continue with problem I.

(a) Find and report the 99% profile likelihood confidence

intervals for all values of β.

(b) Using (a), which of your explanatory variables do

you believe significantly affect whether someone identifies

themselves as “new to the Internet”? Explain.

(c) Predict the probability that a female, aged 28, with

a doctoral degree identifies themselves as “new to

the Internet”.

(d) Are there any unusual observations in your dataset?

Explain.

III. Online under “Resources” you will find the dataset

work.csv, which has the following columns:

Column 1. obese: 1 if the subject was obese, 0 otherwise.

Column 2. gender: with levels male, female.

Column 3. age: the age of the subject.

Column 4. marriage: With levels married, widowed, divorced,

never married.

Column 5. min: Minutes of Sedentary Activity per

Week

The goal is to predict whether a subject is obese or not.

(a) Fit and report the estimated logistic regression

model with coefficients for gender, age, and the categories

for the marriage variable.

(b) Write down the estimated logistic regression model

for people who have never been married.

(c) Write down the estimated logistic regression model

for people who are divorced.

IV. Continue with problem III.

(a) Display the Wald Test-statistics and p-values for

testing if each coefficient is zero or not.

(b) Based on the above, which variables would you retain

in your model, and why? Assume α = 0.10.

(c) Fit the estimated logistic regression model with only

the variables you chose from (b).

(d) Interpret the coefficients of the estimated regression

model you chose in (c).

V. Continue with problems III and IV.

(a) Predict the probability that a married woman aged

28 who has 400 sedentary minutes per week is obese

using the full model (all first order predictors, no

interactions).

(b) Predict the probability that a married woman aged

28 who has 400 sedentary minutes per week is obese

using the model suggested in IV(c).

(c) Using the likelihood ratio (LR) test, test to see if you can drop

the coefficient for gender from the model. Assume

the “full model” is: logit(π) = α + β1 x_gender + β2 x_min.

Assume a significance level of α = 0.05.

Report back the test-statistic, conclusion on

the test, and p-value.

(d) Using the likelihood ratio (LR) test, test to see if you can drop

the coefficient for min from the model. Assume the

“full model” is: logit(π) = α + β1 x_gender + β2 x_min.

Assume a significance level of α = 0.05.

Report back the test-statistic, conclusion on

the test, and p-value.

VI. Continue with problem IV, and use the “best model”

suggested.

(a) Find the value of AUC, the 95% confidence interval

for AUC, and plot the ROC.

(b) Does this value of AUC suggest that the model has

fit the data well? Explain your answer.

(c) Fit the full model (including all predictors) and repeat

(a) for the full model.

(d) What does (c) suggest about AUC and adding predictors,

if anything?
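
AUC can be read as the probability that a randomly chosen subject with Y = 1 receives a higher fitted probability than a randomly chosen subject with Y = 0, with ties counted as one half. The Python sketch below computes AUC directly from that definition. It is an illustration only: the assignment expects R for this portion, the four observations are hypothetical, and the function name is an assumption. It does not compute the confidence interval asked for in part (a).

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one observation from each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical outcomes and fitted probabilities, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = auc_from_scores(y_true, scores)
print(auc)
```

Because AUC depends only on how the fitted probabilities rank the subjects, it does not change with the cutoff, unlike an error matrix.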

