Date: 2019-04-13 11:54

STAT 3032 Homework5 Instruction (Spring 2019)

Due Wednesday, April 17 @ 11:59pm in Canvas

50 points in total

Please show your work on each problem for full credit. A correct answer unsupported by the necessary explanation, R code, or output will receive little, if any, credit. Your work needs to be organized in a reasonably neat and coherent way and submitted as a PDF file on Canvas.

You are welcome to discuss the problems with your classmates, but you must write up your homework individually!

Problem 1

This problem is based on the Bridge Construction dataset (bridge.txt) from the textbook website: http://gattonweb.uky.edu/sheather/book/docs/datasets/bridge.txt

Before construction begins, a bridge project goes through the design stage. Predicting the design time is helpful for budgeting and scheduling purposes. Information on 45 bridge projects was compiled. We will use the following variables (note that the dataset includes more than these 3 variables):

Response variable:
    Time = design time in person-days

Predictor variables:
    CCost = construction cost (in $1000)
    Dwgs = number of structural drawings

(a). Import the data into R and provide a scatterplot matrix for the variables (the response and predictor variables) mentioned above.

(b). Fit the model Time ~ 1 + CCost + Dwgs and provide the diagnostic plots. Do you see any violation of the linear regression assumptions?

(c). You can see from the scatterplot matrix in (a) that the predictor variables are not linearly related. This indicates that we need to transform the predictor variables. Find appropriate transformations for the predictor variables. Hint: Use the R function powerTransform( ) from the library alr4 or car.

(d). After transforming the predictor variables based on (c), find an appropriate transformation for the response variable, Time. Hint: use the R function boxcox( ) from the library MASS.
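The idea behind boxcox( ) can be sketched as follows: for each candidate λ, transform the response, refit the regression, and evaluate the Box-Cox profile log-likelihood, -n/2 · log(RSS(λ)/n) + (λ - 1) · Σ log yᵢ; the λ with the largest value is the suggested power (λ = 0 means log). This is a minimal Python sketch, assuming a single predictor and synthetic `x`, `y` values (not the bridge data); `box_cox`, `simple_ols_rss`, `profile_loglik` and `best_lambda` are hypothetical helper names. MASS::boxcox( ) itself works with any linear model and plots this curve with a confidence interval for λ.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y; lam == 0 means log(y)."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

def simple_ols_rss(x, z):
    """Residual sum of squares from regressing z on x with an intercept."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))

def profile_loglik(x, y, lam):
    """Box-Cox profile log-likelihood, additive constants dropped."""
    n = len(y)
    z = [box_cox(yi, lam) for yi in y]
    rss = simple_ols_rss(x, z)
    # The Jacobian term (lam - 1) * sum(log y) makes different lambdas comparable.
    return -n / 2 * math.log(rss / n) + (lam - 1) * sum(math.log(yi) for yi in y)

def best_lambda(x, y, grid):
    """Grid value of lambda with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(x, y, lam))

# Synthetic data in which log(y) is linear in x (plus small alternating noise).
x = list(range(1, 21))
y = [math.exp(1 + 0.5 * xi + 0.05 * (-1) ** xi) for xi in x]
grid = [i / 10 for i in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
lam_hat = best_lambda(x, y, grid)
print(lam_hat)
```

Because the synthetic response was generated on the log scale, the search should pick a λ at or near 0, which points to a log transformation.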

(e). Regardless of your answers in (c) and (d), we will use the log transformation for the response variable and the predictor variables. Draw the scatterplot matrix for the transformed variables. Now the predictor variables should look linearly related.

(f). Fit the model log(Time) ~ 1 + log(CCost) + log(Dwgs) and interpret the slope coefficient of log(CCost) by filling in the blank below.

When the construction cost increases by 1%, the design time _________________________ .
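In a log-log model the slope acts as an elasticity: holding Dwgs fixed, raising CCost by 1% multiplies the fitted Time by 1.01^b1, which is close to a b1% change when b1 is small. A minimal sketch of that arithmetic, assuming a hypothetical slope `b1 = 0.25` (a placeholder, not an estimate from bridge.txt); `pct_change_in_response` is a hypothetical helper name.

```python
def pct_change_in_response(b1, pct_increase=1.0):
    """Exact % change in the fitted response of a log-log model when one
    predictor grows by pct_increase percent and the others are held fixed."""
    factor = (1 + pct_increase / 100) ** b1
    return (factor - 1) * 100

b1 = 0.25  # hypothetical slope of log(CCost); not fitted from bridge.txt
exact = pct_change_in_response(b1)
approx = b1  # rule of thumb: roughly b1 percent
print(round(exact, 4), approx)
```

The exact and approximate answers agree to about two decimal places here; the approximation degrades for large slopes or large percentage changes.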

(g). Common sense says that design time should increase as the construction cost goes up. Is your estimated coefficient value consistent with this common sense? Please answer yes or no.

________ .

(h). Use R to calculate the VIF values for the model log(Time) ~ 1 + log(CCost) + log(Dwgs). Using 5 as the threshold, does the model have a collinearity issue? Hint: use the R function vif( ).
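With exactly two predictors, the R² from regressing one predictor on the other equals their squared correlation r², so both predictors share the same VIF = 1 / (1 - r²). A minimal sketch of that calculation, assuming made-up log-scale predictor values (not the bridge data); `correlation` and `vif_two_predictors` are hypothetical helper names, and car::vif( ) handles the general case.

```python
import math

def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = correlation(x1, x2)
    return 1 / (1 - r ** 2)

# Made-up predictor values, log-transformed as in the model above.
log_ccost = [math.log(v) for v in [50, 120, 200, 310, 450, 600, 800]]
log_dwgs = [math.log(v) for v in [3, 5, 6, 9, 10, 14, 15]]
vif = vif_two_predictors(log_ccost, log_dwgs)
print(round(vif, 2), "collinearity concern" if vif > 5 else "below threshold")
```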

(i). Draw the diagnostic plots of the model log(Time) ~ 1 + log(CCost) + log(Dwgs). Do you see any violation of the linear regression assumptions? Compare and contrast with the diagnostic plots in (b).

(j). Use the model log(Time) ~ 1 + log(CCost) + log(Dwgs) to predict the design time for the next bridge to be built in Hennepin County. This bridge has 6 structural drawings and a construction budget of 300 thousand dollars. Estimate the design time and provide the 95% prediction interval for the estimated design time.
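Two details matter here: CCost is recorded in $1000, so a $300,000 budget enters the model as CCost = 300, and the fit and prediction interval are computed on the log scale (in R, via predict( ) with interval = "prediction") and then back-transformed with exp( ). A minimal sketch of the back-transformation, assuming hypothetical coefficients `b0, b1, b2` and a hypothetical log-scale half-width of 0.6 (placeholders, not results from bridge.txt).

```python
import math

# Hypothetical coefficients for log(Time) ~ 1 + log(CCost) + log(Dwgs);
# placeholders only, NOT estimates from bridge.txt.
b0, b1, b2 = 2.3, 0.15, 0.85

# CCost is in $1000, so a $300,000 budget enters as CCost = 300.
ccost, dwgs = 300, 6
log_fit = b0 + b1 * math.log(ccost) + b2 * math.log(dwgs)

# Hypothetical 95% prediction interval on the log scale.
log_lwr, log_upr = log_fit - 0.6, log_fit + 0.6

# exp() is monotone, so the interval endpoints back-transform directly.
time_fit = math.exp(log_fit)
time_lwr, time_upr = math.exp(log_lwr), math.exp(log_upr)
print(round(time_fit, 1), round(time_lwr, 1), round(time_upr, 1))
```

The back-transformed interval is asymmetric around the point estimate, and if log(Time) is normally distributed, exp of the log-scale fit estimates the median design time rather than the mean.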

Problem 2:

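For logistic regression with p predictor variables, the model is specified as

$$\log\left(\frac{E(Y \mid X)}{1 - E(Y \mid X)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p.$$

Derive the formula to show that

$$E(Y \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}.$$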
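One way to carry out the derivation, writing $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$ and $\pi = E(Y \mid X)$ as shorthand (notation introduced here only for brevity): exponentiate both sides of the logit equation and solve for $\pi$.

```latex
\begin{aligned}
\log\left(\frac{\pi}{1-\pi}\right) = \eta
  &\;\Longrightarrow\; \frac{\pi}{1-\pi} = e^{\eta} \\
  &\;\Longrightarrow\; \pi = e^{\eta}(1-\pi) = e^{\eta} - \pi e^{\eta} \\
  &\;\Longrightarrow\; \pi\,(1 + e^{\eta}) = e^{\eta} \\
  &\;\Longrightarrow\; \pi = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}.
\end{aligned}
```

The last step divides numerator and denominator by $e^{\eta}$, giving the two equivalent forms of the inverse logit.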

Problem 3:

On April 15, 1912, during her maiden voyage, the ship Titanic sank after colliding with an iceberg, killing many passengers and crew. Here, we will use a subset of the data to analyze the survival rates for different groups of people. Please download the dataset TitanicPartial.csv from Canvas and work through the following questions.

Variables:

Survived: 1 (survived) or 0 (died)
Sex: “male” or “female”
Pclass: passenger class (1, 2, or 3)
Age: age in years

(a). Based on the scatterplots above, which group (combination of Sex and Pclass) has the lowest odds of survival? Hint: look at the relative numbers of survivors and non-survivors in each group.

(b). Fit the following two logistic regression models and provide the summary output for each model. Hint: Use the R function glm( ) with family = binomial.

mod1: Survived ~ 1 + as.factor(Pclass)

mod2: Survived ~ 1 + as.factor(Pclass) + Age + Sex
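glm( ) with family = binomial maximizes the logistic log-likelihood by iteratively reweighted least squares, which for the logit link is the same as Newton-Raphson. A minimal one-predictor sketch of those iterations, assuming a small synthetic dataset (not TitanicPartial.csv); `fit_logistic_1d` is a hypothetical helper name, and the real glm( ) handles any number of predictors, convergence checks and standard errors.

```python
import math

def fit_logistic_1d(x, y, iters=25):
    """Fit log(p / (1 - p)) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
    return b0, b1

# Synthetic data; the 0s and 1s overlap in x, so the MLE exists.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_1d(x, y)
print(round(b0, 3), round(b1, 3))
```

At the maximum-likelihood estimate the score equations Σ(yᵢ - pᵢ) = 0 and Σ xᵢ(yᵢ - pᵢ) = 0 hold, which is a quick way to check a fit.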

(c) Write down the fitted model of mod1. You may use either format (log odds or probability).

(d). What happens if we don’t apply the as.factor( ) function to Pclass? Try fitting mod1 without as.factor( ). You can call this new model mod3. How do the summary outputs of mod1 and mod3 differ?
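Under R's default treatment contrasts, as.factor(Pclass) becomes two indicator columns (classes 2 and 3) with class 1 as the baseline, so mod1 estimates a separate coefficient for each non-baseline class. Without as.factor( ), Pclass enters as a single numeric column with one slope, which forces the log-odds change from class 1 to 2 to equal the change from class 2 to 3. The sketch below only mimics the two codings; `treatment_code` and `numeric_code` are hypothetical helper names.

```python
def treatment_code(pclass, levels=(1, 2, 3)):
    """Indicator columns for every level except the first (the baseline),
    mirroring R's default treatment contrasts for a factor."""
    return [1 if pclass == lev else 0 for lev in levels[1:]]

def numeric_code(pclass):
    """Pclass used as a plain number: a single column."""
    return [pclass]

for c in (1, 2, 3):
    print(c, treatment_code(c), numeric_code(c))
# 1 [0, 0] [1]
# 2 [1, 0] [2]
# 3 [0, 1] [3]
```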

(e) According to mod2, holding Age and Sex constant, which passenger class has the highest odds of survival and which passenger class has the lowest odds of survival? Please explain.

(f). Interpret the estimated slope of Age in mod2 with respect to the probability of survival.

(g). Interpret the estimated coefficient of Sexmale in mod2 with respect to the probability of survival.
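A logistic coefficient β acts on the log-odds: a one-unit increase in the predictor (one more year of Age, or switching the Sexmale indicator from 0 to 1) multiplies the odds by e^β, while the resulting change in probability depends on the starting probability. A minimal sketch, assuming a hypothetical Age slope of -0.03 (a placeholder, not the mod2 estimate); `odds_ratio` and `shift_probability` are hypothetical helper names.

```python
import math

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(beta)

def shift_probability(p, beta):
    """New probability after adding beta to the log-odds of baseline probability p."""
    log_odds = math.log(p / (1 - p)) + beta
    return 1 / (1 + math.exp(-log_odds))

beta_age = -0.03  # hypothetical, not from mod2
print(round(odds_ratio(beta_age), 4))  # odds multiplier per extra year of age
for p in (0.2, 0.5, 0.8):
    print(p, round(shift_probability(p, beta_age) - p, 5))
```

The odds multiplier is the same at every baseline, but the probability change is largest near p = 0.5, which is why a probability-scale interpretation has to name a baseline or speak in terms of odds.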

(h). In the 1997 movie Titanic, the lead character Jack was a 20-year-old male passenger in 3rd class. Please predict his probability of survival based on mod2, using the formula of the fitted model to compute the probability.
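Using the fitted formula means building the linear predictor from the coefficients that apply to this passenger (intercept, the 3rd-class indicator, 20 times the Age slope, and the Sexmale indicator) and passing it through the inverse logit. A minimal sketch, assuming placeholder coefficients laid out like a mod2 summary (NOT fitted values); `predict_prob` is a hypothetical helper name. In R, predict( ) on a glm returns log-odds by default and probabilities with type = "response".

```python
import math

# Placeholder coefficients laid out like mod2's summary; NOT fitted values.
coef = {
    "(Intercept)": 3.0,
    "as.factor(Pclass)2": -1.2,
    "as.factor(Pclass)3": -2.4,
    "Age": -0.035,
    "Sexmale": -2.5,
}

def predict_prob(coef, pclass, age, male):
    """Inverse logit of the linear predictor for one passenger."""
    eta = coef["(Intercept)"] + coef["Age"] * age
    if pclass == 2:
        eta += coef["as.factor(Pclass)2"]
    elif pclass == 3:
        eta += coef["as.factor(Pclass)3"]
    if male:
        eta += coef["Sexmale"]
    return 1 / (1 + math.exp(-eta))

# A 20-year-old male passenger in 3rd class.
p_jack = predict_prob(coef, pclass=3, age=20, male=True)
print(round(p_jack, 4))
```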

(i). Redo part (h), this time using the predict( ) function. You should get a very similar answer, if not an identical one.

(j). Since mod1 is nested within mod2, we can compare these two models using the deviance. Which model do you prefer? Hint: use the R function anova( ).

(k) Redo part (j), but this time you are not allowed to use the anova( ) function! Instead, rely on the summary outputs of mod1 and mod2 from (b) and the pchisq( ) function. You should get a very similar p-value, if not an identical one.
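The test behind this comparison: the drop in residual deviance from mod1 to mod2 is compared with a chi-square distribution whose degrees of freedom equal the drop in residual degrees of freedom (mod2 adds the Age and Sexmale coefficients, so 2 here), and the p-value is the upper tail, which in R is 1 - pchisq(stat, df). A minimal sketch, assuming hypothetical deviances and residual df (NOT the Titanic fits) and using the closed-form chi-square tail that exists only for even df; `chi2_sf_even_df` is a hypothetical helper name.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with an EVEN df,
    via exp(-x/2) * sum_{i<k} (x/2)^i / i!, where k = df / 2."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

# Hypothetical residual deviances and residual df (NOT from the Titanic fits).
dev1, df1 = 900.0, 710  # mod1
dev2, df2 = 650.0, 708  # mod2: two extra coefficients
stat = dev1 - dev2
p_value = chi2_sf_even_df(stat, df1 - df2)
print(stat, p_value)
```

A very small p-value favors the larger model, mod2; R's pchisq( ) covers any df, so this closed form is only for illustration.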
