Homework 7 – due April 4
Note: You must provide necessary R code with your answers. You will not receive any credit if you don’t show the code for the part you are answering.
1. (6 total points) Use the FG2013 data file posted on Canvas to analyze what affects the probability of making a field goal in football.
(a) (2 points) Write out the logistic regression model using Yards as the explanatory variable and Outcome as the response.
> FG2013 <- read.csv(file = "~/Desktop/FG2013.csv")
> logit.reg <- glm(Outcome ~ Yards, data = FG2013, family = binomial(link = "logit"))
> summary(logit.reg)
Call:
glm(formula = Outcome ~ Yards, family = binomial(link = "logit"),
    data = FG2013)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6626   0.2421   0.3803   0.5883   1.3412

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.85106    0.50126  11.673   <2e-16 ***
Yards       -0.09731    0.01121  -8.683   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 799.91  on 1015  degrees of freedom
Residual deviance: 705.71  on 1014  degrees of freedom
AIC: 709.71

Number of Fisher Scoring iterations: 5
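The summary output reports the estimates but does not write the model out. Reading the coefficients off the table, and assuming Outcome is coded so that R models the probability of a made field goal (the negative Yards slope is consistent with that coding, but the data file itself is not shown here), the estimated model can be written as:

```latex
\operatorname{logit}(\hat{\pi})
  = \log\!\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)
  = 5.85106 - 0.09731\,\text{Yards}
\qquad\Longleftrightarrow\qquad
\hat{\pi}
  = \frac{\exp(5.85106 - 0.09731\,\text{Yards})}
         {1 + \exp(5.85106 - 0.09731\,\text{Yards})}
```

Here \hat{\pi} denotes the estimated probability of a successful field goal at a given distance.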
(b) (4 points) Provide a plot of the logistic regression model from above with the 95% confidence bands.
> # Plot logistic model
> curve(expr = predict(logit.reg, data.frame(Yards = x), type = "response"), col = "red",
+ xlim = c(min(FG2013$Yards), max(FG2013$Yards)), ylab = expression(hat(pi)),
+ xlab = "Yards", main = "Estimated probability of making a field goal in football",
+ panel.first = grid())
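The call above draws only the estimated curve, while part (b) also asks for 95% confidence bands. One possible way to add them is sketched below, using Wald intervals computed on the logit scale and back-transformed. The data are simulated from the coefficients reported in part (a) because the FG2013 file is not reproduced here. The names sim, sim.fit, and ci.pi are hypothetical, and the 18 to 62 yard range is an assumption.

```r
# Simulated stand-in for FG2013 (assumption: the real file is not available here).
# The generating coefficients are the estimates from the summary() output in part (a);
# n = 1016 matches the 1015 null degrees of freedom.
set.seed(2013)
sim <- data.frame(Yards = sample(18:62, size = 1016, replace = TRUE))
sim$Outcome <- rbinom(n = nrow(sim), size = 1,
                      prob = plogis(5.85106 - 0.09731 * sim$Yards))
sim.fit <- glm(Outcome ~ Yards, data = sim, family = binomial(link = "logit"))

# Hypothetical helper: Wald interval for the linear predictor,
# back-transformed to the probability scale
ci.pi <- function(newdata, mod.fit, alpha = 0.05) {
  lp <- predict(object = mod.fit, newdata = newdata, type = "link", se.fit = TRUE)
  z <- qnorm(p = 1 - alpha / 2)
  list(lower = plogis(lp$fit - z * lp$se.fit),
       upper = plogis(lp$fit + z * lp$se.fit))
}

# Estimated curve first, then the lower and upper 95% bands
curve(expr = predict(sim.fit, data.frame(Yards = x), type = "response"),
      col = "red", xlim = range(sim$Yards), ylim = c(0, 1),
      ylab = expression(hat(pi)), xlab = "Yards", panel.first = grid())
curve(expr = ci.pi(data.frame(Yards = x), sim.fit)$lower,
      col = "blue", lty = "dotdash", add = TRUE)
curve(expr = ci.pi(data.frame(Yards = x), sim.fit)$upper,
      col = "blue", lty = "dotdash", add = TRUE)
legend("bottomleft", legend = c("Estimated probability", "95% Wald band"),
       col = c("red", "blue"), lty = c("solid", "dotdash"), bty = "n")
```

With the real data, logit.reg and FG2013 would take the place of sim.fit and sim.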
2. (12 total points) The next aspect is to validate the model. Take the placekick data set used in class and remove all PATs (i.e. all extra points) from the data set so that only field goals remain. The code used below shows how to do so.
> placekick <- read.csv(file = "~/Placekick.csv")
> kick <- placekick[placekick$PAT == 0,]
(a) (2 points) Fit the logistic regression model where distance is the only explanatory variable.
(b) (2 points) Calculate the relative change in the regression coefficients for each model.
(c) (4 points) Assume a relative change of over 50% is considered large. If the relative change is large, explain why this is not desirable. If the relative change is small, explain why this is desirable.
(d) (4 points) Construct ONE plot that has the model obtained from the placekick data with the model obtained from the FG2013 data. Thus, your plot should have two logistic regression curves that depict the logistic regression model from each data set. Make sure your plot also has a legend in order to differentiate the two curves. Has the probability of a successful field goal increased or decreased by 2013?
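A rough sketch of how parts (a), (b), and (d) might fit together is given below. Neither data file is reproduced here, so both data sets are simulated. The FG2013 generating values come from the Question 1 output, while the placekick generating values (5.2 and -0.09) are purely hypothetical. The response column name good is an assumption about the placekick file. Using the placekick fit as the baseline for the relative change is also an assumption, since the assignment does not say which model is the reference.

```r
# Simulated stand-ins for the two data sets (assumption: real files not available)
set.seed(7)
fg <- data.frame(Yards = sample(18:62, size = 1016, replace = TRUE))
fg$Outcome <- rbinom(nrow(fg), size = 1,
                     prob = plogis(5.85106 - 0.09731 * fg$Yards))
kick.sim <- data.frame(distance = sample(18:62, size = 800, replace = TRUE))
kick.sim$good <- rbinom(nrow(kick.sim), size = 1,
                        prob = plogis(5.2 - 0.09 * kick.sim$distance))  # hypothetical values

# (a) Distance-only logistic regression for each data set
fit.2013 <- glm(Outcome ~ Yards, data = fg, family = binomial(link = "logit"))
fit.kick <- glm(good ~ distance, data = kick.sim, family = binomial(link = "logit"))

# (b) Relative change (%) in each coefficient, older fit as the baseline
rel.change <- 100 * (coef(fit.2013) - coef(fit.kick)) / coef(fit.kick)
names(rel.change) <- c("Intercept", "Slope")
print(round(rel.change, 1))

# (d) Both estimated curves on one plot, with a legend
curve(expr = predict(fit.kick, data.frame(distance = x), type = "response"),
      col = "blue", lty = "solid", xlim = c(18, 62), ylim = c(0, 1),
      xlab = "Distance (yards)", ylab = expression(hat(pi)),
      panel.first = grid())
curve(expr = predict(fit.2013, data.frame(Yards = x), type = "response"),
      col = "red", lty = "dashed", add = TRUE)
legend("bottomleft", legend = c("Placekick data", "FG2013 data"),
       col = c("blue", "red"), lty = c("solid", "dashed"), bty = "n")
```

If the 2013 curve lies above the placekick curve across the distances of interest, that would suggest the estimated success probability has increased; the simulated curves here say nothing about the real answer.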
3. (42 total points) From the FG2013 data set, use the variables Yards, PointsAhead, and Quarter to predict the probability of a successful field goal. Assume that Quarter is quantitative.
(a) (3 points) Write out the logistic regression model.
(b) (4 points) Estimate the probability that a 40-yard field goal is successful when the game is tied and in the 4th quarter. Do the same for 20- and 30-yard field goals.
(c) (3 points) Show how to calculate the probability of making a 40-yard field goal in a tied game in the 4th quarter. Do NOT use R to answer this question!
(d) (3 points) Construct a 95% confidence interval for the probability of making a 40-yard field goal when the game is tied in the 4th quarter AND interpret the interval.
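The mechanics behind parts (b) through (d) can be sketched as follows. The data are simulated, and the generating coefficients for PointsAhead and Quarter are made up, so the numbers printed are not the homework answers. The names sim3, fit3, and new.kicks are hypothetical. The interval in part (d) is taken to be a Wald interval, which is an assumption about what the course expects.

```r
# Simulated stand-in for FG2013 with the three predictors (hypothetical effects)
set.seed(42)
n <- 1016
sim3 <- data.frame(Yards = sample(18:62, n, replace = TRUE),
                   PointsAhead = sample(-14:14, n, replace = TRUE),
                   Quarter = sample(1:4, n, replace = TRUE))
sim3$Outcome <- rbinom(n, size = 1,
                       prob = plogis(5.8 - 0.10 * sim3$Yards +
                                     0.01 * sim3$PointsAhead + 0.05 * sim3$Quarter))
fit3 <- glm(Outcome ~ Yards + PointsAhead + Quarter, data = sim3,
            family = binomial(link = "logit"))

# (b) Estimated probabilities for 20-, 30-, and 40-yard kicks, tied game, 4th quarter
new.kicks <- data.frame(Yards = c(20, 30, 40), PointsAhead = 0, Quarter = 4)
pi.hat <- predict(fit3, newdata = new.kicks, type = "response")
print(round(pi.hat, 4))

# (c) The same 40-yard calculation spelled out term by term,
#     mirroring what would be done with a calculator
b <- coef(fit3)
lin.pred <- b["(Intercept)"] + b["Yards"] * 40 + b["PointsAhead"] * 0 + b["Quarter"] * 4
pi.hand <- exp(lin.pred) / (1 + exp(lin.pred))

# (d) 95% Wald interval: build it on the logit scale, then back-transform
lp <- predict(fit3, newdata = new.kicks[3, ], type = "link", se.fit = TRUE)
ci.40 <- plogis(lp$fit + c(-1, 1) * qnorm(0.975) * lp$se.fit)
print(round(ci.40, 4))
```

Building the interval on the logit scale keeps both limits between 0 and 1, which an interval computed directly on the probability scale does not guarantee.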
(e) (6 points)
(i) Interpret the estimated odds ratio for a 10 yard decrease in distance (i.e. use c = -10).
(ii) Interpret the estimated odds ratio for a 3 point increase in lead (i.e. use c = 3).
(iii) Interpret the estimated odds ratio for a 1 quarter increase (i.e. use c = 1).
(f) (15 points) Obtain and interpret the 95% confidence intervals for each of the three odds ratios calculated above. Discuss which intervals do or do not contain 1 and why this is meaningful in terms of the problem.
(g) (3 points) Conduct a formal hypothesis test using the anova() function to see if points ahead and quarter need to be included in the model. Write out the hypotheses and give a conclusion.
(h) (2 points) Does your answer in part (g) confirm your conclusions about the importance of points ahead and quarter that you obtained in part (f)? Explain why or why not.
(i) (3 points) Fit a model that includes the three variables from above as well as an interaction between points ahead and quarter. Based on this new model, should points ahead and quarter be included in the model?
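For parts (e) and (f), the odds ratio for a change of c units in one variable, holding the others fixed, is exp(c * beta). A minimal sketch on simulated data follows; the generating effects are hypothetical, and Wald intervals are assumed here even though the course may prefer profile likelihood intervals. The names sim3, fit3, c.vec, and or.table are illustrative only.

```r
# Simulated stand-in for FG2013 with the three predictors (hypothetical effects)
set.seed(42)
n <- 1016
sim3 <- data.frame(Yards = sample(18:62, n, replace = TRUE),
                   PointsAhead = sample(-14:14, n, replace = TRUE),
                   Quarter = sample(1:4, n, replace = TRUE))
sim3$Outcome <- rbinom(n, size = 1,
                       prob = plogis(5.8 - 0.10 * sim3$Yards +
                                     0.01 * sim3$PointsAhead + 0.05 * sim3$Quarter))
fit3 <- glm(Outcome ~ Yards + PointsAhead + Quarter, data = sim3,
            family = binomial(link = "logit"))

# Increments from part (e): 10 yard decrease, 3 point increase, 1 quarter increase
c.vec <- c(Yards = -10, PointsAhead = 3, Quarter = 1)
beta <- coef(fit3)[names(c.vec)]
se <- sqrt(diag(vcov(fit3)))[names(c.vec)]

# Odds ratio exp(c * beta) and its Wald interval exp(c * beta -/+ z * |c| * se)
est <- c.vec * beta
half <- qnorm(0.975) * abs(c.vec) * se
or.table <- cbind(OR = exp(est), lower = exp(est - half), upper = exp(est + half))

# Flag (1 = yes, 0 = no) whether each interval contains 1
or.table <- cbind(or.table,
                  contains.1 = or.table[, "lower"] < 1 & or.table[, "upper"] > 1)
print(round(or.table, 3))
```

An interval that contains 1 means the data are consistent with no change in the odds of success for that increment, given the other variables in the model.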
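The tests in parts (g) and (i) can be sketched as likelihood ratio tests between nested models. The data below are simulated with hypothetical effects, so the p-values shown are not the homework answers. For part (i), comparing the Yards-only model with the interaction model is one reasonable reading of the question, not the only one.

```r
# Simulated stand-in for FG2013 with the three predictors (hypothetical effects)
set.seed(42)
n <- 1016
sim3 <- data.frame(Yards = sample(18:62, n, replace = TRUE),
                   PointsAhead = sample(-14:14, n, replace = TRUE),
                   Quarter = sample(1:4, n, replace = TRUE))
sim3$Outcome <- rbinom(n, size = 1,
                       prob = plogis(5.8 - 0.10 * sim3$Yards +
                                     0.01 * sim3$PointsAhead + 0.05 * sim3$Quarter))

# (g) H0: beta_PointsAhead = beta_Quarter = 0
#     Ha: at least one of the two is nonzero
fit.red <- glm(Outcome ~ Yards, data = sim3, family = binomial(link = "logit"))
fit.full <- glm(Outcome ~ Yards + PointsAhead + Quarter, data = sim3,
                family = binomial(link = "logit"))
lrt <- anova(fit.red, fit.full, test = "Chisq")
print(lrt)

# (i) Add the PointsAhead:Quarter interaction, then test all three added terms
#     against the Yards-only model
fit.int <- glm(Outcome ~ Yards + PointsAhead + Quarter + PointsAhead:Quarter,
               data = sim3, family = binomial(link = "logit"))
lrt.int <- anova(fit.red, fit.int, test = "Chisq")
print(lrt.int)
```

A large p-value in either table would indicate that the added terms do not improve on the Yards-only model by more than chance would explain.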