Date: 2018-11-15 10:30

Statistics 272 Take Home Exam #2

PLEDGE

I pledge my honor that during this exam I have neither given nor received assistance and that I have seen no dishonest work.

Signed:

I have intentionally NOT signed the pledge.

For this exam, you may use our course textbook (Stat2), class notes, material on Moodle, material on the R server, and R. No other resources (books, electronic resources, other students, etc.) may be used. You may not discuss any aspect of this exam with any person other than me. Ask me if you have any questions.

This assignment is due at 4:00 PM on November 20. You will turn in a printed, paper document of your knitted RMarkdown file to my office by the deadline as well as the considered pledge at the top of this page. Your submission should be well-formatted, well-written, and easy to read. The RMarkdown file that you used for your exam should be made available in the Submit folder on the R server. Do not edit this file after the submission deadline.

The Pima are a group of Native Americans living in Arizona and Mexico. The Pima have one of the highest prevalences of type 2 diabetes in the world. You will be working with the dataset pimas.csv, which contains measurements on 768 Pima women. This file is on the R server. Answer each question clearly and concisely, justify each answer, and check assumptions where appropriate.

Variable Name   Description
pregnant        number of pregnancies (0 = 0-1, 1 = 2 or more)
glucose         plasma glucose concentration (glucose tolerance test)
pressure        diastolic blood pressure (mm Hg)
triceps         triceps skin fold thickness (mm)
mass            body mass index ((weight in kg)/(height in m)^2)
age             age
diabetes        diabetes status (neg = no diabetes, pos = diabetes)


We are interested in using the Pimas dataset to construct a model that predicts the probability of diabetes.

1. Perform an exploratory data analysis to determine how each explanatory variable is related to the response. For quantitative variables, produce conditional density plots and summary statistics by diabetes status. For categorical explanatory variables, produce an appropriate table showing the relationship with diabetes status. Summarize each relationship in a brief sentence.

2. Construct a two-way table with diabetes and pregnancy. Find the (unadjusted) odds ratio and provide an interpretation in the context of the problem.
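As a reminder of the arithmetic behind an unadjusted odds ratio, here is a minimal sketch in Python (standard library only). The function name `odds_ratio` and the four cell counts are hypothetical and chosen only for illustration; they are not taken from pimas.csv.

```python
def odds_ratio(pos_exposed, neg_exposed, pos_unexposed, neg_unexposed):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    cells = (pos_exposed, neg_exposed, pos_unexposed, neg_unexposed)
    if min(cells) <= 0:
        raise ValueError("every cell count must be positive")
    odds_exposed = pos_exposed / neg_exposed        # odds of diabetes when pregnant = 1
    odds_unexposed = pos_unexposed / neg_unexposed  # odds of diabetes when pregnant = 0
    return odds_exposed / odds_unexposed


# Hypothetical counts for illustration only; NOT taken from pimas.csv.
#                 diabetes = pos   diabetes = neg
# pregnant = 1          30               60
# pregnant = 0          10               40
or_hat = odds_ratio(30, 60, 10, 40)
print(f"odds ratio = {or_hat:.2f}")
```

With these made-up counts the two odds are 0.5 and 0.25, so the odds of diabetes in the pregnant = 1 group would be twice those in the pregnant = 0 group.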

Next, fit logistic regression models with diabetes as the response and the following sets of explanatory variables.

1: pregnant

2: pregnant, mass, age, triceps

3: pregnant, mass, age

4: pregnant, mass, age, pressure, glucose

5: pregnant, mass (centered), age (centered), glucose (centered)

6: pregnant, mass, age, glucose, pregnant:age
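To make concrete what fitting a model such as Model 1 involves, the sketch below maximizes the logistic likelihood by Newton-Raphson for a single predictor, in Python with the standard library only. The helper `fit_logistic_1d` and the data are hypothetical (made-up counts, not pimas.csv), and this is not a description of how R's `glm` is implemented internally. It illustrates one known fact: with a single 0/1 predictor, the exponentiated slope equals the cross-product ratio of the corresponding 2x2 table.

```python
import math


def fit_logistic_1d(x, y, iters=25):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x. Returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data, NOT from pimas.csv:
# pregnant = 1: 30 pos, 60 neg; pregnant = 0: 10 pos, 40 neg.
x = [1] * 90 + [0] * 50
y = [1] * 30 + [0] * 60 + [1] * 10 + [0] * 40

b0, b1 = fit_logistic_1d(x, y)
table_or = (30 / 60) / (10 / 40)  # cross-product ratio of the same counts
print(f"exp(b1) = {math.exp(b1):.4f}, table odds ratio = {table_or:.4f}")
```

The intercept here is the log odds of diabetes in the x = 0 group, and the slope is the log odds ratio comparing x = 1 with x = 0.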

1. Use Model 1 to find the unadjusted odds ratio. Compare this answer to your answer to question 2 above and explain any differences or similarities.

2. Construct the 95% confidence interval for the odds ratio in Model 1 and provide an interpretation in the context of the study.

3. Is there evidence that we can remove triceps from the models? Justify your answer using an appropriate test.

4. The predictors pregnant, mass, and age can all be obtained from medical records, but the predictors glucose and pressure require an in-person measurement. Should these in-person explanatory variables be included in the model? Conduct an appropriate test and justify your answer.

5. Provide an interpretation of all of the coefficients in Model 3.

6. Considering Model 3, for a woman with no previous pregnancies and with the average age, what does her mass need to be in order to have a predicted probability of diabetes greater than 0.50? Justify your answer by hand (not using R).

7. Using Model 4, find the predicted probability of diabetes for a 55-year-old woman who has had 3 pregnancies and has a mass of 32.

8. Provide an interpretation of all of the coefficients in Model 5.

9. Provide an interpretation of the interaction term in Model 6.
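Several of the questions above rest on the same few formulas: a Wald interval for an odds ratio, a drop-in-deviance (nested likelihood ratio) test, and inverting the logit to find the predictor value at which the predicted probability reaches a target. The Python sketch below (standard library only) shows one way to compute each. All function names are hypothetical, the Wald interval is only one common choice of interval, and every coefficient and mean is a made-up placeholder rather than a fitted value from pimas.csv.

```python
import math

Z_975 = 1.959963984540054  # 0.975 quantile of the standard normal


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def wald_or_interval(beta, se):
    """95% Wald interval for the odds ratio exp(beta)."""
    return math.exp(beta - Z_975 * se), math.exp(beta + Z_975 * se)


def drop_in_deviance_p(dev_reduced, dev_full, df):
    """Upper-tail chi-square p-value for G = dev_reduced - dev_full.

    Uses closed forms that hold only for df = 1 and df = 2.
    """
    g = dev_reduced - dev_full
    if g < 0:
        raise ValueError("the reduced model cannot have the smaller deviance")
    if df == 1:
        return math.erfc(math.sqrt(g / 2.0))
    if df == 2:
        return math.exp(-g / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


def predictor_for_probability(p, intercept, slope, offset=0.0):
    """Solve inv_logit(intercept + offset + slope * x) = p for x."""
    return (math.log(p / (1.0 - p)) - intercept - offset) / slope


# Hypothetical coefficients and mean age, NOT estimates from pimas.csv.
b0, b_pregnant, b_mass, b_age = -4.0, 0.5, 0.10, 0.03
mean_age = 33.0

# pregnant = 0 and age held at its (assumed) mean
mass_needed = predictor_for_probability(
    0.5, b0, b_mass, offset=b_pregnant * 0 + b_age * mean_age
)
print(f"mass at which the predicted probability is 0.50: {mass_needed:.1f}")
```

Because the logit of 0.50 is zero, the last calculation amounts to setting the linear predictor to zero and solving for mass, which is the step a by-hand justification would show.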

The next few questions are on the bootstrap. We will continue with the Pima data. Suppose we want to construct a 95% confidence interval for the IQR of glucose. Generate a bootstrap distribution of IQRs for samples of size n from the original sample.

1. In a brief paragraph, describe how to use the bootstrap, why it works, and why it is useful.

2. Plot a histogram of the bootstrap distribution and report the standard deviation. What does this distribution allow us to do?

3. Using your bootstrap distribution, select an appropriate method to construct a bootstrap confidence interval for the population IQR. Interpret your interval in a brief sentence.
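The resampling procedure described above can be sketched as follows, in Python with the standard library only. The sample is synthetic stand-in data (not the glucose column of pimas.csv), the function names, seed, and number of replicates are arbitrary assumptions, and the percentile interval shown is just one of several possible bootstrap interval methods.

```python
import random
import statistics


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def bootstrap_iqr(sample, reps=2000, seed=272):
    """IQRs of `reps` resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [iqr(rng.choices(sample, k=n)) for _ in range(reps)]


def percentile_interval_95(boot):
    """Percentile interval: 2.5% and 97.5% points of the bootstrap distribution."""
    cuts = statistics.quantiles(boot, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Synthetic stand-in sample, NOT the glucose values from pimas.csv.
data_rng = random.Random(1)
sample = [data_rng.gauss(120, 30) for _ in range(200)]

boot = bootstrap_iqr(sample)
boot_sd = statistics.stdev(boot)  # bootstrap estimate of the standard error
lo, hi = percentile_interval_95(boot)
print(f"sample IQR = {iqr(sample):.1f}, bootstrap SD = {boot_sd:.2f}")
print(f"95% percentile interval: ({lo:.1f}, {hi:.1f})")
```

Each resample has the same size n as the original sample and is drawn with replacement, so the spread of the resampled IQRs approximates the sampling variability of the sample IQR.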

