
Date: 2025-02-11 05:09

Division of Biostatistics

Qualifying Exam, PhD in Biostatistics

Comprehensive Exam, MS in Biostatistics

June 17, 2024

9:00am – 3:00pm

The dataset Osteoporosis.csv will be used for problems 1 and 2.

These data come from a study of the association of demographic, lifestyle, and dietary factors with bone mineral density (BMD) and osteoporosis (Chaudhari et al., 2019, PMCID: PMC6556264). The study included 169 participants aged at least 50 years, seen in hospitals of Kathmandu, Nepal. The participants were administered questionnaires and had their bone density measured via dual-energy X-ray absorptiometry (DEXA scans) at three locations: lumbosacral spine, right femur, and left femur. The variables in the dataset are:

age   age in years

sex   participants’ sex (male, female)

occupation   participants’ occupation

ethnicity   participants’ ethnicity (Brahmin chhetri, Janjati, Newar, other)

bmi   body mass index (BMI), in kg/m²

bmd   bone mineral density (BMD) T-score, in standard deviations, compared to healthy 25-35 year olds of same sex and ethnicity. Computed as the lowest T-score from the three DEXA scan locations (lumbosacral spine, right femur, left femur)

diagnosis   osteoporosis (BMD ≤ −2.5), osteopenia (−2.5 < BMD ≤ −1), or normal (BMD > −1)

op   osteoporosis indicator (1 if BMD ≤ −2.5, 0 otherwise)

smoking   smoking status (yes, no)

alcohol   alcohol consumption (yes, no)

exercise   daily exercise (yes, no)

tea   tea consumption (yes, no)

calcium   estimated dietary daily calcium intake, in mg

vitamind   estimated dietary daily vitamin D intake, in IU

l.femur   BMD T-score measured in the left femur

r.femur   BMD T-score measured in the right femur

lumbosacral   BMD T-score measured in the lumbosacral region of the spine
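The definitions of bmd, diagnosis, and op above can be sketched in a few lines. This is a minimal illustration, assuming the cut-offs exactly as listed; the function names and the example participant are hypothetical and are not taken from Osteoporosis.csv.

```python
def composite_bmd(l_femur, r_femur, lumbosacral):
    """Composite T-score: the lowest of the three site-specific T-scores."""
    return min(l_femur, r_femur, lumbosacral)


def diagnosis(bmd):
    """Classify a composite T-score using the cut-offs in the variable list."""
    if bmd <= -2.5:
        return "osteoporosis"
    if bmd <= -1.0:
        return "osteopenia"
    return "normal"


def op_indicator(bmd):
    """Osteoporosis indicator: 1 if BMD <= -2.5, 0 otherwise."""
    return 1 if bmd <= -2.5 else 0


# Hypothetical participant (illustrative values only)
scores = {"l.femur": -1.2, "r.femur": -0.8, "lumbosacral": -2.6}
bmd = composite_bmd(scores["l.femur"], scores["r.femur"], scores["lumbosacral"])
print(bmd, diagnosis(bmd), op_indicator(bmd))
```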

Please use appropriate plots, statistics and explanations in your answers below.

1. Problem 1: Osteoporosis

Consider the relationship between the response bone mineral density (BMD) and the predictor BMI.

(a) Describe the linear association between BMD and BMI quantitatively in a simple sentence. However, show that model diagnostics suggest that the association may not be linear throughout the BMI range.

(b) A BMI ≥ 25 defines overweight. A careful analysis suggests that in this range, the benefit of BMI is greatly diminished. Fit a linear spline (broken-stick) model with a knot at BMI = 25. Is there statistical evidence for different BMI slopes when BMI < 25 and when BMI ≥ 25?
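One common way to set up the broken-stick model in (b) is with the regressors BMI and (BMI − 25)+, so that the second coefficient is the change in slope at the knot. The sketch below only builds that basis and shows how the two slopes are read off; the coefficients are made up for illustration and are not estimates from the data.

```python
KNOT = 25.0  # knot at the overweight threshold named in the question


def spline_basis(bmi, knot=KNOT):
    """Return the two regressors of a linear spline: BMI and (BMI - knot)+."""
    return bmi, max(bmi - knot, 0.0)


def predict(bmi, b0, b1, b2, knot=KNOT):
    """Mean BMD under BMD = b0 + b1*BMI + b2*(BMI - knot)+."""
    x1, x2 = spline_basis(bmi, knot)
    return b0 + b1 * x1 + b2 * x2


# Illustrative coefficients only (NOT estimates from Osteoporosis.csv)
b0, b1, b2 = -6.0, 0.20, -0.15

# Slope per unit BMI below the knot is b1; above the knot it is b1 + b2
slope_below = predict(21.0, b0, b1, b2) - predict(20.0, b0, b1, b2)
slope_above = predict(31.0, b0, b1, b2) - predict(30.0, b0, b1, b2)
print(round(slope_below, 3), round(slope_above, 3))
```

Under this parameterization, evidence for different slopes on the two sides of BMI = 25 corresponds to a test of b2 = 0 in whatever regression routine is used to fit the model.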

(c) Fit the model BMD ∼ sex + age + sex × age. On a single plot show the relationship between BMD and age, separately for men and women. Use different colors for men and women in the plot.

(d) What is the age slope for men? Include a 95% CI and p-value. Does BMD decline with age for men?

(e) What is the age slope for women? Include a 95% CI and p-value. Does BMD decline with age for women?
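For (d) and (e), the sex-specific age slopes come from the interaction model in (c). As a sketch, assume men are the reference level, so the age slope for men is the age coefficient and the age slope for women is the age coefficient plus the interaction coefficient. The helper below (a hypothetical name) combines the two estimates and their covariance; it uses a normal quantile, whereas a t quantile with the residual degrees of freedom would be the exact choice for a linear model. All numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def combined_slope(b_age, b_int, var_age, var_int, cov_age_int, level=0.95):
    """Estimate, CI and two-sided p-value for b_age + b_int."""
    est = b_age + b_int
    # Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b)
    se = sqrt(var_age + var_int + 2.0 * cov_age_int)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(est / se)))
    return est, (est - z * se, est + z * se), p


# Illustrative inputs only (NOT estimates from Osteoporosis.csv)
est, ci, p = combined_slope(
    b_age=-0.01, b_int=-0.04,
    var_age=0.0004, var_int=0.0009, cov_age_int=-0.0004,
)
print(f"slope={est:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})  p={p:.3f}")
```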

The BMD variable was computed as the minimum T-score of the DEXA scans at three locations for each individual (lumbosacral spine, left femur, right femur). The BMD from the individual locations are available in the dataset.

(f) Are there any systematic differences in mean BMD between the three DEXA scan locations? Produce an appropriate graph. Demonstrate the pairwise differences statistically, if there are any.

(g) Is a correction for multiple comparisons appropriate in the context of this analysis comparing the BMD at the three DEXA scan locations? If so, does this affect your results?
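If a correction is judged appropriate in (g), two standard options for three pairwise tests are Bonferroni and Holm. The sketch below implements both on made-up p-values; the exam does not prescribe either method, so the choice here is an assumption.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Illustrative raw p-values for three pairwise site comparisons
raw = [0.010, 0.040, 0.030]
bonf = bonferroni(raw)
holm_adj = holm(raw)
print(bonf, holm_adj)
```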

(h) Instead of calculating BMD as the minimum of the three DEXA measurements (analysis A), another approach is using the first principal component without centering (analysis B), while a third approach is using the average of the three measurements (analysis C). Compare these three BMD definitions graphically and quantitatively in terms of their values, the weights applied to the three individual scan locations, and interpretability. Do all three BMD definitions have the same units of measure?
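The three definitions in (h) can be compared through the weights they put on the three sites. The sketch below computes all three on a few made-up rows, finding the uncentered first principal component as the leading eigenvector of X'X by power iteration (a stand-in for whatever PCA routine is actually used). Note that the principal component loadings have unit sum of squares rather than summing to 1, and that the sign of a principal component is arbitrary in general.

```python
from math import sqrt


def first_pc_uncentered(rows, iters=500):
    """Leading eigenvector of X'X (no centering), found by power iteration."""
    k = len(rows[0])
    # Cross-product matrix X'X
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    v = [1.0 / sqrt(k)] * k
    for _ in range(iters):
        w = [sum(xtx[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical T-scores (l.femur, r.femur, lumbosacral); NOT from the dataset
rows = [(-1.0, -1.2, -2.0), (-0.5, -0.4, -1.5),
        (-2.0, -2.2, -3.1), (-1.5, -1.3, -2.4)]

weights_b = first_pc_uncentered(rows)
score_a = [min(r) for r in rows]                                      # analysis A
score_b = [sum(w * x for w, x in zip(weights_b, r)) for r in rows]    # analysis B
score_c = [sum(r) / len(r) for r in rows]                             # analysis C
print([round(w, 3) for w in weights_b])
```

Analysis A effectively gives weight 1 to each person's lowest site, analysis C gives weight 1/3 to every site, and analysis B gives data-driven weights on a different scale.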

2. Problem 2: Osteoporosis (Continued)

Please be sure you answer the questions which ask you to ‘summarize’ or ‘interpret’ or ‘discuss briefly’. Please demonstrate your ability to communicate what you have done and what it means!! Credit will be given for all reasonable answers, even if not exactly as intended. :)

We use the data from the Osteoporosis study to conduct a brief study of the association of alcohol use with a diagnosis of osteoporosis.

We consider the biological variables age, sex and BMI, because they are well-known to be strongly associated with osteoporosis (younger age has lower risk of osteoporosis, low BMI has higher risk of osteoporosis because of associated hormonal changes). We also consider the dietary factors calcium intake and vitamin D intake (low is bad) and the modifiable health behaviors alcohol consumption (presumably bad), exercise (presumably good), and smoking (presumably bad). The variables are listed below for your convenience.

(a) First, we investigate the association of alcohol use with osteoporosis, without adjusting for any other variables. Write a short paragraph that summarizes the number of alcohol drinkers and non-alcohol drinkers in the study; the prevalence of osteoporosis among the drinkers and the non-drinkers; and gives the difference in risk of osteoporosis between alcohol drinkers and non-drinkers. Include confidence intervals where appropriate. Does this difference in risk demonstrate that alcohol consumption leads to a reduction in the risk of osteoporosis?
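The unadjusted comparison in (a) reduces to a difference of two proportions. A minimal sketch with a Wald confidence interval follows; the counts are hypothetical and are not those of the study, and other interval methods (e.g., Newcombe) are equally defensible.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(events_1, n_1, events_0, n_0, level=0.95):
    """Risk difference (group 1 minus group 0) with a Wald confidence interval."""
    p1, p0 = events_1 / n_1, events_0 / n_0
    rd = p1 - p0
    se = sqrt(p1 * (1 - p1) / n_1 + p0 * (1 - p0) / n_0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, (rd - z * se, rd + z * se)


# Hypothetical counts (NOT from the study):
# 10 of 40 drinkers and 30 of 60 non-drinkers with osteoporosis
rd, ci = risk_difference(10, 40, 30, 60)
print(f"RD={rd:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```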

(b) Next, we investigate multivariate models which study the joint effects of predictors on the risk of osteoporosis. Create appropriate factors as defined below, and set the reference level for all predictors to the presumed or known low-risk category. Please use categorical versions of the variables for all of Question 2.

• Age60, an indicator for age 60 years or older

• LowBMI, an indicator of BMI < 25 (i.e. not overweight)

• LowCalc, an indicator of dietary calcium < 500 mg

• LowD, an indicator of vitamin D intake < 600 IU

Fit a model with only the known biological predictors of age (categorical version), sex, and BMI (categorical version). (Main effects only; let’s keep it simple! :) ) Do these variables appear to be strongly associated with a diagnosis of osteoporosis, as expected? Explain briefly.
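The four indicators defined in (b) can be sketched as follows. The function name is hypothetical; the thresholds are those stated in the bullet list. Coding each indicator as 1 for the presumed higher-risk category leaves 0 as the low-risk reference level.

```python
def make_indicators(age, bmi, calcium, vitamind):
    """Binary risk indicators using the thresholds given in the question."""
    return {
        "Age60": int(age >= 60),         # 60 years or older
        "LowBMI": int(bmi < 25),         # not overweight
        "LowCalc": int(calcium < 500),   # dietary calcium below 500 mg
        "LowD": int(vitamind < 600),     # vitamin D intake below 600 IU
    }


# Hypothetical participant (illustrative values only)
print(make_indicators(age=63, bmi=22.4, calcium=450, vitamind=700))
```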

(c) Now we investigate the association of alcohol use with osteoporosis, adjusting for potential confounding factors. We consider the known risk factors age, sex, and BMI, and the potential confounders of dietary calcium, dietary vitamin D, smoking, and exercise. Before you begin your analysis, describe your model selection strategy. (Main effects only; keeping it simple! :) ) Tell me why you don’t recommend allowing alcohol to be considered for inclusion/exclusion as part of any variable selection algorithm.

(d) Now, carry out your model selection strategy above. Call the result Model 1. Present Model 1 as a table of estimated coefficients on the odds ratio scale, with confidence intervals and p-values. Briefly summarize your results in a short paragraph. Be sure to report your main conclusion regarding alcohol use and risk of osteoporosis. (To keep the exam simple, do **NOT** present model diagnostics! In fact, the model fits pretty well.)
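Presenting a logistic model on the odds ratio scale, as (d) asks, means exponentiating each coefficient and its Wald interval endpoints. A minimal sketch, assuming the coefficients and standard errors have already been obtained from some fitting routine; the values and term labels below are illustrative only.

```python
from math import exp
from statistics import NormalDist


def odds_ratio_row(beta, se, level=0.95):
    """Odds ratio, Wald CI on the OR scale, and two-sided Wald p-value."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return exp(beta), exp(beta - z_crit * se), exp(beta + z_crit * se), p


# Illustrative (log-odds estimate, standard error) pairs; NOT fitted values
fit = {"alcohol (yes vs no)": (-0.90, 0.40), "Age60": (1.10, 0.45)}
for name, (beta, se) in fit.items():
    or_, lo, hi, p = odds_ratio_row(beta, se)
    print(f"{name:22s} OR={or_:.2f}  95% CI=({lo:.2f}, {hi:.2f})  p={p:.3f}")
```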

(e) Next we explore whether Model 1 has enough information to usefully distinguish between high risk and low risk subjects, using the easily observed variables which are included in the model (age, sex, BMI, smoking, etc.).

Use Model 1 to compute the estimated risk of osteoporosis, with appropriate confidence interval, for a hypothetical extremely high risk subject (e.g. an older woman smoker, non-drinker, with low BMI and low calcium) and for a hypothetical extremely low risk subject. Then compute the range and the quartiles of estimated risks for subjects actually observed in the data. Why do the hypothetical and the observed ranges differ? Do you think this model could potentially be useful in identifying people at high risk and at low risk of osteoporosis, for a similar population of patients? Discuss briefly.
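One way to obtain an estimated risk with a confidence interval for a hypothetical subject is to build the interval on the logit scale and then back-transform it. The sketch below assumes a coefficient vector and its covariance matrix are available from the fitted model; the three-parameter model and all numbers here are made up for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-eta))


def predicted_risk(x, beta, cov, level=0.95):
    """Estimated risk and CI for covariate vector x (CI built on the logit scale)."""
    k = len(x)
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    # Variance of the linear predictor: x' V x
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    half = NormalDist().inv_cdf(0.5 + level / 2) * sqrt(var)
    return expit(eta), (expit(eta - half), expit(eta + half))


# Illustrative model: intercept, Age60, alcohol (NOT fitted values)
beta = [-1.0, 1.2, -0.9]
cov = [[0.10, -0.05, -0.04],
       [-0.05, 0.15, 0.01],
       [-0.04, 0.01, 0.16]]

high = predicted_risk([1, 1, 0], beta, cov)  # older non-drinker
low = predicted_risk([1, 0, 1], beta, cov)   # younger drinker
print(high, low)
```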

(f) The association of alcohol use with lower osteoporosis risk may be surprising. However, in many observational studies of health outcomes, moderate alcohol use is associated with better outcomes. On the one hand, in some settings (e.g., red wine and heart disease) some people argue that there is a real causal effect, and on the other hand we know that there are often systematic differences between drinkers and non-drinkers. This motivates us to conduct further analysis.

Recalling a result from Problem 1, add the interaction of age and sex to Model 1, and call the result Model 2. Compare Model 1 and Model 2. Does this analysis increase your confidence that the apparent protective effect of alcohol use is real? Why might this be considered an exploratory analysis, rather than the primary analysis in your study? Explain in a few sentences.

(g) Briefly summarize your analysis, and discuss the extent to which these data provide evidence for a protective effect of drinking alcohol for osteoporosis prevention. Be clear, professional, and quantitative in your answer.

Figure 1: Mean composite driving score over time, for each treatment arm (and 95% confidence intervals).

3. Problem 3: Driving Miss Mary Jane

The file Driving.csv includes data from a double-blind, placebo-controlled parallel randomized clinical trial conducted at the UCSD Center for Medicinal Cannabis Research, aiming to determine the effect of cannabis on driving performance. A total of N = 190 cannabis users were asked to smoke at least 4 puffs from a cigarette containing either placebo, low dose THC, or high dose THC, according to the randomly assigned treatment arm. The participants completed computer-based driving simulations pre-smoking (baseline, 0 minutes) and at 4 timepoints after smoking: 0.5 hours (h), 1.5h, 3.5h, and 4.5h.

The outcome is composite driving score (CDS), measuring the driver’s overall performance, with higher values indicating worse performance. CDS is a standardized score with no units of measure. A score of 0 reflects average driving ability.

The given dataset is in the long format. The variables in the dataset are:

• pid: participant ID

• treatment (3 levels): placebo, low THC, or high THC

• THC : indicator of THC-containing treatment, 0 if placebo, 1 if low or high THC dose

• time_min: time since smoking in minutes (0 is pre-smoking)

• CDS: composite driving score

• frequent_user : 0 if current cannabis use < 4 times/week, 1 if ≥ 4 times/week

• age: participant’s age in years

• education: participant’s years of education

• gender : Male or Female

• miles_past_year : estimated self-reported number of miles participant drove in past year

(a) Add two additional time-related variables: occasion, a categorical version of the time variable; and an index variable that indexes the measurement time points in increasing order, from 1 (baseline), 2 (30 minutes), . . . , to 5 (270 minutes post-smoking).

Produce a boxplot of CDS as a function of occasion. Is the response distribution approximately normal?

Is there a data transformation that is obviously necessary?
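The two time variables requested in (a) can be derived directly from time_min. This sketch assumes time_min is coded exactly as 0, 30, 90, 210, and 270 (the minute equivalents of the stated time points); the helper name and example rows are hypothetical.

```python
# Measurement times in minutes: baseline, 0.5h, 1.5h, 3.5h, 4.5h
TIME_POINTS = [0, 30, 90, 210, 270]


def add_time_variables(rows):
    """Add 'occasion' (categorical label) and 'index' (1..5) to long-format rows."""
    index_of = {t: i + 1 for i, t in enumerate(TIME_POINTS)}
    out = []
    for row in rows:
        new = dict(row)  # copy so the input rows are left unchanged
        new["occasion"] = f"{row['time_min']} min"
        new["index"] = index_of[row["time_min"]]
        out.append(new)
    return out


# Hypothetical long-format rows (illustrative values only)
rows = [
    {"pid": 1, "time_min": 0, "CDS": 0.10},
    {"pid": 1, "time_min": 30, "CDS": 0.45},
    {"pid": 1, "time_min": 270, "CDS": 0.05},
]
augmented = add_time_variables(rows)
print(augmented)
```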

(b) Consider modeling the CDS as a function of time using a longitudinal general linear model. What is a good choice of covariance structure for this general linear model? Use the chosen covariance structure in subsequent modeling.

(c) Choose an appropriate mean model for this randomized clinical trial: consider a response profile model (RPM), and parametric time models with either linear, quadratic, or cubic time effects. Discuss whether there is any advantage in this case in using a parametric model for time, instead of a response profile model.

(d) Based on the RPM, test whether overall there are any differences in the response profiles of the three study arms.

(e) The test at the previous question confirms what Figure 1 suggests: overall, there are significant differences in the trajectories of the three groups.

Follow up this analysis by testing for pairwise differences in the CDS trajectories between the three treatment arms. Are the results again consistent with what Figure 1 suggests? For each comparison state the null hypothesis in terms of the vector of mean coefficients β.

(f) Interestingly, in Figure 1 the low THC group has apparently worse driving scores than the high THC group. One hypothetical explanation is that the participants in the high THC group can feel the strong THC and do not smoke the entire cigarette. In any case, the previous analysis shows no significant difference in driving scores between the two THC groups. This suggests combining the two THC groups into a single group (see the variable THC in the dataset).

Fit an appropriate response profile model for the trajectories of the two groups over time: THC and placebo, and show that these trajectories differ.

(g) Following up on the previous question, estimate the causal effect of smoking THC on driving at each time point. Include 95% confidence intervals and p-values.

(h) Are the effects of smoking THC on driving large? To put this in perspective, in a cross-sectional study a Cohen’s d effect size d = 0.5 is considered moderate, and d = 0.8 is considered large.
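For reference, the cross-sectional Cohen's d mentioned in (h) is the difference in group means divided by the pooled standard deviation. A minimal sketch on made-up scores follows; how best to standardize a model-based longitudinal effect is a separate judgment call that this example does not settle.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical CDS values at one time point (NOT from Driving.csv)
thc = [0.4, 0.1, 0.6, 0.3, 0.5]
placebo = [0.0, -0.2, 0.3, 0.1, -0.1]
d = cohens_d(thc, placebo)
print(round(d, 2))
```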

Finally, summarize in a few sentences the study findings regarding the effects of smoking cannabis on driving abilities.



