
Date: 2020-10-20 10:53

Assignment 2

In 2009, the state of North Carolina released to the public a large data set containing information on births recorded in this state. This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children.

In this assignment, we will focus on studying how smoking affects the birthweight of a newborn infant. Instead of providing the entire data set, we will work with a sample of 1,936 observations. The data set is available on Blackboard (2009Births). The following variables were recorded:

Bmonth Birth month

Bday Birth day of the month

Gender Gender of baby

Fage Father’s age (years)

Mage Mom’s age (years)

Feduc Father’s education (years)

Meduc Mother’s education (years)

TotPreg Total number of pregnancies (number of pregnancies including current)

Visits Pre-delivery doctor visits

Marital Marital status (0=married, 1=unmarried)

Hispmom Hispanic mom

Hispdad Hispanic dad

Smokes Mom’s smoking habits (0=nonsmokers, 1=smokers)

BirthWeight Weight of baby at birth (grams)

a) What is the treatment variable? What is the outcome variable? How many covariates are involved in the data set? Is this study a randomized experiment or an observational study?

b) Let us first visualize the univariate balance of the data. Compare the histograms of the variable "Meduc" in the treated and control groups. You may want to use par(mfrow=c(2,1)) in R to stack the two histograms for a clear comparison. The two histograms should also cover the same range of "Meduc" values so that they are comparable. What can you conclude from the comparison of the two histograms? Make the same comparison separately for the variables "Bmonth", "Mage", and "Hispmom". What can you conclude for those variables?
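The essential step in (b) is that both groups are binned on identical edges. The handout suggests R; the sketch below shows the same idea in plain Python. The function names shared_histograms and print_stacked are hypothetical, and equal-width bins are an assumption (any shared set of edges works).

```python
def shared_histograms(treated, control, n_bins=10):
    """Count both groups on identical, equal-width bins so they are comparable."""
    lo = min(min(treated), min(control))
    hi = max(max(treated), max(control))
    width = (hi - lo) / n_bins or 1.0  # guard against a constant covariate
    edges = [lo + k * width for k in range(n_bins + 1)]

    def counts(values):
        c = [0] * n_bins
        for v in values:
            k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
            c[k] += 1
        return c

    return edges, counts(treated), counts(control)


def print_stacked(edges, top, bottom, labels=("treated", "control")):
    """Print one text histogram above the other, mimicking par(mfrow=c(2,1)) in R."""
    for label, c in zip(labels, (top, bottom)):
        print(label)
        for k, n in enumerate(c):
            print(f"  [{edges[k]:6.1f}, {edges[k + 1]:6.1f}) {'#' * n}")
```

In R, the analogous step is passing the same breaks vector to hist() for both the smoker and non-smoker subsets.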

c) Use a table to show some measures of balance for all covariates (all variables other than the treatment assignment and the outcome). The measures should include: 1. the mean and standard deviation of each covariate in the treated and control groups; 2. the normalized difference ($\hat{\Delta}_{ct}$); 3. the log ratio of standard deviations ($\hat{\Gamma}_{ct}$); and 4. $\hat{\pi}_c^{0.05}$ and $\hat{\pi}_t^{0.05}$. These measures are exactly the same as those in Table 14.4 of the textbook and our lecture notes. What can you conclude from the table?
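A minimal sketch of the balance measures in (c), assuming the usual textbook definitions: $\hat{\Delta}_{ct} = (\bar{X}_t - \bar{X}_c)/\sqrt{(s_t^2 + s_c^2)/2}$, $\hat{\Gamma}_{ct} = \ln s_t - \ln s_c$, and $\hat{\pi}_t^{0.05}$ as the share of treated units lying outside the 2.5% and 97.5% quantiles of the control distribution ($\hat{\pi}_c^{0.05}$ swaps the roles). Please verify these against Chapter 14 of the textbook; the function names are hypothetical.

```python
import math
import statistics


def normalized_difference(xt, xc):
    """Difference in means scaled by sqrt((s_t^2 + s_c^2) / 2)."""
    pooled = math.sqrt((statistics.variance(xt) + statistics.variance(xc)) / 2)
    return (statistics.mean(xt) - statistics.mean(xc)) / pooled


def log_sd_ratio(xt, xc):
    """ln(s_t) - ln(s_c); raises ValueError if a group has zero spread."""
    return math.log(statistics.stdev(xt)) - math.log(statistics.stdev(xc))


def outside_fraction(x, ref, alpha=0.05):
    """Share of x below the alpha/2 or above the 1 - alpha/2 quantile of ref."""
    n = round(2 / alpha)  # 40 cut points spaced 0.025 apart when alpha = 0.05
    cuts = statistics.quantiles(ref, n=n, method="inclusive")
    lo, hi = cuts[0], cuts[-1]
    return sum(v < lo or v > hi for v in x) / len(x)
```

For one covariate, $\hat{\pi}_t^{0.05}$ is outside_fraction(x_treated, x_control) and $\hat{\pi}_c^{0.05}$ is outside_fraction(x_control, x_treated). Binary covariates such as Hispmom can have zero spread in a group, in which case the log ratio is undefined.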

d) Now consider the estimation of the propensity score. Starting with the basic covariates "Marital" and "Meduc", use logistic regression and likelihood ratio tests to select important linear terms of the covariates for the estimation of the propensity score. Which covariates do you select?
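The selection step in (d) compares nested logistic models by the likelihood ratio statistic $2(\ell_{\text{big}} - \ell_{\text{small}})$. Below is a self-contained sketch that fits logistic regression by Newton-Raphson. The cutoff values are assumptions to check against the lecture notes: the textbook's stepwise procedure is commonly run with $C_{\text{lin}} = 1$ for linear terms and $C_{\text{qua}} = 2.71$ for second-order terms. All function names are hypothetical; in R, glm(..., family = binomial) does the fitting.

```python
import math


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, w, max_iter=50, tol=1e-10):
    """Logistic regression MLE by Newton-Raphson.

    X: list of rows, each starting with 1.0 for the intercept; w: 0/1 treatment.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information matrix
        for xi, wi in zip(X, w):
            e = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (wi - e) * xi[j]
                for k in range(p):
                    info[j][k] += e * (1.0 - e) * xi[j] * xi[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def log_likelihood(X, w, beta):
    ll = 0.0
    for xi, wi in zip(X, w):
        eta = sum(b * v for b, v in zip(beta, xi))
        # log(1 + exp(eta)) computed without overflow
        softplus = eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))
        ll += wi * eta - softplus
    return ll


def lr_statistic(X_small, X_big, w):
    """2 * (log L_big - log L_small) for nested models (X_big adds columns)."""
    ll_small = log_likelihood(X_small, w, fit_logit(X_small, w))
    ll_big = log_likelihood(X_big, w, fit_logit(X_big, w))
    return 2.0 * (ll_big - ll_small)
```

Under the stepwise scheme assumed above, a candidate covariate is added to the current model when lr_statistic exceeds the linear cutoff, and the search repeats until no remaining candidate qualifies.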

e) For the covariates you selected in (d), consider their second-order terms (pure quadratic terms and interactions). Which terms are significant for the estimation of the propensity score?
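Building the candidate second-order terms for (e) is mechanical. A small sketch (the function name is hypothetical) that skips squares of 0/1 variables such as Marital, since $x^2 = x$ for a binary $x$ adds nothing new:

```python
from itertools import combinations


def second_order_terms(columns):
    """Squares and pairwise interactions of the given covariates.

    columns maps a covariate name to its list of values.
    """
    terms = {}
    for name, x in columns.items():
        if set(x) <= {0, 1}:
            continue  # squaring a binary variable returns the same variable
        terms[f"{name}^2"] = [v * v for v in x]
    for (a, xa), (b, xb) in combinations(columns.items(), 2):
        terms[f"{a}*{b}"] = [u * v for u, v in zip(xa, xb)]
    return terms
```

Each candidate column is then tested one at a time with a likelihood ratio test against the model from (d).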

f) Now use the significant terms you selected in (d) and (e) to estimate the propensity score via logistic regression. Provide two histograms to show the distribution of the linearized propensity scores in the control and treated groups. From the comparison of the two histograms, do you think the data generally have good balance between the treated and control groups?
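The linearized propensity score in (f) is the log-odds $\ell(x) = \ln\{\hat{e}(x)/(1 - \hat{e}(x))\}$. For a logistic model this is simply the fitted linear predictor $x^\top\hat{\beta}$, as the sketch below shows (function names hypothetical).

```python
import math


def linearized(e):
    """Log-odds ln(e / (1 - e)) of an estimated propensity score 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return math.log(e / (1.0 - e))


def linearized_scores(X, beta):
    """For a logistic model the linearized score equals the linear predictor X beta."""
    return [sum(b * v for b, v in zip(beta, xi)) for xi in X]
```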

g) What are $e_t = \min_{i:W_i=1} \hat{e}(X_i)$ and $e_c = \max_{i:W_i=0} \hat{e}(X_i)$? Trim off the units with estimated propensity scores less than $e_t$ or greater than $e_c$. Name the trimmed data trimmed.dat in your code. How many control and treated units are included in the trimmed data? Provide two histograms to show the distribution of the linearized propensity scores in the control and treated groups in the trimmed data.
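The trimming rule in (g) keeps exactly the units with $e_t \le \hat{e}(X_i) \le e_c$. A minimal sketch (function name hypothetical):

```python
def trim_by_overlap(e, w):
    """Return (e_t, e_c, kept indices) where e_t is the smallest treated score
    and e_c the largest control score; units outside [e_t, e_c] are dropped."""
    e_t = min(ei for ei, wi in zip(e, w) if wi == 1)
    e_c = max(ei for ei, wi in zip(e, w) if wi == 0)
    kept = [i for i, ei in enumerate(e) if e_t <= ei <= e_c]
    return e_t, e_c, kept
```

The treated count in the trimmed data is sum(w[i] for i in kept), and the control count is len(kept) minus that.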

h) Use the iterative blocking method provided in class to block the trimmed data. How many blocks do you obtain?
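A sketch of iterative blocking, under assumptions that should be checked against the method presented in class: a block is split at the median of the linearized propensity score when the treated-control t-statistic of that score exceeds t_max = 1.96, provided each half keeps at least n_min = 3 treated and 3 control units and at least K + 2 units in total (K = number of covariates in the propensity model). The unequal-variance t-statistic and all names are hypothetical choices.

```python
import math
import statistics


def t_stat(a, b):
    """Two-sample t-statistic for mean(a) - mean(b) with unequal variances."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def iterative_blocking(lps, w, n_covariates, t_max=1.96, n_min=3):
    """Split blocks at the median linearized score while they look unbalanced.

    lps: linearized propensity scores; w: 0/1 treatment indicators.
    Returns blocks (lists of unit indices) ordered by score.
    """
    n_min2 = n_covariates + 2

    def big_enough(part):
        n_t = sum(w[i] for i in part)
        n_c = len(part) - n_t
        return n_t >= n_min and n_c >= n_min and len(part) >= n_min2

    todo, done = [list(range(len(lps)))], []
    while todo:
        block = todo.pop()
        treated = [lps[i] for i in block if w[i] == 1]
        control = [lps[i] for i in block if w[i] == 0]
        unbalanced = (
            len(treated) >= 2 and len(control) >= 2
            and abs(t_stat(treated, control)) > t_max
        )
        med = statistics.median([lps[i] for i in block])
        left = [i for i in block if lps[i] < med]
        right = [i for i in block if lps[i] >= med]
        if unbalanced and big_enough(left) and big_enough(right):
            todo += [left, right]
        else:
            done.append(block)
    return sorted(done, key=lambda blk: min(lps[i] for i in blk))
```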

i) Use a table to compare the values of the normalized difference ($\hat{\Delta}_{ct}$) for the full data, the trimmed data, and the blocked data (similar to Table 17.1 in the textbook and lecture slides). Does trimming or blocking improve the balance of the data?
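For the "blocked" column in (i), one plausible definition, stated here as an assumption rather than the textbook's exact formula, averages the within-block mean differences with weights $N(j)/N$ and divides by the pooled standard deviation of the units being blocked. The function name is hypothetical.

```python
import math
import statistics


def blocked_normalized_difference(x, w, blocks):
    """Block-size-weighted within-block mean difference, scaled by the pooled
    standard deviation of all blocked units (an assumed definition)."""
    units = [i for b in blocks for i in b]
    n = len(units)
    diff = 0.0
    for b in blocks:
        xt = [x[i] for i in b if w[i] == 1]
        xc = [x[i] for i in b if w[i] == 0]
        diff += len(b) / n * (statistics.mean(xt) - statistics.mean(xc))
    all_t = [x[i] for i in units if w[i] == 1]
    all_c = [x[i] for i in units if w[i] == 0]
    scale = math.sqrt((statistics.variance(all_t) + statistics.variance(all_c)) / 2)
    return diff / scale
```

If each block is internally balanced, the numerator shrinks toward zero even when the unblocked covariate means differ, which is what the comparison in (i) is meant to reveal.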

j) For your blocked data obtained in (h), plot the statistics

$$\frac{\bar{X}_{t,k}(j) - \bar{X}_{c,k}(j)}{\sqrt{s_k^2(j)\,[1/N_c(j) + 1/N_t(j)]}}$$

in a Q-Q plot, where $j$ denotes the $j$th block and $k$ represents the $k$th covariate.
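A sketch of the within-block t-statistics in (j) and the coordinates of a normal Q-Q plot. It assumes $s_k^2(j)$ is the pooled treated/control variance within block $j$ and uses plotting positions $(k + 0.5)/n$; both are assumptions, and the names are hypothetical.

```python
import math
import statistics


def within_block_t(x, w, blocks):
    """Per-block t-statistics for one covariate, using the pooled within-block variance."""
    stats = []
    for b in blocks:
        xt = [x[i] for i in b if w[i] == 1]
        xc = [x[i] for i in b if w[i] == 0]
        nt, nc = len(xt), len(xc)
        s2 = ((nt - 1) * statistics.variance(xt) + (nc - 1) * statistics.variance(xc)) / (nt + nc - 2)
        if s2 == 0:
            continue  # covariate constant within this block: statistic undefined
        diff = statistics.mean(xt) - statistics.mean(xc)
        stats.append(diff / math.sqrt(s2 * (1 / nc + 1 / nt)))
    return stats


def normal_qq(values):
    """Pairs (theoretical N(0, 1) quantile, observed value) for a Q-Q plot."""
    obs = sorted(values)
    n = len(obs)
    nd = statistics.NormalDist()
    return [(nd.inv_cdf((k + 0.5) / n), v) for k, v in enumerate(obs)]
```

Pool the statistics over all covariates and blocks before calling normal_qq; with good within-block balance the points should lie close to the 45-degree line.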

k) Provide the minimum, maximum, and standard deviation of the weights in the Horvitz-Thompson and subclassification estimators for the blocked data. You can arrange your results in a table like Table 17.8 in the textbook and lecture notes.
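The weights in (k) can be put on a common scale. The Horvitz-Thompson weights are $1/\hat{e}(X_i)$ for treated and $1/(1 - \hat{e}(X_i))$ for control units. The subclassification estimator $\sum_j \frac{N(j)}{N}(\bar{Y}_t(j) - \bar{Y}_c(j))$ implies weights $N(j)/N_t(j)$ and $N(j)/N_c(j)$, that is, inverse-probability weights with $\hat{e}$ replaced by the block's treated share. The textbook table may normalize weights differently, so treat the scale as an assumption; function names are hypothetical.

```python
import statistics


def ht_weights(e, w):
    """Inverse-probability weights: 1/e for treated, 1/(1-e) for controls."""
    return [1 / ei if wi == 1 else 1 / (1 - ei) for ei, wi in zip(e, w)]


def subclass_weights(w, blocks):
    """Implied subclassification weights: N(j)/N_t(j) for treated units and
    N(j)/N_c(j) for control units in block j, returned in unit-index order."""
    lam = {}
    for b in blocks:
        n_t = sum(w[i] for i in b)
        n_c = len(b) - n_t
        for i in b:
            lam[i] = len(b) / n_t if w[i] == 1 else len(b) / n_c
    return [lam[i] for i in sorted(lam)]


def summarize(weights):
    """(minimum, maximum, standard deviation) of a list of weights."""
    return min(weights), max(weights), statistics.stdev(weights)
```

Calling summarize separately on the treated and control weights gives one row per group and estimator.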

l) Provide estimates of the treatment effect using the Horvitz-Thompson and subclassification estimators. Conduct a hypothesis test based on the subclassification estimator and conclude whether smoking affects the birthweight of a newborn infant.
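A minimal sketch of the two estimators in (l) and a two-sided z-test. It uses the unnormalized Horvitz-Thompson form and the plain within-block difference in means with variance $\sum_j (N(j)/N)^2 \{s_t^2(j)/N_t(j) + s_c^2(j)/N_c(j)\}$; the textbook version may normalize the weights or add within-block regression adjustment, so treat these choices and the function names as assumptions.

```python
import math
import statistics


def horvitz_thompson(y, w, e):
    """(1/N) * sum of W*Y/e minus (1/N) * sum of (1-W)*Y/(1-e)."""
    n = len(y)
    treated = sum(yi / ei for yi, wi, ei in zip(y, w, e) if wi == 1)
    control = sum(yi / (1 - ei) for yi, wi, ei in zip(y, w, e) if wi == 0)
    return (treated - control) / n


def subclassification(y, w, blocks):
    """Block-size-weighted difference in means and its sampling variance."""
    n = sum(len(b) for b in blocks)
    tau, var = 0.0, 0.0
    for b in blocks:
        yt = [y[i] for i in b if w[i] == 1]
        yc = [y[i] for i in b if w[i] == 0]
        q = len(b) / n
        tau += q * (statistics.mean(yt) - statistics.mean(yc))
        var += q ** 2 * (statistics.variance(yt) / len(yt) + statistics.variance(yc) / len(yc))
    return tau, var


def z_test(tau, var, level=0.05):
    """Two-sided test of H0: tau = 0 using the normal approximation."""
    z = tau / math.sqrt(var)
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p, p < level
```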


m) Provide the estimated bias, sampling variance, and MSE for the Horvitz-Thompson and subclassification estimators for the blocked data. You can arrange your results in a table like Table 17.10 in the textbook and lecture notes.


