联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2019-10-18 11:05

STAT 3701 Homework 3

Show all work. Submit your solutions in a pdf document on Canvas. Include your R code (which must be

commented and properly indented) in the pdf file. Also submit one text file with all your R code (comments

and all) clearly labeled with the problem it goes with. This must be properly indented. Before every solution

with random sampling use set.seed(3701).

Question 1 (15 points).

Let’s consider a slightly different data generating model for paired data. Suppose we want to compare

students scores of a standardized test before and after taking a prep course. There are n students that will be

taking the test. Let X be the yet-to-be observed random variable of the score before the course and Y be the

yet-to-be observed random variable of the score after the course. Moreover, we let X and Y be the linear

combinations of two common latent variable, i.e., the model for the ith student is Xi = µ1 + a1Ai + b1Bi

Yi = µ2 + a2Ai + b2Bi,

where µj , aj , bj , j = 1, 2 are constants. Ai

, Bi are independent both within the same subject and across

different subjects. Ai’s are iid ∼ N(0, σ2A), Bi

is are iid N(0, σ2B).

(a) (4 points) What are the distributions of Xi and Yi

, respectively? If we let Zi = Xi −Yi

, does this data

generating model still satisfy the assumptions for paired t-test? Why?

(b) (5 points) Let µ1 = 68, µ2 = 70, σ2

A = 2, σ2

B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2. It can

be show that E[X] = 68, E[Y ] = 70. Use simulation to produce a 99% confidence interval for

Var(X) = E[(X−68)2

] and a 99% confidence interval for Var(Y ) = E[(Y −70)2

]. Set reps = 105

.

(c) (3 points) Now we are interested in paired t-test for

H0 : µ1 = µ2

Ha : µ1 ̸= µ2,

with data assumed to be coming from our data generating model. Create a function named mypaired.pval

that generate realizations of the random p-value of this test. Your function should have the following

arguments

• mu1, mean of X

• mu2, mean of Y

• a1,b1,a2,b2 as defined in our data generating model

• sigmaA, standard deviation of A

• sigmaAB, standard deviation of B

1

• n, sample size (number of pairs)

• reps, number of replications.

The function should output a vector containing the realizations of the random p-value.

(d) (4 points) Use µ1 = 68, µ2 = 68.5, σ2

A = 2, σ2

B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2, α = 0.05. Set

reps = 1000. Test your function by creating a simulation based estimation of the power curve with

the sample size n ∈ {20, 25, . . . , 200}. Produce a pointwise 95% confidence interval for the power

curve with the Clopper-Peason CI and include the curves for the lower bound and upper bound of CI

in the same plot of the estimated power curve.

Question 2 (15 points)

For two-independent-samples model, we assume that Xi’s are iid N(µ1, σ2

) for i = 1, . . . , n1, Yj ’s are iid

N(µ2, σ2

) for j = 1, . . . , n2 and all observations are independent. We provided the 100(1−α)% confidence

interval for µ1 − µ2 as

X¯ − Y¯ ± t1−α/2,n1+n2−2Sp

In this question, we are interested in simultaneous confidence intervals for both µ1 and µ2, i.e., instead of

a CI for µ1 − µ2, we want one CI [L1, U1] for µ1 and another CI [L2, U2] for µ2 such that [L1, U1] and

[L2, U2] should capture the corresponding means simultaneously with probability larger than 1 − α. Here

L1, L2, U1, U2 are some lower bounds and upper bounds calculated using the data and 1−α is our significance

level.

(a) (3 points) A very natural (but could be wrong potentially) idea would be

[L1, U1] = X¯ ± t1−α/2,n1−1

is the sample variance for {Xi} and S

is the sample variance for {Yi}. Calculate

the coverage probability for these simultaneous CIs, that is, calculate

(b) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the

coverage probability defined in (a). Set reps = 105

. Create a 95% conservative CI for this coverage

probability. Is 1 − α in this CI?

(c) (4 points) Now consider a Bonferroni corrected simultaneous confidence interval,

Show that this set of simultaneous CIs has a coverage probability larger than 1 − α. You may want

apply the Bonfferoni inequality P(A ∩ B) ≥ 1 − (1 − P(A)) − (1 − P(B)).

·2·

(d) (4 points) Let µ1 = 68, µ2 = 70, σ = 3, n1 = n2 = 20, α = 0.05. Use simulation to estimate the

coverage probability defined in (a). Set reps = 105

. Create a 95% conservative CI for this coverage

probability.

Question 3 (20 points)

We have explored two sample procedures for the means, but in this problem we will write a function to do

these procedures for proportions. Consider two binomial random variables V and W. More specifically,

V ∼ Binom(n1, θ1) and W ∼ Binom(n2, θ2), where θ1, θ2 ∈ (0, 1). We are interested in testing

H0 : θ1 = θ2

Ha : θ1 ̸= θ2.

Let θb1 = V /n1, θb2 = W/n2 and θb = (V + W)/(n1 + n2). Then the test statistic for the test is

Under the null hypothesis, T ∼ N(0, 1) approximately.

(a) (5 points) Let θ1 = θ2 = 0.4, use simulation to evaluate the distribution of T for the cases

• n1 = n2 = 15

• n1 = n2 = 100

In each case, create a QQ-plot to compare the distribution of T with N(0, 1) and comment on the plot.

Set reps = 10000. You may use qnorm function when calculating the percentiles of the standard

normal distribution. Comment on the plots. You are not allowed to use rbinom to generate from

binomial distribution.

(b) (5 points) Besides the test, we can also provide an approximate 100(1 − α)% CI for θ1 − θ2 using the

formula

Create an R function myprop.test that generates the data, calculate the test statistic T and then

output the p-value of the test and the CI for θ1 −θ2. The function should take the following arguments

• n1, number of trials for V

• n2, number of trials for W

• theta1, success probability for V

• theta2, success probability for W

• alpha, significance level

• reps, number of replications

and the function outputs a list with three components:

·3·

• pval, vector of p-values of each replication

• upper, vector of upper bounds of CIs in each replication

• lower, vector of lower bounds of CIs in each replication

(c) (5 points) Test your function by estimating the coverage probability of the CI when θ1 = 0.6, θ2 =

0.4, n1 = n2 = 20, α = 0.05. Use reps=105

. Create a 95% conservative CI for the coverage

probability. Does it contain 1 − α?

(d) (5 points) Use your function to create a power curve when we increase the distance between θ1

and θ2. More specifically, let n1 = n2 = 100, θ2 = 0.52, α = 0.05 and θ1 takes values in

{0.52, 0.525, . . . , 0.680}. Set reps = 1000. Make a plot of the estimated power curve and also

include the lower and upper bounds of a 95% Clopper-Pearson confidence interval using dashed lines.

·4·


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp