

Date: 2019-03-22 09:05

Statistics 1 assignment - 2019

March 11, 2019

This computer practical counts 10% towards your final mark and is due

on Friday 22nd March by 5pm.

It should be handed in in the dedicated blue box “Probability and Statistics”

by the entrance of the main building.

Do get started in week 19 and go to the drop-in session in the computer

lab in order to get help.

You should use R Markdown for your code, output and associated comments

and print the corresponding pdf file. Remember to make clear which

question you are answering and include your name at the beginning of the

document.

Use pen and paper to answer the questions not involving code or numerical

computations.

Staple the two documents together and make sure that your name appears

clearly on the first page.

In Chapter 2 we have seen how a QQ-plot or a probability plot can be useful

to assess whether a sample is distributed according to a specific probability

distribution. Although this is useful, we would like to complement this graphical method

with a statistical hypothesis test which would lead to a more objective and

principled decision. Numerous tests have been proposed in the literature (in

particular in order to test normality) and we focus here on the Anderson-Darling

test. For an observed sample x1,...,xn the Anderson-Darling (AD) test statistic

is given by

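\[
T(x_1,\dots,x_n) = n \int_{-\infty}^{+\infty} \frac{\bigl(F_n(y) - F_X(y;\theta)\bigr)^2}{F_X(y;\theta)\bigl(1 - F_X(y;\theta)\bigr)} \, f_X(y;\theta)\, dy,
\]

where $F_X(y;\theta)$ is the hypothesised cumulative distribution function for the data, $f_X(y;\theta)$ is

the corresponding probability density and

\[
F_n(y) = \frac{\#\{i \in \{1,\dots,n\} : x_i \le y\}}{n},
\]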

is the empirical distribution function of the observed sample.
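
As a small illustration of the empirical distribution function defined above, the following R sketch evaluates it with the base function ecdf and checks the result by hand. The sample is simulated and the hypothesised distribution is taken to be a standard normal; both are assumptions made for the illustration and are not part of the assignment data.

```r
# Simulated placeholder sample (an assumption, not x1 or x2 from the assignment).
set.seed(1)
xs <- rnorm(10)

# ecdf() returns the step function F_n associated with the sample.
Fn <- ecdf(xs)

# F_n(y) is the proportion of observations less than or equal to y.
y <- 0
fn_at_y <- Fn(y)
by_hand <- sum(xs <= y) / length(xs)

# Squared discrepancy between F_n and the hypothesised CDF at y,
# here with a standard normal as the (assumed) hypothesised distribution.
sq_diff <- (fn_at_y - pnorm(y))^2

print(c(ecdf = fn_at_y, by_hand = by_hand, sq_diff = sq_diff))
```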


1. (2 marks) State, in words and at most two sentences, the null and alternative

hypotheses in the present scenario.

2. (3 marks) Briefly explain why the AD statistic may be useful to achieve

our goal. In particular, briefly comment on the roles played by the three

terms $(F_n(y) - F_X(y;\theta))^2$, $F_X(y;\theta)(1 - F_X(y;\theta))$ and $f_X(y;\theta)$.

3. (2 marks) Describe the form of a critical region, give the theoretical

formula for the type I error and the theoretical formula for the p-value

for this test and an observed statistic $t_{obs}$. You should precisely state

the probability distribution of any random variable you may use and can

assume $\theta$ to be known.

While the expression above leads to an intuitive interpretation of what the

statistic can achieve, a more useful expression is given by

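\[
T(x_1,\dots,x_n) = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\Bigl[\ln F_X(x_{(i)};\theta) + \ln\bigl(1 - F_X(x_{(n+1-i)};\theta)\bigr)\Bigr],
\]

where $x_{(1)}, x_{(2)},\dots,x_{(n)}$ is the order statistic of the sample, as defined in Chapter

1. Most often $\theta$ is unknown and must be estimated from the observed

sample and $t_{obs}$ can then be computed. From now on assume that we want to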

test whether a sample is drawn from a normal distribution. The two datasets

x1 and x2 referred to below can be downloaded using


4. (4 marks) Write a function compute.ad.test(xs) which takes in a vector

of observations xs and returns the Anderson-Darling statistic. You

should test your function on the two datasets x1 and x2.

[Hint: the ad.test function in the nortest R library (which is not installed

by default), may be a source of inspiration for your code and may

be used to check that your own code produces plausible values (you will

not get marks for using it but some of you may find it useful/reassuring).

You can see the code of the function by simply typing ad.test. Note that

ad.test renormalizes the data and that you should not do this here.]
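
A minimal sketch of such a function is given below, based on the standard order-statistic form of the AD statistic and assuming a normal null distribution. The arguments mu and sigma are hypothetical names introduced here; by default they are estimated from the sample with mean and sd (note that sd uses the n - 1 denominator). The data are simulated placeholders, since x1 and x2 are not reproduced in this document.

```r
# Sketch: AD statistic for a normal null hypothesis.
# mu and sigma are hypothetical argument names; by default they are estimated from xs.
compute.ad.test <- function(xs, mu = mean(xs), sigma = sd(xs)) {
  x <- sort(xs)          # order statistics x_(1) <= ... <= x_(n)
  n <- length(x)
  i <- seq_len(n)
  # log F(x_(i)) computed on the log scale for numerical stability
  log_F <- pnorm(x, mean = mu, sd = sigma, log.p = TRUE)
  # log(1 - F(x_(n+1-i))): rev(x)[i] is x_(n+1-i)
  log_S <- pnorm(rev(x), mean = mu, sd = sigma,
                 lower.tail = FALSE, log.p = TRUE)
  -n - mean((2 * i - 1) * (log_F + log_S))
}

# Placeholder data (assumption): one normal sample and one skewed sample.
set.seed(42)
xs_normal <- rnorm(50, mean = 3, sd = 2)
xs_skewed <- rexp(50)

t_normal <- compute.ad.test(xs_normal)
t_skewed <- compute.ad.test(xs_skewed)
print(c(normal = t_normal, skewed = t_skewed))
```

Working on the log scale through log.p = TRUE avoids taking the logarithm of values that have been rounded to 0 or 1 in the tails.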

To complete the statistical procedure we need to compute the p-value. Even

when $\theta$ is assumed known, the distribution of $T(X_1, X_2,\dots,X_n)$ under the null

hypothesis is not tractable and it is unlikely that it will be when $\theta$ is estimated.

The numerical method below works in both scenarios.

5. (3 marks) Write pseudo-code describing an algorithm, based on simulation

and similar to the procedure used in Section 4.3 of the lecture notes

to compare the sampling distributions of three estimators, to compute the

p-value for an observed statistic $t_{obs}$.

6. (3 marks) Write the R code corresponding to your pseudo-code to compute

the p-values corresponding to x1 and x2, assuming that the empirical

mean and variance are used to estimate $\theta$. For each of x1 and x2


plot the histogram of the simulated statistics and draw a vertical line for

the position of the observed test statistic and on separate graphs plot the

corresponding QQ-plots (you may use the functions qqnorm and qqline).
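
One possible way to organise the simulation is sketched below. It assumes a normal null distribution whose parameters are re-estimated on every simulated sample, so that the simulated statistics are computed in the same way as the observed one. The helper names ad.stat, mc.pvalue and plot.mc, the number of replicates B and the placeholder data are all assumptions made for this illustration.

```r
# Hypothetical helper: AD statistic for normality, parameters estimated from the sample.
ad.stat <- function(xs) {
  x <- sort(xs); n <- length(x); i <- seq_len(n)
  mu <- mean(x); sigma <- sd(x)
  -n - mean((2 * i - 1) *
    (pnorm(x, mu, sigma, log.p = TRUE) +
     pnorm(rev(x), mu, sigma, lower.tail = FALSE, log.p = TRUE)))
}

# Monte Carlo p-value: proportion of simulated statistics at least as large as t_obs.
mc.pvalue <- function(xs, B = 2000) {
  n <- length(xs)
  t_obs <- ad.stat(xs)
  # Simulate B samples under the fitted null and recompute the statistic each time.
  t_sim <- replicate(B, ad.stat(rnorm(n, mean = mean(xs), sd = sd(xs))))
  list(t_obs = t_obs, t_sim = t_sim, p_value = mean(t_sim >= t_obs))
}

# Histogram of simulated statistics with the observed value, then a QQ-plot of the data.
plot.mc <- function(xs, res) {
  hist(res$t_sim, breaks = 40, main = "Simulated AD statistics", xlab = "t")
  abline(v = res$t_obs, col = "red", lwd = 2)
  qqnorm(xs); qqline(xs)
}

set.seed(2019)
xs <- rexp(40)            # placeholder data (assumption), not x1 or x2
res <- mc.pvalue(xs)
print(res$p_value)
if (interactive()) plot.mc(xs, res)
```

The critical region is one-sided (large values of the statistic), which is why the p-value counts simulated statistics greater than or equal to the observed one.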


The approach is also often referred to as a Monte Carlo method. Note that

statistical tables and approximate formulae have been constructed and derived

for this test: as indicated in [Stephens 1974] these are based on Monte Carlo

simulations. Such approximate formulae are used in the ad.test function in

the nortest R library.

7. (3 marks) Explain in a few lines how you would adapt your code in order

to test whether a sample is drawn from an exponential distribution.

What is your conclusion about the generality of the approach?
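
As a rough sketch of one such adaptation, only two ingredients change: the hypothesised distribution function (pexp instead of pnorm) and the simulator (rexp instead of rnorm). The code below assumes the rate is estimated by 1 / mean(xs); the function names ad.stat.exp and mc.pvalue.exp, the number of replicates B and the data are hypothetical.

```r
# Hypothetical helper: AD statistic for an exponential null, rate estimated by 1 / mean.
ad.stat.exp <- function(xs) {
  x <- sort(xs); n <- length(x); i <- seq_len(n)
  rate <- 1 / mean(x)
  -n - mean((2 * i - 1) *
    (pexp(x, rate, log.p = TRUE) +
     pexp(rev(x), rate, lower.tail = FALSE, log.p = TRUE)))
}

# Monte Carlo p-value under the fitted exponential distribution.
mc.pvalue.exp <- function(xs, B = 2000) {
  n <- length(xs)
  t_obs <- ad.stat.exp(xs)
  t_sim <- replicate(B, ad.stat.exp(rexp(n, rate = 1 / mean(xs))))
  mean(t_sim >= t_obs)
}

set.seed(7)
xs <- rexp(40, rate = 2)   # placeholder data (assumption)
p_exp <- mc.pvalue.exp(xs)
print(p_exp)
```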

[1] Stephens, M. A. “EDF Statistics for Goodness of Fit and Some Comparisons.”

Journal of the American Statistical Association 69, no. 347 (1974):

730-37. doi:10.2307/2286009.

