
Assignment #3 STA355H1S

due Friday March 22, 2019

Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files only) – the deadline is 11:59pm on March 22. You are strongly encouraged to do problems 3 through 6, but these are not to be submitted for grading.

Problems to hand in:

1. Suppose that D1, · · · , Dn are random directions – we can think of these random variables as coming from a distribution on the unit circle {(x, y) : x² + y² = 1} and represent each observation as an angle, so that D1, · · · , Dn come from a distribution on [0, 2π).

(Directional or circular data, which indicate direction or cyclical time, can be of great interest to biologists, geographers, geologists, and social scientists. The defining characteristic of such data is that the beginning and end of their scales meet so, for example, a direction of 5 degrees is closer to 355 degrees than it is to 40 degrees.)

A simple family of distributions for these circular data is the von Mises distribution, whose density on [0, 2π) is

f(θ; κ, μ) = exp(κ cos(θ − μ)) / (2π I0(κ))

where 0 ≤ μ < 2π, κ ≥ 0, and I0(κ) is the 0-th order modified Bessel function of the first kind:

I0(κ) = (1/(2π)) ∫₀^{2π} exp[κ cos(θ)] dθ.

(In R, I0(κ) can be evaluated by besselI(kappa,0).) Note that when κ = 0, we have a uniform distribution on [0, 2π) (independent of μ).
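
For concreteness, a minimal R sketch of this density built directly on besselI (the name dvonmises is our own choice, not a standard R function):

dvonmises <- function(theta, kappa, mu) {
  ## von Mises density on [0, 2*pi) via the built-in Bessel function
  exp(kappa * cos(theta - mu)) / (2 * pi * besselI(kappa, 0))
}
integrate(dvonmises, 0, 2 * pi, kappa = 2, mu = pi)$value  # should be close to 1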

(a) Show that the MLE of μ satisfies

cos(μ̂) Σ_{i=1}^n sin(Di) − sin(μ̂) Σ_{i=1}^n cos(Di) = 0

and give an explicit formula for μ̂ in terms of Σ_{i=1}^n sin(Di) and Σ_{i=1}^n cos(Di). (Hint: Use the formula sin(x − y) = sin(x) cos(y) − cos(x) sin(y). Also verify that your estimator actually maximizes the likelihood function.)

(b) Suppose that we put the following prior density on (κ, μ):

π(κ, μ) = (λ/(2π)) exp(−λκ) for κ > 0 and 0 ≤ μ < 2π.

Show that the posterior (marginal) density of κ is

π(κ|d1, · · · , dn) = c(d1, · · · , dn) exp(−λκ) I0(rκ) / [I0(κ)]^n

where

r = [(Σ_{i=1}^n cos(di))² + (Σ_{i=1}^n sin(di))²]^{1/2}

and c(d1, · · · , dn) is a normalizing constant depending on the data. (Hint: To get the marginal posterior density of κ, you need to integrate the joint posterior over μ (from 0 to 2π). The trick is to write

Σ_{i=1}^n cos(di − μ) = r cos(θ − μ)

for some θ.)

(c) The file bees.txt contains dance directions of 279 honey bees viewing a zenith patch of artificially polarized light. (The data are given in degrees; you should convert them to radians. The original data were rounded to the nearest 10°; the data in the file have been “jittered” to make them less discrete.) Using these data, compute the posterior density of κ in part (b) for λ = 1 and λ = 0.1. How do these two posterior densities differ? Do these posterior densities “rule out” the possibility that κ = 0 (i.e. a uniform distribution)?

Note: To compute the posterior density, you will need to compute the normalizing constant – on Quercus, I will give some suggestions on how to do this. A simple estimate κ̂ of κ can be computed from r as defined in part (b). The posterior density should be largest for values of κ close to κ̂; you can use this as a guide to determine the range of values of κ over which you should compute the posterior density.
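
A possible sketch of this computation in R for λ = 1 (λ = 0.1 is analogous), working on the log scale so that [I0(κ)]^n does not overflow; the layout of bees.txt as a single column readable by scan, and the grid endpoints, are assumptions to be tuned:

d <- scan("bees.txt") * pi / 180            # degrees -> radians (file layout assumed)
n <- length(d)
r <- sqrt(sum(sin(d))^2 + sum(cos(d))^2)    # r as defined in part (b)

## log of exp(-lambda*kappa) * I0(r*kappa) / I0(kappa)^n; the exponentially
## scaled Bessel function keeps the evaluation numerically stable
log.post <- function(kappa, lambda) {
  -lambda * kappa +
    log(besselI(r * kappa, 0, expon.scaled = TRUE)) + r * kappa -
    n * (log(besselI(kappa, 0, expon.scaled = TRUE)) + kappa)
}

kappa <- seq(0.01, 5, length = 1000)        # grid; tune its range using the estimate of kappa
lp <- log.post(kappa, lambda = 1)
post <- exp(lp - max(lp))
post <- post / (sum(post) * (kappa[2] - kappa[1]))  # numerical normalization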

(d) [Bonus] The model that we’ve used to this point assumes that κ is strictly positive and thus rules out the case where κ = 0 (which implies that D1, · · · , Dn are uniformly distributed). We can actually address the general model (i.e. κ ≥ 0) by putting a prior probability θ on the model κ = 0 (which implies that we put a prior probability 1 − θ on κ > 0). The prior distribution on the parameter space {(κ, μ) : κ ≥ 0, 0 ≤ μ < 2π} has a point mass of θ at κ = 0, and the prior density over {(κ, μ) : κ > 0, 0 ≤ μ < 2π} is (1 − θ)π(κ, μ). (This is a simple example of a “spike-and-slab” prior, which is used in Bayesian model selection.) The posterior probability that κ = 0 is then

Prob(κ = 0|d1, · · · , dn) = θ(2π)^{−n} / [θ(2π)^{−n} + (1 − θ) ∫₀^∞ ∫₀^{2π} L(κ, μ)π(κ, μ) dμ dκ]

where L(κ, μ) is the likelihood function. Using the prior π(κ, μ) in part (b) with λ = 1, evaluate Prob(κ = 0|d1, · · · , dn) for θ = 0.1, 0.2, 0.3, · · · , 0.9. (This is easier than it looks, as you’ve done much of the work in part (c).)
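
One way to see the connection to part (c): integrating over μ as in part (b) gives ∫∫ L(κ, μ)π(κ, μ) dμ dκ = λ(2π)^{−n} m, where m = ∫₀^∞ exp(−λκ) I0(rκ)/[I0(κ)]^n dκ is the integral of the unnormalized posterior from part (c), so the (2π)^{−n} factors cancel and Prob(κ = 0|d1, · · · , dn) = θ/(θ + (1 − θ)λm). A sketch reusing log.post and the kappa grid from the sketch above (the logistic form avoids overflow when m is huge):

lambda <- 1
lp <- log.post(kappa, lambda)
log.m <- max(lp) + log(sum(exp(lp - max(lp))) * (kappa[2] - kappa[1]))  # log of m
theta <- seq(0.1, 0.9, by = 0.1)
prob0 <- plogis(log(theta / (1 - theta)) - log(lambda) - log.m)  # theta/(theta + (1-theta)*lambda*m)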

2. Suppose that F is a continuous distribution with density f. The mode of the distribution is defined to be the maximizer of the density function. In some applications (for example, when the data are “contaminated” with outliers), the centre of the distribution is better described by the mode than by the mean or median; however, unlike the mean and median, the mode turns out to be a difficult parameter to estimate. There is a document on Quercus that discusses a few of the various methods for estimating the mode.

The τ-shorth is the shortest interval that contains at least a fraction τ of the observations X1, · · · , Xn; it will have the form [X(a), X(b)] where b − a + 1 ≥ τn. In this problem, we will consider estimating the mode using the sample mean of the values lying in the 1/2-shorth – we will call this estimator the half-shorth-mean. The half-shorth-mean is similar to a trimmed mean but different: for the trimmed mean, we trim the r smallest and r largest observations and average the remaining n − 2r observations; the τ-shorth excludes approximately (1 − τ)n observations, but the trimming is not necessarily symmetric.
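
The function shorth provided on Quercus should be used for the assignment; purely to illustrate the definition, a minimal sketch of a half-shorth-mean might look like this (tie-breaking and rounding conventions may differ from the official code):

half.shorth.mean <- function(x) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(n / 2)                   # window must contain at least n/2 points
  widths <- x[k:n] - x[1:(n - k + 1)]   # width of each window of k consecutive points
  i <- which.min(widths)                # leftmost shortest window
  mean(x[i:(i + k - 1)])                # mean of the observations in that window
}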

(a) For the Old Faithful data considered in Assignment #2, compute the half-shorth-mean using the function shorth, which is available on Quercus (in the file shorth.txt). Does this estimate make sense?

(b) Using Monte Carlo simulation, compare the distribution of the half-shorth-mean to that of the sample mean for normally distributed data for sample sizes n = 50, 100, 500, 1000, 5000. On Quercus, I have provided some simple code that you may want to modify. We know that the variance of the sample mean is σ²/n (where σ² is the variance of the observations). If we assume that the variance of the half-shorth-mean is approximately a/n^γ for some a > 0 and γ > 0, use the results of your simulation to estimate the values of a and γ.

(Hint: If V̂ar(μ̂n) is a Monte Carlo estimate of the variance of the half-shorth-mean for a sample size n, then

ln(V̂ar(μ̂n)) ≈ ln(a) − γ ln(n),

which should allow you to estimate a and γ.)
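
A sketch of the whole simulation and regression, using the minimal half.shorth.mean above (substitute the shorth function from Quercus) with N(0, 1) samples and 1000 replications per sample size, both of which are assumptions:

ns <- c(50, 100, 500, 1000, 5000)
v <- sapply(ns, function(n)
  var(replicate(1000, half.shorth.mean(rnorm(n)))))  # Monte Carlo variance estimates
fit <- lm(log(v) ~ log(ns))     # ln(Var) ~ ln(a) - gamma * ln(n)
a.hat <- exp(coef(fit)[1])      # intercept estimates ln(a)
gamma.hat <- -coef(fit)[2]      # negative slope estimates gamma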

(c) It can be shown that X̄n is independent of μ̂n − X̄n, where μ̂n is the half-shorth-mean. (This is a property of the sample mean of the normal distribution – you will learn this in STA452/453.) Show that we can use this fact to estimate the density function of μ̂n from the Monte Carlo data by the “kernel” estimator

f̂(x) = (1/N) Σ_{j=1}^N (√n/σ) φ(√n(x − Dj)/σ)

where σ² = Var(Xi), φ(t) is the N(0, 1) density, and D1, · · · , DN are the values of μ̂n − X̄n from the Monte Carlo simulation. Use the function density (with the appropriate bandwidth) to give a density estimate for the half-shorth-mean for n = 100.
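
Since the estimator above is a Gaussian kernel density estimate applied to D1, · · · , DN with bandwidth σ/√n, density can be called directly; a sketch for n = 100 with N(0, 1) data (so σ = 1):

n <- 100
D <- replicate(1000, { x <- rnorm(n); half.shorth.mean(x) - mean(x) })
f.hat <- density(D, bw = 1 / sqrt(n), kernel = "gaussian")  # bandwidth sigma/sqrt(n)
plot(f.hat, main = "Density estimate for the half-shorth-mean, n = 100")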

Supplemental problems (not to hand in):

3. Suppose that X1, · · · , Xn are independent random variables with density or mass function f(x; θ), and suppose that we estimate θ using the maximum likelihood estimator θ̂; we estimate its standard error using the observed Fisher information estimator

se(θ̂) = [−Σ_{i=1}^n ℓ″(Xi; θ̂)]^{−1/2}

where ℓ′(x; θ) and ℓ″(x; θ) are the first two partial derivatives of ln f(x; θ) with respect to θ. Alternatively, we could use the jackknife to estimate the standard error of θ̂; if our model is correct, then we would expect (hope) that the two estimates are similar. In order to investigate this, we need to be able to get a good approximation to the “leave-one-out” estimators {θ̂i}.

(a) Show that θ̂i satisfies the equation

(b) Expand the right-hand side of the equation in (a) in a Taylor series around θ̂ to show that

(You should try to think about the magnitude of the approximation error, but a rigorous proof is not required.)

(c) Use the results of part (b) to derive an approximation for the jackknife estimator of the standard error. Comment on the differences between the two estimators – in particular, why is there a difference? (Hint: What type of model – parametric or non-parametric – are we assuming for the two standard error estimators?)

(d) For the air conditioning data considered in Assignment #1, compute the two standard error estimates for the parameter λ in the Exponential model (f(x; λ) = λ exp(−λx) for x ≥ 0). Do these two estimates tell you anything about how well the Exponential model fits the data?
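
For comparison with the approximation from part (b), the jackknife standard error can also be computed exactly by refitting n times; a sketch, where aircon is a hypothetical name for the data vector and 1/mean(x) is the MLE of λ in the Exponential model:

jack.se <- function(x, est) {
  n <- length(x)
  theta.i <- sapply(1:n, function(i) est(x[-i]))  # exact leave-one-out estimates
  sqrt((n - 1) / n * sum((theta.i - mean(theta.i))^2))
}
## jack.se(aircon, function(x) 1 / mean(x))  # aircon is assumed, not provided here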

4. Suppose that X1, · · · , Xn are independent continuous random variables with density f(x; θ) where θ is real-valued. We are often not able to observe the Xi’s exactly; rather, we observe only which of the regions Bk (k = 1, · · · , m) each Xi belongs to. An example of this is interval censoring in survival analysis, where we are unable to observe an exact failure time but know that the failure occurs in some finite time interval. Intuitively, we should be able to estimate θ more efficiently with the actual values of {Xi}; in this problem, we will show that this is true (at least) for MLEs.

Assume that B1, · · · , Bm are disjoint sets such that P(Xi ∈ ∪_{k=1}^m Bk) = 1. Define independent discrete random variables Y1, · · · , Yn where Yi = k if Xi ∈ Bk; the probability mass function of Yi is

p(k; θ) = Pθ(Xi ∈ Bk) = ∫_{Bk} f(x; θ) dx for k = 1, · · · , m.

Also define

IX(θ) = Varθ(∂/∂θ ln f(Xi; θ)) = ∫ [∂/∂θ ln f(x; θ)]² f(x; θ) dx

IY(θ) = Varθ(∂/∂θ ln p(Yi; θ)) = Σ_{k=1}^m [∂/∂θ ln p(k; θ)]² p(k; θ).

Under the standard MLE regularity conditions, the MLE of θ based on X1, · · · , Xn will have variance approximately 1/{nIX(θ)}, while the MLE based on Y1, · · · , Yn will have variance approximately 1/{nIY(θ)}.

(a) Assume the usual regularity conditions for f(x; θ); in particular, that f(x; θ) can be differentiated with respect to θ inside integral signs with impunity! Show that IX(θ) ≥ IY(θ) and indicate under what conditions there will be strict inequality.

Hints: (i) f(x; θ)/p(k; θ) is a density function on Bk.

(ii) For any function g,

(iii) For any random variable U, E(U²) ≥ [E(U)]², with strict inequality unless U is constant.

(b) Under what conditions on B1, · · · , Bm will IX(θ) ≈ IY(θ)?

5. In seismology, the Gutenberg-Richter law states that, in a given region, the number of earthquakes N greater than a certain magnitude m satisfies the relationship

log10(N) = a − b × m

for some constants a and b; the parameter b is called the b-value and characterizes the seismic activity in a region. The Gutenberg-Richter law can be used to predict the probability of large earthquakes, although it is a very crude instrument. On Blackboard, there is a file containing the magnitudes of 433 earthquakes in California of magnitude (rounded to the nearest tenth) 5.0 and greater from 1932–1992.

(a) If we have earthquakes of (exact) magnitudes M1, · · · , Mn greater than some known m0, the Gutenberg-Richter law suggests that M1, · · · , Mn can be modeled as independent random variables with density

f(x; β) = β exp(−β(x − m0)) for x ≥ m0

where β = b × ln(10). However, if the magnitudes are rounded to the nearest δ, then they are effectively discrete random variables taking values xk = m0 + δ/2 + kδ for k = 0, 1, 2, · · · with probability mass function

p(xk; β) = P(m0 + kδ ≤ M < m0 + (k + 1)δ)
= exp(−βkδ) − exp(−β(k + 1)δ)
= exp(−β(xk − m0 − δ/2)) {1 − exp(−βδ)} for k = 0, 1, 2, · · ·.

If X1, · · · , Xn are the rounded magnitudes, find the MLE of β. (There is a closed-form expression for the MLE in terms of the sample mean of X1, · · · , Xn.)

(b) Compute the MLE of β for the earthquake data (using m0 = 4.95 and δ = 0.1), as well as estimates of its standard error using (i) the Fisher information and (ii) the jackknife. Use these to construct approximate 95% confidence intervals for β. How similar are the intervals?
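
A sketch of the computation in (b): maximize the log-likelihood of the rounded-magnitude model numerically, with the closed form from (a) available as a check (mags is a hypothetical name for the vector of rounded magnitudes):

m0 <- 4.95; delta <- 0.1
loglik <- function(beta, x)   # log-likelihood of the rounded-magnitude model
  -beta * sum(x - m0 - delta / 2) + length(x) * log(1 - exp(-beta * delta))
## beta.hat <- optimize(loglik, c(0.01, 20), x = mags, maximum = TRUE)$maximum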

6. In genetics, the Hardy-Weinberg equilibrium model characterizes the distribution of genotype frequencies in populations that are not evolving and is a fundamental model of population genetics. In particular, for a genetic locus with two alleles A and a, the frequencies of the genotypes AA, Aa, and aa are

Pθ(genotype = AA) = θ², Pθ(genotype = Aa) = 2θ(1 − θ), and Pθ(genotype = aa) = (1 − θ)²

where θ, the frequency of the allele A in the population, is unknown.

In a sample of n individuals, suppose we observe XAA = x1, XAa = x2, and Xaa = x3 individuals with genotypes AA, Aa, and aa, respectively. Then the likelihood function is

L(θ) = {θ²}^{x1} {2θ(1 − θ)}^{x2} {(1 − θ)²}^{x3}.

(The likelihood function follows from the fact that we can write

Pθ(genotype = g) = {θ²}^{I(g=AA)} {2θ(1 − θ)}^{I(g=Aa)} {(1 − θ)²}^{I(g=aa)};

multiplying these together over the n individuals in the sample gives the likelihood function.)

(a) Find the MLE of θ and give an estimator of its standard error using the observed Fisher information.

(b) A useful family of prior distributions for θ is the Beta family:

π(θ) = [Γ(α + β)/(Γ(α)Γ(β))] θ^{α−1} (1 − θ)^{β−1} for 0 ≤ θ ≤ 1

where α > 0 and β > 0 are hyperparameters. What is the posterior distribution of θ given XAA = x1, XAa = x2, and Xaa = x3?

