
Date: 2022-02-26 10:11

Generalized Linear Models MATH 523

McGill University, Winter Term 2022

Assignment 2 due on February 16 at noon.

Q1 Lecture 5a

Consider a Poisson GLM with the log link and linear predictor of the form

$$\eta_i = \beta_1 + \beta_2 a_i, \quad i \in \{1, \dots, n\},$$

where $a_i$ is the value of a factor predictor with two levels, such that $a_i = 1$ for $i \in \{1, \dots, n_1\}$ and $a_i = 0$ for $i \in \{n_1 + 1, \dots, n\}$.

Suppose that at the beginning of the t-th iteration of the Fisher Scoring algorithm (formulated as iterative reweighted least squares), we get

$$\beta^{(t+1)} = \bigl(\beta_1^{(t+1)}, \beta_2^{(t+1)}\bigr) = (\log \bar{y}_2, \; \log \bar{y}_1 - \log \bar{y}_2),$$

where $\bar{y}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i$ and $\bar{y}_2 = \frac{1}{n_2} \sum_{i=n_1+1}^{n} y_i$.

(1) Calculate the remaining part of the iteration step of the algorithm: $\eta^{(t+1)}$, $\mu^{(t+1)}$, $z^{(t+1)}$, $W^{(t+1)}$, $D^{(t+1)}$, and $u^{(t+1)}$.

(2) Does the algorithm terminate after this iteration? Justify your answer.

(3) Did the algorithm find the exact solution after this iteration? Justify your answer.
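The quantities in this question can be checked numerically. The following is a minimal sketch in R on simulated data, not a substitute for the derivation. It assumes the common notation in which D = diag(dμ_i/dη_i), W = diag((dμ_i/dη_i)^2 / var(Y_i)), z is the working response and u is the score vector; the lecture notes may define these differently. It also assumes n_2 = n - n_1. The sample sizes and Poisson means are made up for illustration.

```r
# Simulated stand-in data (an assumption; the assignment gives no data for Q1)
set.seed(1)
n1 <- 8; n2 <- 6
a <- c(rep(1, n1), rep(0, n2))            # a_i = 1 for the first n1 observations
y <- rpois(n1 + n2, lambda = ifelse(a == 1, 5, 3))
X <- cbind(intercept = 1, a = a)

ybar1 <- mean(y[a == 1]); ybar2 <- mean(y[a == 0])
stopifnot(ybar1 > 0, ybar2 > 0)           # the logs below need positive group means

beta <- c(log(ybar2), log(ybar1) - log(ybar2))   # the value stated in the question

eta <- drop(X %*% beta)                   # linear predictor
mu  <- exp(eta)                           # inverse of the log link
D   <- diag(mu)                           # d mu / d eta = mu for the log link
W   <- diag(mu)                           # (d mu / d eta)^2 / var(Y) = mu for Poisson
z   <- eta + (y - mu) / mu                # working response
u   <- drop(t(X) %*% (y - mu))            # score vector for Poisson with log link

# One weighted least-squares update built from these quantities
beta_next <- drop(solve(t(X) %*% W %*% X, t(X) %*% W %*% z))

# Reference fit from R's own IRLS implementation
fit <- glm(y ~ a, family = poisson(link = "log"))
rbind(start = beta, next_step = beta_next, glm = coef(fit))
```

Comparing the three rows of the final table, together with the size of `u`, gives a quick empirical view on parts (2) and (3).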

Q2 Lecture 6a

Consider the Gamma GLM (see page 2 of Lecture 3a) with a linear predictor of the form

$$\eta_i = \beta_1 + \beta_2 x_i,$$

where $x_i$ is the value of a continuous predictor corresponding to the $i$th response.

(1) Calculate the standard error of $\hat{\beta}_1$ using (a) the reciprocal link ($g(\mu) = 1/\mu$) and (b) the identity link ($g(\mu) = \mu$).

(2) Calculate the deviance. Does it depend on the link function? Explain.
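An analytical answer can be compared against a numerical fit. The sketch below is one way to do that in R, under the assumption of simulated data with a mean that is linear in x and an arbitrarily chosen Gamma shape of 20; none of these values come from the assignment. It fits the same data with both links and extracts the standard errors and deviances that `glm` reports.

```r
# Simulated data (assumption: true mean linear in x, shape parameter 20)
set.seed(3)
n <- 200
x <- runif(n, 1, 5)
mu <- 2 + 0.5 * x                    # strictly positive on the range of x
shape <- 20                          # dispersion is 1 / shape
y <- rgamma(n, shape = shape, rate = shape / mu)

# Same response and predictor, two different link functions
fit_inv <- glm(y ~ x, family = Gamma(link = "inverse"))
fit_id  <- glm(y ~ x, family = Gamma(link = "identity"))

# Standard errors of the coefficient estimates under each link
se_inv <- summary(fit_inv)$coefficients[, "Std. Error"]
se_id  <- summary(fit_id)$coefficients[, "Std. Error"]
rbind(inverse = se_inv, identity = se_id)

# Residual deviance under each link
c(inverse = deviance(fit_inv), identity = deviance(fit_id))
```

The standard errors are not directly comparable across links because the coefficients live on different scales; the deviances are comparable because both are computed from the fitted means.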

Q3 R exercise

Consider the data from the German General Social Survey on the number of children. This data set contains 3548 observations on the following 6 variables: child (number of children), age (age of the woman in years), dur (years of education), nation (nationality of the woman: 0 = German, 1 = otherwise), god (belief of the woman in God: 1 = strong agreement, 2 = agreement, 3 = no definite opinion, 4 = rather no agreement, 5 = no agreement at all, 6 = never thought about it), and univ (whether the woman attended university: 0 = no, 1 = yes).

The dataset is available in the catdata library in R and can be loaded as follows (after having installed the catdata library):


library(catdata)
## Loading required package: MASS
data(children)
attach(children)
head(children)
##    child age dur nation god univ
## 6      2  33   9      0   6    0
## 10     2  80   7      0   1    0
## 11     1  63   8      0   1    0
## 12     2  82   7      0   1    0
## 13     2  49   8      0   1    0
## 14     1  54   9      0   5    0

The data can be used to investigate the effect of age, dur, nation, god and univ on the number of children. Therefore, we will treat child as a response in the analysis below.

(1) Fit a Poisson GLM with the canonical link to the data with child as a response, and age, dur, nation, god and univ as main effects but no interactions and display the summary of the fit.

(2) List the predictors of the model in part (1). For each of age, dur, nation, god and univ, decide whether it is a factor or a continuous predictor and determine whether it is significant at the 5% level using appropriate Wald tests.

(3) Using the model fitted in part (1), estimate the expected number of children of a German woman aged 44 with 12 years of education, who did not attend university and is in agreement with the statement that she believes in God, along with a two-sided 95% approximate (large-sample) confidence interval.

(4) Fit a Poisson GLM to these data using only the intercept and dur predictors, and the identity link. Why do you think the glm function produced an error message? Try to fix the problem by supplying starting values to glm using glm(..., start = c( , )).
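The mechanics of parts (1), (3) and (4) can be sketched as follows. To stay runnable without the catdata package, the sketch uses a simulated data frame named `sim` that only mimics the column names of `children`; its values are made up, so the numbers it produces say nothing about the real survey. Coding god as a factor and the particular starting values are assumptions of this sketch, not prescribed choices.

```r
# Simulated stand-in for catdata::children (assumption: same column names only)
set.seed(523)
n <- 500
sim <- data.frame(
  age    = sample(18:85, n, replace = TRUE),
  dur    = sample(7:18, n, replace = TRUE),
  nation = rbinom(n, 1, 0.1),
  god    = sample(1:6, n, replace = TRUE),
  univ   = rbinom(n, 1, 0.15)
)
sim$child <- rpois(n, lambda = 3 - 0.1 * sim$dur)   # mean stays within [1.2, 2.3]

# Main-effects Poisson GLM with the canonical (log) link; god coded as a factor here
fit <- glm(child ~ age + dur + nation + factor(god) + univ,
           family = poisson(link = "log"), data = sim)
summary(fit)

# Profile from part (3): German, aged 44, 12 years of education, no university, god = 2
new_obs <- data.frame(age = 44, dur = 12, nation = 0, god = 2, univ = 0)
pr  <- predict(fit, newdata = new_obs, type = "link", se.fit = TRUE)
est <- exp(pr$fit)                                        # estimated mean
ci  <- exp(pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit)  # Wald interval, back-transformed

# Identity link with explicit starting values: intercept = sample mean, slope = 0,
# so that every starting mean is positive
fit_id <- glm(child ~ dur, family = poisson(link = "identity"), data = sim,
              start = c(mean(sim$child), 0))
coef(fit_id)
```

Building the interval on the link scale and exponentiating the endpoints keeps both limits positive, which an interval computed directly on the response scale does not guarantee.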

Johanna G. Nešlehová

Assignment 1 due on January 28 at noon.

Q1 Lecture 2b

Consider the exponential distribution with density

$$f(y; \lambda) = \frac{1}{\lambda} e^{-y/\lambda}, \quad y \ge 0,$$

and parameter $\lambda > 0$.

(1) Determine whether the exponential family of distributions is an exponential dispersion family. If it is not, explain why. If it is, identify the canonical and the dispersion parameters, and the functions a, b, and c.

(2) Using the methods discussed in Lecture 2b, calculate the mean and variance of an exponential random variable.

(3) Determine the mean-variance relationship.

(4) Consider the location extension of the exponential distribution, viz.

$$f(y; \lambda, \mu) = \frac{1}{\lambda} e^{-(y - \mu)/\lambda}, \quad y \ge \mu,$$

with parameters $\lambda > 0$ and $\mu \in \mathbb{R}$. Is this family an exponential dispersion family? Why or why not?
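A derived mean and variance can be sanity-checked by simulation. The sketch below assumes an arbitrary value λ = 2 and uses the fact that R's `rexp` is parameterized by the rate 1/λ rather than by λ; it also confirms numerically that the density integrates to one.

```r
set.seed(2022)
lambda <- 2                                   # arbitrary illustrative value

# The density from the question, written as an R function of y
dens <- function(y) exp(-y / lambda) / lambda
total_mass <- integrate(dens, lower = 0, upper = Inf)$value

# Large sample; note that rexp() takes the rate, which is 1 / lambda here
y <- rexp(1e5, rate = 1 / lambda)

c(mass = total_mass, sample_mean = mean(y), sample_var = var(y),
  lambda = lambda, lambda_squared = lambda^2)
```

The sample mean and variance can be set against λ and λ² in the printed vector and against whatever the Lecture 2b formulas yield.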

Q2 Lecture 3a

Consider the Negative Binomial distribution with parameters $\mu > 0$ and $\theta_Z > 0$; the corresponding probability mass function is given by

$$f(y; \mu, \theta_Z) = \frac{\Gamma(y + \theta_Z)}{\Gamma(y + 1)\,\Gamma(\theta_Z)} \left(\frac{\theta_Z}{\mu + \theta_Z}\right)^{\theta_Z} \left(\frac{\mu}{\mu + \theta_Z}\right)^{y}, \quad y = 0, 1, \dots,$$

where $\Gamma(\cdot)$ denotes the Gamma function. Assume throughout that $\theta_Z$, the “number of successes until the experiment is stopped”, is known.

(1) Show that the Negative Binomial family with known $\theta_Z$ is an exponential dispersion family. Identify the functions a, b, and c, and the canonical and dispersion parameters.

(2) Using the formulas derived in class, calculate E(Y) and var(Y) of a Negative Binomial random variable Y.

(3) Find the mean-variance relationship.

(4) Find the canonical link for a Negative Binomial GLM and discuss its pros and cons.


(5) Can you think of another link function that might be more appropriate than the canonical link?
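The probability mass function in this question matches the (size, mu) parameterization of R's `dnbinom`, with size playing the role of θ_Z. The sketch below checks this numerically for arbitrarily chosen values μ = 3.5 and θ_Z = 2, and prints a simulated mean and variance that can be compared with the results of part (2). The helper `nb_pmf` is a name introduced here for illustration.

```r
# PMF from the question, computed on the log scale for numerical stability
nb_pmf <- function(y, mu, theta) {
  exp(lgamma(y + theta) - lgamma(y + 1) - lgamma(theta) +
        theta * log(theta / (mu + theta)) +
        y * log(mu / (mu + theta)))
}

mu <- 3.5; theta <- 2                 # arbitrary illustrative values
y <- 0:30

# Largest absolute difference from R's built-in negative binomial PMF
max_diff <- max(abs(nb_pmf(y, mu, theta) - dnbinom(y, size = theta, mu = mu)))
max_diff

# Empirical mean and variance from a large simulated sample
set.seed(7)
draws <- rnbinom(1e5, size = theta, mu = mu)
c(sample_mean = mean(draws), sample_var = var(draws))
```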

Q3 Lecture 4a

Consider a Negative Binomial GLM with known $\theta_Z$, see Q2. Suppose the model contains the intercept and one factor predictor, A, with three levels, viz. $A \in \{1, 2, 3\}$. This means that

$$g(\mu_i) = \alpha + \beta_1 \mathbf{1}(A_i = 2) + \beta_2 \mathbf{1}(A_i = 3).$$

To simplify notation, suppose that $A_i = 1$ for $i = 1, \dots, n_1$, $A_i = 2$ for $i = n_1 + 1, \dots, n_1 + n_2$ and $A_i = 3$ for $i = n_1 + n_2 + 1, \dots, n$ for some $n_1, n_2 \in \{1, \dots, n\}$ such that $n_1 + n_2 < n$.

(1) Write down the log-likelihood for $\alpha, \beta_1, \beta_2$ when (i) the canonical link is used and (ii) when the log link is used.

(2) Write down the likelihood equations for $\alpha, \beta_1, \beta_2$ when (i) the canonical link is used and (ii) when the log link is used.

(3) Solve the likelihood equations in part (2) explicitly (it’s indeed possible to do this in this case) when (i) the canonical link is used and (ii) when the log link is used.
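The indicator coding of the factor A used in this question corresponds to the default treatment contrasts in R, with level 1 as the baseline. A small sketch with toy group sizes (n_1 = 2, n_2 = 3, n = 6, chosen only for illustration) shows the resulting design matrix; the column names are relabelled here to match the parameters in the question.

```r
# Toy factor: two observations at level 1, three at level 2, one at level 3
A <- c(1, 1, 2, 2, 2, 3)

# Default treatment contrasts: intercept plus indicators for levels 2 and 3
X <- model.matrix(~ factor(A))
colnames(X) <- c("alpha", "beta1", "beta2")   # relabelled to match the question
X

# Each row of X times (alpha, beta1, beta2) gives the linear predictor g(mu_i),
# so observations with the same level of A share the same mean.
unique(X)
```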

Q4 Lecture 4b

Consider again the Negative Binomial GLM with known $\theta_Z$ and one factor predictor with three levels described in Q3.

(1) Calculate the Fisher Information Matrix when (i) the canonical link is used and (ii) when the log link is used.

(2) Calculate the Hessian when (i) the canonical link is used and (ii) when the log link is used.


Q5 R exercise

Load the data crabs2.txt available on myCourses in the Assignments unit under Content. These data were collected with the goal of exploring the effect of various characteristics of a female horseshoe crab on the number of her satellites, i.e., male mates attached to her nest. The data contain the following variables:

satell: number of satellites
color: color of the female crab, with values 1 = light, 2 = light medium, 3 = medium, 4 = dark medium, 5 = dark
spine: condition of the two spines of the female horseshoe crab, with values 1 = both good, 2 = one worn or broken, 3 = both worn or broken
width: carapace width of the female horseshoe crab in cm
weight: weight of the female horseshoe crab in g

Analyze these data with linear regression models, using satell as the response, in the following steps:

(1) Explain which explanatory variables are factors and which are continuous. Calculate the correlation between width and weight and explain why it is advisable to keep only one of these variables in the model (and keep width henceforth).

(2) Using satell as the response and color, spine, and width as explanatory variables, build the most suitable linear regression model for these data. Don’t forget that you can include interactions between the explanatory variables.

(3) Redo the analysis in part (2), but treating color and spine as continuous explanatory variables this time. Explain why this makes sense. Do you obtain a different model than in part (2)?

(4) Using your analyses in parts (2) and (3), single out a linear regression model that you find the most appropriate for these data. Using various model diagnostics, comment on the quality of the fit. Interpret your final model and formulate which drawbacks it has, in your opinion.
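The workflow in parts (1) to (3) can be sketched as below. Because the layout of crabs2.txt is not described here, the sketch runs on a simulated data frame `crabs_sim` that only borrows the variable names; every value in it is made up, and the real file would be read with something like `read.table` once its format is known. The comparison by an F-test and AIC is one possible approach, not a required one.

```r
# Simulated stand-in for the crab data (assumption: same variable names only)
set.seed(4)
n <- 150
crabs_sim <- data.frame(
  color = sample(1:5, n, replace = TRUE),
  spine = sample(1:3, n, replace = TRUE),
  width = round(rnorm(n, mean = 26, sd = 2), 1)          # cm
)
crabs_sim$weight <- 100 * crabs_sim$width - 200 + rnorm(n, sd = 150)   # g
crabs_sim$satell <- rpois(n, lambda = exp(-3 + 0.16 * crabs_sim$width))

# Part (1): correlation between the two size measurements
cor_ww <- cor(crabs_sim$width, crabs_sim$weight)
cor_ww

# Part (2): color and spine as factors; part (3): the same variables as numeric scores
fit_factor  <- lm(satell ~ factor(color) + factor(spine) + width, data = crabs_sim)
fit_numeric <- lm(satell ~ color + spine + width, data = crabs_sim)

# The numeric-score model is nested in the factor model, so an F-test applies
anova(fit_numeric, fit_factor)
AIC(fit_numeric, fit_factor)

# A simple numerical diagnostic on the residuals of one candidate model
shapiro.test(residuals(fit_factor))
```

Interactions such as `factor(color) * width` can be added to either formula and compared in the same way.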

