
STA465/STA2016: Week 2 Questions and Homework 1

(UPDATED)

Daniel Simpson

2019-01-18

Please submit your answers to Question 1 (Parts A-D) as a single pdf document via Quercus. It is due Wednesday 30 January, 2019 at 12pm (midday). Late submissions will be heavily penalized.

Question 1 (For assessment)

There has been a lot of work done in Bayesian statistics on the effect of prior distributions. In this question, we're going to explore a framework for constructing priors that contain, in some sense, minimal information. These priors have the advantage of being automatic (there are no parameters that need to be set by considering the problem). On the other hand, they can have poor empirical behaviour for many problems. Nevertheless, they are commonly used for parameters that are well-identified by the data, such as the (fixed) intercept term in a linear regression model.

One of the most popular examples of this type of prior is called the Jeffreys' prior, defined for a univariate parameter θ in the model y | θ ~ p(y | θ) as

p(θ) ∝ I(θ)^(1/2),

where I(θ) = −E_{y|θ}[∂² log p(y | θ) / ∂θ²] is the Fisher information.
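As a quick illustration of the definition (this worked example is an addition, not one of the assessed parts): for a single Bernoulli observation,

% Jeffreys' prior for y | theta ~ Bernoulli(theta);
% log p(y | theta) = y log(theta) + (1 - y) log(1 - theta)
\[
I(\theta) = -\mathbb{E}_{y \mid \theta}\!\left[\frac{\partial^2}{\partial\theta^2}\log p(y \mid \theta)\right]
          = \frac{1}{\theta(1-\theta)},
\qquad
p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2},
\]

which is the proper Beta(1/2, 1/2) distribution.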

(Health warning: Although the Jeffreys priors mentioned in this question sometimes do nice things, they should only be used with extreme care in real analyses. They can do very weird things if you're not careful! Nevertheless, it's important to know they exist and get a feeling for how they work.)

Part A. If y | μ ~ N(μ, σ²) for some fixed value of σ², show that the Jeffreys' prior for μ in this model is the improper uniform prior p(μ) ∝ 1.

Part B. Consider the model

yᵢ | μ ~ N(μ, σ²), i = 1, …, n,
p(μ) ∝ 1,

where σ² is fixed and known. Show that although p(μ) ∝ 1 is not a proper prior distribution (it does not integrate to one), formally applying the Bayesian machinery leads to a proper posterior distribution. Compute this distribution.

Part C. Consider the similar model

yᵢ | μ ~ N(μ, σ²), i = 1, …, n,
μ ~ N(μ₀, τ²),

where μ₀, σ², and τ² are fixed and known. Compute the posterior distribution p(μ | y) under this model and write a sentence or two comparing it to the posterior computed using the Jeffreys prior for μ. Are they ever the same?

Part D. Do you expect an improper prior to always yield a proper posterior?


Question 2 (Not for assessment)

In this question we will look at a very simple example that will help you become familiar with the R-INLA software. We will fit a simple non-spatial model. The aim of this question is to walk you through an analysis with INLA.

Please work through this question step by step. There are three questions at the end.

To start with, we need to load the library

library(INLA)

In the event that you don't already have INLA installed, you can install it by running this command:

install.packages("INLA", repos=c(getOption("repos"),

INLA="https://inla.r-inla-download.org/R/testing"),

dep=TRUE)

For this example we're going to look at a classical experiment that attempted to work out the effect that a root extract had on the germination probability of certain types of seeds. More details can be found here: http://www.openbugs.net/Examples/Seeds.html

The experimental set up is a 2x2 factorial design split over 21 growing plates (the (0,1) experiment is repeated six times, while the others are repeated five times).

The data looks like this:

data(Seeds)
head(Seeds)

##    r  n x1 x2 plate
## 1 10 39  0  0     1
## 2 23 62  0  0     2
## 3 23 81  0  0     3
## 4 26 51  0  0     4
## 5 17 39  0  0     5
## 6  5  6  0  1     6
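To check the 2x2 layout described above, here is a quick base-R cross-tabulation of plates by treatment combination (a minimal sketch):

# count growing plates for each (x1, x2) combination; the counts sum to 21
with(Seeds, table(x1, x2))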

We model the number of seeds that germinate, rᵢ, out of nᵢ seeds as

rᵢ ~ Binomial(nᵢ, pᵢ).

We want to see how the probability of germination varies with the two treatments (labelled x1 and x2 in the data). Because pᵢ is constrained to be between zero and one (it's a probability!) we need to first transform it to an unconstrained real number, as this will allow us to do regression. The classical transformation is the logistic transform (or logit):

logit(pᵢ) = log(pᵢ / (1 − pᵢ)) = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₁₂x₁ᵢx₂ᵢ.
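For concreteness, a minimal base-R sketch of this transformation; qlogis() and plogis() implement the logit and its inverse:

# the logit maps a probability in (0, 1) to the whole real line
qlogis(0.75)      # log(0.75 / 0.25), about 1.099

# the inverse logit maps a real number back to a probability
plogis(1.099)     # about 0.75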

If each individual experiment was performed identically, we might just want to fit this model. However, each repeated experiment took place on a different growing plate and it is possible that this introduces some extra variation into the data. To account for this extra variation, we add a random effect uᵢ, so our model is

logit(pᵢ) = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + β₁₂x₁ᵢx₂ᵢ + uᵢ
uᵢ | τ² ~ N(0, τ²)
τ² ~ Inverse Gamma(1, 1e−5).

The priors specified in this model are the default priors in INLA. For the moment we will keep them.

We can specify this model in INLA using the formula

formula_seeds = r ~ 1 + x1 + x2 + x1*x2 + f(plate, model="iid")

The formula in INLA works the same way as it does in lm, with just one little twist. So the r on the left hand side of ~ is the response variable, while the first four terms on the right hand side denote the regression model. (NB: All of these terms could be replaced with x1*x2, as this is automatically expanded to include the other terms.)

The one little twist is the f() function, which is what we use to tell INLA that there is something special. In this case, it's the random effect uᵢ. The arguments to f() are:

- The name of the variable in the data (this must be unique).
- The structure of the random effect, which is model = "iid". This option tells INLA that the uᵢ are independent and identically distributed, uᵢ ~ N(0, τ²). We will become familiar with other options.

(There are a pile of other optional arguments that f() can take, and we'll discuss them as appropriate.)

With the formula in hand, we can fit the model using the following command:

fit_seeds = inla(formula_seeds, data=Seeds, family="binomial", Ntrials=n)

The first two arguments to the inla function specify the formula and the data (just like with lm). The family argument tells inla that the data has a Binomial distribution (it automatically uses the logit transformation), while the Ntrials argument tells inla where in the data to find the first argument of the Binomial distributions (recall, rᵢ ~ Binomial(nᵢ, pᵢ)).

This code runs fairly fast and produces the following output.

summary(fit_seeds)

##
## Call:
##    c("inla(formula = formula_seeds, family = \"binomial\", data =
##    Seeds, ", " Ntrials = n)")
## Time used:
##    Pre = 2.09, Running = 0.216, Post = 0.121, Total = 2.43
## Fixed effects:
##               mean    sd 0.025quant 0.5quant 0.975quant   mode kld
## (Intercept) -0.557 0.129     -0.813   -0.557     -0.306 -0.555   0
## x1           0.143 0.227     -0.307    0.144      0.586  0.146   0
## x2           1.321 0.182      0.968    1.320      1.682  1.318   0
## x1:x2       -0.782 0.312     -1.395   -0.781     -0.169 -0.781   0
##
## Random effects:
##   Name    Model
##   plate   IID model
##
## Model hyperparameters:
##                         mean       sd 0.025quant 0.5quant 0.975quant mode
## Precision for plate 19386.16 19788.68     153.07 13211.66   72720.49 9.78
##
## Expected number of effective parameters(stdev): 4.41(1.41)
## Number of equivalent replicates : 4.77
##
## Marginal log-Likelihood: -72.08

One thing that we see here is that the precision for the random effect is very high (median 13212). This means that the median of the standard deviation is around 0.009, which is very small.
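A minimal sketch of that calculation, and of how the full posterior marginal of the precision can be mapped to the standard-deviation scale (this assumes the plate precision is the first entry of marginals.hyperpar, which is worth confirming with names(fit_seeds$marginals.hyperpar)):

# the median is preserved under the monotone map tau -> 1 / sqrt(tau)
1 / sqrt(13211.66)   # approximately 0.0087

# transform the whole posterior marginal of the precision to the sd scale
sd_marginal = inla.tmarginal(function(tau) 1 / sqrt(tau),
                             fit_seeds$marginals.hyperpar[[1]])
inla.qmarginal(0.5, sd_marginal)   # posterior median of the sd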

Question: In the previous paragraph, I used the median rather than the mean because medians are easy to transform. Show that if a random variable Z has median m, then f(Z) has median f(m) as long as f(·) is a monotone function. Is this true for a general quantile?

It is also possible to plot the posteriors by running plot(fit_seeds). This produces multiple plots that you can scroll through using the arrows above the plotting window in RStudio.

If you look at the object fit_seeds you will see it's a list of lists. This contains all of the information that you need to do things with the model, in relatively well-named places.
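A minimal sketch of how to start exploring it (the exact element names are best confirmed with names() on your own fit):

# top-level components of the fitted object
names(fit_seeds)

# a compact view of the structure, one level deep
str(fit_seeds, max.level = 1)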

Question: Find the following quantities in the fit_seeds object:

- The mean of the intercept (Hint: covariates are also called "fixed effects" and the mean is a summary).
- The median of the coefficient of x1.
- The full distribution for x1 in fit_seeds, which you should plot (hint: it's called a "marginal distribution").

Question: How does this model differ from the model without the random effect?

Question 3 (Not for assessment)

For this question we will look at the PM2.5 data, which can be downloaded from Quercus or with the following command (which may not work on all systems):

load(url("https://github.com/jgabry/bayes-vis-paper/blob/master/bayes-vis.RData?raw=True"))

class(GM)

## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"


As that command shows, this is a special type of data frame called a SpatialPointsDataFrame. This is inconvenient for today's analysis (although it will become convenient later), so for the moment we will extract out the "slot" of the SpatialPointsDataFrame that contains the stuff we want. (We access a slot using an @ rather than a $ because R has its own internal logic.)

pm25_data = GM@data
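A quick check that the extraction worked (the columns named in the comment, such as pm25 and sat_2014, are the ones used in the tasks below):

# the data slot is an ordinary data.frame
class(pm25_data)

# peek at the columns, which should include pm25, sat_2014 and super_region
head(pm25_data)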

There are three tasks for this question:

1. Use INLA to fit the linear model log(pm25) ~ log(sat_2014).


2. By modifying the exercise you did in Question 2, use INLA to fit a model with a random intercept using the default priors, i.e.

log(pm25ᵢⱼ) | σ, β, μⱼ ~ N(μⱼ + β log(sat_2014ᵢⱼ), σ²)
μⱼ | μ, τ ~ N(μ, τ²).

Using the summary.fixed and summary.random entries in the output list, find an estimate for the median intercept in each region. (Hint: Use the numeric vector super_region rather than the Factor vector super_region_name to index the random intercept.)

3. Fit the full multilevel model using INLA (with the default priors). This model is

log(pm25ᵢⱼ) | σ, βⱼ, μⱼ ~ N(μⱼ + βⱼ log(sat_2014ᵢⱼ), σ²)
μⱼ | μ, τ_μ ~ N(μ, τ_μ²)
βⱼ | β, τ_β ~ N(β, τ_β²).

One of the challenges you will have with this third exercise is that a formula cannot use the same variable in the data more than once. To get around this, I suggest copying the variables as follows (one way the copies might enter the formula is sketched after this code):

pm25_data_part3 = data.frame(pm25 = pm25_data$pm25,
                             sat_2014 = pm25_data$sat_2014,
                             super_region_intercept = pm25_data$super_region,
                             super_region_sat = pm25_data$super_region)
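A hedged sketch of how those copied columns might enter an INLA formula; the log_sat column and the names formula_pm25 and fit_pm25 are introduced here purely for illustration, and you should check that the specification matches the model written above:

# precompute the log-scale covariate so it can be used as a weight in f()
pm25_data_part3$log_sat = log(pm25_data_part3$sat_2014)

# random intercept per super region, plus a random slope on log_sat:
# f(index, weight, model = "iid") contributes weight * u[index] to the predictor
formula_pm25 = log(pm25) ~ 1 + log_sat +
  f(super_region_intercept, model = "iid") +
  f(super_region_sat, log_sat, model = "iid")

fit_pm25 = inla(formula_pm25, data = pm25_data_part3, family = "gaussian")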

