
Stat-3503 McAlinn/Fall-19

problem set no. 1 — due Monday 9/23 before lecture starts

learning objectives. compute likelihoods, both for a generic sample, i.e., (x1, ..., xn), and for a specific sample, i.e., (2, 3, 6, 4, 8, 5, 6, 2, 3, 6, 5); write some short programs in R to generate fake data sets from a given model and plot the corresponding likelihoods.
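To illustrate that workflow, here is a minimal R sketch (the sample size, true rate, and grid of candidate rates are arbitrary choices, not tied to any question below):

    # generate a fake data set of n = 20 counts from a Poisson model with rate 5
    set.seed(1)                                   # for reproducibility
    x <- rpois(20, lambda = 5)

    # evaluate the log-likelihood on a grid of candidate rates
    lambda_grid <- seq(0.1, 15, by = 0.1)
    loglik <- sapply(lambda_grid,
                     function(l) sum(dpois(x, lambda = l, log = TRUE)))

    # plot the log-likelihood against the candidate rates
    plot(lambda_grid, loglik, type = "l",
         xlab = "lambda", ylab = "log-likelihood")

The same pattern (simulate with an r* function, then evaluate the log-likelihood with the matching d* function over a grid of parameter values) recurs throughout the problems below.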

problem 1. set-up: you are interested in studying the writing style of a popular Time Magazine contributor, FZ. you collect a simple random sample of his articles and count how many times he uses the word however in each of the articles in your sample, (x1, ..., xn). In this set-up, xi is the number of times the word however appeared in the i-th article.

question 1.1. (10 points) define the population of interest, the population quantity of interest, and the sampling units.

question 1.2. (10 points) what are potentially useful estimands for studying writing style? (hint: you are interested in comparing FZ's writing style to that of other contributors.)

question 1.3. (10 points) model: let Xi denote the quantity that captures the number of times the word however appears in the i-th article. let’s assume that the quantities X1, ..., Xn are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter λ,

p(Xi = xi | λ) = Poisson(xi | λ) for i = 1, ..., n.

using the 2-by-2 table of what’s variable/constant versus what’s observed/unknown, declare what’s the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: X1, ..., Xn, x1, ..., xn, λ and n.
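For reference in the likelihood questions that follow, recall the Poisson probability mass function: Poisson(x | λ) = λ^x e^(−λ) / x! for x = 0, 1, 2, ....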

question 1.4. (10 points) write the data generating process for the model above.

question 1.5. (10 points) define the likelihood L(λ) = p(· | ·) for this model and set-up at the highest level of abstraction.

question 1.6. (10 points) write the likelihood L(λ) for a generic sample of n articles, (x1, ..., xn).

question 1.7. (10 points) write the log-likelihood ℓ(λ) for a generic sample of n articles, (x1, ..., xn).

question 1.8. (10 points) write the log-likelihood ℓ(λ) for the following specific sample of 7 articles (12, 4, 5, 3, 7, 5, 6).


question 1.9. (10 points) plot the log-likelihood ℓ(λ) in R for the same specific sample of 7 articles (12, 4, 5, 3, 7, 5, 6). at what value of λ (approximately) does the log-likelihood attain its maximum?
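One possible starting point in R (a sketch; the grid of candidate λ values is an arbitrary choice):

    x <- c(12, 4, 5, 3, 7, 5, 6)          # the specific sample of 7 articles
    lambda_grid <- seq(0.5, 15, by = 0.05)
    loglik <- sapply(lambda_grid,
                     function(l) sum(dpois(x, lambda = l, log = TRUE)))
    plot(lambda_grid, loglik, type = "l",
         xlab = "lambda", ylab = "log-likelihood")
    lambda_grid[which.max(loglik)]        # value of lambda where the plotted curve peaks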

question 1.10. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

Extra credit mmmh ... something is amiss. the articles FZ writes have different lengths. if we model the word occurrences in each article as IID Poisson random variables with rate λ, we are implicitly assuming that the articles have the same length. why? (10 points; extra credit) and if that is true, what is the implied common length? (10 points; extra credit)

problem 2. set-up: you collect another random sample of articles penned by FZ and count how many times he uses the word however in each of the articles in your sample, (x1, ..., xn). you also count the length of each article in your sample, (y1, ..., yn). In this set-up, xi is the number of times the word however appeared in the i-th article, as before, and yi is the total number of words in the i-th article.

question 2.1. (10 points) model: let Xi denote the quantity that captures the number of times the word however appears in the i-th article. let’s assume that the quantities X1, ..., Xn are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter ν · yi/1000,

p(Xi = xi | yi, ν, 1000) = Poisson(xi | ν · yi/1000) for i = 1, ..., n.

using the 2-by-2 table of what’s variable/constant versus what’s observed/unknown, declare what’s the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: X1, ..., Xn, x1, ..., xn, y1, ..., yn, ν and n.

question 2.2. (10 points) what is the interpretation of yi/1000 in this model? explain.

question 2.3. (10 points) what is the interpretation of ν in this model? explain.

question 2.4. (10 points) write the data generating process for the model above.

question 2.5. (10 points) define the likelihood L(ν) = p(· | ·) for this model and set-up at the highest level of abstraction.

question 2.6. (10 points) write the likelihood L(ν) for a generic sample of n articles, (x1, ..., xn), and n article lengths, (y1, ..., yn).


question 2.7. (10 points) write the log-likelihood ℓ(ν) for a generic sample of n articles, (x1, ..., xn), and n article lengths, (y1, ..., yn).

question 2.8. (10 points) Simulate the number of occurrences of the word however for 5 articles using the data generating process. Assume ν = 10 and corresponding article lengths y = (1730, 947, 1830, 1210, 1100). Record the number of occurrences of however in each article.
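A sketch of one way to run this simulation in R, using the rates implied by the model above (the random seed is an arbitrary choice):

    set.seed(2)                                        # arbitrary seed, for reproducibility
    nu <- 10
    y  <- c(1730, 947, 1830, 1210, 1100)               # article lengths
    x_sim <- rpois(length(y), lambda = nu * y / 1000)  # one simulated count per article
    x_sim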

question 2.9. (10 points) write the log-likelihood ℓ(ν) for the specific sample of occurrences you generated in the previous question and their corresponding 5 article lengths (1730, 947, 1830, 1210, 1100).

question 2.10. (10 points) Plot the log-likelihood from the previous question in R. Does the maximum occur near ν = 10?
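Continuing the sketch from question 2.8, the log-likelihood could be evaluated on a grid of candidate ν values and plotted like this (the grid endpoints are arbitrary choices):

    nu_grid <- seq(1, 25, by = 0.05)
    loglik <- sapply(nu_grid,
                     function(v) sum(dpois(x_sim, lambda = v * y / 1000, log = TRUE)))
    plot(nu_grid, loglik, type = "l", xlab = "nu", ylab = "log-likelihood")
    abline(v = 10, lty = 2)                       # reference line at the true value nu = 10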

question 2.11. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

OK, that was a more reasonable model. but FZ writes about different topics. our model is not capturing that. is FZ more prone to offering his own opinions when he writes about politics than when he writes about other topics? let’s investigate.

problem 3. set-up: you collect a random sample of articles penned by FZ and count how many times he uses the word I in each of the articles in your sample, (x1, ..., xn). In this set-up, xi is the number of times the word I appeared in the i-th article.

question 3.1. (10 points) model: let Xi denote the quantity that captures the number of times the word I appears in the i-th article. let Zi indicate whether the i-th article is about politics, denoted by Zi = 1, or not, denoted by Zi = 0. let’s assume that the quantities X1, ..., Xn are independent of one another conditionally on the corresponding values of Z1, ..., Zn. let’s assume that the quantities Z1, ..., Zn are independent and identically distributed (IID) according to a Bernoulli distribution with parameter π,

p(Zi = zi | π) = Bernoulli(zi | π) for i = 1, ..., n.

let’s further assume that the number of occurrences of the word I in an article about politics follows a Poisson distribution with unknown parameter λPolitics,

p(Xi = xi | Zi = 1, λPolitics) = Poisson(xi | λPolitics) for i = 1, ..., n,

and that the number of occurrences of the word I in an article about any other topic follows a Binomial distribution with size 1000 and unknown parameter θOther,

p(Xi = xi | Zi = 0, 1000, θOther) = Binomial(xi | 1000, θOther) for i = 1, ..., n.


using the 2-by-2 table of what’s variable/constant versus what’s observed/unknown, declare what’s the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: X1, ..., Xn, x1, ..., xn, Z1, ..., Zn, z1, ..., zn, π, λPolitics, θOther and n.

question 3.2. (10 points) write the data generating process for the model above.

question 3.3. (10 points) simulate 1000 values of Xi in R from the data generating process assuming π = 0.3, λPolitics = 30 and θOther = 0.02. Plot the values of Xi | Zi = 1 and Xi | Zi = 0 as two histograms on the same plot. Color the histograms by the value of Zi so the two populations can be distinguished.
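One possible R sketch for this simulation (the seed, colors, and histogram breaks are arbitrary choices; the two components follow the model in question 3.1):

    set.seed(3)                                          # arbitrary seed
    n <- 1000
    z <- rbinom(n, size = 1, prob = 0.3)                 # article topic indicators
    x <- ifelse(z == 1,
                rpois(n, lambda = 30),                   # politics articles
                rbinom(n, size = 1000, prob = 0.02))     # other articles

    # overlay the two histograms, colored by the value of z
    breaks <- seq(-0.5, max(x) + 0.5, by = 1)
    hist(x[z == 0], breaks = breaks, col = rgb(0, 0, 1, 0.5),
         xlab = "occurrences of 'I'", main = "Simulated counts by topic")
    hist(x[z == 1], breaks = breaks, col = rgb(1, 0, 0, 0.5), add = TRUE)
    legend("topright", legend = c("Z = 0 (other)", "Z = 1 (politics)"),
           fill = c(rgb(0, 0, 1, 0.5), rgb(1, 0, 0, 0.5)))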

question 3.4. (10 points) write the likelihood for 1 article, Li(λPolitics, θOther) = p(Xi = xi | λPolitics, θOther).

question 3.5. (10 points) write the likelihood L(λPolitics, θOther) for a generic sample of n articles, (x1, ..., xn).

question 3.6. (10 points) write the log-likelihood ℓ(λPolitics, θOther) for a generic sample of n articles, (x1, ..., xn).

question 3.7. (10 points) write the log-likelihood ℓ(λPolitics, θOther) for the following specific sample of 8 articles (12, 4, 8, 3, 3, 10, 1, 9).

question 3.8. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

Extra credit wait, but is it reasonable to assume that the rate λ is an unknown constant in all of our models? it seems like a stretch. (10 points; if you agree)


