ECMT1020 Introduction to Econometrics Week 2, 2023S1

Lecture 2: Distributions, Samples, and Estimators

Instructor: Ye Lu

Please read Chapters R.5–R.8 of the textbook.

Contents

1 Four Important Probability Distributions
  1.1 Normal distribution
  1.2 t distribution
  1.3 Chi-squared distribution
  1.4 F distribution

2 Samples and Estimators
  2.1 Sampling and double structure of a sampled random variable
  2.2 Estimators
  2.3 Bias and Variance
  2.4 Loss functions and mean squared error

3 Exercises

1 Four Important Probability Distributions

In the previous lecture, we discussed discrete and continuous random variables and their probability distributions. Before today's review of sampling, estimators, and hypothesis testing, we first review/introduce four probability distributions, all continuous, that turn out to be important in the statistical inference we will use. They are the normal distribution, the t distribution, the chi-squared (χ²) distribution, and the F distribution.

1.1 Normal distribution

The normal distribution is the most commonly used distribution in econometrics. The probability density function (pdf) of the normal distribution is symmetric and beautifully bell-shaped. It is fully determined by the mean/expectation μ ∈ ℝ and variance σ² > 0 of the distribution, and has the form

f(x|μ, σ²) = 1/(σ√(2π)) · exp(−(x − μ)²/(2σ²)).   (1)

The structure of the normal distribution is shown in Figure 1. Think about the following questions:

• What are the mean, median, and mode of the normal distribution?

• How will the shape of the normal pdf change when σ² becomes larger or smaller?


Figure 1: Structure of the normal distribution (Figure R.12 in the textbook)

When μ = 0 and σ = 1, the normal distribution is called the standard normal distribution, and the pdf of the standard normal distribution is usually denoted φ(x):

φ(x) = f(x|0, 1) = 1/√(2π) · exp(−x²/2),   (2)

where f(x|μ, σ²) is the normal pdf defined in (1). Note that we have the relationship

f(x|μ, σ²) = (1/σ) · φ((x − μ)/σ).

Therefore, every normal pdf can be considered as derived from the standard normal pdf by the following three steps:

1. relocate the center of the standard normal pdf from 0 to μ;

2. stretch/scale the whole domain of the standard normal pdf by a factor of σ;

3. multiply the pdf by 1/σ. (This is to ensure the pdf integrates to 1.)

For this reason (steps 1 and 2 above specifically), μ is called the location parameter and σ is called the scale parameter of the normal distribution.

We call a random variable, say X, a 'normal random variable', or say it is 'normally distributed', if it follows a normal distribution. Given the location parameter μ and scale parameter σ, we write

X ~ N(μ, σ²).

It is clear that E(X) = μ and Var(X) = σ².
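As a quick numerical sketch of this location–scale relationship (assuming Python with the numpy and scipy packages, which the notes themselves do not use), one can check f(x|μ, σ²) = (1/σ)·φ((x − μ)/σ) on a grid of points:

```python
# A numerical check (assumed setup: numpy and scipy installed) of the
# location-scale relationship f(x | mu, sigma^2) = (1/sigma) * phi((x - mu)/sigma).
import numpy as np
from scipy.stats import norm

mu, sigma = 2.0, 3.0                      # illustrative parameter values
x = np.linspace(-10.0, 14.0, 9)           # a grid of evaluation points

lhs = norm.pdf(x, loc=mu, scale=sigma)    # f(x | mu, sigma^2)
rhs = norm.pdf((x - mu) / sigma) / sigma  # (1/sigma) * phi((x - mu)/sigma)

print(np.allclose(lhs, rhs))              # True: the two expressions agree
```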

1.2 t distribution

The t distribution, or Student's t distribution, arises in statistics when estimating the population mean of a normally distributed random variable in situations where the sample size is small and the population variance is unknown. The Student's t distribution was named after the English statistician William Sealy Gosset, who published under the pseudonym 'Student'.


The pdf of the t distribution is¹

f(x|ν) = Γ((ν + 1)/2) / (√(νπ) · Γ(ν/2)) · (1 + x²/ν)^(−(ν+1)/2),

where Γ(·) is the gamma function, and the parameter ν > 0 is called the 'degrees of freedom' of the t distribution.

The pdf of the t distribution is also symmetric and bell-shaped; however, it has 'fatter tails' than the normal distribution. Note there are two special cases of the t distribution:

• When ν = 1, the t distribution becomes the well-known Cauchy distribution, which does not have a well-defined expectation/mean.

• When ν → ∞, the t distribution becomes the standard normal distribution².

The notation for a random variable X following the t distribution with ν degrees of freedom is

X ~ t(ν) or X ~ tν.
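To see the 'fatter tails' and the two special cases numerically, here is a minimal sketch (same assumed Python/scipy setup as above):

```python
# Tail probabilities P(X > 3): the t distribution puts more mass in the tails
# than N(0, 1), and approaches the standard normal as nu grows.
from scipy.stats import t, norm, cauchy

for nu in [1, 5, 30, 1000]:
    print(f"t({nu}):  P(X > 3) = {t.sf(3, df=nu):.5f}")
print(f"N(0,1): P(X > 3) = {norm.sf(3):.5f}")

# Special case nu = 1: the t distribution coincides with the Cauchy distribution.
print(t.sf(3, df=1), cauchy.sf(3))  # essentially identical values
```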

1.3 Chi-squared distribution

The chi-squared (or sometimes chi-square or χ²) distribution with k 'degrees of freedom', denoted as χ²(k) or χ²k, is the distribution of a sum of the squares of k independent standard normal random variables. Here the parameter k is a positive integer.

In other words, if Z₁, . . . , Zₖ are independent standard normal random variables, then the random variable

X := ∑ᵢ₌₁ᵏ Zᵢ² = Z₁² + · · · + Zₖ²

follows the chi-squared distribution with k degrees of freedom, and the notation is

X ~ χ²(k) or X ~ χ²k.

Clearly, a chi-squared random variable can only take values in [0, ∞). The chi-squared distribution is commonly used to obtain critical values for 'asymptotic' tests³.

¹You don't need to remember this, but I have it here for completeness.

²This is because lim_{ν→∞} f(x|ν) = φ(x), where φ(x) is the pdf of the standard normal distribution defined in (2). If you are curious why this is the case, first note the following two results on limits of functions:

• As one way of defining the exponential function, we have lim_{ν→∞} (1 + x/ν)^ν = eˣ for any x.

• By using Stirling's approximation of the gamma function, we have lim_{ν→∞} Γ((ν + 1)/2) / (√ν · Γ(ν/2)) = 1/√2.

Then we have, as ν → ∞,

f(x|ν) = Γ((ν + 1)/2)/(√(νπ) · Γ(ν/2)) · (1 + x²/ν)^(−(ν+1)/2) → (1/√2) · (1/√π) · e^(−x²/2) = φ(x).

³The 'asymptotic' tests are hypothesis tests conducted when the sample size is large enough to be approximated as infinite.
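The defining sum can be checked by a small Monte Carlo sketch (again assuming numpy/scipy): summing k squared independent standard normals should reproduce the χ²(k) mean k and variance 2k:

```python
# Simulate X = Z_1^2 + ... + Z_k^2 with Z_i i.i.d. N(0, 1) and compare the
# simulated moments with the theoretical chi-squared(k) values.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
k, n_draws = 4, 200_000

Z = rng.standard_normal((n_draws, k))
X = (Z**2).sum(axis=1)            # each row yields one draw of X ~ chi2(k)

print(X.mean(), chi2.mean(k))     # both close to k = 4
print(X.var(), chi2.var(k))       # both close to 2k = 8
```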


1.4 F distribution

The F distribution with two parameters, ν₁ and ν₂, each indicating 'degrees of freedom', is the distribution of a random variable X defined as

X := (Q₁/ν₁) / (Q₂/ν₂),

where

• Q₁ ~ χ²(ν₁) and Q₂ ~ χ²(ν₂);

• Q₁ and Q₂ are independent.

The notation is

X ~ F(ν₁, ν₂).

The F distribution was tabulated in a 1934 paper by Snedecor, who introduced the notation F because the distribution is related to Sir Ronald Fisher's work on the analysis of variance.
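A rough simulation sketch of this ratio construction (assuming numpy/scipy):

```python
# Build F(nu1, nu2) draws from two independent chi-squared variables and
# compare an empirical tail probability with scipy's F distribution.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
nu1, nu2, n_draws = 3, 12, 200_000

Q1 = rng.chisquare(nu1, size=n_draws)
Q2 = rng.chisquare(nu2, size=n_draws)
X = (Q1 / nu1) / (Q2 / nu2)                   # X ~ F(nu1, nu2) by construction

print((X > 2.5).mean(), f.sf(2.5, nu1, nu2))  # the two numbers should be close
```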

2 Samples and Estimators

The unifying methodology of modern econometrics was articulated by Trygve Haavelmo in his seminal paper "The probability approach in econometrics" (1944, [link]). In this paper, Haavelmo argued that quantitative economic models must necessarily be 'probability models' instead of deterministic models, because the latter are blatantly inconsistent with observed economic quantities. Once we acknowledge that

1. an economic model is a probability model, and

2. observational⁴ economic data are 'realizations' of some random variables whose population distributions are not fully known,

it follows naturally that an appropriate way to quantify, estimate, and conduct inference about economic phenomena is through the powerful theory of mathematical statistics.

2.1 Sampling and double structure of a sampled random variable

In a certain application, to infer some population characteristics of a random variable, or to infer the relationship among a set of random variables, an econometrician uses a set of repeated measurements on these variables. For example, in a labor application the variables could include weekly earnings, years of education, age, gender, among others. We call these measurements the data, dataset, or sample. We use the term observations to refer to distinct repeated measures on the variables.

• An individual observation may correspond to a specific economic unit, such as the income of a person, household, firm, city, country, etc. → cross-sectional observations

• An individual observation may also correspond to a measurement at a point in time, such as quarterly GDP or a daily stock price. → time-series observations

⁴Most economic data are 'observational' rather than 'experimental', which is more common in natural science. This is because conducting experiments in social and economic studies is oftentimes condemned as immoral or simply impossible. The constraint of having only observational data makes the inference of 'causality' particularly hard in econometrics.

Now let's formulate things mathematically. Let X be the random variable we are interested in, and suppose we want to take a sample of n observations to infer, say, the population mean of X. A subtle but important point here is how we understand these n observations in our sample.

• Before (pre) the sample is generated, the n potential observations of X are considered as a set of n random variables which follow the same distribution as X. Following our convention of using upper-case Roman letters to denote random variables, we denote the n observations of X in a sample as

{X₁, X₂, . . . , Xₙ}.

In particular, we call {Xᵢ : i = 1, . . . , n} a random sample if the observations are (1) mutually independent and (2) identically distributed (i.i.d.) across i = 1, . . . , n. In the following, unless mentioned otherwise, the samples we discuss are random samples.

• After (post) the sample is generated, the observations of X become n specific numbers. We denote these numbers as

{x₁, x₂, . . . , xₙ},

using lower-case letters. A statistician would refer to {x₁, . . . , xₙ} as a realization of the sample {X₁, . . . , Xₙ}.

Understanding this 'double structure' of a sampled random variable, before and after the sample is generated, is crucial for understanding the (pre-sample) analysis of the properties of estimators and the procedure of hypothesis testing.
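As a toy illustration of the pre/post distinction (a sketch assuming numpy; the N(5, 2²) distribution is chosen purely for illustration), the pre-sample object is 'n i.i.d. draws from the distribution of X', while each run of the code below produces one realization {x₁, . . . , xₙ}:

```python
# Pre-sample: {X_1, ..., X_n} are i.i.d. draws from the distribution of X,
# here taken to be N(5, 2^2) purely for illustration.
# Post-sample: each run of the generator yields one realization {x_1, ..., x_n}.
import numpy as np

rng = np.random.default_rng(42)
n = 10

x = rng.normal(loc=5.0, scale=2.0, size=n)  # one realization of the sample
print(x)         # n specific numbers: the lower-case x's in the notation above
print(x.mean())  # the realized value of the sample mean, i.e. an 'estimate'
```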

2.2 Estimators

An estimator can, in general, be considered as a function of the sample. It takes all the observations in the sample, X₁, . . . , Xₙ, as inputs, and produces an output quantity (based on a particular rule) to estimate a certain population characteristic of the random variable X. For example, suppose we have a random sample {X₁, . . . , Xₙ} of X and a random sample {Y₁, . . . , Yₙ} of Y:

• If we want to estimate the (population) mean μX of X, then we may consider the 'sample mean'

X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ.

• If we want to estimate the (population) variance σ²X of X, then we may consider the 'sample variance'

σ̂²X = (1/(n − 1)) ∑ᵢ₌₁ⁿ (Xᵢ − X̄)².

• If we want to estimate the (population) covariance σXY of X and Y, then we may consider the 'sample covariance'

σ̂XY = (1/(n − 1)) ∑ᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ).

Question: why do we divide by n − 1 rather than n in the formulas of σ̂²X and σ̂XY? (Read textbook R.7, pages 33–34.)

• If we want to estimate the (population) correlation coefficient ρXY of X and Y, then we may consider the 'sample correlation coefficient'

ρ̂XY = σ̂XY / (σ̂X σ̂Y).
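A minimal sketch (assuming numpy) of computing these four estimates from one realized sample; note that ddof=1 selects the n − 1 divisor discussed in the question above:

```python
# Compute the sample mean, variance, covariance, and correlation coefficient
# from realized samples {x_i} and {y_i}; ddof=1 selects the (n - 1) divisor.
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.normal(2.0, 1.0, size=n)
y = 0.5 * x + rng.normal(0.0, 1.0, size=n)  # y is correlated with x by design

mean_x = x.mean()                           # sample mean
var_x = x.var(ddof=1)                       # sample variance (divisor n - 1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]         # sample covariance
corr_xy = cov_xy / np.sqrt(var_x * y.var(ddof=1))

print(mean_x, var_x, cov_xy, corr_xy)
print(np.corrcoef(x, y)[0, 1])              # matches corr_xy
```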

Note that all the quantities above, namely X̄, σ̂²X, σ̂XY, and ρ̂XY, are estimators. Two questions:

1. Can we talk about the probability distributions of these estimators? Why?

2. If yes, then what are the mean and variance of X̄, for example? How do the mean and variance of X̄ depend on the sample size n?

The key here is, again, the distinction between the potential distribution of the estimator before the sample is generated and the actual realization after the sample is generated.

• Before the sample is generated, an estimator is a function of the observations in the sample(s), which are considered as random variables. Therefore, an estimator is also a random variable in the pre-sample analysis.

• After the sample is generated, the random variables (Xᵢ or Yᵢ) in the formula of the estimator can be replaced by their actual realizations (xᵢ or yᵢ). We refer to the realized value of the estimator as the estimate, which is just a specific number.

See Figure 2 for an illustration of this 'double structure' of an estimator, inherited from the double structure of a sampled random variable.

Having fixed the idea that an estimator is a random variable which follows a certain probability distribution in the pre-sample analysis, we can now talk in general about the mean/expectation and the variance of an estimator. The study of these two characteristics of the distribution of an estimator leads us to the analysis of two important properties of an estimator, namely unbiasedness and efficiency.

2.3 Bias and Variance

Let's adopt some generic notation. Let Z = Z(X₁, . . . , Xₙ) be an estimator of the value of a population characteristic (say the mean, variance, etc.), denoted as θ. We say Z is an unbiased estimator if

E(Z) = E[Z(X₁, . . . , Xₙ)] = θ.   (3)

If equation (3) does not hold, then we say Z is a biased estimator, and the bias is E(Z) − θ.


Figure 2: Sample and estimator (Table R.5 in the textbook)

• If the bias is negative, or E(Z) < θ, then there is an under-estimation bias.

• If the bias is positive, or E(Z) > θ, then there is an over-estimation bias.

Take the sample mean as an example. Suppose X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ is used to estimate the unknown population mean μX of X. We say X̄ is unbiased if E(X̄) = μX.

Questions:

1. Can you show that X̄ is unbiased for μX? What if the observations X₁, . . . , Xₙ are not mutually independent? Does it matter?

2. Is X̄ the only unbiased estimator for μX?
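Related to question 1 above, here is a Monte Carlo sketch (assuming numpy) of what unbiasedness means in the pre-sample sense: across many repeated samples, the realized sample means average out to μX:

```python
# Monte Carlo illustration of unbiasedness: across many repeated samples,
# the realized sample means average out to the population mean mu_X.
import numpy as np

rng = np.random.default_rng(3)
mu_X, sigma_X, n, n_reps = 10.0, 4.0, 25, 100_000

samples = rng.normal(mu_X, sigma_X, size=(n_reps, n))
sample_means = samples.mean(axis=1)  # one realized estimate per simulated sample

print(sample_means.mean())           # close to mu_X = 10 (unbiasedness)
print(sample_means.var())            # close to sigma_X^2 / n = 16/25 = 0.64
```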

If there is more than one unbiased estimator, how do we compare them? See Figure 3. The idea of the efficiency comparison⁵ is that we prefer the estimator to have as high a probability as possible of giving an estimate close to the population characteristic → a pdf as concentrated as possible around the true value. Another way to put it is that we want the variance of the estimator to be as small as possible.

⁵Note that efficiency is a comparative concept, and you should use the term only when comparing different estimators rather than summarizing changes in the variance of a single estimator.


Figure 3: Two unbiased estimators and efficiency comparison (Figure R.8 in the textbook)

Mathematically, suppose we have two unbiased⁶ estimators Z₁ = Z₁(X₁, . . . , Xₙ) and Z₂ = Z₂(X₁, . . . , Xₙ) of the population characteristic θ; we say Z₁ is more efficient than Z₂ if

Var(Z₁) < Var(Z₂),

and vice versa. Note that in the definition we require Z₁ and Z₂ to use the same amount of information, X₁, . . . , Xₙ, as observations on the random variable X. This ensures a fair comparison.

In the textbook, (R.45)–(R.49) show that the sample mean is the most efficient estimator of the population mean among all estimators of the weighted-average kind, with a simple illustration where the sample size is n = 2.
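A short sketch (assuming numpy) of this n = 2 comparison, which also previews Exercise R.15 below: for the unbiased weighted average Z = λ₁X₁ + λ₂X₂ with σ²X = 1, Var(Z) = λ₁² + λ₂² is smallest at equal weights, i.e. at the sample mean:

```python
# Variance of the unbiased weighted average Z = l1*X1 + l2*X2 with l1 + l2 = 1
# and sigma_X^2 = 1: Var(Z) = l1^2 + l2^2 = 2*l1^2 - 2*l1 + 1, minimized at
# l1 = l2 = 0.5, i.e. at the sample mean.
import numpy as np

l1 = np.linspace(0.0, 1.0, 11)       # lambda_1 from 0 to 1 at steps of 0.1
var_Z = l1**2 + (1.0 - l1) ** 2      # = 2*l1^2 - 2*l1 + 1

for w, v in zip(l1, var_Z):
    print(f"lambda1 = {w:.1f}  Var(Z) = {v:.2f}")
# The minimum Var(Z) = 0.50 occurs at lambda1 = 0.5 (equal weights).
```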

2.4 Loss functions and mean squared error

Clearly, both unbiasedness and minimum variance are desirable properties of an estimator. But sometimes there can be conflicts between these two properties when we choose among estimators. See Figure 4.

There is no sure answer to the question of which estimator to choose. It all depends on the circumstances and what criterion one would like to use. In statistical decision theory, a 'loss function', denoted ℓ(Z, θ), is often introduced to quantify the cost of using an estimator, say Z, to estimate a target parameter θ. The loss function can be very general as long as it satisfies:

• ℓ(Z, θ) ≥ 0 for any Z;

• ℓ(Z, θ) = 0 if Z = θ.

Given the loss function, the 'optimal' estimator is the one which minimizes the expected loss E[ℓ(Z, θ)].

To name a few examples of loss functions:

⁶In the textbook, the comparisons of efficiency are mostly confined to unbiased estimators.


Figure 4: Which estimator to choose? (Figure R.9 in the textbook)

• quadratic/squared loss: ℓ(Z, θ) = (Z − θ)²

• linear/absolute loss: ℓ(Z, θ) = |Z − θ|

• Huber loss: quadratic for small values of |Z − θ| and linear for large values of |Z − θ|

In particular, the quadratic loss function is the most commonly used. The expected loss under the quadratic loss function is known as the mean squared error (MSE):

MSE of estimator Z = E[(Z − θ)²].

So the estimator that minimizes the expected loss under quadratic loss is the estimator with the smallest mean squared error.

Next we show a useful decomposition of the MSE:

MSE of an estimator = Variance of the estimator + (Bias of the estimator)².   (4)

In mathematical form, let Z be the estimator for θ, and let μZ and σ²Z denote the mean and variance of Z, respectively. We decompose the MSE of Z as follows:

MSE(Z) = E[(Z − θ)²]
       = E[(Z − μZ + μZ − θ)²]
       = E[(Z − μZ)² + (μZ − θ)² + 2(Z − μZ)(μZ − θ)]
       = E[(Z − μZ)²] + (μZ − θ)² + 2(μZ − θ) · E(Z − μZ)    (μZ − θ is a constant)
       = σ²Z + (E(Z) − θ)² + 0                               (since E(Z − μZ) = E(Z) − μZ = 0)
       = Var(Z) + Bias²(Z).

Because of this decomposition, the MSE is sometimes used to generalize the concept of efficiency to cover comparisons of biased as well as unbiased estimators.

Example: the MSE of the sample variance as an estimator of the population variance, and the idea of shrinkage in statistics.
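A Monte Carlo sketch of the decomposition (4) and of this shrinkage example (assuming numpy): compare the unbiased sample variance (divisor n − 1) with the divisor-n version; for each, variance plus squared bias reproduces the MSE, and for normal data the biased estimator attains the smaller MSE:

```python
# Monte Carlo check of MSE = Variance + Bias^2 for two variance estimators:
# ddof=1 (unbiased, divisor n - 1) and ddof=0 (biased, divisor n).
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, n_reps = 4.0, 10, 200_000       # true population variance is 4

samples = rng.normal(0.0, np.sqrt(sigma2), size=(n_reps, n))
for ddof in (1, 0):
    est = samples.var(axis=1, ddof=ddof)   # realized estimates across samples
    bias = est.mean() - sigma2
    var = est.var()
    mse = ((est - sigma2) ** 2).mean()
    print(f"ddof={ddof}: bias={bias:+.3f}  var+bias^2={var + bias**2:.3f}  mse={mse:.3f}")
# For normal data the divisor-n estimator attains the smaller MSE despite its bias.
```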

3 Exercises

The questions below are taken from Exercises R.15, R.19–23, and R.30–33 in the textbook. Note that R.23 and R.30–33 involve hypothesis testing, which you are supposed to have learned in a prerequisite course. You should treat these exercises as a review, together with reading the corresponding textbook sections (Chapters R.9–R.13).

R.15 For the special case σ²X = 1 and a sample of two observations, X₁ and X₂, calculate the variance of the generalized estimator Z = λ₁X₁ + λ₂X₂, with λ₁ + λ₂ = 1, of the population mean. Using the fact that

λ₁² + λ₂² = λ₁² + (1 − λ₁)² = 2λ₁² − 2λ₁ + 1,

obtain the variance of Z for values of λ₁ from 0 to 1 at steps of 0.1, and plot it in a diagram. Is it important that the weights λ₁ and λ₂ should be exactly equal?

R.19* In general, the variance of the distribution of an estimator decreases when the sample size is increased. Is it correct to describe the estimator as becoming more efficient?

R.20 If you have two estimators of an unknown population parameter, is the one with the smaller variance necessarily more efficient?

R.21* Suppose that you have observations on three variables X, Y, and Z, and suppose that Y is an exact linear function of Z:

Y = λ + μZ,

where λ and μ are positive constants. Show that ρ̂XZ = ρ̂XY. (This is the counterpart of Exercise R.14.)

R.22 A scalar multiple of a normally distributed random variable also has a normal distribution. A random variable X has a normal distribution with mean 5 and variance 10. Sketch the distribution of Z = X/2.

R.23 Suppose that a random variable with hypothetical mean 10 may be assumed to have a normal distribution with variance 25. Given a sample of 100 observations, derive the acceptance and rejection regions for X̄, (a) using a 5 percent significance test, (b) using a 1 percent test.

R.30 A drug company asserts that its course of treatment will, on average, reduce a person's cholesterol level by 0.8 mmol/L. A researcher undertakes a trial with a sample of 30 individuals with the objective of evaluating the claim of the drug company. What should he report if he obtains the following results:

(a) a mean decrease of 0.6 units, with standard error 0.2 units;

(b) a mean decrease of 0.4 units, with standard error 0.2 units;

(c) a mean increase of 0.4 units, with standard error 0.2 units?


R.31 When a local sales tax was abolished, a survey of 20 households showed that mean household expenditure increased by $160 and the standard error of the increase was $60. What is the 99 percent confidence interval for the effect of abolishing the sales tax?

R.32 Determine the 95 percent confidence interval for the effect of an increase in the minimum wage on employment, given the data in Exercise R.29, for each part of the exercise. How do these confidence intervals relate to the results of the t tests in that exercise?

R.33* Demonstrate that the 95 percent confidence interval defined by the equation

X̄ − tcrit,2.5% × s.e.(X̄) ≤ μ ≤ X̄ + tcrit,2.5% × s.e.(X̄)

has a 95 percent probability of capturing μ₀ if H₀ : μ = μ₀ is true.

