Date: 2019-11-22 10:40

Econ 325 (004)

Winter Session, Term 1, 2019

M. Vaney

Lab 2 - Demonstration of the Central Limit Theorem

Due: Monday November 25. Submit your work online.

Purpose

In this lab R is used to demonstrate the Central Limit Theorem, a theorem that provides a

theoretical basis for estimation and inference even for underlying populations that are not

normally distributed. The lab reinforces the use of script files as an efficient way to execute a

series of commands and the use of loops to automate repetitive tasks. The lab also introduces

a few additional R commands.

Central Limit Theorem

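Given a random sample of size $n$ from an underlying distribution $f(x)$ with

$-\infty < E\{X\} < \infty$ (finite mean) and $0 < \sigma^2 < \infty$ (finite variance), the sample mean

$\bar{X}_n$ will be distributed as approximately normal with $\bar{X}_n \sim N(\mu, \sigma^2/n)$, where $\mu = E\{X\}$.

This can also be expressed as $\lim_{n \to \infty} \sqrt{n}\,(\bar{X}_n - \mu)/\sigma \sim N(0, 1)$.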
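
The statement above implies that the sample mean should have a mean close to the population mean and a variance close to the population variance divided by n. The following is a minimal sketch of a numerical check, assuming an Exponential(rate = 1) population (chosen only for illustration, it is not one of the lab's variables) and arbitrary values for the sample size and number of replications.

```r
# Minimal sketch: compare simulated sample means with the CLT prediction.
# Assumption: an Exponential(rate = 1) population, so mu = 1 and sigma^2 = 1.
set.seed(325)

n      <- 25      # sample size (illustrative)
reps   <- 10000   # number of simulated samples (illustrative)
mu     <- 1       # population mean of Exponential(1)
sigma2 <- 1       # population variance of Exponential(1)

# Each replication draws n observations and returns their mean.
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# The CLT predicts mean(xbar) near mu and var(xbar) near sigma^2 / n.
clt_check <- c(mean_xbar      = mean(xbar),
               predicted_mean = mu,
               var_xbar       = var(xbar),
               predicted_var  = sigma2 / n)
print(round(clt_check, 4))
```

The simulated and predicted values should agree closely but not exactly, since the simulated values are themselves estimates.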

One implication of this for estimation is that even if the underlying distribution is not normally

distributed, by appealing to the Central Limit Theorem we may treat the sample mean,

$\bar{X}_n$, as an approximately normally distributed random variable. The following figure shows

the underlying distribution of a random variable X as a solid line. Clearly X is not normally

distributed. The random variable X has realizations only over the interval [0, 3] rather than

$(-\infty, \infty)$; X is not symmetric, and X is not uni-modal. However, taking random samples of

size n and computing the sample mean for each different random sample, we see that the

distribution of the sample mean (red dashed line) has many of the features characteristic of

a normally distributed random variable (uni-modal, symmetric, bell-shaped).

How closely the sample mean conforms to a normal distribution will depend on features of

the underlying distribution and the sample size. The larger the sample size the more closely

the distribution of the sample mean will resemble a normally distributed random variable.
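
One way to see the effect of sample size is to track a single shape measure, such as skewness, as n grows. The sketch below assumes a right-skewed Exponential(rate = 1) population and uses a hand-written moment-based `skewness()` helper; both are illustrative choices, not part of the lab materials. A normal distribution has skewness 0, so values moving toward 0 suggest a more symmetric distribution of the sample mean.

```r
# Minimal sketch: skewness of the sample mean for increasing sample sizes.
# Assumption: an Exponential(rate = 1) population, which is right-skewed.
set.seed(325)

# Moment-based skewness (hypothetical helper written for this example).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(4, 25, 144)   # the sample sizes used in this lab
reps  <- 5000            # number of simulated samples per size (illustrative)

skew_by_n <- sapply(sizes, function(n) {
  xbar <- replicate(reps, mean(rexp(n, rate = 1)))
  skewness(xbar)
})
names(skew_by_n) <- paste0("n=", sizes)

# Skewness should shrink toward 0 as n increases.
print(round(skew_by_n, 3))
```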

Data and Methodology

A number of 'populations' are provided. In order to demonstrate the CLT it will be necessary

to describe the distribution of the sample mean for each of the populations.

Data

The file lab2-variables.csv contains N = 700 observations for each of 5 random variables

(called x1, ..., x5). Each of these can be thought of as a different population with a given

underlying distribution f(x1), g(x2), ..., k(x5).

Methods

Use R to carry out the following tasks:

1. (a) Generate summary statistics and create histograms for each of the 5 variables.

(b) Draw 1000 random samples of size n = 4, 25 and 144 for each of the random

variables (without replacement). Compute the sample mean for each random

sample and construct a histogram of the sample means.
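
Task (a) can be sketched with base R as follows. Because the contents of lab2-variables.csv are not reproduced here, the data frame `pop` below is a simulated stand-in whose distributions are invented for illustration; with the real file one would instead read it with `read.csv("lab2-variables.csv")`, assuming its columns are named x1 to x5.

```r
# Minimal sketch: summary statistics for five population variables.
# The data below are a hypothetical stand-in for lab2-variables.csv;
# the shapes chosen here are NOT the lab's actual distributions.
set.seed(325)
N <- 700
pop <- data.frame(
  x1 = rnorm(N, mean = 10, sd = 2),                    # symmetric
  x2 = rexp(N, rate = 0.5),                            # right-skewed
  x3 = runif(N, min = 0, max = 3),                     # flat
  x4 = rchisq(N, df = 3),                              # right-skewed
  x5 = c(rnorm(N / 2, 2, 0.5), rnorm(N / 2, 6, 0.5))   # two modes
)

# Minimum, quartiles, median, mean and maximum for every variable.
print(summary(pop))

# Gap between mean and median: a rough indicator of skewness.
mean_minus_median <- sapply(pop, function(x) mean(x) - median(x))
print(round(mean_minus_median, 3))
```

A mean well above the median points to right skew; a gap near zero is consistent with (but does not prove) symmetry.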

R commands

This lab will make use of some commands that are found in two additional packages available

in R: dplyr and ggplot2. Both of these packages must be loaded in R. You can check to

see which packages are loaded by selecting the packages tab in the lower right corner of the

screen. If a package has not been installed, the following command can be entered in the

console:

install.packages("ggplot2")

The ggplot2 package will then be installed (this may take a minute or two).

In order to make use of the additional commands available in a package, your script file

must load the package through a library() command. It is best to start the script with

specification of the required packages:

library(ggplot2)

library(dplyr)

The dplyr package has a number of commands that are useful for re-organizing data. The

command that we will use in this lab is sample_n(data, sample size).

The ggplot2 package is used for making various graphs and figures. A very useful resource

for creating histograms in ggplot2 can be found at the link provided in the Lab folder on

Canvas.
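
As a rough illustration of a ggplot2 histogram, the sketch below builds a plot object for one variable. The data frame `pop`, the column name `x2`, the number of bins and the colours are all assumptions made for this example; the lab's own variables come from lab2-variables.csv.

```r
# Minimal sketch: a histogram of one population variable with ggplot2.
library(ggplot2)

set.seed(325)
# Hypothetical stand-in for one column of lab2-variables.csv.
pop <- data.frame(x2 = rexp(700, rate = 0.5))

p <- ggplot(pop, aes(x = x2)) +
  geom_histogram(bins = 30, fill = "grey70", colour = "black") +
  labs(title = "Population distribution of x2", x = "x2", y = "Count")

# The plot is drawn when the object is printed, e.g. with print(p);
# ggsave("x2-hist.png", p) would write it to a file instead.
```

Changing `bins` can noticeably alter how a histogram looks, so it may be worth trying a few values before describing the shape of a distribution.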

The sample_n() command will draw a single random sample (of rows of a dataset) of a

specific size, n. To generate 1000 random samples, the sample_n() command, along with a

command to take the mean, can be embedded in the command replicate(), which will repeat

these commands a specified number of times.
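
The combination of sample_n(), mean() and replicate() described above might look like the following sketch. The data frame `pop` and its column `x1` are simulated stand-ins, and n = 25 is just one of the three sample sizes; this is one possible arrangement, not necessarily the intended solution.

```r
# Minimal sketch: 1000 sample means via replicate() and sample_n().
library(dplyr)

set.seed(325)
# Hypothetical stand-in for one column of lab2-variables.csv.
pop <- data.frame(x1 = rexp(700, rate = 0.5))

n    <- 25     # sample size
reps <- 1000   # number of random samples

# sample_n() draws n rows without replacement by default;
# mean() is then taken over column x1 of that draw.
xbar <- replicate(reps, mean(sample_n(pop, n)$x1))

print(length(xbar))
print(summary(xbar))
```

The vector `xbar` can then be placed in a data frame and passed to ggplot2 for a histogram. In recent versions of dplyr, sample_n() is superseded by slice_sample(), although sample_n() continues to work.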

Results and Discussion

Present and provide some discussion of the following:

Submit your script file for this lab. Do not submit raw data.

1. (a) Consider the summary statistics and graphics for the underlying populations. Do

the underlying distributions appear to be Normally distributed? Comment on

the apparent distributions of each of the variables (symmetric, skewed, number

of modes, difference between mean and median, etc.).

(b) Discuss how changing the size of the sample alters the distribution of the sample

mean for each of the different variables. Do the results conform with the prediction

of the Central Limit Theorem?
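
One way to approach question (b) numerically is to compare the standard deviation of the simulated sample means with the value sigma / sqrt(n) suggested by the CLT. The sketch below assumes a simulated stand-in population `x`. Because the samples are drawn without replacement from only N = 700 observations, it also reports the prediction multiplied by the standard finite population correction, sqrt((N - n) / (N - 1)), which is an addition not mentioned in the lab text.

```r
# Minimal sketch: spread of the sample mean versus the CLT prediction.
set.seed(325)
N <- 700
x <- rexp(N, rate = 0.5)               # hypothetical stand-in population
sigma <- sqrt(mean((x - mean(x))^2))   # population standard deviation

sizes <- c(4, 25, 144)
reps  <- 1000

compare <- t(sapply(sizes, function(n) {
  # sample() draws without replacement by default.
  xbar <- replicate(reps, mean(sample(x, n)))
  c(n       = n,
    sd_xbar = sd(xbar),                                   # simulated
    clt     = sigma / sqrt(n),                            # CLT prediction
    clt_fpc = sigma / sqrt(n) * sqrt((N - n) / (N - 1)))  # with correction
}))
print(round(compare, 4))
```

For n = 4 and n = 25 the two predictions are nearly identical; for n = 144 the sample is a sizeable share of the population, so the corrected value is noticeably smaller.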
