Date: 2020-11-06 11:30

PS3 Part 2

by Your Name (your SFU student ID number)

insert date of submission

Part 2 - R Exercises

You are required to use the free software R to complete these questions. Please "knit" the R Markdown file to a Word document and submit both the .Rmd file and the resulting .docx file to Canvas. Make sure your answers and code comments would let someone reading your submission understand exactly what is going on and replicate your answers. Replicability is a very important part of programming.

This assignment is written as a .Rmd file and posted on Canvas along with the knitted .docx document. You are welcome to download the .Rmd file and insert your answers below each question. Make sure to put your own name at the top of the document.

1. (3 pts) Replicate the simulation exercise by Ariel Muldoon discussed in this blog post, but modify it in the following ways:

– Create the data so that there are 50 observations in each group (so you’ll have to change nrep = 10 to nrep = 50).

– Run 10,000 simulations rather than 1,000.

– Only create the plot for the estimated coefficient (don’t worry about the plot for the estimate of the variance).

Make sure to insert your own comments throughout your code chunks so that you (and we) can see that you understand what each step does.
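A minimal base-R sketch of the Q1 simulation, offered as an illustration rather than the blog post's exact code. It assumes the blog's data-generating process uses an intercept b0 = 5, a group difference b1 = -2 and an error standard deviation sd = 2 (check these against the post). The helper name sim_twogroup() and the seed are hypothetical, and base R (lm(), density()) stands in for whatever packages the post itself uses.

```r
# Sketch of Q1: assumed true values b0 = 5, b1 = -2, sd = 2 (verify against the blog post)
set.seed(16)  # arbitrary seed so the results can be replicated

# Hypothetical helper: simulate one two-group dataset and fit OLS
sim_twogroup <- function(nrep = 50, b0 = 5, b1 = -2, sd = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)  # nrep observations per group
  eps    <- rnorm(n = 2 * nrep, mean = 0, sd = sd)   # i.i.d. normal errors
  growth <- b0 + b1 * (group == "group2") + eps      # true model
  lm(growth ~ group)                                 # OLS: growth on a group dummy
}

# Run 10,000 simulations, keeping only the estimated group difference
# ("groupgroup2" is the name lm() gives the group2 dummy coefficient)
nsims  <- 10000
b1_hat <- replicate(nsims, coef(sim_twogroup(nrep = 50))[["groupgroup2"]])

# Plot the sampling distribution of the estimates; dashed line = assumed true b1
plot(density(b1_hat), main = "Estimated group difference, nrep = 50",
     xlab = "Estimate of b1")
abline(v = -2, lty = 2)
```

Keeping only the coefficient of interest in each replication (rather than the whole fitted model) keeps memory use small when the number of simulations grows.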

2. (2 pts) Address the following questions based on your results in Q1:

– How does your simulation relate to the property of unbiasedness? Write down the definition of an unbiased estimator for this particular setting and describe how your simulation in Q1 helps (or doesn’t help) to demonstrate this property.

– What proportion of models correctly reject the null hypothesis that the true coefficient is equal to zero in your simulation? How does this number compare to the simulation from the blog post? Relate this comparison to the concept of consistency.
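For the Q2 calculations, a sketch under the same assumed parameters (b0 = 5, b1 = -2, sd = 2, 50 observations per group). It keeps both the estimate and its p-value from summary(lm()), so the average estimate and the share of rejections at the 5% level can be read off directly. The helper sim_once() is hypothetical.

```r
# Hypothetical helper: one simulated dataset -> estimate of b1 and its p-value
sim_once <- function(nrep = 50, b0 = 5, b1 = -2, sd = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sd)
  est    <- summary(lm(growth ~ group))$coefficients  # Estimate, Std. Error, t, p
  c(estimate = est["groupgroup2", "Estimate"],
    p_value  = est["groupgroup2", "Pr(>|t|)"])
}

set.seed(16)                             # arbitrary seed
res <- t(replicate(10000, sim_once()))   # 10,000 x 2 matrix, one row per simulation

# Average estimate across simulations, to compare with the assumed true b1 = -2
mean(res[, "estimate"])
# Share of simulations that reject H0: b1 = 0 at the 5% level
mean(res[, "p_value"] < 0.05)
```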

3. (2 pts) Return to the simulation exercise from Q1. Set the true value of the coefficient measuring the difference between group1 and group2 to zero (b1 = 0), and set the number of replications to 100 (instead of 10,000 in Q1 or 1,000 in the blog post). Create datasets containing the simulation results from 6 different scenarios in which you vary the number of observations in each group. Start with 10 observations per group (nrep = 10) and increase this as follows: 10, 100, 1000, 10000, 100000, 1000000. So in the final simulation, you should have 1 million observations in each group. In the clearest and most efficient way you can come up with (fewer figures is better if possible), show how the distribution of estimates changes across the six scenarios. Comment on how your results do or do not help to illustrate the concept of consistency of the OLS estimator.

For one of the questions, I ask you to run a simulation with 1 million observations in each group. This may cause memory (space) issues, so if you can't do this or it is taking too long, just skip it and stop at 100,000 observations in each group. You should still be able to answer all the questions.
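One way to organize Q3, sketched under assumptions: b1 = 0 as the question specifies, with an assumed intercept b0 = 5 and sd = 2 carried over from the blog setup. All scenarios are stacked into a single long data frame, so one figure (side-by-side boxplots) can show every scenario at once. The helper sim_b1() is hypothetical, and the scenario list stops at 10,000 here only to keep the example quick; append 1e5 and 1e6 if memory and time allow.

```r
# Hypothetical helper: one dataset with nrep obs per group; returns the estimate of b1
sim_b1 <- function(nrep, b0 = 5, b1 = 0, sd = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sd)
  coef(lm(growth ~ group))[["groupgroup2"]]
}

set.seed(16)  # arbitrary seed
nsims <- 100
# The assignment's full list is 10, 100, ..., 1e6; larger values omitted here for speed
nrep_values <- c(10, 100, 1000, 10000)

# Stack all scenarios into one long data frame (one row per simulation)
sims <- do.call(rbind, lapply(nrep_values, function(n) {
  data.frame(nrep = n, estimate = replicate(nsims, sim_b1(nrep = n)))
}))

# One figure: boxplots of the estimates for each scenario; dashed line = true b1
boxplot(estimate ~ nrep, data = sims, xlab = "Observations per group (nrep)",
        ylab = "Estimate of b1 (true value 0)")
abline(h = 0, lty = 2)
```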

4. (2 pts) From your results in Q3, calculate, for each of the six scenarios, the proportion of times you would have falsely rejected the true null hypothesis that the group difference (the coefficient) is equal to zero. Comment on how this proportion changes as the number of observations increases. Also comment on what this proportion should be equal to if we are using a 95% confidence interval. Comment on how your results relate to unbiasedness and/or consistency.
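A sketch of the Q4 calculation, under the same assumptions as the Q3 setup (b1 = 0, assumed b0 = 5 and sd = 2, 100 replications per scenario). Saving each p-value from summary(lm()) lets tapply() compute the false-rejection share per scenario in one line. The helper sim_pvalue() is hypothetical, and the scenario list is again truncated for speed.

```r
# Hypothetical helper: p-value of the t-test on b1 for one simulated dataset
sim_pvalue <- function(nrep, b0 = 5, b1 = 0, sd = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sd)
  summary(lm(growth ~ group))$coefficients["groupgroup2", "Pr(>|t|)"]
}

set.seed(16)                             # arbitrary seed
nrep_values <- c(10, 100, 1000, 10000)   # extend with 1e5 and 1e6 if feasible
sims <- do.call(rbind, lapply(nrep_values, function(n) {
  data.frame(nrep = n, p_value = replicate(100, sim_pvalue(nrep = n)))
}))

# Proportion of false rejections of the true H0: b1 = 0 (5% level), per scenario
reject_rate <- tapply(sims$p_value < 0.05, sims$nrep, mean)
print(reject_rate)
```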

5. (2 pts) Next, for the scenario with 100 observations per group from Q3, run simulations in which you increase the number of replications by a factor of 10 up to 1 million (100, 1000, 10000, 100000, 1000000), and show how the proportion of false rejections of the null hypothesis changes as the number of replications increases. Comment on how your results relate to unbiasedness and/or consistency.
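A sketch for Q5, assuming the Q3 setup with nrep = 100, b1 = 0, and an assumed b0 = 5 and sd = 2. Here the sample size is held fixed and only the number of replications changes. The helper sim_pvalue() is hypothetical, and 1e5 and 1e6 replications are left out only to keep run time short.

```r
# Hypothetical helper: p-value of the t-test on b1 (true b1 = 0; assumed b0 = 5, sd = 2)
sim_pvalue <- function(nrep = 100, b0 = 5, b1 = 0, sd = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sd)
  summary(lm(growth ~ group))$coefficients["groupgroup2", "Pr(>|t|)"]
}

set.seed(16)  # arbitrary seed
# The assignment goes up to 1e6 replications; 1e5 and 1e6 are omitted here for speed
nsims_values <- c(100, 1000, 10000)

# For each number of replications, the share of p-values below 0.05
reject_by_nsims <- sapply(nsims_values, function(k) {
  p <- replicate(k, sim_pvalue(nrep = 100))
  mean(p < 0.05)
})
names(reject_by_nsims) <- nsims_values
print(reject_by_nsims)

# Rejection rate vs. number of replications (log x-axis); dashed line = nominal 5% level
plot(nsims_values, reject_by_nsims, log = "x", type = "b",
     xlab = "Number of replications", ylab = "Proportion of false rejections")
abline(h = 0.05, lty = 2)
```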

6. (2 pts) For a version of your simulation from Q4 with 100 observations per group and 10,000 replications (this is the setup where the true b1 = 0), produce the following:

– The plot of the distribution of your estimated group difference (the coefficient).

– The proportion of times the true null hypothesis is falsely rejected.

Then modify the setup so that the standard deviation of the error (sd) is 2 for the first group and 4 for the second group, and produce the same output as above (plot of the distribution of the coefficient and the proportion of times the true null hypothesis is falsely rejected). How does the distribution of your coefficient compare? How does the proportion of false rejections compare? Explain whether the pattern makes sense given the modification of the error variance. Note: if you can overlay the distributions on the same graph, or find another way to assist the comparison, all the better!
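A sketch of the Q6 comparison, assuming b1 = 0, an assumed intercept b0 = 5, and 100 observations per group. Passing a vector of standard deviations to rnorm() gives each group its own error sd, and the two density curves are overlaid on one graph. The helper sim_once() and its sd1/sd2 arguments are hypothetical.

```r
# Hypothetical helper: error sd is sd1 in group1 and sd2 in group2 (assumed b0 = 5; true b1 = 0)
sim_once <- function(nrep = 100, b0 = 5, b1 = 0, sd1 = 2, sd2 = 2) {
  group  <- rep(c("group1", "group2"), each = nrep)
  sds    <- ifelse(group == "group1", sd1, sd2)  # per-observation error sd
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sds)
  est    <- summary(lm(growth ~ group))$coefficients
  c(estimate = est["groupgroup2", "Estimate"],
    p_value  = est["groupgroup2", "Pr(>|t|)"])
}

set.seed(16)  # arbitrary seed
homo   <- t(replicate(10000, sim_once(sd1 = 2, sd2 = 2)))  # same sd in both groups
hetero <- t(replicate(10000, sim_once(sd1 = 2, sd2 = 4)))  # sd = 2 vs. sd = 4

# Proportion of false rejections of the true H0: b1 = 0 at the 5% level
false_reject <- c(homoskedastic   = mean(homo[, "p_value"] < 0.05),
                  heteroskedastic = mean(hetero[, "p_value"] < 0.05))
print(false_reject)

# Overlay the two sampling distributions of the estimate on one graph
d_homo   <- density(homo[, "estimate"])
d_hetero <- density(hetero[, "estimate"])
plot(d_homo, ylim = c(0, max(d_homo$y, d_hetero$y)),
     main = "Estimated group difference, nrep = 100", xlab = "Estimate of b1")
lines(d_hetero, lty = 2)
legend("topright", legend = c("sd 2 / 2", "sd 2 / 4"), lty = c(1, 2))
```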

7. (2 pts) Reproduce the second part of Q6 using heteroskedasticity-robust standard errors instead of the default homoskedastic standard errors produced by lm(). To accomplish this, I recommend following example 1 in Grant McDermott’s blog up to the point where he substitutes the robust variance-covariance matrix ("VCOV" in his blog) into the original model. The modified results from coeftest() are what you want to save in each simulation, so you’ll have to modify your simulation function to capture these results instead of the results from lm(). Note that I would like you to use the HC1 version of the standard errors rather than the default HC3 from the vcovHC() function. You should be able to do this by passing vcov = vcovHC(m, type = "HC1") to coeftest() rather than vcov = vcovHC.

Comment on how the distribution of your estimate and the proportion of false rejections change in this scenario where you use heteroskedasticity-robust standard errors. To help inform your discussion, you could also plot the distribution of the estimated standard error of the coefficient and show how it varies when assuming homoskedasticity vs. allowing for heteroskedasticity.
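A package-free sketch of the Q7 idea, assuming the heteroskedastic setup (sd = 2 vs. 4, b1 = 0, assumed b0 = 5, 100 observations per group). It computes the HC1 covariance matrix by hand as n/(n - k) (X'X)^-1 X' diag(e^2) X (X'X)^-1. This should agree with sandwich::vcovHC(m, type = "HC1"), and the robust t-test mirrors what lmtest::coeftest(m, vcov = sandwich::vcovHC(m, type = "HC1")) reports, using the residual degrees of freedom. The helpers vcov_hc1() and sim_robust() are hypothetical; with the packages installed, the coeftest() route the assignment recommends is the simpler choice.

```r
# HC1 covariance matrix by hand: n/(n - k) * (X'X)^-1 X' diag(e^2) X (X'X)^-1
vcov_hc1 <- function(m) {
  X     <- model.matrix(m)
  e     <- residuals(m)
  n     <- nrow(X)
  k     <- ncol(X)
  bread <- solve(crossprod(X))   # (X'X)^-1
  meat  <- crossprod(X * e)      # X' diag(e^2) X (each row of X scaled by its residual)
  n / (n - k) * bread %*% meat %*% bread
}

# Hypothetical helper: heteroskedastic errors; returns estimate, both SEs, robust p-value
sim_robust <- function(nrep = 100, b0 = 5, b1 = 0, sd1 = 2, sd2 = 4) {
  group  <- rep(c("group1", "group2"), each = nrep)
  sds    <- ifelse(group == "group1", sd1, sd2)
  growth <- b0 + b1 * (group == "group2") + rnorm(2 * nrep, mean = 0, sd = sds)
  m      <- lm(growth ~ group)
  est    <- coef(m)[["groupgroup2"]]
  se_ols <- sqrt(vcov(m)["groupgroup2", "groupgroup2"])      # homoskedastic SE
  se_hc1 <- sqrt(vcov_hc1(m)["groupgroup2", "groupgroup2"])  # HC1 robust SE
  # Two-sided t-test with the robust SE and the model's residual degrees of freedom
  p_hc1  <- 2 * pt(abs(est / se_hc1), df = df.residual(m), lower.tail = FALSE)
  c(estimate = est, se_ols = se_ols, se_hc1 = se_hc1, p_hc1 = p_hc1)
}

set.seed(16)  # arbitrary seed
res <- t(replicate(10000, sim_robust()))

# Proportion of false rejections of the true H0: b1 = 0 using HC1 standard errors
print(mean(res[, "p_hc1"] < 0.05))

# Compare the distributions of the two standard-error estimates on one graph
d_ols <- density(res[, "se_ols"])
d_hc1 <- density(res[, "se_hc1"])
plot(d_ols, ylim = c(0, max(d_ols$y, d_hc1$y)),
     main = "Estimated SE of b1", xlab = "Standard error")
lines(d_hc1, lty = 2)
legend("topright", legend = c("homoskedastic", "HC1"), lty = c(1, 2))
```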

Good luck!

