Date: 2019-07-28 11:02

Department of Economics – University of Victoria

Economics 345 “Applied Econometrics” (Summer 2019)

Assignment 4

Due on Monday, July 29th, 4 pm in Department Dropbox

You are encouraged to work in groups on this assignment, but every student must submit their own

version of the assignment with their own write-up. Please indicate who was in your group by writing

the names and V-numbers of all group members at the top of your submitted assignment. Please

ensure you write legibly – an illegible assignment may lose marks.

Note that copying from existing solution manuals or solutions found online constitutes plagiarism

under the University’s guidelines.

Dummy Variables

1. (5 marks) The following equations were estimated using the data in “bwght”:

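$$bwght_i = \underset{(0.016)}{4.682} - \underset{(0.0008)}{0.005}\,cigs_i + \underset{(0.006)}{0.155}\,parity_i + \underset{(0.010)}{0.026}\,male_i + \underset{(0.012)}{0.062}\,white_i$$

$$n = 1{,}388,\quad R^2 = 0.045,\quad \bar{R}^2 = 0.043$$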

The variable definitions can be found at:

http://fmwww.bc.edu/ec-p/data/wooldridge/bwght.des.

i (2 marks) Considering the second regression results, comment on the estimated effect and

statistical significance of fatheduc. (i.e., comment on the practical and statistical significance

of fatheduc’s partial effect separately)

ii (3 marks) Recall the material from Chapter 4 (4-5f Testing General Linear Restrictions). Two linear models are nested if the restricted model is obtained from the full model by imposing constraints on the parameters. In this question, the first regression is the restricted model: it is obtained by imposing βmotheduc = 0 and βfatheduc = 0 in the full model. We can usually compare nested models with an F test to see whether the restrictions are valid. Here, that means computing the F statistic for the joint significance of motheduc and fatheduc using formula 4.37 in the textbook.

However, from the given information, why are you unable to compute the F statistic? What

would you have to do to compute the F statistic?
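
For reference, the arithmetic of an F statistic for q exclusion restrictions can be sketched as follows. This is a minimal Python illustration (the course labs use R), assuming that formula 4.37 is the usual sum-of-squared-residuals form; the function name and the input numbers are hypothetical and are not taken from the bwght regressions.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """SSR form of the F statistic for q linear restrictions.

    df_unrestricted is n - k - 1 from the unrestricted (full) model.
    """
    if q <= 0 or df_unrestricted <= 0:
        raise ValueError("q and df_unrestricted must be positive")
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example_f = f_statistic(
    ssr_restricted=120.0, ssr_unrestricted=100.0, q=2, df_unrestricted=100
)
print(example_f)
```

The resulting value would then be compared with a critical value from the F distribution with (q, n - k - 1) degrees of freedom.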

2. (11 marks) Use the data in “fertil2” to answer this question. The variable definitions can be found at http://fmwww.bc.edu/ec-p/data/wooldridge/fertil2.des. Lab 9 provides examples of the code used in this question.

i (3 marks) Estimate the following model

and report the results with both usual and heteroskedasticity-robust standard errors. Are the

robust standard errors always bigger than the nonrobust ones?
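
To make the difference between the two kinds of standard error concrete, here is a minimal Python sketch for a one-regressor model (the labs use R, and the model in this question has several regressors). It assumes the simplest White (HC0) variant; the lab code may apply a different small-sample correction. The function name and the toy data are hypothetical.

```python
import math


def simple_ols_with_se(x, y):
    """Slope of y on x with usual and HC0 (White) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Usual formula: sigma^2 / SST_x, with sigma^2 = SSR / (n - 2).
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_usual = math.sqrt(sigma2 / sxx)
    # HC0 formula: sum of (x_i - x_bar)^2 * u_i^2, divided by SST_x^2.
    se_robust = math.sqrt(sum((d * u) ** 2 for d, u in zip(dx, resid)) / sxx ** 2)
    return slope, se_usual, se_robust


# Toy data for illustration only (not from fertil2).
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [1, 3, 2, 5, 4, 6]
slope, se_usual, se_robust = simple_ols_with_se(toy_x, toy_y)
print(slope, se_usual, se_robust)
```

Nothing in either formula forces one standard error to be larger than the other; which is larger depends on how the squared residuals line up with the regressor.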

ii (4 marks) Add the two religious dummy variables (protest, catholic). Suppose heteroskedasticity is present in the equations from parts (i) and (ii). Can we use the usual F test to test whether the coefficients on protest and catholic are jointly significant? Why or why not? If we cannot use the F test, what test should we apply? What is the p-value from the joint test for the coefficients on protest and catholic?

iii (3 marks) Return to the regression in part (i). Choose one of the methods from Lab 10 to test for heteroskedasticity and explain the method you chose. State your conclusion about whether heteroskedasticity is present in the equation for children.
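
One common test is the Breusch-Pagan test; whether it is among the Lab 10 methods is an assumption here. The Python sketch below (the labs use R) shows the LM version for a single regressor: regress the squared OLS residuals on x and use n times the R-squared of that auxiliary regression. The helper names and the toy data are hypothetical.

```python
import math


def _fit_line(x, y):
    """Return (intercept, slope) from OLS of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def breusch_pagan_lm(x, y):
    """LM statistic (n * R^2) and chi-square(1) p-value, one regressor."""
    n = len(x)
    a, b = _fit_line(x, y)
    u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of squared residuals on x.
    c, d = _fit_line(x, u2)
    u2_bar = sum(u2) / n
    sst = sum((v - u2_bar) ** 2 for v in u2)
    ssr = sum((v - c - d * xi) ** 2 for xi, v in zip(x, u2))
    r2 = max(0.0, 1.0 - ssr / sst)
    lm = n * r2
    # Upper tail of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


# Toy data for illustration only (not from fertil2).
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [1.1, 1.9, 3.3, 3.6, 5.8, 5.1, 8.4, 6.6]
lm, p_value = breusch_pagan_lm(toy_x, toy_y)
print(lm, p_value)
```

With k regressors in the auxiliary regression, the LM statistic would instead be compared with a chi-square distribution with k degrees of freedom.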

iv (1 mark) If you find heteroskedasticity in part (iii), would you say the heteroskedasticity is

practically important?

3. (9 marks) Use the data in “loanapp” for this exercise. The binary variable to be explained

is approve, which is equal to one if a mortgage loan to an individual was approved. The key

explanatory variable is black, a dummy variable equal to one if the applicant was black. The other

applicants in the data set are white and Hispanic. The variable definitions can be found at:

http://fmwww.bc.edu/ec-p/data/wooldridge/loanapp.des.

To test for discrimination in the mortgage loan market, a linear probability model can be used:

$$approve_i = \beta_0 + \beta_1\,black_i + \text{control variables} + u_i$$

i (1 mark) If there is discrimination against blacks, and the appropriate factors have been controlled for, what is the sign of β1?

ii (3 marks) As controls, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch,

cosign, chist, pubrec, mortlat1, mortlat2, vr, and black*obrat. Attach the estimated regression

results from R (no need to write down the regression equation). Is there evidence of

discrimination against blacks?

iii (2 marks) Using the model from part (ii), what is the effect of being black on the probability of approval when obrat=32, which is roughly the mean value in the sample? Obtain a 95% confidence interval for this effect. (Hint: replace black*obrat with black*(obrat-32) and re-run the regression. The coefficient on black in the re-centred regression, together with its standard error, can then be used to construct the confidence interval.)
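
The algebra behind the re-centring hint can be sketched as follows. The symbols $\beta_2$, $\beta_3$ and $\theta_1$ are notation assumed here for illustration, the other controls are abbreviated by dots, and 1.96 is the large-sample normal critical value.

```latex
\begin{align*}
approve &= \beta_0 + \beta_1\,black + \beta_2\,obrat + \beta_3\,black \cdot obrat + \dots + u \\
\left.\frac{\Delta\,approve}{\Delta\,black}\right|_{obrat = 32}
        &= \beta_1 + 32\,\beta_3 \equiv \theta_1 \\
approve &= \beta_0 + \theta_1\,black + \beta_2\,obrat + \beta_3\,black \cdot (obrat - 32) + \dots + u \\
\text{95\% CI:}\quad & \hat{\theta}_1 \pm 1.96 \cdot \operatorname{se}(\hat{\theta}_1)
\end{align*}
```

The third line follows from the first because $\beta_1\,black + \beta_3\,black \cdot obrat = (\beta_1 + 32\beta_3)\,black + \beta_3\,black \cdot (obrat - 32)$, so re-centring changes the coefficient on black but not the coefficient on the interaction.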

iv (3 marks) Show the histogram of the predicted values of approve from the model in part (iii).

Are there any predicted values outside of the [0,1] range? Why should we be concerned about

this? (Hint: use hist(variablename) to plot the histogram. Replace variablename with the

name of the variable you generated to store the predicted values of approve.)
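
To see how a linear probability model can produce fitted values outside [0,1], consider this small Python sketch (the labs use R, where hist() would draw the histogram). It fits a one-regressor model to toy binary data and counts the fitted values below 0 and above 1. The function names and data are hypothetical and unrelated to loanapp.

```python
def lpm_fitted_values(x, y):
    """Fitted values from OLS of a binary outcome y on a single regressor x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def count_outside_unit_interval(fitted):
    """Number of fitted values below 0 and above 1."""
    below = sum(1 for f in fitted if f < 0)
    above = sum(1 for f in fitted if f > 1)
    return below, above


# Toy binary outcome for illustration only.
toy_x = list(range(10))
toy_y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
fitted = lpm_fitted_values(toy_x, toy_y)
below, above = count_outside_unit_interval(fitted)
print(below, above)  # prints: 1 1
```

Here the fitted line is about -0.127 + 0.139 x, so the smallest and largest x values give "probabilities" of roughly -0.13 and 1.13.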

Time Series

4. (10 marks) Consider a simple model of a time series $y_t$ as a function of its past (using lagged values):

$$y_t = \beta_0 + \beta_1 y_{t-1} + u_t \qquad (1)$$

Assume $y_t$ is what we refer to as `stationary’ – its distribution does not change over time, i.e. $E(y_t) = E(y_{t-1}) = y^*$.

i Interpret the model – what does $\beta_1$ capture?

ii Now consider the model including an additional variable. Is the marginal effect of $y_{t-1}$ on the expected value of $y_t$ still equal to $\beta_1$?

iii Consider again the model in (1). Suppose you are interested in forecasting $y_t$ for periods t+1, t+2,…. Show that the predicted deviation of $y_t$ from its expected value in period t+2 is $\beta_1^2\,(y_t - y^*)$. {Hint: first show that (1) can be written in deviations from equilibrium as $y_t - y^* = \beta_1\,(y_{t-1} - y^*) + u_t$. Then consider the predicted deviation at t+1, t+2, t+3.... and use substitution to show the required result.}

iv Suppose $|\beta_1| < 1$. What does your result from iii) tell you about the predicted deviation from the model’s estimated equilibrium as the forecasting period increases?
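
A numerical illustration may help with parts iii and iv. The Python sketch below (the labs use R) assumes a first-order autoregression of the form y_t = β0 + β1·y_{t-1} + u_t, written in deviations from its equilibrium value; this notation and the function name are assumptions made for illustration. Each forecast step multiplies the previous predicted deviation by β1.

```python
def ar1_forecast_deviations(beta1, current_deviation, horizons):
    """Predicted deviations from equilibrium for h = 1, ..., horizons."""
    deviations = []
    deviation = current_deviation
    for _ in range(horizons):
        # One substitution step: the expected deviation shrinks (or grows) by beta1.
        deviation = beta1 * deviation
        deviations.append(deviation)
    return deviations


# Hypothetical values: beta1 = 0.5 and a current deviation of 8 units.
example_path = ar1_forecast_deviations(beta1=0.5, current_deviation=8.0, horizons=3)
print(example_path)  # prints: [4.0, 2.0, 1.0]
```

Trying a value of beta1 with absolute value above one in the same function shows the opposite behaviour, which is why the condition in part iv matters.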

5. (10 marks) Use the data set “consump” for this question.

i Estimate a simple regression model relating the logarithm of real per capita consumption

(log(c)) to the logarithm of real per capita disposable income (log(y)). Report the results in the

usual form (including the standard errors, the number of observations and the values of $R^2$ and $\bar{R}^2$). Interpret the equation and discuss statistical significance.

ii Add a lag of the logarithm of real per capita disposable income to the equation from part (i).

Report the results in the usual form. What do you conclude about adjustment lags in

consumption growth? (Hint: you can use “lylag <- log(dplyr::lag(consump$y, n = 1))” to

generate a lag of log(y) and then include lylag in the regression.)
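
What a one-period lag does to a series can be sketched in a few lines of Python (the labs use R, where the hint above does this in one call). The first observation has no predecessor, so it becomes missing and is dropped from the regression. The function name and numbers are hypothetical.

```python
import math


def lag(values, n=1):
    """Shift a series back by n periods, padding the start with None."""
    if n < 0:
        raise ValueError("n must be non-negative")
    shifted = [None] * n + list(values)
    return shifted[: len(values)]


# Hypothetical income series, not the consump data.
income = [10.0, 20.0, 30.0]
income_lag = lag(income, n=1)
log_income_lag = [None if v is None else math.log(v) for v in income_lag]
print(income_lag)  # prints: [None, 10.0, 20.0]
```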

iii Add the real interest rate (r3) to the equation in part (i). Report the results in the usual form.

Does r3 affect consumption growth?

iv If we plot the time series of the real consumption, the trend went flat from year 1979 to 1982

before going up again. What is the estimated model if we also include a dummy variable

equal to 1 for years after 1979? Report the results in the usual form and comment.

6. Forecasting Competition (55%)

This part of the assignment will require substantial studying on your own as we have not covered much

of forecasting in lectures, however, this task will be highly relevant for real-world application of

econometrics. Each of you must provide your own write-up of your estimation practice and competition

report. Labs 8 and 10 provide the syntax and examples of the code that will be used in this project.

We will have a forecasting competition where each of you (or teams) will create forecasts of the

number of cyclists on the Galloping Goose Trail in Victoria (http://www.ecopublic.com/public2/?id=100117730).

The city makes daily cyclist counts publicly available and you

will build a forecasting model and subsequently predict the number of cyclists on the trail. There will

be a prize for the best (most accurate) forecasts.

The following tasks in part 6.1) will walk you through creating a simple forecasting model. You can then

build your own model and refine your predictions in part 6.2).

6.1) Estimation and Practice (20%)

i) Download the estimation data from CourseSpaces (“goose_a4.csv”) on the number of cyclists on

the Galloping Goose. Plot the variable ‘bikes_count’ as a line plot with time on the x-axis.

ii) Estimate a simple regression model, modelling the number of cyclists as a function of weekend and

month dummy variables, and report the results in a well-formatted table.
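
As an illustration of how such dummies are built from a date, here is a minimal Python sketch (the labs use R). It treats Saturday and Sunday as the weekend and omits January as the base month; both choices, and the function and column names, are assumptions made for illustration.

```python
from datetime import date


def calendar_dummies(day):
    """Weekend dummy plus month dummies for February to December."""
    # weekday() returns 0 for Monday, ..., 5 for Saturday, 6 for Sunday.
    row = {"weekend": 1 if day.weekday() >= 5 else 0}
    # January is the omitted base category, to avoid the dummy variable trap.
    for month in range(2, 13):
        row[f"month_{month}"] = 1 if day.month == month else 0
    return row


sunday_row = calendar_dummies(date(2018, 12, 16))
monday_row = calendar_dummies(date(2018, 12, 17))
print(sunday_row["weekend"], monday_row["weekend"])  # prints: 1 0
```

With this coding, each month coefficient would be read relative to January, and the weekend coefficient relative to weekdays.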

iii) What can you tell from the results in part (ii)?

iv) On May 28th, 2018 the new cycle lane across the bridge in Victoria was opened. Construct a dummy

variable that is equal to one following the opening of this cycle lane, and zero otherwise. Estimate and

interpret the effect of the opening of this cycle lane on the number of daily cyclists.
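
A sketch of the opening dummy in Python (the labs use R). Whether the opening day itself counts as "following the opening" is not stated, so the sketch exposes that choice as a parameter; the names used are hypothetical.

```python
from datetime import date

# Opening date of the new cycle lane, as given in the question.
OPENING_DAY = date(2018, 5, 28)


def post_opening(day, include_opening_day=True):
    """1 for days after the opening (optionally including the day itself), else 0."""
    if include_opening_day:
        return 1 if day >= OPENING_DAY else 0
    return 1 if day > OPENING_DAY else 0


print(post_opening(date(2018, 5, 27)), post_opening(date(2018, 5, 29)))  # prints: 0 1
```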

v) Create so-called `hind-casts’ by estimating your model up until the end of the estimation sample (Dec. 16th, 2018) and then predicting the number of cyclists for the next h=15 periods. Discuss and illustrate how your forecasts compare to the observed numbers of cyclists, which are shown in column `bikes_pred’ in the dataset.

vi) Compute the root-mean-squared forecast error of your forecasts:

$$\text{RMSE} = \sqrt{\frac{1}{h}\sum_{j=1}^{h}\left(y_{T+j} - \hat{y}_{T+j}\right)^2}$$

Do this by:

a) Computing the square of the difference between the observed number of cyclists (bikes_pred) and your predicted number of cyclists,

b) Computing the mean of the squares from part (a), and then taking the square root.
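
The two steps translate directly into code. A minimal Python sketch (the labs use R), with hypothetical numbers rather than values from the goose data:

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared forecast error over paired observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Step (a): squared differences between observed and predicted values.
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    # Step (b): mean of the squares, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical counts, for illustration only.
example_rmse = rmse(observed=[100, 120], predicted=[103, 116])
print(example_rmse)
```

Here the squared errors are 9 and 16, so the result is the square root of 12.5.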

vii) How does the root-mean-squared forecast error of this model compare to one that omits the

weekend dummy variables?

6.2) Forecasting Competition (35%)

i) Download the forecasting data: “goose_competition.csv” from CourseSpaces. It includes daily observations of the number of cyclists up until July 11th, 2019.

ii) Think of a general model of the number of cyclists on the Galloping Goose (variable `bikes’ in the dataset). What regressors would you include? (These would be observable and perhaps unobservable factors that affect the number of cyclists each day, such as days of the week, month of the year, etc.) Discuss how you could estimate your model: which variables are available, and what are the limitations of data availability?

iii) Estimate and describe your best prediction model for the number of cyclists on the Galloping Goose, and predict the number of cyclists 20 days into the future (up until and including July 31st) – the most accurate forecasts will win.

This model could include trends, dummies for weekdays, interaction terms, autoregressive lags, etc. Perhaps you only want to estimate the model on a shorter sample, or use the full sample. It’s up to you! There are quite a few variables already in the dataset, but you can also build your own. The latest date for the actual time-series data used in your forecast model is July 11th, 2019. If you (or your team) use any actual time-series data dated after July 11th, 2019 in your forecast, your forecast will be disqualified from the competition (estimated data, and trend and dummy variables generated from the date, are allowed).

Plot your forecasts and describe your forecasting model carefully.

iv) Upload three files to CourseSpaces (under `Entry for Forecasting Competition’):

(1) Your data file as a “.csv” file. If you worked in a team, save the csv file as “team_name_data.csv”

replacing “team_name” with the name you have come up with (keep it simple please). Otherwise save

the file as “firstname_lastname_data.csv”, replacing “firstname” with your first name and “lastname”

with your last name.

(2) Your forecast results as a “.csv” file. The forecasts must be in a csv file and must take the form

shown below – the first column has to be the date (labelled as date), the second column the number of

predicted cyclists (labelled cyclists) from 2019-07-12 to 2019-07-31, and the third column must contain

all V-numbers of you and your team members. Similarly, if you worked in a team, save the csv file as

“team_name.csv” replacing “team_name” with the name you have come up with (keep it simple please).

Otherwise save the file as “firstname_lastname.csv”, replacing “firstname” with your first name and

“lastname” with your last name.

(3) Your R Markdown file. The R Markdown file should run without error messages using

your uploaded csv data file. Again, if you worked in a team, save the Rmd file as “team_name.Rmd”

replacing “team_name” with the name you have come up with (keep it simple please). Otherwise save

the file as “firstname_lastname.Rmd”, replacing “firstname” with your first name and “lastname” with

your last name.

If you worked in a team, please upload one Rmd file, one csv data file and one csv result file for each

team. Again, each of you must submit your own write-up of your report.

The most accurate forecast will be awarded a prize, but forecast accuracy will not affect your assignment

grade. Forecast accuracy will be judged by the lowest RMSE over July 12th to July 31st. Good luck!

Format of csv file:

date          cyclists                      v_numbers

2019-07-12    254 (your prediction here)    “V...”
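
The required layout can be produced programmatically. The Python sketch below (the submission itself is expected to come from R) builds the three-column text for 2019-07-12 to 2019-07-31. The function name is hypothetical, the value 254 is the placeholder from the example row above and not a forecast, and “V...” stands in for the real V-numbers.

```python
import csv
import io
from datetime import date, timedelta


def forecast_csv_text(predictions, v_numbers, start=date(2019, 7, 12)):
    """Return CSV text with columns date, cyclists, v_numbers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "cyclists", "v_numbers"])
    for offset, value in enumerate(predictions):
        day = start + timedelta(days=offset)
        # isoformat() gives the YYYY-MM-DD layout shown in the example row.
        writer.writerow([day.isoformat(), value, v_numbers])
    return buffer.getvalue()


# Placeholder predictions: 20 days, all set to the example value 254.
placeholder_predictions = [254] * 20
csv_text = forecast_csv_text(placeholder_predictions, "V...")
csv_lines = csv_text.strip().split("\n")
print(csv_lines[0])
print(csv_lines[1])
print(csv_lines[-1])
```

Writing csv_text to a file named according to the rules above would complete the entry.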
