ECON5408 – Applied Econometrics
Term 1, 2024 School of Economics
Instructions (Please read carefully.)
The purpose of this assignment is to give you practice in analysing data, drawing conclusions from it, and writing a report. The assignment simulates the process you will encounter when carrying out research using econometric tools.
Cover Sheet. The project Cover Sheet (provided) must be properly filled in, including your full name and your student number.
Stata Commands. You will need to write your own do-file for the required data analysis. Some advice is provided in the Stata Commands section. The do-file you compose will need to be submitted through a separate channel on Moodle.
The Deadline. This assignment counts for 25% of the final mark. The deadline for submission is 10am, Monday, April 22 (Week 11). Late submissions will not be considered. Early submission is allowed after the submission channel opens, but please submit once only. No revisions will be accepted after submission.
Submission Format. You may use Word to edit your report and compile the output as a PDF file. A template Word file is provided, which includes the cover page for this project. You may use Word's equation editor to write equations; see the file Equation Editor Commands for Words in the folder Assignments and Course Project on Moodle. Each student needs to submit a PDF document of this assignment on the Moodle site, where a submission channel will be opened closer to the deadline. There will be separate channels for submission of the PDF document and of the do-file. Please submit both files before the deadline. The only way to submit is via the submission channels. Email submissions will not be considered.
Plagiarism vs. Discussing. All assignments will be checked for plagiarism. See notes on Plagiarism here. While discussing the assignments is encouraged, do not lend your assignment to another student. When an assignment is copied, it is difficult for the instructor to determine who the copier is and you may be penalised heavily. It is in your interest to do the assignment independently.
Course Project Description
The Topic and Data
The topic is based on Kenkel and Terza (2001), "The effect of physician advice on alcohol consumption" (also included in the kit), where a major task is to estimate the effect of advice on drinks. The data (KTDATA.DTA) and a do-file for reading the data are provided.
This topic involves various issues that may be encountered in empirical research. The issues include endogeneity and some special data features. Mostly, these issues have been discussed in this course and two assignments. You should carry out this project using the tools and techniques
covered in our course (up to the end of Ch17.2) although they may not be perfect for the data.
You are not required to replicate the above Kenkel and Terza (KT) article (as some techniques and methods there are not covered in this course). You should use this article to gain a good
understanding of the topic, motivation, questions of interest, issues involved, and data to be analysed.
The Report
You should read Chapter 19 of Wooldridge to get insights about how to proceed with an empirical
project. You should report your analysis in the following 7 sections. You should limit your report to 8 pages (excluding the cover sheet).
1. Introduction (1 page). You may discuss why the topic is of interest and how it is related to previous literature (referring to two or three related articles discussed in KT). You should outline the
econometric issues, your modelling strategies, and provide a summary of your findings.
2. Data (0.5 page). You may briefly describe the data, including the data source, variable definitions, important descriptive statistics, and the main features of key variables. You should let readers see what you see as important.
3. Conceptual Model (1 page). You may very briefly describe the empirical economic model, on which your econometric models are based. This can motivate your choice of regressors in the econometric
models. You should read Section 2 of KT for this part.
4. Econometric Models (2 pages). You may describe your econometric models in detail, and discuss
how you address various issues in econometric analysis (such as suspected endogeneity and data
features – drinks being nonnegative with many zeros and advice being binary). The main assumptions and estimation method for each econometric model should be briefly discussed. You may need to
complete this section in conjunction with your computation in Stata, which could involve many trial-and-error iterations. See also the “Econometric Analysis” section below.
5. Empirical Results (2 pages). Your results and findings of econometric analysis should be presented in detail in this section. You may use tables for your presentation (e.g., similar to Table 17.3 of
the textbook). You should interpret your results properly, using the tools covered in this course.
Comparing results from different models is a good way to check if your findings are robust or
insensitive to the variations in models and assumptions. You may also want to present the results of relevant tests, which may justify or reject the models and assumptions you use. It is important to
comment on the merits and drawbacks of your econometric models, and discuss possible violation of your main assumptions and biases in your findings.
6. Conclusions (0.5 page). You may reiterate your main findings here, and comment on possible policy implications. You may discuss briefly the remaining issues that you are unable to resolve, and you
may comment on how you would like to tackle them.
7. References (0.5 page). You should list your textbook (if it is used) and articles you have read and used as references.
Econometric Analysis
(a) A goal of this project is for you to explore and apply the knowledge and tools you have learned so far (up to the end of Ch17.2) in a research project. You should be able to comment on the strengths and
weaknesses of your models and methods.
(b) You should briefly explain why some variables are included in, and others are excluded from, an equation. Always pay attention to endogeneity: Is there endogeneity? Do I have valid instruments? Can I test the validity of instruments? Does endogeneity make a difference?
(c) You should start with linear models. While not perfect, linear models can be regarded as a linear
approximation to the true model. They can also serve as a benchmark for comparisons. In particular, we understand well how endogeneity is handled in linear models.
(d) The method we use to test for endogeneity (see Ch15.5a) can also be used to estimate the regression
coefficients in the presence of endogeneity. This approach, known as the “control function” method (see p10-13 of Slides-W2-1b and p13 of Slides-W4-2b), can be extended to nonlinear models. Suppose we want to use (x, Z1) to explain y, where Z1 is exogenous, and x is possibly endogenous. Note that Z1
may involve two or more variables (i.e., it can be a vector). You can think of y = drinks and x = advice in this context.
Assume that the reduced-form equation for x can be either linear with x = Z1π1 + Z2π2 + v, or probit with x = Φ(Z1π1 + Z2π2) + v. Here, (Z1, Z2) are exogenous, (π1, π2) are parameters, Φ( ⋅) is the standard normal CDF, and v is an error term with E(v |Z1, Z2) = 0. Note that Z2 may involve two or more variables (i.e., it can be a vector).
Further, assume that the structural equation for y can be either linear with y = xγ + Z1β + u, or Tobit with y = max{0, xγ + Z1β + u}. For Tobit, u is an error term that is conditionally normal with u = θv + e, E(e | v, Z1, Z2) = 0, e ∼ N(0, σ²), and (γ, β, θ) are parameters. The structure of u here takes into account the possible correlation between u and v. The parameter θ can be used to test
whether x is exogenous (when θ = 0, u and v are uncorrelated) or endogenous (when θ ≠ 0, u and v are correlated).
It follows that the structural equation for y can be expressed as y = xγ + Z1β + vθ + e for the
linear model, and y = max{0, xγ + Z1β + vθ + e} for the Tobit model, where e is normally
distributed and uncorrelated with (x, Z1, Z2). Hence, if we were able to observe v, OLS estimation would be applicable to the linear model and maximum likelihood estimation would be applicable to the Tobit model.
As we do not observe v, we use a two-step approach (the control function approach). If the models are correctly specified, (γ, β, θ) can be consistently estimated in two steps:
(i) estimate the reduced-form equation for x, either the linear model x = Z1π1 + Z2π2 + v or
the probit model x = Φ(Z1π1 + Z2π2) + v, and save the residual v̂;
(ii) estimate the structural equation (either linear or Tobit), replacing v by v̂.
However, the standard errors from Step (ii) can be incorrect because they do not account for the estimation error carried over from the first step.
As we did not cover how to correct such standard errors, you may assume the standard errors from Step (ii) are good approximations to the true standard errors, and acknowledge this.
For brevity, the above presentation does not include an intercept in the models. In your report, however, all models should include an intercept.
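The two steps described above can be sketched in Stata on simulated data. This is only an illustration under stated assumptions: the variable names (y, x, z1, z2, vhat), the parameter values, and the sample size are all made up for the example and have nothing to do with the KT data. It uses the linear reduced form and is not a template for the report.

```stata
* Control-function sketch on SIMULATED data (all names and numbers are hypothetical)
clear
set seed 5408
set obs 2000
generate z1 = rnormal()                  // exogenous regressor
generate z2 = rnormal()                  // excluded exogenous variable (instrument)
generate v  = rnormal()                  // reduced-form error
generate x  = 0.3*z1 + 0.8*z2 + v        // linear reduced form for x
generate ystar = 0.5 + 1.0*x + 0.5*z1 + 0.6*v + rnormal()   // u = 0.6*v + e
generate y  = max(0, ystar)              // outcome censored at zero

* Step (i): estimate the reduced form and save the residuals
regress x z1 z2
predict vhat, residuals

* Step (ii), linear model: add vhat as a regressor
regress y x z1 vhat
test vhat                                // H0: theta = 0 (x is exogenous)

* Step (ii), Tobit model: same idea with censoring at zero
tobit y x z1 vhat, ll(0)
test vhat                                // H0: theta = 0 (x is exogenous)
```

In the linear case, the coefficient on x from Step (ii) is numerically the same as the 2SLS estimate from `ivregress 2sls y z1 (x = z2)`, which gives a quick check that the two steps were coded correctly. The standard errors reported in Step (ii) are subject to the caveat noted above.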
Stata Commands
For Stata commands, you may consult the Stata do-files (from Week 2 to Week 5) deposited in the
“Tutorials” folder on Moodle. You may also consult the do-files for Assignments 1 and 2. The following points should also be useful.
. OLS estimation of linear model
regress x z1 z2
predict xhat //Save fitted values
predict vhat, residuals //Save residuals in vhat
test z2 //Test null hypothesis that coef on z2 is zero
. 2SLS estimation of linear model
ivregress 2sls y z1 (x=z2 z3) //2SLS using z2 and z3 as instruments for x
predict yhat //Save 2SLS fitted values
predict uhat, residuals //Save residuals in uhat
. Probit estimation
probit x z1 z2
predict xhat //Save fitted values in xhat
generate vhat=x-xhat //Save residuals in vhat
. Tobit estimation
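tobit y x z1, ll(0) //Tobit with y left-censored at zero
predict yhat, ystar(0,.) //Save fitted values in yhat
margins, dydx(x) predict(ystar(0,.)) //Find partial effect of x
correlate y yhat //Correlation between y and fitted values
display r(rho)^2 //Display R-squared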
. Tobit estimation: Prefix a binary regressor x with “i.”
margins, dydx(i.x) predict(ys(0,.)) //Find partial effect of binary x
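Returning to the 2SLS commands listed earlier, the questions about instrument strength, endogeneity, and instrument validity can be examined with Stata's post-estimation commands for `ivregress`. The sketch below runs on simulated data; the variable names and parameter values are hypothetical, and whether these particular tests fall within the course material is an assumption you should check against the lecture slides.

```stata
* IV diagnostics sketch on SIMULATED data (all names and numbers are hypothetical)
clear
set seed 2024
set obs 2000
generate z1 = rnormal()                        // included exogenous regressor
generate z2 = rnormal()                        // instrument 1
generate z3 = rnormal()                        // instrument 2
generate v  = rnormal()                        // reduced-form error
generate x  = 0.3*z1 + 0.6*z2 + 0.4*z3 + v     // x depends on both instruments
generate y  = 1 + x + 0.5*z1 + 0.5*v + rnormal()   // x is endogenous through v

ivregress 2sls y z1 (x = z2 z3)
estat firststage      // first-stage statistics: are the instruments relevant?
estat endogenous      // tests of H0: x is exogenous
estat overid          // overidentification test (needs more instruments than endogenous regressors)
```

With a single instrument the model is exactly identified and `estat overid` has nothing to test, so the overidentification check is only available when there are at least two instruments for x.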
