Applications of Econometrics
Assessed Group Project
Spring 2024
Due: 4pm, Thursday 21 March 2024
This project has two parts. In the first part we study the problem of forecasting average wages from a time series perspective. The wage level in an economy is an important macroeconomic variable because it is informative about the cost of labour and factors into firms' decisions on investment and capital allocation. In the second part we study labour supply elasticities from a panel perspective. Labour supply elasticities are a key concept in economics and are used, for example, to predict the impact of public policies. As you will see below, we approach these topics with a focus purely on applying the empirical methods covered in this course. You are not expected to read up on state-of-the-art methods for, e.g., estimating labour supply elasticities.
In the first part of the project we use FRED data. In the second part we use the Survey of Income and Program Participation (SIPP). To prepare the datasets for analysis please see the sections 'Preparing the Time Series Data' and 'Preparing the Panel Data' below.
• Groups have to submit a Word/PDF file that contains answers to the questions below, along with a dofile containing all the commands the group used.
• Both the Word/PDF document and the dofile have to be submitted before the deadline. Projects submitted without a dofile will incur the default penalty for a late submission.
• Answers to questions should be limited to 3 pages per question (1-2 pages is likely enough). Question 3 consists of 3 subquestions, so up to 9 pages total. The entire project paper should not exceed 21 pages; this is a maximum, not a guideline. Font sizes between 10pt and 12pt are fine. Page margins, line spacing etc. are up to you.
• The dofile should be written in such a way that anyone with access to the raw data files can replicate the analysis.
• Stata outputs (tables/figures) have to be included in the document. It is not enough to refer to outputs that are only included in the Stata log/dofile. If a result isn’t shown in the pdf/word document it doesn’t count.
• R is allowed (replace the word 'dofile' with 'R script') and we have tutors who know R, but in general it will probably be easier in Stata because all the lab materials are in Stata and that's what most of the teaching staff use. So feel free to use R, but don't expect equal levels of support. You can import Stata files into R using the 'foreign' package.
• Wherever possible try to convert raw Stata/R output into a nice-looking table/figure. Regression output can be converted to a table using e.g. outreg2. You will have to install such user-written programs first, e.g. with 'ssc install outreg2', and then you can get help on how to use the command with 'help outreg2'.
• Before submission groups have to declare that the project is their own work. There is no separate form to complete; it can be done directly on Learn.
• Make sure that you are aware of the requirements for appropriate citation of references and data sources. Read the guidance on plagiarism in Section 4.4.1 of the Economics Honours Handbook and/or the general University guidance. If you include anything from another source it must be properly acknowledged, whether it’s a figure/table or a text passage or anything else.
• You are welcome to ask questions on Piazza or come to helpdesks. We will try to help as much as possible with data preparation and Stata commands and are of course happy to clarify where things are unclear. We will generally not answer questions along the lines of 'is it correct/enough if I do x' or 'how do I do x' unless it is a specific technical question. We aim to be fair to all students.
Time Series Questions
For this part we use two main time series covering the U.S.: wages (hourly earnings) and labour turnover. These are available from FRED at the monthly level from 2006 to 2023. The basic goal will be to forecast wages using turnover data. This is a standard problem for forecasters, since both workers and firms are very interested in knowing how wages will grow. Labour turnover is one important variable for making these forecasts; it is one of the key variables capturing the dynamics of the labour market. We recommend using levels (not logs) of both variables for simplicity (logs lead to complications when forecasting).
(1) Plot the time series for wages and turnover over time. Make sure you label the axes correctly. Test whether trends and seasonality are present and discuss your findings both in terms of what is visible in the figures and what you find through your tests. [10 points]
Hint: You can either plot the two series separately or combine them into one figure. If you combine them make sure to have two separate y-axes. Note that the FRED data are adjusted for seasonality, so our test here is mostly a test of whether their adjustment worked.
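To make the hint concrete, here is a minimal Stata sketch of how trends and seasonality could be tested, assuming the variable names (wage, turnover, monthly_date) from the data-preparation section below; the exact specification is your choice to justify:

```stata
* Sketch: linear trend plus monthly dummies, with a joint test of seasonality
* (the FRED series are seasonally adjusted, so expect weak seasonality)
gen t = _n                              // linear time trend
gen month = month(dofm(monthly_date))   // calendar month, 1-12
regress wage t i.month
testparm i.month                        // joint significance of seasonal dummies
regress turnover t i.month
testparm i.month
* one figure combining both series with two separate y-axes
twoway (tsline wage, yaxis(1)) (tsline turnover, yaxis(2))
```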
(2) Investigate whether wages and turnover likely have a unit root or not. Discuss your findings. In particular, explain what can be done if they are not stationary and we want to use them in regressions (don't forget to incorporate your findings from (1)). [10 points]
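As a hedged sketch (variable names as in the data-preparation section; the lag order and the inclusion of a trend are choices you should justify), the unit root tests might look like:

```stata
* Augmented Dickey-Fuller tests
dfuller wage, trend lags(4)       // with a deterministic trend
dfuller turnover, lags(4)
* if a unit root cannot be rejected, first differences are a common remedy
dfuller D.wage, lags(4)
```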
(3) Try to build a model that can be used to forecast wages incorporating your findings from (1) and (2). A starting point might be
wage_t = β0 + β1 turnover_{t-1} + β2 wage_{t-1} + u_t
but this potentially has to be adjusted for trends and unit roots depending on your findings. [30 points] Hint: Note that we don't include turnover at time t in this regression because then we can't easily make a forecast without first forecasting turnover. (You can also do this, but there's no need to separately forecast turnover in this question; it's fine to use one-step-ahead forecasts.) This also means we are not interested in a VAR; we only want to forecast wages. You can ignore serial correlation in the error term. Also note that we don't expect you to write a dissertation on this question. It is ok to keep it simple, e.g. testing 3-5 lags is fine.
(a) Use in-sample criteria (e.g. R-squared, adjusted R-squared) to decide which is the ’best’ model (e.g. how many lags). Explain your results.
(b) Use out-of-sample criteria (e.g. RMSE, MAE) to decide which is the ’best’ model (e.g. how many lags). Explain your results.
Hint: To do this you have to decide which part of the sample to use for estimating the parameters of the model, and which part to use for evaluating the forecasts. One way is to use everything except the last year for estimation, e.g. by adding 'if year(dofm(monthly_date)) < 2023' to your regression commands. You can then calculate the forecast errors for all observations in 2023 and summarise them using RMSE or MAE. E.g., say your predictions (one-step-ahead forecasts) for 2023 are stored in the variable 'f'. Then the forecast errors can be obtained with 'generate e = wage - f if year(dofm(monthly_date)) == 2023'. To get the RMSE, square the errors, take the average, then take the square root of that average.
(c) Decide which model (a or b) you think is best for forecasting and briefly explain why. Using this model calculate the point forecast for the wage in the first month after the sample period (January 2024 in our data) as well as the 95% forecast interval. Discuss the sources of uncertainty in this forecast.
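Pulling the hints above together, a minimal Stata sketch of the estimation/evaluation split, the RMSE calculation, and the January 2024 point forecast might look like this (the single-lag specification is purely illustrative; adjust it for your findings on trends and unit roots):

```stata
* estimate on everything before 2023
regress wage L.wage L.turnover if year(dofm(monthly_date)) < 2023
* one-step-ahead forecasts and 2023 forecast errors
predict f
gen e = wage - f if year(dofm(monthly_date)) == 2023
gen e2 = e^2
quietly summarize e2
display "RMSE = " sqrt(r(mean))
* point forecast for the first out-of-sample month (January 2024):
* append one empty period so the regressors L.wage and L.turnover exist there
tsappend, add(1)
predict f_jan
list monthly_date f_jan if monthly_date == ym(2024,1)
```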
Panel Questions
For this part we use the SIPP panel dataset at the individual level. We want to study labour supply elasticities, i.e. the effect of a change in wages on hours worked. Time is measured in months and the panel entity is an individual respondent. You can find our prepared dataset 'part2_panel.dta' on Learn. Throughout we focus on 'prime-age' individuals, i.e. ages 25-54. To prepare the data yourself see the section 'Preparing the Panel Data' below. Going over this is necessary if you want to add additional variables and could be helpful for understanding how the variables are constructed and what they measure.
(4) Provide some descriptive statistics for your sample, such as the mean, minimum, and maximum of key variables (wages, hours, age, etc). Make sure you provide clear indications of what you are reporting: do not include the raw variable names in the table; instead, use a descriptive label like 'hourly wages in $'. Then estimate the labour supply elasticities for women by pooled OLS (POLS) and interpret your results. We usually do this by regressing log hours on log wages. Run this regression once without controls, once with controls, and compare them. [10 points] Hint: Include your own choice of control variables. Some suggestions: time trends, seasonality, age, education, marital status, whether there are children in the household. We provide simple-to-use education variables in the prepared data; they are called 'edu_lessthanhs', 'edu_hs' and so on. It also makes sense to account for seasonality here, as the SIPP variables are not de-seasonalised. Just in terms of terminology: 'regressing log hours on log wages' means log hours is the dependent variable and log wages is a regressor.
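A hedged sketch of the POLS regressions, assuming log variables of your own making, that women are coded esex == 2 (check the SIPP codebook), and that a person identifier id has been created as in the data-preparation section; the control set is just one possible choice:

```stata
* construct logs (hours/wage variable names follow the data-preparation section)
gen lhours = ln(tmwkhrs)
gen lwage  = ln(wage)
* POLS without controls, women only, clustered by person
regress lhours lwage if esex == 2, vce(cluster id)
* POLS with an illustrative set of controls (edu_* dummies are provided;
* i.monthcode picks up seasonality, refyear a simple time trend)
regress lhours lwage tage refyear i.monthcode edu_lessthanhs edu_hs ///
    if esex == 2, vce(cluster id)
```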
(5) Estimate the labour supply elasticities for women using first differences and fixed effects and compare your estimates to the POLS results in (4), explaining why they might be different. For this comparison to be meaningful it makes sense to include the same controls as far as possible. Discuss which estimates we likely trust most. [10 points]
Hint: To be able to comment on which estimates we trust most it makes sense to check for serial correlation in the error term, so that you can say something about efficiency (and not just bias/consistency).
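A sketch of the FD and FE estimates with the same illustrative controls as in (4); this assumes the data have been xtset as shown in the data-preparation section, and that the user-written command xtserial (Wooldridge's serial correlation test) has been installed via 'ssc install xtserial':

```stata
* first differences (time-constant controls drop out automatically)
regress D.lhours D.lwage i.monthcode if esex == 2, vce(cluster id)
* fixed effects
xtreg lhours lwage i.monthcode if esex == 2, fe vce(cluster id)
* Wooldridge test for serial correlation in the idiosyncratic errors
xtserial lhours lwage if esex == 2
```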
Preparing the Time Series Data
In this section we provide basic instructions on how to download the datasets and make them ready for analysis. First you have to download the FRED data on wages and turnover. You need these three series:
• Average Hourly Earnings of All Employees, Total Private https://fred.stlouisfed.org/series/CES0500000003
• Hires: Total Nonfarm https://fred.stlouisfed.org/series/JTSHIR
• Total Separations: Total Nonfarm https://fred.stlouisfed.org/series/JTSTSR
We recommend downloading them in CSV format. You can then import them into Stata using e.g. code like this:
// Set path where Stata dataset will be stored
global datapath "C:\Desktop\AofE"
// Change to the folder where you downloaded the CSV data
cd "C:\Users\AofE\Downloads"
// import csv of wage data into Stata
clear
import delimited CES0500000003.csv
rename ces0500000003 wage
label var wage "Average Hourly Earnings of All Employees, Total Private"
// save the dataset
compress
save "$datapath/wages", replace
// import csv of hires data into Stata
clear
import delimited JTSHIR.csv
rename jtshir hires
label var hires "Hires: Total Nonfarm"
// save the dataset
compress
save "$datapath/hires", replace
// import csv of separations data into Stata
clear
import delimited JTSTSR.csv
rename jtstsr separations
label var separations "Total Separations: Total Nonfarm"
// save the dataset
compress
save "$datapath/separations", replace
This gives us three Stata datasets containing the three FRED series. We can then merge them together and create our turnover variable as the sum of hires and separations using code like this:
// merge the FRED data together
use "$datapath/wages", clear
merge 1:1 date using "$datapath/hires", nogen keep(match)
merge 1:1 date using "$datapath/separations", nogen keep(match)
// create turnover variable
g turnover = separations + hires
// create time indicator
g monthly_date = mofd(date(date,"YMD"))
format %tm monthly_date
sort monthly_date
// declare time series data
tsset monthly_date
// keep up to December 2023
keep if monthly_date <= ym(2023,12)
compress
save "$datapath/part1_timeseries", replace
This gives us a suitable dataset (part1_timeseries.dta) to conduct the time series analysis. Because we declared it as a time series dataset we can now use time series operators to create differences and lags, see help tsvarlist.
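For instance, once the dataset is tsset, the time series operators can be used like this (the new variable names are arbitrary):

```stata
* time series operators after tsset (see help tsvarlist)
gen wage_l1  = L.wage        // first lag
gen wage_d1  = D.wage        // first difference: wage - L.wage
gen turn_l12 = L12.turnover  // twelfth lag
gen turn_s12 = S12.turnover  // seasonal (12-month) difference
```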
Preparing the Panel Data
You can use our prepared dataset on Learn (part2_panel.dta) and skip this section. But if you're interested in creating your own extract, for example to add additional control variables, or if you want to understand how we created our wage, hours and education variables, then this is for you.
The SIPP is a household panel dataset with detailed information for a sample of U.S. households. It is representative for the U.S. population and has been used in many applied research projects. You can find all the raw datasets at https://www.census.gov/programs-surveys/sipp/data/datasets.html. These datasets can be very big so you might have to use some tricks to be able to even open them in Stata (for example by specifying the variables you want to import when using ’use’). Each file (wave) contains 12 months of a year, so we have the same person roughly 12 times per wave.
In a first step we simply open the dataset, convert all variable names to lower case, and keep only the variables we want. Then we generate a variable that contains the year covered by the survey wave (the wave released in 2022 covers questions asked about 2021). Then we compress and save. Here’s the example for the first wave in 2018:
// Set your own working directory
cd "/home/data"
// Type in path/folder where you downloaded the dataset to
global datapath "US-SIPP"
//====================================
// Load SIPP waves
//====================================
// Prepare 2018
use "$datapath/pu2018", clear
rename *, lower
// if you want to add more variables (e.g. to add other controls) then add them here
keep eplaydif eddelay tjb1_mwkhrs tjb1_msum esex ems erp spanel ssuid erace rmesr ///
    tage eeduc edisabl ehltstat emd_scrnr emc_scrnr epr_scrnr efree_lunch edaycare tutils tosavval pnum ///
    tjb1_occ tjb1_ind ejb1_scrnr eafnow monthcode tpearn tmwkhrs rwksperm rmwkwjb ///
    twkhrs1-twkhrs5 rpubmth rpubtype2 rpritype1 wpfinwgt rsnap_mnyn ems_ehc rprimth
// Generate reference year
// the survey released in year x covers the observation period x-1
gen refyear = 2017
lab var refyear "Year which the wave refers to"
// keep only prime-age workers
keep if tage >= 25 & tage < 55
// compress to save space
compress
save "$datapath/pu2018_prime", replace
If you want to add additional variables a helpful command is lookfor. This searches through the variable labels for a search term. For example, you could find all variables that have 'children' in the label by using 'lookfor children'.
Once we have imported all years we have to assemble them into one dataset. Check out our dofile ’prepare_panelpart.do’ to see how this is done. We save the assembled dataset as ’part2_panel.dta’. With this we can start creating our own variables. For example, here’s how we create the wage variable:
// generate a wage variable based on total earnings and hours of work
// Due to measurement error we usually don’t use the reported hours but just look
// at full-/part-time when we’re interested in labour supply elasticities
g ftpt_hours = .
replace ftpt_hours = 0 if tmwkhrs == 0
replace ftpt_hours = 20 if tmwkhrs > 0 & tmwkhrs <= 25
replace ftpt_hours = 40 if tmwkhrs > 25 & tmwkhrs < .
// Divide total monthly labour earnings by weeks worked times normed hours
g wage = tpearn / (ftpt_hours*4*rmwkwjb/rwksperm)
Check out our ’prepare_panelpart.do’ code for how we created other variables.
To work with the panel data we need to create a unique person id that lets Stata know what the panel unit is. You could do this as follows.
egen id = group(ssuid pnum)
g monthly_date = ym(refyear,monthcode)
xtset id monthly_date
Finally, adding additional variables or determining what the codes correspond to can be a bit tricky. We
show you an example for how to generate a dummy for ’married’ here. First we need to find any variable that has ’married’ in the label:
lookfor married
> storage display value
>variable name type format label variable label
>---------------------------------------------------------------------------
>ems byte %12.0g Is . . . currently married, . . .
tab ems
> Is . . . |
> or never |
> married? | Freq . Percent Cum .
>------------+-----------------------------------
> 1 | 502,104 54.07 54.07
> 2 | 17,628 1.90 55.96
> 3 | 11,412 1.23 57.19
> 4 | 113,676 12.24 69.43
> 5 | 24,576 2.65 72.08
> 6 | 259,308 27.92 100.00
>------------+-----------------------------------
> Total | 928,704 100.00
Then we need to find out what '1', '2' etc. correspond to. You can find this in the SIPP Codebook available on the Census Bureau SIPP homepage; look up the entry for 'ems' there.
Now we are ready to label the ems values and create a dummy for ’married’.
label define ems 1 "1. Married spouse present" 2 "2. Married spouse absent" 3 "3. Widowed" ///
    4 "4. Divorced" 5 "5. Separated" 6 "6. Never married"
label values ems ems
tab ems
> Is ... currently married, |
>        widowed, divorced, |
>       separated, or never |
>                  married? |      Freq.     Percent        Cum.
>---------------------------+-----------------------------------
> 1. Married spouse present |    502,104       54.07       54.07
>  2. Married spouse absent |     17,628        1.90       55.96
>                3. Widowed |     11,412        1.23       57.19
>               4. Divorced |    113,676       12.24       69.43
>              5. Separated |     24,576        2.65       72.08
>          6. Never married |    259,308       27.92      100.00
>---------------------------+-----------------------------------
>                     Total |    928,704      100.00
g married = ems == 1 | ems == 2