Econ78010: Econometrics for Economic Analysis, Fall 2023

Homework #3

Due date: Dec. 4th, 2023; 1pm.

Do not copy and paste answers from your classmates. Two identical homework submissions will be treated as cheating. Do not copy and paste the entire output of your statistical package. Report only the relevant part of the output. Please also submit your R script for the empirical part. Please put all your work in one single file and upload it via Moodle.

Part I Multiple Choice (30 points in total, 3 points each)

Please choose the answer that you think is appropriate.

1.1 A nonlinear function

a. makes little sense, because variables in the real world are related linearly.

b. can be adequately described by a straight line between the dependent variable and one of the explanatory variables.

c. is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a line in four dimensions.

d. is a function with a slope that is not constant.

1.2 To test whether or not the population regression function is linear rather than a polynomial of order r,

a. check whether the regression R² for the polynomial regression is higher than that of the linear regression.

b. compare the TSS from both regressions.

c. look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the polynomial regression should be used.

d. use the test of (r-1) restrictions using the F-statistic.

1.3 In the regression model Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui, where X is a continuous variable and D is a binary variable, β3

a. indicates the slope of the regression when D = 1

b. has a standard error that is not normally distributed even in large samples since D is not a normally distributed variable.

c. indicates the difference in the slopes of the two regressions.

d. has no meaning since (Xi × Di) = 0 when Di = 0.

1.4 The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1Xi + ui is as follows:

a. 1% change in X is associated with a β1% change in Y.

b. 1% change in X is associated with a change in Y of 0.01β1.

c. change in X by one unit is associated with a 100β1% change in Y.

d. change in X by one unit is associated with a β1 change in Y.

1.5 The major flaw of the linear probability model is that

a. the actuals can only be 0 and 1, but the predicted are almost always different from that.

b. the regression R² cannot be used as a measure of fit.

c. people do not always make clear-cut decisions.

d. the predicted values can lie above 1 and below 0.

1.6 In the expression Pr(Y = 1|X1) = Φ(β0 + β1X),

a. (β0 + β1X) plays the role of z in the cumulative standard normal distribution function.

b. β1 cannot be negative since probabilities have to lie between 0 and 1.

c. β0 cannot be negative since probabilities have to lie between 0 and 1.

d. min(β0 + β1X) > 0 since probabilities have to lie between 0 and 1.

1.7 In the expression Pr(deny = 1|P/I ratio, black) = Φ(-2.26 + 2.74 × P/I ratio + 0.71 × black), the effect of increasing the P/I ratio from 0.3 to 0.4 for a white person

a. is 0.274 percentage points.

b. is 6.1 percentage points.

c. should not be interpreted without knowledge of the regression R².

d. is 2.74 percentage points.

1.8 E(Y|X1, ..., Xk) = Pr(Y = 1|X1, ..., Xk) means that:

A) for a binary variable model, the predicted value from the population regression is the probability that Y = 1, given X.

B) dividing Y by the X's is the same as the probability of Y being the inverse of the sum of the X's.

C) the exponential of Y is the same as the probability of Y happening.

D) you are pretty certain that Y takes on a value of 1 given the X's.

1.9 For the measure of fit in your probit regression model, you can meaningfully use the:

A) regression R².

B) size of the regression coefficients.

C) pseudo R².

D) standard error of the regression.

1.10 Your textbook plots the estimated regression function produced by the probit regression of deny on P/I ratio. The estimated probit regression function has a stretched S shape given that the coefficient on the P/I ratio is positive. Consider a probit regression function with a negative coefficient. The shape would

a. resemble an inverted S shape (for low values of X, the predicted probability of Y would approach 1)

b. not exist since probabilities cannot be negative

c. remain the S shape as with a positive slope coefficient

d. would have to be estimated with a logit function

Part II Short Questions (32 points in total)

(10 points) 2.1 Dr. Qin would like to analyze the return to education and the gender gap. The equation below shows the regression result using the 2005 Current Population Survey. LnEarnings is the logarithm of monthly earnings; educ is years of education; DFemme is a dummy variable equal to 1 if the individual is female; exper is work experience, measured in years; Midwest, South and West are dummy variables indicating the region of residence, with Northeast as the omitted region. Interpret the main results (discuss the estimates for all variables) and address the question that Dr. Qin wants to analyze.

Predicted LnEarnings = 1.215 + 0.0899 × educ - 0.521 × DFemme + 0.0180 × (DFemme × educ)
                       (0.018)  (0.0011)        (0.022)          (0.0016)

                     + 0.0232 × exper - 0.000368 × exper² - 0.058 × Midwest - 0.0078 × South - 0.030 × West
                       (0.0008)         (0.000018)          (0.006)           (0.006)          (0.006)

n = 57,863, adjusted R² = 0.242
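One way to read estimates like these is to combine the reported coefficients directly. The sketch below is an illustration in Python only (the assignment itself asks for an R script), and the function names are hypothetical. It assumes that the gender dummy, squared experience and the region dummies enter with negative signs, and that the interaction term enters with a positive sign.

```python
# Coefficients as reported for question 2.1.
# ASSUMPTION: DFemme and exper^2 enter negatively, DFemme x educ positively.
B_EDUC = 0.0899
B_FEMALE = -0.521
B_FEMALE_EDUC = 0.0180
B_EXPER = 0.0232
B_EXPER_SQ = -0.000368


def return_to_education(female: bool) -> float:
    """Approximate proportional change in earnings per extra year of education."""
    return B_EDUC + (B_FEMALE_EDUC if female else 0.0)


def gender_gap(educ: float) -> float:
    """Predicted log earnings of women minus men at a given education level."""
    return B_FEMALE + B_FEMALE_EDUC * educ


def experience_turning_point() -> float:
    """Years of experience at which the quadratic experience profile peaks."""
    return -B_EXPER / (2.0 * B_EXPER_SQ)


if __name__ == "__main__":
    print(f"return to education, men:   {return_to_education(False):.4f}")
    print(f"return to education, women: {return_to_education(True):.4f}")
    for years in (12, 16):  # illustrative education levels, not from the data
        print(f"log-earnings gap at educ={years}: {gender_gap(years):.3f}")
    print(f"experience profile peaks at {experience_turning_point():.1f} years")
```

Because the dependent variable is in logs, each of these quantities is an approximate proportional effect; whether the gender difference in the return to education is statistically significant still has to be judged from the standard error on the interaction term.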

(14 points) 2.2 Sports economics typically looks at winning percentages of sports teams as one of various outputs, and estimates production functions by analyzing the relationship between the winning percentage and inputs. In Major League Baseball (MLB), the determinants of winning are quality pitching and batting. The data cover all 30 MLB teams for the 1999 season. Pitching quality is approximated by Team Earned Run Average (teamera), and hitting quality by On Base Plus Slugging Percentage (ops). Your regression output is:

Winpct = -0.19 - 0.099 × teamera + 1.49 × ops, R² = 0.92
         (0.08)  (0.008)           (0.126)

(a) (3 points) Interpret the regression. Are the results statistically significant and important?

(b) (8 points) There are two leagues in MLB, the American League (AL) and the National League (NL). One major difference is that the pitcher in the AL does not have to bat. Instead there is a designated hitter in the hitting line-up. You are concerned that, as a result, there is a different effect of pitching and hitting in the AL from the NL. To test this hypothesis, you allow the AL regression to have a different intercept and different slopes from the NL regression. You therefore create a binary variable for the American League (DAL) and estimate the following specification:

Winpct = -0.29 + 0.10 × DAL - 0.100 × teamera + 0.008 × (DAL × teamera)
         (0.12)  (0.24)       (0.008)           (0.018)

       + 1.622 × ops - 0.187 × (DAL × ops), R² = 0.92
         (0.163)       (0.160)

How should you interpret the winning percentage for the AL and the NL? Can you tell whether the effects of pitching and hitting differ between the AL and the NL? If so, by how much?
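In a fully interacted specification like this one, the coefficients for the group with the dummy equal to 1 are the baseline coefficients plus the corresponding interaction terms. The following Python sketch shows that bookkeeping; the dictionary and function names are hypothetical, the signs of the coefficients are an assumption of the sketch (negative intercept, negative teamera, negative DAL × ops), and the team statistics used in the demo are made-up inputs.

```python
# Coefficients from the interacted specification in 2.2(b).
# ASSUMPTION: signs as stated in the lead-in; DAL = 1 for AL, 0 for NL.
NL = {"intercept": -0.29, "teamera": -0.100, "ops": 1.622}
AL_SHIFT = {"intercept": 0.10, "teamera": 0.008, "ops": -0.187}
SHIFT_SE = {"intercept": 0.24, "teamera": 0.018, "ops": 0.160}


def league_coefficients(al: bool) -> dict:
    """Implied intercept and slopes for one league."""
    if not al:
        return dict(NL)
    return {name: NL[name] + AL_SHIFT[name] for name in NL}


def shift_t_statistics() -> dict:
    """t-statistic for each AL-specific term (estimate divided by its standard error)."""
    return {name: AL_SHIFT[name] / SHIFT_SE[name] for name in AL_SHIFT}


def predicted_winpct(teamera: float, ops: float, al: bool) -> float:
    """Predicted winning percentage for a team with the given statistics."""
    coef = league_coefficients(al)
    return coef["intercept"] + coef["teamera"] * teamera + coef["ops"] * ops


if __name__ == "__main__":
    print("NL:", league_coefficients(False))
    print("AL:", league_coefficients(True))
    print("t-statistics on AL terms:", shift_t_statistics())
    # Hypothetical team: ERA of 4.5 and OPS of 0.75.
    print("NL prediction:", predicted_winpct(4.5, 0.75, al=False))
    print("AL prediction:", predicted_winpct(4.5, 0.75, al=True))
```

The individual t-statistics only test one AL-specific term at a time; a joint test of all three requires an F-statistic.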

(3 points) (c) You remember that sequentially testing the significance of slope coefficients is not the same as testing for their significance simultaneously. Hence you ask your regression package to calculate the F-statistic that all three coefficients involving the binary variable for the AL are zero. Your regression package gives a value of 0.35. Looking at the critical value from the F-table, can you reject the null hypothesis at the 1% level? Should you worry about the small sample size?

(8 points) 2.3 Four hundred driver's license applicants were randomly selected and asked whether they passed their driving test (Pass_i = 1) or failed their test (Pass_i = 0); data were also collected on their gender (Male_i = 1 if male and = 0 if female) and their years of driving experience (Experience_i, in years). Using these data, a probit model is estimated, with the following result.

Predicted Pr(Pass = 1) = Φ(0.806 + 0.041 × Experience - 0.174 × Male - 0.015 × Male × Experience)
                           (0.200)  (0.156)              (0.259)        (0.019)

The cumulative standard normal distribution table is appended.

(2 points) (a) Alpha is a man with 12 years of driving experience. What is the probability that he will pass the test?

(2 points) (b) Belta is a woman with 5 years of driving experience. What is the probability that she will pass the test?

(4 points) (c) Does the effect of experience on test performance depend on gender? Explain.
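Predicted probabilities from a probit model come from evaluating the standard normal CDF at the fitted index. The Python sketch below shows the mechanics using only the standard library instead of the appended table; the function names are hypothetical, and the negative signs on the Male and Male × Experience terms are an assumption of the sketch.

```python
import math

# Probit coefficients for question 2.3.
# ASSUMPTION: Male and Male x Experience enter with negative signs.
B0 = 0.806
B_EXPERIENCE = 0.041
B_MALE = -0.174
B_MALE_EXPERIENCE = -0.015


def std_normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_index(experience: float, male: bool) -> float:
    """Value of the linear index inside the normal CDF."""
    m = 1.0 if male else 0.0
    return B0 + B_EXPERIENCE * experience + B_MALE * m + B_MALE_EXPERIENCE * m * experience


def pass_probability(experience: float, male: bool) -> float:
    """Predicted probability of passing the driving test."""
    return std_normal_cdf(probit_index(experience, male))


def experience_index_slope(male: bool) -> float:
    """Effect of one more year of experience on the index, by gender."""
    return B_EXPERIENCE + (B_MALE_EXPERIENCE if male else 0.0)


if __name__ == "__main__":
    print(f"man, 12 years:  {pass_probability(12, male=True):.3f}")
    print(f"woman, 5 years: {pass_probability(5, male=False):.3f}")
    print(f"index slope, men:   {experience_index_slope(True):.3f}")
    print(f"index slope, women: {experience_index_slope(False):.3f}")
```

The slope of the index differs by gender through the interaction coefficient, so whether that difference is statistically meaningful depends on comparing the interaction estimate with its standard error.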

Part III Empirical Exercise (38 points in total)

For all regressions, please report the heteroskedasticity-robust standard errors.

(16 points) 3.1 Please use vote2023.dta to answer the following questions. The following models can be used to study whether campaign expenditures affect election outcomes:

voteA = β0 + β1log(expendA) + β2log(expendB) + u (1)

voteA = β0 + β1log(expendA) + β2log(expendB) + β3prtystrA + u (2)

where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures (in thousands of dollars) by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A's party).

(4 points) (i) Please run regression (1) and report your result in a table. Does A's expenditure affect the outcome, and how? What about B's expenditure? (Hint: you need to first create the variables ln(expendA) and ln(expendB).)

(8 points) (ii) Please run regression (2) and report your result in the same table. Does A's expenditure affect the outcome, and how? What about B's expenditure? Compare the results from (i) and (ii), and explain whether we should include prtystrA in the regression or not. If we exclude it, in which direction does the coefficient of interest tend to be biased?

(4 points) (iii) Can you tell whether a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure? How? Please suggest a regression or test and then answer the question according to your result.
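One common way to approach a question like this is to rewrite the model so that the hypothesis becomes a restriction on a single coefficient. The derivation below is a sketch of that idea applied to model (2), not necessarily the approach the instructor intends; θ is a symbol introduced here for illustration.

```latex
% Offsetting effects of equal percentage increases correspond to
H_0:\ \beta_1 + \beta_2 = 0 \qquad \text{vs.} \qquad H_1:\ \beta_1 + \beta_2 \neq 0

% Define \theta = \beta_1 + \beta_2, so that \beta_1 = \theta - \beta_2. Substituting into (2):
voteA = \beta_0 + \theta \log(expendA)
        + \beta_2 \left[ \log(expendB) - \log(expendA) \right]
        + \beta_3\, prtystrA + u
```

In this form the coefficient on log(expendA) is θ itself, so its heteroskedasticity-robust t-statistic tests the restriction directly. An F-test of β1 + β2 = 0 in the original regression is an equivalent alternative, and the same substitution works for model (1) by dropping the prtystrA term.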

(22 points) 3.2 Use the data set insurance.dta to answer the following questions. Please read the description file to understand the meanings of the variables.

For the following questions, please use only observations from those who report their health status as healthy.

(4 points) (a) Generate a new variable age2 = age × age. Estimate a linear probability model with insured as the dependent variable and the following regressors: selfemp age age2 deg_ged deg_hs deg_ba deg_ma deg_phd deg_oth race_wht race_ot reg_ne reg_so reg_we male married. Please report the regression outcome in a table. How does health insurance status vary with age? Is there a nonlinear relationship between the probability of being insured and age?

(4 points) (b) Estimate a probit model using the same regressors as in (a). Please report the regression outcome in the same table as in (a). How does insurance status vary with age according to this model?

(6 points) (c) Please drop the variable age2 and estimate the probit model with the remaining regressors. Please report the regression outcome in the same table as in (a). Does dropping age2 affect the fit of the model? How does insurance status vary with age according to this model? Are the self-employed less likely to have health insurance than wage earners? How does self-employment status affect insurance purchase for individuals aged 30? For individuals aged 40?

(4 points) (d) Estimate a logit model using the same regressors as in (c). Please report the regression outcome in the same table. Is the effect of self-employment on insurance different for married workers than for unmarried workers?

(4 points) (e) Use a linear probability model to answer the question: Is the effect of self-employment on insurance different for married workers than for unmarried workers? Is your answer consistent with the answer in (d)?
