CEE 6640 Fall 2019
HW3: Conditional Logit
Due: 10/11/2019
Note: For submission, please prepare a zip file containing your written report and your R
code. Name your zip file using the following format: HW3 FamilyName GivenName.zip
(Only zip files will be accepted.)
Part 1: EMPIRICS
I Health Tests for Tay–Sachs (TS) and Cystic Fibrosis (CF).
Problem and Data Description
For this problem you will work with the data set tay_sachs.xlsx. This data set contains
4176 records of 216 subjects, each facing 16 choice situations with 4 alternatives: subjects
were asked whether they would choose to receive diagnostic tests for Tay–Sachs (TS) disease,
Cystic Fibrosis (CF), both, or neither (the 4 alternatives). Covariates include the cost of the
test, whether the person’s doctor recommends taking the test, risk factors, and alternative-specific
constants (ASCs). Sample members are from the general population. Hint: the data
are in long shape. Reshape the data set using mlogit.data().
The following table describes the variables in the data set:
Variable    | Description                                                            | Type/Level
----------- | ---------------------------------------------------------------------- | -------------------------------------
id          | Individual ID                                                          | nominal
cid         | Choice occasion ID                                                     | nominal
alt         | Alternative ID                                                         | nominal
choice      | 1 if chosen, 0 otherwise                                               | outcome (binary)
asc_ts      | ASC for TS test                                                        | 0, 1
asc_cf      | ASC for CF test                                                        | 0, 1
asc_ts_cf   | ASC for both tests                                                     | 0, 1
cost_ts     | Cost to patient of being tested for TS                                 | (0, 150, 300, 600) / 1000
cost_cf     | Cost to patient of being tested for CF                                 | (0, 375, 750, 1500) / 1000
cost_ts_cf  | Cost to patient of being tested for both TS and CF                     | (0, 150, . . . , 1800, 2100) / 1000
recommended | Whether doctor recommends patient to have a test                       | -1 (no), 1 (yes)
chance      | Chance that patient is a carrier even if the test is negative          | (15, 30, 45, 60) / 10
couple      | Whether patient is told carrier status as an individual or as a couple | -1 (individual), 1 (couple)
risk_ts     | Risk of being a carrier for TS                                         | log base 10 of (.004, .04, .4, 4) × 10^3
risk_cf     | Risk of being a carrier for CF                                         | log base 10 of (.004, .04, .4, 4) × 10^3
Using the health care data set, answer the following questions:
Questions:
EIQ1. (5 pts)
Using 80% of the sample (a random subsample), train a conditional logit (MNL) model with
ASCs using the gmnl() function from the gmnl package in R. Consider all attributes provided in the data set.
Discuss your results in terms of the interpretation of the sign (as marginal utility), magnitude
(as odds ratios), and statistical significance of the estimates. Note: the data set has a panel
structure, so you need to sample 80% of the subjects (not of the individual records). Please use the following seed:
set.seed(6640).
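Because the split must be made by subject rather than by row, a minimal sketch of the idea follows. It is written in Python for illustration only: the assignment expects R, and Python's random number generator will not reproduce the draws of R's set.seed(6640). It assumes each long-format record is a dict with an `id` key; the function name `split_by_subject` is hypothetical.

```python
import random


def split_by_subject(records, id_key="id", train_frac=0.8, seed=6640):
    """Split long-format records into training and testing sets by subject.

    Every row (choice occasion x alternative) of a subject lands in the same
    subset, so the panel structure is preserved and no subject is in both sets.
    """
    ids = sorted({r[id_key] for r in records})
    rng = random.Random(seed)  # Python RNG: draws differ from R's set.seed(6640)
    n_train = round(train_frac * len(ids))
    train_ids = set(rng.sample(ids, n_train))
    train = [r for r in records if r[id_key] in train_ids]
    test = [r for r in records if r[id_key] not in train_ids]
    return train, test


# Toy panel: 10 subjects x 2 choice occasions x 4 alternatives = 80 rows.
records = [{"id": s, "cid": c, "alt": a}
           for s in range(1, 11) for c in range(2) for a in range(4)]
train, test = split_by_subject(records)
print(len(train), len(test))  # 64 16
```

Sampling row indices instead would scatter one subject's choice occasions across both sets.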
EIQ2. (10 pts)
First, use the training sample to show that the predicted shares are exactly the same as the
actual shares. Second, repeat the same prediction exercise for the testing data (the remaining
20% of the subjects). Discuss your results.
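The first part follows from the MNL first-order conditions: with a full set of ASCs, the score for each ASC is the sum over choice situations of (observed choice indicator minus predicted probability), so at the MLE the average predicted probability of each alternative equals its observed share in the estimation sample. Nothing forces this equality on the testing data. The Python sketch below (illustrative only; the assignment expects R) computes both sets of shares from predicted probabilities; all helper names are hypothetical.

```python
import math


def logit_probs(utilities):
    """MNL choice probabilities for one choice situation (softmax of utilities)."""
    m = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def predicted_shares(prob_rows):
    """Average predicted probability of each alternative across choice situations."""
    n = len(prob_rows)
    return [sum(row[j] for row in prob_rows) / n for j in range(len(prob_rows[0]))]


def actual_shares(chosen, n_alts):
    """Observed share of each alternative; chosen[t] is the index chosen in situation t."""
    n = len(chosen)
    return [sum(1 for c in chosen if c == j) / n for j in range(n_alts)]


# Toy check: for an ASC-only model the MLE is ASC_j = ln(share_j / share_0),
# so predicted and actual shares coincide on the estimation sample.
chosen = [0, 1, 1, 2, 0, 1]
act = actual_shares(chosen, 3)
ascs = [math.log(s / act[0]) for s in act]
pred = predicted_shares([logit_probs(ascs) for _ in chosen])
```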
EIQ3. (5 pts)
What happens to the estimates and the predicted market shares if you change the reference
alternative? Discuss your results.
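Only utility differences enter the logit probabilities. In a model where the ASCs are the only alternative-specific terms, switching the reference alternative therefore re-expresses each ASC as a difference from the new reference, while generic coefficients, the log-likelihood, and predicted shares stay the same. A small Python check of that invariance, using made-up numbers rather than estimates from tay_sachs.xlsx (the helper `change_reference` is hypothetical):

```python
import math


def logit_probs(utilities):
    """MNL choice probabilities for one choice situation."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def change_reference(ascs, new_ref):
    """Re-express ASCs relative to a different reference alternative.

    Subtracting the same constant from every alternative's utility leaves the
    logit probabilities unchanged, so only the ASC values move.
    """
    shift = ascs[new_ref]
    return [a - shift for a in ascs]


# Hypothetical ASCs and generic cost term, with alternative 0 as the reference.
ascs_ref0 = [0.0, 0.8, -0.3, 1.2]
beta_cost = -1.5
cost = [0.0, 0.3, 0.75, 1.05]
ascs_ref3 = change_reference(ascs_ref0, 3)  # alternative 3 becomes the reference

p_ref0 = logit_probs([a + beta_cost * c for a, c in zip(ascs_ref0, cost)])
p_ref3 = logit_probs([a + beta_cost * c for a, c in zip(ascs_ref3, cost)])
```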
EIQ4. (5 pts)
Using the training dataset, estimate a model with only ASCs. What happens to the
predicted market shares for the training and testing datasets? Discuss your results.
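For an ASC-only model the log-likelihood is $\sum_j N_j \ln P_j$, which is maximized at $P_j = N_j / N$; the ASCs are then $\ln(N_j / N_{ref})$, and every choice situation receives the same predicted probabilities. On the training data the predicted shares therefore equal the observed training shares, while on the testing data the model can only predict the training shares again. A Python sketch with made-up choices (function names hypothetical):

```python
import math
from collections import Counter


def fit_asc_only(chosen, n_alts, ref=0):
    """Closed-form MLE of an ASC-only MNL: ASC_j = ln(N_j / N_ref).

    Assumes every alternative is chosen at least once in `chosen`.
    """
    counts = Counter(chosen)
    return [math.log(counts[j] / counts[ref]) for j in range(n_alts)]


def asc_only_probs(ascs):
    """Choice probabilities implied by ASCs alone (identical for every person)."""
    expu = [math.exp(a) for a in ascs]
    total = sum(expu)
    return [e / total for e in expu]


def shares(chosen, n_alts):
    """Observed share of each alternative."""
    n = len(chosen)
    return [chosen.count(j) / n for j in range(n_alts)]


# Hypothetical training and testing choices over 4 alternatives.
train_choices = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3]
test_choices = [0, 0, 1, 3]

ascs = fit_asc_only(train_choices, 4)
pred = asc_only_probs(ascs)  # the same prediction for every choice situation
```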
II Residential Heating Systems.
Problem and Data Description
For this problem you will work with the file heating_system.xlsx. This data set contains
residential heating choices of 900 households, with a choice set of 5 alternatives. There are
2 alternative-specific attributes (installation and operating costs) and 4 household-specific
variables, as described in the following table.
Variable | Description                                                                      | Type/Level
-------- | -------------------------------------------------------------------------------- | -----------
idcase   | Individual ID                                                                    | nominal
depvar   | Choice of heating system: one of gc (gas central), gr (gas room),                | categorical
         | ec (electric central), er (electric room), hp (heat pump)                        |
ic.j     | Installation cost for heating system j (defined for the 5 heating systems)      | continuous
oc.j     | Annual operating cost for heating system j (defined for the 5 heating systems)  | continuous
income   | Annual income of the household                                                   | continuous
agehed   | Age of the household head                                                        | continuous
rooms    | Number of rooms in the house                                                     | continuous
region   | Regional location of the house                                                   | categorical
Questions:
EIIQ1. (5 pts)
Using the gmnl() function and the full sample, estimate a conditional logit model with the
following predictors: ASCs, installation cost (ic), operating cost (oc), household income (income),
age of the household head (agehed), and number of rooms in the house (rooms). Discuss the sign
and significance of the estimates. Hint: the data are in wide shape. Reshape the data set
using mlogit.data().
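A sketch of what the wide-to-long reshape does, written in Python for illustration (in R, mlogit.data() with shape = "wide" performs this step). It assumes the wide cost columns are named ic.gc, ic.gr, and so on, following the ic.j / oc.j pattern in the variable table; the household values are invented. Note that income, agehed, rooms, and region are copied unchanged to every alternative's row: because they do not vary across alternatives, they can only enter the utility through alternative-specific coefficients, with one alternative normalized to zero.

```python
ALTS = ["gc", "gr", "ec", "er", "hp"]
HOUSEHOLD_VARS = ["income", "agehed", "rooms", "region"]


def wide_to_long(rows, alts=ALTS):
    """Turn one-row-per-household data into one row per household x alternative."""
    long_rows = []
    for row in rows:
        for alt in alts:
            rec = {
                "idcase": row["idcase"],
                "alt": alt,
                "choice": row["depvar"] == alt,  # True only for the chosen system
                "ic": row[f"ic.{alt}"],          # alternative-specific attributes
                "oc": row[f"oc.{alt}"],
            }
            for var in HOUSEHOLD_VARS:           # household-specific: same on every row
                rec[var] = row[var]
            long_rows.append(rec)
    return long_rows


# One hypothetical household (all values invented for illustration).
household = {"idcase": 1, "depvar": "gc", "income": 7, "agehed": 25,
             "rooms": 6, "region": "A"}
for i, alt in enumerate(ALTS):
    household[f"ic.{alt}"] = 800.0 + 100 * i
    household[f"oc.{alt}"] = 200.0 + 50 * i

long_rows = wide_to_long([household])
```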
EIIQ2. (10 pts)
Consider a 1% increase in the operating cost (oc) of central gas heating (gc). Using your own
code, provide estimates of the direct and cross probability elasticities. Discuss your results.
Hint: for your code, use the expressions for the elasticities that were reviewed in lectures.
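For a linear-in-parameters MNL, the standard point elasticities with respect to an attribute $x_{ik}$ of alternative $i$ are $E_{ii} = \beta_k x_{ik}(1 - P_i)$ (direct) and $E_{ji} = -\beta_k x_{ik} P_i$ for $j \neq i$ (cross). A Python sketch of these expressions for one household, checked against a finite difference; the coefficient, ASCs, and costs are hypothetical, not estimates from heating_system.xlsx.

```python
import math

BETA_OC = -0.007                          # hypothetical oc coefficient
ASC = [0.0, 0.5, -1.0, 0.3, 0.2]          # hypothetical ASCs (gc, gr, ec, er, hp)
OC = [170.0, 110.0, 450.0, 390.0, 190.0]  # hypothetical operating costs


def logit_probs(utilities):
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def utilities(oc):
    """Representative utility: ASC plus the oc term (other terms omitted)."""
    return [a + BETA_OC * c for a, c in zip(ASC, oc)]


def oc_elasticities(oc, target):
    """Elasticity of every alternative's probability w.r.t. oc of `target`."""
    p = logit_probs(utilities(oc))
    x, pt = oc[target], p[target]
    return [BETA_OC * x * (1 - pt) if j == target else -BETA_OC * x * pt
            for j in range(len(p))]


elas = oc_elasticities(OC, target=0)  # w.r.t. oc of gc

# Finite-difference check: (dP_j / P_j) / (dx / x) for a tiny relative change h.
h = 1e-6
p0 = logit_probs(utilities(OC))
oc_up = list(OC)
oc_up[0] *= 1 + h
p1 = logit_probs(utilities(oc_up))
numeric = [((b - a) / a) / h for a, b in zip(p0, p1)]
```

Repeating this for each household with its own costs and the estimated coefficient gives the elasticities for the whole sample.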
EIIQ3. (10 pts)
Consider now a 10% increase in the operating cost (oc) of central gas heating (gc). Use your
elasticity code, as well as your own logit probability code, to determine the percent change
in the choice probabilities of all alternatives for all individuals in the sample. Discuss your
results. Hint: you have to check that the percent change that you obtain from the elasticity
code and that from the probability evaluation code are the same.
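A Python sketch of that comparison, reusing hypothetical values as a stand-in for the estimated model. Because a point elasticity is a derivative, elasticity × 10 is a first-order approximation to the exact percent change obtained by re-evaluating the logit probabilities; for a finite 10% increase the two are close but not necessarily identical, and the gap grows with the size of the change. Function names are hypothetical.

```python
import math

BETA_OC = -0.007                          # hypothetical oc coefficient
ASC = [0.0, 0.5, -1.0, 0.3, 0.2]          # hypothetical ASCs (gc, gr, ec, er, hp)
OC = [170.0, 110.0, 450.0, 390.0, 190.0]  # hypothetical operating costs


def logit_probs(utilities):
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def utilities(oc):
    return [a + BETA_OC * c for a, c in zip(ASC, oc)]


def pct_change_exact(oc, target, pct):
    """Percent change in each probability after raising oc[target] by pct percent."""
    p0 = logit_probs(utilities(oc))
    oc_new = list(oc)
    oc_new[target] *= 1 + pct / 100
    p1 = logit_probs(utilities(oc_new))
    return [100 * (b - a) / a for a, b in zip(p0, p1)]


def pct_change_from_elasticity(oc, target, pct):
    """First-order approximation: percent change ~= elasticity x percent change in oc."""
    p = logit_probs(utilities(oc))
    x, pt = oc[target], p[target]
    elas = [BETA_OC * x * (1 - pt) if j == target else -BETA_OC * x * pt
            for j in range(len(p))]
    return [e * pct for e in elas]


exact_10 = pct_change_exact(OC, 0, 10)
approx_10 = pct_change_from_elasticity(OC, 0, 10)
gap_1 = max(abs(a - b) for a, b in zip(pct_change_exact(OC, 0, 1),
                                       pct_change_from_elasticity(OC, 0, 1)))
gap_10 = max(abs(a - b) for a, b in zip(exact_10, approx_10))
```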
EIIQ4. (5 pts)
Look at the elasticities you produced: are they linear? Discuss.
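One way to probe linearity is to compare the exact percent change in $P_{gc}$ per percentage point of cost increase across several increase sizes, and to re-evaluate the direct elasticity $\beta_{oc}\, oc_{gc} (1 - P_{gc})$ at a higher cost. With the hypothetical numbers below (Python, illustrative only), neither quantity is constant, because both $oc_{gc}$ and $P_{gc}$ move as the cost rises.

```python
import math

BETA_OC = -0.007                          # hypothetical oc coefficient
ASC = [0.0, 0.5, -1.0, 0.3, 0.2]          # hypothetical ASCs (gc, gr, ec, er, hp)
OC = [170.0, 110.0, 450.0, 390.0, 190.0]  # hypothetical operating costs


def logit_probs(utilities):
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def utilities(oc):
    return [a + BETA_OC * c for a, c in zip(ASC, oc)]


def direct_elasticity(oc, target):
    """Direct point elasticity of P(target) w.r.t. its own oc."""
    p = logit_probs(utilities(oc))
    return BETA_OC * oc[target] * (1 - p[target])


def pct_change_gc(pct):
    """Exact percent change in P(gc) when oc of gc rises by pct percent."""
    p0 = logit_probs(utilities(OC))[0]
    oc_new = list(OC)
    oc_new[0] *= 1 + pct / 100
    p1 = logit_probs(utilities(oc_new))[0]
    return 100 * (p1 - p0) / p0


# Percent change in P(gc) per 1% increase in oc, for several increase sizes.
ratios = {pct: pct_change_gc(pct) / pct for pct in (1, 10, 50)}

elas_base = direct_elasticity(OC, 0)
oc_up = list(OC)
oc_up[0] *= 1.5
elas_up = direct_elasticity(oc_up, 0)  # elasticity re-evaluated at a 50% higher cost
```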
EIIQ5. (5 pts)
Is the IIA property reflected in your elasticity calculations?
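Under the MNL, the cross elasticity $-\beta_k x_{ik} P_i$ does not depend on which other alternative $j$ is considered, so a change in gc's operating cost shifts all other probabilities by the same percentage and leaves their ratios unchanged, which is how IIA shows up in the elasticities. A Python check with hypothetical values:

```python
import math

BETA_OC = -0.007                          # hypothetical oc coefficient
ASC = [0.0, 0.5, -1.0, 0.3, 0.2]          # hypothetical ASCs (gc, gr, ec, er, hp)
OC = [170.0, 110.0, 450.0, 390.0, 190.0]  # hypothetical operating costs


def logit_probs(utilities):
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def utilities(oc):
    return [a + BETA_OC * c for a, c in zip(ASC, oc)]


p0 = logit_probs(utilities(OC))
oc_new = list(OC)
oc_new[0] *= 1.10  # 10% increase in oc of gc
p1 = logit_probs(utilities(oc_new))

# Ratios of probabilities for every pair of non-gc alternatives, before and after.
ratios_before = [p0[j] / p0[k] for j in range(1, 5) for k in range(1, 5) if j != k]
ratios_after = [p1[j] / p1[k] for j in range(1, 5) for k in range(1, 5) if j != k]

# Percent change in each non-gc probability.
pct_changes = [100 * (b - a) / a for a, b in zip(p0[1:], p1[1:])]
```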
Part 2: METHODS
MI
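Consider the following structural model

$$y_i^* = x_i'\beta + \varepsilon_i$$

and its measurement equation

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $y_i^*$ is a latent variable we don't observe, $y_i$ is what we observe in the data set, and $x_i$ is
a $K \times 1$ vector of predictors.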
MIQ1. (10 pts)
Write the likelihood function which, if maximized, will yield an estimator for the model’s
parameter vector $\beta$.
MIQ2. (5 pts)
Suppose now that $\varepsilon_i \mid x_i \overset{iid}{\sim} \Lambda(0, 1)$ (logistic). Write the specific log-likelihood function.
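Assuming the binary measurement equation $y_i = 1$ if $y_i^* = x_i'\beta + \varepsilon_i > 0$ and $y_i = 0$ otherwise, symmetry of the logistic distribution gives $P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta)$, so under this assumption the log-likelihood is $\ln L(\beta) = \sum_i [\, y_i \ln \Lambda(x_i'\beta) + (1 - y_i) \ln(1 - \Lambda(x_i'\beta)) \,]$. A numerically stable Python evaluation, useful for checking a derivation; function names and data are hypothetical.

```python
import math


def log_sigmoid(z):
    """ln Λ(z) = -ln(1 + e^{-z}), computed without overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def binary_logit_loglik(beta, X, y):
    """Sum over i of y_i ln Λ(x_i'β) + (1 - y_i) ln(1 - Λ(x_i'β))."""
    ll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        # 1 - Λ(z) = Λ(-z) for the logistic CDF
        ll += yi * log_sigmoid(z) + (1 - yi) * log_sigmoid(-z)
    return ll


X = [[1.0, -1.0], [1.0, 0.0], [1.0, 2.0]]  # first column: constant (hypothetical data)
y = [0, 1, 1]
```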
MIQ3. (5 pts)
Provide the MLE for $\beta$, say $\hat{\beta}_{ML}$, and interpret the meaning of the parameters in this model.
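Under the binary logit assumption $P(y_i = 1 \mid x_i) = \Lambda(x_i'\beta)$, the first-order conditions $\sum_i (y_i - \Lambda(x_i'\beta))\, x_i = 0$ have no closed-form solution, so $\hat{\beta}_{ML}$ is found numerically. A minimal Python sketch of Newton-Raphson using the Hessian $-\sum_i \Lambda_i (1 - \Lambda_i)\, x_i x_i'$; names are hypothetical and the toy data are invented. For interpretation, $\beta_k$ is the change in the log-odds of $y_i = 1$ per unit change in $x_{ik}$ (so $e^{\beta_k}$ is an odds ratio), and the marginal effect on the probability is $\beta_k \Lambda(x_i'\beta)(1 - \Lambda(x_i'\beta))$.

```python
import math


def sigmoid(z):
    """Logistic CDF Λ(z), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve A s = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_logit_newton(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the binary logit MLE, starting from beta = 0."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # negative Hessian
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                score[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, score)  # beta_new = beta + info^{-1} score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented, non-separable toy data: constant plus one regressor.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
X = [[1.0, x] for x in xs]
beta_hat = fit_logit_newton(X, y)
odds_ratio = math.exp(beta_hat[1])  # multiplicative change in odds per unit of x
```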
MII
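Suppose that a person is faced with three discrete choices, 1, 2, and 3, depending on the value
of a latent variable:

$$y_i^* = x_i'\beta + \varepsilon_i$$

$$y_i = \begin{cases} 1 & \text{if } y_i^* \le \mu_1 \\ 2 & \text{if } \mu_1 < y_i^* \le \mu_2 \\ 3 & \text{if } y_i^* > \mu_2 \end{cases}$$

where $y_i^*$ is a utility function (or latent variable), which we don't observe but individual $i$
does, $y_i$ is the observed choice, $\beta \in B$ with $B$ a $K$-dimensional parameter space, and $\mu_1$, $\mu_2$
are unknown parameters.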
MIIQ1. (10 pts)
Write the likelihood function which, if maximized, will yield estimators for the model’s
parameters, β, µ1, µ2.
MIIQ2. (5 pts)
Suppose now that $\varepsilon_i \mid x_i \overset{iid}{\sim} \Lambda(0, 1)$ (logistic). Write the specific log-likelihood function.
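Assuming the ordered structure $y_i = 1$ if $y_i^* \le \mu_1$, $2$ if $\mu_1 < y_i^* \le \mu_2$, and $3$ if $y_i^* > \mu_2$, with $y_i^* = x_i'\beta + \varepsilon_i$, the logistic assumption gives $P(y_i = 1) = \Lambda(\mu_1 - x_i'\beta)$, $P(y_i = 2) = \Lambda(\mu_2 - x_i'\beta) - \Lambda(\mu_1 - x_i'\beta)$, and $P(y_i = 3) = 1 - \Lambda(\mu_2 - x_i'\beta)$; the log-likelihood sums the log of the probability of each observed category. A Python sketch that evaluates it, assuming $\mu_1 < \mu_2$ and that $x_i$ has no constant column (a constant would not be separately identified from the thresholds); names and data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic CDF Λ(z), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ordered_logit_probs(xb, mu1, mu2):
    """P(y=1), P(y=2), P(y=3) given the index x'β and thresholds mu1 < mu2."""
    if not mu1 < mu2:
        raise ValueError("thresholds must satisfy mu1 < mu2")
    c1 = sigmoid(mu1 - xb)
    c2 = sigmoid(mu2 - xb)
    return [c1, c2 - c1, 1.0 - c2]


def ordered_logit_loglik(beta, mu1, mu2, X, y):
    """Sum over i of ln P(y_i | x_i) for categories coded 1, 2, 3."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, xi))
        ll += math.log(ordered_logit_probs(xb, mu1, mu2)[yi - 1])
    return ll


X = [[-1.0], [0.2], [1.5], [0.0]]  # single regressor, no constant (hypothetical data)
y = [1, 2, 3, 2]
```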
MIIQ3. (5 pts)
Provide the MLE for $\beta$, say $\hat{\beta}_{ML}$, and interpret the meaning of the parameters in this model.
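For interpretation under the same assumed ordered logit structure, the sign of $\beta_k$ fixes the direction of the effect only for the extreme categories: $\partial P(y_i = 1)/\partial x_{ik} = -\beta_k \lambda(\mu_1 - x_i'\beta)$ and $\partial P(y_i = 3)/\partial x_{ik} = \beta_k \lambda(\mu_2 - x_i'\beta)$, where $\lambda(z) = \Lambda(z)(1 - \Lambda(z))$ is the logistic density, while the middle category's effect $\beta_k [\lambda(\mu_1 - x_i'\beta) - \lambda(\mu_2 - x_i'\beta)]$ can take either sign. A Python sketch that checks these expressions against finite differences; numbers and names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic CDF Λ(z), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_pdf(z):
    """Logistic density λ(z) = Λ(z)(1 - Λ(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def ordered_logit_marginal_effects(beta_k, xb, mu1, mu2):
    """dP(y=j)/dx_k for j = 1, 2, 3 in the ordered logit."""
    f1 = logistic_pdf(mu1 - xb)
    f2 = logistic_pdf(mu2 - xb)
    return [-beta_k * f1, beta_k * (f1 - f2), beta_k * f2]


def probs(xb, mu1, mu2):
    c1, c2 = sigmoid(mu1 - xb), sigmoid(mu2 - xb)
    return [c1, c2 - c1, 1.0 - c2]


# Hypothetical single-regressor example: x'β = beta_k * x.
beta_k, x, mu1, mu2 = 0.8, 0.5, -0.5, 1.0
me = ordered_logit_marginal_effects(beta_k, beta_k * x, mu1, mu2)

# Forward finite difference in x for comparison.
h = 1e-6
numeric = [(b - a) / h for a, b in zip(probs(beta_k * x, mu1, mu2),
                                       probs(beta_k * (x + h), mu1, mu2))]
```

The three marginal effects sum to zero because the category probabilities always sum to one.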