Marketing Analytics – Homework 3
Individual Assignment
MS Section: Due 1 PM, October 9th
MBA Section: Due 1 PM, October 10th
This dataset gives the characteristics of applicants to a major credit card. The key dependent
variable is card, which indicates whether a consumer was approved for a credit card. The
remaining variables contain other relevant information about each consumer. The data are based on a
real-world setting that appeared in Greene (2003).
However, I have modified the initial dataset, while keeping the relationships between variables
mostly intact. Download the training dataset that corresponds to the last digit of your student
number. That is, if your student number ends in 4, you should download “Homework 3 Training
Data – Number 4.csv”.
Assignment Materials for Download:
1. An Rmarkdown template
2. A dataset corresponding to the last digit of your student number.
Submission Checklist:
To help us grade the assignments efficiently and correctly, we ask that you submit your
assignments in a specific format. A complete submission for this assignment consists of the
following files, sent to averyhavivgrading@gmail.com:
o A .rmd Rmarkdown file, based on the template for this assignment with all the code used
to estimate your models.
o A .html file, generated by knitting the .rmd file in RStudio.
o An R workspace containing your two chosen models. I have provided code in the
template to save the models for you. Just insert your student number on line 27 of the
template:
save(chosenModel, chosenModel2, file = '[student number].Rdata')
o All file names should be ‘[student number].[file extension]’, where you replace everything
in the square brackets with the appropriate values.
o Do not archive or combine the files (no zips, rars, web archives, iCloud links,
etc.). Each of these files should be a separate attachment in the email.
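Before sending the email, it may be worth confirming that the workspace file really contains both model objects. The following is a minimal sketch, not part of the template: the toy data, the made-up student number 12345678, and the names workspaceFile, checkEnv, and loadedNames are all assumptions for illustration.

```r
# Hypothetical stand-ins for the two fitted models (toy data, not the assignment data).
toy <- data.frame(card        = c(1, 0, 1, 1, 0, 1),
                  income      = c(4.5, 2.1, 3.3, 5.0, 1.8, 2.9),
                  expenditure = c(120, 0, 80, 200, 0, 60))
chosenModel  <- lm(card ~ income, data = toy)
chosenModel2 <- lm(card ~ income + expenditure, data = toy)

# '12345678' is a made-up student number. tempdir() keeps this sketch from
# writing into your working directory; the real file goes next to your .rmd.
workspaceFile <- file.path(tempdir(), "12345678.Rdata")
save(chosenModel, chosenModel2, file = workspaceFile)

# Load the file into a fresh environment and confirm both objects came back.
checkEnv <- new.env()
loadedNames <- load(workspaceFile, envir = checkEnv)
print(loadedNames)
stopifnot(all(c("chosenModel", "chosenModel2") %in% loadedNames))
```

Loading into a separate environment avoids overwriting objects in your current session while you check the file.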
Data Guide:
card: Boolean. Was the application for a credit card accepted?
reports: Number of major derogatory reports.
age: Age in years plus twelfths of a year.
income: Yearly income (in USD 10,000).
share: Ratio of monthly credit card expenditure to yearly income.
expenditure: Average monthly credit card expenditure.
owner: Boolean. Does the individual own their home?
selfemp: Boolean. Is the individual self-employed?
dependents: Number of dependents.
months: Months living at current address.
majorcards: Number of major credit cards held.
active: Number of active credit accounts.
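As a rough illustration of how a file with these columns might be read and prepared, here is a short sketch. The three rows are made up, and the "yes"/"no" coding of the Boolean columns is an assumption; check how your own CSV encodes card, owner, and selfemp before reusing any of this. The names csvText and applicants are hypothetical.

```r
# Three made-up rows in the layout described by the data guide.
# The "yes"/"no" coding of the Boolean columns is an assumption.
csvText <- "card,reports,age,income,share,expenditure,owner,selfemp,dependents,months,majorcards,active
yes,0,37.5,4.5,0.03,110,yes,no,3,54,1,12
no,3,33.25,2.4,0,0,no,no,3,34,1,13
yes,0,41.0,5.0,0.01,45,yes,yes,1,58,1,5"
applicants <- read.csv(text = csvText, stringsAsFactors = FALSE)

# Recode the Boolean columns as 0/1 so they can enter a linear regression
# as numeric variables (1 = yes, 0 = no).
for (col in c("card", "owner", "selfemp")) {
  applicants[[col]] <- as.numeric(applicants[[col]] == "yes")
}

# Inspect the column types before modelling.
str(applicants)
```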
Part 1: Predictive Analysis (16 Marks)
Now you will estimate a model that predicts whether a consumer is approved for a credit
card, using the dataset that corresponds to the last digit of your student number. This might be
useful to a firm that is selecting which consumers to target, choosing how much to pay for the
contact information of a consumer, or a firm that is simply trying to forecast demand. Firms with
better predictive models will be able to more efficiently target consumers, or make better
purchasing decisions. Similarly, the quality of your predictions will form part of your grade here.
You will submit two predictive models:
a) The first predictive model, stored as chosenModel, should use all the independent variables
except expenditure.
b) The second predictive model, stored as chosenModel2, can use all the provided
independent variables, including expenditure.
Save your models to an R Workspace with the code provided in the template.
To keep the computational burden low, you may only use linear regressions or MARS
models in this section. You can complete this section using the runif, subset, lm,
earth, predict, and mean functions.
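The functions listed above fit together roughly as follows. This is only a sketch on synthetic data, not a recommended specification: the data-generating numbers, the 70/30 split, the seed, and the names isTraining, trainingData, validationData, linearFit, and marsFit are all assumptions. The MARS part runs only if the earth package is installed.

```r
set.seed(436)  # arbitrary seed so the random split is reproducible

# Synthetic stand-in data; column names follow the data guide, values are made up.
n <- 400
applicants <- data.frame(
  reports = rpois(n, 0.4),
  income  = runif(n, 1, 10),
  age     = runif(n, 18, 70)
)
applicants$card <- as.numeric(
  0.8 - 0.3 * applicants$reports + 0.02 * applicants$income +
    rnorm(n, sd = 0.3) > 0.5
)

# Random split with runif(): roughly 70% training, 30% validation.
isTraining     <- runif(n) < 0.7
trainingData   <- subset(applicants, isTraining)
validationData <- subset(applicants, !isTraining)

# Linear regression fit on the training rows only.
linearFit <- lm(card ~ reports + income + age, data = trainingData)
linearMSE <- mean((validationData$card - predict(linearFit, validationData))^2)

# MARS fit with the same variables, if the earth package is available.
if (requireNamespace("earth", quietly = TRUE)) {
  marsFit <- earth::earth(card ~ reports + income + age, data = trainingData)
  marsMSE <- mean((validationData$card - predict(marsFit, validationData))^2)
  print(c(linear = linearMSE, mars = marsMSE))
} else {
  print(c(linear = linearMSE))
}
```

The model with the lower validation error is the better predictor on data it has not seen, which is the same criterion the graders apply to the held-back data.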
Your final submission will include an Rdata file with your two models. We will also look at your
RMD file to see how you trained your models.
Each of the two models will be graded out of 8 marks. The marks will be assigned as follows:
1. Correctly submitting the model will yield 2 out of 8 marks.
2. I have held back a sizable portion of each dataset to evaluate your predictions. The
graders will use this to evaluate the quality of your predictions, in terms of average
out-of-sample mean squared error. They will look at the distribution of predictions for your data
set, and give marks based on the relative quality of your predictions.
3. If your predictions are in the bottom quartile, we will look at the code you submitted. So
long as your code demonstrates that you followed the recommendations below, you will
receive at least 6 marks out of 8.
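For reference, the mean squared error mentioned in item 2 is the average of the squared differences between actual and predicted values. The small worked example below uses four made-up observations; the helper name meanSquaredError and the constant guess of 0.75 are assumptions for illustration, not the graders' actual script.

```r
# Mean squared error: average squared gap between actual and predicted values.
meanSquaredError <- function(actual, predicted) {
  mean((actual - predicted)^2)
}

# Four made-up held-out outcomes (1 = approved) and model predictions.
actual    <- c(1, 0, 1, 1)
predicted <- c(0.9, 0.2, 0.6, 1.1)

modelMSE <- meanSquaredError(actual, predicted)
print(modelMSE)
# (0.01 + 0.04 + 0.16 + 0.01) / 4 = 0.055

# A constant guess of 0.75 for everyone, as a naive point of comparison.
constantMSE <- meanSquaredError(actual, rep(0.75, length(actual)))
print(constantMSE)
# (0.0625 + 0.5625 + 0.0625 + 0.0625) / 4 = 0.1875
```

A lower value means better predictions, so here the model clearly beats the constant guess.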
To improve your predictions, I have the following recommendations:
1. Run at least 10 different model specifications. Someone who assesses 20 models will
find a better model than someone who assesses 2.
2. At the same time, be thoughtful about the models you are running. Look to your
previous model estimates and the data exploration process to see what variables
worked in your context. The best assignment estimated only a fraction of the models
that others did, but they learned with each model they estimated. Other groups
estimated thousands of models, but didn’t think through their approach, and had worse
predictions.
3. Use cross-validation correctly, as described in the notes (including the last step). If you
are in MKT436R, use k-fold cross-validation following the steps in the notes. Do not use
the built-in cross-validation in the earth function, as you will not get consistent results.
4. Tune your model by trying different model specifications. This includes different types,
formulas, and tuning parameters. Vary all three of these.
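One way the recommendations above could be combined is a hand-written k-fold loop over several candidate specifications. The sketch below is an illustration under stated assumptions, not the procedure from the course notes: the data are synthetic, the fold assignment uses sample (which is not among the functions listed earlier), the candidate formulas and tuning values are arbitrary, and refitting the winner on all training rows is only a guess at what the notes call the last step. The MARS candidates are added only if the earth package is installed.

```r
set.seed(2024)  # arbitrary seed for reproducible folds

# Synthetic stand-in data; column names follow the data guide, values are made up.
n <- 500
applicants <- data.frame(
  reports = rpois(n, 0.5),
  income  = runif(n, 1, 10),
  active  = rpois(n, 7)
)
applicants$card <- as.numeric(
  0.9 - 0.35 * applicants$reports + 0.03 * applicants$income +
    rnorm(n, sd = 0.3) > 0.5
)

# Candidates vary the model type, the formula, and (for MARS) tuning parameters.
candidates <- list(
  lm_main     = function(d) lm(card ~ reports + income + active, data = d),
  lm_interact = function(d) lm(card ~ reports * income + I(income^2), data = d)
)
if (requireNamespace("earth", quietly = TRUE)) {
  candidates$mars_degree1 <- function(d)
    earth::earth(card ~ reports + income + active, data = d, degree = 1)
  candidates$mars_degree2 <- function(d)
    earth::earth(card ~ reports + income + active, data = d,
                 degree = 2, thresh = 0.001)
}

# Assign every row to one of nFold folds at random.
nFold  <- 5
foldId <- sample(rep(1:nFold, length.out = n))

# For each candidate, average the validation MSE over the folds.
cvMSE <- sapply(candidates, function(fitModel) {
  foldMSE <- sapply(1:nFold, function(k) {
    trainingData   <- applicants[foldId != k, ]
    validationData <- applicants[foldId == k, ]
    fit <- fitModel(trainingData)
    mean((validationData$card - predict(fit, validationData))^2)
  })
  mean(foldMSE)
})
print(cvMSE)

# Assumed final step: refit the best specification on all of the training data.
bestName    <- names(which.min(cvMSE))
chosenModel <- candidates[[bestName]](applicants)
print(bestName)
```

Writing each candidate as a small fitting function keeps the loop identical for lm and earth models, so adding another specification is a one-line change.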
Please submit all 3 required files.
Bibliography
Greene, W. H. (2003). Econometric Analysis, 5th edition. Upper Saddle River, NJ: Prentice Hall.