Analysis of FinTech Data
Assignment #3
Download the data from the following link.
http://www.dropbox.com/sh/62m5jr0t4vpbeyp/AAASXqr3ZUlC71b3FYDxmLJJa?dl=0
NOTE: Use all available resources to solve the problems. You can find solutions to most of the coding
problems on the Internet. If you get stuck, Google it.
Q1. Load the data into R. How many variables and observations are in the data?
Q2. How many applicants are currently employed? Of those currently employed, how many are self-employed?
Q3. What is the average monthly income of the whole sample? What is the average monthly income of the
currently employed applicants?
Q4. Generate a histogram of “loan_amount.” Do you notice any interesting patterns in the graph? What might
explain its shape?
Q5. Replace the value of “friends_facebook” with NA wherever it is 0. What is the average number of Facebook
friends among applicants who have a Facebook account?
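The assignment expects R code; the Python sketch below only illustrates the logic of Q5 on made-up values (not the assignment dataset). A 0 is treated as "no Facebook account", recoded to a missing value, and the mean is taken over the non-missing values only, which is what `mean(x, na.rm = TRUE)` does in R.

```python
# Toy values for illustration only; not taken from the assignment dataset.
friends_facebook = [0, 120, 45, 0, 300]

# Treat 0 as "no Facebook account" by recoding it to a missing value (None).
recoded = [None if v == 0 else v for v in friends_facebook]

# Average over applicants who have an account, i.e. skip the missing values.
with_account = [v for v in recoded if v is not None]
avg_friends = sum(with_account) / len(with_account)
print(avg_friends)  # 155.0
```

Averaging over the original column instead would count the zeros and pull the mean down, which is why the recoding step matters.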
Q6. Generate a scatterplot of “month_of_service” and “credit_score”. Can you find any relationship
between them? What about “monthly_income” and “credit_score”? Confirm each relationship with a
correlation test.
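For reference, R's `cor.test()` uses the Pearson method by default. Below, x and y stand for a pair of variables (for example “month_of_service” and “credit_score”), and n is the number of complete pairs. The coefficient r and the t statistic used to test the null hypothesis of zero correlation (ρ = 0) are:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r\sqrt{\frac{n-2}{1-r^2}} \sim t_{n-2} \quad \text{under } H_0\colon \rho = 0
```

A small p-value for this t statistic indicates that the linear association seen in the scatterplot is unlikely to be due to chance alone; r itself gives its direction and strength.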
Q7. Create a new variable named “automatic_approved” that has the value “t” if the application was approved
by the decision engine, “f” if it was rejected by the decision engine, and “NA” if it was reviewed manually. How
many cases were approved, and how many were rejected, by the decision engine? How many are classified as
“manual review”?
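The assignment does not name the column that records the decision engine's outcome, so the sketch below assumes a hypothetical list `engine_outcome` with hypothetical codes "approve", "reject" and "manual"; the real dataset's column name and codes may differ. It is written in Python to show the mapping logic only (the assignment asks for R), with `None` standing in for R's NA.

```python
from collections import Counter

# Hypothetical decision-engine outcomes; the real dataset's column name and
# codes are not given in the assignment and may differ.
engine_outcome = ["approve", "reject", "manual", "approve", "manual"]

# Map each outcome to "t", "f", or None (standing in for R's NA).
CODE = {"approve": "t", "reject": "f", "manual": None}
automatic_approved = [CODE[o] for o in engine_outcome]

# Count approvals, rejections, and manual reviews.
counts = Counter(automatic_approved)
print(counts["t"], counts["f"], counts[None])  # 2 1 2
```

Using an explicit lookup table makes it easy to check that every raw code is mapped; an unexpected code raises a `KeyError` instead of being silently miscounted.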
Q8. Compare the automatically approved cases with the automatically rejected cases. Conduct t-tests on the
variables available in the dataset to answer the following subquestions.
1) Are they different in “loan_amount”?
2) Are they different in “tenor”?
3) Are they different in “age”?
4) Are they different in “month_of_service”?
5) Are they different in “residential_status”?
6) Are they different in “monthly_income”?
7) Are they different in “bankrupted”?
8) Are they different in “currently_employed”?
9) Are they different in “channel”?
10) Are they different in “language”?
11) Are they different in “credit_score”?
12) Are they different in “friends_facebook”?
13) Are they different in “location_application”?
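R's `t.test()` defaults to `var.equal = FALSE`, i.e. Welch's two-sample t-test, which does not assume equal variances in the two groups. The Python sketch below computes the same statistic and the Welch-Satterthwaite degrees of freedom by hand, to show what each comparison in Q8 is doing; the two groups are made-up numbers, not values from the dataset.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (variance() uses the n - 1 denominator).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up values for one variable in the two groups (illustration only).
approved = [10, 12, 14]
rejected = [4, 6, 8]

t_stat, df = welch_t(approved, rejected)
print(round(t_stat, 3), round(df, 3))  # 3.674 4.0
```

The p-value then comes from a t distribution with `df` degrees of freedom, which is what `t.test()` reports alongside the statistic.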
Q9. Create a new variable named “automatic_approved_dummy” that has the value 1 if
automatic_approved = “t”, and 0 otherwise. Develop a regression model of approval by the decision engine,
using “automatic_approved_dummy” as the dependent variable (DV). Include all relevant independent variables
in the model.
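Note that "0 otherwise" covers both the rejected cases and the manual-review (NA) cases. This matters in R: `ifelse(automatic_approved == "t", 1, 0)` returns NA, not 0, wherever `automatic_approved` is NA, so the missing values need explicit handling (for example with `is.na()`). The Python sketch below shows the intended coding on a made-up vector, with `None` standing in for NA.

```python
# automatic_approved as built in Q7: "t", "f", or None (manual review / NA).
# Made-up values for illustration only.
automatic_approved = ["t", "f", None, "t", None]

# 1 if automatically approved, 0 otherwise: rejected AND manual-review cases
# both become 0, as the question specifies.
automatic_approved_dummy = [1 if v == "t" else 0 for v in automatic_approved]
print(automatic_approved_dummy)  # [1, 0, 0, 1, 0]
```

Because the DV is binary, logistic regression (in R, `glm()` with `family = binomial`) is one common modelling choice, although the question does not prescribe a specific model type.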
Q10. Based on the analysis results above, explain the logic the decision engine uses to judge an application as “approve”.
Q11. Develop the best classification model you can to reduce the manual review workload. Which classification
models will you choose? What are the sensitivity and specificity of your model? Provide a table that reports the
sensitivity and specificity of each of your models.
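Sensitivity is TP / (TP + FN), the share of actual positives the model catches, and specificity is TN / (TN + FP), the share of actual negatives it correctly rejects. The Python sketch below computes both from 0/1 labels, assuming (as a choice, not something the assignment fixes) that "approved" (dummy = 1) is the positive class; the labels are made up. Running it once per candidate model gives one row of the requested table.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for 0/1 labels, treating 1 as positive.

    Assumes both classes occur in `actual`; otherwise a denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # approved, predicted approved
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # approved, predicted rejected
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # rejected, predicted rejected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # rejected, predicted approved
    return tp / (tp + fn), tn / (tn + fp)


# Made-up actual outcomes and one model's predictions (illustration only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 0, 0, 1]

print(sensitivity_specificity(actual, predicted))  # (0.75, 1.0)
```

Evaluating these on data not used to fit the model (for example a held-out test set) gives a fairer comparison between candidate models.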
Q12. Because your classification model is not perfect, managers are concerned that a new decision engine
based on it could accept applications that should be rejected, or reject applications that should be accepted.
What do you suggest to address their concerns?
Guidelines for Assignment 3:
Submit your answer sheet and the R code used for the analysis to the course Blackboard site. Include your
student number and name in the header of the document. Format your answer sheet as follows:
Times New Roman, 12-point font, double-spaced (not 1.5-spaced), 1-inch margins on all sides, 8.5 x 11-inch
(or A4) paper. Your answer sheet should not exceed two A4 pages.