Date: 2020-03-19 09:12

Assignment 6

Due: 3/18

Note: Show all your work.

Problem 1 (10 points) Consider the following confusion matrix.

[Confusion matrix: rows = actual class, columns = predicted class]

Note: C1 is positive and C2 is negative.

Compute the sensitivity, specificity, precision, accuracy, F-measure, F2, and MCC measures.
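
The measures named in Problem 1 can be sketched from the four cells of a 2x2 confusion matrix as below. This is only a minimal illustration: the counts passed in at the bottom are hypothetical (they are not the assignment's matrix), the function name `confusion_metrics` is made up for this sketch, and it assumes every denominator is nonzero.

```python
import math

def confusion_metrics(tp, fn, fp, tn):
    """Compute common measures from a 2x2 confusion matrix (C1 = positive)."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)

    def f_beta(beta):
        # F_beta weights recall beta times as much as precision.
        b2 = beta * beta
        return (1 + b2) * precision * sensitivity / (b2 * precision + sensitivity)

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f_beta(1.0),
        "f2": f_beta(2.0),
        "mcc": mcc,
    }

# Hypothetical counts, NOT taken from the assignment.
m = confusion_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Substituting the counts from the actual matrix gives values that can be checked against a hand calculation.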

Problem 2 (10 points) Suppose you built two classifier models M1 and M2 from the same training dataset and tested them on the same test dataset using 10-fold cross-validation. The error rates obtained over 10 iterations (in each iteration the same training and test partitions were used for both M1 and M2) are given in the table below. Determine whether there is a significant difference between the two models using the statistical method that we discussed in class (this method is also discussed in Section 8.5.5, pp. 372-373 of the textbook). Use a significance level of 1%. If there is a significant difference, which model is better?

Iteration   M1     M2
1           0.13   0.19
2           0.12   0.1
3           0.09   0.12
4           0.15   0.1
5           0.03   0.07
6           0.07   0.05
7           0.2    0.1
8           0.14   0.11
9           0.12   0.07
10          0.14   0.11

Note: When you calculate var(M1 - M2), use the sample variance (not the population variance).
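
The pairwise comparison in Problem 2 can be sketched as a paired t-test on the per-iteration differences, with k - 1 degrees of freedom and the sample variance as the note requires. This is a sketch under assumptions, not the course's reference solution: the function name `paired_t_statistic` is invented here, and the critical value 3.250 (two-sided test at 1%, 9 degrees of freedom) is hard-coded from a standard t-distribution table and should be checked against the table used in class.

```python
import math
import statistics

# Error rates copied from the table in Problem 2.
m1 = [0.13, 0.12, 0.09, 0.15, 0.03, 0.07, 0.20, 0.14, 0.12, 0.14]
m2 = [0.19, 0.10, 0.12, 0.10, 0.07, 0.05, 0.10, 0.11, 0.07, 0.11]

def paired_t_statistic(err_a, err_b):
    """Return (t, degrees_of_freedom) for paired error rates."""
    diffs = [a - b for a, b in zip(err_a, err_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    var_diff = statistics.variance(diffs)  # sample variance: divides by k - 1
    t = mean_diff / math.sqrt(var_diff / k)
    return t, k - 1

# Assumed table value: two-sided test, significance 1%, 9 degrees of freedom.
T_CRITICAL = 3.250

t_stat, dof = paired_t_statistic(m1, m2)
significant = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, df = {dof}, significant at 1%: {significant}")
```

Because the test is two-sided, the 1% significance level is split into 0.5% per tail when looking up the critical value.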

Problem 3 (10 points). The following table shows the test results of a classifier on a dataset.

Tuple_id Actual Class Probability

Problem 3-1. For each row, compute TP, FP, TN, FN, TPR, and FPR.

Problem 3-2. Plot the ROC curve for the dataset. You must draw the curve yourself (i.e., don’t use Weka, R, or other software to generate the curve).
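
The per-row counts in Problem 3 can be sketched as a threshold sweep: sort the tuples by decreasing probability, then treat each row's probability as the threshold, so that row and every row above it are predicted positive. The sketch below only builds the table of values; it does not draw the curve. The five tuples are hypothetical (they are not the assignment's data), the labels "P" and "N" and the name `roc_rows` are assumptions of this sketch, and tied probabilities are not handled specially.

```python
def roc_rows(records):
    """records: list of (tuple_id, actual_label, probability), label 'P' or 'N'."""
    ranked = sorted(records, key=lambda r: r[2], reverse=True)
    total_pos = sum(1 for _, label, _ in ranked if label == "P")
    total_neg = len(ranked) - total_pos

    rows = []
    tp = fp = 0
    for tuple_id, label, prob in ranked:
        # Lowering the threshold to this row's probability predicts it positive.
        if label == "P":
            tp += 1
        else:
            fp += 1
        rows.append({
            "tuple_id": tuple_id,
            "TP": tp,
            "FP": fp,
            "TN": total_neg - fp,
            "FN": total_pos - tp,
            "TPR": tp / total_pos,
            "FPR": fp / total_neg,
        })
    return rows

# Hypothetical tuples, NOT taken from the assignment.
sample = [(1, "P", 0.90), (2, "N", 0.80), (3, "P", 0.60),
          (4, "P", 0.55), (5, "N", 0.30)]
rows = roc_rows(sample)
for row in rows:
    print(row)
```

Each (FPR, TPR) pair is one point on the ROC curve, which starts at (0, 0) and ends at (1, 1).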

Problem 4 (10 points). This problem is practice in comparing the performance of classifier models using ROC curves. You can plot ROC curves using Weka Knowledge Flow. On the Blackboard course web site, I posted a Weka Manual under Course Documents. How to use Knowledge Flow is described in Chapter 7. Following the instructions in the manual (especially Section 7.4.2), build and test Logistic and RandomForest classifiers on the crx-data.arff dataset, and capture a screenshot that shows the two ROC curves. Include this screenshot in your submission. Compare and discuss the performance of the two models using the ROC curves.
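
One common way to put a number on an ROC comparison is the area under the curve (AUC). The problem asks only for a discussion of the curves, so using AUC is an assumption of this sketch rather than a stated requirement. The code applies the trapezoidal rule to a list of (FPR, TPR) points; both curves below are hypothetical and are not output from Logistic or RandomForest, and the name `auc_trapezoid` is made up for illustration.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given (fpr, tpr) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per segment
    return area

# Hypothetical ROC points, NOT produced by Weka.
curve_a = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
curve_b = [(0.0, 0.0), (0.2, 0.5), (0.5, 0.8), (1.0, 1.0)]

auc_a = auc_trapezoid(curve_a)
auc_b = auc_trapezoid(curve_b)
print(f"AUC A = {auc_a:.3f}, AUC B = {auc_b:.3f}")
```

A curve that lies closer to the top-left corner has a larger area. When two curves cross, a single AUC value can hide the region where each model does better, so the shapes are still worth discussing.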

Problem 5 (Extra Credit 10 points). This problem is practice in using Weka to perform t-tests to compare the performance of classifier models. Instructions are given in the Experimenter chapter (Chapter 6) of the Weka 3.8 Manual. It is your responsibility to read the manual and learn how to use Weka’s Experimenter to perform t-tests.

For this problem, build three classifier models, Naïve Bayes, Multilayer Perceptron (neural network), and J48, from the crx-data.arff dataset, which you used in Problem 4. Then perform t-tests and determine the ranks of the classifier models based on the test results. You must show, step by step, screenshots of every Weka Experimenter screen that you went through, and you must explain how you determined the ranks.

Submission:

Include all answers in a single file and name it lastName_firstName_HW6.EXT, where “EXT” is an appropriate file extension (e.g., docx or pdf). If you have multiple files, combine them into a single archive file named lastName_firstName_HW6.EXT, where “EXT” is an appropriate archive file extension (e.g., zip or rar). Upload the file to Blackboard.

