
Date: 2019-12-06 10:45

Coursework Assignment 2


The ‘Diabetes’ data set (provided in ARFF format and available on Blackboard) contains information about patients tested for diabetes. The task is to predict whether or not each patient has diabetes (Histology: Yes or No).



Each instance represents an individual patient, described by various medical attributes together with a diabetes classification.

Number of Instances: 768

Number of Attributes: 9

1. Pregnancies: Number of pregnancies

2. PG Concentration: Plasma glucose at 2 hours in an oral glucose tolerance test

3. Diastolic BP: Diastolic blood pressure (mm Hg)

4. Tri Fold Thick: Triceps skin fold thickness (mm)

5. Serum Ins: 2-hour serum insulin (mu U/ml)

6. BMI: Body mass index (weight in kg / (height in m)^2)

7. DP Function: Diabetes pedigree function

8. Age: Age (years)

9. Diabetes: Whether or not the person has diabetes
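The ARFF layout mentioned above is plain text: a header of `@attribute` declarations followed by a `@data` section of comma-separated rows. As a rough sketch only, the Python snippet below reads a tiny sample in that layout to check the attribute and instance counts. The attribute names and values in `SAMPLE_ARFF` are made up for illustration and are not taken from the real file, and the hypothetical `parse_arff` helper ignores ARFF features such as quoted names containing spaces and sparse rows. The assignment itself is still meant to be carried out in Weka.

```python
import csv
import io

# Made-up sample in ARFF layout. The attribute names and values are
# illustrative assumptions, not rows from the real Diabetes file.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation diabetes_sample
@attribute preg numeric
@attribute plas numeric
@attribute mass numeric
@attribute class {Yes,No}
@data
2,130,31.5,Yes
1,90,24.0,No
"""


def parse_arff(text):
    """Parse a simple dense ARFF text into (attribute names, rows).

    Values are kept as strings; quoted attribute names with spaces and
    sparse-format rows are not supported in this sketch.
    """
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank and comment lines
            continue
        if in_data:
            values = next(csv.reader(io.StringIO(line)))
            rows.append(dict(zip(names, values)))
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>": the name is the second token
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


names, rows = parse_arff(SAMPLE_ARFF)
print(len(names), "attributes,", len(rows), "instances")
```

Run against the real file, the two counts should match the figures stated above (9 attributes, 768 instances).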



You should use the Weka data mining package, which is installed on the university computers and is also available for download from: http://www.cs.waikato.ac.nz/~ml/weka/


You should hand in a report covering the following:


a) Select a suitable tree-building algorithm and build a model. Describe the validation method you use (how the data is split into training and test sets). Interpret the output (the accuracy rates and other metrics, which attributes were used to make predictions, and how many nodes and leaves the tree has).

b) Give a detailed technical description of the classification model (which algorithm is used, the tree induction method, and which attribute selection criterion is used and how). Include a diagram showing the structure of the model you built.

c) Vary the following parameters of the algorithm and report the changes in tree structure and accuracy rates:

- Set the ‘REP’ parameter (Reduced Error Pruning) to ‘TRUE’. Explain what this operation does. Report and discuss any change in the model structure and accuracy.

- Change the confidence factor to 15%, then report and discuss any impact.

- Set the ‘unpruned’ parameter to ‘TRUE’. Report and discuss the impact. Discuss the pruning method used by this algorithm.

d) Use two other models of your choice (for example, neural networks or SVM) to predict the histology. Compare the results and discuss possible reasons for better or worse performance.

e) Show a confusion matrix for the model and interpret it. Show a ROC curve and a lift chart and interpret them.

f) Convert a subtree path of the decision tree into a set of rules along the following attributes: Plasma – Mass – Age – Plasma – Pedigree – Class ‘Yes’.
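For item f), one way to think about the conversion is that each node on the path contributes one test, and the rule is the conjunction of those tests ending in the leaf's class. The Python sketch below illustrates that idea only. Every threshold and comparison direction in `path` is a hypothetical placeholder, since the real values have to be read off the tree that Weka builds, and the helper names `path_to_rule` and `rule_fires` are invented for this example.

```python
import operator

# Comparison operators that can appear on a numeric split.
OPS = {"<=": operator.le, ">": operator.gt}

# One (attribute, operator, threshold) test per node on the path.
# All thresholds and directions are HYPOTHETICAL placeholders.
path = [
    ("Plasma", ">", 100.0),
    ("Mass", ">", 30.0),
    ("Age", "<=", 40.0),
    ("Plasma", ">", 150.0),
    ("Pedigree", ">", 0.5),
]


def path_to_rule(path, label):
    """Render a root-to-leaf path as a single IF ... THEN rule."""
    conditions = " AND ".join(f"{attr} {op} {thr}" for attr, op, thr in path)
    return f"IF {conditions} THEN Class = {label}"


def rule_fires(path, instance):
    """Return True when the instance satisfies every test on the path."""
    return all(OPS[op](instance[attr], thr) for attr, op, thr in path)


rule = path_to_rule(path, "Yes")
print(rule)

# A made-up patient record used only to exercise the rule.
patient = {"Plasma": 160, "Mass": 35.0, "Age": 25, "Pedigree": 0.9}
print(rule_fires(path, patient))
```

Because Plasma is tested twice on this path, the two conditions could be merged into the tighter one when writing the final rule set.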

