Data set assignment help, Python and Java programming help, C/C++ design help - Python programming help

The ‘Diabetes’ data set (provided in .arff format and available on Blackboard) contains information about patients examined for diabetes. The task is to predict whether or not each patient has diabetes (Diabetes: Yes or No).

Each instance represents an individual patient, described by various medical attributes together with a diabetes classification.

Number of Instances: 768

Number of Attributes: 9

1. Pregnancies: Number of pregnancies

2. PG Concentration: Plasma glucose concentration at 2 hours in an oral glucose tolerance test

3. Diastolic BP: Diastolic blood pressure (mm Hg)

4. Tri Fold Thick: Triceps skin fold thickness (mm)

5. Serum Ins: 2-hour serum insulin (μU/ml)

6. BMI: Body mass index (weight in kg / (height in m)^2)

7. DP Function: Diabetes pedigree function

8. Age: Age (years)

9. Diabetes: Whether or not the person has diabetes

You should use the Weka data mining package, which is installed on the university computers and is also available to download from: http://www.cs.waikato.ac.nz/~ml/weka/

You should hand in a report covering the following:

a) Select a suitable tree-building algorithm and build a model. Describe the validation method you use (for example, how the data is split into training and test sets). Interpret the output (the accuracy rates and other metrics, which attributes were used to make predictions, and how many nodes and leaves the tree has).
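The two validation methods most often described for part a) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Weka's implementation: the function names, the 66% training share and the seed are assumptions chosen for the example, and Weka performs the equivalent steps through its "Test options" panel.

```python
import random

def train_test_split(rows, test_fraction=0.34, seed=1):
    """Hold-out validation: shuffle a copy of the rows, then cut it in two."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def k_fold_indices(n, k=10, seed=1):
    """k-fold cross-validation: every instance is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

With 768 instances, a single hold-out split gives one accuracy figure that depends on the shuffle, whereas cross-validation averages over k test folds and is usually the more stable estimate to report.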

b)Give a detailed technical description of the classification model (which algorithm is used, the tree induction method, which attribute selection criteria is used and how). Include a diagram showing the structure of the model that you built.
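The brief does not name the algorithm, but the parameters listed in part c) match the options of Weka's J48 (an implementation of C4.5), so the sketch below assumes that choice. C4.5 selects attributes by gain ratio; for a numeric attribute it evaluates binary splits of the form `value <= threshold`. The function names and the toy data in the usage note are illustrative assumptions, not values from the Diabetes data set.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on a numeric attribute."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    # Weighted entropy remaining after the split.
    remainder = sum(len(part) / n * entropy(part) for part in (left, right))
    info_gain = entropy(labels) - remainder
    # Split information penalises very unbalanced splits.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return info_gain / split_info
```

For example, `gain_ratio([1, 2, 3, 4], ["No", "No", "Yes", "Yes"], 2)` scores a split that separates the two classes perfectly, while a threshold of 1 on the same toy data scores lower because one branch stays mixed.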

c) Vary the following parameters of the algorithm and report the changes in tree structure and accuracy rates:

- Set the ‘REP’ parameter (Reduced Error Pruning) to ‘TRUE’. Explain the meaning of this operation. Report and discuss any change in the model structure and accuracy.

- Change the confidence factor to 15%. Report and discuss any impact.

- Set the ‘unpruned’ parameter to ‘TRUE’. Report and discuss the impact. Discuss the pruning method used by this algorithm.
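To see why the confidence factor changes the tree, it can help to compute the pessimistic error estimate that C4.5-style pruning relies on. The sketch below uses the common textbook normal approximation of the upper confidence limit on a leaf's error rate; it assumes the algorithm is J48 (default confidence factor 0.25), and Weka's own implementation may differ in detail. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pessimistic_error_rate(errors, n, confidence=0.25):
    """Upper confidence limit on the error rate of a leaf that
    misclassifies `errors` of its `n` training instances
    (normal approximation to the binomial, textbook form)."""
    z = NormalDist().inv_cdf(1 - confidence)  # one-sided normal deviate
    f = errors / n                            # observed error rate
    numerator = f + z * z / (2 * n) + z * sqrt(
        f / n - f * f / n + z * z / (4 * n * n)
    )
    return numerator / (1 + z * z / n)
```

A smaller confidence factor gives a larger z, hence a more pessimistic estimate for every leaf, so more subtrees get replaced by leaves. For a leaf with 2 errors out of 6 instances, the estimate at 15% is higher than at 25%, and both exceed the observed rate of 2/6. Reduced error pruning replaces this estimate with errors measured on a held-out part of the training data, and the ‘unpruned’ setting skips the pruning step altogether.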

d) Use two other models of your choice (for example, neural networks or SVM) to predict the diabetes class. Compare the results and discuss possible reasons for better or worse performance.

e) Show a confusion matrix for the model and interpret it. Show a ROC curve and a lift chart and interpret them.
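The three evaluation tools in part e) are all derived from the same inputs: the actual classes, the predicted classes and the predicted probability of ‘Yes’. A minimal sketch, assuming ‘Yes’ is the positive class and using hypothetical function names; Weka produces the same quantities in its classifier output and visualisation windows.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def roc_points(actual, scores, positive="Yes"):
    """(FPR, TPR) points obtained by lowering the decision threshold."""
    pos = sum(a == positive for a in actual)
    neg = len(actual) - pos
    ranked = sorted(zip(scores, actual), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit one point per distinct score so that ties are handled together.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def lift_at(actual, scores, fraction, positive="Yes"):
    """Positive rate among the top-scored fraction, relative to the base rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), reverse=True)]
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(a == positive for a in ranked[:k]) / k
    base_rate = sum(a == positive for a in ranked) / len(ranked)
    return top_rate / base_rate
```

When interpreting the results, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, and a lift above 1 in the top-scored fraction means the model concentrates diabetic patients there better than random selection would.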

f) Convert a subtree path of the decision tree into a set of rules, following these attributes in order: Plasma – Mass – Age – Plasma – Pedigree – Class ‘Yes’.
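Converting a root-to-leaf path into a rule amounts to joining the tests along the path with AND. The sketch below shows the mechanics only: every threshold in `example_path` is a made-up placeholder, not a value from any real tree, and should be replaced with the split values read from the tree you actually built. The helper names are hypothetical.

```python
import operator

OPS = {"<=": operator.le, ">": operator.gt}

def path_to_rule(path, outcome):
    """Join the (attribute, operator, threshold) tests of one path into a rule."""
    conditions = " AND ".join(f"{attr} {op} {value}" for attr, op, value in path)
    return f"IF {conditions} THEN Diabetes = {outcome}"

def matches(path, patient):
    """True if a patient (dict of attribute values) satisfies every test."""
    return all(OPS[op](patient[attr], value) for attr, op, value in path)

# Placeholder thresholds for illustration only (NOT taken from a real model).
example_path = [
    ("Plasma", ">", 127),
    ("Mass", ">", 29.9),
    ("Age", ">", 30),
    ("Plasma", "<=", 157),
    ("Pedigree", ">", 0.5),
]
rule = path_to_rule(example_path, "Yes")
```

Because Plasma is tested twice on this path, the two conditions can be merged into a single range when the rule is written up, which makes it easier to read.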


Date: 2019-12-06 10:45
