
Date: 2019-02-22 09:39

Group coursework 2

Please submit your coursework on Moodle by midday on 1st of March.

Please upload your answers to Question 1 ii) and Question 2 in one pdf file.

Please also upload three R scripts in .R files for Question 1 i), Question 1 ii) and Question 2.

Make sure that you have included sufficient comments in the code to make it readable by other people. There should be no error messages when I run your R scripts. You can assume that I have installed all required packages.

Question 1 [8 marks]

i) Complete the following myLDA function without using any additional packages.

With the feature matrix $X \in \mathbb{R}^{N \times p}$ ($N > p$) and the label vector $y \in \mathbb{R}^{N \times 1}$ of the training data, the myLDA function outputs the linear discriminant $w \in \mathbb{R}^{p \times 1}$ for binary classification. [6 marks]

myLDA <- function(X, y){
  # This function calculates the linear discriminant for binary classification.
  # Input: feature matrix, X (N by p), and label vector, y (N by 1)
  # Output: linear discriminant, w (p by 1)

  return(w)
}
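For reference, the standard two-class Fisher criterion gives one common closed form for such a discriminant. It is stated here only as textbook background; whether it matches the formulation the question intends is an assumption. Writing $x_i^T$ for the $i$-th row of $X$, $N_k$ for the number of training points in class $k \in \{1, 2\}$, and $\hat{\mu}_k$ for the class means:

$$\hat{\mu}_k = \frac{1}{N_k} \sum_{i:\, y_i = k} x_i, \qquad S_W = \sum_{k=1}^{2} \sum_{i:\, y_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T,$$

$$J(w) = \frac{\left(w^T (\hat{\mu}_1 - \hat{\mu}_2)\right)^2}{w^T S_W w}, \qquad J \text{ is maximised by } w \propto S_W^{-1} (\hat{\mu}_1 - \hat{\mu}_2).$$

This requires $S_W$ to be invertible, for which $N > p$ is a necessary condition. Because $J(w)$ does not change when $w$ is multiplied by any nonzero constant, a discriminant is only defined up to scale and sign; using the pooled covariance $S_W/(N-2)$ instead of $S_W$ gives the same direction.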

ii) Calculate the cosine of the angle between the linear discriminant calculated from myLDA(X=iris[51:150,-5],y=iris[51:150,5]) and that calculated from lda(Species~.,data=iris[51:150,]). [You can ignore the warning message from lda that the setosa class is empty.]

The cosine of the angle between two vectors, $u \in \mathbb{R}^{p \times 1}$ and $v \in \mathbb{R}^{p \times 1}$, is defined as

$$\cos(u, v) = \frac{u^T v}{\|u\|_2 \, \|v\|_2},$$

where $\|u\|_2 = \sqrt{u^T u}$ and $\|v\|_2 = \sqrt{v^T v}$.

What conclusion can you draw from this result? [2 marks]
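A minimal base-R sketch of the cosine formula above. The helper name `cosine_angle` is hypothetical (it is not part of the coursework), and the `lda` lines assume the MASS package, which ships with R as a recommended package; for a fitted `lda` object the discriminant coefficients are stored in its `scaling` component.

```r
library(MASS)  # lda(); MASS ships with R as a recommended package

# Hypothetical helper: cos(u, v) = u'v / (||u||_2 * ||v||_2)
cosine_angle <- function(u, v) {
  u <- as.numeric(u)  # flatten, e.g. a p x 1 matrix, to a plain vector
  v <- as.numeric(v)
  stopifnot(length(u) == length(v))
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

# Toy vectors (illustration only)
cosine_angle(c(1, 2, 3), c(2, 4, 6))   # parallel: 1
cosine_angle(c(1, 0), c(0, 1))         # orthogonal: 0
cosine_angle(c(1, 2, 3), -c(1, 2, 3))  # opposite directions: -1

# Discriminant direction from lda() on versicolor vs virginica;
# lda() warns that group setosa is empty, which the question says to ignore.
fit <- suppressWarnings(lda(Species ~ ., data = iris[51:150, ]))
w_lda <- fit$scaling  # 4 x 1 matrix with column "LD1"
# With a completed myLDA(), the requested value would then be
# cosine_angle(myLDA(X = iris[51:150, -5], y = iris[51:150, 5]), w_lda)
```

Since a discriminant is only defined up to a nonzero scalar, the cosine is unaffected by rescaling either vector but flips sign if one of them is negated.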


Question 2 [12 marks]

Download the newthyroid.txt data from Moodle. These data contain measurements for normal patients and for patients with hyperthyroidism. The first variable is class, with class=n if a patient is normal and class=h if the patient suffers from hyperthyroidism. The remaining variables, feature1 to feature5, are medical test measurements.

i) Draw a pairs plot for the newthyroid.txt data. What patterns can you see in this plot? [2 marks]
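A minimal base-R sketch of a pairs plot coloured by class. The file-reading line is commented out because the layout of newthyroid.txt (header, separator) is an assumption; the data frame actually plotted is a simulated stand-in with the same column names, so any pattern it shows is meaningless.

```r
# Reading the real file; header/separator are assumptions about its format:
# thyroid <- read.table("newthyroid.txt", header = TRUE)

# Hypothetical stand-in: random numbers with the same columns
set.seed(1)
thyroid <- data.frame(
  class = factor(sample(c("n", "h"), 60, replace = TRUE)),
  matrix(rnorm(60 * 5), ncol = 5,
         dimnames = list(NULL, paste0("feature", 1:5)))
)

# Scatterplot matrix of the five features, points coloured by class
pairs(thyroid[, paste0("feature", 1:5)],
      col = ifelse(thyroid$class == "h", "red", "blue"),
      pch = 19, cex = 0.6,
      main = "Pairs plot (red = h, blue = n)")
```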

ii) Apply kNN and LDA to classify the newthyroid.txt data: randomly split the data into a training set (70%) and a test set (30%), and repeat the random split 50 times. Record the 50 AUC values of kNN and LDA in two vectors.

For kNN, repeat 5-fold cross-validation five times to choose k from (3, 5, 7, 9). Use AUC as the metric to choose k, i.e. choose the k with the largest AUC. [5 marks]

Hint: Read http://topepo.github.io/caret/model-training-and-tuning.html#model-training-and-parameter-tuning to see how to use AUC as the metric to choose k.
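A minimal sketch of the procedure in ii), following the caret approach the hint points to: `trainControl(method = "repeatedcv", number = 5, repeats = 5)` together with `twoClassSummary` makes `train()` report a cross-validated ROC AUC, and `metric = "ROC"` keeps the k with the largest value. It assumes the caret, pROC and MASS packages are installed, and the split uses plain `sample()` (not stratified by class). The simulated data frame is a hypothetical stand-in with the same column names as newthyroid.txt, so the AUC values it produces mean nothing.

```r
library(caret)  # train(), trainControl(), twoClassSummary(); needs pROC installed
library(MASS)   # lda()

# Hypothetical stand-in for newthyroid.txt: simulated numbers with the same
# columns (class, feature1..feature5). The real file would be read instead.
set.seed(2019)
n <- 150
cls <- factor(sample(c("n", "h"), n, replace = TRUE, prob = c(0.7, 0.3)))
feats <- matrix(rnorm(n * 5), ncol = 5) + 1.5 * (cls == "h")  # shift class h
colnames(feats) <- paste0("feature", 1:5)
thyroid <- data.frame(class = cls, feats)

# Test-set AUC of a score for class "h" (controls = "n", cases = "h")
auc_h <- function(labels, score_h) {
  r <- pROC::roc(labels, score_h, levels = c("n", "h"), direction = "<")
  as.numeric(pROC::auc(r))
}

# Inner tuning for kNN: 5-fold CV repeated 5 times, scored by ROC AUC
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)
k_grid <- data.frame(k = c(3, 5, 7, 9))

n_splits <- 50
auc_knn <- numeric(n_splits)
auc_lda <- numeric(n_splits)

for (s in seq_len(n_splits)) {
  # Outer loop: a fresh random 70% / 30% split
  train_idx <- sample(nrow(thyroid), round(0.7 * nrow(thyroid)))
  train_df <- thyroid[train_idx, ]
  test_df  <- thyroid[-train_idx, ]

  # kNN: train() keeps the k with the largest cross-validated AUC ("ROC")
  knn_fit <- train(class ~ ., data = train_df, method = "knn",
                   metric = "ROC", trControl = ctrl, tuneGrid = k_grid)
  p_knn <- predict(knn_fit, newdata = test_df, type = "prob")[, "h"]

  # LDA: posterior probability of class h on the test set
  lda_fit <- lda(class ~ ., data = train_df)
  p_lda <- predict(lda_fit, newdata = test_df)$posterior[, "h"]

  auc_knn[s] <- auc_h(test_df$class, p_knn)
  auc_lda[s] <- auc_h(test_df$class, p_lda)
}
```

If a class-stratified split is preferred, `caret::createDataPartition(thyroid$class, p = 0.7, list = FALSE)` returns training-row indices that can replace the `sample()` call.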

iii) For the first random split, draw the ROC curves of kNN and LDA on one plot. [2 marks]
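A base-R sketch of how two ROC curves can share one set of axes. Each curve is built by sweeping a threshold over the scores and recording the true and false positive rates. The helper `roc_curve` and the simulated scores are hypothetical placeholders; in practice the scores would be the predicted probabilities of class h on the first split's test set. The pROC package offers the same through `roc()` and `plot(..., add = TRUE)`.

```r
# Hypothetical helper: ROC points for a score where larger means "more likely h"
roc_curve <- function(labels, score, positive = "h") {
  pos <- labels == positive
  thresholds <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(score[pos] >= t))   # sensitivity
  fpr <- sapply(thresholds, function(t) mean(score[!pos] >= t))  # 1 - specificity
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))  # start the curve at (0, 0)
}

# Placeholder labels and scores (simulated, not real model output)
set.seed(1)
labels <- factor(rep(c("n", "h"), times = c(30, 15)))
score_knn <- runif(45) + 0.4 * (labels == "h")
score_lda <- runif(45) + 0.8 * (labels == "h")

roc_knn <- roc_curve(labels, score_knn)
roc_lda <- roc_curve(labels, score_lda)

# Both curves on one plot, with the chance diagonal for reference
plot(roc_knn$fpr, roc_knn$tpr, type = "l", col = "blue",
     xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curves, first random split")
lines(roc_lda$fpr, roc_lda$tpr, col = "red")
abline(0, 1, lty = 2, col = "grey")
legend("bottomright", legend = c("kNN", "LDA"), col = c("blue", "red"), lty = 1)
```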

iv) Draw two boxplots based on the 50 AUC values of kNN and LDA. [1 mark]
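A minimal sketch of the side-by-side boxplots, assuming the 50 AUC values are stored in two numeric vectors, here named `auc_knn` and `auc_lda` (hypothetical names). The values generated below are random placeholders, not results.

```r
# Placeholder AUC vectors (random numbers, NOT real results);
# replace with the 50 test-set AUCs recorded for each method.
set.seed(42)
auc_knn <- runif(50, min = 0.8, max = 1)
auc_lda <- runif(50, min = 0.8, max = 1)

# One box per method on a shared y-axis; boxplot() also returns its summary stats
b <- boxplot(list(kNN = auc_knn, LDA = auc_lda),
             ylab = "Test-set AUC",
             main = "AUC over 50 random 70/30 splits")
```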

v) What conclusions can you draw from the classification results of kNN and LDA on the newthyroid.txt data? [2 marks]

