Contact

  • QQ: 99515681
  • Email: 99515681@qq.com
  • Working hours: 8:00-23:00
  • WeChat: codinghelp


Date: 2025-02-20 06:15

Data science

Assignment

Due: 5pm EST, 2/21/2025


1. n-gram

Given the training data

<s> John read a book by Jane </s>

<s> John read another book </s>

<s> I read a different book </s>

(a)  Calculate the bigram probabilities using maximum likelihood estimates (MLE) and fill in the table.
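For reference, the MLE bigram estimate is a ratio of counts over the training data. The count notation $C(\cdot)$ is an assumption here, since the assignment does not fix a symbol:

```latex
P_{\mathrm{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

For instance, <s> occurs three times in the training data and is followed by John twice, so P(John | <s>) = 2/3.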

| Bigram | Probability | Bigram | Probability |
|---|---|---|---|
| P(John \| <s>) | | P(another \| read) | |
| P(read \| John) | | P(book \| another) | |
| P(a \| read) | | P(</s> \| book) | |
| P(book \| a) | | P(I \| <s>) | |
| P(by \| book) | | P(read \| I) | |
| P(Jane \| by) | | P(different \| a) | |
| P(<s> \| Jane) | | P(book \| different) | |

(b)  Calculate the probability of the sentence <s> John read a different book </s> using only the MLE bigram estimates.

(c)  Calculate the probability of the sentence <s> Jane read a book </s> using only the MLE bigram estimates.
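One way to check hand calculations for this question is a short script that counts bigrams and multiplies the conditional probabilities along a sentence. The sketch below is only an illustration: the function names are hypothetical, tokens are split on whitespace, and an unseen history is treated as probability 0 (no smoothing), which is an assumption consistent with "using only MLE".

```python
from collections import Counter
from fractions import Fraction

TRAIN = [
    "<s> John read a book by Jane </s>",
    "<s> John read another book </s>",
    "<s> I read a different book </s>",
]


def train_bigram_mle(sentences):
    """Return a function prob(word, prev) giving the MLE bigram probability."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Every token except the last one acts as a history (context) once.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(word, prev):
        if history_counts[prev] == 0:
            return Fraction(0)  # unseen history: treated as 0 in this sketch
        return Fraction(bigram_counts[(prev, word)], history_counts[prev])

    return prob


def sentence_probability(sentence, prob):
    """Multiply the bigram probabilities of all adjacent token pairs."""
    tokens = sentence.split()
    result = Fraction(1)
    for prev, word in zip(tokens, tokens[1:]):
        result *= prob(word, prev)
    return result


if __name__ == "__main__":
    p = train_bigram_mle(TRAIN)
    for s in ("<s> John read a different book </s>", "<s> Jane read a book </s>"):
        print(s, "->", sentence_probability(s, p))
```

Exact fractions are used instead of floats so the results can be compared directly with hand-derived values.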


2. Evaluation metrics on binary classification

Given the following output,

| Actual Label | Predicted Label |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 0 | 1 |
| 0 | 1 |
| 1 | 1 |
| 0 | 0 |
| 1 | 1 |
| 0 | 1 |
| 1 | 0 |
| 0 | 1 |

(a)  Draw the confusion matrix.

(b)  Calculate the Accuracy, Precision, Recall, and F1 score.

(c)  Why might using accuracy as the only metric not be ideal?
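The definitions behind this question can be sketched in a few lines of plain Python. The example below assumes label 1 is the positive class (the assignment does not say so explicitly) and uses hypothetical function names; a zero denominator is mapped to 0.0 by convention.

```python
# The ten (actual, predicted) pairs from the table, in order.
ACTUAL = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
PREDICTED = [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]


def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from the four confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    counts = confusion_counts(ACTUAL, PREDICTED)
    print("tp, fp, fn, tn =", counts)
    print(binary_metrics(*counts))
```

On the limits of accuracy: with a 95/5 class split, a classifier that always predicts the majority class reaches 95% accuracy while its recall on the minority class is zero, which is why precision and recall are usually reported alongside it.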


3. Evaluation metrics on multiclass classification

Given the following confusion matrix of a multiclass classifier

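| Classifier (rows) / Truth (columns) | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| A | 95 | 1 | 13 | 0 | 1 | 0 |
| B | 0 | 1 | 0 | 0 | 0 | 0 |
| C | 10 | 90 | 0 | 1 | 0 | 0 |
| D | 0 | 0 | 0 | 34 | 3 | 7 |
| E | 0 | 1 | 2 | 13 | 26 | 5 |
| F | 0 | 0 | 2 | 14 | 5 | 10 |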

(a)  Calculate the precision, recall, and F1 for classes A-F

(b)  Calculate the micro-average precision, recall, and F1

(c)  Calculate the macro-average precision, recall, and F1
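The three parts of this question differ only in where the averaging happens, which a short sketch may make concrete. It assumes that rows of the matrix are classifier outputs and columns are the truth, as the table labels suggest, and it takes macro-F1 to be the mean of the per-class F1 scores (another convention computes F1 from the macro precision and recall). All function names are hypothetical.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
# Rows: classifier output, columns: truth (assumed orientation).
MATRIX = [
    [95, 1, 13, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [10, 90, 0, 1, 0, 0],
    [0, 0, 0, 34, 3, 7],
    [0, 1, 2, 13, 26, 5],
    [0, 0, 2, 14, 5, 10],
]


def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_metrics(matrix, labels):
    """Precision uses the row sum (all predictions of a class), recall the column sum."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(matrix[k])
        actual_k = sum(row[k] for row in matrix)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1_score(precision, recall)}
    return metrics


def macro_average(metrics):
    """Unweighted mean of each per-class metric."""
    n = len(metrics)
    return {name: sum(m[name] for m in metrics.values()) / n
            for name in ("precision", "recall", "f1")}


def micro_average(matrix):
    """Pool TP, FP, and FN over all classes before dividing."""
    tp = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    fp = total - tp  # every off-diagonal cell is a false positive for its row class
    fn = total - tp  # ... and a false negative for its column class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1_score(precision, recall)}


if __name__ == "__main__":
    per_class = per_class_metrics(MATRIX, LABELS)
    for label, m in per_class.items():
        print(label, m)
    print("micro", micro_average(MATRIX))
    print("macro", macro_average(per_class))
```

In the single-label multiclass setting the pooled false positives and false negatives are the same off-diagonal cells, so micro precision, recall, and F1 all coincide with overall accuracy.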


4. Text classification

The drug review dataset provides patient reviews of drugs, each with a positive or negative rating reflecting overall patient satisfaction. The dataset consists of two files: drug_review_train.csv for training and drug_review_test.csv for testing. Both files contain plain-text, UTF-8-encoded samples in a tab-separated format with the following columns:

•  Text

•  Binary label (0 or 1)
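A minimal loader for this layout might look like the sketch below. It assumes the text is the first column and the label the second, that fields are not quoted, and that any row whose label is not 0 or 1 (such as a possible header row) can be skipped; none of this is confirmed by the assignment, and `load_reviews` is a hypothetical name.

```python
import csv


def load_reviews(path):
    """Read (text, label) pairs from a tab-separated, UTF-8 encoded file."""
    texts, labels = [], []
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # blank or malformed line
            text, label = row[0], row[1].strip()
            if label not in {"0", "1"}:
                continue  # e.g. a header row, if the file has one
            texts.append(text)
            labels.append(int(label))
    return texts, labels
```

The function returns two parallel lists, which is the shape scikit-learn estimators and vectorizers accept directly.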

(a)  Use BernoulliNB to build a naïve Bayes classifier.
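A possible starting point, sketched with scikit-learn, is shown below. The toy sentences are hypothetical stand-ins for the real files, label 1 is assumed to mean a positive rating, and the choice of `CountVectorizer(binary=True)` as the feature extractor is an assumption rather than something the assignment prescribes. The helper `per_class_report` is also hypothetical; it treats each label in turn as the class of interest.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline


def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Collect TP, FP, FN, precision, recall, and F1 for each label."""
    # scikit-learn convention: rows are true labels, columns are predictions.
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    report = {}
    for i, label in enumerate(labels):
        tp = int(cm[i, i])
        fp = int(cm[:, i].sum()) - tp  # predicted as `label`, actually another class
        fn = int(cm[i, :].sum()) - tp  # actually `label`, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"tp": tp, "fp": fp, "fn": fn,
                         "precision": precision, "recall": recall, "f1": f1}
    return report


# Toy stand-in data (hypothetical); replace with the contents of the two files.
train_texts = ["worked great no side effects", "terrible headache and nausea",
               "great relief", "nausea made it terrible"]
train_labels = [1, 0, 1, 0]
test_texts = ["great relief no nausea", "terrible side effects"]
test_labels = [1, 0]

# binary=True records word presence or absence, matching the Bernoulli event model.
model = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
model.fit(train_texts, train_labels)
report = per_class_report(test_labels, model.predict(test_texts))
for label, row in report.items():
    print("positive" if label == 1 else "negative", row)
```

Each printed row corresponds to one row of a per-class results table (true positive, false positive, false negative, precision, recall, F1-score).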

| BernoulliNB | true positive | false positive | false negative | precision | recall | F1-score |
|---|---|---|---|---|---|---|
| positive | | | | | | |
| negative | | | | | | |

(b)  Repeat the process in Task (a), but use the SVM (SGDClassifier) model.
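For the SVM variant, `SGDClassifier` with `loss="hinge"` fits a linear SVM by stochastic gradient descent. The sketch below again uses hypothetical toy data and assumes TF-IDF features and label 1 as the positive class; neither is specified in the assignment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical); replace with the contents of the two files.
train_texts = ["worked great no side effects", "terrible headache and nausea",
               "great relief", "nausea made it terrible"]
train_labels = [1, 0, 1, 0]
test_texts = ["great relief no nausea", "terrible side effects"]
test_labels = [1, 0]

# loss="hinge" gives a linear SVM objective; random_state fixes the data shuffling.
model = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="hinge", random_state=0))
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(test_labels, predictions, labels=[0, 1]).ravel()
precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, predictions, labels=[0, 1], zero_division=0)

# For the negative row the roles swap: its "true positives" are tn,
# its false positives are fn, and its false negatives are fp.
rows = {
    "positive": (tp, fp, fn, precision[1], recall[1], f1[1]),
    "negative": (tn, fn, fp, precision[0], recall[0], f1[0]),
}
for name, values in rows.items():
    print(name, values)
```

Swapping the roles of the four confusion counts is enough to fill the negative row, so no second confusion matrix is needed.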

| SGDClassifier | true positive | false positive | false negative | precision | recall | F1-score |
|---|---|---|---|---|---|---|
| positive | | | | | | |
| negative | | | | | | |

(c)  Upload the source code.

