Data science
Assignment
Due: 5pm EST, 2/21/2025
1. n-gram
Given the training data:
<s> John read a book by Jane </s>
<s> John read another book </s>
<s> I read a different book </s>
(a) Calculate the bigram probabilities using maximum likelihood estimation (MLE) and fill out the table.
| Bigram         | Probability | Bigram              | Probability |
|----------------|-------------|---------------------|-------------|
| P(John | <s>)  |             | P(another | read)   |             |
| P(read | John) |             | P(book | another)   |             |
| P(a | read)    |             | P(</s> | book)      |             |
| P(book | a)    |             | P(I | <s>)          |             |
| P(by | book)   |             | P(read | I)         |             |
| P(Jane | by)   |             | P(different | a)    |             |
| P(<s> | Jane)  |             | P(book | different) |             |
(b) Calculate the probability of the sentence <s> John read a different book </s> using only the MLE bigram probabilities.
(c) Calculate the probability of the sentence <s> Jane read a book </s> using only the MLE bigram probabilities.
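As a reminder of the mechanics, the MLE bigram estimate is P(w | v) = count(v w) / count(v), and a sentence probability is the product of its bigram probabilities. The following is a minimal Python sketch of that calculation, not a reference solution. The function names and the toy corpus are hypothetical and chosen only for illustration, and tokens are assumed to be separated by whitespace.

```python
from collections import Counter


def train_bigram_mle(sentences):
    """Count bigrams and their left contexts in whitespace-tokenized sentences."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1
    return bigram_counts, context_counts


def bigram_prob(prev, word, bigram_counts, context_counts):
    """MLE estimate P(word | prev); 0.0 when prev was never seen as a context."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(sentence, bigram_counts, context_counts):
    """Product of the bigram probabilities along the sentence."""
    tokens = sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, bigram_counts, context_counts)
    return prob


# Hypothetical toy corpus, deliberately different from the training data above.
toy_corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]
bigrams, contexts = train_bigram_mle(toy_corpus)
print(bigram_prob("<s>", "I", bigrams, contexts))             # P(I | <s>) = 2/3
print(sentence_prob("<s> I am Sam </s>", bigrams, contexts))  # 2/3 * 2/3 * 1/2 * 1/2
```

Because no smoothing is applied in this sketch, a single unseen bigram makes the whole product zero.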
2. Evaluation metrics on binary classification
Given the following output,
| Actual Label | Predicted Label |
|--------------|-----------------|
| 0            | 0               |
| 1            | 1               |
| 0            | 1               |
| 0            | 1               |
| 1            | 1               |
| 0            | 0               |
| 1            | 1               |
| 0            | 1               |
| 1            | 0               |
| 0            | 1               |
(a) Draw the confusion matrix.
(b) Calculate the Accuracy, Precision, Recall, and F1 score.
(c) Why might using accuracy as the only metric not be ideal?
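The four metrics in (b) all follow from the confusion-matrix counts. Below is a minimal Python sketch of that bookkeeping, assuming label 1 is the positive class. The function names and the example labels are hypothetical and are not the output given above.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the target class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, precision, recall, f1


# Hypothetical labels for illustration only.
example_actual = [1, 0, 1, 1, 0, 0]
example_predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(example_actual, example_predicted)
print(counts)                   # (tp, fp, fn, tn)
print(binary_metrics(*counts))  # (accuracy, precision, recall, f1)
```

Returning 0.0 when a denominator is zero is one common convention; others report the metric as undefined.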
3. Evaluation metrics on multiclass classification
Given the following confusion matrix of a multiclass classifier:
| Classifier (rows) / Truth (columns) | A  | B  | C  | D  | E  | F  |
|-------------------------------------|----|----|----|----|----|----|
| A                                   | 95 | 1  | 13 | 0  | 1  | 0  |
| B                                   | 0  | 1  | 0  | 0  | 0  | 0  |
| C                                   | 10 | 90 | 0  | 1  | 0  | 0  |
| D                                   | 0  | 0  | 0  | 34 | 3  | 7  |
| E                                   | 0  | 1  | 2  | 13 | 26 | 5  |
| F                                   | 0  | 0  | 2  | 14 | 5  | 10 |
(a) Calculate the precision, recall, and F1 for classes A-F
(b) Calculate the micro-average precision, recall, and F1
(c) Calculate the macro-average precision, recall, and F1
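Micro-averaging pools the true positive, false positive, and false negative counts over all classes before computing the metrics, while macro-averaging computes the metrics per class and then takes their unweighted mean. The Python sketch below illustrates both on a hypothetical 3-class matrix. It assumes rows hold the classifier output and columns hold the truth, and it takes macro-F1 to be the mean of the per-class F1 scores (another convention uses the harmonic mean of macro-precision and macro-recall). All function names are illustrative.

```python
def per_class_counts(matrix):
    """Return a list of (tp, fp, fn) per class.

    Assumes matrix[i][j] counts items the classifier labeled i
    whose true class is j.
    """
    n = len(matrix)
    counts = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[k]) - tp                       # row k: predicted as k
        fn = sum(matrix[i][k] for i in range(n)) - tp  # column k: truly k
        counts.append((tp, fp, fn))
    return counts


def prf(tp, fp, fn):
    """Precision, recall, and F1; undefined ratios fall back to 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


def micro_average(counts):
    """Pool the counts over all classes, then compute the metrics once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)


def macro_average(counts):
    """Compute the metrics per class, then take the unweighted mean of each."""
    scores = [prf(*c) for c in counts]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Hypothetical 3-class matrix for illustration only.
toy_matrix = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 1, 4],
]
toy_counts = per_class_counts(toy_matrix)
print(toy_counts)                 # per-class (tp, fp, fn)
print(micro_average(toy_counts))  # (precision, recall, f1)
print(macro_average(toy_counts))  # (precision, recall, f1)
```

When every item receives exactly one predicted label, the pooled false positives equal the pooled false negatives, so micro-averaged precision, recall, and F1 coincide.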
4. Text classification
The drug review dataset provides patient reviews of drugs, each with a positive or negative rating reflecting overall patient satisfaction. The dataset consists of two files: drug_review_train.csv for training and drug_review_test.csv for testing. Both files contain plain-text, UTF-8-encoded samples in a tab-separated format with the following columns:
• Text
• Binary label (0 and 1)
(a) Use BernoulliNB to build a naïve Bayes classifier.
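| BernoulliNB | true positive | false positive | false negative | precision | recall | F1-score |
|-------------|---------------|----------------|----------------|-----------|--------|----------|
| positive    |               |                |                |           |        |          |
| negative    |               |                |                |           |        |          |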
(b) Repeat the process in Task (a), but use the SVM (SGDClassifier) model.
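| SGDClassifier | true positive | false positive | false negative | precision | recall | F1-score |
|---------------|---------------|----------------|----------------|-----------|--------|----------|
| positive      |               |                |                |           |        |          |
| negative      |               |                |                |           |        |          |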
(c) Upload the source code.
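One possible shape for the experiment in (a) and (b) is sketched below with scikit-learn. It is a starting point under stated assumptions rather than a prescribed solution: label 1 is taken to be the positive class, features are binary bag-of-words from CountVectorizer, SGDClassifier uses hinge loss (a linear SVM), and the helper name `evaluate` and the toy reviews are hypothetical. Loading the two dataset files is left out because their exact layout (for example, whether a header row is present) is not specified here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline


def evaluate(model, train_texts, train_labels, test_texts, test_labels):
    """Fit a bag-of-words pipeline and return one report row per class."""
    pipeline = make_pipeline(CountVectorizer(binary=True), model)
    pipeline.fit(train_texts, train_labels)
    predicted = pipeline.predict(test_texts)

    # With labels=[0, 1], rows are actual labels and columns are predictions.
    tn, fp, fn, tp = confusion_matrix(
        test_labels, predicted, labels=[0, 1]
    ).ravel()
    # Index 0 of each array is class 1 (positive), index 1 is class 0.
    precision, recall, f1, _ = precision_recall_fscore_support(
        test_labels, predicted, labels=[1, 0], zero_division=0
    )
    return {
        "positive": {
            "tp": int(tp), "fp": int(fp), "fn": int(fn),
            "precision": float(precision[0]),
            "recall": float(recall[0]),
            "f1": float(f1[0]),
        },
        # With class 0 as the target, the roles of the counts swap.
        "negative": {
            "tp": int(tn), "fp": int(fn), "fn": int(fp),
            "precision": float(precision[1]),
            "recall": float(recall[1]),
            "f1": float(f1[1]),
        },
    }


# Hypothetical toy reviews, standing in for the real training and test files.
toy_train_texts = [
    "great relief no side effects",
    "worked well for me",
    "terrible nausea and headaches",
    "did not work at all",
]
toy_train_labels = [1, 1, 0, 0]
toy_test_texts = ["worked great", "terrible headaches"]
toy_test_labels = [1, 0]

report_nb = evaluate(
    BernoulliNB(),
    toy_train_texts, toy_train_labels, toy_test_texts, toy_test_labels,
)
report_svm = evaluate(
    SGDClassifier(loss="hinge", random_state=0),
    toy_train_texts, toy_train_labels, toy_test_texts, toy_test_labels,
)
print(report_nb)
print(report_svm)
```

Each returned dictionary has the same fields as one results table above, so the rows can be filled in directly once the real data is loaded.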