Date: 2019-05-01 10:01

STAT 603: Homework 7

Due: Thursday, May 2nd.

Directions:

0. You may discuss ideas in groups, but the programming and writing must be your own

work. Copying others’ work/code or allowing others to copy your own work/code is

considered cheating and plagiarism, and will result in zero points for the whole homework

and an F grade for STAT603. Cheating in any coursework is considered a serious offense against academic

integrity and University rules.

1. Submit a PDF copy of your homework, R source code, and your label prediction on

Canvas. For the PDF file, you should name it as “myhomework.pdf”; for the R source code, you

should name it as “mycode.R”; for your prediction for the testing data, you should name it as

“myprediction.txt” (see Q8). Only file types of “pdf”, “R” and “txt” will be accepted on Canvas. If

any of these three files is missing online, we won’t grade your homework.

2. Submit a hardcopy of the PDF file “myhomework.pdf” in class. We won’t grade your

homework without a hardcopy.

3. Show all your work! Both source code and key outputs from running your code are required. Simply

giving a final answer or source code without appropriate explanation/key outputs will not receive any

points.

4. Typing answers in RMarkdown or LaTeX is strongly recommended.

In this homework, we continue to work on the MNIST data sets. Recall from Q10 in HW6 that using the

training count data set, for a given digit k (k = 0, 1, · · · , 9), we can get the sample points $x_1, x_2, \dots, x_n \in \mathbb{R}^d$

for true digit label k with $x_i = (x_{i1}, \dots, x_{id})$. Then for digit k, its MLE $\hat{p}$ with d = 49 can be

obtained by
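Assuming the multinomial model implied by g(x, p) in Q4, the standard MLE pools the counts over the n sample points. This is a reconstruction under that assumption and should be checked against Q10 in HW6:

```latex
\hat{p}_j = \frac{\sum_{i=1}^{n} x_{ij}}{\sum_{i=1}^{n} \sum_{l=1}^{d} x_{il}}, \qquad j = 1, \dots, d.
```

Each component is nonnegative and the d components sum to 1.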

Using the training count data set “mnist_train_counts.csv”, complete exercises Q1-Q3.

Q1

For digit k = 5, extract the sub-sample of the training data set that corresponds to the true digit label “5”.

Print out the sample size of this sub-sample.
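A minimal sketch of the extraction step, using a toy data frame in place of the real file. The column name `label` and the count columns `c1` to `c4` are assumptions for illustration; the actual layout of “mnist_train_counts.csv” may differ.

```r
# Toy stand-in for read.csv("mnist_train_counts.csv").
# The column names below are assumptions, not the real file layout.
train <- data.frame(
  label = c(5, 3, 5, 0, 5),
  c1 = c(2, 0, 1, 4, 3),
  c2 = c(1, 5, 0, 2, 2),
  c3 = c(0, 2, 3, 1, 1),
  c4 = c(4, 1, 2, 0, 0)
)

# Keep only the rows whose true label is 5.
sub5 <- train[train$label == 5, ]

# Sample size of the sub-sample.
n5 <- nrow(sub5)
print(n5)
```

With the real data, the same logical subsetting would apply once the label column is identified.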

Q2

For digit k = 5, apply the MLE formula to the extracted sub-sample in Q1 to find the MLE $\hat{p}_k = \hat{p}$. Print

out your answer.
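One possible sketch of this step, assuming the MLE is the pooled proportion of counts (column totals divided by the grand total), which is the standard multinomial MLE. The helper name `mle_p` and the toy matrix are hypothetical.

```r
# Hypothetical helper: pooled-proportion MLE from an n x d count matrix.
# Assumes the multinomial form p_j = sum_i x_ij / sum_i sum_l x_il.
mle_p <- function(counts) {
  counts <- as.matrix(counts)
  colSums(counts) / sum(counts)
}

# Toy sub-sample with n = 3 points and d = 4 cells.
x5 <- rbind(c(2, 1, 0, 4),
            c(1, 0, 3, 2),
            c(3, 2, 1, 0))

p5 <- mle_p(x5)
print(p5)
```

If the formula from HW6 differs from this form, only the body of `mle_p` would need to change.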

Q3

Repeat Q2 for each digit k = 0, 1, · · · , 9. For the grader to verify your answer, print out a d × 10 matrix that

contains all $\hat{p}_k$ for k = 0, 1, · · · , 9, that is, the jth column of this matrix is $\hat{p}_{j-1}$.
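The d × 10 matrix could be assembled along these lines. This is a sketch on simulated counts with d = 4 rather than 49; the objects `labels` and `counts` are toy stand-ins, and the pooled-proportion form of the MLE is an assumption.

```r
set.seed(1)
d <- 4
labels <- rep(0:9, each = 3)                 # toy labels, 3 rows per digit
counts <- matrix(rpois(length(labels) * d, lambda = 3) + 1, ncol = d)

# One column per digit: column j holds the estimate for digit j - 1.
P <- sapply(0:9, function(k) {
  xk <- counts[labels == k, , drop = FALSE]  # sub-sample for digit k
  colSums(xk) / sum(xk)                      # pooled proportions (assumed MLE form)
})
colnames(P) <- 0:9

print(dim(P))
print(round(P, 3))
```

Every column of `P` should sum to 1, which gives a quick sanity check before printing.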

Next, we will use this “naive” probabilistic model to make predictions for the testing data set

“mnist_test_counts.csv”.

Q4

To warm up, suppose we want to make a prediction for the 100th data point in the testing data set. Extract

this data point’s count vector x. For the grader to check your results, print out x. In addition, find the sample

proportions πk (k = 0, 1, · · · , 9), which are from Q6 in HW6.
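A sketch of the sample proportions, assuming πk is the share of training points with label k (the definition in Q6 of HW6 is not restated here). The vector `train_labels` is a toy stand-in for the real label column.

```r
# Toy labels; the real ones would come from the training file.
train_labels <- c(0, 1, 1, 5, 5, 5, 9, 9, 0, 5)

# factor(..., levels = 0:9) keeps digits that never occur, with count 0.
pi_hat <- as.numeric(table(factor(train_labels, levels = 0:9))) / length(train_labels)
names(pi_hat) <- 0:9
print(pi_hat)
```

In this toy vector several digits have proportion 0, so their log would be -Inf; with the full training set every digit should be present.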

To make a prediction for the 100th data point with the count vector x, we can use the Bayes rule:

$$\hat{y} = \arg\max_{k=0,1,\dots,9} \pi_k f_k(x \mid p_k) = \arg\max_{k=0,1,\dots,9} \{\log \pi_k + g(x, p_k)\},$$

where the function g(x, p) given x = (x1, · · · , xd) and p = (p1, · · · , pd) is

$$g(x, p) = \log \prod_{j=1}^{d} p_j^{x_j} = \sum_{j=1}^{d} x_j \log p_j.$$

Q5

Write an R function named gfun(x, p), which returns the value of g(x, p). For the grader to verify your answer,

print out the output of gfun(x, p) using the 100th data point’s count vector x and pk for digit k = 5. Note:

when implementing gfun(x, p), how would you handle the possible situation that pj = 0?
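One way to handle pj = 0, offered as a sketch rather than the expected answer: in R, 0 * log(0) evaluates to NaN, so the version below drops the terms with xj = 0 (using the convention 0 · log 0 = 0) and lets a zero pj paired with a positive xj give -Inf. Smoothing the estimates is another common option and is not shown here.

```r
# g(x, p) = sum_j x_j * log(p_j), skipping cells where x_j = 0.
gfun <- function(x, p) {
  keep <- x > 0                 # cells with x_j = 0 contribute nothing
  sum(x[keep] * log(p[keep]))   # -Inf if some p_j = 0 while x_j > 0
}

# p_2 = 0 but x_2 = 0, so the result stays finite.
print(gfun(c(1, 0, 2), c(0.5, 0, 0.5)))

# p_2 = 0 and x_2 > 0: the class is impossible under the model.
print(gfun(c(1, 1), c(1, 0)))
```

Returning -Inf means such a class can never win the arg max, which matches a likelihood of zero.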

Q6

We are now ready to make a prediction for the 100th data point. Use the function gfun(x, p) above to calculate

log πk + g(x, pk) for all k = 0, 1, · · · , 9 and find your label prediction $\hat{y}$. For the grader to verify the results, print

out all these outputs.
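The scoring step can be sketched as follows on a toy problem with d = 3 and only three classes. The matrix `P`, the vector `pi_hat`, and the point `x` are made-up values, and `gfun` is one possible implementation that skips cells with xj = 0.

```r
gfun <- function(x, p) {
  keep <- x > 0
  sum(x[keep] * log(p[keep]))
}

# Toy estimates: one column of P per class (digits 0, 1, 2 only).
P <- cbind(c(0.7, 0.2, 0.1),
           c(0.1, 0.8, 0.1),
           c(0.2, 0.2, 0.6))
colnames(P) <- c("0", "1", "2")
pi_hat <- c(0.5, 0.3, 0.2)
x <- c(1, 6, 1)

# log(pi_k) + g(x, p_k) for every class k.
scores <- apply(P, 2, gfun, x = x) + log(pi_hat)
print(scores)

# The predicted label is the class with the largest score.
yhat <- as.integer(names(which.max(scores)))
print(yhat)
```

Working on the log scale avoids the underflow that the product of many small probabilities would cause.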

Q7

Now let’s look at the true label for the 100th data point. Print out the true label y and $I(\hat{y} \neq y)$. Does your

prediction give the correct label?

Q8

Repeat the process above to make predictions for all the data points in the testing data set. Calculate the

misclassification error rate for the “naive” model by

$$\text{misclassification rate} = \frac{1}{N} \sum_{i=1}^{N} I(\hat{y}_i \neq y_i),$$

where $\hat{y}_i$ is the predicted label, $y_i$ is the true label, and N is the sample size of the testing data set. In

addition, save your label prediction as a “myprediction.txt” file, with the ith row representing your prediction

$\hat{y}_i$. Specifically, if yhat is the vector object that contains your predictions, use the following

code to generate the file “myprediction.txt”:

write.table(yhat,file="myprediction.txt",row.names=FALSE,col.names=FALSE,sep="")

Any other format of your prediction file will NOT be graded.
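A sketch of the full loop on toy data, assuming a `gfun` that skips cells with xj = 0. The objects `P`, `pi_hat`, `X_test`, and `y_test` and the helper `predict_one` are hypothetical, and the file is written to a temporary directory here so the example leaves no files behind.

```r
gfun <- function(x, p) {
  keep <- x > 0
  sum(x[keep] * log(p[keep]))
}

# Hypothetical helper: predicted label for one count vector x.
predict_one <- function(x, P, pi_hat) {
  scores <- apply(P, 2, gfun, x = x) + log(pi_hat)
  as.integer(colnames(P)[which.max(scores)])
}

# Toy model with three classes and d = 3.
P <- cbind(c(0.8, 0.1, 0.1), c(0.1, 0.8, 0.1), c(0.1, 0.1, 0.8))
colnames(P) <- c("0", "1", "2")
pi_hat <- c(1, 1, 1) / 3

# Toy test set: one row per data point.
X_test <- rbind(c(5, 1, 0), c(0, 4, 1), c(1, 0, 6), c(2, 4, 1))
y_test <- c(0, 1, 2, 0)

yhat <- apply(X_test, 1, predict_one, P = P, pi_hat = pi_hat)
err <- mean(yhat != y_test)   # (1/N) * sum of I(yhat_i != y_i)
print(err)

# Same write.table call as in the directions, pointed at a temp file.
out <- file.path(tempdir(), "myprediction.txt")
write.table(yhat, file = out, row.names = FALSE, col.names = FALSE, sep = "")
print(readLines(out))
```

For submission, the `file` argument would be "myprediction.txt" exactly as given in the directions.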
