CIS5200: Machine Learning Fall 2023

Homework 1

Release Date: September 20, 2023 Due Date: October 4, 2023

• You will submit your solution for the written part of HW1 as a single PDF file via Gradescope. The deadline is 11:59 PM ET. Contact TAs on Ed if you face any issues uploading your homeworks.

• Collaboration is permitted and encouraged for this homework, though each student must understand, write, and hand in their own submission. In particular, it is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written answers when writing their own. It is also not acceptable to publicly post your (partial) solution on Ed, but you are encouraged to ask public questions on Ed. If you choose to collaborate, you must indicate on each homework with whom you collaborated.

Please refer to the notes and videos posted on the website if you need to recall the material discussed in the lectures.

1 Written Questions (46 points)

Problem 1: Margin Perceptron (15 points)

Recall the Perceptron algorithm we saw in the lecture. The Perceptron algorithm terminates once it classifies all points correctly. It does not guarantee that the hyperplane it finds has a large margin (γ), despite our assumption that the true hyperplane w* has margin γ, where

γ = min_{i∈{1,...,m}} y_i (w*^⊤ x_i).

In this problem, we will consider the following simple modification to the Perceptron algorithm:

Algorithm 1: Margin Perceptron

Initialize w_1 = 0 ∈ R^d
for t = 1, 2, . . . do
    if ∃ i ∈ {1, . . . , m} s.t. y_i ≠ sign(w_t^⊤ x_i) or |w_t^⊤ x_i| ≤ 1 then update w_{t+1} = w_t + y_i x_i
    else output w_t
end
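For intuition, here is a minimal NumPy sketch of the update rule in Algorithm 1. The function name, data layout, and the iteration cap are illustrative assumptions, not part of the assignment.

import numpy as np

def margin_perceptron(X, y, max_iter=10000):
    # Sketch of Algorithm 1: while some point is misclassified or has
    # |w^T x_i| <= 1, update w <- w + y_i x_i; otherwise return w.
    m, d = X.shape
    w = np.zeros(d)                                  # w_1 = 0 in R^d
    for _ in range(max_iter):
        scores = X @ w                               # w_t^T x_i for all i
        violating = np.where((np.sign(scores) != y) | (np.abs(scores) <= 1))[0]
        if violating.size == 0:
            return w                                 # output w_t
        i = violating[0]
        w = w + y[i] * X[i]                          # w_{t+1} = w_t + y_i x_i
    return w

Parts 1.1-1.4 below show that, under the stated assumptions, the loop terminates after at most 3/γ² updates.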

We will show that Margin Perceptron stops after 3/γ² steps and returns a hyperplane w such that

min_{i∈{1,...,m}} |w^⊤ x_i| / ∥w∥_2 ≥ γ/3.

Note that the margin is the distance of the closest point to the hyperplane, and since ∥w∥_2 is not necessarily 1, this quantity is given by min_{i∈{1,...,m}} |w^⊤ x_i| / ∥w∥_2.

As in the lecture, we will assume that ∥x_i∥_2 ≤ 1 for all i ∈ {1, . . . , m} and ∥w*∥_2² = 1.
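For concreteness, this margin of an (unnormalized) hyperplane w on a data matrix can be computed directly; a minimal NumPy sketch with illustrative names:

import numpy as np

def margin(w, X):
    # min_i |w^T x_i| / ||w||_2 for a nonzero w and data matrix X of shape (m, d)
    return np.min(np.abs(X @ w)) / np.linalg.norm(w)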


1.1 (2 points) Show that after every round t, we have

w*^⊤ w_{t+1} ≥ w*^⊤ w_t + γ.

1.2 (4 points) Show that after every round t, we have

∥w_{t+1}∥_2² ≤ ∥w_t∥_2² + 3.

1.3 (3 points) Using the above two parts, show that after T rounds,

γT ≤ ∥w_{T+1}∥_2 ≤ √(3T).

Hint: Use the Cauchy-Schwarz Inequality: a^⊤ b ≤ ∥a∥∥b∥.

1.4 (1 point) Use 1.3 to conclude that T ≤ 3/γ².

1.5 (4 points) Show that the output hyperplane w satisfies

min_i |w^⊤ x_i| / ∥w∥_2 ≥ γ/3.

Hint: You will need to use the results in 1.2 and 1.3, plus the stopping condition of the algorithm.

1.6 (1 point) Why is it desirable to learn a predictor that has margin?

Problem 2: Bayes Optimal Classifier (15 points)

Let η(x) denote the conditional probability of the label being 1 given a point x under the distribution D. That is,

η(x) = Pr[y = 1 | x].

Recall that the true risk, under the 0/1 loss, for any classifier f is

R(f) = E_{x,y} [1[f(x) ≠ y]].

The Bayes optimal classifier w.r.t. D is the classifier f* that achieves the minimum risk among all possible classifiers. In this problem, we will work out what the Bayes optimal classifier is.
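The empirical counterpart of this risk on a finite sample is simply the fraction of mistakes; a minimal sketch, with illustrative names:

import numpy as np

def empirical_01_risk(f, X, y):
    # Sample average of 1[f(x_i) != y_i], an estimate of R(f)
    preds = np.array([f(x) for x in X])
    return np.mean(preds != y)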

2.1 (3 points) Show that

R(f) = E_x [η(x) 1[f(x) = −1] + (1 − η(x)) 1[f(x) = 1]].

Hint: Use the fact that E_{x,y}[·] = E_x E_{y|x}[·].


2.2 (3 points) Use the above to show that the minimum risk possible is

R(f*) = min_f R(f) = E_x [min(η(x), 1 − η(x))].

Hint: For a fixed x, think about what the minimum loss is using 2.1.

2.3 (2 points) Show that the Bayes optimal classifier that achieves the above loss is

f*(x) =
    1     if η(x) ≥ 1/2,
    −1    otherwise.

2.4 (1 point) Derive the Bayes optimal classifier under the logistic model

η(x) = 1 / (1 + exp(−w^⊤ x)).
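This is not a solution to 2.4, but for concreteness: under this model η(x) is the sigmoid of the score w^⊤ x. A minimal sketch, assuming NumPy and illustrative names:

import numpy as np

def eta_logistic(w, x):
    # eta(x) = 1 / (1 + exp(-w^T x)), the probability that y = 1 given x
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))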

2.5 (6 points) Now suppose we modify the loss function from 0/1 to the following cost-based loss function

ℓ_c(ŷ, y) =
    c        if y = 1, ŷ = −1,
    1 − c    if y = −1, ŷ = 1,
    0        if ŷ = y.

Here the loss penalizes a false negative with cost c and a false positive with cost 1 − c, penalizing different types of mistakes differently.¹

Note that the true risk under this loss is

R_c(f) = E_{x,y} [ℓ_c(f(x), y)].

Find the Bayes optimal classifier in this setting.

Hint: Follow the same ideas you used to solve 2.1-2.3, using ℓ_c instead of the 0/1 loss.
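To make the definition of ℓ_c concrete (this is not an answer to the question), here is a hedged sketch of the loss and its empirical risk; all names are illustrative:

import numpy as np

def cost_loss(y_pred, y_true, c):
    # ell_c: cost c for a false negative (y = 1, predicted -1),
    # cost 1 - c for a false positive (y = -1, predicted 1), 0 if correct
    if y_true == 1 and y_pred == -1:
        return c
    if y_true == -1 and y_pred == 1:
        return 1.0 - c
    return 0.0

def empirical_cost_risk(f, X, y, c):
    # Sample average of ell_c(f(x_i), y_i), an estimate of R_c(f)
    return float(np.mean([cost_loss(f(x), yi, c) for x, yi in zip(X, y)]))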

2 Programming Questions (16 points)

Use the link here to access the Google Colaboratory (Colab) file for this homework. Be sure to make a copy by going to "File" and "Save a copy in Drive". This assignment uses the PennGrader system for students to receive immediate feedback. As noted in the notebook, please be sure to change the student ID from the default '99999999' to your 8-digit PennID.

Instructions for how to submit the programming component of HW 1 to Gradescope are included in the Colab notebook. You may find this PyTorch reference to be helpful; if you get stuck, it may be helpful to review some of the PyTorch documentation and functions.
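If PyTorch is new to you, a tiny warm-up like the following (purely illustrative and unrelated to the graded notebook) may help before starting:

import torch

X = torch.tensor([[0.5, -1.0], [1.5, 2.0], [-0.3, 0.7]])  # a small (3, 2) data matrix
w = torch.tensor([1.0, -0.5])                              # a weight vector
scores = X @ w                  # matrix-vector product, shape (3,)
preds = torch.sign(scores)      # predicted labels in {-1, 0, +1}
print(scores, preds)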

¹ Let us see why this is a useful loss function. Consider the case of medical diagnosis: a high false negative rate means that we are predicting that patients do not have the disease when they actually do. Such a prediction could lead to the patient not getting the care they need. In such a setting, you would want c to be closer to 1.
