
Date: 2020-02-04 05:23

DS 5220 - Spring 2019

Homework 1

Please follow the homework submission instructions provided on Piazza.

Due on Blackboard before midnight on Friday, January 18, 2019.

Each part of each problem is worth 5 points.

1. [Analytical question] Consider two Normally distributed random variables $Y_1$ and $Y_2$ with expected values $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$.

(a) State the joint probability distribution of these random variables. State it twice: once in non-matrix form and once in matrix form. Explain the meaning of each term.

(b) Use Bayes' theorem to derive the conditional probability distributions of $Y_1 \mid Y_2$ and of $Y_2 \mid Y_1$.

(c) Do the correlation and the parameters of linear regression depend on whether we want to predict $Y_1$ as a function of $Y_2$ or $Y_2$ as a function of $Y_1$?

(d) Use the derivations above to explain the difference between the coefficient of correlation and the slope of linear regression.
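As a numerical companion to parts (c) and (d), the sketch below simulates a large bivariate Normal sample and compares the correlation with the two least-squares slopes. It relies on the standard results that the slope for predicting $Y_1$ from $Y_2$ is $\rho\sigma_1/\sigma_2$, the slope for predicting $Y_2$ from $Y_1$ is $\rho\sigma_2/\sigma_1$, and the correlation is the same in both directions; the product of the two sample slopes equals the squared sample correlation exactly. The parameter values ($\mu_1 = 1$, $\mu_2 = -1$, $\sigma_1 = 3$, $\sigma_2 = 1$, $\rho = 0.6$), the seed, and the helper names `simulate_bivariate_normal` and `ls_slope` are arbitrary, hypothetical choices for illustration; this is a check on a derivation, not a substitute for it.

```python
import numpy as np

def simulate_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, n, seed=0):
    """Draw n samples of (Y1, Y2) from a bivariate Normal distribution."""
    mean = np.array([mu1, mu2])
    # Covariance matrix: variances on the diagonal, rho * sigma1 * sigma2 off the diagonal
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    rng = np.random.default_rng(seed)
    sample = rng.multivariate_normal(mean, cov, size=n)
    return sample[:, 0], sample[:, 1]

def ls_slope(x, y):
    """Least-squares slope of y regressed on x: cov(x, y) / var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

y1, y2 = simulate_bivariate_normal(mu1=1.0, mu2=-1.0, sigma1=3.0, sigma2=1.0,
                                   rho=0.6, n=200_000)
r = np.corrcoef(y1, y2)[0, 1]     # symmetric in (Y1, Y2)
slope_1_on_2 = ls_slope(y2, y1)   # predict Y1 from Y2; theory: rho * sigma1 / sigma2 = 1.8
slope_2_on_1 = ls_slope(y1, y2)   # predict Y2 from Y1; theory: rho * sigma2 / sigma1 = 0.2
print(r, slope_1_on_2, slope_2_on_1)
```

The two slopes differ by the factor $\sigma_1^2/\sigma_2^2$, while the correlation does not change when the roles of $Y_1$ and $Y_2$ are swapped.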

2. [Analytical question] Consider the following loss functions for the error terms $e_i$, $i = 1, \dots, N$, in linear regression. For each loss function, (i) state whether it is convex, (ii) provide a mathematical proof, and (iii) explain how it can be useful in the context of linear regression.

(a) Quadratic loss (related to mean squared error, L2 norm): $L = \sum_{i=1}^{N} e_i^2$

(b) Mean absolute error (L1 norm): $L = \sum_{i=1}^{N} |e_i|$

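(c) Huber loss (smooth mean absolute error) with parameter $\delta$: $L = \sum_{i=1}^{N} \ell(e_i)$, where

$$\ell(e) = \begin{cases} \frac{1}{2} e^2, & \text{if } |e| \le \delta \\ \delta |e| - \frac{1}{2}\delta^2, & \text{if } |e| > \delta \end{cases}$$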
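The three losses translate directly into vectorized NumPy functions. This is a minimal sketch assuming NumPy; the function names are hypothetical, and each function sums over the residuals exactly as in the definitions above (so "mean absolute error" here is the unnormalized sum $\sum_i |e_i|$).

```python
import numpy as np

def quadratic_loss(e):
    """L = sum_i e_i^2 (squared-error / L2 loss)."""
    e = np.asarray(e, dtype=float)
    return np.sum(e**2)

def absolute_loss(e):
    """L = sum_i |e_i| (absolute-error / L1 loss)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.abs(e))

def huber_loss(e, delta):
    """L = sum_i l(e_i): quadratic for |e| <= delta, linear beyond."""
    e = np.asarray(e, dtype=float)
    abs_e = np.abs(e)
    per_point = np.where(abs_e <= delta,
                         0.5 * e**2,                      # quadratic region
                         delta * abs_e - 0.5 * delta**2)  # linear region
    return np.sum(per_point)

residuals = [0.5, -2.0, 3.0]
print(quadratic_loss(residuals), absolute_loss(residuals), huber_loss(residuals, delta=1.0))
# 13.25 5.5 4.125
```

At $|e| = \delta$ both pieces of the Huber loss equal $\delta^2/2$ and both have slope $\delta$, so $\ell$ is continuously differentiable; the large residual $3.0$ contributes $9$ to the quadratic loss but only $2.5$ to the Huber loss with $\delta = 1$.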

3. [Analytical question] For linear regression $Y_i = \theta_0 + \theta_1 X_i + e_i$, $i = 1, \dots, N$, minimizing squared loss:

(a) Write down the likelihood of the training data, and analytically derive the maximum likelihood solution for the parameter estimates.

(b) Calculate the gradient with respect to the parameter vector.

(c) Write down the steps of the (batch) gradient descent rule.

(d) Write down the steps of the stochastic gradient descent rule.
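One way to lay out the results for Question 3, offered as a reference for checking a derivation rather than as the expected answer. It assumes the errors $e_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ with the $X_i$ treated as fixed (the question itself only says that squared loss is minimized). Under that assumption $\theta$ enters the log-likelihood only through $-\frac{1}{2\sigma^2}\sum_i (Y_i - \theta_0 - \theta_1 X_i)^2$, so maximizing the likelihood over $(\theta_0, \theta_1)$ is equivalent to minimizing $L(\theta) = \sum_i e_i^2$.

```latex
% Assumption: e_i i.i.d. N(0, sigma^2), X_i fixed.
\begin{align*}
\mathcal{L}(\theta_0, \theta_1, \sigma^2)
  &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(Y_i - \theta_0 - \theta_1 X_i)^2}{2\sigma^2}\right) \\
\log \mathcal{L}
  &= -\frac{N}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{N} (Y_i - \theta_0 - \theta_1 X_i)^2 \\
\hat\theta_1 &= \frac{\sum_{i=1}^{N} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{N} (X_i - \bar X)^2},
  \qquad \hat\theta_0 = \bar Y - \hat\theta_1 \bar X \\
\nabla_\theta L(\theta)
  &= -2 \sum_{i=1}^{N} (Y_i - \theta_0 - \theta_1 X_i)
     \begin{pmatrix} 1 \\ X_i \end{pmatrix},
  \qquad L(\theta) = \sum_{i=1}^{N} (Y_i - \theta_0 - \theta_1 X_i)^2 \\
\text{batch GD:}\quad \theta^{(t+1)} &= \theta^{(t)} - \alpha\, \nabla_\theta L(\theta^{(t)}) \\
\text{SGD (one } i \text{ per step):}\quad \theta^{(t+1)} &= \theta^{(t)}
  + 2\alpha\, \bigl(Y_i - \theta_0^{(t)} - \theta_1^{(t)} X_i\bigr)
    \begin{pmatrix} 1 \\ X_i \end{pmatrix}
\end{align*}
```

The SGD rule replaces the full-sum gradient with the single-observation term $-2(Y_i - \theta_0 - \theta_1 X_i)(1, X_i)^\top$; a common convention is to visit the observations in a freshly shuffled order on each pass (epoch).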


4. [Implementation question]

(a) Overlay graphs of the loss functions in Question 2 for a range of values of $e$ (consider two different values of $\delta$ for the Huber loss). Use the graph to discuss the relative advantages and disadvantages of these loss functions for linear regression.

(b) Implement gradient descent for the loss functions above.

(c) Implement stochastic gradient descent for the loss functions above.
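All three losses share the same gradient structure: with $e_i = Y_i - \theta_0 - \theta_1 X_i$, the gradient of $\sum_i \ell(e_i)$ is $-\sum_i \ell'(e_i)\,(1, X_i)^\top$, so only the per-residual derivative $\ell'$ changes between losses ($2e$ for quadratic, $\operatorname{sign}(e)$ as a subgradient for absolute error, and $e$ clipped to $[-\delta, \delta]$ for Huber). The following is a minimal sketch assuming NumPy; the function names (`loss_derivative`, `gradient`, `batch_gd`, `sgd`), the zero initialization, the iteration and epoch counts, $\delta = 1$, and the seeds are illustrative choices, not part of the assignment.

```python
import numpy as np

def loss_derivative(e, loss, delta=1.0):
    """Derivative l'(e) of the per-residual loss; for "absolute" this is a subgradient."""
    if loss == "quadratic":
        return 2.0 * e
    if loss == "absolute":
        return np.sign(e)                 # sign(0) = 0 is used as the subgradient at e = 0
    if loss == "huber":
        return np.clip(e, -delta, delta)  # e inside [-delta, delta], +/-delta outside
    raise ValueError(f"unknown loss: {loss!r}")

def gradient(theta, x, y, loss, delta=1.0):
    """Gradient of sum_i l(y_i - theta[0] - theta[1] * x_i) with respect to theta."""
    e = y - theta[0] - theta[1] * x
    d = loss_derivative(e, loss, delta)
    # Chain rule: d e_i / d theta = -(1, x_i)
    return -np.array([np.sum(d), np.sum(d * x)])

def batch_gd(x, y, loss, alpha=0.01, n_iter=2000, delta=1.0):
    """Batch gradient descent: every update uses all N observations."""
    theta = np.zeros(2)
    for _ in range(n_iter):
        theta = theta - alpha * gradient(theta, x, y, loss, delta)
    return theta

def sgd(x, y, loss, alpha=0.01, n_epochs=200, delta=1.0, seed=0):
    """Stochastic gradient descent: one observation per update, reshuffled every epoch."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(n_epochs):
        for i in rng.permutation(len(x)):
            theta = theta - alpha * gradient(theta, x[i:i + 1], y[i:i + 1], loss, delta)
    return theta

# Usage on data shaped like Question 5(a) (N(0, 4) read as standard deviation 2)
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 2.0, size=50)
for loss in ("quadratic", "absolute", "huber"):
    print(loss, batch_gd(x, y, loss), sgd(x, y, loss))
```

Because the losses in Question 2 are sums rather than means, the gradient grows with $N$, so a step size that is stable for $N = 50$ can diverge on much larger samples; dividing the loss by $N$ is a common alternative. With a constant $\alpha$, SGD (and subgradient descent on the absolute loss) hovers near the minimizer rather than converging to it exactly.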

5. [Implementation question] In this question we will revisit JW Figure 3.3 and empirically evaluate various approaches to fitting linear regression.

(a) Simulate N = 50 values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $\mathcal{N}(0, 4)$. Fit linear regression with squared loss to the simulated data using (i) the analytical solution, (ii) batch gradient descent, and (iii) stochastic gradient descent, as implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(b) Repeat (a) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the algorithm affects the estimates of the slope parameter.

(c) Simulate N = 50 values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $\mathcal{N}(0, 4)$. Fit linear regression with (i) squared loss using the analytical solution, (ii) mean absolute error using batch gradient descent, and (iii) Huber loss using batch gradient descent, as implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(d) Repeat (c) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the loss function affects the estimates of the slope parameter when the errors are Normally distributed.

(e) Simulate N = 50 values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $\mathcal{N}(0, 4)$. Modify the simulated values of $Y$ to introduce outliers, as follows. Select each observation for modification with probability 0.1. If an observation is selected, increase its value by 200% with probability 0.5, or decrease its value by 200% with probability 0.5. Fit linear regression to the modified data with (i) squared loss using the analytical solution, (ii) mean absolute error using batch gradient descent, and (iii) Huber loss using batch gradient descent, as implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(f) Repeat (e) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the loss function affects the estimates of the slope parameter in the presence of outliers.
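The data-generating steps of parts (a) and (e) and the 1,000-replication loop of parts (b) and (f) can be sketched as follows, here using only the analytical least-squares fit. Two readings are assumptions, flagged in the code: $\mathcal{N}(0, 4)$ is taken as variance 4 (standard deviation 2), and "increase/decrease its value by 200%" is taken as $y \mapsto 3y$ and $y \mapsto -y$. The names `simulate`, `add_outliers`, `ols_fit` and the seed are hypothetical. The gradient-based fits from Question 4 would be added to the same loop, and the resulting slope arrays can be passed to `matplotlib.pyplot.hist` (with `axvline` marking the true slope 2) for the overlaid histograms.

```python
import numpy as np

def simulate(rng, n=50):
    """X_i ~ Uniform(-2, 2) and Y_i = 3 + 2 X_i + e_i.

    Assumption: N(0, 4) is read as variance 4, i.e. standard deviation 2.
    """
    x = rng.uniform(-2.0, 2.0, size=n)
    y = 3.0 + 2.0 * x + rng.normal(0.0, 2.0, size=n)
    return x, y

def add_outliers(y, rng, p=0.1):
    """Select each observation with probability p; scale selected values by 3 or -1.

    Assumption: "increase by 200%" means y + 2y = 3y and "decrease by 200%"
    means y - 2y = -y, each with probability 0.5. The input array is not modified.
    """
    y = y.copy()
    selected = rng.random(len(y)) < p
    factor = np.where(rng.random(len(y)) < 0.5, 3.0, -1.0)
    y[selected] *= factor[selected]
    return y

def ols_fit(x, y):
    """Analytical least-squares solution, returned as (intercept, slope)."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return np.array([y.mean() - slope * x.mean(), slope])

rng = np.random.default_rng(2019)
clean_slopes, outlier_slopes = [], []
for _ in range(1000):
    x, y = simulate(rng)
    clean_slopes.append(ols_fit(x, y)[1])
    outlier_slopes.append(ols_fit(x, add_outliers(y, rng))[1])
clean_slopes = np.array(clean_slopes)
outlier_slopes = np.array(outlier_slopes)
print(clean_slopes.mean(), clean_slopes.std(), outlier_slopes.mean(), outlier_slopes.std())
```

Comparing the spread of `clean_slopes` and `outlier_slopes` gives a first look at how sensitive the squared-loss estimate is to the contamination in part (e).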
