DS 5220 assignment writing, Java programming help, Python and C/C++ assignment writing - Python programming


Date: 2020-02-04 05:23

DS 5220 - Spring 2019

Homework 1

Please follow the homework submission instructions provided on Piazza.

Due on Blackboard before midnight on Friday, January 18 2019.

Each part of each problem is worth 5 points.

1. [Analytical question] Consider two Normally distributed random variables $Y_1$ and $Y_2$ with expected values $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$.

(a) State the joint probability distribution of these random variables. State it twice: once in non-matrix form and once in matrix form. Explain the meaning of each term.

(b) Use Bayes' theorem to derive the conditional probability distributions of $Y_1 \mid Y_2$ and of $Y_2 \mid Y_1$.

(c) [...] want to predict $Y_1$ as a function of $Y_2$, or $Y_2$ as a function of $Y_1$?

(d) Use the derivations above to explain the difference between the coefficient of correlation and the slope of linear regression.
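As a numerical companion to part (d), the sketch below simulates a correlated Normal pair and compares the sample correlation with the least-squares slope of $Y_1$ on $Y_2$. It relies on the standard result that the conditional mean of $Y_1$ given $Y_2$ is linear with slope $\rho \sigma_1 / \sigma_2$. The function names and the parameter values in the usage note are illustrative assumptions, not part of the assignment.

```python
import math
import random


def simulate_bivariate_normal(n, mu1, mu2, sigma1, sigma2, rho, seed=0):
    """Draw n pairs (Y1, Y2) with the requested means, sds and correlation."""
    rng = random.Random(seed)
    y1, y2 = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Y2 uses z1 only; Y1 mixes z1 and z2 so that corr(Y1, Y2) = rho.
        y2.append(mu2 + sigma2 * z1)
        y1.append(mu1 + sigma1 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    return y1, y2


def correlation_and_slope(y1, y2):
    """Return (sample correlation, least-squares slope of Y1 regressed on Y2)."""
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    s11 = sum((a - m1) ** 2 for a in y1)
    s22 = sum((b - m2) ** 2 for b in y2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    return s12 / math.sqrt(s11 * s22), s12 / s22
```

For example, with `sigma1=2.0`, `sigma2=0.5` and `rho=0.6`, the sample correlation should be close to 0.6 while the slope should be close to 0.6 * 2.0 / 0.5 = 2.4: the correlation is unit-free, whereas the slope carries the ratio of the two standard deviations.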

2. [Analytical question] Consider the following loss functions for error terms $e_i$, $i = 1, \ldots, N$, in linear regression. For each loss function, (i) state whether it is convex, (ii) provide a mathematical proof, and (iii) explain how it can be useful in the context of linear regression.

(a) Quadratic loss (related to mean squared error, $L_2$ norm): $L = \sum_{i=1}^{N} e_i^2$

(b) Mean absolute error ($L_1$ norm): $L = \sum_{i=1}^{N} |e_i|$

(c) Huber loss: $L = \sum_{i=1}^{N} l(e_i)$, where

$$l(e) = \begin{cases} \frac{1}{2} e^2, & \text{if } |e| \le \delta \\ \delta |e| - \frac{1}{2} \delta^2, & \text{if } |e| > \delta \end{cases}$$
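The three loss functions can be written directly from their definitions. This is a minimal sketch in plain Python; the function names are hypothetical, and `delta` is the Huber threshold $\delta$.

```python
def quadratic_loss(errors):
    """Sum of squared errors."""
    return sum(e * e for e in errors)


def absolute_loss(errors):
    """Sum of absolute errors."""
    return sum(abs(e) for e in errors)


def huber_term(e, delta):
    """Per-error Huber loss: quadratic near zero, linear in the tails."""
    if abs(e) <= delta:
        return 0.5 * e * e
    return delta * abs(e) - 0.5 * delta * delta


def huber_loss(errors, delta):
    """Sum of per-error Huber terms."""
    return sum(huber_term(e, delta) for e in errors)
```

At $|e| = \delta$ both branches of `huber_term` give $\frac{1}{2}\delta^2$, so the function is continuous where the quadratic and linear pieces meet.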

3. [Analytical question] For linear regression $Y_i = \theta_0 + \theta_1 X_i + e_i$, $i = 1, \ldots, N$, minimizing squared loss:

(a) Write down the likelihood on the training data, and analytically derive the maximum likelihood solution for parameter estimates.

(b) Calculate the gradient with respect to the parameter vector.

(d) Write down the steps of the stochastic gradient descent rule.
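To make parts (b) and (d) concrete, here is a hedged sketch for the squared loss $L = \sum_i (Y_i - \theta_0 - \theta_1 X_i)^2$. Its gradient is $(-2\sum_i e_i,\ -2\sum_i e_i X_i)$, and a stochastic step applies the same formula to one observation at a time. The function names, the zero initialization, the per-epoch shuffling and the default `epochs` value are assumptions made for illustration.

```python
import random


def squared_loss_gradient(theta0, theta1, xs, ys):
    """Gradient of sum_i (y_i - theta0 - theta1 * x_i)^2 w.r.t. (theta0, theta1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        e = y - theta0 - theta1 * x
        g0 += -2.0 * e
        g1 += -2.0 * e * x
    return g0, g1


def sgd_fit(xs, ys, alpha=0.01, epochs=200, seed=0):
    """Stochastic gradient descent: one observation per parameter update."""
    rng = random.Random(seed)
    theta0 = theta1 = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a fresh random order
        for i in order:
            e = ys[i] - theta0 - theta1 * xs[i]
            # theta <- theta - alpha * gradient of the single term e_i^2
            theta0 += alpha * 2.0 * e
            theta1 += alpha * 2.0 * e * xs[i]
    return theta0, theta1
```

On noise-free data the per-observation gradients all vanish at the true parameters, so the iterates settle there; with noisy data and a constant learning rate they keep fluctuating around the least-squares solution.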

4. [Implementation question]

(a) Overlay graphs of the loss functions in question 2 for a range of $e$ (consider two different values of $\delta$ for Huber loss). Use the graph to discuss the relative advantages and disadvantages of these loss functions for linear regression.

(b) Implement gradient descent for the loss functions above.
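One possible shape for part (b) is a single batch routine that switches on the loss. The sketch below assumes the simple model $Y_i = \theta_0 + \theta_1 X_i + e_i$, uses a subgradient (taking 0 at $e = 0$) for the absolute loss because it is not differentiable there, and averages the gradient over the $N$ observations so that one learning rate works for any sample size. The averaging, the names, and the defaults are illustrative choices, not requirements of the assignment.

```python
def sign(e):
    """Return -1.0, 0.0 or 1.0 according to the sign of e."""
    return float((e > 0) - (e < 0))


def loss_derivative(e, loss, delta=1.0):
    """Derivative (or a subgradient) of the per-error loss with respect to e."""
    if loss == "quadratic":
        return 2.0 * e
    if loss == "absolute":
        return sign(e)
    if loss == "huber":
        return e if abs(e) <= delta else delta * sign(e)
    raise ValueError("unknown loss: " + loss)


def batch_gradient_descent(xs, ys, loss, alpha=0.01, iterations=5000, delta=1.0):
    """Fit theta0 + theta1 * x by batch (sub)gradient descent on the chosen loss."""
    theta0 = theta1 = 0.0
    n = len(xs)
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            d = loss_derivative(y - theta0 - theta1 * x, loss, delta)
            # e_i depends on theta through -theta0 - theta1 * x_i (chain rule)
            g0 -= d
            g1 -= d * x
        theta0 -= alpha * g0 / n
        theta1 -= alpha * g1 / n
    return theta0, theta1
```

With a constant learning rate, the absolute-loss iterates do not converge exactly; they end up oscillating in a small neighbourhood of the minimizer whose size shrinks with `alpha`.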

5. [Implementation question] In this question we will revisit JW Figure 3.3, and empirically evaluate various approaches to fitting linear regression.

(a) Simulate $N = 50$ values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $N(0, 4)$. Fit linear regression with squared loss to the simulated data using (i) the analytical solution, (ii) batch gradient descent, and (iii) stochastic gradient descent implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(b) Repeat (a) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the algorithm affects the estimates of the slope parameter.

(c) Simulate $N = 50$ values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $N(0, 4)$. Fit linear regression with (i) squared loss with the analytical solution, (ii) mean absolute error with batch gradient descent, and (iii) Huber loss with batch gradient descent implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(d) Repeat (c) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the loss function in the case of Normal distribution affects the estimates of the slope parameter.

(e) Simulate $N = 50$ values of $X_i$, distributed Uniformly on the interval $(-2, 2)$. Simulate the values of $Y_i = 3 + 2X_i + e_i$, where $e_i$ is drawn from $N(0, 4)$. Modify the simulated values of $Y$ to introduce outliers, as follows. With probability 0.1, select an observation for modification. If it is selected, increase its value by 200% with probability 0.5, and decrease its value by 200% with probability 0.5. Fit linear regression to the modified data with (i) squared loss with the analytical solution, (ii) mean absolute error with batch gradient descent, and (iii) Huber loss with batch gradient descent implemented in Question 4. Set the learning rate $\alpha$ to a small value (say, $\alpha = 0.01$).

(f) Repeat (e) 1,000 times, overlay the histograms of the estimates of the slopes, and overlay the true value. Comment on how the choice of the loss function in the presence of outliers affects the estimates of the slope parameter.
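The simulation set-up shared by parts (a) through (f) can be sketched as follows. Two readings are assumed and should be checked against the course conventions: $N(0, 4)$ is taken to mean variance 4 (standard deviation 2), and "increase by 200%" / "decrease by 200%" are taken to mean $Y + 2Y = 3Y$ and $Y - 2Y = -Y$. The function names are hypothetical, and only the analytical least-squares fit is shown; the gradient-based fits and the histograms are left out.

```python
import random
import statistics

TRUE_INTERCEPT, TRUE_SLOPE = 3.0, 2.0


def simulate(rng, n=50, outlier_prob=0.0):
    """Simulate (X, Y) pairs, optionally turning some Y values into outliers."""
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    # Assumption: N(0, 4) denotes variance 4, i.e. standard deviation 2.
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, 2.0) for x in xs]
    for i in range(n):
        if rng.random() < outlier_prob:
            # Assumption: +200% means y + 2y = 3y, and -200% means y - 2y = -y.
            ys[i] = 3.0 * ys[i] if rng.random() < 0.5 else -ys[i]
    return xs, ys


def ols_fit(xs, ys):
    """Closed-form least-squares estimates (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def slope_estimates(repeats=1000, outlier_prob=0.0, seed=0):
    """Repeat the simulation and collect the least-squares slope estimates."""
    rng = random.Random(seed)
    return [ols_fit(*simulate(rng, outlier_prob=outlier_prob))[1]
            for _ in range(repeats)]
```

Calling `slope_estimates(1000, 0.0)` and `slope_estimates(1000, 0.1)` gives two lists that can be passed to any histogram routine; `statistics.pstdev` on each list gives a quick numerical summary of how much the outliers widen the spread of the squared-loss estimates.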
