Date: 2019-06-23 09:22

ECE 595: Machine Learning I

Spring 2019

Homework 1: Linear Algebra and Probability Review

(Due: Friday, Jan 18, 2019)

Homework is due at 4:30pm. Please put your homework in the dropbox located at MSEE 330. No late homework will be accepted.

Objective

As the first homework assignment, we would like you to refresh some of the concepts in the Background chapter, and have some hands-on experience with Python. Here are the specific objectives.

(a) Familiarize yourself with tools in Python that will be helpful to you in later parts of the course. In addition to basic Python functions and objects, you will gain experience working with functions that simulate random data sampling from probability distributions and visualize the data;

(b) Review some important concepts in linear algebra and probability. Warm up with some proof techniques that will be used later in the course.

Exercise 1: Installing Python and Getting Started (0 points)

To get started with the homework, please download and install Python on your local machine. Here are a few steps to guide you through. For additional information, e.g., video demonstrations, please visit our course website.

(a) If you are a beginner to Python, we suggest you download Anaconda at https://www.anaconda.com/download/. Follow the instructions and install it on your local machine.

(b) Once you have installed Anaconda, open an environment and install Spyder.

(c) Make sure you have standard packages installed: scipy, numpy, matplotlib, cvxpy, cvxopt, and imageio.

(d) After you have installed all these packages, open Spyder and type your hello world program.

import numpy as np
import scipy
import matplotlib.pyplot as plt
import cvxpy as cp
import csv
import imageio
print("Hello World!")

If you are already familiar with Python, you may skip this exercise. Please contact our teaching assistants if you need any help.
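One way to confirm step (c) before running the hello world program is to ask Python whether each package can be located. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of the assignment.

```python
import importlib.util

# Packages named in step (c) of this exercise.
REQUIRED = ["scipy", "numpy", "matplotlib", "cvxpy", "cvxopt", "imageio"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any package reported as missing can then be installed from the Anaconda environment before continuing.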


© 2019 Stanley Chan. All Rights Reserved.

Exercise 2: Generating 1D Random Variables

In this exercise, we will use Python to draw random samples from a 1D Gaussian and visualize the data using a histogram.

(a) Let X be a random variable with X ~ N(μ, σ^2). The PDF of X is written explicitly as

    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).   (1)

(b) Let μ = 0 and σ = 1 so that X ~ N(0, 1). Plot f_X(x) using matplotlib.pyplot.plot for the range x ∈ [−3, 3]. Use matplotlib.pyplot.savefig to save your figure.

(c) Let us investigate the use of histograms in data visualization.

(i) Use numpy.random.normal to draw 1000 random samples from N(0, 1).

(ii) Make two histogram plots using matplotlib.pyplot.hist, with the number of bins m set to 4 and 1000.

(iii) Use scipy.stats.norm.fit to estimate the mean and standard deviation of your data. Report the estimated values.

(iv) Plot the fitted Gaussian curve on top of the two histogram plots using scipy.stats.norm.pdf.

(v) Are the two histograms representative of your data's distribution? How are they different in terms of data representation?

(d) A practical way to estimate the optimal bin width is to use the cross validation estimator of risk (CVER) of the dataset. Denote the bin width by h = (max data value − min data value)/m, where m is the number of bins (assuming you applied no rescaling to your raw data). We seek the h* that minimizes the CVER Ĵ(h), expressed as follows:

    \hat{J}(h) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h} \sum_{j=1}^{m} \hat{p}_j^2,   (2)

where {p̂_j}, j = 1, ..., m, is the empirical probability of a sample falling into each bin, and n is the total number of samples.

Plot Ĵ(h) with respect to the number of bins m, for m = 1, 2, ..., 200. Find the m* that minimizes Ĵ(h), plot the histogram of your data with that m*, and plot the Gaussian curve fitted to your data on top of your histogram. How is your current histogram different from those you obtained in part (c)?

Note: If you are interested in why Ĵ(h) plays an important role in estimating the optimal bin width of the histogram, see the additional note of this homework.
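The bin-count search in part (d) can be sketched as follows. This is only an illustration, assuming the risk estimator has the standard form 2/((n−1)h) − (n+1)/((n−1)h) Σ_j p̂_j^2 with equal-width bins spanning the data range. The function name cver and the random seed are hypothetical choices, not part of the assignment.

```python
import numpy as np


def cver(data, m):
    """Cross validation estimator of risk for a histogram with m equal-width bins."""
    n = data.size
    h = (data.max() - data.min()) / m        # bin width
    counts, _ = np.histogram(data, bins=m)   # bins span [min, max] of the data
    p_hat = counts / n                       # empirical probability of each bin
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat ** 2)


np.random.seed(0)                            # arbitrary seed, for repeatability only
data = np.random.normal(0.0, 1.0, 1000)      # 1000 samples from N(0, 1)

m_values = np.arange(1, 201)                 # m = 1, 2, ..., 200
risk = np.array([cver(data, m) for m in m_values])
m_star = int(m_values[np.argmin(risk)])      # number of bins with the smallest risk
print("m* =", m_star)
```

The arrays m_values and risk can be passed directly to matplotlib.pyplot.plot, and m_star to the bins argument of matplotlib.pyplot.hist.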

Exercise 3: Generating 2D Random Variables

In this exercise, we consider the following question: suppose that we are given a random number generator that can only generate zero-mean unit variance Gaussians, i.e., X ~ N(0, I), how do we transform the distribution of X to an arbitrary Gaussian distribution? We will first derive a few equations, and then verify them with an empirical example, by drawing samples from the 2D Gaussian, applying the transform to the dataset, and checking if the transformed dataset really takes the form of the desired Gaussian.

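(a) Let X ~ N(μ, Σ) be a 2D Gaussian. The PDF of X is given by

    f_X(x) = \frac{1}{\sqrt{(2\pi)^2 |\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right),   (3)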


where in this exercise we assume

(4)

(i) Simplify the expression f_X(x) for the particular choices of μ and Σ here. Show your derivation.

(ii) Using matplotlib.pyplot.contour, plot the contour of f_X(x) for the range x ∈ [−1, 5] × [0, 10].

(b) Suppose X ~ N(0, I). We would like to derive a transformation that can map X to an arbitrary Gaussian.

(i) Let X ~ N(0, I) be a d-dimensional random vector. Let A ∈ R^{d×d} and b ∈ R^d. Let Y = AX + b be an affine transformation of X. Let μ_Y ≝ E[Y] be the mean vector and Σ_Y ≝ E[(Y − μ_Y)(Y − μ_Y)^T] be the covariance matrix. Show that

    \mu_Y = b, \quad \text{and} \quad \Sigma_Y = A A^T.   (5)

(ii) Show that Σ_Y is symmetric positive semi-definite.

(iii) Under what condition on A would Σ_Y become a symmetric positive definite matrix?

(iv) Consider a random variable Y ~ N(μ_Y, Σ_Y) such that

Determine A and b which could satisfy Equation (5).

Hint: Consider the eigen-decomposition of Σ_Y. You may compute the eigen-decomposition numerically.

(c) Now let us verify our results from part (b) with an empirical example.

(i) Use numpy.random.multivariate_normal to draw 5000 random samples from the 2D standard normal distribution, and make a scatter plot of the data points using matplotlib.pyplot.scatter.

(ii) Apply the affine transformation you derived in part (b)(iv) to the data points, and make a scatter plot of the transformed data points. Now check your answer by using the Python function numpy.linalg.eig to obtain the transformation and making a new scatter plot of the transformed data points.

(iii) Do your results from parts (c)(i) and (ii) support your theoretical findings from part (b)? You are welcome to utilize Python functions you find useful and include plots in your answer.
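The empirical check in part (c) might be sketched as below. The target mean and covariance here are illustrative placeholders, not the values from the assignment. The sketch also uses numpy.linalg.eigh, which is intended for symmetric matrices, in place of numpy.linalg.eig, and picks A = U Λ^{1/2}, which is only one of several valid choices satisfying A A^T = Σ_Y.

```python
import numpy as np

# Illustrative target parameters (assumed for this sketch, not the assignment's values).
mu_target = np.array([1.0, -2.0])
sigma_target = np.array([[3.0, 1.0],
                         [1.0, 2.0]])

# Eigen-decomposition Sigma = U diag(lam) U^T of the symmetric target covariance.
lam, U = np.linalg.eigh(sigma_target)

# With A = U diag(sqrt(lam)) we get A A^T = U diag(lam) U^T = Sigma, and b is the target mean.
A = U @ np.diag(np.sqrt(lam))
b = mu_target

np.random.seed(0)  # arbitrary seed, for repeatability only
X = np.random.multivariate_normal(np.zeros(2), np.eye(2), size=5000)  # one sample per row
Y = X @ A.T + b    # applies y = A x + b to every row

print("empirical mean:", Y.mean(axis=0))
print("empirical covariance:")
print(np.cov(Y, rowvar=False))
```

The columns of X and Y can be handed to matplotlib.pyplot.scatter to compare the two point clouds visually. The empirical mean and covariance of Y should be close to the chosen targets, up to sampling error.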

Exercise 4: Norm and Positive Semi-Definiteness

The aim of this exercise is to reinforce your understanding of the vital concepts of norms, the two famous inequalities, eigen-decomposition, and the notion of positive (semi-)definiteness, which will be ubiquitous throughout the semester.

(a) Schur's lemma (one of several named after Issai Schur) is one of the most commonly used inequalities in estimating quadratic forms. Given a matrix A ∈ R^{m×n} and vectors x ∈ R^m and y ∈ R^n, the inequality takes the form

    |x^T A y| \le \sqrt{RC} \, \|x\|_2 \|y\|_2, \quad \text{where} \quad R = \max_{j} \sum_{k=1}^{n} |[A]_{j,k}|, \quad C = \max_{k} \sum_{j=1}^{m} |[A]_{j,k}|.   (6)

Prove this inequality.

Hint: Use the Cauchy-Schwarz inequality.


(b) Recall from the lectures the concepts related to positive (semi-)definite matrices.

(i) Prove that any positive definite matrix A is invertible.

(ii) Find a function f : R^2 → R whose Hessian is invertible but not positive definite anywhere in R^2.

(iii) Under what extra condition is any positive semi-definite matrix positive definite? Justify your answer.

(c) Recall the concept of eigen-decomposition: for any symmetric matrix A ∈ R^{n×n}, there exist a diagonal matrix Λ ∈ R^{n×n} with the eigenvalues of A on its diagonal, and an orthonormal matrix U ∈ R^{n×n} with the eigenvectors of A as its columns, such that A = UΛU^T. Prove that there exists A† ∈ R^{n×n} such that the following holds:

    A A^{\dagger} A = A.   (7)

Hint: You can use the fact that, for symmetric A with rank k ≤ n, it is possible to eigen-decompose A such that the first k diagonal entries of Λ are nonzero, and the rest are all zeros. Then define Λ† with [Λ†]_{j,j} = 1/[Λ]_{j,j} for 1 ≤ j ≤ k, and 0 everywhere else. A† is what is called the pseudoinverse of A.
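The construction in the hint of part (c) can be checked numerically. The sketch below assumes the pseudoinverse is assembled as U Λ† U^T, where Λ† inverts the nonzero eigenvalues and leaves the others at zero. The test matrix and the tolerance are illustrative choices. Note that numpy.linalg.eigh returns eigenvalues in ascending order, so the zero eigenvalue comes first rather than last; a mask is used instead of relying on position.

```python
import numpy as np

# Illustrative symmetric matrix of rank 2 in R^{3x3} (assumed for this sketch).
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
A = B @ B.T

lam, U = np.linalg.eigh(A)               # A = U diag(lam) U^T

# Invert only the eigenvalues that are numerically nonzero; the rest stay 0.
tol = 1e-10 * np.abs(lam).max()          # tolerance chosen for illustration
nonzero = np.abs(lam) > tol
lam_pinv = np.zeros_like(lam)
lam_pinv[nonzero] = 1.0 / lam[nonzero]

A_pinv = U @ np.diag(lam_pinv) @ U.T     # candidate pseudoinverse

print("A A_pinv A equals A:", np.allclose(A @ A_pinv @ A, A))
print("agrees with numpy.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-10)))
```

A numerical check like this does not replace the proof, but it is a quick way to catch an error in the definition of Λ† before writing the argument down.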

