
Date: 2022-02-19 11:40

Assignment 1 Q1

A few more cool things about PCA (30 points)

For parts a) to c) below, please assume the following:

Let $X = (X_1 | \cdots | X_p)$ be an $n \times p$ random matrix such that $\mathrm{Var}((X^t)_i) = \Sigma \ \forall i$, i.e. $\Sigma$ is the covariance matrix for row $i$ of $X$ (the $i$th column of $X^t$).

Assume that $\Sigma$ is a positive definite matrix with normed eigenvalue decomposition $\Sigma = W \Lambda W^t$.

Question parts:

a. (10 points) Let $Y_i = W (X^t)_i$ be the $p$-vector of scores for the $i$-th row of $X$. Show that the PCA representation preserves distance between the two vectors $(X^t)_i$ and $(X^t)_j$, i.e. that

$$\| (X^t)_i - (X^t)_j \| = \| Y_i - Y_j \|$$

where $\| u - v \| = (u - v)^t (u - v)$. Hint: Use the properties of the various pieces of the eigenvalue decomposition.

b. (10 points) Using the properties of traces of products of matrices and the definition of $\Sigma$ in part a), show that:

$$\mathrm{tr}(\Sigma) = \mathrm{tr}(\Lambda)$$

showing that the sum of the eigenvalues is equal to the sum of the marginal variances.

c. (10 points) Assume that we generate a random $p \times 1$ vector $Z$ such that $Z_i \sim \mathrm{Normal}(0, 1)$ and $\mathrm{Cov}(Z_i, Z_j) = 0 \ \forall i \neq j$. Let

$$V = Z W^t \Lambda^{1/2}$$

where $\Sigma = W \Lambda W^t$ as described at the beginning of this question.

i. What are $E(V)$ and $\mathrm{Var}(V)$?

ii. What is the distribution of $V_i$?

Please show your work in deriving the answers, but you may use standard results for the properties of Normal random variables.
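Before attempting the derivations in parts a) and b), it can help to check the two claims numerically. The sketch below is only a sanity check, not a proof. The covariance matrix `Sigma`, the vectors `x_i` and `x_j`, and the helper `dist_fn` are hypothetical names and values introduced for illustration; the scores and the distance are computed as they are written in part a).

```r
# Numerical sanity check for parts a) and b); it illustrates the claims, it does not prove them.
set.seed(1)
p <- 4

# Hypothetical positive definite covariance matrix (an assumption for illustration).
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p)

# Normed eigenvalue decomposition Sigma = W Lambda W^t.
eig <- eigen(Sigma, symmetric = TRUE)
W <- eig$vectors
Lambda <- diag(eig$values)

# Two arbitrary p-vectors standing in for rows i and j of X.
x_i <- rnorm(p)
x_j <- rnorm(p)

# Scores as written in part a).
y_i <- W %*% x_i
y_j <- W %*% x_j

# Distance as written in part a): (u - v)^t (u - v).
dist_fn <- function(u, v) drop(crossprod(u - v))

dist_x <- dist_fn(x_i, x_j)
dist_y <- dist_fn(y_i, y_j)

# Part b): trace of Sigma versus trace of Lambda.
tr_sigma <- sum(diag(Sigma))
tr_lambda <- sum(diag(Lambda))

print(c(dist_x = dist_x, dist_y = dist_y))
print(c(tr_sigma = tr_sigma, tr_lambda = tr_lambda))
```

The two printed pairs should agree up to floating point error. The check relies only on the columns of `W` being orthonormal, which is the property the hint in part a) points to.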

Assignment 1 Q2

Analyzing wine data (30 points)

The data for this exercise comes from a paper by Cortez et al. (2009) (https://www.sciencedirect.com/science/article/abs/pii/S0167923609001377?via%3Dihub), where the authors were trying to relate various chemical properties of red and white wine to perceived quality. For this question, we will analyze only the data for the chemical properties, not the quality. Also, although the original paper looked at both red and white wine, we will only use the data for the red.

The data can be read in via:

library(tidyverse)

wine_data <- read_csv("red_wine_data.csv") # Be sure this is in your current working directory

glimpse(wine_data)

Rows: 1,599
Columns: 12
$ `fixed acidity`        <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7…
$ `volatile acidity`     <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0.600…
$ `citric acid`          <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0.00,…
$ `residual sugar`       <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.…
$ chlorides              <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069…
$ `free sulfur dioxide`  <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, 17, …
$ `total sulfur dioxide` <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65, 10…
$ density                <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9978,…
$ pH                     <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3.39,…
$ sulphates              <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0.47,…
$ alcohol                <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.5, 1…
$ quality                <dbl> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5, 5,…

The variables are self-evident from the names. We do not want to use the quality variable, so we can create a new dataset without it via:

wine_data_chem <- wine_data %>% select(-quality)

head(wine_data_chem)

# A tibble: 6 x 11
  `fixed acidity` `volatile acidity` `citric acid` `residual sugar` chlorides
            <dbl>              <dbl>         <dbl>            <dbl>     <dbl>
1             7.4               0.7           0                 1.9     0.076
2             7.8               0.88          0                 2.6     0.098
3             7.8               0.76          0.04              2.3     0.092
4            11.2               0.28          0.56              1.9     0.075
5             7.4               0.7           0                 1.9     0.076
6             7.4               0.66          0                 1.8     0.075
# … with 6 more variables: `free sulfur dioxide` <dbl>,
#   `total sulfur dioxide` <dbl>, density <dbl>, pH <dbl>, sulphates <dbl>,
#   alcohol <dbl>

This is the data you should analyze.

a. (10 points) Using only scatterplots and the sample correlation matrix, summarize what you believe to be the most interesting associations you observe amongst these characteristics. Show both the plots and the summaries you generate to support your conclusions.

b. (20 points) Perform a principal component analysis of this data using your preferred function. As part of this analysis, please be sure to complete the following tasks:

  • Report the eigenvalues for all 11 principal components.

  • For the first two principal components, plot and interpret the components in terms of the original variables. In particular, explain which variables are most highly correlated with each of these two components and how these components are different from each other.

  • Choose the smallest number of principal components that you believe can be used to summarize the information from the data and justify your choice.
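One possible workflow for parts a) and b) is sketched below in base R. Because `red_wine_data.csv` is not included here, the sketch runs on a few columns of the built-in `mtcars` data as a stand-in; that choice, the object names `chem`, `cor_pairs` and `pca`, and the decision to scale the variables before PCA are all assumptions for illustration. With the real data, `chem` would be replaced by `wine_data_chem`.

```r
# Stand-in data: these mtcars columns are an assumption used only so the sketch
# runs without red_wine_data.csv; substitute wine_data_chem in practice.
chem <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Part a): scatterplot matrix and sample correlation matrix.
pairs(chem, pch = 19, cex = 0.5)
cor_mat <- cor(chem)
print(round(cor_mat, 2))

# List each pair of variables once, ordered by absolute correlation (strongest first).
idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
cor_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  cor  = cor_mat[idx]
)
cor_pairs <- cor_pairs[order(-abs(cor_pairs$cor)), ]
print(cor_pairs)

# Part b): PCA on centred and scaled variables (scaling is an assumption; it is a
# common choice when the variables are measured on different scales).
pca <- prcomp(chem, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eigenvalues <- pca$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
print(round(data.frame(eigenvalue = eigenvalues,
                       prop_var   = prop_var,
                       cum_var    = cumsum(prop_var)), 3))

# Loadings of the first two components, and the correlations between the
# original variables and the first two component scores.
print(round(pca$rotation[, 1:2], 2))
print(round(cor(chem, pca$x[, 1:2]), 2))

# Scree plot and biplot to support the choice of how many components to keep.
screeplot(pca, type = "lines", main = "Scree plot")
biplot(pca, cex = 0.6)
```

With scaled variables the eigenvalues sum to the number of variables, which is the trace identity from Q1 part b) applied to the sample correlation matrix. The cumulative proportion column and the scree plot are two common, informal aids for choosing how many components to retain; neither gives a definitive answer on its own.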

