MATH2392 Practice of Analytics

Assignment 3

Due date: Sunday, 4 October, 2020

1. The file juul2.txt (located on Canvas -> Assignment 3) contains data from

Anders Juul on insulin-like growth factors:

? Age – age in years

? Height – height in cm

? Menarche – did the menarche occur? (1/2 = no/yes)

? Sex – 1 = Male, 2 = Female

? IGF1 – Serum IGF-I

? Tanner – Tanner’s puberty classification (1-5)

? Testvol – testis volume

? Weight – weight in kg

Numbers are separated by blanks and missing values are coded by a dot.

a. Find the correlation coefficient between “Weight” and each of the

following variables: “Sex”, “Age”, “Height”, “IGF1”, “Tanner”.

b. Restrict the data to the variables IGF1, Age, Sex, Tanner, Weight and

Height to the subjects that are younger than or exactly 20 years old.

Remove all subjects that have a missing value in any of the variables.

c. Consider body weight as response variable and perform a multiple

regression analysis using the following variables as predictors: “Sex”,

“Age”, “Height”, “IGF1”, “Tanner”. Extract the regression coefficients

with standard error, 95% confidence intervals, and p-values. (Hint: use

OUTEST = option in PROC REG statement with TABLEOUT

requirement).

d. Consider body weight as response variable and do a linear regression

model with age and gender as predictors.

e. Consider body weight as response variable and do a linear regression

model with age and gender and an interaction term for age and gender

as predictors. (Hint: Use PROC GLM procedure).

Copy and Paste the program on your answer sheet for each part.

Submit the answer sheet with your report via Canvas.

All reports should be created in RTF format.

(1 + 2 + 2 + 1 + 1 = 7 marks)

2. There are three datasets Inventory, Orders, and Transaction are located

on Canvas -> Assignment 3. This information has been extracted from a

clothing e-retailer’s database.

Use DATA step to read all three the datasets into SAS, then use PROC

SQL procedure answering the following questions.

a. Combine the three datasets as a single data set.

b. Identify the web_id which has the largest total order quantity.

c. Identify the web_id that has the lowest average cost.

Copy and Paste the program on your answer sheet for each part.

Submit the answer sheet with your report via Canvas.

(4 + 2 + 2 = 8 marks)

版权所有：留学生编程辅导网 2018 All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。