Date: 2018-11-11 11:37

BIST 515: Introduction to Statistical Software Homework 5

Due date: Wednesday, November 14

1. (35 total points) National Football League (NFL) players go through a number of evaluations

during the combine so that NFL teams can assess their ability.

The nfl.csv data file is available on Canvas, and it contains information on some of the players who

participated in 2014. The columns in the data file represent the following information:

Player: Name of player being evaluated

College: College that the player attended

Position: The position of the player, where DB = defensive back, LB = linebacker, OL = offensive

lineman, RB = running back, S = safety, TE = tight end, WO = wide receiver; players who

played other positions were excluded from the data file

OverallGrade: The overall grade of the player based on the evaluations

Height: Height in inches

ArmLength: Arm length in inches

Weight: Weight in pounds

Dash40: 40-yard dash time in seconds

BenchPress: Number of bench press repetitions of 225 pounds

VerticalJump: Vertical jump in inches

BroadJump: Broad jump in inches

Cone3Drill: 3-cone drill time in seconds

Shuttle20: 20-yard shuttle run in seconds

Using these data, complete the problems below. While you are welcome to use any football

knowledge to help answer questions, this is not needed to perform well on this assignment.

(a) (4 points) Read the data into SAS using proc import and print the first five observations using

proc print.

(b) (4 points) Sort the players by their 40-yard dash times. Print only the name of each offensive

lineman along with his 40-yard dash time.

(c) (4 points) Find the mean 40-yard dash time of the players at each position. Which position has

the fastest players on average? Which position has the slowest players on average?

(d) Using proc means, find the mean, standard deviation, and sample size for the 40-yard dash times

of both offensive linemen and wide receivers (each separately). Export these values into a data set in the

following ways:

(i) (3 points) output statement

(ii) (3 points) ods statement

(e) (5 points) Assuming these players are a simple random sample from a population of all players,

we can perform statistical inference procedures to make inferences about this population. With this

assumption, perform a two-sample t-test with unequal variances to test the equality of means for the

40-yard dash times of offensive linemen and wide receivers. More formally, we can create the hypotheses

as

$H_0: \mu_{OL} - \mu_{WR} = 0$

$H_a: \mu_{OL} - \mu_{WR} \neq 0$

where $\mu_{\text{position}}$ denotes the mean for a particular position. This hypothesis test should be performed

by showing the correct test statistic and p-value equations with their values AND without using a

SAS procedure to automatically find these values. Your write-up here should be formal by including

statements of hypotheses, test statistic, p-value, critical value, and decision with reasoning.
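
As a reminder of what "the correct test statistic and p-value equations" look like for part (e), the block below sketches the standard unequal-variance (Welch) two-sample t-test. The notation is an assumption made here for illustration: $\bar{x}$, $s$, and $n$ stand for the sample mean, standard deviation, and sample size of the 40-yard dash times found in part (d), and $\alpha$ is the significance level you choose.

```latex
% Test statistic for H_0: \mu_{OL} - \mu_{WR} = 0 with unequal variances
t = \frac{\bar{x}_{OL} - \bar{x}_{WR}}
         {\sqrt{\dfrac{s_{OL}^2}{n_{OL}} + \dfrac{s_{WR}^2}{n_{WR}}}}

% Satterthwaite approximation to the degrees of freedom
\nu = \frac{\left(\dfrac{s_{OL}^2}{n_{OL}} + \dfrac{s_{WR}^2}{n_{WR}}\right)^2}
           {\dfrac{\left(s_{OL}^2/n_{OL}\right)^2}{n_{OL}-1}
            + \dfrac{\left(s_{WR}^2/n_{WR}\right)^2}{n_{WR}-1}}

% Two-sided p-value, where T_\nu follows a t distribution with \nu degrees of freedom
p = 2\,P\left(T_\nu \ge |t|\right)

% Decision rule: reject H_0 at level \alpha when
|t| > t_{1-\alpha/2,\;\nu}
```

The degrees of freedom $\nu$ are generally not an integer, so the p-value and critical value should come from a t distribution function that accepts fractional degrees of freedom.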

(f) This problem involves a new procedure, proc ttest, to perform the same types of calculations as

in part (e).

(i) (3 points) Show the main syntax help page available in SAS for this procedure. A screen capture

will suffice to obtain full credit.

(ii) (3 points) Perform the calculations for the test using proc ttest. Indicate where in the output

the key components needed to perform the test appear.

(iii) (3 points) Use ods trace to determine the name of the table that contains the

p-value for the test.

(iv) (3 points) Use the ods statement to create a data set with the p-value for the test. Print this data

set.

2. (20 total points) “Stability testing” is performed by pharmaceutical companies to determine the shelf

life for drug products. Typically, part of a drug batch (like a number of pills) is put into storage in

a controlled temperature and humidity environment. At regular time points, an item is taken out of

storage and testing is performed on it. A common response measured on each item is potency. Over

time, the potency of a drug will usually degrade, so the Food and Drug Administration (FDA) has set

a 95% lower limit of the desired potency level which the drug needs to remain above. The exact time

point where the drug goes below this limit is the shelf life. This shelf life (say, 4 months) is then added

to the manufacturing date of a drug to find the expiration date, which is what consumers often see

printed on drug packaging.

The shelf life is found with the help of regression models. To show how this is done, below is a simulated

data set where the potency of a drug has been measured over time in months. Suppose a single pill

has been measured at each time point.

Time (months)   Potency
            3   1.0155450
            6   0.9835495
            9   0.9957994
           12   0.9836627
           15   0.9863230
           18   0.9945146
           21   0.9995710
           24   0.9679062
           30   0.9690051
           36   0.9891509
           48   0.9674187
           60   0.9557498

For example, the pill taken out of storage at time 3 months had a potency of 101.55% of the desired

potency level. Using this data, complete the following problems.

(a) (4 points) Use a data step with the datalines statement to create a SAS data set containing the

data in the previous table. Print the data set using proc print.

(b) (5 points) Estimate and state the sample regression model with time as the explanatory variable

and potency as the response variable. Use proc reg to perform the estimation and make sure that no

plots are produced by the procedure. Interpret the relationship between time and potency as given by

the model.

(c) (4 points) Is there sufficient evidence to indicate a linear relationship between time and potency?

Use the appropriate statistical inference methods to make this judgment.

(d) (4 points) Use proc reg again as in part (b), but include the plot with 95% confidence interval

bands for the expected potency. No other plots should be included in the output! I recommend using

the SAS help to determine the correct coding specification.

(e) (3 points) The FDA has guidelines to determine the shelf life of a drug. Specifically, a 95% confidence

interval band plot (like in part (d)) is used to find the time where the lower band intersects a

horizontal line drawn at a 95% potency level. The corresponding time point where this occurs is the

shelf life. Using the plot in part (d), approximate what the shelf life would be for this data. Note that

you do not need to use SAS to draw the line at a 95% potency level.
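
The shelf-life rule described in part (e) can also be checked numerically. The following is a minimal sketch in Python (not SAS, and not a required part of the assignment) that fits the least-squares line to the simulated data, computes the lower 95% confidence band for the expected potency, and scans a time grid for the first point where that band drops below 0.95. The function names, the grid step, and the hard-coded t quantile are assumptions made for illustration; the band is taken to be the usual two-sided 95% band for the mean response.

```python
import math

# Simulated stability data from the table above (time in months, potency as a
# proportion of the desired level).
months = [3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60]
potency = [1.0155450, 0.9835495, 0.9957994, 0.9836627, 0.9863230, 0.9945146,
           0.9995710, 0.9679062, 0.9690051, 0.9891509, 0.9674187, 0.9557498]

# Assumption: two-sided 95% band, so the 0.975 quantile of a t distribution
# with n - 2 = 10 degrees of freedom (about 2.228) is hard-coded here.
T_CRIT = 2.228


def fit_line(x, y):
    """Least-squares fit of y on x; returns (b0, b1, s, xbar, sxx, n)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = ybar - b1 * xbar               # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # residual standard error
    return b0, b1, s, xbar, sxx, n


def lower_band(t, fit, t_crit):
    """Lower confidence limit for the expected potency at time t."""
    b0, b1, s, xbar, sxx, n = fit
    se_mean = s * math.sqrt(1.0 / n + (t - xbar) ** 2 / sxx)
    return b0 + b1 * t - t_crit * se_mean


def shelf_life(fit, t_crit, limit=0.95, step=0.01, t_max=120.0):
    """First grid time at which the lower band falls below `limit`, else None."""
    steps = int(round(t_max / step))
    for i in range(steps + 1):
        t = i * step
        if lower_band(t, fit, t_crit) < limit:
            return round(t, 2)
    return None


fit = fit_line(months, potency)
shelf = shelf_life(fit, T_CRIT)
print(f"intercept = {fit[0]:.6f}, slope = {fit[1]:.6f}")
print(f"approximate shelf life (months): {shelf}")
```

A grid scan is used instead of solving for the crossing point algebraically because it mirrors reading the intersection off the plot in part (d); a smaller `step` gives a finer approximation.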

