BIST 515: Introduction to Statistical Software Homework 5
Due date: Wednesday, November 14
1. (35 total points) National Football League (NFL) players go through a number of evaluations
during the combine so that NFL teams can assess their ability.
The nfl.csv data file is available on Canvas, and it contains information on some of the players who
participated in the 2014 combine. The columns in the data file represent the following information:
Player: Name of player being evaluated
College: College that the player attended
Position: The position of the player, where DB = defensive back, LB = linebacker, OL = offensive
lineman, RB = running back, S = safety, TE = tight end, WO = wide receiver; players who
played other positions were excluded from the data file
OverallGrade: The overall grade of the player based on the evaluations
Height: Height in inches
ArmLength: Arm length in inches
Weight: Weight in pounds
Dash40: 40-yard dash time in seconds
BenchPress: Number of bench press repetitions of 225 pounds
VerticalJump: Vertical jump in inches
BroadJump: Broad jump in inches
Cone3Drill: 3-cone drill time in seconds
Shuttle20: 20-yard shuttle run time in seconds
Using these data, complete the problems below. While you are welcome to use any football
knowledge to help answer the questions, it is not needed to perform well on this assignment.
(a) (4 points) Read the data into SAS using proc import and print the first five observations using
proc print.
(b) (4 points) Sort the players by their 40-yard dash times. Print only the names and 40-yard dash
times of the offensive linemen.
(c) (4 points) Find the mean 40-yard dash time for the players at each position. Which position has
the fastest players on average? Which position has the slowest players on average?
(d) Using proc means, find the mean, standard deviation, and sample size for the 40-yard dash times
of offensive linemen and of wide receivers (each separately). Export these values into a data set in the
following ways:
(i) (3 points) output statement
(ii) (3 points) ods statement
(e) (5 points) Assuming these players are a simple random sample from the population of all players,
we can use statistical inference to draw conclusions about this population. Under this assumption,
perform a two-sample t-test with unequal variances to test whether the mean 40-yard dash times of
offensive linemen and wide receivers are equal. Formally, the hypotheses are
$$H_0: \mu_{OL} - \mu_{WR} = 0$$
$$H_a: \mu_{OL} - \mu_{WR} \neq 0$$
where $\mu_{\text{position}}$ denotes the mean 40-yard dash time for a particular position. Perform this
hypothesis test by showing the equations for the test statistic and p-value along with their computed values,
WITHOUT using a SAS procedure that finds these values automatically. Your write-up should be formal and
include statements of the hypotheses, test statistic, p-value, critical value, and decision with reasoning.
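For reference, a sketch of the standard Welch (unequal-variance) two-sample t-test formulas follows. The notation is an assumption made here for illustration: $\bar{x}$, $s$, and $n$ denote the sample mean, sample standard deviation, and sample size of the 40-yard dash times for each position (the quantities found in part (d)), and $\alpha$ is the significance level.

```latex
% Welch two-sample t statistic under H_0: mu_OL - mu_WR = 0
\[
t = \frac{\bar{x}_{OL} - \bar{x}_{WR}}
         {\sqrt{\dfrac{s_{OL}^2}{n_{OL}} + \dfrac{s_{WR}^2}{n_{WR}}}}
\]
% Welch-Satterthwaite approximate degrees of freedom
\[
\nu = \frac{\left(\dfrac{s_{OL}^2}{n_{OL}} + \dfrac{s_{WR}^2}{n_{WR}}\right)^2}
           {\dfrac{\left(s_{OL}^2 / n_{OL}\right)^2}{n_{OL} - 1}
            + \dfrac{\left(s_{WR}^2 / n_{WR}\right)^2}{n_{WR} - 1}}
\]
% Two-sided p-value and critical value
\[
p\text{-value} = 2\,P\!\left(T_{\nu} \ge |t|\right), \qquad
\text{critical value} = t_{1-\alpha/2,\,\nu}
\]
```

Under $H_0$, $t$ approximately follows a t distribution with $\nu$ degrees of freedom; $H_0$ is rejected when $|t|$ exceeds the critical value, or equivalently when the p-value is below $\alpha$.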
(f) This problem uses a new procedure, proc ttest, to perform the same calculations as
in part (e).
(i) (3 points) Show the main syntax help page available in SAS for this procedure. A screen capture
will suffice to obtain full credit.
(ii) (3 points) Perform the calculations for the test using proc ttest. Indicate where in the output the
key components needed to perform the test are located.
(iii) (3 points) Use ods trace to determine the name of the table that contains the
p-value for the test.
(iv) (3 points) Use the ods statement to create a data set with the p-value for the test. Print this data
set.
2. (20 total points) “Stability testing” is performed by pharmaceutical companies to determine the shelf
life of drug products. Typically, part of a drug batch (such as a number of pills) is put into storage in
a controlled temperature and humidity environment. At regular time points, an item is taken out of
storage and tested. A common response measured on each item is potency. Over time, the potency
of a drug usually degrades, so the Food and Drug Administration (FDA) has set a lower limit of 95% of
the desired potency level, which the drug needs to remain above. The time point at which the drug
falls below this limit is the shelf life. This shelf life (say, 4 months) is then added to the manufacturing
date of a drug to find the expiration date, which is what consumers often see printed on drug packaging.
The shelf life is found with the help of regression models. To show how this is done, below is a simulated
data set in which the potency of a drug has been measured over time (in months). Suppose a single pill
has been measured at each time point.
Time (months) Potency
3 1.0155450
6 0.9835495
9 0.9957994
12 0.9836627
15 0.9863230
18 0.9945146
21 0.9995710
24 0.9679062
30 0.9690051
36 0.9891509
48 0.9674187
60 0.9557498
For example, the pill taken out of storage at 3 months had a potency of 101.55% of the desired
potency level. Using these data, complete the following problems.
(a) (4 points) Use a data step with the datalines statement to create a SAS data set containing the
data in the previous table. Print the data set using proc print.
(b) (5 points) Estimate and state the sample regression model with time as the explanatory variable
and potency as the response variable. Use proc reg to perform the estimation and make sure that no
plots are produced by the procedure. Interpret the relationship between time and potency as given by
the model.
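For reference, the least-squares estimates for a simple linear regression have the closed forms below. The notation is chosen here for illustration: $x_i$ is Time, $y_i$ is Potency, and $n = 12$ is the number of observations in the table above.

```latex
% Sample (fitted) regression model
\[
\hat{y} = b_0 + b_1 x
\]
% Least-squares slope and intercept
\[
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
\]
```

The slope $b_1$ is the estimated change in mean potency for each additional month in storage.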
(c) (4 points) Is there sufficient evidence to indicate a linear relationship between time and potency?
Use the appropriate statistical inference methods to make this judgment.
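One standard approach is the t-test for the slope. A sketch of the formulas is below, with notation chosen for illustration: $x_i$ is Time, $y_i$ is Potency, $\hat{y}_i$ are the fitted values, $b_1$ is the estimated slope, and $n = 12$, so the test has $n - 2 = 10$ degrees of freedom.

```latex
% Hypotheses for the population slope beta_1
\[
H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0
\]
% Residual standard error and standard error of the slope
\[
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}},
\qquad
SE(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]
% Test statistic and two-sided p-value
\[
t = \frac{b_1}{SE(b_1)}, \qquad
p\text{-value} = 2\,P\!\left(T_{n-2} \ge |t|\right)
\]
```

In proc reg, these values appear in the Parameter Estimates table as the "t Value" and "Pr > |t|" entries for Time; in simple linear regression, the overall F test is equivalent, since $F = t^2$.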
(d) (4 points) Use proc reg again as in part (b), but include the plot with 95% confidence interval
bands for the expected potency. No other plots should be included in the output! I recommend using
the SAS help to determine the correct syntax.
(e) (3 points) The FDA has guidelines to determine the shelf life of a drug. Specifically, a 95% confidence
interval band plot (as in part (d)) is used to find the time at which the lower band intersects a
horizontal line drawn at the 95% potency level. The time point where this occurs is the
shelf life. Using the plot in part (d), approximate what the shelf life would be for this data. Note that
you do not need to use SAS to draw the line at a 95% potency level.
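For reference, the pointwise confidence band for the expected potency in a simple linear regression has the form below. The notation is chosen for illustration: $b_0$ and $b_1$ are the estimated intercept and slope, $s$ is the root mean squared error, $n = 12$, $x_i$ are the observed times, and $x_0$ is a time in months. Because potency is recorded as a proportion of the desired level (e.g., 1.0155 = 101.55%), the 95% potency level corresponds to the value 0.95.

```latex
% Estimated mean potency at time x_0
\[
\hat{y}_0 = b_0 + b_1 x_0
\]
% Lower limit of the 95% pointwise confidence band for the mean response
\[
L(x_0) = \hat{y}_0 - t_{0.975,\,n-2}\; s
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]
% Approximate shelf life: the time at which the lower limit reaches 95% potency
\[
\text{shelf life} \approx x_0 \ \text{such that} \ L(x_0) = 0.95
\]
```

Reading the intersection off the plot in part (d) gives an approximation to this $x_0$.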