STATS 201/208
THE UNIVERSITY OF AUCKLAND
First SEMESTER 2020
Campus: City
STATISTICS
Data Analysis/Data Analysis for Commerce
INSTRUCTIONS
• Time Allowed: This Final Assessment has been designed so that a well-prepared student
could complete it within two hours. From the 1pm release time you will have 24
hours to complete and submit your assessment. No marks will be deducted for taking
longer than two hours within that 24-hour period, but you must submit before the
deadline.
• This assessment is open book, you are permitted to access your course manuals and
other written material including online resources.
• Calculators are permitted.
• Answers should be written using the R Markdown file provided.
• It is your responsibility to ensure your assessment is successfully submitted on time.
• We recommend you aim to submit a couple of hours in advance of the deadline, to
allow time to deal with any technical issues that might arise.
• We STRONGLY recommend you download your submitted document from Canvas,
after submitting it, to verify you have uploaded the correct document.
• Attempt all questions.
• There are 70 marks in total for this examination.
Page 1 of 32
STATS 201/208
Support:
• If you have any concerns regarding your Final Assessment, please call the Contact
Centre for advice, rather than your instructors.
• The Contact Centre can be reached on these numbers:
– Auckland: 09 373 7513
– Outside Auckland: 0800 61 62 63
– International: +64 9 373 7513
• For any Canvas issues, please use 24/7 help on Canvas by chat or phone.
• If any corrections are announced during the 24 hours of the final assessment, you
will be notified by a Canvas Announcement. Please ensure your notifications are
turned on during this period.
Question Interpretation:
Please note that during the final assessment period you cannot contact your instructors
for clarification on how to interpret the wording of any specific questions or to verify that
your answer is correct. Interpreting wording and making appropriate assumptions is part
of what is being assessed. You will need to interpret the question yourself and check your
own answers.
If you believe there is a typo, first re-read the question to check you have not misunderstood
the question, as it is very common for students to misread questions. If you still
believe there is a typo, please phone the Contact Centre.
Page 2 of 32
STATS 201/208
Academic Honesty Declaration:
By completing this assessment, I agree to the following declaration:
I understand the University expects all students to complete coursework with integrity
and honesty. I promise to complete all online assessment with the same academic integrity
standards and values. Any identified form of poor academic practice or academic
misconduct will be followed up and may result in disciplinary action.
As a member of the University’s student body, I will complete this assessment in a fair,
honest, responsible and trustworthy manner. This means that:
• I declare that this assessment is my own work.
• I will not seek out any unauthorised help in completing this assessment.
• I am aware the University of Auckland may use plagiarism detection tools to check
my content.
• I will not discuss the content of the assessment with anyone else in any form,
including, Canvas, Piazza, Facebook, Twitter or any other social media or online
platform within the assessment period.
• I will not reproduce the content of this assessment anywhere in any form at any
time.
• I declare that I generated the calculations and data in this assessment independently,
using only the tools and resources defined for use in this assessment.
• I will not share or distribute any tools or resources I developed for completing this
assessment.
Page 3 of 32
STATS 201/208
1. This question refers to the Job Prestige Data in Appendix A. [Total 18 marks]
a. Using the pairs plot, comment on the most important features of the data.
[3 marks]
b. The model fit.prestige1 uses both the variables income and log10income
while the other models use only log10income. Give TWO reasons why income
was dropped instead of log10income. [2 marks]
c. What would happen to the P-value for income if we dropped log10income
from fit.prestige1 instead of dropping income? Briefly justify your answer.
[2 marks]
d. According to the model fit.prestige3, which job has greater prestige:
gov.administrators or physicists? Briefly justify your answer. [2 marks]
Reminder: The data contains information collected from a large number of
people averaged for each of 102 different professions.
e. Give a brief Executive Summary of the main conclusions of the
analysis of the Job Prestige Data in APPENDIX A. [9 marks]
Note: To describe the effect of log10income it will be extremely helpful
to note that a 0.30 increase in log10income is equivalent to doubling income.
2. This question refers to the m&m’s™ Data in Appendix B.
[Total 18 marks]
a. Comment on what the interaction plot at the start of Appendix B reveals.
[2 marks]
b. Write brief Methods and Assumption Checks of the analysis of the
m&m’s™ Data in Appendix B. [6 marks]
c. Give a brief Executive Summary of the main conclusions of the
m&m’s™ Data in Appendix B. [10 marks]
Page 4 of 32
STATS 201/208
3. This question refers to the Viral pock data in Appendix C.
[Total 10 marks]
a. Comment on what the plot at the start of Appendix C reveals. [2 marks]
b. Write down an equation of the final model fitted to the data, just as you would
for a Method and Assumption Checks section. [2 marks]
c. Give a brief Executive Summary of the main conclusions of the
analysis of the Viral pock data in Appendix C. [6 marks]
4. This question refers to the Turbines data in Appendix D.
[Total 10 marks]
a. Write brief Methods and Assumption Checks notes of the analysis of the
Turbines data in Appendix D. [4 marks]
b. Give a brief Executive Summary of the main conclusions of the
analysis of the Turbines data in Appendix D. [6 marks]
Page 5 of 32
STATS 201/208
5. This question refers to the Nitrous Oxide Data in Appendix E.
[Total 14 marks]
a. Comment on the time series plot at the start of Appendix E (Global atmospheric
concentration of Nitrous Oxide). [2 marks]
b. Comment on the Holt-Winters coefficients. [2 marks]
c. Is the Holt-Winters model a good model of the N2O data? Justify your answer.
[1 mark]
d. Using the Holt-Winters model, provide a 95% prediction interval for the
N2O level in May, 2020. [2 marks]
e. Comment on the diagnostic plots for the model SF.fit1. Why was this model
rejected? [2 marks]
f. Comment on the diagnostic plots for the model SF.fit2. Are the underlying
assumptions satisfied? Briefly explain. [2 marks]
g. Using model SF.fit2, write an equation to calculate the estimated N2O for
November, 2019 . This is the estimated equation, not the model equation, so
use the estimated values for the coefficients and substitute appropriate values
for variables. You DO NOT need to calculate the final value. [3 marks]
APPENDICES BOOKLET FOLLOWS
Page 6 of 32
STATS 201/208
APPENDICES BOOKLET
CONTENTS
Appendix Name Pages
A Job Prestige Data 8–13
B m&m’s™ Data 14–18
C Viral pock data 19–21
D Turbines data 22–25
E Nitrous Oxide Data 26–32
Page 7 of 32
STATS 201/208
Appendix A Job Prestige Data
Employment data was collected from a large number of people working in Canada in
1971 across 102 different professions. For each profession, the data was combined and
average education and income levels were calculated. The prestige score and percentage
of women was also recorded for each profession.
We want to build a model to explain the perceived prestige of a profession in terms of
the other information collected. In particular, is there any evidence that professions with
a higher proportion of women are either more or less prestigious?
The variables are:
job The profession
prestige Pineo-Porter prestige score for the profession. A numeric measure of
prestige, the higher the score the more prestigious the profession.
education Average education (years) for people in the profession
income Average income ($) for people in the profession
log10income The logarithm to base 10 of income
women Percentage of women in the profession
Page 8 of 32
STATS 201/208
> pairs20x(Prestige.df[, c(1:5)])
> fit.prestige1 = lm(prestige ~ income + log10income + education
+ + women, data = Prestige.df)
> plot(fit.prestige1,which=1)
20 30 40 50 60 70 80
−20 −10
0 10 20
Fitted values
Residuals
Residuals vs Fitted
> summary(fit.prestige1)
Call:
lm(formula = prestige ~ income + log10income + education + women,
data = Prestige.df)
Residuals:
Min 1Q Median 3Q Max
-17.3357 -4.3855 -0.0915 4.3120 19.1733
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.097e+02 2.227e+01 -4.925 3.46e-06 ***
income 2.936e-05 3.736e-04 0.079 0.938
log10income 3.056e+01 6.557e+00 4.661 1.00e-05 ***
education 3.724e+00 3.669e-01 10.150 < 2e-16 ***
women 4.706e-02 3.012e-02 1.562 0.121
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.129 on 97 degrees of freedom
Multiple R-squared: 0.8351,Adjusted R-squared: 0.8283
F-statistic: 122.8 on 4 and 97 DF, p-value: < 2.2e-16
Page 10 of 32
STATS 201/208
> fit.prestige2 = lm(prestige ~ log10income + education + women,data=Prestige.df)
> summary(fit.prestige2)
Call:
lm(formula = prestige ~ log10income + education + women, data = Prestige.df)
Residuals:
Min 1Q Median 3Q Max
-17.364 -4.429 -0.101 4.316 19.179
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -110.9658 14.8429 -7.476 3.27e-11 ***
log10income 30.9427 4.4066 7.022 2.90e-10 ***
education 3.7305 0.3544 10.527 < 2e-16 ***
women 0.0469 0.0299 1.568 0.12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.093 on 98 degrees of freedom
Multiple R-squared: 0.8351,Adjusted R-squared: 0.83
F-statistic: 165.4 on 3 and 98 DF, p-value: < 2.2e-16
> fit.prestige3 = lm(prestige ~ log10income + education, data = Prestige.df)
> summary(fit.prestige3)
Call:
lm(formula = prestige ~ log10income + education, data = Prestige.df)
Residuals:
Min 1Q Median 3Q Max
-17.0346 -4.5657 -0.1857 4.0577 18.1270
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -95.1940 10.9979 -8.656 9.27e-14 ***
log10income 26.3357 3.3090 7.959 2.94e-12 ***
education 4.0020 0.3115 12.846 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.145 on 99 degrees of freedom
Multiple R-squared: 0.831,Adjusted R-squared: 0.8275
F-statistic: 243.3 on 2 and 99 DF, p-value: < 2.2e-16
Page 11 of 32
STATS 201/208
> plot(fit.prestige3,which=1)
20 30 40 50 60 70 80
−20 −10
0 10 20
Fitted values
Residuals
Residuals vs Fitted
67
46
82
> cooks20x(fit.prestige3)
observation number
0 20 40 60 80 100
0.00 0.05 0.10 0.15 0.20
Cook's Distance plot
Cook's distance
63
67 20
Page 12 of 32
STATS 201/208
> normcheck(fit.prestige3,main="histogram")
Theoretical Quantiles
Sample Quantiles
histogram
Residuals from lm(prestige ~ log10income + education)
> confint(fit.prestige3)
2.5 % 97.5 %
(Intercept) -117.016298 -73.371676
log10income 19.769848 32.901591
education 3.383824 4.620081
> head(Prestige.df)
prestige education women income log10income job
1 68.8 13.11 11.16 12351 4.091702 gov.administrators
2 69.1 12.26 4.02 25879 4.412947 general.managers
3 63.4 12.77 15.70 9271 3.967127 accountants
4 56.8 11.42 9.11 8865 3.947679 purchasing.officers
5 73.5 14.62 11.68 8403 3.924434 chemists
6 77.6 15.64 5.13 11030 4.042576 physicists
> head(predict(fit.prestige3, interval = "confidence"))
fit lwr upr
1 65.02953 62.71844 67.34062
2 70.08810 65.99493 74.18127
3 60.38808 58.51342 62.26274
4 54.47327 52.71791 56.22863
5 66.66736 64.20611 69.12861
6 73.86069 70.95756 76.76381
Page 13 of 32
STATS 201/208
Appendix B m&m’s™ Data
In this study researchers were interested in whether low-fat nutrition labels increase the
actual consumption of chocolate candies (m&m’s™) for overweight and normal weight
consumers.
The researchers set up the experiment as follows. They asked forty adult family members
participating in a university open house to serve themselves unusual colours of m&m’s™
(gold, teal, purple, and white), which were clearly labelled either as “New Colors of
Regular m&m’s™” (regular-label) or as “New ‘Low-Fat’ m&m’s™ ” (low-fat-label).
They then measured how many calories of m&m’s™ the participants ate. The participant
was also assessed as being normal weight or overweight based on their body mass index.
The variables in the data set are:
Calories the number of calories consumed per helping by the participant,
BMI whether the participant had a BMI that was normal or ow for overweight,
MM whether m&m’s™ were labelled regular or lowfat.
The researchers were interested in whether the designations of ”low-fat” resulted in
greater calorific consumption and, if so, did the effect of the label on the m&m’s™depend
on whether the respondent was of normal weight or deemed overweight.
They also want the following estimated:
• the average number of calories consumed per helping by normal weight people eating
regular m&m’s™.
• the difference(s) in average number of calories consumed per helping between normal
and overweight people - for each type of m&m’s™.
• the difference(s) in average number of calories consumed per helping between when the
m&m’s™are labelled regular and lowfat - for both weight groups.
Reference: “Can “Low-Fat” Nutrition Labels Lead to Obesity?”, Wansink, B & Chandon,
P., Journal of Marketing Research, November 2006.
Page 14 of 32
STATS 201/208
We wish to order the factor MM so that regular is first.
> MM.df=within(MM.df,{MM=factor(MM,levels=c("regular","lowfat"))})
> interactionPlots(Calories~BMI+MM,ylab="Calories consumed",data =MM.df)
Plot of 'Calories consumed'
by levels of 'BMI' and 'MM'
BMI
Calories consumed
100 150 200 250 300 350
normal ow
lowfa
regul
> MM.fit1 = lm(Calories~BMI*MM, data = MM.df)
> eovcheck(MM.fit1)
200 220 240 260 280
−80 −40
0 20 60
Fitted values
Residuals
Page 15 of 32
STATS 201/208
> normcheck(MM.fit1)
Theoretical Quantiles
Sample Quantiles
Residuals from lm(Calories ~ BMI * MM)
> cooks20x(MM.fit1)
observation number
0 10 20 30 40
0.00 0.10 0.20
Cook's Distance plot
Cook's distance
28 26
34
Page 16 of 32
STATS 201/208
> anova(MM.fit1)
Analysis of Variance Table
Response: Calories
Df Sum Sq Mean Sq F value Pr(>F)
BMI 1 11465 11465 12.3800 0.001195 **
MM 1 35593 35593 38.4339 3.765e-07 ***
BMI:MM 1 9030 9030 9.7507 0.003528 **
Residuals 36 33339 926
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(MM.fit1)
Call:
lm(formula = Calories ~ BMI * MM, data = MM.df)
Residuals:
Min 1Q Median 3Q Max
-77.50 -14.82 2.56 15.10 56.89
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 187.790 9.623 19.514 < 2e-16 ***
BMIow 3.810 13.609 0.280 0.78112
MMlowfat 29.610 13.609 2.176 0.03622 *
BMIow:MMlowfat 60.100 19.247 3.123 0.00353 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 30.43 on 36 degrees of freedom
Multiple R-squared: 0.6272,Adjusted R-squared: 0.5961
F-statistic: 20.19 on 3 and 36 DF, p-value: 7.604e-08
> confint(MM.fit1)
2.5 % 97.5 %
(Intercept) 168.272946 207.30705
BMIow -23.791283 31.41128
MMlowfat 2.008717 57.21128
BMIow:MMlowfat 21.065891 99.13411
Page 17 of 32
STATS 201/208
> summary2way(MM.fit1,page="nointeraction")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = fit)
$BMI
diff lwr upr p adj
ow-normal 33.86 14.34295 53.37705 0.0011955
$MM
diff lwr upr p adj
lowfat-regular 59.66 40.14295 79.17705 4e-07
> summary2way(MM.fit1,page="interaction")
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = fit)
$`Comparisons within BMI`
diff lwr upr p adj
normal:lowfat-normal:regular 29.61 -7.043392 66.26339 0.1494601
ow:lowfat-ow:regular 89.71 53.056608 126.36339 0.0000007
$`Comparisons between BMI`
diff lwr upr p adj
ow:regular-normal:regular 3.81 -32.84339 40.46339 0.9922076
ow:lowfat-normal:lowfat 63.91 27.25661 100.56339 0.0002140
Page 18 of 32
STATS 201/208
Appendix C Viral pock data
When creating vaccines and medications for treating viruses, researchers need to be able
to produce large amounts of samples of the virus in petri dishes. A set quantity of a viral
medium is placed in a petri dish and left for the virus to grow. One question researchers
had was what was the effect of diluting the viral medium on the viral Activity.
They set up an experiment as follows. A supply of viral medium was produced. This was
then divided into 5 batches, each with a different level of dilution applied to them. Petri
dishes were randomly assigned to the batches and filled. After a fixed amount of time,
the viral activity in each petri dish was assess by counting the number of pock marks
that appeared. The dilution levels are actually in increasing powers of 2, so each increase
of 1 in dilution level d
¯
oubles the dilution and therefore h
¯
alves the concentration of the
viral medium.
The variables are:
Count a count of the number of pock marks recorded in the petri dish
Dilution the dilution level used for viral medium in the petri dish
The researcher has two questions they want answered. What is the estimated number of
pock marks you would expect to see at dilution level 0 and what is the effect of doubling
the dilution level?
> plot(Count~Dilution, data=Pock.df)
0 1 2 3 4
0 50 100 150 200 250
Dilution
Count
Page 19 of 32
STATS 201/208
> fit.glm<- glm(Count ~ Dilution,family = poisson, data=Pock.df )
> summary(fit.glm)
Call:
glm(formula = Count ~ Dilution, family = poisson, data = Pock.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.058 -2.031 -0.462 1.373 5.357
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.26793 0.02255 233.60 <2e-16 ***
Dilution -0.68094 0.01544 -44.09 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2729.74 on 47 degrees of freedom
Residual deviance: 290.44 on 46 degrees of freedom
AIC: 562.42
Number of Fisher Scoring iterations: 4
> 1-pchisq(deviance(fit.glm), df.residual(fit.glm))
> fit.quasi=glm(Count ~ Dilution,family = quasipoisson, data=Pock.df )
> summary(fit.quasi)
Call:
glm(formula = Count ~ Dilution, family = quasipoisson, data = Pock.df)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.058 -2.031 -0.462 1.373 5.357
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.26793 0.05678 92.78 <2e-16 ***
Dilution -0.68094 0.03888 -17.51 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 6.338981)
Null deviance: 2729.74 on 47 degrees of freedom
Residual deviance: 290.44 on 46 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
> confint(fit.quasi)
2.5 % 97.5 %
(Intercept) 5.1548909 5.3775069
Dilution -0.7583492 -0.6058599
> exp(confint(fit.quasi))
2.5 % 97.5 %
(Intercept) 173.2769014 216.4819000
Dilution 0.4684391 0.5456051
> 100*(exp(confint(fit.quasi))-1)
2.5 % 97.5 %
(Intercept) 17227.69014 21548.19000
Dilution -53.15609 -45.43949
Page 21 of 32
STATS 201/208
Appendix D Turbines data
An experiment was set up to test the life of turbine wheels. Of interest was the likelihood
of the turbine wheels developing fissures (narrow cracks) that would require maintenance
to prevent more serious damage.
In an experiment, turbine wheels were randomly allocated to be run for a number of hours.
Each wheel was then examined to see whether or not it had developed any fissures.
The data collected contains the variables:
Hours the number of hours the turbines were run, a numeric vector,
Turbines the number of turbine wheels that where tested at that number of hours,
Fissures the number of turbine wheels that developed fissures at that number of hours.
The researchers wanted to know the effects of time on the likelihood of turbine wheels
developing fissures. They also wanted to estimate how many hours the turbine could run
before the probability of the wheel developing fissures exceeded 25%.
> Turbines.df
Hours Turbines Fissures
> plot(Fissures/Turbines~Hours,ylab="probability",data=Turbines.df,
+ main="Probability of having a fissure by number of hours")
1000 2000 3000 4000
0.0 0.1 0.2 0.3 0.4 0.5 0.6
Probability of having a fissure by number of hours
Hours
probability
> turbines.glm <- glm( Fissures/Turbines ~ Hours, family=binomial,
+ weights=Turbines, data=Turbines.df)
> plot(turbines.glm,which=1)
−3 −2 −1 0
−1
0 1 2
Predicted values
Residuals
Residuals vs Fitted
9
1
7
Page 23 of 32
STATS 201/208
> summary(turbines.glm)
Call:
glm(formula = Fissures/Turbines ~ Hours, family = binomial, data = Turbines.df,
weights = Turbines)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5055 -0.7647 -0.3036 0.4901 2.0943
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9235966 0.3779589 -10.381 <2e-16 ***
Hours 0.0009992 0.0001142 8.754 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 112.670 on 10 degrees of freedom
Residual deviance: 10.331 on 9 degrees of freedom
AIC: 49.808
Number of Fisher Scoring iterations: 4
> 1-pchisq(deviance(turbines.glm), df.residual(turbines.glm))
[1] 0.3243236
> confint(turbines.glm)
2.5 % 97.5 %
(Intercept) -4.704479229 -3.219091395
Hours 0.000783381 0.001231879
> exp(100*confint(turbines.glm))
2.5 % 97.5 %
(Intercept) 4.864778e-205 1.572668e-140
Hours 1.081488e+00 1.131097e+00
> 100*(exp(100*confint(turbines.glm))-1)
2.5 % 97.5 %
(Intercept) -100.000000 -100.00000
Hours 8.148825 13.10969
Page 24 of 32
STATS 201/208
> Pred.data = data.frame(Hours = seq(400,5000,by=100))
> Turbine.pred = predict(turbines.glm, Pred.data)
> Prob=round(exp(Turbine.pred) / (1 + exp(Turbine.pred)),3)
> names(Prob)=seq(400,5000,by=100)
> plot(Fissures/Turbines~Hours,ylab="probability", main="Probability of
+ turbine having a fissure by number of hour",data=Turbines.df)
> lines(seq(400,5000,by=100),Prob)
1000 2000 3000 4000
0.0 0.2 0.4 0.6
Probability of
turbine having a fissure by number of hour
Hours
probability
> Prob
400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600
0.029 0.032 0.035 0.038 0.042 0.046 0.051 0.056 0.062 0.068 0.074 0.081 0.089
1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900
0.098 0.107 0.117 0.127 0.139 0.151 0.164 0.179 0.194 0.210 0.227 0.245 0.264
3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200
0.284 0.304 0.326 0.348 0.371 0.395 0.419 0.444 0.468 0.493 0.518 0.543 0.568
4300 4400 4500 4600 4700 4800 4900 5000
0.592 0.616 0.639 0.662 0.684 0.705 0.726 0.745
Page 25 of 32
STATS 201/208
Appendix E Nitrous Oxide Data
The following data are the monthly average global atmospheric concentrations of N2O
from January 2008 to October 2019 in parts per billion (ppb).
> N2O.ts = ts(N2O.df$N2O,start=2008,frequency=12)
> N2O.ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2008 321.3 321.4 321.4 321.4 321.4 321.5 321.4 321.4 321.5 321.6 321.8 322.0
2009 322.2 322.3 322.3 322.1 322.0 322.0 322.0 322.1 322.3 322.5 322.7 322.8
2010 322.9 322.9 322.9 323.0 323.0 323.0 323.0 323.1 323.3 323.5 323.7 323.9
2011 324.0 324.1 324.1 324.1 324.1 324.0 324.1 324.1 324.2 324.4 324.6 324.8
2012 324.9 325.0 325.0 324.9 324.9 324.9 324.9 325.0 325.1 325.3 325.4 325.5
2013 325.6 325.6 325.7 325.7 325.8 325.9 326.0 326.0 326.1 326.2 326.4 326.5
2014 326.6 326.7 326.7 326.8 326.8 326.9 327.0 327.2 327.3 327.5 327.7 327.9
2015 328.0 328.1 328.0 328.0 327.9 328.0 328.0 328.1 328.2 328.4 328.6 328.8
2016 328.9 328.9 328.9 328.9 328.9 328.9 328.9 328.8 328.9 329.0 329.2 329.4
2017 329.5 329.5 329.5 329.5 329.5 329.6 329.7 329.8 329.9 330.0 330.2 330.3
2018 330.4 330.6 330.7 330.7 330.7 330.7 330.7 330.9 331.1 331.3 331.5 331.7
2019 331.8 331.7 331.7 331.6 331.6 331.7 331.9 331.9 331.9 332.1
> plot(N2O.ts,xlab="month",ylab="ppb",main="Global atmospheric concentration
+ of Nitrous Oxide - Jan 2008 to Oct 2019")
Global atmospheric concentration
of Nitrous Oxide − Jan 2008 to Oct 2019
month
ppb
2008 2010 2012 2014 2016 2018 2020
322 324 326 328 330 332
Page 26 of 32
STATS 201/208
> HW.fit = HoltWinters(N2O.ts)
> HW.fit
Holt-Winters exponential smoothing with trend and additive seasonal component.
Call:
HoltWinters(x = N2O.ts)
Smoothing parameters:
alpha: 0.8527811
beta : 0.01102914
gamma: 1
Coefficients:
[,1]
a 332.17880227
b 0.07663861
s1 0.01925176
s2 0.10884812
s3 0.18119490
s4 0.22432467
s5 0.23304180
s6 0.14142845
s7 0.01506802
s8 -0.08650202
s9 -0.14807969
s10 -0.18645879
s11 -0.16506783
s12 -0.07880227
Page 27 of 32
STATS 201/208
> HW.pred = predict(HW.fit,n.ahead=12,prediction.interval=T)
> plot(HW.fit,HW.pred,main="Global atmospheric concentration of Nitrous Oxide")
Global atmospheric concentration of Nitrous Oxide
Time
Observed / Fitted
2010 2012 2014 2016 2018 2020
322 326 330 334
> HW.pred
fit upr lwr
Nov 2019 332.2747 332.4768 332.0726
Dec 2019 332.4409 332.7077 332.1741
Jan 2020 332.5899 332.9096 332.2702
Feb 2020 332.7097 333.0756 332.3437
Mar 2020 332.7950 333.2028 332.3873
Apr 2020 332.7801 333.2266 332.3336
May 2020 332.7303 333.2132 332.2475
Jun 2020 332.7054 333.2227 332.1881
Jul 2020 332.7205 333.2707 332.1703
Aug 2020 332.7587 333.3406 332.1768
Sep 2020 332.8568 333.4693 332.2442
Oct 2020 333.0197 333.6620 332.3773
Page 28 of 32
STATS 201/208
Note: Define Month as a factor for month of the year coded as 1 for January, 2 for
February, etc. Define Time as a numeric variable with values 1 for January 2008, 2 for
February 1980, ..., 142 for October 2019.
> Time = 1:142
> Month = factor(c(rep(1:12,11),(1:10)))
> SF.fit1 = lm(N2O.ts~Time+Month)
> plot.ts(residuals(SF.fit1),main="Residual Series")
Residual Series
Time
residuals(SF.fit1)
0 20 40 60 80 100 120 140
−0.3 −0.1 0.1 0.2 0.3
Page 29 of 32
STATS 201/208
> acf(residuals(SF.fit1))
0 5 10 15 20
−0.5 0.0 0.5 1.0
Lag
ACF
Series residuals(SF.fit1)
> SF.fit2 = lm(N2O.ts[-1]~Time[-1]+Month[-1]+N2O.ts[-142])
> plot.ts(residuals(SF.fit2),main="Residual Series")
Residual Series
Time
residuals(SF.fit2)
0 20 40 60 80 100 120 140
−0.15 −0.05 0.05 0.15
Page 30 of 32
STATS 201/208
> acf(residuals(SF.fit2))
0 5 10 15 20
−0.2 0.2 0.6 1.0
Lag
ACF
Series residuals(SF.fit2)
> normcheck(SF.fit2, main = "")
Theoretical Quantiles
Sample Quantiles
Residuals from lm(N2O.ts[−1] ~ Time[−1] + Month[−1] + N2O.ts[−142])
Page 31 of 32
STATS 201/208
> summary(SF.fit2)
Call:
lm(formula = N2O.ts[-1] ~ Time[-1] + Month[-1] + N2O.ts[-142])
Residuals:
Min 1Q Median 3Q Max
-0.163659 -0.042180 0.005527 0.041432 0.157619
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.135431 11.598702 2.598 0.010480 *
Time[-1] 0.007490 0.002867 2.613 0.010067 *
Month[-1]2 -0.045989 0.025966 -1.771 0.078939 .
Month[-1]3 -0.098020 0.025925 -3.781 0.000239 ***
Month[-1]4 -0.129731 0.025956 -4.998 1.88e-06 ***
Month[-1]5 -0.130448 0.026398 -4.941 2.40e-06 ***
Month[-1]6 -0.088718 0.027186 -3.263 0.001414 **
Month[-1]7 -0.092309 0.027628 -3.341 0.001096 **
Month[-1]8 -0.070900 0.028129 -2.521 0.012958 *
Month[-1]9 -0.022152 0.028311 -0.782 0.435404
Month[-1]10 0.031276 0.027794 1.125 0.262593
Month[-1]11 0.063619 0.027393 2.322 0.021800 *
Month[-1]12 0.046722 0.026635 1.754 0.081815 .
N2O.ts[-142] 0.906416 0.036144 25.078 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.06205 on 127 degrees of freedom
Multiple R-squared: 0.9997,Adjusted R-squared: 0.9996
F-statistic: 2.938e+04 on 13 and 127 DF, p-value: < 2.2e-16
> N2O.ts[142]
[1] 332.1
Page 32 of 32
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。