STAT 464/864 - Assignment #1
Due: Thursday, January 24th, 2019
Please read the following instructions carefully, as they must be followed to
meet the requirements of the assignment.
Format
Standard formatting requirements must be met.
1. Be aware of the instructions and consequences listed in “Policies” and “Academic
Integrity” in the course outline.
2. All code must be included in an appendix at the end of the document. The
solutions cannot include any of the code. Furthermore, all results should be
clearly stated at the end of a calculation. This means that the final solution is
written in a sentence. If several results are obtained, they should be presented
in a table or a graph, depending on which is more useful in the context of the
problem.
3. Discussions must include full sentences in paragraph structure, and explain in
full what has been discovered from the result of the calculations or analyses.
Whenever a result is obtained, take a moment to consider what the point of
the question is regarding the course content. Given what has been taught in
the course, why was the question chosen for the assignment? Keep this question in
mind when answering the discussion questions.
4. Ensure that all graphs contain axis labels which clearly define what is being
plotted. If possible, write “Temperature in degrees Celsius”, not “T (°C)”. Legends
should not interfere with the lines of the plot, so set the intervals of the
plotting axis accordingly. Graphs should not require colour to distinguish between
different curves. Instead, give the lines different line styles and the points
different symbol types.
References
When part of a solution is based on an external reference to the lecture notes
or the typed notes, the reference must be cited in an appendix called “References”.
The important aspect of assignments is that what is written must be in the words of
the student, and that the solution must be unique; it cannot be a straight copy of
the solution of the reference, but, rather, the solution which the student understands
from that research.
Question 1
For this problem, the R software is required. To install RStudio, follow the directions
in the file, “Instructions for R Setup”, on the “Course Readings and Resources” page
of “Content” in OnQ. For the first question, read “US_Population_Example_1.R”
and make the requested changes in the code so that you can run the code on your
own computer.
Part a)
Look at the output file and plots of “US_Population_Example_1.R”. The following
files should appear in the directory in which you saved the R code.
1. US_Population_Time_Series.pdf
2. US_Population_Fitting_Comparison.pdf
3. QQ_Plot.pdf
List answers to the following two questions.
1. In words, state what the measurement process is.
2. Specify the time-series models used in the code.
Part b)
Are the time-series models strictly stationary? Are they weakly stationary?
Use the mathematical definitions in your justification.
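For reference, the standard textbook forms of the two definitions are sketched below. The symbols $\mu$, $\gamma$, $h$ and $k$ are assumptions introduced here for illustration, and the course notes may state the definitions with different notation, so the notes take precedence.

```latex
% Strict stationarity: every finite-dimensional joint distribution
% is invariant under a shift h of the time index.
\[
(X_{n_1}, \dots, X_{n_k}) \overset{d}{=} (X_{n_1+h}, \dots, X_{n_k+h})
\quad \text{for all } k \ge 1,\ \text{all } n_1, \dots, n_k,\ \text{and all } h .
\]

% Weak stationarity: finite second moments, constant mean, and an
% autocovariance that depends only on the lag h.
\[
\mathrm{E}\!\left[X_n^2\right] < \infty, \qquad
\mathrm{E}[X_n] = \mu, \qquad
\mathrm{Cov}(X_{n+h}, X_n) = \gamma(h)
\quad \text{for all } n \text{ and } h .
\]
```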
Part c)
Generate polynomial fits to the data with orders increasing from 1 to 6. Create
a figure and include in it the data and the different trends. Use the example
of the R code to generate the curves of the different fits, and include the figure
in the assignment paper. If one or more of the colours is difficult to see, make an
adjustment so that that colour is easily legible. Based on a visual inspection, is there
an order beyond which including higher-order terms in the polynomial model might
appear to be redundant?
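A minimal sketch of how such a figure might be produced is given below. It does not reproduce “US_Population_Example_1.R”: the data are synthetic, and the names `year`, `population`, `x` and `fits` are assumptions made for illustration only. Each curve is given its own line type so that the figure does not depend on colour.

```r
# Synthetic stand-in data (an assumption); replace with the series
# loaded in the course example script.
set.seed(1)
year <- seq(1790, 1990, by = 10)
x <- (year - mean(year)) / 100   # centred and rescaled so raw powers stay well-conditioned
population <- 60 + 110 * x + 70 * x^2 + rnorm(length(x), sd = 2)

orders <- 1:6
# One raw-polynomial least-squares fit per order
fits <- lapply(orders, function(p) lm(population ~ poly(x, p, raw = TRUE)))

# Plot the data, then overlay each fitted trend with its own line type
plot(year, population, pch = 16,
     xlab = "Year", ylab = "Population in millions (synthetic)")
for (p in orders) {
  lines(year, fitted(fits[[p]]), lty = p)
}
legend("topleft", legend = paste("Order", orders), lty = orders, bty = "n")
```

Rescaling the time variable before taking powers is a design choice of this sketch, not a requirement of the assignment; it only keeps the high-order terms numerically stable.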
Part d)
Let $\beta_j \in \mathbb{R}$ denote the j’th coefficient in the fitting model, where $j \in \{0, \dots, p\}$ and
$\beta_0$ is the coefficient corresponding to the intercept term. Let $\hat{\beta}_j : \Omega \to \mathbb{R}$ denote the
corresponding linear-regression estimator. In “US_Population_Output.txt”, read
the “Coefficients” table under the section, “Summary of the lm.object2 R fitting
object”. In that table, refer to the column, “Pr(>|t|)”. The j’th entry of this
column lists a probability that a realization of the absolute value, $T^{(t)}_j \propto \hat{\beta}_j$, of the
standardized coefficient estimator exceeds the value, |t|, corresponding to $\hat{\beta}_j = \beta^{(0)}$.
This probability is obtained from the distribution of the test statistic, $T^{(t)}_j$, under
the null hypothesis,
$H_0 : \beta_j = 0.$ (1)
Construct a table with all the p-values from the model fits, and include this in the
assignment paper. Rows should be ordered by polynomial order. Write all the entries
of the table in scientific notation ($m \times 10^{l}$ or m e l) and then round the coefficients
(the m) of those numbers to one decimal place. Based on the set of p-values of a row,
list the models for which all the polynomial terms must be included at the 95% level
of the hypothesis test.
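One possible way to collect the “Pr(>|t|)” columns into a single table is sketched below. The data are synthetic and the names `p_table` and `max_order` are hypothetical; the only R facilities relied on are `summary()` on an `lm` object and `formatC()` for scientific notation with a one-decimal mantissa.

```r
# Synthetic stand-in data (an assumption); replace with the course data.
set.seed(1)
x <- seq(-1, 1, length.out = 21)
y <- 60 + 110 * x + 70 * x^2 + rnorm(length(x), sd = 2)

max_order <- 6
# Rows are polynomial orders; columns are coefficients beta_0 ... beta_6
p_table <- matrix(NA_character_, nrow = max_order, ncol = max_order + 1,
                  dimnames = list(paste("Order", 1:max_order),
                                  paste0("beta_", 0:max_order)))
for (p in 1:max_order) {
  fit <- lm(y ~ poly(x, p, raw = TRUE))
  # Column of two-sided p-values for H0: beta_j = 0
  p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]
  # Scientific notation, mantissa rounded to one decimal place
  p_table[p, seq_along(p_values)] <- formatC(p_values, format = "e", digits = 1)
}
print(p_table, quote = FALSE, na.print = "")
```

Cells for coefficients that a lower-order model does not contain are left empty rather than filled with zeros.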
Part e)
In order to accurately model a specific dataset, an analysis of variance (ANOVA)
can be used to determine if a linear model should include more terms than the
current number. It tests whether a linear model with p terms has a residual variance
which is distinguishable from that of a model with $J - p$ additional terms. In
“US_Population_Output.txt”, read the “Analysis of Variance Table” table under
the section, “ANOVA of the lm.object2 R fitting object”. In that table, refer to the
column, “Pr(>F)”. The j’th entry of this column lists a probability that a realization
of the variance-ratio estimator, $T^{(F)}_j$, exceeds the value, F, corresponding to the
variance-ratio estimate. This probability is obtained from the distribution of the test
statistic, $T^{(F)}_j$, under the null hypothesis,
$H_0 : \{\beta_j\}_{j=p+1}^{J} = 0.$ (2)
Based on the ANOVA p-values, include a list of models where all the terms must be
included. Given the three lm objects, “fit1”, “fit2” and “fit3”, the line,
“anova(fit1, fit2, fit3)”,
of R code generates a table with a column of p-values associated with the
variance ratios of the linear model to the respective polynomial models of
orders, $\{2, \dots, 6\}$. Construct a table with all the p-values from the five model fits for
$p \in \{2, \dots, 6\}$, and include this in the assignment paper. Write all the entries of the
table in scientific notation ($m \times 10^{l}$ or m e l) and then round the coefficients (the m)
of those numbers to one decimal place. Based on the set of p-values of a row, list the models for which
all the polynomial terms must be included at the 95% level of the hypothesis test.
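The “Pr(>F)” column can be extracted from the ANOVA table directly, as in the sketch below. The data are synthetic and the names `fits`, `anova_table` and `p_values` are hypothetical. Note that when R's `anova()` is given several nested `lm` fits, each row of its table compares a model with the one listed immediately before it, so the first row has no p-value.

```r
# Synthetic stand-in data (an assumption); replace with the course data.
set.seed(1)
x <- seq(-1, 1, length.out = 21)
y <- 60 + 110 * x + 70 * x^2 + rnorm(length(x), sd = 2)

# Nested polynomial fits of orders 1 to 6
fits <- lapply(1:6, function(p) lm(y ~ poly(x, p, raw = TRUE)))

# Equivalent to anova(fits[[1]], fits[[2]], ..., fits[[6]])
anova_table <- do.call(anova, fits)

# Drop the first entry, which is NA because order 1 has no predecessor
p_values <- anova_table[["Pr(>F)"]][-1]
names(p_values) <- paste("Order", 2:6)

# Scientific notation, mantissa rounded to one decimal place
print(formatC(p_values, format = "e", digits = 1), quote = FALSE)
```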
Part f)
Based on the coefficient-estimator p-values and the ANOVA p-values, declare
which order the best-fitting polynomial has, and explain why. Compute and list
sample estimates of the mean and variance of the distribution of the discrete-time
process associated with the residual time series of the best fit.
Part g)
Include in the assignment paper a quantile-quantile plot of the residuals of
the best polynomial fit, where the theoretical quantiles are from the standard-normal
distribution. An example of a quantile-quantile plot is included in
“US_Population_Example_1.R”. The residual series which is entered as a parameter
to the “qqnorm” function must be standardized prior to running the function
in the plotting code. Include a diagonal line segment in the plot. Explain whether
or not the residuals appear to approximately constitute the realization of a Gaussian
white-noise process.
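A minimal sketch of the standardization and plotting steps follows. The data are synthetic, the choice of order 2 for `best_fit` is purely illustrative, and the names `res`, `res_mean`, `res_var` and `z` are hypothetical. The residuals are centred by their sample mean and scaled by their sample standard deviation before being passed to `qqnorm`, and `abline` adds the diagonal reference line.

```r
# Synthetic stand-in data (an assumption); replace with the course data.
set.seed(1)
x <- seq(-1, 1, length.out = 21)
y <- 60 + 110 * x + 70 * x^2 + rnorm(length(x), sd = 2)

# Order 2 is assumed here only for illustration
best_fit <- lm(y ~ poly(x, 2, raw = TRUE))

res <- residuals(best_fit)
res_mean <- mean(res)   # sample estimate of the residual mean
res_var <- var(res)     # sample estimate of the residual variance

# Standardize so that the residuals have zero mean and unit variance
z <- (res - res_mean) / sqrt(res_var)

qqnorm(z, main = "", pch = 16,
       xlab = "Theoretical standard-normal quantiles",
       ylab = "Standardized residual quantiles")
abline(a = 0, b = 1, lty = 2)   # diagonal reference line y = x
```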
Part h) Graduate students only
Compute the time series associated with the discrete-time processes defined by
$\nabla X_n$, $\nabla^2 X_n$ and $\nabla^3 X_n$, where the $X_n$ are elements of $X = (X_n)_{n=0}^{N-1}$, the time-series
model chosen in Part f). Include in the assignment paper a single plot with each
of the three time series overlaid on the residual series. Do these series support the
choice of the polynomial order in Part f)? Explain why or why not.
Question 2
In this problem,
1. $T_D := \{t_n\}_{n=0}^{N-1}$.
2. $t_m, t_{m-1} \in T_D$.
3. $\Delta t := t_m - t_{m-1}$.
4. $T := N \Delta t$.
5. $T_C := [0, T)$.
For each of the following discretizations, $X = (X_n)_{n \in T_D}$, of $X = \{X(t)\}_{t \in T_C}$, determine
if the process is strictly stationary and if it is weakly stationary.
a) $X(t_m) = \mu_0 + X^{(GWN)}(t_m)$, where:
1. $\mu_0 \in \mathbb{R}$ and
2. $X^{(GWN)}(t_m) \overset{\mathrm{IID}}{\sim} \mathrm{Normal}(0, \sigma^2)$. (3)
b) $X(t_m) = A\cos(2\pi f_0 t_m) + X^{(IID)}(t_m)$, where:
1. $A, f_0 \in \mathbb{R}$.
2. $X^{(IID)}(t_m) \sim \mathrm{IID}(0, \sigma^2)$. (4)
3. $\Delta t \neq 2\pi f_0$.
c) Graduate students only
Same as Part b), but with $\Delta t = 2\pi f_0$.
Question 3
Let $X = (X_n)_{n \in T_D}$ denote a discrete-time process used in the modelling of the discrete-time
series of the wins and losses of the National League team against the American
League team in the annual All-Star American Baseball game. The process is defined
by the relation,
$X_n = \begin{cases} +1, & \text{National League team wins} \\ -1, & \text{American League team wins} \end{cases}$ (5)
In the case of a fair game, it is straightforward to see that the distribution of the
process is $X_n \overset{\mathrm{IID}}{\sim} \mathrm{Bernoulli}(p)$, where $p = P(X_0 = 1)$. For each n in $T_D$, then, the
ensemble of $X_n$ (here called the state space of $X_n$) is $\{-1, +1\}$. Let $S = \{S_n\}_{n \in T_D}$
denote a random walk, where
$S_n = \begin{cases} 0, & n = 0 \\ \sum_{m=0}^{n} X_m, & n > 0 \end{cases}$ (6)
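To build intuition for the process defined above, one realization of the walk can be simulated as in the following sketch. The values of `N` and `p` are hypothetical choices, and because R vectors are 1-indexed, the element `S[n + 1]` holds $S_n$.

```r
set.seed(1)
N <- 50     # number of time indices n = 0, ..., N - 1 (hypothetical value)
p <- 0.5    # P(X_n = +1); 0.5 is an assumed value

# Steps X_0, ..., X_{N-1}, each +1 with probability p and -1 otherwise
X <- sample(c(1, -1), size = N, replace = TRUE, prob = c(p, 1 - p))

# S[n + 1] = sum of X_0 ... X_n for n > 0, and S_0 = 0 by definition
S <- cumsum(X)
S[1] <- 0

plot(0:(N - 1), S, type = "s",
     xlab = "Time index n", ylab = "Random walk S_n")
```

Repeating the simulation with different seeds gives a rough picture of how the spread of the walk changes with n.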
a) Is S strictly stationary? Is it weakly stationary?
b) For n > m, derive an expression for the best linear predictor, $\hat{S}_n(S_m)$, of $S_n$ given
$S_m$. The expression must be in terms of $\{m, n, p, S_m\}$ only and simplified as far as
possible.
c) Graduate students only
Use the central-limit theorem to approximate the probability distribution of Sn.