Clegg, Matthew; Krauss, Christopher; Rende, Jonas
Working Paper
partialCI: An R package for the analysis of partially
cointegrated time series
FAU Discussion Papers in Economics, No. 05/2017
Provided in Cooperation with:
Friedrich-Alexander University Erlangen-Nuremberg, Institute for
Economics
Suggested Citation: Clegg, Matthew; Krauss, Christopher; Rende, Jonas (2017) : partialCI:
An R package for the analysis of partially cointegrated time series, FAU Discussion Papers
in Economics, No. 05/2017, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for
Economics, Erlangen
This Version is available at:
http://hdl.handle.net/10419/150014
Terms of use:
Documents in EconStor may be saved and copied for your
personal and scholarly purposes.
You are not to copy documents for public or commercial
purposes, to exhibit the documents publicly, to make them
publicly available on the internet, or to distribute or otherwise
use the documents in public.
If the documents have been made available under an Open
Content Licence (especially Creative Commons Licences), you
may exercise further usage rights as specified in the indicated
licence.
Friedrich-Alexander-Universität Erlangen-Nürnberg
Institute for Economics
https://www.iwf.rw.fau.de/research/iwf-discussion-paper-series/
No. 05/2017
ISSN 1867-6707
Discussion Papers
in Economics
partialCI: An R package for the analysis of partially cointegrated
time series
Matthew Clegg$^{a,1}$, Christopher Krauss$^{b,1}$, Jonas Rende$^{c,1}$
$^a$ Independent
$^b$ University of Erlangen-Nürnberg, Department of Statistics and Econometrics, Lange Gasse 20, 90403
Nürnberg, Germany
$^c$ University of Erlangen-Nürnberg, Department of Statistics and Econometrics, Lange Gasse 20, 90403
Nürnberg, Germany
Friday 10th February, 2017
Abstract
Partial cointegration is a weakening of cointegration, allowing for the residual series to contain
a mean-reverting and a random walk component. Analytically, the residual series is
described by a partially autoregressive process. The partialCI package provides estimation,
testing, and simulation routines for PCI models in state space. We illustrate the functionality
with two examples: a financial application in the context of pairs trading and a macroeconomic
application, i.e., the relationship between GDP and consumption. For both examples,
we show that the variables are not cointegrated in the classic sense, but can be modeled with
partial cointegration.
Keywords: R software, cointegration, partial cointegration, pairs trading, permanent
components, transient components.
Email addresses: matthewcleggphd@gmail.com (Matthew Clegg), christopher.krauss@fau.de
(Christopher Krauss), jonas.rende@fau.de (Jonas Rende)
1The authors have benefited from many helpful discussions with Ingo Klein.
1. Introduction
The partialCI package (Clegg, 2016) fits a partial cointegration model2 to describe a
time series. Partial cointegration (PCI) is a weakening of cointegration, allowing for the
residual series to contain a mean-reverting and a random walk component. Analytically,
this residual series is described by a partially autoregressive process (PAR – see Summers
(1986), Poterba and Summers (1988), and Clegg (2015a))3, consisting of a stationary AR-process
and a random walk. Related is the short-term / long-term model introduced by
Schwartz and Smith (2000), which models a security price as the sum of a Brownian motion
and an Ornstein-Uhlenbeck process. Whereas classic cointegration in the sense of Engle and
Granger (1987) requires all shocks to be transient, PCI is more flexible and allows for permanent
shocks as well – a realistic assumption across many (macro)economic applications.
Even though neither the residual series, nor its mean-reverting and permanent component
are directly observable, estimation is still possible in state space – see Brockwell and Davis
(2010) and Durbin and Koopman (2012). The partialCI package encloses suitable estimation,
testing, and simulation routines for such PCI models.
Partial cointegration enhances several existing cointegration concepts in the literature –
namely classic cointegration, fractional cointegration and threshold cointegration.
In their seminal paper, Engle and Granger (1987) introduce the concept of classic cointegration.
Loosely speaking, if a collection of time series is cointegrated, they share a long-run
equilibrium. Shocks to the cointegration process are not persistent, i.e., the process adjusts
exponentially towards the long-run equilibrium value after experiencing a shock (Pfaff, 2008).
Thus, if the cointegration process is subject to permanent shocks, the partial cointegration
model may be more appropriate. Test procedures for classic cointegration are implemented
in the R packages urca (Pfaff, 2008) and egcm (Clegg, 2015c).
2Please note that we use the term partial cointegration according to Clegg and Krauss (2016).
3Partially autoregressive processes are already implemented in the corresponding R package partialAR
(Clegg, 2015b).
In a fractional cointegration model the residual series is assumed to follow a fractionally
integrated process. Such a process incorporates weighted higher-order lags to model long-term
effects (Baillie, 1996). In terms of shock persistence, fractionally integrated processes
lie between classic cointegrating processes (short-run persistence) and random walks (infinite
persistence). The ability to account for long-term persistence makes fractionally integrated
processes especially useful for analyzing long-memory time series data (Baillie, 1996). The
benefit of PCI compared to fractional cointegration is twofold: First, with PCI it is possible
to disentangle the transient and the permanent component, so that the dynamics associated
with the transient component can be investigated separately (Clegg and Krauss, 2016). Second,
within a PCI framework the proportion of variance attributable to mean-reversion (PVMR)
can be computed (Clegg and Krauss, 2016). The PVMR makes it possible to assess the degree
of noise in the time series.
In their seminal paper, Balke and Fomby (1997) introduce the concept of threshold cointegration.
In the cointegration models introduced so far, every shock, independent of its
magnitude, induces an instant adjustment process towards the long-run equilibrium value.
Balke and Fomby (1997) relax this assumption of linear adjustment. The process is assumed
to consist solely of a permanent component if it does not exceed a certain threshold
level. By contrast, if the time series exceeds the threshold level, the process is modeled as a
classic cointegration process, and adjustment towards the corresponding long-run equilibrium
occurs as long as the process exceeds the threshold value in absolute terms. The advantage
of the partial cointegration model is the ability to model the impact of permanent shocks
globally and not just locally as in a threshold cointegration model. Threshold cointegration
models are implemented in the R package tsDyn (Stigler, 2010).
Potential fields of application for the PCI model in a financial context are: term structures,
stock indices and tracking portfolios, stock pairs, spot and future prices, commodities,
spread options, international stock indices, as well as foreign exchange (Alexander, 2011). In
addition, the PCI framework could be used to revisit macroeconomic theories, e.g., monetary
policy, fiscal policy or business cycle models. An initial showcase for PCI can be found in
Clegg and Krauss (2016). They apply the partialCI package to detect partially cointegrated
pairs of stocks in the S&P 500 from January 1990 to October 2015. The authors extract
the mean-reverting component of the price spread time series of the partially cointegrated
pairs of stocks as the baseline for a relative-value arbitrage strategy.
The remainder of this paper is organized as follows. In section 2, we outline the methodological
details of the PCI model. In section 3, we explain how to use the key functions of the
partialCI package. In section 4, we provide a finance as well as a macroeconomic example.
Finally, section 5 provides concluding thoughts.
2. The partial cointegration framework
2.1. Model definition
Based on Engle and Granger (1987), Clegg and Krauss (2016, p. 4) define the concept
of partial cointegration as follows:
Definition: "The components of the vector $X_t$ are said to be partially cointegrated of
order $d, b$, denoted $X_t \sim PCI(d, b)$, if (i) all components of $X_t$ are $I(d)$4; (ii) there exists
a vector $\alpha$ so that $Z_t = \alpha' X_t$ and $Z_t$ can be decomposed as a sum $Z_t = R_t + M_t$, where
$R_t \sim I(d)$ and $M_t \sim I(d - b)$."
While Clegg and Krauss (2016) focus on the special case of two partially cointegrated time
series, we extend the model to the case of $(k + 1)$ partially cointegrated time series. Let $Y_t$ denote
the target time series and $X_{j,t}$ the $j$-th factor time series at time $t$, where $j \in \{1, 2, \ldots, k\}$.
The target time series and the $k$ factor time series are partially cointegrated if a parameter
vector $\iota = \{\beta_1, \beta_2, \ldots, \beta_k, \rho, \sigma_M, \sigma_R, M_0, R_0\}$ exists such that the subsequent model
equations are satisfied (Clegg and Krauss, 2016)5:

$$
\begin{aligned}
Y_t &= \beta_1 X_{1,t} + \beta_2 X_{2,t} + \ldots + \beta_k X_{k,t} + W_t \\
W_t &= M_t + R_t \\
M_t &= \rho M_{t-1} + \varepsilon_{M,t} \\
R_t &= R_{t-1} + \varepsilon_{R,t} \\
\varepsilon_{M,t} &\sim N(0, \sigma_M^2) \\
\varepsilon_{R,t} &\sim N(0, \sigma_R^2) \\
\beta_j &\in \mathbb{R}; \quad \rho \in (-1, 1); \quad \sigma_M^2, \sigma_R^2 \in \mathbb{R}_0^+.
\end{aligned}
\tag{1}
$$

4If a time series exhibits d unit roots, it is said to be integrated of order d (I(d)) (Lütkepohl, 2007, p.
238-242).
Thereby, $W_t$ denotes the partially autoregressive process, $R_t$ the permanent component,
$M_t$ the transient component and $\beta = \{\beta_1, \beta_2, \ldots, \beta_k\}$ is the partially cointegrating vector.6
The permanent component is modeled as a random walk and the transient component as
an AR(1)-process with AR(1)-coefficient $\rho$. The corresponding error terms $\varepsilon_{M,t}$ and $\varepsilon_{R,t}$
are assumed to follow mutually independent, normally distributed white noise processes
with mean zero and variances $\sigma_M^2$ and $\sigma_R^2$. For the sake of simplicity, we set $M_0 = 0$ and
$R_0 = Y_0 - \beta_1 X_{1,0} - \beta_2 X_{2,0} - \ldots - \beta_k X_{k,0}$. A key advantage of modeling the cointegrating
process as a partially autoregressive process is that we are able to calculate the PVMR,
defined as (Clegg and Krauss, 2016),

$$
R^2_{MR} = \frac{Var[(1 - B) M_t]}{Var[(1 - B) W_t]} = \frac{2\sigma_M^2}{2\sigma_M^2 + (1 + \rho)\,\sigma_R^2}, \qquad R^2_{MR} \in [0, 1], \tag{2}
$$

where $B$ denotes the backshift operator. The statistic $R^2_{MR}$ is useful to assess how close the
cointegration process is to either a pure random walk ($R^2_{MR} = 0$) or a pure AR(1)-process
($R^2_{MR} = 1$).
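To make model (1) and the PVMR in equation (2) concrete, the following base R sketch simulates one target series with a single factor and evaluates equation (2) for given parameters. The helper names `pvmr()` and `simulate_pci()` are hypothetical and not part of the partialCI package, and the factor is assumed to follow a random walk purely for illustration.

```r
# Hypothetical helper (not part of partialCI): PVMR according to equation (2).
pvmr <- function(rho, sigma_M, sigma_R) {
  2 * sigma_M^2 / (2 * sigma_M^2 + (1 + rho) * sigma_R^2)
}

# Hypothetical helper: simulate a target series and one factor from model (1).
simulate_pci <- function(n, beta, rho, sigma_M, sigma_R, seed = 1) {
  set.seed(seed)
  X <- cumsum(rnorm(n))                    # factor: a random walk (assumption)
  eps_M <- rnorm(n, sd = sigma_M)
  M <- numeric(n)                          # transient AR(1) component, starts at 0
  for (t in 2:n) M[t] <- rho * M[t - 1] + eps_M[t]
  R <- cumsum(rnorm(n, sd = sigma_R))      # permanent random walk component
  data.frame(Y = beta * X + M + R, X = X, M = M, R = R)
}

sim <- simulate_pci(n = 500, beta = 1, rho = 0.4, sigma_M = 0.1, sigma_R = 0.1)
pvmr(rho = 0.4, sigma_M = 0.1, sigma_R = 0.1)  # 2 / (2 + 1.4), about 0.588
```

Setting `sigma_M = 0` gives a PVMR of 0 (pure random walk), while `sigma_R = 0` gives a PVMR of 1 (pure AR(1)-process), matching the two limiting cases named above.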
2.2. State space representation
The applied state space transformation is in line with Clegg and Krauss (2016). Given
that the PAR process $W_t$ is not observable, we convert the PCI model into the following
state space model, consisting of an observation (3) and a state equation (4):

$$X_t = H Z_t \tag{3}$$
$$Z_t = F Z_{t-1} + W_t. \tag{4}$$

Thereby, $Z_t$ (4) denotes the state, which is assumed to be influenced linearly by the state in
the last period and a noise term $W_t$. The matrix $F$ is assumed to be time invariant. The
observable part is denoted by $X_t$ (3). By assumption, there is a linear dependence between
$X_t$ and $Z_t$, captured in the time invariant matrix $H$.

5It is possible to include an intercept within the partialCI package.
6Note that in the implemented estimation routine the estimated partially cointegrating vector is a linear
combination of all existing partially cointegrating vectors in the sense of Verbeek (2010, p. 324).
The PCI framework presented in equation (1) consists of the observable target as well as
factor time series and the two hidden state variables $M_t$ and $R_t$. Following the approach
of Clegg and Krauss (2016), the $k$ factor variables are declared as additional hidden state
variables. As a consequence, $X_{1,t}, X_{2,t}, \ldots, X_{k,t}$ are part of both the observation and the state
equation. Applying the state space transformation yields an observation equation (5) and a
state equation (6), with $\varepsilon_{X_j,t}$ denoting the innovation of process $X_{j,t}$. By assumption,
$\varepsilon_{X_j,t}$ is normally distributed with zero mean and variance $\sigma^2_{X_j}$
and is independent of $\varepsilon_{M,t}$ and $\varepsilon_{R,t}$.
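As an illustration of how the factors can enter both equations, one state space form consistent with the description above is sketched below. It is an assumption rather than a quotation of equations (5) and (6): in particular, each factor is taken to follow a random walk driven by its own innovation $\varepsilon_{X_j,t}$.

```latex
% Observation equation: observables on the left, hidden state on the right.
\begin{pmatrix} Y_t \\ X_{1,t} \\ \vdots \\ X_{k,t} \end{pmatrix}
=
\begin{pmatrix}
\beta_1 & \cdots & \beta_k & 1 & 1 \\
1       & \cdots & 0       & 0 & 0 \\
\vdots  & \ddots & \vdots  & \vdots & \vdots \\
0       & \cdots & 1       & 0 & 0
\end{pmatrix}
\begin{pmatrix} X_{1,t} \\ \vdots \\ X_{k,t} \\ M_t \\ R_t \end{pmatrix}

% State equation: factors and R_t have unit coefficients, M_t has coefficient rho.
\begin{pmatrix} X_{1,t} \\ \vdots \\ X_{k,t} \\ M_t \\ R_t \end{pmatrix}
=
\operatorname{diag}(1, \ldots, 1, \rho, 1)
\begin{pmatrix} X_{1,t-1} \\ \vdots \\ X_{k,t-1} \\ M_{t-1} \\ R_{t-1} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{X_1,t} \\ \vdots \\ \varepsilon_{X_k,t} \\ \varepsilon_{M,t} \\ \varepsilon_{R,t} \end{pmatrix}
```

The first row of the observation matrix reproduces $Y_t = \beta_1 X_{1,t} + \ldots + \beta_k X_{k,t} + M_t + R_t$ from equation (1), and the last two rows of the state equation reproduce the AR(1) and random walk recursions for $M_t$ and $R_t$.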
2.3. Estimation of a partial cointegration model
Parameters are estimated via the maximum likelihood (ML) method. Using a quasi-Newton
algorithm, the ML method searches for the parameters $\rho$, $\sigma_M^2$, $\sigma_R^2$ and the parameter
vector $\beta$ which maximize the likelihood function of the associated Kalman filter.7 The
likelihood score (7) of Clegg and Krauss (2016) is maximized,
where $\phi(\cdot)$ denotes the probability density function of the normal distribution. Clegg and
Krauss (2016) provide (i) a derivation of the likelihood function (7), (ii) a proof that the
partial cointegration model is identifiable, and (iii) a comprehensive discussion of the
consistency of the ML estimation routine.8
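The idea of maximizing a Kalman filter likelihood with a quasi-Newton method can be sketched in base R for the simpler case of an already observed residual series $W_t = M_t + R_t$. This is not the partialCI routine: `par_loglik()` and `fit_par()` are hypothetical helpers, $\beta$ is not estimated, no penalty is applied, and the initial state ($M_0 = 0$, $R_0$ equal to the first observation) is treated as known.

```r
# Hypothetical helper: Gaussian log-likelihood of a PAR series w = M + R,
# with state (M_t, R_t), transition diag(rho, 1) and observation row (1, 1).
par_loglik <- function(w, rho, sigma_M, sigma_R) {
  Fm <- diag(c(rho, 1))                   # state transition matrix
  Q  <- diag(c(sigma_M^2, sigma_R^2))     # covariance of the state innovations
  state <- c(0, w[1])                     # M_0 = 0, R_0 = first observation
  P  <- matrix(0, 2, 2)                   # initial state treated as known
  ll <- 0
  for (t in 2:length(w)) {
    state <- as.vector(Fm %*% state)      # predict the state
    P <- Fm %*% P %*% t(Fm) + Q           # predict its covariance
    v <- w[t] - sum(state)                # one-step-ahead forecast error
    S <- sum(P)                           # forecast error variance, H P H'
    ll <- ll + dnorm(v, mean = 0, sd = sqrt(S), log = TRUE)
    K <- rowSums(P) / S                   # Kalman gain, P H' / S
    state <- state + K * v                # update the state
    P <- P - outer(K, colSums(P))         # update the covariance, P - K H P
  }
  ll
}

# Hypothetical helper: maximize over (rho, sigma_M, sigma_R) with BFGS. The
# transformations keep rho in (-1, 1) and the standard deviations positive.
fit_par <- function(w) {
  nll <- function(th) -par_loglik(w, tanh(th[1]), exp(th[2]), exp(th[3]))
  s <- log(sd(diff(w)) / sqrt(2))         # crude starting value (assumption)
  opt <- optim(c(0, s, s), nll, method = "BFGS")
  c(rho = tanh(opt$par[1]), sigma_M = exp(opt$par[2]),
    sigma_R = exp(opt$par[3]), negloglik = opt$value)
}
```

With `rho = 0` and `sigma_M = 0` the filter collapses to a pure random walk, so `par_loglik()` then equals the sum of normal log-densities of the first differences of `w`, which is a convenient sanity check.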
2.4. A likelihood ratio test routine for partial cointegration
The likelihood ratio test (LRT) implemented in the partialCI package adopts the LRT
routine for PAR models proposed by Clegg (2015a). In a PCI scenario the null hypothesis
consists of two conditions – namely the hypothesis that the residual series is a pure random
walk ($H_0^R$) or a pure AR(1)-process ($H_0^M$). The two conditions are tested separately. Only
if both $H_0^R$ and $H_0^M$ are individually rejected is the null hypothesis of no partial cointegration
rejected. In the first stage, the LRT for partial cointegration tests the null hypothesis
of a pure random walk against the alternative hypothesis of a pure AR(1)-process or PCI. To
construct the first stage of the LRT for partial cointegration it is necessary
to estimate the likelihood scores of an unrestricted and a restricted model. The likelihood
score of the unrestricted model, i.e., the largest likelihood score found by the Kalman filter
optimization routine, is denoted by (8).
7The complete algorithm as well as the determination of the starting values are available in the R package
partialCI.
8The partialCI package also provides a two-step estimation method, which often produces results that
are inferior to the joint-penalty method, and so the joint-penalty method is to be preferred.
The restricted model is obtained by setting $\rho$ and $\sigma_M$ to zero, which is in line with the null
hypothesis of a pure random walk. The restricted model is given by (9).
The test statistic for the pure random walk hypothesis is given as a log-likelihood
ratio $\Lambda_R$ (10).
Let $C_R(\alpha)$ ($C_M(\alpha)$) denote the critical value associated with $\Lambda_R$ ($\Lambda_M$) dependent on the
significance level $\alpha$. If $H_0^R$ cannot be rejected, i.e., $\Lambda_R < C_R(\alpha)$, the tested time series is
classified as a pure random walk. On the other hand, if the test rejects $H_0^R$, the routine
continues, testing the conditional null hypothesis $H_0^M \mid \Lambda_R < C_R(\alpha)$ against $H_1^{PCI}$. Setting
$\sigma_R^2 = 0$ yields the likelihood score of the restricted model:

$$\mathcal{L}_M = \max_{\beta, \rho, \sigma_M^2} \mathcal{L}_{MR}\left(\beta, \rho, \sigma_M^2, \sigma_R^2 = 0\right). \tag{11}$$

The test statistic for the second stage, $\Lambda_M$, is given by (12).
If the conditional null hypothesis $H_0^M \mid \Lambda_R < C_R(\alpha)$ cannot be rejected, i.e., $\Lambda_M < C_M(\alpha)$,
the tested time series follows a pure AR(1)-process. Vice versa, if $\Lambda_M > C_M(\alpha)$ holds, the
time series is classified as partially cointegrated. Note that the critical values for both test
statistics $\Lambda_R$ and $\Lambda_M$ need to be simulated because the test statistics do not follow a
standard distribution. They are embedded in the package partialCI.
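By analogy with equation (11), the remaining likelihood scores and the two test statistics can be written as follows. This is a sketch under assumptions, not a quotation of equations (8) to (10) and (12): the symbols $\mathcal{L}_{PCI}$ and $\mathcal{L}_R$ are hypothetical labels, and the orientation of the ratios is an assumption.

```latex
% Unrestricted score and the score under the pure random walk restriction.
\mathcal{L}_{PCI} = \max_{\beta, \rho, \sigma_M^2, \sigma_R^2}
    \mathcal{L}_{MR}\left(\beta, \rho, \sigma_M^2, \sigma_R^2\right),
\qquad
\mathcal{L}_{R} = \max_{\beta, \sigma_R^2}
    \mathcal{L}_{MR}\left(\beta, \rho = 0, \sigma_M^2 = 0, \sigma_R^2\right)

% Log-likelihood ratios, restricted over unrestricted (assumed orientation).
\Lambda_R = \log \frac{\mathcal{L}_{R}}{\mathcal{L}_{PCI}},
\qquad
\Lambda_M = \log \frac{\mathcal{L}_{M}}{\mathcal{L}_{PCI}}
```

With this orientation both statistics are non-positive, because a restricted maximum cannot exceed the unrestricted one; this matches the negative statistics in the test.pci() outputs of Section 4. Reversing the ratio flips the sign of the statistics and the direction of the inequalities in the decision rule.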
3. Using the PCI package
In this section, we outline the four key functions of the partialCI package in detail –
namely fit.pci(), test.pci(), statehistory.pci(), and hedge.pci().
3.1. fit.pci()
The function fit.pci() fits a partial cointegration model to a given collection of time
series.
fit.pci(Y, X, pci_opt_method = c("jp", "twostep"), par_model = c("par",
"ar1", "rw"), lambda = 0, robust = FALSE, nu = 5, include_alpha = FALSE)
Y, X: Y denotes the target time series and X is a matrix containing the k factors used to
model Y.9
pci_opt_method: Specifies whether the joint-penalty method ("jp") or the two-step
method ("twostep") is applied to obtain the model with the best fit. If pci_opt_method
is specified as "twostep", a two-step procedure similar to the method introduced by
Engle and Granger (1987) is performed. The residuals of the first stage regression are
extracted and a prespecified model is fitted to the residual series. Which model is fitted
to the residual series depends on the specification of the argument par_model. In case
of "par", a partial autoregressive model is used, in case of "ar1", an AR(1)-process
and in case of "rw" a random walk (default: par_model = "par"). On the other
hand, if pci_opt_method is specified as "jp", the joint-penalty method is applied
to estimate $\beta$, $\rho$, $\sigma_M^2$ and $\sigma_R^2$ jointly via ML. The likelihood score of the associated
Kalman filter is extended by a penalty value $\lambda \sigma_R^2$, where $\lambda \in \mathbb{R}_0^+$. Larger values for $\lambda$
favor solutions with a larger transient component and vice versa (default: lambda = 0).
To increase the chance of finding the global optimum, the procedure uses several
different starting points. One of these starting points is given by the parameter estimates of
an ex-ante two-step procedure, ensuring that the likelihood score obtained under "jp"
is at least as good as under "twostep" (default: pci_opt_method = "jp").
robust: Determines whether the residuals are assumed to be normally (FALSE) or
t-distributed (TRUE) (default: robust = FALSE). If robust is set to TRUE, the degrees of
freedom can be specified using the argument nu (default: nu = 5). If pci_opt_method
matches "twostep", a robust linear model (rlm()) included in the R package MASS
(Ripley and Venables, 2002) is applied, i.e., a Huber (1981) M-estimator is calculated.10
include_alpha: If TRUE, an intercept $\alpha$ is added to the PCI relationship (default:
include_alpha = FALSE).
key return values: The proportion of variance attributable to mean-reversion ($pvmr),
the partially cointegrating vector ($beta), the AR(1)-coefficient ($rho) and the negative
log likelihood ($negloglik).
9Both X and Y are plain or zoo (Grothendieck and Zeileis, 2005) objects. If k = 1, X is a vector.
10For a discussion about robust parameter estimation in a PAR context, see Clegg (2015a).
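A minimal usage sketch on simulated data may help to see how the arguments and return values fit together. The data-generating step uses base R only; the values `beta = 2`, `rho = 0.5` and both standard deviations of 0.5 are arbitrary choices for illustration. The call to fit.pci() is guarded so that the block also runs when partialCI is not installed, and the estimates it prints are not asserted here.

```r
set.seed(42)
n <- 500
X <- cumsum(rnorm(n))                       # one factor, simulated as a random walk
# AR(1) component with coefficient 0.5, built with a recursive filter
M <- as.numeric(stats::filter(rnorm(n, sd = 0.5), 0.5, method = "recursive"))
R <- cumsum(rnorm(n, sd = 0.5))             # random walk component
Y <- 2 * X + M + R                          # target series with beta = 2

if (requireNamespace("partialCI", quietly = TRUE)) {
  # Joint-penalty fit of a PAR residual model, no penalty, normal residuals
  fit <- partialCI::fit.pci(Y, X, pci_opt_method = "jp", par_model = "par")
  print(c(beta = unname(fit$beta), rho = unname(fit$rho), pvmr = unname(fit$pvmr)))
  fit$negloglik
}
```

Passing a single method such as `"jp"` instead of the full default vector makes the chosen estimation routine explicit in the call.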
3.2. test.pci()
The test.pci() function tests the goodness of fit of a PCI model.
test.pci(Y, X, alpha = 0.05, null_hyp = c("rw", "ar1"), robust = FALSE,
pci_opt_method = c("jp", "twostep"))
alpha: Determines at which significance level the null hypothesis is rejected (default:
alpha = 0.05).
null_hyp: Specifies whether the null hypothesis is a random walk ("rw"), an
AR(1)-process ("ar1") or a union of both hypotheses (c("rw", "ar1")) (default:
null_hyp = c("rw", "ar1")).
key return values: The test statistic ($statistic) and p-values ($p.value) for the
selected null hypothesis.
3.3. statehistory.pci()
To estimate the sequence of hidden states the statehistory.pci() function can be applied.
statehistory.pci(A, data = A$data, basis = A$basis)
A: Denotes a fit.pci() object.
data: Is a matrix consisting of the target time series and the k factor time series
(default: data = A$data).
basis: Captures the coefficients of the factor time series (default: basis = A$basis).
key return values: The two estimated hidden states Mt ($M) and Rt ($R).
3.4. hedge.pci()
The function hedge.pci() finds those k factors from a predefined set of factors which yield
the best fit to the target time series.
hedge.pci(Y, X, maxfact = 10, lambda = 0, use.multicore = TRUE,
minimum.stepsize = 0, verbose = TRUE, exclude.cols = c(), search_type =
c("lasso", "full", "limited"), pci_opt_method = c("jp", "twostep"))
maxfact: Denotes the maximum number of considered factors (default: maxfact =
10).
use.multicore: If TRUE, parallel processing is activated (default: use.multicore =
TRUE).
verbose: Controls whether detailed information is printed (default: verbose =
TRUE).
exclude.cols: Defines a set of factors which should be excluded from the search
routine (default: exclude.cols = c()).
search_type: Determines the search algorithm applied to find the model that fits best
to the target time series. The likelihood ratio score (LRT score) is used to compare
the model fits, whereby lower scores are associated with better fits. If the option
"lasso" is specified, the lasso algorithm as implemented in the R package glmnet
(Friedman et al., 2010) is deployed to search for the portfolio of factors that yields the
best linear fit to the target time series. If the option "full" is specified, then at each
step, all possible additions to the portfolio are considered and the one which yields the
highest likelihood score improvement is chosen. If the option "limited" is specified,
then at each step, the correlation of the residuals of the current portfolio is computed
with respect to each of the candidate series in the input set X, and the top B series
are chosen for further consideration. Among these top B candidates, the one which
improves the likelihood score by the greatest amount is chosen. The parameter B can
be controlled via maxfact (default: search_type = "lasso").
key return values: The best fit ($pci), the column indices ($indexes), and the names
of the factors included in the best fit ($index_names).
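The screening step described for the "limited" option can be sketched in base R as follows. `screen_candidates()` is a hypothetical helper and not the package's implementation; ranking by absolute correlation is an assumption, since the text above only speaks of "the correlation".

```r
# Hypothetical helper: one screening step of a "limited"-style search.
# resid: residuals of the current portfolio
# X:     matrix of candidate series with column names
# B:     number of candidates kept for closer inspection
screen_candidates <- function(resid, X, B = 3) {
  cors <- abs(apply(X, 2, function(x) cor(resid, x)))  # assumption: absolute value
  names(sort(cors, decreasing = TRUE))[seq_len(min(B, ncol(X)))]
}

set.seed(7)
X <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
resid <- 0.8 * X[, "B"] + rnorm(100, sd = 0.1)  # residuals driven mostly by B
screen_candidates(resid, X, B = 2)
```

In a full search, each of the retained candidates would then be added to the portfolio in turn and the resulting likelihood scores compared; that step is omitted here.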
4. Examples
4.1. Finance
As an introductory example, we explore the relationship between Royal Dutch Shell plc
A (RDS-A) and Royal Dutch Shell plc B (RDS-B), using daily (closing) price data from 1
January 2006 to 1 December 2016.11 To download the price data we use the getYahooData()
function, implemented in the R package TTR (Ulrich, 2016). The subsequent R code is
used to obtain the data.
library(partialCI)
library(TTR)
RDSA<-getYahooData("RDS-A", 20060101, 20161201)$Close
RDSB<-getYahooData("RDS-B", 20060101, 20161201)$Close
A classic cointegration analysis indicates that the two time series are not cointegrated. In particular,
we apply the two-step approach of Engle and Granger (1987) implemented in the R
package egcm. By default, the egcm package uses the unit root test of Phillips and Perron
(1988)12 (specification: with constant, no linear time trend) to investigate the residuals obtained
from an Ordinary Least Squares (OLS) regression. The R code,
library(egcm)
egcm_finance <- egcm(RDSA,RDSB,include.const = FALSE),
results in the following output:
Y[i] = 0.9732 X[i] + 0.0000 + R[i], R[i] = 0.9941 R[i-1] + eps[i],
(0.0005) (0.0000) (0.0025)
eps ~ N(0, 0.1679^2)
R[2016-12-01] = -1.8991 (t = -1.477)
WARNING: X and Y do not appear to be cointegrated.
11RDS-A (Royal Dutch Shell plc - A, 2016) and RDS-B (Royal Dutch Shell plc - B, 2016) data are
downloaded from Yahoo Finance.
12The test of Phillips and Perron (1988) corrects for heteroscedasticity, a well-known stylized fact of
financial price time series (Krauss and Herrmann, 2017).
The residual plot in figure 1 (code: plot(egcm_finance$residuals, type = "l")) suggests
that the residual series is not purely mean-reverting, but rather shows a stochastic trend
as well as mean-reverting behavior. Hence, it is not surprising that RDS-A and RDS-B are
not cointegrated.
Figure 1: Residual plot classic cointegration: RDS-A and RDS-B (1.01.2006 - 1.12.2016, daily)
Using the PCI framework, we are able to fit a PCI model to RDS-A and
RDS-B with the following R code:
PCI_RDSA_RDSB <- fit.pci(RDSA, RDSB, pci_opt_method = c("jp"), par_model
= c("par"), lambda = 0, robust = FALSE, nu = 5, include_alpha = FALSE).
The R output is given as,
Fitted values for PCI model
Y[t] = X[t] %*% beta + M[t] + R[t]
M[t] = rho * M[t-1] + eps_M [t], eps_M[t] ~ N(0, sigma_M^2)
R[t] = R[t-1] + eps_R [t], eps_R[t] ~ N(0, sigma_R^2)
             Estimate  Std. Err
beta_Close     0.9274    0.0038
rho            0.3959    0.0965
sigma_M        0.1081    0.0083
sigma_R        0.1195    0.0076
-LL = -1117.29, R^2[MR] = 0.540,
where beta_Close denotes the partially cointegrating coefficient. Thereby, the coefficient of
0.9274 indicates a positive relationship between RDS-A and RDS-B, and the PVMR of 0.54
suggests that the spread time series also exhibits clear mean-reverting behavior.
In the subsequent step, we utilize the test.pci() function to check whether RDS-A and RDS-B
are partially cointegrated. The R code
test.pci(RDSA, RDSB, alpha = 0.05, null_hyp = c("rw", "ar1"), robust =
FALSE, pci_opt_method = c("jp")),
leads to the following output:
Likelihood ratio test of [Random Walk or CI(1)] vs Almost PCI(1)
(joint penalty method)
data: StockA
Hypothesis Statistic p-value
Random Walk -55.09 0.010
AR(1) -52.88 0.010
Combined 0.010.
Recall that a time series is classified as partially cointegrated, if and only if the random walk
as well as the AR(1)-hypotheses are rejected. The p-value of 0.010 for the combined null
hypothesis indicates that RDS-A and RDS-B are partially cointegrated in the considered
period of time.
Next, we demonstrate the use of the statehistory.pci() function, which allows the hidden
states to be estimated and extracted. The R code,
statehistory.pci(PCI_RDSA_RDSB), results in the R output:
Y Yhat Z M R eps_M eps_R
2006-01-03 35.87002 35.26781 0.6022031 0.00000000 0.6022031 0.00000000 0.00000000
2006-01-04 36.23993 35.57175 0.6681755 0.02030490 0.6478706 0.02030490 0.04566752
2006-01-05 35.80276 35.24161 0.5611509 -0.02112621 0.5822771 -0.02916450 -0.06559352
2006-01-06 36.48653 35.83377 0.6527591 0.01590352 0.6368556 0.02426695 0.05457850
...
2016-11-25 50.18000 49.52231 0.6576906 -0.08762384 0.7453144 -0.07643882 -0.17191764
2016-11-28 49.20000 48.22397 0.9760311 0.04699758 0.9290335 0.08168603 0.18371909
2016-11-29 49.06000 48.02922 1.0307808 0.04419468 0.9865862 0.02558931 0.05755262
2016-11-30 51.10000 50.23639 0.8636066 -0.02573955 0.8893462 -0.04323530 -0.09724000
2016-12-01 51.78000 51.15450 0.6254956 -0.08826115 0.7137567 -0.07807140 -0.17558945.
The table above covers the estimates of the hidden states M and R as well as the corresponding
error terms eps_M and eps_R. Z is equal to the sum of M and R. The estimate
of the target time series is denoted by Yhat. Figure 2 illustrates a plot of the extracted
mean-reverting component of the spread associated with the RDS-A and RDS-B price time
series (plot(statehistory.pci(PCI_RDSA_RDSB)[,4], type = "l", ylab = "", xlab = "")).
The horizontal blue lines mark plus and minus two times
the historical standard deviation of the mean-reverting component.
Figure 2: Mean-reverting component RDS-A and RDS-B (1.01.2006 - 1.12.2016, daily)
A pairs trading strategy could exploit the mean-reverting behavior of $M_t$. Note that this example is
in-sample; for a true out-of-sample application see Clegg and Krauss (2016).
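The band logic described above can be sketched in base R. `band_signal()` is a hypothetical helper, the sign convention of the signal is an assumption, and the input is a small made-up vector rather than the RDS-A/RDS-B series. As in the text, the bands use the in-sample standard deviation, so this is not a tradable out-of-sample rule.

```r
# Hypothetical helper: flag observations of the mean-reverting component M that
# lie outside plus/minus k historical standard deviations.
band_signal <- function(M, k = 2) {
  band <- k * sd(M)
  # -1: M above the upper band (spread expected to fall)
  #  1: M below the lower band (spread expected to rise)
  #  0: inside the bands
  ifelse(M > band, -1, ifelse(M < -band, 1, 0))
}

M <- c(0.1, -0.2, 0.0, 0.3, -0.1, 2.5, 0.2, -0.3, 0.1, -2.5)
band_signal(M)  # only the 6th and 10th observations fall outside the bands
```

Applied to the fourth column of the statehistory.pci() output, the same function would mark the dates on which the extracted mean-reverting component leaves the plotted bands.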
We continue with using hedge.pci() to find the set of sector ETFs forming the best hedging
portfolio for the SPY index (S&P500 index). Thereby, the R code,
sectorETFS <- c("XLB", "XLE", "XLF", "XLI", "XLK", "XLP", "XLU", "XLV", "XLY")
prices <- multigetYahooPrices(c("SPY", sectorETFS), start=20060101)
hedge.pci(prices[,"SPY"], prices),
results in the subsequent output:
-LL LR[rw] p[rw] p[mr] rho R^2[MR] Factor | Factor coefficients
2320.00 -23.3743 0.0100 0.0100 0.5759 0.4526 XLI | 3.1106
1765.50 -46.5925 0.0100 0.0100 0.3170 0.4713 XLY | 1.8951 1.1989
1494.95 -53.7256 0.0100 0.0100 0.3244 0.5038 XLV | 1.6999 0.9106 0.6619
972.58 -65.9058 0.0100 0.0100 0.4060 0.5904 XLK | 1.3089 0.4933 0.5320 1.5182.
The table summarizes information about the best hedging portfolio, where each row corresponds
to an increasing number of factors. Row 1: The best single-factor hedging portfolio
comprises XLI (industrials) as only factor. Row 2: The best two-factor hedging portfolio
consists of XLI and XLY (consumer discretionary). As such, XLY leads to the best improvement
of the LRT score among all remaining factors. Row 3 includes XLV (health care) for
the three-factor portfolio and row 4 XLK (technology) for the best four-factor portfolio. The
last row corresponds to the overall best fit out of the nine potential sector ETFs, based on
the LRT score. Note that for all rows, the union of the random walk and AR(1) null hypotheses
is rejected at the 5 percent significance level, so we find a PCI model at each step.
4.2. Macroeconomics
As a second example, we revisit the relationship between GDP and personal consumption
expenditures for the United States (among others, see Cochrane (1994), Gonzalo et al. (2008)
and Guisan (2008)), using quarterly seasonally adjusted annual rates in billions of US dollars
from January 1976 to July 2016.13 The following R code triggers the data download:
13We utilize the R package Quandl (Daroczi et al., 2016) to download the GDP (US. Bureau of Economic
Analysis, 2016a) as well as personal consumption expenditures data (US. Bureau of Economic Analysis,
2016b). Thereby, the time series data are directly converted into xts (Ryan and Ulrich, 2014) objects.
library(xts)
library(Quandl)
library(partialCI)
GDP = Quandl("FRED/GDP", start_date = "1976-01-01",
end_date = "2016-04-01", type = "xts")
Consumption = Quandl("FRED/PCEC", start_date = "1976-01-01",
end_date = "2016-04-01",type = "xts").
Applying the unit root test of Phillips and Perron (1988) as implemented in the R package
egcm indicates that GDP and personal consumption are not cointegrated in the classic sense
within the considered time frame.14 The residual plot in figure 3 (code:
plot(egcm_macro$residuals, type = "l")) obtained from standard cointegration analysis shows that
the residuals exhibit both mean-reverting and stochastic trending behavior.15
Figure 3: Residual plot classic cointegration: GDP and consumption (1976-2016, quarters)
To account for the stochastic trending behavior we apply the following PCI model:
14The R code is given as egcm_macro <- egcm(Consumption, GDP, include.const = FALSE). For the sake
of brevity, we do not show the R output.
15We are aware of the structural break in the residual series around the second quarter of the year 2000.
The function breakpoints() implemented in the R package strucchange (Hornik et al., 2003) is used to
obtain the estimate of the structural break.
PCI_GDP_Consumption <- fit.pci(GDP, Consumption, pci_opt_method = c("jp"),
par_model = c("par"), lambda = 0, robust = FALSE, nu = 5, include_alpha =
FALSE).
The latter function yields the following R output:
Fitted values for PCI model
Y[t] = X[t] %*% beta + M[t] + R[t]
M[t] = rho * M[t-1] + eps_M [t], eps_M[t] ~ N(0, sigma_M^2)
R[t] = R[t-1] + eps_R [t], eps_R[t] ~ N(0, sigma_R^2)
Estimate Std. Err
beta_ 1.3963 0.0358
rho 0.2812 0.3357
sigma_M 27.1132 8.7402
sigma_R 35.3842 6.8836
-LL = 845.02, R^2[MR] = 0.478.
Thereby, the coefficient of 1.396 is associated with a positive relationship between GDP and
personal consumption. From a policy maker's point of view, the existence of such a partial
equilibrium relationship is crucial for designing appropriate economic stimulus packages.
The mean-reverting component accounts for 47.8 percent of the total variance, i.e., political
authorities could utilize this partly predictive behavior for anti-cyclical fiscal policy interventions.
Next, we use test.pci() to test whether GDP and personal consumption are indeed partially cointegrated.
The R code is given by,
test.pci(GDP, Consumption, alpha = 0.05, null_hyp = c("rw", "ar1"), robust
= FALSE, pci_opt_method = c("jp")),
leading to the subsequent output:
Likelihood ratio test of [Random Walk or CI(1)] vs Almost PCI(1)
(joint penalty method)
data: GDP
Hypothesis   Statistic  p-value
Random Walk     -12.76    0.010
AR(1)            -2.47    0.010
Combined                  0.010.
Given the p-value of 0.010 for the combined null hypothesis, which lies below the 5 percent
significance level, GDP and personal consumption in the United States are indeed partially
cointegrated within the considered time frame.
To estimate the hidden states, we use the statehistory.pci() function:
statehistory.pci(PCI_GDP_Consumption).
This code yields the following output:
                Y      Yhat         Z            M         R        eps_M       eps_R
1976 Q1   1824.5  1553.123  271.3768    0.0000000  271.3768   0.00000000   0.0000000
1976 Q2   1856.9  1580.631  276.2693    1.2902076  274.9791   1.29020760   3.6023508
1976 Q3   1890.5  1621.543  268.9573   -1.3209016  270.2782  -1.68368735  -4.7009741
1976 Q4   1938.4  1668.738  269.6618   -0.4360234  270.0978  -0.06460694  -0.1803871
1977 Q1   1992.5  1718.307  274.1925    0.9895420  273.2030   1.11214485   3.1051870
...
2015 Q1  17783.6  16893.90  889.7023   12.3240495  877.3782    14.279077   39.868192
2015 Q2  17998.3  17091.20  907.1027   10.3900797  896.7126     6.924754   19.334401
2015 Q3  18141.9  17254.15  887.7525   -0.2117553  887.9643    -3.133280   -8.748339
2015 Q4  18222.8  17368.51  854.2942   -8.9229218  863.2171    -8.863380  -24.747182
2016 Q1  18281.6  17451.17  830.4322  -10.4929841  840.9252    -7.984001  -22.291894
2016 Q2  18450.1  17723.03  727.0693  -32.1971225  759.2665   -29.246663  -81.658749.
Thereby, M denotes the mean-reverting component and R the random walk component, respectively.
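The columns of the state history are tied together by the model equations, which can be checked on the first rows of the printed output. The sketch below retypes those five rows by hand and uses the rounded estimate rho = 0.2812, so the identities hold only up to rounding. Reading Z as Y minus Yhat is inferred from the printed numbers, and the names sh and idx are illustrative.

```r
# First five rows of the statehistory.pci() output, retyped by hand
sh <- data.frame(
  Y     = c(1824.5, 1856.9, 1890.5, 1938.4, 1992.5),
  Yhat  = c(1553.123, 1580.631, 1621.543, 1668.738, 1718.307),
  Z     = c(271.3768, 276.2693, 268.9573, 269.6618, 274.1925),
  M     = c(0.0000000, 1.2902076, -1.3209016, -0.4360234, 0.9895420),
  R     = c(271.3768, 274.9791, 270.2782, 270.0978, 273.2030),
  eps_M = c(0.00000000, 1.29020760, -1.68368735, -0.06460694, 1.11214485),
  eps_R = c(0.0000000, 3.6023508, -4.7009741, -0.1803871, 3.1051870)
)
rho <- 0.2812            # rounded estimate from the fit.pci() output
idx <- 2:nrow(sh)        # rows that have a predecessor

# Largest absolute deviation from each identity
err_resid <- max(abs(sh$Z - (sh$Y - sh$Yhat)))                 # Z[t] = Y[t] - Yhat[t]
err_split <- max(abs(sh$Z - (sh$M + sh$R)))                    # Z[t] = M[t] + R[t]
err_ar1   <- max(abs(sh$M[idx] - (rho * sh$M[idx - 1] + sh$eps_M[idx])))  # AR(1) step
err_rw    <- max(abs(sh$R[idx] - (sh$R[idx - 1] + sh$eps_R[idx])))        # random walk step
print(c(resid = err_resid, split = err_split, ar1 = err_ar1, rw = err_rw))
```

All four deviations should be of the order of the rounding in the printed table.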
To illustrate a possible application of the statehistory.pci() function in a macroeconomic
context, we extract and plot the mean-reverting component. To reduce the noise and
smooth the mean-reverting component series, we use a moving average, i.e., observation i is
replaced by the mean of the observations i, i − 1, i − 2 and i − 3, where i ≥ 4. In particular,
the rollmean() function from the zoo package is applied:
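library(zoo)
# Column 4 of the state history holds the mean-reverting component M
MRC_GDP <- statehistory.pci(PCI_GDP_Consumption)[, 4]
# Four-period moving average, indexed by the last observation of each window
RollingMean <- as.zoo(coredata(rollmean(MRC_GDP, 4)), index(MRC_GDP)[-c(1:3)])
plot(RollingMean, type = "l").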
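The trailing four-period mean described above can also be written without zoo. The following minimal sketch uses stats::filter() from base R on the first five values of M from the state history output. It is an alternative illustration, not the code used for figure 4, and the variable names are illustrative.

```r
# First five values of the mean-reverting component M (from the output above)
m <- c(0.0000000, 1.2902076, -1.3209016, -0.4360234, 0.9895420)

# Trailing moving average: observation i becomes the mean of i, i-1, i-2, i-3.
# sides = 1 uses current and past values only, so the first three entries are NA.
k <- 4
ma <- as.numeric(stats::filter(m, rep(1 / k, k), sides = 1))
print(ma)
```

The first defined value sits at position 4, which matches the condition i ≥ 4 in the text.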
Figure 4: Mean-reverting component (running-mean (k = 4)): GDP and consumption (1976-2016, quarters);
circles = troughs, squares = peaks
A close inspection of figure 4 shows that the mean-reverting component identifies peaks
and troughs of major macroeconomic expansions and recessions. The circles denote troughs
during severe U.S. recessions, whereas the squares represent peaks of important U.S.
economic expansions. From left to right, the first circle corresponds to the early 1980s crisis,
mainly caused by the 1979 energy crisis and the contractionary policy of the U.S. central
bank (FED). The next circle identifies the early 2000s crisis, which can to some extent be
attributed to the bursting of the dot-com bubble and the September 11 attacks. The third circle
is associated with the global financial crisis. The first square is associated with the economic
expansion during the Reagan era. The second square covers the emergence of the dot-com
bubble. To evaluate the accuracy of event identification associated with the mean-reverting
component, we contrast the mean-reverting component with a Hodrick-Prescott filter (HP
filter), the standard tool in macroeconomics (Hodrick and Prescott (1997), Guay and
St.-Amant (2005), Harvey and Trimbur (2008), Choudhary et al. (2014)).16 The basic idea of
Hodrick and Prescott (1997) is to separate a given time series into a trend and a stationary
component.
16 To deal with the well-known drawbacks of the HP filter (see, among others, King and Rebelo (1993)
and Canova (1998)), we also apply the approximate band-pass filter of Baxter and King (1999), but the
general pattern does not change.
Figure 5: Hodrick-Prescott filter (λ = 1400): GDP (1976-2016, quarters)
The HP filter is already implemented in the R package mFilter (Balcilar, 2007),
and we can apply it with:
library(mFilter)
HPF_GDP <- mFilter::hpfilter(GDP, freq=1600, type=c("lambda"), drift=TRUE),
where freq sets the smoothing parameter λ (type = "lambda"). In the business cycle literature, it is
common to choose λ = 1600 when analyzing quarterly data (Hodrick and Prescott, 1997; Ravn
and Uhlig, 2002). Figure 5 (code: plot(HPF_GDP, type = "l")) shows the plot of the cyclical
GDP component. A comparison of figures 4 and 5 reveals that many of the peaks and troughs
identified by the mean-reverting component are similar to those identified by the HP filter.
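To make the trend-cycle separation concrete, the HP trend can be computed directly as the solution of a penalized least-squares problem: it minimizes the sum of squared deviations from the series plus λ times the sum of squared second differences of the trend. The sketch below is a plain base-R illustration of that textbook definition on simulated data. It is not the mFilter implementation, it ignores the drift option, and the function name hp_filter_sketch is hypothetical.

```r
hp_filter_sketch <- function(y, lambda = 1600) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)   # (n - 2) x n second-difference operator
  # First-order condition of the penalized problem: (I + lambda * D'D) trend = y
  trend <- as.numeric(solve(diag(n) + lambda * crossprod(D), y))
  list(trend = trend, cycle = y - trend)
}

# Simulated quarterly series: linear trend plus a cycle with a period of 20 quarters
set.seed(1)
n <- 160
y <- 100 + 0.5 * seq_len(n) + 3 * sin(2 * pi * seq_len(n) / 20) + rnorm(n, sd = 0.5)
hp <- hp_filter_sketch(y, lambda = 1600)
print(summary(hp$cycle))

# A purely linear series has zero second differences, so it is assigned to the trend
hp_lin <- hp_filter_sketch(2 + 0.3 * seq_len(n))
print(max(abs(hp_lin$cycle)))
```

The dense matrix solve is adequate for a few hundred quarterly observations; longer series would call for a banded or sparse solver.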
GDP consists of four major components, namely personal consumption expenditures,
investment17, government expenditures and net exports (Hodrick and Prescott, 1997). Given
these four possible factors, we utilize the hedge.pci() function to identify the optimal hedging
portfolio for GDP.18
17 In line with Hodrick and Prescott (1997), we consider total fixed investment.
The R code for the preliminary data download is given as,
GS = Quandl("FRED/GCE", start_date = "1976-01-01",
            end_date = "2016-04-01", type = "xts")
Investment = Quandl("FRED/FPI", start_date = "1976-01-01",
                    end_date = "2016-04-01", type = "xts")
Export = Quandl("FRED/EXPGS", start_date = "1976-01-01",
                end_date = "2016-04-01", type = "xts")
Import = Quandl("FRED/IMPGS", start_date = "1976-01-01",
                end_date = "2016-04-01", type = "xts")
NetExport <- Export - Import.
Next, we run the hedge.pci() function with the search algorithm "full".
FactorMatrix <- cbind(Consumption,Investment,GS,NetExport)
HedgeGDP <- hedge.pci(GDP, FactorMatrix,
                      maxfact = 4,
                      lambda = 0,
                      use.multicore = TRUE,
                      minimum.stepsize = 0,
                      verbose = TRUE,
                      exclude.cols = c(),
                      search_type = c("full"),
                      pci_opt_method = c("jp")).
The corresponding R output is given as,
   -LL   LR[rw]  p[rw]  p[mr]    rho R^2[MR] Factor | Factor coefficients
845.02 -12.7580 0.0100 0.0100 0.2812  0.4782    ..1 | 1.3963
829.04 -14.5563 0.0100 0.0100 0.2532  0.6465    ..2 | 1.2622 0.4907.
18 As a preliminary step, we download quarterly investment (US. Bureau of Economic Analysis, 2016c),
government expenditures (US. Bureau of Economic Analysis, 2016d), export (US. Bureau of Economic
Analysis, 2016e) and import data (US. Bureau of Economic Analysis, 2016f) for the time span of interest,
using Quandl. Net exports are derived as exports minus imports.
At the first stage, the best single-factor hedging portfolio contains personal consumption
expenditures. At the second stage, the best two-factor hedging portfolio consists of personal
consumption expenditures and investment, i.e., adding investment leads to a higher LRT score
improvement than adding government expenditures or net exports. Out of the four potential
components of GDP, the overall best hedging portfolio thus consists of personal consumption
expenditures and investment. Note that GDP, investment and personal consumption expenditures
are partially cointegrated, i.e., they share a partial equilibrium relationship. Thus,
for policy makers, investment is a second possible channel to stimulate the economy.
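The stagewise logic described above (pick the best single factor, then the best addition to it) can be sketched generically. In the following illustration the score is the OLS residual sum of squares on simulated data, a deliberately simple stand-in for the likelihood ratio score. The function greedy_select(), its min_gain stopping threshold and the factor names A to D are hypothetical and do not reproduce the internals of hedge.pci().

```r
# Greedy forward selection. The score is a stand-in (OLS residual sum of
# squares), NOT the PCI likelihood ratio score used in the paper.
greedy_select <- function(y, X, maxfact = ncol(X), min_gain = 0.10) {
  chosen <- integer(0)
  best_score <- sum((y - mean(y))^2)          # score of the empty portfolio
  while (length(chosen) < maxfact) {
    candidates <- setdiff(seq_len(ncol(X)), chosen)
    scores <- sapply(candidates, function(j) {
      fit <- lm.fit(cbind(1, X[, c(chosen, j), drop = FALSE]), y)
      sum(fit$residuals^2)
    })
    gain <- 1 - min(scores) / best_score      # relative improvement of best candidate
    if (gain < min_gain) break                # assumed stopping rule
    chosen <- c(chosen, candidates[which.min(scores)])
    best_score <- min(scores)
  }
  chosen
}

# Simulated example: only factors A and B drive the target series
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, c("A", "B", "C", "D")))
y <- 1.3 * X[, "A"] + 0.5 * X[, "B"] + rnorm(n, sd = 0.3)
selected <- colnames(X)[greedy_select(y, X)]
print(selected)
```

Each stage evaluates only the remaining candidates, so at most maxfact passes over the factor matrix are needed.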
5. Conclusion
In this article, we introduce the partial cointegration model and discuss how it differs from
other cointegration concepts. We contribute to the literature by extending the
partial cointegration model from the special case of two partially cointegrated time series
(see Clegg and Krauss (2016)) to the general case of k + 1 partially cointegrated time series.
Next, we outline the estimation procedure and the likelihood ratio test routine for partial
cointegration. Furthermore, we explain in detail how to use the most important functions
implemented in the partialCI package, our second contribution to the literature. The
functionality is illustrated with a financial application in the context of pairs trading and a
macroeconomic application, revisiting the relationship between GDP and consumption. For
both examples, we demonstrate that the variables are not cointegrated in the classic sense,
but can be modeled with partial cointegration.
Bibliography
Alexander, C., 2011. Practical financial econometrics, reprinted with corrections. Vol. 2 of
Market risk analysis. Wiley, Chichester.
Baillie, R. T., 1996. Long memory processes and fractional integration in econometrics.
Journal of Econometrics 73 (1), 5–59.
Balcilar, M., 2007. mFilter: Miscellaneous time series filters.
URL https://CRAN.R-project.org/package=mFilter
Balke, N. S., Fomby, T. B., 1997. Threshold cointegration. International Economic Review
38 (3), 627.
Baxter, M., King, R. G., 1999. Measuring business cycles: Approximate band-pass filters for
economic time series. Review of Economics and Statistics 81 (4), 575–593.
Brockwell, P. J., Davis, R. A., 2010. Introduction to time series and forecasting, 2nd Edition.
Springer texts in statistics. Springer, New York.
Canova, F., 1998. Detrending and business cycle facts: A user’s guide. Journal of Monetary
Economics 41 (3), 533–540.
Choudhary, M. A., Hanif, M. N., Iqbal, J., 2014. On smoothing macroeconomic time series
using the modified HP filter. Applied Economics 46 (19), 2205–2214.
Clegg, M., 2015a. Modeling time series with both permanent and transient components using
the partially autoregressive model. SSRN Electronic Journal.
URL http://dx.doi.org/10.2139/ssrn.2556957
Clegg, M., 2015b. partialAR: Partial autoregression.
URL https://CRAN.R-project.org/package=partialAR
Clegg, M., 2015c. egcm: Engle-Granger cointegration models.
URL https://CRAN.R-project.org/package=egcm
Clegg, M., 2016. partialCI: Partial cointegration.
URL https://github.com/matthewclegg/partialCI
Clegg, M., Krauss, C., 2016. Pairs trading with partial cointegration. FAU Discussion Papers
in Economics, University of Erlangen-Nürnberg.
Cochrane, J. H., 1994. Permanent and transitory components of GNP and stock prices. The
Quarterly Journal of Economics 109 (1), 241–265.
Daroczi, G., Leung, C., McTaggart, R., 2016. Quandl: API wrapper for quandl.com.
URL https://CRAN.R-project.org/package=Quandl
Durbin, J., Koopman, S. J., 2012. Time series analysis by state space methods, 2nd Edition.
Vol. 38 of Oxford statistical science series. Oxford University Press, Oxford.
Engle, R. F., Granger, C. W. J., 1987. Co-Integration and error correction: Representation,
estimation, and testing. Econometrica 55 (2), 251.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear
models via coordinate descent. Journal of Statistical Software 33 (1), 1–22.
URL https://CRAN.R-project.org/package=glmnet
Gonzalo, J., Lee, T.-H., Yang, W., 2008. Permanent and transitory components of GDP
and stock prices: Further analysis. Macroeconomics and Finance in Emerging Market
Economies 1 (1), 105–120.
Grothendieck, G., Zeileis, A., 2005. zoo: S3 infrastructure for regular and irregular time
series. Journal of Statistical Software 14 (6), 1–27.
URL https://CRAN.R-project.org/package=zoo
Guay, A., St.-Amant, P., 2005. Do the Hodrick-Prescott and Baxter-King filters provide
a good approximation of business cycles? Annales d'Économie et de Statistique (77),
133–135.
Guisan, M.-C., 2008. Causality and cointegration between consumption and GDP in 25
OECD countries: Limitations of the cointegration approach.
Harvey, A., Trimbur, T., 2008. Trend estimation and the Hodrick-Prescott filter. Journal of
the Japan Statistical Society 38 (1), 41–49.
Hodrick, R., Prescott, E., 1997. Postwar U.S. business cycles: An empirical investigation.
Journal of Money, Credit and Banking 29 (1), 1–16.
Hornik, K., Kleiber, C., Kraemer, W., Zeileis, A., 2003. Testing and dating of structural
changes in practice. Computational Statistics & Data Analysis 44, 109–123.
URL https://CRAN.R-project.org/package=strucchange
Huber, P. J., 1981. Robust statistics. John Wiley & Sons, Inc.
King, R. G., Rebelo, S. T., 1993. Low frequency filtering and real business cycles. Journal
of Economic Dynamics and Control 17 (1-2), 207–231.
Krauss, C., Herrmann, K., 2017. On the power and size properties of cointegration tests
in the light of high-frequency stylized facts. Journal of Risk and Financial Management
10 (1), 7.
Lütkepohl, H., 2007. New introduction to multiple time series analysis, 1st Edition. Springer,
Berlin.
Pfaff, B., 2008. Analysis of integrated and cointegrated time series with R, 2nd Edition.
Springer, New York.
Phillips, P. C. B., Perron, P., 1988. Testing for a unit root in time series regression. Biometrika
75 (2), 335–346.
Poterba, J. M., Summers, L. H., 1988. Mean reversion in stock prices. Journal of Financial
Economics 22 (1), 27–59.
Ravn, M. O., Uhlig, H., 2002. On adjusting the Hodrick-Prescott filter for the frequency of
observations. Review of Economics and Statistics 84 (2), 371–376.
Ripley, B. D., Venables, W. N., 2002. Modern applied statistics with S, 4th Edition. Springer,
New York.
Royal Dutch Shell plc - A, 2016. Historical data.
URL https://finance.yahoo.com/quote/RDS-A/history?p=RDS-A
Royal Dutch Shell plc - B, 2016. Historical data.
URL https://finance.yahoo.com/quote/RDS-B/history?p=RDS-B
Ryan, J. A., Ulrich, J. M., 2014. xts: eXtensible time series.
URL https://CRAN.R-project.org/package=xts
Schwartz, E., Smith, J. E., 2000. Short-term variations and long-term dynamics in commodity
prices. Management Science 46 (7), 893–911.
Stigler, M., 2010. tsDyn: Threshold cointegration: Overview and implementation in R.
URL https://CRAN.R-project.org/package=tsDyn
Summers, L. H., 1986. Does the stock market rationally reflect fundamental values? The
Journal of Finance 41 (3), 591.
Ulrich, J., 2016. TTR: Technical Trading Rules.
URL https://CRAN.R-project.org/package=TTR
US. Bureau of Economic Analysis, 2016a. Gross domestic product [GDP]. Federal Reserve
Bank of St. Louis.
URL https://fred.stlouisfed.org/series/GDP
US. Bureau of Economic Analysis, 2016b. Personal consumption expenditures [PCEC]. Federal
Reserve Bank of St. Louis.
URL https://fred.stlouisfed.org/series/PCEC
US. Bureau of Economic Analysis, 2016c. Fixed private investment [FPI]. Federal Reserve
Bank of St. Louis.
URL https://fred.stlouisfed.org/series/FPI
US. Bureau of Economic Analysis, 2016d. Government consumption expenditures and gross
investment [GCE]. Federal Reserve Bank of St. Louis.
URL https://fred.stlouisfed.org/series/GCE
US. Bureau of Economic Analysis, 2016e. Exports of goods and services [EXPGS]. Federal
Reserve Bank of St. Louis.
URL https://fred.stlouisfed.org/series/EXPGS
US. Bureau of Economic Analysis, 2016f. Imports of goods and services [IMPGS]. Federal
Reserve Bank of St. Louis.
URL https://fred.stlouisfed.org/series/IMPGS
Verbeek, M., 2010. A guide to modern econometrics, 3rd Edition. Wiley, Chichester.