mbreaks
packageThis vignette is intended for users who use mbreaks
to
estimate and test for linear regression models in the presence of
multiple structural changes. The package offers a set of comprehensive
tools which deal with both pure and partial structural change models. In
particular, it provides the Sup F tests for 0 versus a known number of
structural changes and the double maximum (UD max) tests for 0 versus an
unknown number of structural changes. The sequential tests for \(l\) versus \(l+1\) structural changes are also available
to determine the number of structural changes (Bai and Perron (1998), Bai and Perron (2003)). The package also
includes methods of estimating the number of structural changes via
information criteria (Yao (1988), Liu et al. (1997)) as well as a built-in
function to visualize the fit of the estimated structural break model
with the estimated number \(m^*\) of
structural changes. A comprehensive call to conduct all of the
procedures contained in package is provided.
mbreaks
packageThe package mbreaks
provides R users with minimum but
comprehensive functions to analyze multiple structural changes in linear
regression models in which the regressors and errors are non-trending.
The framework is based on the econometric model of the following
form:
\[ y_t = x_t^{\prime}\beta + z_t^{\prime} \delta_j + u_t; \\ t=T_{j-1}+1,...,T_j, {\quad}{\text{for}}{\quad}j=1,...,m+1\] where \(\underset{(p \times 1)}{x_t}\) is a vector of regressors with fixed coefficients (if any) and \(\underset{(q \times 1)}{z_t}\) is a vector of regressors with coefficients subject to change. The break dates are \(t=T_j\) for \(j=1,...,m\) and \(T_0=0\) and \(T_{m+1}=T\) so that \(T\) is the entire sample size. If \(p=0\), the model is called a pure structural change model, and if \(p>0\), the model is called a partial structural change model. The twofold goals of this package is to enable user to:
dotest()
(Bai
and Perron (1998))dotest()
(Bai
and Perron (1998))dosequa()
(Bai and Perron
(1998),Bai and Perron (2003))dorepart()
(Bai (1997))doorder()
(Yao (1988))doorder()
(Liu et al. (1997))If the number of structural changes is known (or specified), users
can use dofix()
function to estimate the model with the
corresponding number of changes. There are 3 classes in the package,
corresponding to 2 types of diagnostic tests and one for estimation
based on the selected number of structural changes. In summary:
sbtests
: S3 class returned from dotest()
function. It includes summary of the supF tests and UDmax test with test
statistics, critical values, and summary tables which could be viewed in
the console by print()
method or open in a separated tab by
View()
seqtests
: S3 class returned from
doseqtests()
function. It includes summary of the
sequential tests for \(l\) versus \(l+1\) structural changes with test
statistics, critical values and a summary table which could be viewed in
the console via print()
method or open in separated tab by
View()
model
: S3 class returned from estimation procedures
above including dosequa()
, doorder()
,
dorepart()
, and dofix()
of models with \(m^*\) structural changes. The class
model
contains numerous information which is summarized
comprehensively into 3 main tables: i) break dates (estimated break
dates and corresponding asymptotic confidence intervals based on
assumptions on the regressors and errors), ii) regime-specific
coefficients (estimates in each regime and corrected standard errors
based on assumptions on the regressors and errors) and iii) full-sample
coefficients if \(p>0\) (estimates
and corrected standard errors based on assumptions on the regressors and
errors). Besides the information presented in the 3 main tables,
model
contains fields (with majority of them are class
matrix
) which users can access by using operator
$
for further analysis in R such as fitted values,
residuals and the name of procedure used.mbreaks
The previous section introduced the framework on which
mbreaks
package is built and summarizes the classes of
procedures available to users. In this section, we will illustrate the
syntax of high-level functions and cover the arguments that users might
want to customize to match their model with data and empirical
strategy.
The mbreaks
package designs high-level functions to have
identical arguments with default values recommended by the literature to
save users the burden. Users can use mdl()
, a comprehensive
function that invokes all high-level functions explained in previous
section 1:
#the data set for the example is real.Rda
data(real)
#carry out all testing and estimating procedures via a single call
= mdl('rate',data=real,eps1=0.15)
rate #display the results from the procedures
rate-----------------------------------------------------
The number of breaks is estimated by KT 2 estimated breaks.
Pure change model with = 455.950
Minimum SSR
:
Estimated date
Break1 Break247 79
Date 95% CI (37,50) (77,82)
90% CI (40,49) (78,81)
-specific coefficients:
Estimated regime1 Regime 2 Regime 3
Regime 1.355 (0.155) -1.796 (0.511) 5.643 (0.603)
(SE)
No full sample regressors-----------------------------------------------------
) SupF tests against a fixed number of breaks
a
1 break 2 breaks 3 breaks 4 breaks 5 breaks
57.906 43.014 33.323 24.771 18.326
Sup F 10% CV 7.040 6.280 5.210 4.410 3.470
5% CV 8.580 7.220 5.960 4.990 3.910
2.5% CV 10.180 8.140 6.720 5.510 4.340
1% CV 12.290 9.360 7.600 6.190 4.910
) UDmax tests against an unknown number of breaks
b
10% CV 5% CV 2.5% CV 1% CV
UDMax 1 57.906 7.460 8.880 10.390 12.370
-----------------------------------------------------
supF(l+1|l) tests using global optimizers under the null
supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
57.906 33.927 14.725 0.033 0.000
Seq supF 10% CV 7.040 8.510 9.410 10.040 10.580
5% CV 8.580 10.130 11.140 11.830 12.250
2.5% CV 10.180 11.860 12.660 13.400 13.890
1% CV 12.290 13.890 14.800 15.280 15.760
-----------------------------------------------------
To access additional information about specific procedures+ '$' + procedure name (not included above), type stored variable name
Users should find the syntax minimal similar to lm()
in
stats
package. It is required to specify the name of
dependent variable \(y\) followed by
the two types of the regressors \(z\)
and \(x\) from the data frame. Note
that \(z\) automatically includes a
constant term. If the model is a pure structural change model, no \(x\) is specified. If none of \(z\) and \(x\) are specified, the program assumes that
this is a mean shift model (because a constant term is included in \(z\) by default). The names of regressors
must match the names used in the data frame, otherwise errors will be
displayed and execution halted. As we will explain in the following
section, the package prepares various options to be specified by users.
These are set at default value if not specified in mdl()
or
any high-level functions of the procedures: dotest()
,
dosequa()
, doorder()
, dorepart()
,
and dofix()
etc.
eps1
is invalid, it will be set to default value
eps1=0.15
. Also, if eps1=0
, users can directly
(and are required to explicitly specify) h
in parameter for
estimation. This option is only available to model selection via
information criteria like doorder()
, model estimation via
dofix()
. All procedures based on hypothesis testing such as
dorepart()
, dosequa()
, dotest()
and doseqtests()
will not work with 0 trimming level
eps1
.m
specifies the maximum number of breaks
considered in the model. This argument is automatically matched with
eps1
argument. If the program finds that m
is
invalid (non-positive or larger than that allowed by the sample size
given the trimming value eps1
), it will be set
automatically to maximal breaks allowed by the sample size and the
specified trimming level eps1
. The default value is
m=5
.robust
: Allow for heteroskedasticity and
autocorrelation in \(u_t\). The default
value is robust=1
. If set to robust=0
, the
errors are assumed to be a martingale difference sequence.hetvar
: Allow the variance of the errors \(u_t\) to be different across segments. This
option is not allowed to be 0
when
robust=1
.prewhit
: Set to 1
if users want to
prewhiten the residuals with an AR(1) process prior to estimating the
long-run covariance matrix. The default value is
prewhit=1
.hetdat
: Set to 1
to allow the second
moment matrices of \(z_t\) and \(x_t\) (if any) to be different across
segments. Set to 0
otherwise. It is recommended to set
hetdat=1
for \(p>0\).
The default value is hetdat=1
hetq
: Set to 1
to allow the second moment
matrices of \(z_t\) and \(x_t\) (if any) to be different across
segments. This is used in construction of the confidence intervals for
the break dates. If hetq=0
, the second moment matrices of
the regressors are assumed to be identical across segments. The default
value is hetq=1
.hetomega
: Set to 1
to allow the long-run
covariance matrix of \(z_t u_t\) to be
different across segments. Set to 0
otherwise. This is used
in construction of the confidence interval for the break dates. The
default value is hetomega=1
.maxi
: Maximum number of iterations if no convergence
attained when running the iterative procedure to estimate partial
structural change model. The default is maxi=20
eps
: Criterion for convergence of the iterative
procedure. The default value is eps=0.0001
fixb
: Set to 1 if users intend to provide initial
values for \(\beta\), where the initial
values are supplied as matrix betaini
of size \((p \times 1)\). If betaini
is
invalid, the program will throw an error and stop.There are two options available to all high-level functions:
const
: Set to 1
to include constant in
\(z\). The default value is
const=1
. Users can turn off the constant in \(z\) by setting const=0
.printd
: Set to 1
to print intermediate
outputs from estimation procedures of the program to console. The
default value is printd=0
Additional specific options:
doorder()
: option ic
. Users can specify
which information criterion used for selecting number of breaks.
Available information criteria are modified BIC 'KT'
following Kurozumi and Tuvaandorj (2011),
'BIC'
following Yao (1988)
and modified SIC 'LWZ'
following Liu
et al. (1997)dosequa()
and dorepart()
: option
signif
. Option to specify significant level used in the
sequential tests or the repartition method to determine the number of
structural changes. The default value is signif=2
corresponding to the 5% significance level. Other values are
signif=1
for the 10%, signif=3
for the 2.5%,
and signif=4
for the 1% significance levels,
respectively.5The vignette replicates 2 empirical exercises: i) US real interest rate and ii) New Keynesian Phillips curve.
Garcia and Perron (1996),Bai and Perron (2003) considered a mean shift model:
\[y_t = \mu_j + u_t, \quad\text{for } t = T_{j-1}+1,...,T_{j}\quad\text{and } j=1,...,m.\]
for the US real interest rate series from 1961Q1 to 1986Q3. We allow
heteroskedasticity and serial correlation in the errors \(u_t\) by using the heteroskedasticity and
autocorrelation consistent (HAC) long-run covariance estimate using the
default setting (robust=1
) with the prewhitened residuals
also by the default setting (prewhit=1
). Here, instead of
invoking mdl()
, we demonstrate the specific syntax to
obtain the model with the number of structural changes \(m^*\) selected by modified BIC information
'KT'
#estimating the mean-shift model with BIC (the default option is ic=`'KT'`, which use modified BIC as criterion)
= doorder('rate',data=real)
rate_mBIC #NOTE: equivalent to rate$KT; type rate$KT to compare with new result
# visualization of estimated model with modified BIC (in the argument, we can replace rate$KT with rate_mBIC for exact same graph; recall that `$` is the operator to refer to field BIC in return list from mdl())
plot_model(rate$KT, title = 'US Exchange rate')
The plot_model()
function takes any estimated structural
break model of class model
and makes a graph with the
following contents:
CI
argument (which is either
0.90 or 0.95) for respective dates.CI
)
for fitted values \(\hat{y}_{m^*}\)6To show flexibility of class model
in the package, we
can reproduce a similar graph using information returned from stored
variable rate_BIC
.
#collect model information
= rate_mBIC$nbreak #number of breaks
m = rate_mBIC$y #vector of dependent var
y = rate_mBIC$z #matrix of regressors with changing coefs
zreg = rate_mBIC$date #estimated date
date = rate_mBIC$fitted.values #fitted values of model
fity = length(y)
bigT #compute the null model
= solve((t(zreg) %*% zreg)) %*% t(zreg) %*% y
fixb = zreg%*%fixb #fitted values of null model
fity_fix
#plots the model
= seq(1,bigT,1)
tx = max(y)-min(y);
range_y plot(tx,y,type='l',col="black", xlab='time',ylab="y",
ylim=c(min(y)-range_y/10,max(y)+range_y/10),lty=1)
#plot fitted values series for break model
lines(tx, fity,type='l', col="blue",lty=2)
#plot fitted values series for null model
lines(tx, fity_fix,type='l', col="dark red",lty=2)
#plot estimated dates + CIs
for (i in 1:m){
abline(v=date[i,1],lty=2)
if (rate_mBIC$CI[i,1] < 0){rate_mBIC$CI[i,1] = 0}
if(rate_mBIC$CI[i,2]>bigT){ rate_mBIC$CI[i,2]=bigT}
segments(rate_mBIC$CI[i,1],min(y)*(12+i/m)/10,rate_mBIC$CI[i,2],min(y)*(12+i/m)/10,lty=1,col='red')
}
legend(0,max(y)+range_y/10,legend=c("observed y",paste(m,'break y'),"0 break y"),
lty=c(1,2,2), col=c("black","blue","red"), ncol=1)
Perron and Yamamoto (2015) investigates the stability of New Keynesian Phillips curve model proposed by Galı and Gertler (1999) via linear model:
\[\pi_t = \mu + \gamma \pi_{t-1} + \kappa x_t + \beta E_t \pi_{t+1} + u_t\]
where \(\pi_t\) is inflation rate at time t, \(E_t\) is an expectation operator conditional on information available up to \(t\), and \(x_t\) is a real determinant of inflation. In this example, we will reproduce specific results of the paper with ready-to-use dataset:
data(nkpc)
#x_t is GDP gap
= c('inflag','ygap','inffut')
z_name #we can invoke each test separately by using dotest() and doseqtests()
= dotest('inf',z_name,data=nkpc,prewhit = 0, eps1 = 0.1,m=1)
supF_ygap #z regressors' names are passed in the argument as an array, which equivalent to above argument call with z_name
= doseqtests('inf',c('inflag','ygap','inffut'),data=nkpc,prewhit = 0, eps1=0.1)
seqF_ygap #see test results
supF_ygap
) SupF tests against a fixed number of breaks
a
1 break
22.220
Sup F 10% CV 14.810
5% CV 16.760
2.5% CV 18.620
1% CV 20.750
) UDmax tests against an unknown number of breaks
b
10% CV 5% CV 2.5% CV 1% CV
UDMax 1 22.220 15.230 17.000 18.750 20.750
seqF_ygap
supF(l+1|l) tests using global optimizers under the null
supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
22.220 12.559 22.294 13.565 18.463
Seq supF 10% CV 14.810 16.700 17.840 18.510 19.130
5% CV 16.760 18.560 19.530 20.240 20.720
2.5% CV 18.620 20.300 21.180 21.860 22.400
1% CV 20.750 22.400 23.550 24.130 24.540
#x_t is labor income share
#or invoke all tests using mdl()
= mdl('inf',c('inflag','lbs','inffut'),data=nkpc,prewhit = 0, eps1=0.1, m=5)
nkpc_lbs
There are no breaks selected by BIC and estimation is skipped
There are no breaks selected by LWZ and estimation is skipped
There are no breaks selected by KT and estimation is skipped$sbtests
nkpc_lbs
) SupF tests against a fixed number of breaks
a
1 break 2 breaks 3 breaks 4 breaks 5 breaks
30.592 69.630 37.894 32.765 30.607
Sup F 10% CV 14.810 13.560 12.360 11.430 10.610
5% CV 16.760 14.720 13.300 12.250 11.290
2.5% CV 18.620 15.880 14.220 12.960 11.940
1% CV 20.750 17.240 15.300 13.930 12.780
) UDmax tests against an unknown number of breaks
b
10% CV 5% CV 2.5% CV 1% CV
UDMax 1 69.630 15.230 17.000 18.750 20.750
$seqtests
nkpc_lbs
supF(l+1|l) tests using global optimizers under the null
supF(1|0) supF(2|1) supF(3|2) supF(4|3) supF(5|4)
30.592 11.408 11.595 12.564 26.418
Seq supF 10% CV 14.810 16.700 17.840 18.510 19.130
5% CV 16.760 18.560 19.530 20.240 20.720
2.5% CV 18.620 20.300 21.180 21.860 22.400
1% CV 20.750 22.400 23.550 24.130 24.540
To replicate the results, we turn off prewhit
option.
The values of SupF 30.6 and F(2|1) 30.6 test statistics are equivalent to 30.6 and 11.4 and the values 22.2 and 22.2 are equivalent to 22.2 and 12.6 in table VI of Perron and Yamamoto (2015) . Given the Sup F(2|1) statistics in both regressions is smaller than the 10% critical values 14.81 and both Sup F test statistic of 0 versus 1 break is larger than the 1% critical values 20.75, we conclude there is only 1 break detected.
The estimated break dates 1:Q1991 also match 1991:Q1 in the reported table. The package exactly replicates results presented in Perron and Yamamoto (2015). Given the estimated date from the sequential approach, we could split the sample into two subsamples and conduct the two stage least squares (2SLS) in each subsample as suggested by Perron and Yamamoto (2015):
Using the matrix formula for 2SLS estimator below in IV regression, we are able to obtain close estimates for first subsample as reported in table VII (Perron and Yamamoto (2015)). \[ \hat{\beta}_{IV} = (X'P_ZX)^{-1}X'P_Zy \\ P_Z = Z(Z'Z)^{-1}Z' \] where \(X = [X_1 \;\; X_2]\) is matrix of both endogenous regressor \(X_1 = \pi_{t+1}\) and exogenous regressors \(X_2 = [1,\pi_{t-1},x_t]\) and \(Z = [Z_1 \;\; Z_2]\) is matrix of instruments including excluded instruments \(Z_1\) and included instruments \(Z_2 = [1,\pi_{t-1}]\). In total, the instruments used in first stage regression are lags of inflation, labor income share, output gap, interest spread, wage inflation and commodity price (6 instruments).
The IV estimates \(\hat{\theta}_{IV}=(\hat{\mu},\hat{\gamma},\hat{\kappa},\hat{\beta})\) for 1960:Q1-1991:Q1 subsample and 1991:Q2-1 are
\(\mu\) | \(\gamma(\pi_{t-1})\) | \(\kappa(x_t)\) | \(\beta(E_t\pi_{t+1})\) | |
---|---|---|---|---|
1960:Q1-1991:Q1 | 0.001 | 0.338 | -0.002 | 0.632 |
\(SE_1\) | 0.002 | 0.115 | 0.009 | 0.130 |
1991:Q2-1997:Q4 | 0.010 | 0.589 | -0.024 | -0.866 |
\(SE_2\) | 0.007 | 0.225 | 0.045 | 0.309 |
The table above replicates Perron and Yamamoto (2015) table VII exactly
Users can call independent functions to carry out
specific procedures as outlined above instead of conducting all 6 main
procedures provided by package via mdl()
↩︎
The high level functions are mdl()
,
dofix()
, dosequa()
, dorepart()
,
doorder()
, dotest()
,
doseqtests()
↩︎
This argument is different from eps
where
eps
sets the convergence criterion for iterative scheme
when estimating partial change model.↩︎
The results of pure structural change models are not affected by these options↩︎
The above options are used extensively in all high-level
functions of the program to specify required assumptions on structural
break model. Additional options relating to formatting output in
plot_model()
function will not be explained in this
section. For more information type ?plot_model
or
?mdl()
to understand the distinctions between two types of
options↩︎
The option CI
for
plot_model()
is used to specify the confidence interval
around estimates of break dates and fitted values. For fitted values, it
is computed as: \[
(\hat{y}_t^{m^*-},\hat{y}_t^{m^*+}) = (x_t'(\hat{\beta}-Z_{CI}
\hat{s.e.}(\hat{\beta}))+z_t'(\hat{\delta}-Z_{CI}\hat{s.e.}(\hat{\delta})),x_t'(\hat{\beta}+Z_{CI}
\hat{s.e.}(\hat{\beta}))+z_t'(\hat{\delta}+Z_{CI}\hat{s.e.}(\hat{\delta})))\]
where \[ Z_{CI} = \begin{cases}
1.96 & CI=0.95 \\
1.65 & CI=0.90
\end{cases}
\] For break dates, the confidence interval is obtained via
limiting distribution of \(\hat{T}_j\)
(Bai and Perron (1998)) ↩︎