statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ

class statsmodels.tsa.statespace.dynamic_factor_mq.DynamicFactorMQ(endog, k_endog_monthly=None, factors=1, factor_orders=1, factor_multiplicities=None, idiosyncratic_ar1=True, standardize=True, endog_quarterly=None, init_t0=False, obs_cov_diag=False, **kwargs)[source]

Dynamic factor model with EM algorithm; option for monthly/quarterly data.

Implementation of the dynamic factor model of Bańbura and Modugno (2014) ([1]) and Bańbura, Giannone, and Reichlin (2011) ([2]). Uses the EM algorithm for parameter fitting, and so can accommodate a large number of left-hand-side variables. Specifications can include any collection of blocks of factors, including different factor autoregression orders, and can include AR(1) processes for idiosyncratic disturbances. Can incorporate monthly/quarterly mixed frequency data along the lines of Mariano and Murasawa (2011) ([4]). A special case of this model is the Nowcasting model of Bok et al. (2017) ([3]). Moreover, this model can be used to compute the news associated with updated data releases.

Parameters:
endogarray_like

Observed time-series process \(y\). See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.

k_endog_monthlyint, optional

If specifying a monthly/quarterly mixed frequency model in which the provided endog dataset contains both the monthly and quarterly data, this variable should be used to indicate how many of the variables are monthly. Note that when using the k_endog_monthly argument, the columns with monthly variables in endog should be ordered first, and the columns with quarterly variables should come afterwards. See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.

factorsint, list, or dict, optional

Integer giving the number of (global) factors, a list with the names of (global) factors, or a dictionary with:

  • keys : names of endogenous variables

  • values : lists of factor names.

If this is an integer, then the factor names will be 0, 1, …. The default is a single factor that loads on all variables. Note that there cannot be more factors specified than there are monthly variables.

factor_ordersint or dict, optional

Integer describing the order of the vector autoregression (VAR) governing all factor block dynamics or dictionary with:

  • keys : factor name or tuples of factor names in a block

  • values : integer describing the VAR order for that factor block

If a dictionary, this defines the order of the factor blocks in the state vector. Otherwise, factors are ordered so that factors that load on more variables come first (and then alphabetically, to break ties).

factor_multiplicitiesint or dict, optional

This argument provides a convenient way to specify multiple factors that load identically on variables. For example, one may want two “global” factors (factors that load on all variables) that evolve jointly according to a VAR. One could specify two global factors in the factors argument and specify that they are in the same block in the factor_orders argument, but it is easier to specify a single global factor in the factors argument, and set the order in the factor_orders argument, and then set the factor multiplicity to 2.

This argument must be an integer describing the factor multiplicity for all factors or dictionary with:

  • keys : factor name

  • values : integer describing the factor multiplicity for the factors in the given block

idiosyncratic_ar1bool

Whether or not to model the idiosyncratic component for each series as an AR(1) process. If False, the idiosyncratic component is instead modeled as white noise.

standardizebool or tuple, optional

If a boolean, whether or not to standardize each endogenous variable to have mean zero and standard deviation 1 before fitting the model. See “Notes” for details about how this option works with postestimation output. If a tuple (usually only used internally), then the tuple must have length 2, with each element containing a Pandas series with index equal to the names of the endogenous variables. The first element should contain the mean values and the second element should contain the standard deviations. Default is True.

endog_quarterlypandas.Series or pandas.DataFrame

Observed quarterly variables. If provided, must be a Pandas Series or DataFrame with a DatetimeIndex or PeriodIndex at the quarterly frequency. See the “Notes” section for details on how to set up a model with monthly/quarterly mixed frequency data.

init_t0bool, optional

If True, this option initializes the Kalman filter with the distribution for \(\alpha_0\) rather than \(\alpha_1\). See the “Notes” section for more details. This option is rarely used except for testing. Default is False.

obs_cov_diagbool, optional

If True and if idiosyncratic_ar1 is True, then this option puts small positive values in the observation disturbance covariance matrix. This is not required for estimation and is rarely used except for testing. (It is sometimes used to prevent numerical errors, for example those associated with a positive semi-definite forecast error covariance matrix at the first time step when using EM initialization, but state space models in Statsmodels switch to the univariate approach in those cases, and so do not need to use this trick). Default is False.

Attributes:
endog_names

Names of endogenous variables.

exog_names

The names of the exogenous variables.

initial_variance
initialization
loglike_constant

Constant term in the joint log-likelihood function.

loglikelihood_burn
param_names

(list of str) List of human readable parameter names.

start_params

(array) Starting parameters for maximum likelihood estimation.

state_names

(list of str) List of human readable names for unobserved states.

tolerance

Notes

The basic model is:

\[\begin{split}y_t & = \Lambda f_t + \epsilon_t \\ f_t & = A_1 f_{t-1} + \dots + A_p f_{t-p} + u_t\end{split}\]

where:

  • \(y_t\) is observed data at time t

  • \(\epsilon_t\) is idiosyncratic disturbance at time t (see below for details, including modeling serial correlation in this term)

  • \(f_t\) is the unobserved factor at time t

  • \(u_t \sim N(0, Q)\) is the factor disturbance at time t

and:

  • \(\Lambda\) is referred to as the matrix of factor loadings

  • \(A_i\) are matrices of autoregression coefficients

Furthermore, we allow the idiosyncratic disturbances to be serially correlated, so that, if idiosyncratic_ar1=True, \(\epsilon_{i,t} = \rho_i \epsilon_{i,t-1} + e_{i,t}\), where \(e_{i,t} \sim N(0, \sigma_i^2)\). If idiosyncratic_ar1=False, then we instead have \(\epsilon_{i,t} = e_{i,t}\).

This basic setup can be found in [1], [2], [3], and [4].

We allow for two generalizations of this model:

  1. Following [2], we allow multiple “blocks” of factors, which are independent from the other blocks of factors. Different blocks can be set to load on different subsets of the observed variables, and can be specified with different lag orders.

  2. Following [4] and [2], we allow mixed frequency models in which both monthly and quarterly data are used. See the section on “Mixed frequency models”, below, for more details.

Additional notes:

  • The observed data may contain arbitrary patterns of missing entries.

EM algorithm

This model contains a potentially very large number of parameters, and it can be difficult and take a prohibitively long time to numerically optimize the likelihood function using quasi-Newton methods. Instead, the default fitting method in this model uses the EM algorithm, as detailed in [1]. As a result, the model can accommodate datasets with hundreds of observed variables.

Mixed frequency data

This model can handle mixed frequency data in two ways. In this section, we only briefly describe this, and refer readers to [2] and [4] for all details.

First, because there can be arbitrary patterns of missing data in the observed vector, one can simply include lower frequency variables as observed in a particular higher frequency period, and missing otherwise. For example, in a monthly model, one could include quarterly data as occurring on the third month of each quarter. To use this method, one simply needs to combine the data into a single dataset at the higher frequency that can be passed to this model as the endog argument. However, depending on the type of variables used in the analysis and the assumptions about the data generating process, this approach may not be valid.

For example, suppose that we are interested in the growth rate of real GDP, which is measured at a quarterly frequency. If the basic factor model is specified at a monthly frequency, then the quarterly growth rate in the third month of each quarter – which is what we actually observe – is approximated by a particular weighted average of unobserved monthly growth rates. We need to take this particular weight moving average into account in constructing our model, and this is what the second approach does.

The second approach follows [2] and [4] in constructing a state space form to explicitly model the quarterly growth rates in terms of the unobserved monthly growth rates. To use this approach, there are two methods:

  1. Combine the monthly and quarterly data into a single dataset at the monthly frequency, with the monthly data in the first columns and the quarterly data in the last columns. Pass this dataset to the model as the endog argument and give the number of the variables that are monthly as the k_endog_monthly argument.

  2. Construct a monthly dataset as a Pandas DataFrame with a DatetimeIndex or PeriodIndex at the monthly frequency and separately construct a quarterly dataset as a Pandas DataFrame with a DatetimeIndex or PeriodIndex at the quarterly frequency. Pass the monthly DataFrame to the model as the endog argument and pass the quarterly DataFrame to the model as the endog_quarterly argument.

Note that this only incorporates one particular type of mixed frequency data. See also Banbura et al. (2013). “Now-Casting and the Real-Time Data Flow.” for discussion about other types of mixed frequency data that are not supported by this framework.

Nowcasting and the news

Through its support for monthly/quarterly mixed frequency data, this model can allow for the nowcasting of quarterly variables based on monthly observations. In particular, [2] and [3] use this model to construct nowcasts of real GDP and analyze the impacts of “the news”, derived from incoming data on a real-time basis. This latter functionality can be accessed through the news method of the results object.

Standardizing data

As is often the case in formulating a dynamic factor model, we do not explicitly account for the mean of each observed variable. Instead, the default behavior is to standardize each variable prior to estimation. Thus if \(y_t\) are the given observed data, the dynamic factor model is actually estimated on the standardized data defined by:

\[x_{i, t} = (y_{i, t} - \bar y_i) / s_i\]

where \(\bar y_i\) is the sample mean and \(s_i\) is the sample standard deviation.

By default, if standardization is applied prior to estimation, results such as in-sample predictions, out-of-sample forecasts, and the computation of the “news” are reported in the scale of the original data (i.e. the model output has the reverse transformation applied before it is returned to the user).

Standardization can be disabled by passing standardization=False to the model constructor.

Identification of factors and loadings

The estimated factors and the factor loadings in this model are only identified up to an invertible transformation. As described in (the working paper version of) [2], while it is possible to impose normalizations to achieve identification, the EM algorithm does will converge regardless. Moreover, for nowcasting and forecasting purposes, identification is not required. This model does not impose any normalization to identify the factors and the factor loadings.

Miscellaneous

There are two arguments available in the model constructor that are rarely used but which deserve a brief mention: init_t0 and obs_cov_diag. These arguments are provided to allow exactly matching the output of other packages that have slight differences in how the underlying state space model is set up / applied.

  • init_t0: state space models in Statsmodels follow Durbin and Koopman in initializing the model with \(\alpha_1 \sim N(a_1, P_1)\). Other implementations sometimes initialize instead with \(\alpha_0 \sim N(a_0, P_0)\). We can accommodate this by prepending a row of NaNs to the observed dataset.

  • obs_cov_diag: the state space form in [1] incorporates non-zero (but very small) diagonal elements for the observation disturbance covariance matrix.

References

[1] (1,2,3,4)

Bańbura, Marta, and Michele Modugno. “Maximum likelihood estimation of factor models on datasets with arbitrary pattern of missing data.” Journal of Applied Econometrics 29, no. 1 (2014): 133-160.

[2] (1,2,3,4,5,6,7,8)

Bańbura, Marta, Domenico Giannone, and Lucrezia Reichlin. “Nowcasting.” The Oxford Handbook of Economic Forecasting. July 8, 2011.

[3] (1,2,3)

Bok, Brandyn, Daniele Caratelli, Domenico Giannone, Argia M. Sbordone, and Andrea Tambalotti. 2018. “Macroeconomic Nowcasting and Forecasting with Big Data.” Annual Review of Economics 10 (1): 615-43. https://doi.org/10.1146/annurev-economics-080217-053214.

[4] (1,2,3,4,5)

Mariano, Roberto S., and Yasutomo Murasawa. “A coincident index, common factors, and monthly real GDP.” Oxford Bulletin of Economics and Statistics 72, no. 1 (2010): 27-46.

Examples

Constructing and fitting a DynamicFactorMQ model.

>>> data = sm.datasets.macrodata.load_pandas().data.iloc[-100:]
>>> data.index = pd.period_range(start='1984Q4', end='2009Q3', freq='Q')
>>> endog = data[['infl', 'tbilrate']].resample('M').last()
>>> endog_Q = np.log(data[['realgdp', 'realcons']]).diff().iloc[1:] * 400

Basic usage

In the simplest case, passing only the endog argument results in a model with a single factor that follows an AR(1) process. Note that because we are not also providing an endog_quarterly dataset, endog can be a numpy array or Pandas DataFrame with any index (it does not have to be monthly).

The summary method can be useful in checking the model specification.

>>> mod = sm.tsa.DynamicFactorMQ(endog)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 1 factors in 1 blocks   # of factors:                    1
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
========================
Dep. variable          0
------------------------
         infl          X
     tbilrate          X
    Factor blocks:
=====================
     block      order
---------------------
         0          1
=====================

Factors

With factors=2, there will be two independent factors that will each evolve according to separate AR(1) processes.

>>> mod = sm.tsa.DynamicFactorMQ(endog, factors=2)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 2 factors in 2 blocks   # of factors:                    2
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
===================================
Dep. variable          0          1
-----------------------------------
         infl          X          X
     tbilrate          X          X
    Factor blocks:
=====================
     block      order
---------------------
         0          1
         1          1
=====================

Factor multiplicities

By instead specifying factor_multiplicities=2, we would still have two factors, but they would be dependent and would evolve jointly according to a VAR(1) process.

>>> mod = sm.tsa.DynamicFactorMQ(endog, factor_multiplicities=2)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 2 factors in 1 blocks   # of factors:                    2
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
===================================
Dep. variable        0.1        0.2
-----------------------------------
         infl         X          X
     tbilrate         X          X
    Factor blocks:
=====================
     block      order
---------------------
  0.1, 0.2          1
=====================

Factor orders

In either of the above cases, we could extend the order of the (vector) autoregressions by using the factor_orders argument. For example, the below model would contain two independent factors that each evolve according to a separate AR(2) process:

>>> mod = sm.tsa.DynamicFactorMQ(endog, factors=2, factor_orders=2)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 2 factors in 2 blocks   # of factors:                    2
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
===================================
Dep. variable          0          1
-----------------------------------
         infl          X          X
     tbilrate          X          X
    Factor blocks:
=====================
     block      order
---------------------
         0          2
         1          2
=====================

Serial correlation in the idiosyncratic disturbances

By default, the model allows each idiosyncratic disturbance terms to evolve according to an AR(1) process. If preferred, they can instead be specified to be serially independent by passing ididosyncratic_ar1=False.

>>> mod = sm.tsa.DynamicFactorMQ(endog, idiosyncratic_ar1=False)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 1 factors in 1 blocks   # of factors:                    1
                + iid idiosyncratic   Idiosyncratic disturbances:    iid
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
========================
Dep. variable          0
------------------------
         infl          X
     tbilrate          X
    Factor blocks:
=====================
     block      order
---------------------
         0          1
=====================

Monthly / Quarterly mixed frequency

To specify a monthly / quarterly mixed frequency model see the (Notes section for more details about these models):

>>> mod = sm.tsa.DynamicFactorMQ(endog, endog_quarterly=endog_Q)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 1 factors in 1 blocks   # of quarterly variables:        2
            + Mixed frequency (M/Q)   # of factors:                    1
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
========================
Dep. variable          0
------------------------
         infl          X
     tbilrate          X
      realgdp          X
     realcons          X
    Factor blocks:
=====================
     block      order
---------------------
         0          1
=====================

Customize observed variable / factor loadings

To specify that certain that certain observed variables only load on certain factors, it is possible to pass a dictionary to the factors argument.

>>> factors = {'infl': ['global']
...            'tbilrate': ['global']
...            'realgdp': ['global', 'real']
...            'realcons': ['global', 'real']}
>>> mod = sm.tsa.DynamicFactorMQ(endog, endog_quarterly=endog_Q)
>>> print(mod.summary())
                    Model Specification: Dynamic Factor Model
==========================================================================
Model:         Dynamic Factor Model   # of monthly variables:          2
            + 2 factors in 2 blocks   # of quarterly variables:        2
            + Mixed frequency (M/Q)   # of factor blocks:              2
              + AR(1) idiosyncratic   Idiosyncratic disturbances:  AR(1)
Sample:                     1984-10   Standardize variables:        True
                          - 2009-09
Observed variables / factor loadings
===================================
Dep. variable     global       real
-----------------------------------
         infl       X
     tbilrate       X
      realgdp       X           X
     realcons       X           X
    Factor blocks:
=====================
     block      order
---------------------
    global          1
      real          1
=====================

Fitting parameters

To fit the model, use the fit method. This method uses the EM algorithm by default.

>>> mod = sm.tsa.DynamicFactorMQ(endog)
>>> res = mod.fit()
>>> print(res.summary())
                          Dynamic Factor Results
==========================================================================
Dep. Variable:      ['infl', 'tbilrate']   No. Observations:         300
Model:              Dynamic Factor Model   Log Likelihood       -127.909
                 + 1 factors in 1 blocks   AIC                   271.817
                   + AR(1) idiosyncratic   BIC                   301.447
Date:                   Tue, 04 Aug 2020   HQIC                  283.675
Time:                           15:59:11   EM Iterations              83
Sample:                       10-31-1984
                            - 09-30-2009
Covariance Type:            Not computed
                    Observation equation:
==============================================================
Factor loadings:          0    idiosyncratic: AR(1)       var.
--------------------------------------------------------------
            infl      -0.67                    0.39       0.73
        tbilrate      -0.63                    0.99       0.01
       Transition: Factor block 0
=======================================
                 L1.0    error variance
---------------------------------------
         0       0.98              0.01
=======================================
Warnings:
[1] Covariance matrix not calculated.

Displaying iteration progress

To display information about the EM iterations, use the disp argument.

>>> mod = sm.tsa.DynamicFactorMQ(endog)
>>> res = mod.fit(disp=10)
EM start iterations, llf=-291.21
EM iteration 10, llf=-157.17, convergence criterion=0.053801
EM iteration 20, llf=-128.99, convergence criterion=0.0035545
EM iteration 30, llf=-127.97, convergence criterion=0.00010224
EM iteration 40, llf=-127.93, convergence criterion=1.3281e-05
EM iteration 50, llf=-127.92, convergence criterion=5.4725e-06
EM iteration 60, llf=-127.91, convergence criterion=2.8665e-06
EM iteration 70, llf=-127.91, convergence criterion=1.6999e-06
EM iteration 80, llf=-127.91, convergence criterion=1.1085e-06
EM converged at iteration 83, llf=-127.91,
   convergence criterion=9.9004e-07 < tolerance=1e-06

Results: forecasting, impulse responses, and more

One the model is fitted, there are a number of methods available from the results object. Some examples include:

Forecasting

>>> mod = sm.tsa.DynamicFactorMQ(endog)
>>> res = mod.fit()
>>> print(res.forecast(steps=5))
             infl  tbilrate
2009-10  1.784169  0.260401
2009-11  1.735848  0.305981
2009-12  1.730674  0.350968
2010-01  1.742110  0.395369
2010-02  1.759786  0.439194

Impulse responses

>>> mod = sm.tsa.DynamicFactorMQ(endog)
>>> res = mod.fit()
>>> print(res.impulse_responses(steps=5))
       infl  tbilrate
0 -1.511956 -1.341498
1 -1.483172 -1.315960
2 -1.454937 -1.290908
3 -1.427240 -1.266333
4 -1.400069 -1.242226
5 -1.373416 -1.218578

For other available methods (including in-sample prediction, simulation of time series, extending the results to incorporate new data, and the news), see the documentation for state space models.

Methods

clone(endog[, k_endog_monthly, ...])

Clone state space model with new data and optionally new specification.

construct_endog(endog_monthly, endog_quarterly)

Construct a combined dataset from separate monthly and quarterly data.

filter(params[, transformed, ...])

Kalman filtering.

fit([start_params, transformed, ...])

Fits the model by maximum likelihood via Kalman filter.

fit_constrained(constraints[, start_params])

Fit the model with some parameters subject to equality constraints.

fit_em([start_params, transformed, ...])

Fits the model by maximum likelihood via the EM algorithm.

fix_params(params)

Fix parameters to specific values (context manager)

from_formula(formula, data[, subset])

Not implemented for state space models

handle_params(params[, transformed, ...])

Ensure model parameters satisfy shape and other requirements

hessian(params, *args, **kwargs)

Hessian matrix of the likelihood function, evaluated at the given parameters

impulse_responses(params[, steps, impulse, ...])

Impulse response function.

information(params)

Fisher information matrix of model.

initialize()

Initialize (possibly re-initialize) a Model instance.

initialize_approximate_diffuse([variance])

Initialize approximate diffuse

initialize_known(initial_state, ...)

Initialize known

initialize_statespace(**kwargs)

Initialize the state space representation

initialize_stationary()

Initialize stationary

loading_constraints(i)

Matrix formulation of quarterly variables' factor loading constraints.

loglike(params, *args, **kwargs)

Loglikelihood evaluation

loglikeobs(params[, transformed, ...])

Loglikelihood evaluation

observed_information_matrix(params[, ...])

Observed information matrix

opg_information_matrix(params[, ...])

Outer product of gradients information matrix

predict(params[, exog])

After a model has been fit predict returns the fitted values.

prepare_data()

Prepare data for use in the state space representation

score(params, *args, **kwargs)

Compute the score function at params.

score_obs(params[, method, transformed, ...])

Compute the score per observation, evaluated at params

set_conserve_memory([conserve_memory])

Set the memory conservation method

set_filter_method([filter_method])

Set the filtering method

set_inversion_method([inversion_method])

Set the inversion method

set_smoother_output([smoother_output])

Set the smoother output

set_stability_method([stability_method])

Set the numerical stability method

simulate(params, nsimulations[, ...])

Simulate a new time series following the state space model.

simulation_smoother([simulation_output])

Retrieve a simulation smoother for the state space model.

smooth(params[, transformed, ...])

Kalman smoothing.

summary([truncate_endog_names])

Create a summary table describing the model.

transform_jacobian(unconstrained[, ...])

Jacobian matrix for the parameter transformation function

transform_params(unconstrained)

Transform parameters from optimizer space to model space.

untransform_params(constrained)

Transform parameters from model space to optimizer space.

update(params, **kwargs)

Update the parameters of the model.

Properties

endog_names

Names of endogenous variables.

exog_names

The names of the exogenous variables.

initial_variance

initialization

loglike_constant

Constant term in the joint log-likelihood function.

loglikelihood_burn

param_names

(list of str) List of human readable parameter names.

start_params

(array) Starting parameters for maximum likelihood estimation.

state_names

(list of str) List of human readable names for unobserved states.

tolerance


Last update: Dec 16, 2024