Optimization

statsmodels uses three types of algorithms for the estimation of the parameters of a model.

  1. Basic linear models such as WLS and OLS are directly estimated using appropriate linear algebra.

  2. RLM and GLM use iteratively re-weighted least squares. However, you can optionally select one of the scipy optimizers discussed below.

  3. For all other models, we use optimizers from scipy.

Where practical, certain models allow for the optional selection of a scipy optimizer. A particular scipy optimizer might be the default or an option. Depending on the model and the data, choosing an appropriate scipy optimizer can help avoid a local minimum, fit the model in less time, or fit the model with less memory.

statsmodels supports the following optimizers along with keyword arguments associated with that specific optimizer:

  • newton - Newton-Raphson iteration. While not directly from scipy, we consider it an optimizer because only the score and hessian are required.

    tol : float

    Relative error in params acceptable for convergence.

  • nm - Nelder-Mead simplex method, scipy’s fmin.

    xtol : float

    Relative error in params acceptable for convergence.

    ftol : float

    Relative error in loglike(params) acceptable for convergence.

    maxfun : int

    Maximum number of function evaluations to make.

  • bfgs - Broyden–Fletcher–Goldfarb–Shanno optimization, scipy’s fmin_bfgs.

    gtol : float

    Stop when norm of gradient is less than gtol.

    norm : float

    Order of norm (np.inf is max, -np.inf is min)

    epsilon : float

    If fprime is approximated, use this value for the step size. Only relevant if LikelihoodModel.score is None.

  • lbfgs - A more memory-efficient (limited memory) implementation of bfgs. Scipy’s fmin_l_bfgs_b.

    m : int

    The maximum number of variable metric corrections used to define the limited memory matrix. (The limited memory BFGS method does not store the full hessian but uses this many terms in an approximation to it.)

    pgtol : float

    The iteration stops when max{|proj g_i| : i = 1, ..., n} <= pgtol, where proj g_i is the i-th component of the projected gradient.

    factr : float

    The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= factr * eps, where eps is the machine precision, which is automatically generated by the code. Typical values for factr are: 1e12 for low accuracy; 1e7 for moderate accuracy; 10.0 for extremely high accuracy. The scipy.optimize.minimize interface to L-BFGS-B exposes ftol instead of factr; the two are related by ftol = factr * eps.

    maxfun : int

    Maximum number of function evaluations.

    epsilon : float

    Step size used when approx_grad is True, for numerically calculating the gradient.

    approx_grad : bool

    Whether to approximate the gradient numerically (in which case func returns only the function value).

  • cg - Conjugate gradient optimization. Scipy’s fmin_cg.

    gtol : float

    Stop when norm of gradient is less than gtol.

    norm : float

    Order of norm (np.inf is max, -np.inf is min)

    epsilon : float

    If fprime is approximated, use this value for the step size. Can be scalar or vector. Only relevant if LikelihoodModel.score is None.

  • ncg - Newton conjugate gradient. Scipy’s fmin_ncg.

    fhess_p : callable f’(x, *args)

    Function which computes the Hessian of f times an arbitrary vector, p. Should only be supplied if LikelihoodModel.hessian is None.

    avextol : float

    Stop when the average relative error in the minimizer falls below this amount.

    epsilon : float or ndarray

    If fhess is approximated, use this value for the step size. Only relevant if LikelihoodModel.hessian is None.

  • powell - Powell’s method. Scipy’s fmin_powell.

    xtol : float

    Line-search error tolerance.

    ftol : float

    Relative error in loglike(params) acceptable for convergence.

    maxfun : int

    Maximum number of function evaluations to make.

    start_direc : ndarray

    Initial direction set.

  • basinhopping - Basin hopping. This is part of scipy’s basinhopping tools.

    niter : integer

    The number of basin hopping iterations.

    niter_success : integer

    Stop the run if the global minimum candidate remains the same for this number of iterations.

    T : float

    The “temperature” parameter for the accept or reject criterion. Higher “temperatures” mean that larger jumps in function value will be accepted. For best results T should be comparable to the separation (in function value) between local minima.

    stepsize : float

    Initial step size for use in the random displacement.

    interval : integer

    The interval for how often to update the stepsize.

    minimizer : dict

    Extra keyword arguments to be passed to the minimizer scipy.optimize.minimize(), for example ‘method’ - the minimization method (e.g. ‘L-BFGS-B’), or ‘tol’ - the tolerance for termination. Other arguments are mapped from explicit arguments of fit:

      - args <- fargs
      - jac <- score
      - hess <- hess

  • minimize - Allows the use of any scipy optimizer.

    min_method : str, optional

    Name of minimization method to use. Any method specific arguments can be passed directly. For a list of methods and their arguments, see documentation of scipy.optimize.minimize. If no method is specified, then BFGS is used.

Model Class

Generally, there is no need for an end-user to directly call these functions and classes. However, we provide the class because the different optimization techniques have unique keyword arguments that may be useful to the user.

Optimizer()

_fit_newton(f, score, start_params, fargs, ...)

Fit using Newton-Raphson algorithm.

_fit_bfgs(f, score, start_params, fargs, kwargs)

Fit using Broyden-Fletcher-Goldfarb-Shanno algorithm.

_fit_lbfgs(f, score, start_params, fargs, kwargs)

Fit using Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm.

_fit_nm(f, score, start_params, fargs, kwargs)

Fit using Nelder-Mead algorithm.

_fit_cg(f, score, start_params, fargs, kwargs)

Fit using Conjugate Gradient algorithm.

_fit_ncg(f, score, start_params, fargs, kwargs)

Fit using Newton Conjugate Gradient algorithm.

_fit_powell(f, score, start_params, fargs, ...)

Fit using Powell's conjugate direction algorithm.

_fit_basinhopping(f, score, start_params, ...)

Fit using Basin-hopping algorithm.
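To illustrate why Newton-Raphson needs only the score and the Hessian, here is a minimal, self-contained sketch of the iteration on a toy Poisson regression. It is not the code behind `_fit_newton`: the helper `newton_raphson`, its signature, the absolute step-size stopping rule, and the data are all hypothetical and chosen for illustration.

```python
import numpy as np

def newton_raphson(score, hessian, start_params, tol=1e-8, maxiter=35):
    """Maximize a log-likelihood from its score and Hessian (illustrative helper)."""
    params = np.asarray(start_params, dtype=float)
    for iteration in range(1, maxiter + 1):
        # Newton step: solve hessian(params) @ step = score(params).
        step = np.linalg.solve(hessian(params), score(params))
        params = params - step
        # Stop once the largest parameter change is below tol (absolute).
        if np.max(np.abs(step)) <= tol:
            return params, iteration, True
    return params, maxiter, False

# Toy Poisson regression with an intercept and one regressor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0, 7.0])

def score(params):
    # Gradient of the Poisson log-likelihood: X' (y - mu), mu = exp(X params).
    return X.T @ (y - np.exp(X @ params))

def hessian(params):
    # Hessian of the Poisson log-likelihood: -X' diag(mu) X.
    mu = np.exp(X @ params)
    return -(X.T * mu) @ X

# Start from the intercept-only solution, a common choice for count models.
start = np.array([np.log(y.mean()), 0.0])
params, n_iter, converged = newton_raphson(score, hessian, start)
print(params, n_iter, converged)
```

At the returned estimate the score is numerically zero, which is the first-order condition for a maximum of the log-likelihood.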


Last update: Jan 20, 2025