{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deterministic Terms in Time Series Models" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:18.230527Z", "iopub.status.busy": "2021-02-02T06:55:18.229814Z", "iopub.status.idle": "2021-02-02T06:55:18.754169Z", "shell.execute_reply": "2021-02-02T06:55:18.754576Z" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "plt.rc(\"figure\", figsize=(16, 9))\n", "plt.rc(\"font\", size=16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Use\n", "\n", "Basic configurations can be constructed directly with `DeterministicProcess`. These can include a constant, a time trend of any order, and either a seasonal or a Fourier component.\n", "\n", "The process requires an index, which should be the index of the full sample (i.e., the in-sample period).\n", "\n", "First, we initialize a deterministic process with a constant, a linear time trend, and a 5-period seasonal term. The `in_sample` method returns the full set of deterministic terms, with one row for each element of the index." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:18.757763Z", "iopub.status.busy": "2021-02-02T06:55:18.757082Z", "iopub.status.idle": "2021-02-02T06:55:18.895967Z", "shell.execute_reply": "2021-02-02T06:55:18.896459Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const trend s(2,5) s(3,5) s(4,5) s(5,5)\n", "0 1.0 1.0 0.0 0.0 0.0 0.0\n", "1 1.0 2.0 1.0 0.0 0.0 0.0\n", "2 1.0 3.0 0.0 1.0 0.0 0.0\n", "3 1.0 4.0 0.0 0.0 1.0 0.0\n", "4 1.0 5.0 0.0 0.0 0.0 1.0\n", ".. ... ... ... ... ... ...\n", "95 1.0 96.0 0.0 0.0 0.0 0.0\n", "96 1.0 97.0 1.0 0.0 0.0 0.0\n", "97 1.0 98.0 0.0 1.0 0.0 0.0\n", "98 1.0 99.0 0.0 0.0 1.0 0.0\n", "99 1.0 100.0 0.0 0.0 0.0 1.0\n", "\n", "[100 rows x 6 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from statsmodels.tsa.deterministic import DeterministicProcess\n", "\n", "index = pd.RangeIndex(0, 100)\n", "det_proc = DeterministicProcess(\n", "    index, constant=True, order=1, seasonal=True, period=5\n", ")\n", "det_proc.in_sample()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `out_of_sample` method returns the next `steps` values after the end of the in-sample period." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:18.900423Z", "iopub.status.busy": "2021-02-02T06:55:18.899263Z", "iopub.status.idle": "2021-02-02T06:55:18.932619Z", "shell.execute_reply": "2021-02-02T06:55:18.933212Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const trend s(2,5) s(3,5) s(4,5) s(5,5)\n", "100 1.0 101.0 0.0 0.0 0.0 0.0\n", "101 1.0 102.0 1.0 0.0 0.0 0.0\n", "102 1.0 103.0 0.0 1.0 0.0 0.0\n", "103 1.0 104.0 0.0 0.0 1.0 0.0\n", "104 1.0 105.0 0.0 0.0 0.0 1.0\n", "105 1.0 106.0 0.0 0.0 0.0 0.0\n", "106 1.0 107.0 1.0 0.0 0.0 0.0\n", "107 1.0 108.0 0.0 1.0 0.0 0.0\n", "108 1.0 109.0 0.0 0.0 1.0 0.0\n", "109 1.0 110.0 0.0 0.0 0.0 1.0\n", "110 1.0 111.0 0.0 0.0 0.0 0.0\n", "111 1.0 112.0 1.0 0.0 0.0 0.0\n", "112 1.0 113.0 0.0 1.0 0.0 0.0\n", "113 1.0 114.0 0.0 0.0 1.0 0.0\n", "114 1.0 115.0 0.0 0.0 0.0 1.0" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.out_of_sample(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `range(start, stop)` method produces the deterministic terms over any range, which may span both the in-sample and out-of-sample periods.\n", "\n", "### Notes\n", "\n", "* When the index is a pandas `DatetimeIndex` or `PeriodIndex`, `start` and `stop` can be either integers or date-like values (strings such as \"2020-06-01\", or `Timestamp` objects).\n", "* `stop` is always included in the range. This is not very Pythonic, but it is necessary because both statsmodels and pandas include `stop` when working with date-like slices." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:18.937137Z", "iopub.status.busy": "2021-02-02T06:55:18.935951Z", "iopub.status.idle": "2021-02-02T06:55:18.967543Z", "shell.execute_reply": "2021-02-02T06:55:18.968440Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const trend s(2,5) s(3,5) s(4,5) s(5,5)\n", "190 1.0 191.0 0.0 0.0 0.0 0.0\n", "191 1.0 192.0 1.0 0.0 0.0 0.0\n", "192 1.0 193.0 0.0 1.0 0.0 0.0\n", "193 1.0 194.0 0.0 0.0 1.0 0.0\n", "194 1.0 195.0 0.0 0.0 0.0 1.0\n", "195 1.0 196.0 0.0 0.0 0.0 0.0\n", "196 1.0 197.0 1.0 0.0 0.0 0.0\n", "197 1.0 198.0 0.0 1.0 0.0 0.0\n", "198 1.0 199.0 0.0 0.0 1.0 0.0\n", "199 1.0 200.0 0.0 0.0 0.0 1.0\n", "200 1.0 201.0 0.0 0.0 0.0 0.0\n", "201 1.0 202.0 1.0 0.0 0.0 0.0\n", "202 1.0 203.0 0.0 1.0 0.0 0.0\n", "203 1.0 204.0 0.0 0.0 1.0 0.0\n", "204 1.0 205.0 0.0 0.0 0.0 1.0\n", "205 1.0 206.0 0.0 0.0 0.0 0.0\n", "206 1.0 207.0 1.0 0.0 0.0 0.0\n", "207 1.0 208.0 0.0 1.0 0.0 0.0\n", "208 1.0 209.0 0.0 0.0 1.0 0.0\n", "209 1.0 210.0 0.0 0.0 0.0 1.0\n", "210 1.0 211.0 0.0 0.0 0.0 0.0" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.range(190, 210)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Date-like Index\n", "\n", "Next, we show the same steps using a `PeriodIndex`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:18.972435Z", "iopub.status.busy": "2021-02-02T06:55:18.971206Z", "iopub.status.idle": "2021-02-02T06:55:18.999294Z", "shell.execute_reply": "2021-02-02T06:55:19.000165Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const sin(1,12) cos(1,12) sin(2,12) cos(2,12)\n", "2020-03 1.0 0.000000e+00 1.000000e+00 0.000000e+00 1.0\n", "2020-04 1.0 5.000000e-01 8.660254e-01 8.660254e-01 0.5\n", "2020-05 1.0 8.660254e-01 5.000000e-01 8.660254e-01 -0.5\n", "2020-06 1.0 1.000000e+00 6.123234e-17 1.224647e-16 -1.0\n", "2020-07 1.0 8.660254e-01 -5.000000e-01 -8.660254e-01 -0.5\n", "2020-08 1.0 5.000000e-01 -8.660254e-01 -8.660254e-01 0.5\n", "2020-09 1.0 1.224647e-16 -1.000000e+00 -2.449294e-16 1.0\n", "2020-10 1.0 -5.000000e-01 -8.660254e-01 8.660254e-01 0.5\n", "2020-11 1.0 -8.660254e-01 -5.000000e-01 8.660254e-01 -0.5\n", "2020-12 1.0 -1.000000e+00 -1.836970e-16 3.673940e-16 -1.0\n", "2021-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5\n", "2021-02 1.0 -5.000000e-01 8.660254e-01 -8.660254e-01 0.5" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index = pd.period_range(\"2020-03-01\", freq=\"M\", periods=60)\n", "det_proc = DeterministicProcess(index, constant=True, fourier=2)\n", "det_proc.in_sample().head(12)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.003965Z", "iopub.status.busy": "2021-02-02T06:55:19.002754Z", "iopub.status.idle": "2021-02-02T06:55:19.023876Z", "shell.execute_reply": "2021-02-02T06:55:19.024737Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const sin(1,12) cos(1,12) sin(2,12) cos(2,12)\n", "2025-03 1.0 -1.224647e-15 1.000000e+00 -2.449294e-15 1.0\n", "2025-04 1.0 5.000000e-01 8.660254e-01 8.660254e-01 0.5\n", "2025-05 1.0 8.660254e-01 5.000000e-01 8.660254e-01 -0.5\n", "2025-06 1.0 1.000000e+00 -4.904777e-16 -9.809554e-16 -1.0\n", "2025-07 1.0 8.660254e-01 -5.000000e-01 -8.660254e-01 -0.5\n", "2025-08 1.0 5.000000e-01 -8.660254e-01 -8.660254e-01 0.5\n", "2025-09 1.0 4.899825e-15 -1.000000e+00 -9.799650e-15 1.0\n", "2025-10 1.0 -5.000000e-01 -8.660254e-01 8.660254e-01 0.5\n", "2025-11 1.0 -8.660254e-01 -5.000000e-01 8.660254e-01 -0.5\n", "2025-12 1.0 -1.000000e+00 -3.184701e-15 6.369401e-15 -1.0\n", "2026-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5\n", "2026-02 1.0 -5.000000e-01 8.660254e-01 -8.660254e-01 0.5" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.out_of_sample(12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`range` accepts date-like arguments, which are usually given as strings." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.029461Z", "iopub.status.busy": "2021-02-02T06:55:19.028243Z", "iopub.status.idle": "2021-02-02T06:55:19.050651Z", "shell.execute_reply": "2021-02-02T06:55:19.051421Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const sin(1,12) cos(1,12) sin(2,12) cos(2,12)\n", "2025-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5\n", "2025-02 1.0 -5.000000e-01 8.660254e-01 -8.660254e-01 0.5\n", "2025-03 1.0 -1.224647e-15 1.000000e+00 -2.449294e-15 1.0\n", "2025-04 1.0 5.000000e-01 8.660254e-01 8.660254e-01 0.5\n", "2025-05 1.0 8.660254e-01 5.000000e-01 8.660254e-01 -0.5\n", "2025-06 1.0 1.000000e+00 -4.904777e-16 -9.809554e-16 -1.0\n", "2025-07 1.0 8.660254e-01 -5.000000e-01 -8.660254e-01 -0.5\n", "2025-08 1.0 5.000000e-01 -8.660254e-01 -8.660254e-01 0.5\n", "2025-09 1.0 4.899825e-15 -1.000000e+00 -9.799650e-15 1.0\n", "2025-10 1.0 -5.000000e-01 -8.660254e-01 8.660254e-01 0.5\n", "2025-11 1.0 -8.660254e-01 -5.000000e-01 8.660254e-01 -0.5\n", "2025-12 1.0 -1.000000e+00 -3.184701e-15 6.369401e-15 -1.0\n", "2026-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.range(\"2025-01\", \"2026-01\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is equivalent to using the integer values 58 and 70." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.054946Z", "iopub.status.busy": "2021-02-02T06:55:19.053855Z", "iopub.status.idle": "2021-02-02T06:55:19.074663Z", "shell.execute_reply": "2021-02-02T06:55:19.075468Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const sin(1,12) cos(1,12) sin(2,12) cos(2,12)\n", "2025-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5\n", "2025-02 1.0 -5.000000e-01 8.660254e-01 -8.660254e-01 0.5\n", "2025-03 1.0 -1.224647e-15 1.000000e+00 -2.449294e-15 1.0\n", "2025-04 1.0 5.000000e-01 8.660254e-01 8.660254e-01 0.5\n", "2025-05 1.0 8.660254e-01 5.000000e-01 8.660254e-01 -0.5\n", "2025-06 1.0 1.000000e+00 -4.904777e-16 -9.809554e-16 -1.0\n", "2025-07 1.0 8.660254e-01 -5.000000e-01 -8.660254e-01 -0.5\n", "2025-08 1.0 5.000000e-01 -8.660254e-01 -8.660254e-01 0.5\n", "2025-09 1.0 4.899825e-15 -1.000000e+00 -9.799650e-15 1.0\n", "2025-10 1.0 -5.000000e-01 -8.660254e-01 8.660254e-01 0.5\n", "2025-11 1.0 -8.660254e-01 -5.000000e-01 8.660254e-01 -0.5\n", "2025-12 1.0 -1.000000e+00 -3.184701e-15 6.369401e-15 -1.0\n", "2026-01 1.0 -8.660254e-01 5.000000e-01 -8.660254e-01 -0.5" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.range(58, 70)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Construction\n", "\n", "Deterministic processes with features not directly supported by the constructor can be created using `additional_terms`, which accepts a list of `DeterministicTerm` instances. Here we create a deterministic process with two seasonal components: a day-of-week component with a 7-day period and an annual component captured through a Fourier term with a period of 365.25 days." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.079699Z", "iopub.status.busy": "2021-02-02T06:55:19.078374Z", "iopub.status.idle": "2021-02-02T06:55:19.136391Z", "shell.execute_reply": "2021-02-02T06:55:19.137288Z" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " const s(2,7) s(3,7) s(4,7) s(5,7) s(6,7) s(7,7) \\\n", "2020-03-01 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-02 1.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-03 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2020-03-04 1.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "2020-03-05 1.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "2020-03-06 1.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "2020-03-07 1.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "2020-03-08 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-09 1.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-10 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2020-03-11 1.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "2020-03-12 1.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "2020-03-13 1.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "2020-03-14 1.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "2020-03-15 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-16 1.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-17 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2020-03-18 1.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "2020-03-19 1.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "2020-03-20 1.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "2020-03-21 1.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "2020-03-22 1.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-23 1.0 1.0 0.0 0.0 0.0 0.0 0.0 \n", "2020-03-24 1.0 0.0 1.0 0.0 0.0 0.0 0.0 \n", "2020-03-25 1.0 0.0 0.0 1.0 0.0 0.0 0.0 \n", "2020-03-26 1.0 0.0 0.0 0.0 1.0 0.0 0.0 \n", "2020-03-27 1.0 0.0 0.0 0.0 0.0 1.0 0.0 \n", "2020-03-28 1.0 0.0 0.0 0.0 0.0 0.0 1.0 \n", "\n", " sin(1,365.25) cos(1,365.25) sin(2,365.25) cos(2,365.25) \n", "2020-03-01 0.000000 1.000000 0.000000 1.000000 \n", "2020-03-02 0.017202 0.999852 0.034398 0.999408 \n", "2020-03-03 0.034398 0.999408 0.068755 0.997634 \n", "2020-03-04 0.051584 0.998669 0.103031 0.994678 \n", "2020-03-05 0.068755 0.997634 0.137185 0.990545 \n", "2020-03-06 0.085906 0.996303 0.171177 0.985240 \n", "2020-03-07 0.103031 0.994678 0.204966 0.978769 \n", "2020-03-08 0.120126 0.992759 0.238513 0.971139 \n", "2020-03-09 0.137185 0.990545 0.271777 0.962360 \n", "2020-03-10 0.154204 0.988039 0.304719 0.952442 \n", "2020-03-11 0.171177 0.985240 0.337301 0.941397 \n", 
"2020-03-12 0.188099 0.982150 0.369484 0.929237 \n", "2020-03-13 0.204966 0.978769 0.401229 0.915978 \n", "2020-03-14 0.221772 0.975099 0.432499 0.901634 \n", "2020-03-15 0.238513 0.971139 0.463258 0.886224 \n", "2020-03-16 0.255182 0.966893 0.493468 0.869764 \n", "2020-03-17 0.271777 0.962360 0.523094 0.852275 \n", "2020-03-18 0.288291 0.957543 0.552101 0.833777 \n", "2020-03-19 0.304719 0.952442 0.580455 0.814292 \n", "2020-03-20 0.321058 0.947060 0.608121 0.793844 \n", "2020-03-21 0.337301 0.941397 0.635068 0.772456 \n", "2020-03-22 0.353445 0.935455 0.661263 0.750154 \n", "2020-03-23 0.369484 0.929237 0.686676 0.726964 \n", "2020-03-24 0.385413 0.922744 0.711276 0.702913 \n", "2020-03-25 0.401229 0.915978 0.735034 0.678031 \n", "2020-03-26 0.416926 0.908940 0.757922 0.652346 \n", "2020-03-27 0.432499 0.901634 0.779913 0.625889 \n", "2020-03-28 0.447945 0.894061 0.800980 0.598691 " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from statsmodels.tsa.deterministic import Fourier, Seasonality, TimeTrend\n", "\n", "index = pd.period_range(\"2020-03-01\", freq=\"D\", periods=2 * 365)\n", "tt = TimeTrend(constant=True)\n", "four = Fourier(period=365.25, order=2)\n", "seas = Seasonality(period=7)\n", "det_proc = DeterministicProcess(index, additional_terms=[tt, seas, four])\n", "det_proc.in_sample().head(28)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Custom Deterministic Terms\n", "\n", "The `DeterministicTerm` abstract base class is designed to be subclassed so that users can write custom deterministic terms. We next show two examples. The first is a broken time trend that allows a break after a fixed number of periods. The second is a \"trick\" deterministic term that allows exogenous data, which is not really a deterministic process, to be treated as if it were deterministic. 
This simplifies collecting the terms needed for forecasting.\n", "\n", "These examples are intended to demonstrate the construction of custom terms; their input validation could certainly be improved." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.141353Z", "iopub.status.busy": "2021-02-02T06:55:19.140125Z", "iopub.status.idle": "2021-02-02T06:55:19.150826Z", "shell.execute_reply": "2021-02-02T06:55:19.151671Z" } }, "outputs": [], "source": [ "from statsmodels.tsa.deterministic import DeterministicTerm\n", "\n", "\n", "class BrokenTimeTrend(DeterministicTerm):\n", "    def __init__(self, break_period: int):\n", "        self._break_period = break_period\n", "\n", "    def __str__(self):\n", "        return \"Broken Time Trend\"\n", "\n", "    def _eq_attr(self):\n", "        return (self._break_period,)\n", "\n", "    def in_sample(self, index: pd.Index):\n", "        nobs = index.shape[0]\n", "        terms = np.zeros((nobs, 2))\n", "        terms[self._break_period :, 0] = 1\n", "        terms[self._break_period :, 1] = np.arange(\n", "            self._break_period + 1, nobs + 1\n", "        )\n", "        return pd.DataFrame(\n", "            terms, columns=[\"const_break\", \"trend_break\"], index=index\n", "        )\n", "\n", "    def out_of_sample(\n", "        self, steps: int, index: pd.Index, forecast_index: pd.Index = None\n", "    ):\n", "        # Always call _extend_index first to build the forecast index\n", "        fcast_index = self._extend_index(index, steps, forecast_index)\n", "        nobs = index.shape[0]\n", "        terms = np.zeros((steps, 2))\n", "        # Assume the break period is in-sample, so the break is active\n", "        terms[:, 0] = 1\n", "        terms[:, 1] = np.arange(nobs + 1, nobs + steps + 1)\n", "        return pd.DataFrame(\n", "            terms, columns=[\"const_break\", \"trend_break\"], index=fcast_index\n", "        )" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.155412Z", "iopub.status.busy": "2021-02-02T06:55:19.154236Z", "iopub.status.idle": "2021-02-02T06:55:19.177652Z",
"shell.execute_reply": "2021-02-02T06:55:19.178485Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " const trend const_break trend_break\n", "55 1.0 56.0 0.0 0.0\n", "56 1.0 57.0 0.0 0.0\n", "57 1.0 58.0 0.0 0.0\n", "58 1.0 59.0 0.0 0.0\n", "59 1.0 60.0 0.0 0.0\n", "60 1.0 61.0 1.0 61.0\n", "61 1.0 62.0 1.0 62.0\n", "62 1.0 63.0 1.0 63.0\n", "63 1.0 64.0 1.0 64.0\n", "64 1.0 65.0 1.0 65.0\n", "65 1.0 66.0 1.0 66.0" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "btt = BrokenTimeTrend(60)\n", "tt = TimeTrend(constant=True, order=1)\n", "index = pd.RangeIndex(100)\n", "det_proc = DeterministicProcess(index, additional_terms=[tt, btt])\n", "det_proc.range(55, 65)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we write a simple \"wrapper\" for some actual exogenous data that simplifies constructing out-of-sample exogenous arrays for forecasting." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.182275Z", "iopub.status.busy": "2021-02-02T06:55:19.181104Z", "iopub.status.idle": "2021-02-02T06:55:19.188280Z", "shell.execute_reply": "2021-02-02T06:55:19.189143Z" } }, "outputs": [], "source": [ "class ExogenousProcess(DeterministicTerm):\n", "    def __init__(self, data):\n", "        self._data = data\n", "\n", "    def __str__(self):\n", "        return \"Custom Exog Process\"\n", "\n", "    def _eq_attr(self):\n", "        return (id(self._data),)\n", "\n", "    def in_sample(self, index: pd.Index):\n", "        return self._data.loc[index]\n", "\n", "    def out_of_sample(\n", "        self, steps: int, index: pd.Index, forecast_index: pd.Index = None\n", "    ):\n", "        forecast_index = self._extend_index(index, steps, forecast_index)\n", "        return self._data.loc[forecast_index]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.192761Z", "iopub.status.busy": "2021-02-02T06:55:19.191576Z", "iopub.status.idle": "2021-02-02T06:55:19.202823Z", "shell.execute_reply": 
"2021-02-02T06:55:19.203652Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " exog1 exog2\n", "0 6 99\n", "1 64 28\n", "2 15 81\n", "3 54 8\n", "4 12 8" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "gen = np.random.default_rng(98765432101234567890)\n", "exog = pd.DataFrame(\n", " gen.integers(100, size=(300, 2)), columns=[\"exog1\", \"exog2\"]\n", ")\n", "exog.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.207360Z", "iopub.status.busy": "2021-02-02T06:55:19.206167Z", "iopub.status.idle": "2021-02-02T06:55:19.211743Z", "shell.execute_reply": "2021-02-02T06:55:19.212562Z" } }, "outputs": [], "source": [ "ep = ExogenousProcess(exog)\n", "tt = TimeTrend(constant=True, order=1)\n", "# The in-sample index\n", "idx = exog.index[:200]\n", "det_proc = DeterministicProcess(idx, additional_terms=[tt, ep])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.216209Z", "iopub.status.busy": "2021-02-02T06:55:19.215029Z", "iopub.status.idle": "2021-02-02T06:55:19.232952Z", "shell.execute_reply": "2021-02-02T06:55:19.233857Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " const trend exog1 exog2\n", "0 1.0 1.0 6 99\n", "1 1.0 2.0 64 28\n", "2 1.0 3.0 15 81\n", "3 1.0 4.0 54 8\n", "4 1.0 5.0 12 8" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.in_sample().head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.237802Z", "iopub.status.busy": "2021-02-02T06:55:19.236531Z", "iopub.status.idle": "2021-02-02T06:55:19.254516Z", "shell.execute_reply": "2021-02-02T06:55:19.255421Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " const trend exog1 exog2\n", "200 1.0 201.0 56 88\n", "201 1.0 202.0 48 84\n", "202 1.0 203.0 44 5\n", "203 1.0 204.0 65 63\n", "204 1.0 205.0 63 39\n", "205 1.0 206.0 89 39\n", "206 1.0 207.0 41 54\n", "207 1.0 208.0 71 5\n", "208 1.0 209.0 89 6\n", "209 1.0 210.0 58 63" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det_proc.out_of_sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Support\n", "\n", "The only model that directly supports `DeterministicProcess` is `AutoReg`. A deterministic process, including one built from custom terms, can be passed using the `deterministic` keyword argument.\n", "\n", "**Note**: Using a custom term requires `trend=\"n\"` and `seasonal=False` so that all deterministic components come from the custom deterministic term." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate Some Data\n", "\n", "Here we simulate some data with a seasonal pattern of period 52 (as in weekly data with an annual cycle), captured by a Fourier series." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.259657Z", "iopub.status.busy": "2021-02-02T06:55:19.258332Z", "iopub.status.idle": "2021-02-02T06:55:19.636769Z", "shell.execute_reply": "2021-02-02T06:55:19.637161Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "gen = np.random.default_rng(98765432101234567890)\n", "idx = pd.RangeIndex(200)\n", "det_proc = DeterministicProcess(idx, constant=True, period=52, fourier=2)\n", "det_terms = det_proc.in_sample().to_numpy()\n", "params = np.array([1.0, 3, -1, 4, -2])\n", "exog = det_terms @ params\n", "y = np.empty(200)\n", "y[0] = det_terms[0] @ params + gen.standard_normal()\n", "for i in range(1, 200):\n", "    y[i] = 0.9 * y[i - 1] + det_terms[i] @ params + gen.standard_normal()\n", "y = pd.Series(y, index=idx)\n", "ax = y.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The model is then fit using the `deterministic` keyword argument. `seasonal` defaults to `False`, but `trend` defaults to `\"c\"`, so `trend` must be set to `\"n\"`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:19.642003Z", "iopub.status.busy": "2021-02-02T06:55:19.639708Z", "iopub.status.idle": "2021-02-02T06:55:20.027317Z", "shell.execute_reply": "2021-02-02T06:55:20.027644Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " AutoReg Model Results \n", "==============================================================================\n", "Dep. Variable: y No. Observations: 200\n", "Model: AutoReg(1) Log Likelihood -270.964\n", "Method: Conditional MLE S.D.
of innovations 0.944\n", "Date: Tue, 02 Feb 2021 AIC -0.044\n", "Time: 06:55:20 BIC 0.072\n", "Sample: 1 HQIC 0.003\n", " 200 \n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 0.8436 0.172 4.916 0.000 0.507 1.180\n", "sin(1,52) 2.9738 0.160 18.587 0.000 2.660 3.287\n", "cos(1,52) -0.6771 0.284 -2.380 0.017 -1.235 -0.120\n", "sin(2,52) 3.9951 0.099 40.336 0.000 3.801 4.189\n", "cos(2,52) -1.7206 0.264 -6.519 0.000 -2.238 -1.203\n", "y.L1 0.9116 0.014 63.264 0.000 0.883 0.940\n", " Roots \n", "=============================================================================\n", " Real Imaginary Modulus Frequency\n", "-----------------------------------------------------------------------------\n", "AR.1 1.0970 +0.0000j 1.0970 0.0000\n", "-----------------------------------------------------------------------------\n" ] } ], "source": [ "from statsmodels.tsa.api import AutoReg\n", "\n", "mod = AutoReg(y, 1, trend=\"n\", deterministic=det_proc)\n", "res = mod.fit()\n", "print(res.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `plot_predict` to show the predicted values and their prediction interval. The out-of-sample deterministic values are automatically produced by the deterministic process passed to `AutoReg`." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:20.033666Z", "iopub.status.busy": "2021-02-02T06:55:20.032836Z", "iopub.status.idle": "2021-02-02T06:55:20.303344Z", "shell.execute_reply": "2021-02-02T06:55:20.302149Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig = res.plot_predict(200, 200 + 2 * 52, True)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:20.310300Z", "iopub.status.busy": "2021-02-02T06:55:20.309485Z", "iopub.status.idle": "2021-02-02T06:55:20.318956Z", "shell.execute_reply": "2021-02-02T06:55:20.320032Z" } }, "outputs": [ { "data": { "text/plain": [ "200 -3.253482\n", "201 -8.555660\n", "202 -13.607557\n", "203 -18.152622\n", "204 -21.950370\n", "205 -24.790116\n", "206 -26.503171\n", "207 -26.972781\n", "208 -26.141244\n", "209 -24.013773\n", "210 -20.658891\n", "211 -16.205310\n", "dtype: float64" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "auto_reg_forecast = res.predict(200, 211)\n", "auto_reg_forecast" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using with other models\n", "\n", "Other models do not support `DeterministicProcess` directly. 
We can instead manually pass any deterministic terms as `exog` to models that support exogenous variables.\n", "\n", "Note that `SARIMAX` with exogenous variables is a regression with SARIMA errors, so that the model is\n", "\n", "$$\n", "\\begin{align*}\n", "\\nu_t & = y_t - x_t \\beta \\\\\n", "(1-\\phi(L))\\nu_t & = (1+\\theta(L))\\epsilon_t.\n", "\\end{align*}\n", "$$\n", "\n", "The parameters on deterministic terms are therefore not directly comparable to those of `AutoReg`, which evolves according to the equation\n", "\n", "$$\n", "(1-\\phi(L)) y_t = x_t \\beta + \\epsilon_t.\n", "$$\n", "\n", "When $x_t$ contains only deterministic terms, these two representations are equivalent (assuming $\\theta(L)=0$ so that there is no MA component).\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:20.324626Z", "iopub.status.busy": "2021-02-02T06:55:20.323078Z", "iopub.status.idle": "2021-02-02T06:55:20.815136Z", "shell.execute_reply": "2021-02-02T06:55:20.816073Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " SARIMAX Results \n", "==============================================================================\n", "Dep. Variable: y No.
Observations: 200\n", "Model: SARIMAX(1, 0, 0) Log Likelihood -293.381\n", "Date: Tue, 02 Feb 2021 AIC 600.763\n", "Time: 06:55:20 BIC 623.851\n", "Sample: 0 HQIC 610.106\n", " - 200 \n", "Covariance Type: opg \n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "intercept 0.0796 0.140 0.567 0.570 -0.196 0.355\n", "sin(1,52) 9.1916 0.876 10.492 0.000 7.475 10.909\n", "cos(1,52) -17.4348 0.891 -19.576 0.000 -19.180 -15.689\n", "sin(2,52) 1.2512 0.466 2.683 0.007 0.337 2.165\n", "cos(2,52) -17.1863 0.434 -39.583 0.000 -18.037 -16.335\n", "ar.L1 0.9957 0.007 150.761 0.000 0.983 1.009\n", "sigma2 1.0748 0.119 9.068 0.000 0.842 1.307\n", "===================================================================================\n", "Ljung-Box (L1) (Q): 2.16 Jarque-Bera (JB): 1.03\n", "Prob(Q): 0.14 Prob(JB): 0.60\n", "Heteroskedasticity (H): 0.71 Skew: -0.14\n", "Prob(H) (two-sided): 0.16 Kurtosis: 2.78\n", "===================================================================================\n", "\n", "Warnings:\n", "[1] Covariance matrix calculated using the outer product of gradients (complex-step).\n" ] } ], "source": [ "from statsmodels.tsa.api import SARIMAX\n", "\n", "det_proc = DeterministicProcess(idx, period=52, fourier=2)\n", "det_terms = det_proc.in_sample()\n", "\n", "mod = SARIMAX(y, order=(1, 0, 0), trend=\"c\", exog=det_terms)\n", "res = mod.fit(disp=False)\n", "print(res.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The forecasts are similar but differ since the parameters of the `SARIMAX` are estimated using MLE while `AutoReg` uses OLS." 
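, "\n",
"\n",
"The coefficients on the deterministic terms differ between the two fits largely because the two models parameterize them differently. A short derivation sketches why for the AR(1) case fit above, where $\\phi(L) = \\phi L$ and $\\theta(L) = 0$. Leaving aside the `intercept`, which the `SARIMAX` model above adds through `trend=\"c\"` rather than through `exog`, substitute $\\nu_t = y_t - x_t \\beta$ into the AR recursion:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"y_t - x_t \\beta & = \\phi \\left(y_{t-1} - x_{t-1} \\beta\\right) + \\epsilon_t \\\\\n",
"y_t & = \\phi y_{t-1} + \\left(x_t - \\phi x_{t-1}\\right) \\beta + \\epsilon_t.\n",
"\\end{align*}\n",
"$$\n",
"\n",
"For a Fourier pair with frequency $\\omega = 2\\pi j / 52$, the angle-subtraction identities give\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\sin(\\omega (t-1)) & = \\cos(\\omega) \\sin(\\omega t) - \\sin(\\omega) \\cos(\\omega t) \\\\\n",
"\\cos(\\omega (t-1)) & = \\cos(\\omega) \\cos(\\omega t) + \\sin(\\omega) \\sin(\\omega t),\n",
"\\end{align*}\n",
"$$\n",
"\n",
"so $x_t - \\phi x_{t-1}$ remains in the span of the same sine and cosine terms. Under this sketch, the `AutoReg` coefficients on `sin(j,52)` and `cos(j,52)` estimate a linear transformation (depending on $\\phi$ and $\\omega$) of the `SARIMAX` regression coefficients, not the same quantities. This accounts for much of the gap between, for example, the two `sin(1,52)` estimates, even though the forecasts are close."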
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2021-02-02T06:55:20.820333Z", "iopub.status.busy": "2021-02-02T06:55:20.819091Z", "iopub.status.idle": "2021-02-02T06:55:20.840970Z", "shell.execute_reply": "2021-02-02T06:55:20.842005Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " AutoReg SARIMAX\n", "200 -3.253482 -2.956563\n", "201 -8.555660 -7.985590\n", "202 -13.607557 -12.794074\n", "203 -18.152622 -17.130966\n", "204 -21.950370 -20.760476\n", "205 -24.790116 -23.475515\n", "206 -26.503171 -25.109633\n", "207 -26.972781 -25.546794\n", "208 -26.141244 -24.728390\n", "209 -24.013773 -22.657099\n", "210 -20.658891 -19.397357\n", "211 -16.205310 -15.072390" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sarimax_forecast = res.forecast(12, exog=det_proc.out_of_sample(12))\n", "df = pd.concat([auto_reg_forecast, sarimax_forecast], axis=1)\n", "df.columns = [\"AutoReg\", \"SARIMAX\"]\n", "df" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 4 }