{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Least squares fitting of models to data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a quick introduction to `statsmodels` for physical scientists (e.g. physicists, astronomers) and engineers.\n", "\n", "Why is this needed?\n", "\n", "Because most of `statsmodels` was written by statisticians, who use different terminology and sometimes different methods, which makes it hard to know which classes and functions are relevant and what their inputs and outputs mean." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2022-11-02T17:02:50.201144Z", "iopub.status.busy": "2022-11-02T17:02:50.200615Z", "iopub.status.idle": "2022-11-02T17:02:51.090486Z", "shell.execute_reply": "2022-11-02T17:02:51.089729Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume you have data points with measurements `y` at positions `x` as well as measurement errors `y_err`.\n", "\n", "How can you use `statsmodels` to fit a straight-line model to this data?\n", "\n", "For an extensive discussion see [Hogg et al. (2010), \"Data analysis recipes: Fitting a model to data\"](https://arxiv.org/abs/1008.4686). We'll use the example data from their Table 1.\n", "\n", "So the model is `f(x) = a * x + b`, and in Figure 1 they give the result we want to reproduce ... 
the best-fit parameters and their errors for a \"standard weighted least-squares fit\" to this data are:\n", "* `a = 2.24 +- 0.11`\n", "* `b = 34 +- 18`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2022-11-02T17:02:51.096800Z", "iopub.status.busy": "2022-11-02T17:02:51.095402Z", "iopub.status.idle": "2022-11-02T17:02:51.119732Z", "shell.execute_reply": "2022-11-02T17:02:51.119097Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
|   | x | y | y_err |
|---|---|---|---|
| 0 | 201.0 | 592.0 | 61.0 |
| 1 | 244.0 | 401.0 | 25.0 |
| 2 | 47.0 | 583.0 | 38.0 |
| 3 | 287.0 | 402.0 | 15.0 |
| 4 | 203.0 | 495.0 | 21.0 |