The Datasets Package
statsmodels provides data sets (i.e., data and meta-data) for use in examples, tutorials, model testing, etc.
Using Datasets from Stata
webuse(data[, baseurl, as_df]): Download and return an example dataset from Stata.
Using Datasets from R
The Rdatasets project gives access to the datasets available in R's core datasets package and many other common R packages. All of these datasets are available to statsmodels through the get_rdataset function. The actual data is accessible through the data attribute. For example:
In [1]: import statsmodels.api as sm
In [2]: duncan_prestige = sm.datasets.get_rdataset("Duncan", "car")
In [3]: print(duncan_prestige.__doc__)
+--------+-----------------+
| Duncan | R Documentation |
+--------+-----------------+
Duncan's Occupational Prestige Data
-----------------------------------
Description
~~~~~~~~~~~
The ``Duncan`` data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Usage
~~~~~
::
Duncan
Format
~~~~~~
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels: ``prof``,
professional and managerial; ``wc``, white-collar; ``bc``,
blue-collar.
income
Percent of males in occupation earning $3500 or more in 1950.
education
Percent of males in occupation in 1950 who were high-school
graduates.
prestige
Percent of raters in NORC study rating occupation as excellent or
good in prestige.
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
In [4]: duncan_prestige.data.head(5)