The Datasets Package¶
statsmodels
provides data sets (i.e. data and meta-data) for use in
examples, tutorials, model testing, etc.
Using Datasets from Stata¶
Using Datasets from R¶
The Rdatasets project gives access to the datasets available in R’s core datasets package and many other common R packages. All of these datasets are available to statsmodels by using the get_rdataset
function. The actual data is accessible by the data
attribute. For example:
In [1]: import statsmodels.api as sm
In [2]: duncan_prestige = sm.datasets.get_rdataset("Duncan", "car")
In [3]: print(duncan_prestige.__doc__)
+----------+-------------------+
| Duncan | R Documentation |
+----------+-------------------+
Duncan's Occupational Prestige Data
-----------------------------------
Description
~~~~~~~~~~~
The ``Duncan`` data frame has 45 rows and 4 columns. Data on the
prestige and other characteristics of 45 U. S. occupations in 1950.
Usage
~~~~~
::
Duncan
Format
~~~~~~
This data frame contains the following columns:
type
Type of occupation. A factor with the following levels: ``prof``,
professional and managerial; ``wc``, white-collar; ``bc``,
blue-collar.
income
Percent of males in occupation earning $3500 or more in 1950.
education
Percent of males in occupation in 1950 who were high-school
graduates.
prestige
Percent of raters in NORC study rating occupation as excellent or
good in prestige.
Source
~~~~~~
Duncan, O. D. (1961) A socioeconomic index for all occupations. In
Reiss, A. J., Jr. (Ed.) *Occupations and Social Status.* Free Press
[Table VI-1].
References
~~~~~~~~~~
Fox, J. (2008) *Applied Regression Analysis and Generalized Linear
Models*, Second Edition. Sage.
Fox, J. and Weisberg, S. (2011) *An R Companion to Applied Regression*,
Second Edition, Sage.
In [4]: duncan_prestige.data.head(5)