Mediation analysis with duration dataΒΆ
This notebook demonstrates mediation analysis when the mediator and outcome are duration variables, modeled using proportional hazards regression. These examples are based on simulated data.
[1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation
Make the notebook reproducible.
[2]:
np.random.seed(3424)
Specify a sample size.
[3]:
n = 1000
Generate an exposure variable.
[4]:
exp = np.random.normal(size=n)
Generate a mediator variable.
[5]:
def gen_mediator():
mn = np.exp(exp)
mtime0 = -mn * np.log(np.random.uniform(size=n))
ctime = -2 * mn * np.log(np.random.uniform(size=n))
mstatus = (ctime >= mtime0).astype(np.int)
mtime = np.where(mtime0 <= ctime, mtime0, ctime)
return mtime0, mtime, mstatus
Generate an outcome variable.
[6]:
def gen_outcome(otype, mtime0):
if otype == "full":
lp = 0.5*mtime0
elif otype == "no":
lp = exp
else:
lp = exp + mtime0
mn = np.exp(-lp)
ytime0 = -mn * np.log(np.random.uniform(size=n))
ctime = -2 * mn * np.log(np.random.uniform(size=n))
ystatus = (ctime >= ytime0).astype(np.int)
ytime = np.where(ytime0 <= ctime, ytime0, ctime)
return ytime, ystatus
Build a dataframe containing all the relevant variables.
[7]:
def build_df(ytime, ystatus, mtime0, mtime, mstatus):
df = pd.DataFrame({"ytime": ytime, "ystatus": ystatus,
"mtime": mtime, "mstatus": mstatus,
"exp": exp})
return df
Run the full simulation and analysis, under a particular population structure of mediation.
[8]:
def run(otype):
mtime0, mtime, mstatus = gen_mediator()
ytime, ystatus = gen_outcome(otype, mtime0)
df = build_df(ytime, ystatus, mtime0, mtime, mstatus)
outcome_model = sm.PHReg.from_formula("ytime ~ exp + mtime", status="ystatus", data=df)
mediator_model = sm.PHReg.from_formula("mtime ~ exp", status="mstatus", data=df)
med = Mediation(outcome_model, mediator_model, "exp", "mtime",
outcome_predict_kwargs={"pred_only": True})
med_result = med.fit(n_rep=20)
print(med_result.summary())
Run the example with full mediation
[9]:
run("full")
Estimate Lower CI bound Upper CI bound P-value
ACME (control) 0.742427 0.643339 0.862745 0.0
ACME (treated) 0.742427 0.643339 0.862745 0.0
ADE (control) 0.073017 -0.016189 0.155321 0.1
ADE (treated) 0.073017 -0.016189 0.155321 0.1
Total effect 0.815444 0.675214 0.919580 0.0
Prop. mediated (control) 0.912695 0.814965 1.025747 0.0
Prop. mediated (treated) 0.912695 0.814965 1.025747 0.0
ACME (average) 0.742427 0.643339 0.862745 0.0
ADE (average) 0.073017 -0.016189 0.155321 0.1
Prop. mediated (average) 0.912695 0.814965 1.025747 0.0
Run the example with partial mediation
[10]:
run("partial")
Estimate Lower CI bound Upper CI bound P-value
ACME (control) 0.987067 0.801560 1.192019 0.0
ACME (treated) 0.987067 0.801560 1.192019 0.0
ADE (control) 1.071734 0.964214 1.150352 0.0
ADE (treated) 1.071734 0.964214 1.150352 0.0
Total effect 2.058801 1.862231 2.288170 0.0
Prop. mediated (control) 0.481807 0.417501 0.533773 0.0
Prop. mediated (treated) 0.481807 0.417501 0.533773 0.0
ACME (average) 0.987067 0.801560 1.192019 0.0
ADE (average) 1.071734 0.964214 1.150352 0.0
Prop. mediated (average) 0.481807 0.417501 0.533773 0.0
Run the example with no mediation
[11]:
run("no")
Estimate Lower CI bound Upper CI bound P-value
ACME (control) 0.010200 -0.039434 0.065176 1.0
ACME (treated) 0.010200 -0.039434 0.065176 1.0
ADE (control) 0.902295 0.824526 0.984934 0.0
ADE (treated) 0.902295 0.824526 0.984934 0.0
Total effect 0.912495 0.834728 1.009958 0.0
Prop. mediated (control) 0.003763 -0.044186 0.065520 1.0
Prop. mediated (treated) 0.003763 -0.044186 0.065520 1.0
ACME (average) 0.010200 -0.039434 0.065176 1.0
ADE (average) 0.902295 0.824526 0.984934 0.0
Prop. mediated (average) 0.003763 -0.044186 0.065520 1.0