Implementing TMLE in R based on Luque‐Fernandez et al (2018) and Benkeser et al (2017)
\(\DeclareMathOperator*{\argmin}{argmin}\) \(\newcommand{\var}{\mathrm{Var}}\) \(\newcommand{\epsi}{\varepsilon}\) \(\newcommand{\phii}{\varphi}\) \(\newcommand\Bigpar[1]{\left( #1 \right )}\) \(\newcommand\Bigbr[1]{\left[ #1 \right ]}\) \(\newcommand\Bigcr[1]{\left\{ #1 \right \}}\) \(\newcommand\SetB[1]{\left\{ #1 \right\}}\) \(\newcommand\Sett[1]{\mathcal{#1}}\) \(\newcommand{\Data}{\mathcal{D}}\) \(\newcommand{\Ubr}[2]{\underbrace{#1}_{\text{#2}}}\) \(\newcommand{\Obr}[2]{ \overbrace{#1}^{\text{#2}}}\) \(\newcommand{\sumiN}{\sum_{i=1}^N}\) \(\newcommand{\dydx}[2]{\frac{\partial #1}{\partial #2}}\) \(\newcommand\Indic[1]{\mathds{1}_{#1}}\) \(\newcommand{\Realm}[1]{\mathbb{R}^{#1}}\) \(\newcommand{\Exp}[1]{\mathbb{E}\left[#1\right]}\) \(\newcommand{\Expt}[2]{\mathbb{E}_{#1}\left[#2\right]}\) \(\newcommand{\Var}[1]{\mathbb{V}\left[#1\right]}\) \(\newcommand{\Covar}[1]{\text{Cov}\left[#1\right]}\) \(\newcommand{\Prob}[1]{\mathbf{Pr}\left(#1\right)}\) \(\newcommand{\Supp}[1]{\text{Supp}\left[#1\right]}\) \(\newcommand{\doyx}{\Prob{Y \, |\,\mathsf{do} (X = x)}}\) \(\newcommand{\doo}[1]{\Prob{Y |\,\mathsf{do} (#1) }}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\Z}{\mathbb{Z}}\) \(\newcommand{\wh}[1]{\widehat{#1}} % Wide hat\) \(\newcommand{\wt}[1]{\widetilde{#1}} % Wide tilde\) \(\newcommand{\wb}[1]{\overline{#1}} % Wide bar\) \(\newcommand\Ol[1]{\overline{#1}}\) \(\newcommand\Ul[1]{\underline{#1}}\) \(\newcommand\Str[1]{#1^{*}}\) \(\newcommand{\F}{\mathsf{F}}\) \(\newcommand{\ff}{\mathsf{f}}\) \(\newcommand{\Cdf}[1]{\mathbb{F}\left(#1\right)}\) \(\newcommand{\Cdff}[2]{\mathbb{F}_{#1}\left(#2\right)}\) \(\newcommand{\Pdf}[1]{\mathsf{f}\left(#1\right)}\) \(\newcommand{\Pdff}[2]{\mathsf{f}_{#1}\left(#2\right)}\) \(\newcommand{\dd}{\mathsf{d}}\) \(\newcommand\Normal[1]{\mathcal{N} \left( #1 \right )}\) \(\newcommand\Unif[1]{\mathsf{U} \left[ #1 \right ]}\) \(\newcommand\Bern[1]{\mathsf{Bernoulli} \left( #1 \right )}\) \(\newcommand\Binom[1]{\mathsf{Bin} \left( #1 \right )}\) \(\newcommand\Pois[1]{\mathsf{Poi} \left( #1 \right )}\) \(\newcommand\BetaD[1]{\mathsf{Beta} \left( #1 \right )}\) \(\newcommand\Diri[1]{\mathsf{Dir} \left( #1 \right )}\) \(\newcommand\Gdist[1]{\mathsf{Gamma} \left( #1 \right )}\) \(\def\mbf#1{\mathbf{#1}}\) \(\def\mrm#1{\mathrm{#1}}\) \(\def\mbi#1{\boldsymbol{#1}}\) \(\def\ve#1{\mbi{#1}} % Vector notation\) \(\def\vee#1{\mathbf{#1}} % Vector notation\) \(\newcommand{\Mat}[1]{\mathbf{#1}}\) \(\newcommand{\eucN}[1]{\norm{#1}}\) \(\newcommand{\lzero}[1]{\norm{#1}_0}\) \(\newcommand{\lone}[1]{\norm{#1}_1}\) \(\newcommand{\ltwo}[1]{\norm{#1}_2}\) \(\newcommand{\pnorm}[1]{\norm{#1}_p}\)
The observed data are \(\Sett{O} := \SetB{Y, A, \vee{W}}_{i=1}^N\), where \(A\) is a binary treatment (‘action’), \(Y\) is an outcome, and \(\vee{W}\) is a vector of covariates / potential confounders.
Parameters of interest are
\[ \psi_{0}(a)=E_{0}\{\bar{Q}_{0}(a, W)\}=\int \bar{Q}_{0}(a, w) \, d Q_{W, 0}(w), \quad \text{for } a \in \{0, 1\} \]
where \(\bar{Q}_{0}(a, w)=E_{0}(Y \mid A=a, W=w)\) is the so-called outcome regression (OR) and \(Q_{W,0}(w)=\operatorname{Pr}_{0}(W \leq w)\) is the distribution function of the covariates. The value \(\psi_{0}(a)\) is the marginal (counterfactual) mean outcome under treatment level \(a\).
The GCOMP (g-computation) estimator averages the outcome-regression predictions over the observed covariates for each value of \(a\):
\[ \psi_{n, \text{GCOMP}}(a)=\frac{1}{n} \sum_{i=1}^{n} \bar{Q}_{n}\left(a, W_{i}\right) \]
The g-formula for the ATE is
\[ \tau_\text{ATE} = \sum_{\vee{w}} \Bigbr{\sum_y y \Bigpar{\Prob{Y = y | A = 1, \vee{W} = \vee{w}} - \Prob{Y = y | A = 0, \vee{W} = \vee{w}}}} \Prob{\vee{W} = \vee{w}} \]
\[ \begin{aligned} E_{0}\{\bar{Q}_{0}(a, W)\} &=E_{0}\left\{E_{0}(Y \mid A=a, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} E_{0}(Y \mid A=a, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} E_{0}(Y \mid A, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} Y\right\} \end{aligned} \] We use \(g_{0}(a \mid W):=\operatorname{Pr}_{0}(A=a \mid W)\) to denote the propensity score (PS). Suppose we have \(g_{n}\), an estimate of the PS. The identity above motivates the estimator
\[ \psi_{n, \text{IPTW}}(a):=\frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)} Y_{i} \]
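As a minimal sketch (the vector names Y, A, and g1 for the estimated \(g_n(1 \mid W_i)\) are hypothetical, not objects defined elsewhere in this post), the IPTW estimator is a one-liner:
psi_iptw <- function(Y, A, g1, a = 1) {
  g_a <- if (a == 1) g1 else 1 - g1  # estimated P(A = a | W)
  mean((A == a) / g_a * Y)           # weighted mean of outcomes among A == a
}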
\[ \psi_{n, \text{AIPTW}}(a)=\psi_{n, \text{IPTW}}(a)+\frac{1}{n} \sum_{i=1}^{n}\left\{1-\frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)}\right\} \bar{Q}_{n}\left(a, W_{i}\right) \]
which adds an augmentation term to the IPTW estimator. Equivalently, the estimator can be viewed as adding an augmentation to the GCOMP estimator
\[ \psi_{n, \text{AIPTW}}(a)=\psi_{n, \text{GCOMP}}(a)+\frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)}\left\{Y_{i}-\bar{Q}_{n}\left(a, W_{i}\right)\right\} \]
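A corresponding hedged sketch of the AIPTW estimator, assuming in addition a (hypothetical) vector Qbar_a of outcome-regression predictions \(\Ol{Q}_n(a, W_i)\) and g_a of estimated \(g_n(a \mid W_i)\):
psi_aiptw <- function(Y, A, Qbar_a, g_a, a = 1) {
  # GCOMP term plus the inverse-probability-weighted residual augmentation
  mean(Qbar_a + (A == a) / g_a * (Y - Qbar_a))
}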
To obtain estimators whose limiting distribution, and not just consistency, is doubly robust, van der Laan (2014) introduced two reduced-dimension regressions based on residuals:
\[ \begin{array}{l} \bar{Q}_{r, 0 n}(a, w):=E_{0}\left\{Y-\bar{Q}_{n}(W) \mid A=a, g_{n}(W)=g_{n}(w)\right\}, \text { and } \\ g_{r, 0 n}(a \mid w):=\operatorname{Pr}_{0}\left\{A=a \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w), g_{n}(W)=g_{n}(w)\right\} \end{array} \]
In words, the R-OR is the regression of the residual from the outcome regression onto the estimated PS amongst observations with A=a, while the R-PS is the propensity for treatment A=a given the estimated OR and PS.
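A hedged sketch of how the two reduced-dimension regressions could be fit with plain GLMs (drtmle fits them with user-supplied learners; Y, A, and the fitted-value vectors Qn \(= \Ol{Q}_n(a, W_i)\) and gn \(= g_n(a \mid W_i)\) are hypothetical inputs):
a  <- 1                                          # treatment level of interest
df <- data.frame(Y = Y, A = A, Qn = Qn, gn = gn)
# R-OR: regress the OR residual on the estimated PS among observations with A == a
rOR <- glm(I(Y - Qn) ~ gn, data = df[df$A == a, ])
# R-PS: propensity for A == a given both fitted values (a bivariate regression)
rPS <- glm(I(A == a) ~ Qn + gn, family = binomial, data = df)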
If estimates of the OR \(\bar{Q}_{n}^{*}\), PS \(g_{n}^{*}\), R-OR \(\bar{Q}_{r, n}^{*}\), and R-PS \(g_{r, n}^{*}\) satisfy the moment conditions
\[ \begin{array}{l} \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \\ \frac{1}{n} \sum_{i=1}^{n} \frac{\bar{Q}_{r, n}^{*}\left(a, W_{i}\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\left\{I\left(A_{i}=a\right)-g_{n}^{*}\left(a \mid W_{i}\right)\right\}=0, \text { and } \\ \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{r, n}^{*}\left(a \mid W_{i}\right)}\left\{\frac{g_{r, n}^{*}\left(a \mid W_{i}\right)-g_{n}^{*}\left(a \mid W_{i}\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\right\}\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \end{array} \]
then the GCOMP estimator based on \(\Ol{Q}^*_n\) is doubly robust not only with respect to consistency, but also with respect to its limiting distribution. Van der Laan (2014) additionally proposed a TMLE algorithm that maps initial estimates of the OR, PS, R-OR, and R-PS into estimates that satisfy these key equations.
Benkeser et al. (2017) demonstrated alternative equations that can be satisfied to achieve this form of double-robustness. In particular, they formulated an alternative version of the R-PS.
\[ g_{r, 0,1}(a \mid w):=\operatorname{Pr}_{0}\left\{A=a \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w)\right\} \]
which we refer to as R-PS1. They also introduced an additional regression of a weighted residual from the PS on the OR,
\[ g_{r, 0,2}(a \mid w):=E_{0}\left\{\frac{I(A=a)-g_{n}(a \mid W)}{g_{n}(a \mid W)} \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w)\right\} \]
which we refer to as R-PS2. The key feature of R-PS1 and R-PS2 is that they are both univariate regressions, while R-PS is a bivariate regression. Benkeser and colleagues suggested that this may make R-PS1 and R-PS2 easier to estimate consistently than R-PS. Further, they showed that if estimators of the OR, PS, and R-OR satisfy the first two moment conditions above, and if additionally the estimators of the OR, R-PS1 \(g_{r, n, 1}^{*}\), and R-PS2 \(g_{r, n, 2}^{*}\) satisfy
\[ \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{r, n, 1}^{*}\left(a \mid W_{i}\right)} g_{r, n, 2}^{*}\left(a \mid W_{i}\right)\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \]
then the GCOMP estimator based on this \(\bar{Q}_{n}^{*}\) also enjoys a doubly-robust limiting distribution. Benkeser et al. (2017) provided a TMLE algorithm that produced such estimates and provided closed-form, doubly-robust variance estimates.
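Continuing the hypothetical sketch above (same df, Qn, gn, and a), the two univariate reductions could be fit as:
# R-PS1: propensity for A == a given only the fitted OR values
rPS1 <- glm(I(A == a) ~ Qn, family = binomial, data = df)
# R-PS2: weighted PS residual regressed on the fitted OR values
rPS2 <- glm(I(((A == a) - gn) / gn) ~ Qn, data = df)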
generateData <- function(n){
  w1 <- rbinom(n, size = 1, prob = 0.5)
  w2 <- rbinom(n, size = 1, prob = 0.65)
  w3 <- round(runif(n, min = 0, max = 4), digits = 0)
  w4 <- round(runif(n, min = 0, max = 5), digits = 0)
  # treatment assignment (depends on w2, w3, w4, incl. a w2*w4 interaction)
  A <- rbinom(n, size = 1, prob = plogis(-5 + 0.05*w2 + 0.25*w3 + 0.6*w4 + 0.4*w2*w4))
  # counterfactual outcomes
  Y.1 <- rbinom(n, size = 1, prob = plogis(-1 + 1 - 0.1*w1 + 0.35*w2 + 0.25*w3 + 0.20*w4 + 0.15*w2*w4))
  Y.0 <- rbinom(n, size = 1, prob = plogis(-1 + 0 - 0.1*w1 + 0.35*w2 + 0.25*w3 + 0.20*w4 + 0.15*w2*w4))
  # Observed outcome
  Y <- Y.1*A + Y.0*(1 - A)
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}
# True ATE in the population
set.seed(7777)
ObsDataTrueATE <- generateData(n = 50000)
True_EY.1 <- mean(ObsDataTrueATE$Y.1)
True_EY.0 <- mean(ObsDataTrueATE$Y.0)
True_ATE <- True_EY.1 - True_EY.0; True_ATE
[1] 0.1939
True_MOR <- (True_EY.1 * (1 - True_EY.0)) / ((1 - True_EY.1) * True_EY.0); True_MOR
[1] 2.527
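For reference, the marginal odds ratio computed above contrasts the counterfactual risks:
\[ \text{MOR} = \frac{\Exp{Y(1)} \, \Bigpar{1 - \Exp{Y(0)}}}{\Bigpar{1 - \Exp{Y(1)}} \, \Exp{Y(0)}} \]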
Next, draw a smaller sample for the analysis:
ObsData <- generateData(n = 10000)
\[\Ol{Q}^0(A, \vee{W}) = \Exp{Y | A, \vee{W}}\]
# Naive approach: conditional odds ratio
naive <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)
summary(naive)
Call:
glm(formula = Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.428 -1.136 0.589 0.989 1.781
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2187 0.0688 -17.71 <2e-16 ***
A 1.0661 0.0847 12.59 <2e-16 ***
w1 -0.1378 0.0439 -3.14 0.0017 **
w2 0.6446 0.0458 14.08 <2e-16 ***
w3 0.2581 0.0183 14.08 <2e-16 ***
w4 0.3016 0.0160 18.87 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13322 on 9999 degrees of freedom
Residual deviance: 12012 on 9994 degrees of freedom
AIC: 12024
Number of Fisher Scoring iterations: 4
exp(naive$coef[2])
A
2.904
# Conditional odds ratios with 95% CI
exp(confint(naive))
2.5 % 97.5 %
(Intercept) 0.2582 0.3381
A 2.4650 3.4364
w1 0.7994 0.9495
w2 1.7419 2.0843
w3 1.2489 1.3420
w4 1.3104 1.3952
## TMLE implementation by hand
# Step 1 estimation and prediction of the model for the outcome (G-computation)
gm <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)
# Prediction for A, A = 1 and, A = 0
QAW_0 <- predict(gm, type = "response")
Q1W_0 <- predict(gm, newdata = data.frame(A = 1, ObsData[, c("w1","w2","w3","w4")]), type = "response")
Q0W_0 <- predict(gm, newdata = data.frame(A = 0, ObsData[, c("w1","w2","w3","w4")]), type = "response")
# Estimated mortality risk difference
glue("estimated ATE: {mean(Q1W_0 - Q0W_0)}")
estimated ATE: 0.203833059841068
The outcome model is deliberately misspecified: it omits the \(w_2 w_4\) interaction term present in the data-generating process.
Next, estimate the propensity score \(\Prob{A = 1 | \vee{W}}\) (using a logit model here)
\[ \text{logit}[\Prob{A = 1 | \vee{W}}] = \alpha_0 + \ve{\alpha}_1' \vee{W} \]
and then estimate treatment probabilities
\[ \wh{g}(1, \vee{W}) = \operatorname{expit}(\wh{\alpha}_0 + \wh{\ve{\alpha}}_1' \vee{W}) \]
# Step 2 estimation and prediction of the propensity score (ps)
psm <- glm(A ~ w1 + w2 + w3 + w4, family = binomial, data = ObsData)
gW <- predict(psm, type = "response")
summary(gW)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0022 0.0302 0.0866 0.1557 0.2403 0.7185
The ‘clever covariates’ are the treatment indicators weighted by the inverse of the estimated propensity score:
\[ H(1, \vee{W}) = \frac{A}{\wh{g}(1, \vee{W})} ; \; \; H(0, \vee{W}) = \frac{1-A}{\wh{g}(0, \vee{W})} \]
The fluctuation parameter \(\wh{\epsi} = (\wh{\epsi}_0, \wh{\epsi}_1)\) is estimated by maximum likelihood: regress the observed outcome \(Y\) on the clever covariates in an intercept-free logistic regression, using the logit of the initial prediction \(\Ol{Q}^0(A, \vee{W})\) as an offset.
\[ \mathbb{E}(Y = 1 \mid A, \vee{W})(\epsi) = \operatorname{expit}\Bigbr{\operatorname{logit}\Bigpar{\Ol{Q}^{0}(A, \vee{W})} + \epsi_{0} H(0, \vee{W}) + \epsi_{1} H(1, \vee{W})} \]
# Step 3 computation of H (clever covariates) and estimation of epsilon
H1W <- ObsData$A / gW
H0W <- (1 - ObsData$A) / (1 - gW)
epsilon <- coef(glm(ObsData$Y ~ -1 + H0W + H1W + offset(qlogis(QAW_0)), family = binomial))
glue("ε = {epsilon}")
ε = 0.00295279692951496
ε = 0.00269234947115084
Here, we update \(\Ol{Q}^0(A, \vee{W})\) to \(\Ol{Q}^1(A, \vee{W})\) using the expressions
\[ \Ol{Q}^{1}(0, \vee{W}) = \operatorname{expit}\Bigbr{\operatorname{logit}\Bigpar{\Ol{Q}^{0}(0, \vee{W})} + \wh{\epsi}_{0} / \wh{g}(0, \vee{W})} \quad \text{and} \quad \Ol{Q}^{1}(1, \vee{W}) = \operatorname{expit}\Bigbr{\operatorname{logit}\Bigpar{\Ol{Q}^{0}(1, \vee{W})} + \wh{\epsi}_{1} / \wh{g}(1, \vee{W})} \]
\[ \wh{\text{ATE}}_{\text{TMLE}} = \frac{1}{n} \sumiN \Bigbr{\Ol{Q}^1(1, \vee{W}_i) - \Ol{Q}^1(0, \vee{W}_i)} \]
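Implementing the update (Step 4), with epsilon[1] \(= \wh{\epsi}_0\) and epsilon[2] \(= \wh{\epsi}_1\) from the Step 3 fit, and recalling that \(\wh{g}(0, \vee{W}) = 1 -\) gW:
# Step 4 update the initial predictions using epsilon and the estimated PS
Q0W_1 <- plogis(qlogis(Q0W_0) + epsilon[1] / (1 - gW))  # Qbar^1(0, W)
Q1W_1 <- plogis(qlogis(Q1W_0) + epsilon[2] / gW)        # Qbar^1(1, W)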
# ATE
ATEtmle1 <- mean(Q1W_1 - Q0W_1); ATEtmle1
[1] 0.2206
ATEtmle1_bias: 0.02673
ATEtmle1_rel_bias: 13.78 %
TMLE constructs estimators based on the efficient influence curve (EIC). The IC of a consistent and asymptotically linear estimator is derived as the canonical gradient (the Gateaux derivative) of the target parameter, and characterizes the estimator's first-order behaviour:
\[ \wh{\tau}_\text{ATE} - \tau_\text{ATE} = \frac{1}{n} \sumiN IC_i + o_p\Bigpar{\frac{1}{\sqrt{n}}} \]
By the central limit theorem, in large samples the variance of the estimator is the variance of the IC divided by \(n\). While many influence functions and corresponding estimators exist for a given target parameter, there always exists an “efficient” IC that achieves the lower bound on asymptotic variance for the given set of modelling assumptions. TMLE and AIPTW are both constructed using the efficient IC, making these estimators asymptotically efficient for the statistical model when all necessary models are correctly specified. The efficient IC for the ATE is
\[\begin{align*} \mathbf{E I C}_{\mathbf{A T E}} & =\left(\frac{\mathrm{A}}{\mathrm{P}(\mathrm{A}=1 \mid \mathbf{W})}-\frac{1-\mathrm{A}}{\mathrm{P}(\mathrm{A}=0 \mid \mathbf{W})}\right)[\mathrm{Y}-\mathrm{E}(\mathrm{Y} \mid \mathrm{A}, \mathbf{W})]+ \\ & \mathrm{E}(\mathrm{Y} \mid \mathrm{A}=1, \mathbf{W})-\mathrm{E}(\mathrm{Y} \mid \mathrm{A}=0, \mathbf{W})-\mathrm{ATE} \end{align*}\]
which can be evaluated as
\[ \widehat{\mathrm{EIC}}_{\mathrm{ATE}}=\left(\frac{\mathrm{A}}{\widehat{\mathrm{g}}(1, \mathbf{W})}-\frac{1-\mathrm{A}}{\widehat{\mathrm{g}}(0, \mathbf{W})}\right)\left[\mathrm{Y}-\overline{\mathrm{Q}}^{1}(\mathrm{~A}, \mathbf{W})\right]+\overline{\mathrm{Q}}^{1}(1, \mathbf{W})-\overline{\mathrm{Q}}^{1}(0, \mathbf{W})-\widehat{\mathbf{A T E}}_{\mathrm{TMLE}} \]
and the SE is computed in the standard way
\[ \widehat{\sigma}_{\mathrm{ATE}, \mathrm{TMLE}}=\sqrt{\frac{\widehat{\operatorname{Var}\left(\mathrm{EIC}_{\mathrm{ATE}}\right)}}{\mathrm{n}}} \]
# Step 5 statistical inference (efficient influence curve)
# Efficient influence curve ATE
EY1tmle <- mean(Q1W_1)
EY0tmle <- mean(Q0W_1)
d1 <- (ObsData$A * (ObsData$Y - Q1W_1) / gW) + Q1W_1 - EY1tmle
d0 <- ((1 - ObsData$A) * (ObsData$Y - Q0W_1) / (1 - gW)) + Q0W_1 - EY0tmle
IC <- d1 - d0
n <- nrow(ObsData)
varHat.IC <- var(IC) / n
ATEtmle1CI <- c(ATEtmle1 - 1.96 * sqrt(varHat.IC), ATEtmle1 + 1.96 * sqrt(varHat.IC))
ATEtmle1
[1] 0.2206
ATEtmle1CI
[1] 0.1513 0.2900
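A quick sanity check (a hedged aside, using the objects computed above): because the logistic fluctuation solves the score equations in the clever covariates, the sample mean of the estimated IC is zero up to the GLM's convergence tolerance:
# mean of the estimated EIC; ~ 0 by construction of the TMLE update
mean(IC)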
# Augmented inverse probability treatment weighting (AIPTW) estimator
EY1aiptw <- mean((ObsData$A) * (ObsData$Y - Q1W_0) / gW + Q1W_0)
EY0aiptw <- mean((1 - ObsData$A) * (ObsData$Y - Q0W_0) / (1 - gW) + Q0W_0)
AIPTW_ATE <- EY1aiptw - EY0aiptw; AIPTW_ATE
[1] 0.2398
AIPTW_bias: 0.04589
AIPTW_rel_bias: 23.67 %
D1 <- (ObsData$A) * (ObsData$Y - Q1W_0) / gW + Q1W_0 - EY1aiptw
D0 <- (1 - ObsData$A) * (ObsData$Y - Q0W_0) / (1 - gW) + Q0W_0 - EY0aiptw
varHat_AIPTW <- var(D1 - D0) / n
# AIPTW ATE 95%CI
ATEaiptw_CI <- c(AIPTW_ATE - 1.96 * sqrt(varHat_AIPTW), AIPTW_ATE + 1.96 * sqrt(varHat_AIPTW)); AIPTW_ATE; ATEaiptw_CI
[1] 0.2398
[1] 0.1640 0.3156
# R-package tmle (base implementation includes SL.step, SL.glm and SL.glm.interaction)
library(tmle)
library(SuperLearner)
TMLE2 <- tmle(Y = ObsData$Y, A = ObsData$A, W = ObsData[,c("w1", "w2", "w3", "w4")], family = "binomial")
# Note that the tmle function by default bounds the probabilities in the clever covariate denominators at 0.025.
# You can remove this bound by specifying gbound = 0.
ATEtmle2 <- TMLE2$estimates$ATE$psi;ATEtmle2
[1] 0.2273
TMLE2$estimates$ATE$CI
[1] 0.1771 0.2775
MORtmle2 <- TMLE2$estimates$OR$psi;MORtmle2
[1] 3.125
TMLE2$estimates$OR$CI
[1] 2.236 4.368
ATEtmle2_bias: 0.03341
ATEtmle2_rel_bias: 17.23 %
SL.library <- c("SL.glm","SL.step","SL.step.interaction", "SL.glm.interaction","SL.gam",
"SL.randomForest", "SL.rpart")
TMLE3 <- tmle(Y = ObsData$Y,A = ObsData$A,W = ObsData [,c("w1", "w2", "w3", "w4")],
family = "binomial", Q.SL.library = SL.library,g.SL.library = SL.library)#, gbound=0)
ATEtmle3 <- TMLE3$estimates$ATE$psi;ATEtmle3
[1] 0.218
TMLE3$estimates$ATE$CI
[1] 0.1786 0.2574
MORtmle3 <- TMLE3$estimates$OR$psi;MORtmle3
[1] 2.939
TMLE3$estimates$OR$CI
[1] 2.287 3.777
ATEtmle3_bias: 0.02411
ATEtmle3_rel_bias: 12.44 %
libreq(drtmle)
wants loaded
[1,] "drtmle" TRUE
The drtmle examples below use a separate simulated data set (not shown here): a covariate data frame W with columns W1 and W2, a binary treatment A, and a binary outcome Y. We specify stratify = FALSE to indicate that we wish to fit a single OR to estimate \(\bar{Q}^0(A,W)\), as opposed to fitting two separate regressions to estimate \(\bar{Q}^0(1,W)\) and \(\bar{Q}^0(0,W)\).
glm_fit_uni <- drtmle(W = W, A = A, Y = Y, family = binomial(),
glm_g = "W1 + W2", glm_Q = "W1 + W2*A",
glm_gr = "Qn", glm_Qr = "gn", stratify = FALSE)
glm_fit_uni
$est
1 0.7111
0 0.5384
$cov
1 0
1 1.742e-03 -1.004e-05
0 -1.004e-05 4.082e-03
ci(glm_fit_uni, contrast = c(1,-1))
$drtmle
est cil ciu
E[Y(1)]-E[Y(0)] 0.173 0.023 0.323
The drtmle function implements either the estimators of van der Laan (2014) (if reduction = “bivariate”) or the improved estimators of Benkeser et al. (2017) (reduction = “univariate”).
sl_fit <- drtmle(W = W, A = A, Y = Y, family = binomial(),
SL_g = c("SL.glm","SL.mean"),
SL_Q = c("SL.glm","SL.mean"),
SL_gr = c("SL.glm", "SL.earth"),
SL_Qr = c("SL.glm", "SL.earth"),
stratify = FALSE)
sl_fit
$est
1 0.7109
0 0.5374
$cov
1 0
1 1.740e-03 4.837e-07
0 4.837e-07 4.135e-03
ci(sl_fit, contrast = c(1,-1))
$drtmle
est cil ciu
E[Y(1)]-E[Y(0)] 0.173 0.023 0.324
npreg_fit <- drtmle(W = W, A = A, Y = Y, family = binomial(),
SL_g = "SL.npreg", SL_Q = "SL.npreg",
SL_gr = "SL.npreg", SL_Qr = "SL.npreg",
stratify = TRUE)
npreg_fit
$est
1 0.7159
0 0.5343
$cov
1 0
1 1.675e-03 -1.024e-05
0 -1.024e-05 3.963e-03
ci(npreg_fit, contrast = c(1,-1))
$drtmle
est cil ciu
E[Y(1)]-E[Y(0)] 0.182 0.034 0.329