TMLE

Implementing TMLE in R based on Luque‐Fernandez et al (2018) and Benkeser et al (2017)

Apoorva Lal http://apoorvalal.github.io/ (Stanford)
2021-01-24

\(\DeclareMathOperator*{\argmin}{argmin}\) \(\newcommand{\var}{\mathrm{Var}}\) \(\newcommand{\epsi}{\varepsilon}\) \(\newcommand{\phii}{\varphi}\) \(\newcommand\Bigpar[1]{\left( #1 \right )}\) \(\newcommand\Bigbr[1]{\left[ #1 \right ]}\) \(\newcommand\Bigcr[1]{\left\{ #1 \right \}}\) \(\newcommand\SetB[1]{\left\{ #1 \right\}}\) \(\newcommand\Sett[1]{\mathcal{#1}}\) \(\newcommand{\Data}{\mathcal{D}}\) \(\newcommand{\Ubr}[2]{\underbrace{#1}_{\text{#2}}}\) \(\newcommand{\Obr}[2]{ \overbrace{#1}^{\text{#2}}}\) \(\newcommand{\sumiN}{\sum_{i=1}^N}\) \(\newcommand{\dydx}[2]{\frac{\partial #1}{\partial #2}}\) \(\newcommand\Indic[1]{\mathds{1}_{#1}}\) \(\newcommand{\Realm}[1]{\mathbb{R}^{#1}}\) \(\newcommand{\Exp}[1]{\mathbb{E}\left[#1\right]}\) \(\newcommand{\Expt}[2]{\mathbb{E}_{#1}\left[#2\right]}\) \(\newcommand{\Var}[1]{\mathbb{V}\left[#1\right]}\) \(\newcommand{\Covar}[1]{\text{Cov}\left[#1\right]}\) \(\newcommand{\Prob}[1]{\mathbf{Pr}\left(#1\right)}\) \(\newcommand{\Supp}[1]{\text{Supp}\left[#1\right]}\) \(\newcommand{\doyx}{\Prob{Y \, |\,\mathsf{do} (X = x)}}\) \(\newcommand{\doo}[1]{\Prob{Y |\,\mathsf{do} (#1) }}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\Z}{\mathbb{Z}}\) \(\newcommand{\wh}[1]{\widehat{#1}} % Wide hat\) \(\newcommand{\wt}[1]{\widetilde{#1}} % Wide tilde\) \(\newcommand{\wb}[1]{\overline{#1}} % Wide bar\) \(\newcommand\Ol[1]{\overline{#1}}\) \(\newcommand\Ul[1]{\underline{#1}}\) \(\newcommand\Str[1]{#1^{*}}\) \(\newcommand{\F}{\mathsf{F}}\) \(\newcommand{\ff}{\mathsf{f}}\) \(\newcommand{\Cdf}[1]{\mathbb{F}\left(#1\right)}\) \(\newcommand{\Cdff}[2]{\mathbb{F}_{#1}\left(#2\right)}\) \(\newcommand{\Pdf}[1]{\mathsf{f}\left(#1\right)}\) \(\newcommand{\Pdff}[2]{\mathsf{f}_{#1}\left(#2\right)}\) \(\newcommand{\dd}{\mathsf{d}}\) \(\newcommand\Normal[1]{\mathcal{N} \left( #1 \right )}\) \(\newcommand\Unif[1]{\mathsf{U} \left[ #1 \right ]}\) \(\newcommand\Bern[1]{\mathsf{Bernoulli} \left( #1 \right )}\) \(\newcommand\Binom[1]{\mathsf{Bin} \left( #1 
\right )}\) \(\newcommand\Pois[1]{\mathsf{Poi} \left( #1 \right )}\) \(\newcommand\BetaD[1]{\mathsf{Beta} \left( #1 \right )}\) \(\newcommand\Diri[1]{\mathsf{Dir} \left( #1 \right )}\) \(\newcommand\Gdist[1]{\mathsf{Gamma} \left( #1 \right )}\) \(\def\mbf#1{\mathbf{#1}}\) \(\def\mrm#1{\mathrm{#1}}\) \(\def\mbi#1{\boldsymbol{#1}}\) \(\def\ve#1{\mbi{#1}} % Vector notation\) \(\def\vee#1{\mathbf{#1}} % Vector notation\) \(\newcommand{\Mat}[1]{\mathbf{#1}}\) \(\newcommand{\eucN}[1]{\norm{#1}}\) \(\newcommand{\lzero}[1]{\norm{#1}_0}\) \(\newcommand{\lone}[1]{\norm{#1}_1}\) \(\newcommand{\ltwo}[1]{\norm{#1}_2}\) \(\newcommand{\pnorm}[1]{\norm{#1}_p}\)

Introduction

Observed data is \(\Sett{O} := \SetB{Y, A, \vee{W}}_{i=1}^N\), where \(A\) is a binary treatment (‘action’), \(Y\) is an outcome, and \(\vee{W}\) is a vector of covariates / potential confounders.

Parameters of interest are

\[ \psi_{0}(a)=E_{0}\{\bar{Q}_{0}(a, W)\}=\int \bar{Q}_{0}(a, w) \, d Q_{W, 0}(w), \quad \text{for } a \in \{0, 1\} \]

where \(\bar{Q}_{0}(a, w)=E_{0}(Y \mid A=a, W=w)\) is the so-called outcome regression, \(Q_{W, 0}(w)=\operatorname{Pr}_{0}(W \leq w)\) is the distribution function of the covariates, and \(\psi_{0}(a)\) is the marginal mean of \(Y\) under assignment to \(A = a\).

Estimators

G-computation estimator

The G-computation (GCOMP) estimator is the average of the outcome-regression predictions at each value of \(a\)

\[ \psi_{n, \text{GCOMP}}(a)=\frac{1}{n} \sum_{i=1}^{n} \bar{Q}_{n}\left(a, W_{i}\right) \]

The G-formula for the ATE is

\[ \tau_{ATE} = \sum_{\vee{w}} \Bigbr{\sum_y y \, \Bigpar{\Prob{Y = y | A = 1, \vee{W} = \vee{w}} - \Prob{Y = y | A = 0, \vee{W} = \vee{w}}} } \Prob{\vee{W} = \vee{w}} \]
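As a quick sanity check, the G-formula can be evaluated exactly on a toy example with a single binary confounder and known conditional probabilities (all numbers below are illustrative, not from the simulation used later in this post):

```r
# G-formula on a toy example with one binary confounder W
pW1   <- 0.4          # P(W = 1)
pY_A1 <- c(0.5, 0.8)  # P(Y = 1 | A = 1, W = 0) and P(Y = 1 | A = 1, W = 1)
pY_A0 <- c(0.3, 0.6)  # P(Y = 1 | A = 0, W = 0) and P(Y = 1 | A = 0, W = 1)
pW    <- c(1 - pW1, pW1)
# tau = sum_w [P(Y = 1 | A = 1, w) - P(Y = 1 | A = 0, w)] P(W = w)
tau <- sum((pY_A1 - pY_A0) * pW)
tau  # 0.2
```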

IPW estimator

\[ \begin{aligned} E_{0}\{\bar{Q}_{0}(a, W)\} &=E_{0}\left\{E_{0}(Y \mid A=a, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} E_{0}(Y \mid A=a, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} E_{0}(Y \mid A, W)\right\} \\ &=E_{0}\left\{\frac{I(A=a)}{\operatorname{Pr}_{0}(A=a \mid W)} Y\right\} \end{aligned} \] We use \(g_{0}(a \mid W):=\operatorname{Pr}_{0}(A=a \mid W)\) to denote the propensity score. Suppose we had available \(g_{n}\), an estimate of the PS. The final identity motivates the estimator

\[ \psi_{n, \text{IPTW}}(a):=\frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)} Y_{i} \]
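As a minimal sketch of this estimator, here is a toy DGP of my own (separate from the simulation below) with the true propensity score plugged in for \(g_n\), so the only error is sampling variability:

```r
# IPTW estimates of E[Y(1)] and E[Y(0)] with the true propensity score
set.seed(1)
n  <- 1e5
W  <- rbinom(n, 1, 0.5)
g1 <- plogis(-0.5 + W)                     # true P(A = 1 | W)
A  <- rbinom(n, 1, g1)
Y  <- rbinom(n, 1, plogis(-1 + A + W))
psi1_iptw <- mean(A * Y / g1)              # estimate of E[Y(1)]
psi0_iptw <- mean((1 - A) * Y / (1 - g1))  # estimate of E[Y(0)]
psi1_iptw - psi0_iptw                      # true ATE in this DGP is about 0.231
```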

AIPW / DR estimators

\[ \psi_{n, \text{AIPTW}}(a)=\psi_{n, \text{IPTW}}(a)+\frac{1}{n} \sum_{i=1}^{n}\left\{1-\frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)}\right\} \bar{Q}_{n}\left(a, W_{i}\right) \]

which adds an augmentation term to the IPTW estimator. Equivalently, the estimator can be viewed as adding an augmentation to the GCOMP estimator

\[ \psi_{n, \text{AIPTW}}(a)=\psi_{n, \text{GCOMP}}(a)+\frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}\left(a \mid W_{i}\right)}\left\{Y_{i}-\bar{Q}_{n}\left(a, W_{i}\right)\right\} \]
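Written as a small helper function (my own sketch, transcribing the display above directly), with a deterministic toy check:

```r
# AIPTW: GCOMP term plus an inverse-probability-weighted residual correction
aiptw <- function(y, a_obs, a, g_n, Q_n) {
  # g_n: estimated P(A = a | W) per unit; Q_n: estimated E[Y | A = a, W]
  mean(Q_n) + mean((a_obs == a) / g_n * (y - Q_n))
}

# sanity check: with Q_n identically zero it reduces to the IPTW estimator
y <- c(1, 0, 1, 1); trt <- c(1, 0, 1, 0); g <- rep(0.5, 4)
aiptw(y, trt, 1, g, Q_n = rep(0, 4))  # equals mean(trt * y / g) = 1
```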

van der Laan (2014)

van der Laan (2014) proposed taking the residual regressions

\[ \begin{array}{l} \bar{Q}_{r, 0 n}(a, w):=E_{0}\left\{Y-\bar{Q}_{n}(W) \mid A=a, g_{n}(W)=g_{n}(w)\right\}, \text { and } \\ g_{r, 0 n}(a \mid w):=\operatorname{Pr}_{0}\left\{A=a \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w), g_{n}(W)=g_{n}(w)\right\} \end{array} \]

In words, the R-OR is the regression of the residual from the outcome regression onto the estimated PS amongst observations with A=a, while the R-PS is the propensity for treatment A=a given the estimated OR and PS.

If estimates of the OR \(\bar{Q}_{n}^{*}\), PS \(g_{n}^{*}\), R-OR \(\bar{Q}_{r, n}^{*}\), and R-PS \(g_{r, n}^{*}\) satisfy the moment conditions

\[ \begin{array}{l} \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \\ \frac{1}{n} \sum_{i=1}^{n} \frac{\bar{Q}_{r, n}^{*}\left(a, W_{i}\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\left\{I\left(A_{i}=a\right)-g_{n}^{*}\left(a \mid W_{i}\right)\right\}=0, \text { and } \\ \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{r, n}^{*}\left(a \mid W_{i}\right)}\left\{\frac{g_{r, n}^{*}\left(a \mid W_{i}\right)-g_{n}^{*}\left(a \mid W_{i}\right)}{g_{n}^{*}\left(a \mid W_{i}\right)}\right\}\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \end{array} \]

then the GCOMP estimator based on \(\Ol{Q}^*_n\) would be doubly robust not only with respect to consistency, but also with respect to its limiting distribution. van der Laan (2014) additionally proposed a TMLE algorithm that maps initial estimates of the OR, PS, R-OR, and R-PS into estimates that satisfy these key equations.
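To make the R-OR and R-PS concrete, here is a rough sketch of how one might estimate them with off-the-shelf smoothers on toy data. In practice the inputs \(\bar{Q}_n, g_n\) come from the main analysis fits, and drtmle handles all of this internally; everything below is illustrative:

```r
# Rough sketch of the reduced-dimension regressions
set.seed(2)
n  <- 2000
W  <- runif(n)
A  <- rbinom(n, 1, plogis(W))
Y  <- rbinom(n, 1, plogis(W + A))
Qn <- predict(glm(Y ~ W + A, family = binomial), type = "response")  # initial OR fit
gn <- predict(glm(A ~ W, family = binomial), type = "response")      # initial PS fit
# R-OR: regress the OR residual on the estimated PS among units with A = 1
res1 <- (Y - Qn)[A == 1]
gn1  <- gn[A == 1]
r_or <- loess(res1 ~ gn1)
# R-PS: regress the treatment indicator on the estimated OR and PS
r_ps <- glm(A ~ Qn + gn, family = binomial)
```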

Benkeser et al. (2017) demonstrated alternative equations that can be satisfied to achieve this form of double-robustness. In particular, they formulated an alternative version of the R-PS,

\[ g_{r, 0,1}(a \mid w):=\operatorname{Pr}_{0}\left\{A=a \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w)\right\} \]

which we refer to as R-PS1. They also introduced an additional regression of a weighted residual from the PS on the OR,

\[ g_{r, 0,2}(a \mid w):=E_{0}\left\{\frac{I(A=a)-g_{n}(a \mid W)}{g_{n}(a \mid W)} \mid \bar{Q}_{n}(W)=\bar{Q}_{n}(w)\right\} \]

which we refer to as R-PS2. The key feature of R-PS1 and R-PS2 is that they are both univariate regressions, while R-PS is a bivariate regression. Benkeser and colleagues suggested that this may make R-PS1 and R-PS2 easier to estimate consistently than R-PS. Further, they showed that if estimators of the OR, PS, and R-OR satisfy the first two moment conditions above, and if additionally estimators of the OR, R-PS1 \(g_{r, n, 1}^{*}\), and R-PS2 \(g_{r, n, 2}^{*}\) satisfy

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{I\left(A_{i}=a\right)}{g_{r, n, 1}^{*}\left(a \mid W_{i}\right)} g_{r, n, 2}^{*}\left(a \mid W_{i}\right)\left\{Y_{i}-\bar{Q}_{n}^{*}\left(a, W_{i}\right)\right\}=0 \]

then the GCOMP estimator based on this \(\bar{Q}_{n}^{*}\) also enjoys a doubly-robust limiting distribution. Benkeser et al. (2017) provided a TMLE algorithm that produced such estimates and provided closed-form, doubly-robust variance estimates.

Simulation

generateData <- function(n){
  w1 <- rbinom(n, size=1, prob=0.5)
  w2 <- rbinom(n, size=1, prob=0.65)
  w3 <- round(runif(n, min=0, max=4), digits=0)
  w4 <- round(runif(n, min=0, max=5), digits=0)
  A <- rbinom(n, size=1, prob= plogis(-5 +  0.05*w2 + 0.25*w3 + 0.6*w4 + 0.4*w2*w4))
  # counterfactual
  Y.1 <- rbinom(n, size=1, prob= plogis(-1 + 1 -0.1*w1 + 0.35*w2 + 0.25*w3 + 0.20*w4 + 0.15*w2*w4))
  Y.0 <- rbinom(n, size=1, prob= plogis(-1 + 0 -0.1*w1 + 0.35*w2 + 0.25*w3 + 0.20*w4 + 0.15*w2*w4))
  # Observed outcome
  Y <- Y.1*A + Y.0*(1 - A)
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}


# True ATE in the population
set.seed(7777)
ObsDataTrueATE <- generateData(n = 50000)

True_EY.1 <- mean(ObsDataTrueATE$Y.1)
True_EY.0 <- mean(ObsDataTrueATE$Y.0)
True_ATE  <- True_EY.1-True_EY.0 ;True_ATE
[1] 0.1939
True_MOR  <- (True_EY.1*(1-True_EY.0))/((1-True_EY.1)*True_EY.0);True_MOR
[1] 2.527
cat("\n True_ATE:", abs(True_ATE))

 True_ATE: 0.1939

Draw a smaller sample for the analysis

# Data for analysis
set.seed(7722)
ObsData <- generateData(n = 10000)
write.csv(ObsData, "ObsData.csv")

TMLE Algorithm by hand - 6(!!) steps

1. Outcome regression

\[\Ol{Q}^0(A, \vee{W}) = \Exp{Y | A, \vee{W}}\]

# Naive approach: conditional odds ratio
naive <- glm(data = ObsData, Y ~ A + w1 + w2 + w3 + w4, family = binomial)
summary(naive)

Call:
glm(formula = Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-2.428  -1.136   0.589   0.989   1.781  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.2187     0.0688  -17.71   <2e-16 ***
A             1.0661     0.0847   12.59   <2e-16 ***
w1           -0.1378     0.0439   -3.14   0.0017 ** 
w2            0.6446     0.0458   14.08   <2e-16 ***
w3            0.2581     0.0183   14.08   <2e-16 ***
w4            0.3016     0.0160   18.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13322  on 9999  degrees of freedom
Residual deviance: 12012  on 9994  degrees of freedom
AIC: 12024

Number of Fisher Scoring iterations: 4
exp(naive$coef[2])
    A 
2.904 
exp(confint(naive))
             2.5 % 97.5 %
(Intercept) 0.2582 0.3381
A           2.4650 3.4364
w1          0.7994 0.9495
w2          1.7419 2.0843
w3          1.2489 1.3420
w4          1.3104 1.3952
## TMLE implementation by hand
# Step 1 estimation and prediction of the model for the outcome (G-computation)
gm  <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = ObsData)
# Predictions at the observed A, at A = 1, and at A = 0
QAW_0 <- predict(gm, type = "response")
Q1W_0 <- predict(gm, newdata = data.frame(A = 1, ObsData[, c("w1","w2","w3","w4")]), type = "response")
Q0W_0 <- predict(gm, newdata = data.frame(A = 0, ObsData[, c("w1","w2","w3","w4")]), type = "response")
# Estimated mortality risk difference
glue("estimated ATE: {mean(Q1W_0 - Q0W_0)}")
estimated ATE: 0.203833059841068

This is misspecified on purpose (because it omits the interaction terms in the DGP).

2. Prediction of the propensity score \(\hat{g}(A, \vee{W})\)

Estimate \(\Prob{A = 1 | \vee{W}}\) (using logit here)

\[ \text{logit}[\Prob{A = 1 | \vee{W}}] = \alpha_0 + \ve{\alpha}_1' \vee{W} \]

and then estimate treatment probabilities

\[ \wh{g}(1, \vee{W}) = \operatorname{expit}(\wh{\alpha}_0 + \wh{\ve{\alpha}}_1' \vee{W}) \]

# Step 2 estimation and prediction of the propensity score (ps)
psm <- glm(A ~ w1 + w2 + w3 + w4, family = binomial, data = ObsData)
gW <- predict(psm, type = "response")
summary(gW)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0022  0.0302  0.0866  0.1557  0.2403  0.7185 

3. The ‘clever covariate’ and fluctuation parameter \(\varepsilon\)

The ‘clever covariates’ are the inverse-propensity-weighted treatment indicators

\[ H(1, \vee{W}) = \frac{A}{\wh{g}(1, W)} ; \; \; H(0, \vee{W}) = \frac{1-A}{\wh{g}(0, \vee{W})} \]

The fluctuation parameter \(\wh{\epsi} = (\wh{\epsi}_0, \wh{\epsi}_1)\) is estimated by maximum likelihood: regress the observed outcome \(Y\) on the clever covariates in an intercept-free logistic regression, with the logit of the initial prediction \(\Ol{Q}^0(A, \vee{W})\) as an offset.

\[ \Prob{Y = 1 \mid A, \vee{W}}(\varepsilon)=\frac{1}{1+\exp \Bigpar{-\operatorname{logit} \Ol{Q}^{0}(A, \vee{W})-\varepsilon_{0} H(0, \vee{W})-\varepsilon_{1} H(1, \vee{W})}} \]

# Step 3 computation of H (clever covariates) and estimation of epsilon
H1W = ObsData$A / gW
H0W = (1-ObsData$A)  / (1 - gW)
epsilon <- coef(glm(ObsData$Y ~ -1 + H0W + H1W + offset(qlogis(QAW_0)), family = binomial))
glue("ε = {epsilon}")
ε = 0.00295279692951496
ε = 0.00269234947115084

4. Update of predicted outcomes

Here, we update \(\bar{Q}^0(A, \vee{W})\) to \(\bar{Q}^1(A, \vee{W})\)

using the expressions

\[ \Ol{Q}^{1}(0, \vee{W})=\operatorname{expit}\Bigbr{\operatorname{logit}\Bigpar{\Ol{Q}^{0}(0, \vee{W})}+\widehat{\varepsilon}_{0} / \widehat{g}(0, \vee{W})} \text { and } \Ol{Q}^{1}(1, \vee{W})=\operatorname{expit}\Bigbr{\operatorname{logit}\Bigpar{\Ol{Q}^{0}(1, \vee{W})}+\widehat{\varepsilon}_{1} / \widehat{g}(1, \vee{W})} \]

# Step 4 update of the initial outcome predictions using epsilon
Q0W_1 <-  plogis(qlogis(Q0W_0) +  epsilon[1] / (1 - gW))
Q1W_1 <-  plogis(qlogis(Q1W_0) +  epsilon[2] / gW)

5. Targeted estimate of the ATE

\[ \wh{\text{ATE}}_{TMLE} = \frac{1}{n} \sumiN \Bigbr{\Ol{Q}^1(1, \vee{W}_i) - \Ol{Q}^1(0, \vee{W}_i)} \]

# ATE
ATEtmle1 <-  mean(Q1W_1 - Q0W_1); ATEtmle1
[1] 0.2206
cat("\n ATEtmle1_bias:", abs(True_ATE - ATEtmle1))

 ATEtmle1_bias: 0.02673
cat("\n ATEtmle1_rel_bias:", 100 * abs(True_ATE - ATEtmle1) / True_ATE, "%")

 ATEtmle1_rel_bias: 13.78 %
# Table to visualize the data
psi <- Q1W_1 - Q0W_1
library(DT)
df <- round(cbind(Q1W_0, Q0W_0, gW, eps1=epsilon[1], eps2=epsilon[2], psi), digits = 4)
datatable(head(df, n = 10), options = list(pageLength = 5, digits = 3))

6. Variance Estimation

TMLE constructs estimators based on the efficient influence curve (EIC). The IC of a consistent and asymptotically linear estimator is obtained as the gradient (Gateaux derivative) of the target parameter, and satisfies

\[ \wh{\tau}_\text{ATE} - \tau_\text{ATE} = \frac{1}{n} \sumiN IC_i + o_p\Bigpar{\frac{1}{\sqrt{n}}} \]

The central limit theorem then applies, so in large samples the variance of the estimator is the variance of the IC divided by \(n\). While many influence functions and corresponding estimators exist for a given target parameter, there always exists an “efficient” IC that achieves the lower bound on asymptotic variance for the given set of modelling assumptions. TMLE and AIPTW are both constructed using the efficient IC, making these estimators asymptotically efficient for the statistical model when all necessary models are correctly specified. The efficient IC for the ATE is

\[\begin{align*} \text{EIC}_{\text{ATE}} &= \Bigpar{\frac{A}{\Prob{A=1 \mid \vee{W}}}-\frac{1-A}{\Prob{A=0 \mid \vee{W}}}}\Bigbr{Y-\Exp{Y \mid A, \vee{W}}} \\ &\quad + \Exp{Y \mid A=1, \vee{W}}-\Exp{Y \mid A=0, \vee{W}}-\text{ATE} \end{align*}\]

which can be evaluated as

\[ \widehat{\text{EIC}}_{\text{ATE}}=\Bigpar{\frac{A}{\widehat{g}(1, \vee{W})}-\frac{1-A}{\widehat{g}(0, \vee{W})}}\Bigbr{Y-\Ol{Q}^{1}(A, \vee{W})}+\Ol{Q}^{1}(1, \vee{W})-\Ol{Q}^{1}(0, \vee{W})-\wh{\text{ATE}}_{\text{TMLE}} \]

and the SE is computed in the standard way

\[ \widehat{\sigma}_{\text{ATE}, \text{TMLE}}=\sqrt{\frac{\widehat{\operatorname{Var}}\left(\widehat{\text{EIC}}_{\text{ATE}}\right)}{n}} \]

# Step 5 statistical inference (efficient influence curve)
# Efficient influence curve ATE
EY1tmle<-mean(Q1W_1)
EY0tmle<-mean(Q0W_1)

d1 <- ((ObsData$A) * (ObsData$Y - Q1W_1)/gW) + Q1W_1 - EY1tmle
d0 <- ((1 - ObsData$A) * (ObsData$Y - Q0W_1))/(1 - gW)  + Q0W_1 - EY0tmle

IC <- d1 - d0
n <- nrow(ObsData)
varHat.IC <- var(IC) / n
ATEtmle1CI <- c(ATEtmle1 - 1.96 * sqrt(varHat.IC), ATEtmle1 + 1.96 * sqrt(varHat.IC))
ATEtmle1
[1] 0.2206
ATEtmle1CI
[1] 0.1513 0.2900

AIPW - Double Robust estimator

# Augmented inverse probability treatment weighting (AIPTW) estimator
EY1aiptw <- mean((ObsData$A) * (ObsData$Y - Q1W_0) / gW + Q1W_0)
EY0aiptw <- mean((1 - ObsData$A) * (ObsData$Y - Q0W_0) / (1 - gW) + Q0W_0)

AIPTW_ATE <- EY1aiptw - EY0aiptw; AIPTW_ATE
[1] 0.2398
cat("\n AIPTW_bias:", abs(True_ATE - AIPTW_ATE))

 AIPTW_bias: 0.04589
cat("\n AIPTW_rel_bias:", 100 * abs(True_ATE - AIPTW_ATE) / True_ATE, "%")

 AIPTW_rel_bias: 23.67 %
D1 <- (ObsData$A) * (ObsData$Y - Q1W_0) / gW + Q1W_0 - EY1aiptw
D0 <- (1 - ObsData$A) * (ObsData$Y - Q0W_0) / (1 - gW) + Q0W_0 - EY0aiptw
varHat_AIPTW <- var(D1 - D0) / n

# AIPTW ATE 95%CI
ATEaiptw_CI <- c(AIPTW_ATE - 1.96 * sqrt(varHat_AIPTW), AIPTW_ATE + 1.96 * sqrt(varHat_AIPTW)); AIPTW_ATE; ATEaiptw_CI
[1] 0.2398
[1] 0.1640 0.3156

TMLE Package

# R-package tmle (base implementation includes SL.step, SL.glm and SL.glm.interaction)
library(tmle)
library(SuperLearner)
TMLE2 <- tmle(Y = ObsData$Y, A = ObsData$A, W = ObsData[,c("w1", "w2", "w3", "w4")], family = "binomial")

# Note that the tmle function by default bounds the probabilities in the clever
# covariate denominators away from zero at 0.025; remove this bound by specifying gbound = 0

ATEtmle2 <- TMLE2$estimates$ATE$psi;ATEtmle2
[1] 0.2273
TMLE2$estimates$ATE$CI
[1] 0.1771 0.2775
MORtmle2 <- TMLE2$estimates$OR$psi;MORtmle2
[1] 3.125
TMLE2$estimates$OR$CI
[1] 2.236 4.368
cat("\n ATEtmle2_bias:", abs(True_ATE - ATEtmle2))

 ATEtmle2_bias: 0.03341
cat("\n ATEtmle2_Rel_bias:", 100 * abs(True_ATE - ATEtmle2) / True_ATE, "%")

 ATEtmle2_Rel_bias: 17.23 %

TMLE with Superlearner

SL.library <- c("SL.glm","SL.step","SL.step.interaction", "SL.glm.interaction","SL.gam",
                "SL.randomForest", "SL.rpart")

TMLE3 <- tmle(Y = ObsData$Y, A = ObsData$A, W = ObsData[, c("w1", "w2", "w3", "w4")],
              family = "binomial", Q.SL.library = SL.library, g.SL.library = SL.library) #, gbound = 0

ATEtmle3 <- TMLE3$estimates$ATE$psi;ATEtmle3
[1] 0.218
TMLE3$estimates$ATE$CI
[1] 0.1786 0.2574
MORtmle3 <- TMLE3$estimates$OR$psi;MORtmle3
[1] 2.939
TMLE3$estimates$OR$CI
[1] 2.287 3.777
cat("\n ATEtmle3_bias:", abs(True_ATE - ATEtmle3))

 ATEtmle3_bias: 0.02411
cat("\n ATEtmle3_rel_bias:", 100 * abs(True_ATE - ATEtmle3) / True_ATE, "%")

 ATEtmle3_rel_bias: 12.44 %

DRTMLE

libreq(drtmle) # libreq is a package-loading helper; library(drtmle) works as well
     wants    loaded
[1,] "drtmle" TRUE  
set.seed(1234)
n <- 200
W <- data.frame(W1 = runif(n), W2 = rbinom(n, 1, 0.5))
A <- rbinom(n, 1, plogis(W$W1 + W$W2))
Y <- rbinom(n, 1, plogis(W$W1 + W$W2*A))

We specify stratify = FALSE to indicate that we wish to fit a single OR to estimate \(Q^0(A,W)\), as opposed to fitting two separate regressions to estimate \(Q^0(1,W)\) and \(Q^0(0,W)\).

glm_fit_uni <- drtmle(W = W, A = A, Y = Y, family = binomial(),
                      glm_g = "W1 + W2", glm_Q = "W1 + W2*A",
                      glm_gr = "Qn", glm_Qr = "gn", stratify = FALSE)
glm_fit_uni
$est
        
1 0.7111
0 0.5384

$cov
           1          0
1  1.742e-03 -1.004e-05
0 -1.004e-05  4.082e-03
ci(glm_fit_uni, contrast = c(1,-1))
$drtmle
                  est   cil   ciu
E[Y(1)]-E[Y(0)] 0.173 0.023 0.323

The drtmle function implements either the estimators of van der Laan (2014) (if reduction = "bivariate") or the improved estimators of Benkeser et al. (2017) (reduction = "univariate").

Using learners

sl_fit <- drtmle(W = W, A = A, Y = Y, family = binomial(),
                 SL_g = c("SL.glm","SL.mean"),
                 SL_Q = c("SL.glm","SL.mean"),
                 SL_gr = c("SL.glm", "SL.earth"),
                 SL_Qr = c("SL.glm", "SL.earth"),
                 stratify = FALSE)
sl_fit
$est
        
1 0.7109
0 0.5374

$cov
          1         0
1 1.740e-03 4.837e-07
0 4.837e-07 4.135e-03
ci(sl_fit, contrast = c(1,-1))
$drtmle
                  est   cil   ciu
E[Y(1)]-E[Y(0)] 0.173 0.023 0.324
npreg_fit <- drtmle(W = W, A = A, Y = Y, family = binomial(),
                    SL_g = "SL.npreg", SL_Q = "SL.npreg",
                    SL_gr = "SL.npreg", SL_Qr = "SL.npreg",
                    stratify = TRUE)
npreg_fit
$est
        
1 0.7159
0 0.5343

$cov
           1          0
1  1.675e-03 -1.024e-05
0 -1.024e-05  3.963e-03
ci(npreg_fit, contrast = c(1,-1))
$drtmle
                  est   cil   ciu
E[Y(1)]-E[Y(0)] 0.182 0.034 0.329

References

Benkeser, D., Carone, M., van der Laan, M. J., and Gilbert, P. B. (2017). Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104(4), 863–880.

Luque-Fernandez, M. A., Schomaker, M., Rachet, B., and Schnitzer, M. E. (2018). Targeted maximum likelihood estimation for a binary treatment: A tutorial. Statistics in Medicine, 37(16), 2530–2546.

van der Laan, M. J. (2014). Targeted estimation of nuisance parameters to obtain valid statistical inference. The International Journal of Biostatistics, 10(1), 29–57.