
Average Derivative

Oaxaca-Blinder, generalized IPW, and doubly robust average-derivative estimators

AverageDerivative targets the average derivative of the outcome with respect to a scalar continuous treatment \(D\).

The working outcome model is

\[ \mathbb{E}[Y \mid D, W] = \alpha + \gamma^\top (W - \mu_W) + D \left[\beta + \delta^\top (W - \mu_W)\right]. \]

Under this parameterization, the conditional derivative is

\[ \frac{\partial}{\partial D}\mathbb{E}[Y \mid D, W] = \beta + \delta^\top (W - \mu_W), \]

so the average derivative is simply \(\beta\) because \(\mathbb{E}[W - \mu_W] = 0\).
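As a quick sanity check of this algebra (a standalone NumPy sketch, independent of the crabbymetrics API), averaging the conditional derivative over covariates centered at their sample mean recovers \(\beta\) exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
w = rng.normal(size=(n, 3))
mu_w = w.mean(axis=0)
beta = 1.25
delta = np.array([0.2, -0.15, 0.1])  # illustrative values, not library defaults

# Conditional derivative beta + delta'(W - mu_W): its sample average is beta,
# because the centered covariates have mean zero by construction.
cond_deriv = beta + (w - mu_w) @ delta
print(round(cond_deriv.mean(), 6))  # prints 1.25
```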

The Three Estimators

Oaxaca-Blinder

method="ob" fits the interacted linear regression directly and reads off the coefficient on \(D\).
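A minimal sketch of that regression in plain NumPy (simulated data and a hand-built design matrix; the actual crabbymetrics implementation may organize the columns differently):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=(n, 3))
wc = w - w.mean(axis=0)
d = 0.2 + wc @ np.array([0.6, -0.4, 0.3]) + rng.normal(scale=0.7, size=n)
y = (
    1.0
    + wc @ np.array([0.3, -0.2, 0.4])
    + d * (1.25 + wc @ np.array([0.2, -0.15, 0.1]))  # true beta = 1.25
    + rng.normal(scale=0.5, size=n)
)

# Interacted design [1, W - mu_W, D, D * (W - mu_W)]; the coefficient on the
# D column is the average derivative under the working model.
X = np.column_stack([np.ones(n), wc, d, d[:, None] * wc])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_ob = coef[4]  # position of the D column
print(f"{beta_ob:.3f}")  # close to 1.25
```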

Generalized IPW

method="ipw" uses the normal working model

\[ e(W) = \mathbb{E}[D \mid W], \qquad v(W) = \operatorname{Var}(D \mid W) = \sigma^2, \]

and solves

\[ \mathbb{E}\left[ \frac{D_i - e(W_i)}{\sigma^2}\left(Y_i - \beta D_i\right) \right] = 0. \]
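Because the system is exactly identified, \(\sigma^2\) scales the whole moment and cancels, so the sample analogue can be solved for \(\beta\) in closed form. A standalone NumPy sketch with a homogeneous treatment effect (fitting \(e(W)\) by linear regression is an assumption of this example, not necessarily what the library does internally):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
w = rng.normal(size=(n, 3))
d = 0.2 + w @ np.array([0.6, -0.4, 0.3]) + rng.normal(scale=0.7, size=n)
y = 1.0 + w @ np.array([0.3, -0.2, 0.4]) + 1.25 * d + rng.normal(scale=0.5, size=n)

# e(W) from a linear regression of D on [1, W]
Xw = np.column_stack([np.ones(n), w])
e_hat = Xw @ np.linalg.lstsq(Xw, d, rcond=None)[0]

# The sample moment sum_i (d_i - e_i)(y_i - beta d_i) = 0 has a closed-form
# solution once sigma^2 cancels.
resid = d - e_hat
beta_ipw = (resid @ y) / (resid @ d)
print(f"{beta_ipw:.3f}")  # close to 1.25
```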

Doubly Robust

method="dr" combines the two. It keeps the outcome regression basis from the Oaxaca-Blinder estimator and replaces the last instrument with the generalized IPW score:

\[ Z_i = \begin{bmatrix} 1 \\ W_i - \mu_W \\ (W_i - \mu_W)D_i \\ \frac{D_i - e(W_i)}{\sigma^2} \end{bmatrix}. \]

The implementation treats all three estimators as exactly identified stacked-moment systems, so their reported standard errors come from a common sandwich calculation rather than ad hoc post-processing.
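A plain-NumPy sketch of the exactly identified IV solve and the sandwich variance for the DR instruments. This version plugs in \(\hat e(W)\) and \(\hat\sigma^2\) and, unlike the fully stacked system the library reportedly uses, ignores their first-stage moment contributions for brevity:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3000
w = rng.normal(size=(n, 3))
wc = w - w.mean(axis=0)
d = 0.2 + wc @ np.array([0.6, -0.4, 0.3]) + rng.normal(scale=0.7, size=n)
y = (
    1.0
    + wc @ np.array([0.3, -0.2, 0.4])
    + d * (1.25 + wc @ np.array([0.2, -0.15, 0.1]))  # true beta = 1.25
    + rng.normal(scale=0.5, size=n)
)

# Regressors: the Oaxaca-Blinder basis, with D last
X = np.column_stack([np.ones(n), wc, wc * d[:, None], d])

# Generalized propensity pieces: linear e(W), homoskedastic sigma^2
Xw = np.column_stack([np.ones(n), wc])
e_hat = Xw @ np.linalg.lstsq(Xw, d, rcond=None)[0]
sigma2 = np.mean((d - e_hat) ** 2)

# Instruments: same basis, last column swapped for the IPW score
Z = np.column_stack([np.ones(n), wc, wc * d[:, None], (d - e_hat) / sigma2])

# Exactly identified: solve (Z'X/n) theta = Z'y/n
A = Z.T @ X / n
theta = np.linalg.solve(A, Z.T @ y / n)
beta_dr = theta[-1]

# Sandwich variance A^{-1} B A^{-T} / n from scores g_i = Z_i (y_i - X_i'theta)
u = y - X @ theta
S = Z * u[:, None]
B = S.T @ S / n
V = np.linalg.solve(A, np.linalg.solve(A, B).T).T / n
se_dr = np.sqrt(V[-1, -1])
print(f"beta_dr={beta_dr:.3f}, se={se_dr:.4f}")
```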

The example below runs all three estimators on one simulated dataset and compares them with the truth:
from html import escape

import numpy as np
from IPython.display import HTML, display

import crabbymetrics as cm


def html_table(headers, rows):
    parts = [
        "<table>",
        "<thead>",
        "<tr>",
        *[f"<th>{escape(str(header))}</th>" for header in headers],
        "</tr>",
        "</thead>",
        "<tbody>",
    ]
    for row in rows:
        parts.append("<tr>")
        for cell in row:
            parts.append(f"<td>{escape(str(cell))}</td>")
        parts.append("</tr>")
    parts.extend(["</tbody>", "</table>"])
    return "".join(parts)


rng = np.random.default_rng(2027)
n = 1400
w = rng.normal(size=(n, 3))
d = 0.2 + w @ np.array([0.6, -0.4, 0.3]) + rng.normal(scale=0.7, size=n)
wc = w - w.mean(axis=0)
y = (
    1.0
    + wc @ np.array([0.3, -0.2, 0.4])
    + d * (1.25 + wc @ np.array([0.2, -0.15, 0.1]))
    + rng.normal(scale=0.5, size=n)
)

results = []
for method in ["ob", "ipw", "dr"]:
    model = cm.AverageDerivative(method=method)
    model.fit(y, d, w)
    summary = model.summary()
    results.append([method.upper(), f"{summary['coef']: .4f}", f"{summary['se']: .4f}"])

results.insert(0, ["Truth", f"{1.25: .4f}", "--"])
display(HTML(html_table(["Estimator", "Estimate", "SE"], results)))
Estimator   Estimate   SE
Truth        1.2500    --
OB           1.2779    0.0212
IPW          1.3136    0.0276
DR           1.2779    0.0212

Interpretation

  • ob is the most model-driven: it assumes the interacted linear regression is the right local approximation to the outcome.
  • ipw leans on the generalized propensity-score side instead: it only requires the model for \(D \mid W\) to be well specified.
  • dr is the cleanest compromise in this working-model family. It uses both pieces, stays consistent if either the outcome model or the propensity model is correct, and is locally efficient under the joint specification used here.

The current implementation covers only the scalar continuous-treatment case. That is the right v1 scope: it keeps the average-derivative target, the normal working model, and the moment system aligned.