links
self-promotion
- Winston Chou and I just arXived Does Residuals-on-Residuals Regression Produce Representative Estimates of Causal Effects?, which studies what residuals-on-residuals regression (RoRR) estimates in settings with discrete or continuous-valued treatments. We find that the familiar conditional-variance-weighted average interpretation is no longer benign in this setting: the plim of the coefficient is a conditional-variance-weighted average of causal derivatives evaluated at points not in the observed dataset, which generally differs from the Average Causal Derivative (ACD). This is yet another example where the identifying variation in OLS (even the ML-powered partially-linear variety) is opaque and data-dependent, which goes against the maxim of separating identification from estimation / not letting your estimator pick your estimand. This leads us to caution against using RoRR reflexively in settings with unknown and opaque treatment distributions; we find that a coarsened AIPW estimator is more interpretable in such settings [a toy simulation of the RoRR-ACD gap follows this list].
- Alex Fisher and I are giving an introductory talk/invitation to the python open-source econometrics ecosystem (primarily revolving around pyfixest) at the NABE webinar on Thursday [a minimal pyfixest example follows this list].
- LLMs are getting pretty good at sketching out toy economic models: I documented my attempts at formalizing Paul David’s classic QWERTY paper with two models.
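The RoRR point above is easiest to see numerically, so here is a toy simulation (my own sketch with an invented DGP and oracle nuisances, not the paper's estimator): with a continuous treatment and a cubic dose-response, the residuals-on-residuals slope lands far from the ACD.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
x = rng.normal(size=n)
t = x + rng.normal(scale=np.exp(x / 2), size=n)  # E[T|X]=x, Var(T|X)=exp(x)
y = t**3 + 2 * x + rng.normal(size=n)            # causal derivative is 3*t^2

# Partial out X using the *true* conditional means, so any gap below is
# about the estimand itself rather than nuisance-estimation error.
t_res = t - x                                    # T - E[T|X]
y_res = y - (x**3 + 3 * x * np.exp(x) + 2 * x)   # Y - E[Y|X], using E[T^3|X]

rorr = (t_res @ y_res) / (t_res @ t_res)         # residuals-on-residuals slope
acd = np.mean(3 * t**2)                          # Average Causal Derivative
print(f"RoRR: {rorr:.2f}  vs  ACD: {acd:.2f}")   # roughly 19 vs 8 in population
```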
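For the webinar plug, a minimal taste of the pyfixest API (a sketch using the package’s built-in example dataset; the formula is illustrative):

```python
import pyfixest as pf

df = pf.get_data()                                # built-in example dataset
fit = pf.feols("Y ~ X1 + X2 | f1 + f2", data=df)  # OLS with two-way fixed effects
fit.summary()                                     # coefficient table
```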
papers
- Coey and Hung on empirical bayes for policy learning / value maximization: they show that in a restrictive parametric setting, selection is fundamentally easier than estimation, and they attain superefficient rates [a toy illustration of the selection-vs-estimation gap follows this list].
- Feng et al on selecting the optimal loss function for M-estimation without assumptions about the noise distribution, in the context of linear regression. We’re used to working with L2 loss, which gives us a clean moment condition for M-estimation/GMM, but this work goes a step further by contributing a method to determine the optimal population-level convex loss function, i.e. the one that minimizes the asymptotic variance of the downstream parameter vector. This turns out to be connected to score matching and the energy-based models that are popular in training diffusion models (!) [a toy efficiency comparison follows this list].
- Imbens and Xu is a nice didactic article on progress in observational causal inference since LaLonde’s landmark 1986 paper, which was very pessimistic in its conclusions about observational methods at the time. The authors (CoI: both are mentors and coauthors) provide five key lessons: (1) unconfoundedness has rightfully become the central assumption for observational methods (and it importantly separates identification from estimation), (2) overlap/common support is critical, (3) propensity scores have become a core component of our toolkit, (4) treatment effect heterogeneity is taken much more seriously, and (5) validation / sensitivity analyses are essential. The paper is accompanied by a nice tutorial in R that doubles as a crash course in modern causal inference / causal ML methods.
- Cai et al on general conditions that enable identification of the average treatment effect, extending beyond (1) unconfoundedness and (2) overlap using ideas from statistical learning theory. They characterize concept classes for the generalized propensity score and the covariate-outcome distributions, and then analyze which conditions on these classes characterize the identifiability of treatment effects. They then tackle scenarios where (1) and (2) fail respectively, e.g. overlap with selection (e.g. the Marginal Sensitivity Model) and Regression Discontinuity. At first glance, it reads to me like a very familiar story translated into a foreign language, with no obvious implications for practice [I don’t see any new estimators or implications for empirical practice].
- Chib and Shimazu propose a bayesian framework for staggered DiD, which is appealing in settings with small sample sizes where fully disaggregated event studies might be challenging to estimate. I find the cottage industry of producing bayesian alternatives to frequentist procedures in econometrics fascinating, and would love to hear from anyone who actually uses these things.
- Roman Vershynin’s High-Dimensional Probability book received a 2nd edition update.
- Martin Osborne’s Models in Political Economy open textbook is a very comprehensive and beginner-friendly intro to the value of economic and mathematical formalization in thinking about political institutions and environments.
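As promised above, a toy illustration of why selection can be easier than estimation (my own sketch, not Coey and Hung’s setting): the probability of picking the worse of two policies decays exponentially in n, while the error in estimating the better policy’s value shrinks only at the root-n rate.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (50, 200, 800, 3200):
    # sample means of n draws each for two policies with true values 0.0 and 0.2
    means = rng.normal([0.0, 0.2], 1.0 / np.sqrt(n), size=(100_000, 2))
    misselect = (means[:, 0] > means[:, 1]).mean()  # chose the worse policy
    est_error = np.abs(means[:, 1] - 0.2).mean()    # error in the value estimate
    print(f"n={n:4d}  P(misselect)={misselect:.4f}  E|value error|={est_error:.4f}")
```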
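And the promised efficiency comparison for the Feng et al item (a sketch of the general principle rather than their procedure): the efficient M-estimation loss has a derivative proportional to the negative score of the noise density, so under Laplace noise the optimal convex loss is L1 rather than L2, and the gain shows up directly in Monte Carlo variance.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, reps, beta = 500, 2000, 1.0
b_l2, b_l1 = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.laplace(size=n)  # Laplace (double-exponential) noise
    b_l2.append((x @ y) / (x @ x))      # OLS: minimizes the L2 loss
    b_l1.append(minimize_scalar(lambda b: np.abs(y - b * x).sum()).x)  # LAD: L1 loss
# the asymptotic variance ratio is 2 under Laplace noise; the MC ratio is close
print("Var(L2) / Var(L1) =", round(np.var(b_l2) / np.var(b_l1), 2))
```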
code, music
- mlx-lm is good for running small LLMs on apple silicon. I usually prefer ollama, but this lets one download safetensors directly from huggingface and thus has a larger choice set [but is a bit slower, in my casual comparison; a minimal usage sketch follows this list].
- argmin : numerical optimization in rust
- psql : piped sql for duckdb. I despise SQL’s syntax order of SELECT _ FROM _ JOIN _ ON _ WHERE _ GROUP BY _ HAVING _ ORDER BY _ UNION with a passion; that is not the right order of operations for any sane data analyst, and it has slowed me down / tripped me up / produced mistakes in the past. Piping is much more sane [whether done in dplyr or polars, or via analogous syntax in data.table or pandas], and I wish people would adopt it more widely [a piped example follows this list].
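A minimal mlx-lm sketch (the model repo below is just an example from the mlx-community org on huggingface; swap in whatever fits your machine):

```python
from mlx_lm import load, generate

# pulls safetensors from the huggingface hub on first call, then runs locally
model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")
print(generate(model, tokenizer, prompt="Explain FWL in one sentence.", max_tokens=128))
```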
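And the piped-order point written out in polars (table and columns invented for illustration); every step appears in the order the engine logically evaluates it, unlike SQL’s written order:

```python
import polars as pl

out = (
    pl.scan_parquet("sales.parquet")                # FROM
      .filter(pl.col("amount") > 0)                 # WHERE
      .group_by("region")                           # GROUP BY
      .agg(pl.col("amount").sum().alias("total"))   # SELECT (aggregates)
      .filter(pl.col("total") > 1_000)              # HAVING
      .sort("total", descending=True)               # ORDER BY
      .collect()
)
```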