while keeping the control weights close to uniform under either an entropy or a quadratic objective. In causal-inference terms, the common ATT use case is to reweight controls so that their covariate moments match those of the treated group. Two simulation designs stress-test that use case below:
Kang-Schafer, where the observed covariates are nonlinear transformations of latent Gaussian drivers.
Hainmueller, adapted from aipyw, where overlap and functional-form difficulty can be dialed up or down.
Literature Map
BalancingWeights sits in a tight cluster of weighting estimators that differ more by parameterization than by the balance conditions they impose.
Hainmueller (2012) formulates entropy balancing as a convex calibration problem: choose positive control weights that exactly match treated covariate moments while staying close to baseline weights.
Graham, de Xavier Pinto, and Egel (2012) show that inverse probability tilting estimates a logit index by moment conditions chosen so the implied weights satisfy exact sample balance.
Imai and Ratkovic (2014) recast the same balance-first logic as a propensity-score GMM / empirical-likelihood estimator.
Graham, Pinto, and Egel (2016) extend the same tilting geometry to data-combination problems by introducing separate study and auxiliary tilts.
Zhao and Percival (2017) make the dual interpretation explicit from the entropy-balancing side: entropy balancing behaves like a logistic propensity-score fit with a different loss.
For the ATT problem, the cleanest exact equivalence is between entropy balancing and just-identified logit CBPS. Graham’s tilting estimators are the same family, but full AST adds an extra layer of tilting beyond that baseline case.
Entropy Balancing, CBPS, and Tilting
Let \(c_i = c(X_i)\) be the balance basis, including an intercept, and let \(q_i > 0\) denote baseline control weights. For the ATT, entropy balancing solves
\[
\min_{w} \; \sum_{i:\,D_i=0} w_i \log\frac{w_i}{q_i}
\quad \text{s.t.} \quad
\sum_{i:\,D_i=0} w_i \, c_i \;=\; \frac{1}{n_1}\sum_{i:\,D_i=1} c_i ,
\]
whose solution is log-linear in the basis, \(w_i \propto q_i \exp(\lambda^\top c_i)\), after the usual sign relabeling of the multipliers. This is the core Hainmueller result: entropy balancing chooses the multiplier \(\lambda\) so that the tilted control distribution exactly matches the treated moments.
Now write the ATT-CBPS balance equations with a logit propensity score \(p_i = \Lambda(\beta^\top c_i)\):
\[
\frac{1}{n_1}\sum_{i:\,D_i=1} c_i \;=\; \frac{1}{n_1}\sum_{i:\,D_i=0} \frac{p_i}{1-p_i}\, c_i ,
\qquad
\frac{p_i}{1-p_i} \;=\; \exp(\beta^\top c_i) .
\]
In the just-identified case these moment conditions pin down \(\beta\), and the implied control weights are proportional to \(\exp(\beta^\top c_i)\).
So with the same balance basis \(c(X)\), an intercept, and uniform baseline weights \(q_i\), ATT entropy balancing and just-identified logit CBPS deliver the same normalized weights. The practical difference is mostly primal versus dual parameterization: entropy balancing solves directly for calibration weights or multipliers, while CBPS solves for propensity-score coefficients whose logit odds generate those same weights.
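To see the primal-dual equivalence numerically, the dual can be solved directly: minimize \(\log \sum_{i:D_i=0} \exp(\lambda^\top c_i) - \lambda^\top \bar c_1\) over \(\lambda\). A minimal sketch with scipy (the helper name and setup are illustrative, not this library's API):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def entropy_balance_att(c_treat, c_ctrl):
    """Solve the ATT entropy-balancing dual for uniform baseline weights.

    Minimizing logsumexp(C0 @ lam) - lam @ cbar_treat over lam yields
    normalized control weights w_i proportional to exp(lam @ c_i) that
    match the treated covariate means.
    """
    target = c_treat.mean(axis=0)
    dual = lambda lam: logsumexp(c_ctrl @ lam) - target @ lam
    res = minimize(dual, np.zeros(c_ctrl.shape[1]), method="BFGS")
    w = np.exp(c_ctrl @ res.x)
    return w / w.sum()

rng = np.random.default_rng(0)
c_treat = rng.normal(0.5, 1.0, size=(200, 3))  # treated covariates, mean-shifted
c_ctrl = rng.normal(0.0, 1.0, size=(800, 3))   # control covariates
w = entropy_balance_att(c_treat, c_ctrl)
# near zero: exact balance up to solver tolerance
print(np.max(np.abs(w @ c_ctrl - c_treat.mean(axis=0))))
```

The weights come out strictly positive by construction, which is the log-linear signature that just-identified logit CBPS shares.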
Graham’s inverse probability tilting step is the same balance-first idea in another parameterization. The 2012 IPT moments choose a logit index so the implied weights satisfy exact balance moments, which in the ATT specialization again produces inverse-odds weights proportional to \(\exp(\beta^\top c_i)\). AST adds a second layer of tilting on top of that baseline. Writing \(\hat p_i = \Lambda(r_i^\top \hat \delta)\), the auxiliary weights take the form
\[
w_i \;\propto\; \frac{\hat p_i}{1-\hat p_i} \;=\; \exp(r_i^\top \hat\delta),
\]
which is exactly the same inverse-odds / entropy-balancing weight formula. With nontrivial study or auxiliary tilts, AST is a strict generalization rather than literally the same estimator.
Quadratic balancing in this library keeps the same balance constraints and changes only the distance penalty. It therefore targets the same sample moments as entropy balancing, but it does not imply the same log-linear weight formula.
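Under exact balance constraints, the quadratic version even has a closed form: the weights are the Euclidean projection of the uniform vector onto the balance constraints, hence affine in the basis and possibly negative. A hypothetical sketch, not this library's solver:

```python
import numpy as np

def quadratic_balance_att(c_treat, c_ctrl):
    """Minimize sum_i (w_i - 1/n0)^2 subject to exact balance and sum(w) = 1.

    The solution is the projection of the uniform vector u onto the affine
    constraint set: w = u + C^T (C C^T)^{-1} (m - C u). Unlike entropy
    balancing, the weights are linear in the basis and can go negative.
    """
    n0 = c_ctrl.shape[0]
    C = np.column_stack([np.ones(n0), c_ctrl]).T  # intercept row forces sum(w) = 1
    u = np.full(n0, 1.0 / n0)
    m = np.concatenate([[1.0], c_treat.mean(axis=0)])
    return u + C.T @ np.linalg.solve(C @ C.T, m - C @ u)

rng = np.random.default_rng(1)
c_treat = rng.normal(0.5, 1.0, size=(200, 3))
c_ctrl = rng.normal(0.0, 1.0, size=(800, 3))
w = quadratic_balance_att(c_treat, c_ctrl)
print(np.allclose(w @ c_ctrl, c_treat.mean(axis=0)), np.isclose(w.sum(), 1.0))  # → True True
```

The contrast with the entropy dual is the whole story: same moments, different penalty, so no log-linear weight formula and no positivity guarantee.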
Kang-Schafer is useful here because balancing is asked to work on the transformed covariates \(X\), not the latent Gaussian drivers \(Z\) that generated treatment and outcomes. The true ATT is zero, so the gap between the estimator and zero is pure bias.
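The `kang_schafer_dgp` helper is not shown in this excerpt; a sketch following the standard Kang and Schafer (2007) design (the notebook's exact implementation may differ) looks like:

```python
import numpy as np

def kang_schafer_dgp(n, rng):
    """Standard Kang-Schafer design: selection and outcomes depend on latent
    Gaussians z, but the analyst observes only nonlinear transforms x."""
    z = rng.normal(size=(n, 4))
    p = 1.0 / (1.0 + np.exp(-(-z[:, 0] + 0.5 * z[:, 1] - 0.25 * z[:, 2] - 0.1 * z[:, 3])))
    d = rng.binomial(1, p)
    # The outcome model ignores d, so the true ATT is exactly zero.
    y = 210 + 27.4 * z[:, 0] + 13.7 * (z[:, 1] + z[:, 2] + z[:, 3]) + rng.normal(size=n)
    x = np.column_stack([
        np.exp(z[:, 0] / 2),
        z[:, 1] / (1 + np.exp(z[:, 0])) + 10,
        (z[:, 0] * z[:, 2] / 25 + 0.6) ** 3,
        (z[:, 1] + z[:, 3] + 20) ** 2,
    ])
    return y, d, x, z

y, d, x, z = kang_schafer_dgp(1000, np.random.default_rng(2026))
```

Balancing on `x` therefore asks the estimator to undo a nonlinear distortion it never sees, which is exactly the misspecification the panel below measures.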
On the single draw above, both balancing estimators substantially reduce the raw covariate imbalance, and the oracle version that balances on latent \(Z\) shows the benchmark we would like to approach.
```python
def run_kang_schafer_panel(n_rep=80, n=1000, seed=2026):
    rng = np.random.default_rng(seed)
    rows = []
    for rep in range(n_rep):
        y, d, x, z = kang_schafer_dgp(n, rng)
        naive = y[d == 1].mean() - y[d == 0].mean()
        quad, quad_summary = fit_att_balancing(y, d, x, "quadratic")
        ent, ent_summary = fit_att_balancing(y, d, x, "entropy")
        oracle, oracle_summary = fit_att_balancing(y, d, z, "entropy")
        rows.append(("Naive", naive, True))
        rows.append(("Quadratic on observed X", quad, bool(quad_summary["success"])))
        rows.append(("Entropy on observed X", ent, bool(ent_summary["success"])))
        rows.append(("Entropy on latent Z", oracle, bool(oracle_summary["success"])))
    return rows


def summarize_rows(rows, truth=0.0):
    methods = sorted({row[0] for row in rows})
    out = []
    for method in methods:
        values = np.array([row[1] for row in rows if row[0] == method], dtype=float)
        successes = np.array([row[2] for row in rows if row[0] == method], dtype=bool)
        out.append(
            [
                method,
                f"{values.mean(): .3f}",
                f"{(values.mean() - truth): .3f}",
                f"{np.sqrt(np.mean((values - truth) ** 2)): .3f}",
                f"{successes.mean(): .3f}",
            ]
        )
    return out


kang_rows = run_kang_schafer_panel()
display(HTML(html_table(["Method", "Mean Estimate", "Bias", "RMSE", "Success Rate"], summarize_rows(kang_rows))))
```
| Method | Mean Estimate | Bias | RMSE | Success Rate |
|---|---|---|---|---|
| Entropy on latent Z | 0.027 | 0.027 | 0.095 | 1.000 |
| Entropy on observed X | -4.402 | -4.402 | 4.544 | 1.000 |
| Naive | -20.383 | -20.383 | 20.511 | 1.000 |
| Quadratic on observed X | -6.217 | -6.217 | 6.353 | 1.000 |
The observed-\(X\) versions still live inside the canonical misspecification problem, so they do not become oracle estimators just by balancing means. But they do move sharply toward zero relative to the naive treated-control difference.
Hainmueller
The Hainmueller design below is adapted from aipyw. It keeps the true treatment effect at zero while varying overlap and the difficulty of the treatment and outcome models.
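The `hainmueller_dgp` implementation is not shown in this excerpt. The sketch below follows one common Hainmueller (2012)-style parameterization: the covariate distributions match the 2012 paper, but the specific overlap, pscore, and outcome variants here are illustrative guesses, and the aipyw adaptation may differ:

```python
import numpy as np

def hainmueller_dgp(n, rng, overlap_design=2, pscore_design=1, outcome_design=1):
    """Hypothetical Hainmueller (2012)-style design; true ATT is zero."""
    # Three correlated Gaussians, a uniform, a chi-square, and a binary covariate.
    cov = np.array([[2.0, 1.0, -1.0], [1.0, 1.0, -0.5], [-1.0, -0.5, 1.0]])
    x123 = rng.multivariate_normal(np.zeros(3), cov, size=n)
    x4 = rng.uniform(-3, 3, size=n)
    x5 = rng.chisquare(1, size=n)
    x6 = rng.binomial(1, 0.5, size=n)
    x = np.column_stack([x123, x4, x5, x6])
    index = x[:, 0] + 2 * x[:, 1] - 2 * x[:, 2] - x[:, 3] - 0.5 * x[:, 4] + x[:, 5]
    if pscore_design == 1:
        score = index                       # linear selection
    else:
        score = index + x[:, 0] * x[:, 2]   # nonlinear selection (assumed form)
    # Larger selection noise means better overlap between arms.
    noise_sd = {1: np.sqrt(30.0), 2: np.sqrt(100.0)}.get(overlap_design, np.sqrt(30.0))
    d = (score + rng.normal(0, noise_sd, size=n) > 0).astype(int)
    if outcome_design == 1:
        mu = x[:, 0] + x[:, 1] + x[:, 2] - x[:, 3] + x[:, 4] + x[:, 5]
    else:
        mu = (x[:, 0] + x[:, 1] + x[:, 4]) ** 2  # nonlinear outcome (assumed form)
    y = mu + rng.normal(size=n)  # mu does not depend on d, so the true ATT is zero
    return y, d, x

y, d, x = hainmueller_dgp(1500, np.random.default_rng(3030))
```

The knob structure mirrors the panel below: harder overlap and nonlinear selection or outcomes push all estimators toward the naive difference in means.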
```python
def run_hainmueller_panel(setting_name, overlap_design, pscore_design, outcome_design, n_rep=50, n=1500, seed=3030):
    rng = np.random.default_rng(seed)
    rows = []
    for rep in range(n_rep):
        y, d, x = hainmueller_dgp(
            n=n,
            rng=rng,
            overlap_design=overlap_design,
            pscore_design=pscore_design,
            outcome_design=outcome_design,
        )
        naive = y[d == 1].mean() - y[d == 0].mean()
        quad, quad_summary = fit_att_balancing(y, d, x, "quadratic")
        ent, ent_summary = fit_att_balancing(y, d, x, "entropy")
        rows.append((setting_name, "Naive", naive, True))
        rows.append((setting_name, "Quadratic", quad, bool(quad_summary["success"])))
        rows.append((setting_name, "Entropy", ent, bool(ent_summary["success"])))
    return rows


def summarize_hainmueller(rows, truth=0.0):
    settings = sorted({row[0] for row in rows})
    out = []
    for setting in settings:
        methods = sorted({row[1] for row in rows if row[0] == setting})
        for method in methods:
            values = np.array([row[2] for row in rows if row[0] == setting and row[1] == method], dtype=float)
            successes = np.array([row[3] for row in rows if row[0] == setting and row[1] == method], dtype=bool)
            out.append(
                [
                    setting,
                    method,
                    f"{values.mean(): .3f}",
                    f"{(values.mean() - truth): .3f}",
                    f"{np.sqrt(np.mean((values - truth) ** 2)): .3f}",
                    f"{successes.mean(): .3f}",
                ]
            )
    return out


hain_easy = run_hainmueller_panel("Easy: overlap 2 / pscore 1 / outcome 1", 2, 1, 1)
hain_hard = run_hainmueller_panel("Hard: overlap 1 / pscore 3 / outcome 3", 1, 3, 3)
hain_rows = hain_easy + hain_hard
display(
    HTML(
        html_table(
            ["Setting", "Method", "Mean Estimate", "Bias", "RMSE", "Success Rate"],
            summarize_hainmueller(hain_rows),
        )
    )
)
```
| Setting | Method | Mean Estimate | Bias | RMSE | Success Rate |
|---|---|---|---|---|---|
| Easy: overlap 2 / pscore 1 / outcome 1 | Entropy | -0.001 | -0.001 | 0.063 | 1.000 |
| Easy: overlap 2 / pscore 1 / outcome 1 | Naive | 1.157 | 1.157 | 1.167 | 1.000 |
| Easy: overlap 2 / pscore 1 / outcome 1 | Quadratic | -0.001 | -0.001 | 0.064 | 0.980 |
| Hard: overlap 1 / pscore 3 / outcome 3 | Entropy | 1.774 | 1.774 | 18.739 | 1.000 |
| Hard: overlap 1 / pscore 3 / outcome 3 | Naive | 1.823 | 1.823 | 18.806 | 1.000 |
| Hard: overlap 1 / pscore 3 / outcome 3 | Quadratic | 1.799 | 1.799 | 18.748 | 1.000 |
```python
def rmse_by_setting(rows):
    settings = sorted({row[0] for row in rows})
    methods = ["Naive", "Quadratic", "Entropy"]
    rmse = np.zeros((len(settings), len(methods)))
    for i, setting in enumerate(settings):
        for j, method in enumerate(methods):
            values = np.array([row[2] for row in rows if row[0] == setting and row[1] == method], dtype=float)
            rmse[i, j] = np.sqrt(np.mean(values ** 2))
    return settings, methods, rmse


settings, methods, rmse = rmse_by_setting(hain_rows)
fig, ax = plt.subplots(figsize=(10, 4))
xpos = np.arange(len(settings))
width = 0.25
for j, method in enumerate(methods):
    ax.bar(xpos + (j - 1) * width, rmse[:, j], width=width, label=method)
ax.set_xticks(xpos)
ax.set_xticklabels(settings, rotation=10, ha="right")
ax.set_ylabel("RMSE around true ATT = 0")
ax.set_title("Hainmueller DGP: balancing weights versus the naive difference in means")
ax.legend()
fig.tight_layout()
```
Takeaways
BalancingWeights is most naturally a building block. The class returns the control weights; the ATT estimate is the weighted control mean subtracted from the treated mean.
autoscale=True is useful on these simulation designs because the raw covariate scales can be wildly different.
Entropy and quadratic balancing often move together on easy designs, but quadratic balancing can prefer sparser solutions and a smaller effective sample size.
Kang-Schafer remains hard when only the transformed covariates are observed. Balancing means helps, but it does not erase misspecification by itself.
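A quick way to quantify the effective-sample-size point is the Kish formula, \(\mathrm{ESS} = (\sum_i w_i)^2 / \sum_i w_i^2\), sketched here as a standalone helper:

```python
import numpy as np

def kish_ess(w):
    # Kish effective sample size: (sum w)^2 / sum(w^2).
    # Equals n for uniform weights and shrinks as weight mass concentrates.
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

print(kish_ess(np.ones(100) / 100))              # uniform over 100 controls → 100.0
print(kish_ess(np.array([0.7, 0.1, 0.1, 0.1])))  # concentrated → ~1.9
```

Comparing the ESS of the entropy and quadratic weight vectors on the same draw makes the sparsity trade-off visible directly.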