crabbymetrics

crabbymetrics is a Rust-backed econometrics library with a compact Python API. The docs are organized around the public API, a single binding crash course, an active First Course Ding translation track, supervised and semiparametric example pages, unsupervised transform notes, focused numerical ablations, and a small set of supporting internals pages.

Start Here

API reference: verified public surface, summary schemas, optimizer catalog, and runtime smoke checks.
First Course Ding: chapter-by-chapter translation plan and current implementation status for the Peng Ding notebooks.
Binding Crash Course: OLS: the shortest end-to-end walkthrough of the Rust-to-Python wrapper pattern in this codebase.

First Course Ding

Overview And TOC: planning page and chapter map for the full Ding translation pass.
Translated so far: Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 9, Chapter 11, Chapter 12, Chapter 13, Chapter 21, and Chapter 23.
The next obvious tranche is Chapters 15 through 20, where matching, sensitivity analysis, and RD helpers start to matter.

Supervised Learning Examples

OLS: baseline linear regression with switchable vanilla and HC1 covariance estimators.
Ridge: closed-form L2-regularized least squares with a scalar penalty or a penalty grid plus cross-validation.
Fixed Effects OLS: partial out one-way or multi-way categorical fixed effects with within, then estimate slopes without an intercept.
ElasticNet: regularized linear regression.
Synthetic Control: simplex-constrained donor weighting for treated-versus-donor panel matching under latent-factor drift.
Logit: binary logistic regression.
Multinomial Logit: multiclass classification.
Poisson: count regression, with model-based or QMLE sandwich inference available from summary(vcov=...).
TwoSLS: instrumental variables regression, including multiple endogenous regressors and multiple excluded instruments.
GMM: fit just-identified score equations, overidentified two-step IV moments, and stacked nuisance-parameter moments with the first-class GMM estimator.
FTRL: online-style binary classification.
MEstimator Poisson: callback-driven estimation matched against the built-in Poisson estimator.
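
As a rough illustration of what "switchable vanilla and HC1 covariance estimators" means for the OLS entry above, here is a NumPy-only sketch. It does not use the crabbymetrics API: the function name `ols_fit` and its `vcov` argument are hypothetical stand-ins, and the simulated data is made up for the example.

```python
import numpy as np

def ols_fit(X, y, vcov="vanilla"):
    """Least squares of y on X, returning (coefficients, standard errors).

    X must already include an intercept column if one is wanted.
    vcov is "vanilla" (homoskedastic) or "hc1" (heteroskedasticity-robust).
    Hypothetical helper for illustration; not the crabbymetrics API.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    if vcov == "vanilla":
        # sigma^2 (X'X)^{-1} with a degrees-of-freedom correction
        cov = (resid @ resid / (n - k)) * xtx_inv
    elif vcov == "hc1":
        # sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}, scaled by n / (n - k)
        meat = (X * resid[:, None] ** 2).T @ X
        cov = (n / (n - k)) * xtx_inv @ meat @ xtx_inv
    else:
        raise ValueError(f"unknown vcov: {vcov!r}")
    return beta, np.sqrt(np.diag(cov))

# Simulated data whose error variance grows with |x|
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

beta, se_vanilla = ols_fit(X, y, vcov="vanilla")
_, se_hc1 = ols_fit(X, y, vcov="hc1")
print("coefficients:", beta)
print("vanilla SE:  ", se_vanilla)
print("HC1 SE:      ", se_hc1)
```

Both calls return the same coefficients; only the covariance changes. In this design the error variance rises with |x|, so the HC1 slope standard error should come out larger than the vanilla one.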

Semiparametric Examples

Balancing Weights: entropy and quadratic calibration weights for ATT-style reweighting, covariate shift, and domain adaptation.
EPLM: the Robins-Newey partially linear E-estimator, implemented as a stacked-moment estimator for a scalar continuous treatment.
Average Derivative: Oaxaca-Blinder, generalized IPW, and doubly robust average-derivative estimators in the Graham-Pinto style under a continuous-treatment working model.
Double ML And AIPW: cross-fit ridge nuisance estimation for partially linear DML and binary-treatment AIPW.
Richer Regression With Transformer Pipelines: a longer semiparametric-style example built on KernelBasis and PCA.
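
To make the "entropy calibration weights for ATT-style reweighting" idea from the Balancing Weights entry concrete, the sketch below solves the standard entropy-balancing dual with a damped Newton iteration in plain NumPy. It is an assumption-laden illustration, not the library's implementation: `entropy_balance`, the simulated control covariates, and the target means are all hypothetical.

```python
import numpy as np

def entropy_balance(X_control, target_means, max_iter=100, tol=1e-8):
    """Weights on control rows (positive, summing to 1) whose weighted
    covariate means match target_means, e.g. the treated-group means.

    Minimizes the dual objective log(sum(exp(Z @ lam))) with Z centered
    at the target. Hypothetical helper; not the crabbymetrics API.
    """
    Z = X_control - target_means  # balance condition becomes sum(w * Z) = 0

    def dual(lam):
        s = Z @ lam
        m = s.max()
        return m + np.log(np.exp(s - m).sum())  # stable log-sum-exp

    def weights(lam):
        s = Z @ lam
        w = np.exp(s - s.max())
        return w / w.sum()

    lam = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        w = weights(lam)
        grad = Z.T @ w  # remaining imbalance in weighted means
        if np.max(np.abs(grad)) < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)  # weighted covariance
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # backtracking (Armijo) line search keeps the Newton step from overshooting
        while dual(lam - t * step) > dual(lam) - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return weights(lam)

rng = np.random.default_rng(0)
X_control = rng.normal(size=(200, 3))
treated_means = np.array([0.3, -0.2, 0.1])  # made-up target moments

w = entropy_balance(X_control, treated_means)
print("weighted control means:", w @ X_control)
print("effective sample size: ", 1.0 / np.sum(w ** 2))
```

The weights take the exponential-tilting form w_i proportional to exp(z_i' lam), which is the form that stays closest to uniform weights in Kullback-Leibler divergence while satisfying the mean constraints. The target must lie inside the convex hull of the control covariates for a solution to exist.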

Unsupervised Learning Examples

PCA And Kernel Basis: transformer examples for richer design matrices and nonlinear geometry.

Ablations

Variance Estimators: cached Monte Carlo coverage experiments for OLS, Poisson, and GMM variance estimators under heteroskedasticity or overdispersion.
Semiparametric Estimator Comparisons: cached Monte Carlo comparisons for EPLM versus partially linear DML under continuous-treatment misspecification, and for vanilla regression versus balancing weights versus AIPW under binary treatment.
Bridging Finite And Superpopulation: cached Monte Carlo coverage comparisons for HC2, Ding’s closed-form correction, a raw-row bootstrap, and stacked GMM when the same adjusted estimator is evaluated against SATE and PATE.

Optimization

Optimizers: direct optimizer usage for smooth likelihoods, rougher objective surfaces, and solver behavior comparisons.
GMM With Optimizers: the lower-level notebook that motivated the first-class GMM estimator and still shows the residual-collection view directly.

Supporting Pages

Binding Internals: Poisson: a deeper built-in-estimator walkthrough after the OLS crash course.
Binding Internals: MEstimator: the callback-heavy bridge where Rust owns optimization and Python supplies the objective and scores.
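
The callback pattern described in the MEstimator entry, where one side owns the optimizer and the other supplies the objective and scores, can be sketched entirely in Python. In the sketch below, `minimize_with_callbacks` is a hypothetical stand-in for the optimizer side (simple gradient descent with backtracking, not the solver crabbymetrics uses), and the Poisson objective and score play the role of the user-supplied callbacks.

```python
import numpy as np

def minimize_with_callbacks(objective, score, theta0, max_iter=500, tol=1e-6):
    """Optimizer side: sees only two callables, never the data.

    Gradient descent with backtracking line search. Hypothetical stand-in
    for the compiled optimizer; not the crabbymetrics API.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = score(theta)
        if np.max(np.abs(g)) < tol:
            break
        f0 = objective(theta)
        t = 1.0
        # halve the step until the objective decreases sufficiently
        while objective(theta - t * g) > f0 - 1e-4 * t * (g @ g) and t > 1e-12:
            t *= 0.5
        theta = theta - t * g
    return theta

# User side: simulated count data plus the two callbacks
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))

def poisson_objective(theta):
    """Average negative Poisson log-likelihood, dropping the log(y!) term."""
    eta = X @ theta
    return np.mean(np.exp(eta) - y * eta)

def poisson_score(theta):
    """Gradient of poisson_objective with respect to theta."""
    return X.T @ (np.exp(X @ theta) - y) / n

theta_hat = minimize_with_callbacks(poisson_objective, poisson_score, np.zeros(2))
print("estimate:", theta_hat)
print("score at estimate:", poisson_score(theta_hat))
```

Keeping the data inside the closures means the optimizer needs no knowledge of the model, which is what lets the same optimization loop serve any objective the caller can express as a pair of functions.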

Runtime Snapshot

Figure: repo-level runtime comparison for OLS, Logit, Poisson, and MultinomialLogit across `crabbymetrics`, `scikit-learn`, and `statsmodels`. Lower is faster.

The benchmark figure is generated from the repo-level benchmark assets in benchmarks/ and gives a quick scale check for the estimators that overlap cleanly with mainstream Python baselines.

Notes

api.qmd remains the main documentation page and renders to api.html.
The site is a Quarto website, so shared navigation and search are generated under docs/.
All pages are rendered with embedded resources so the checked-in HTML files remain self-contained.