2025 03 01

up

Substacks are too much work; just write markdown. I'm going to try to maintain a linkblog where I preserve links for posterity.

  1. overview of LLMs for statisticians. LLM papers are often unreadable, especially for those of us who expect the specific kind of mathematical clarity inculcated in graduate programs in statistics and adjacent fields. This looks like a welcome remedy.
  2. classificator from Julia Evans, which lets you run a small web-based UI to perform annotations and (thanks to a small PR from yours truly) export them.
  3. LLM alignment from an experimental-design point of view from Shen et al. I’ve recently been working in the space of aligning large black-box models to stated preferences from human annotations, and the zeroth-order step of deciding which queries to collect human annotations for is an interesting (afaict open) problem. The problem is basically that of selecting a set of maximally informative labelled queries 𝒬 from an infinite set of possible queries 𝒰 for which one seeks ground-truth labels from human annotators. Shen et al provide an interesting connection to classical experimental design via the alphabetic optimality criteria (A-, D-, E-optimality), which correspond to various functionals of the design matrix, most famously the Fisher information; see the sketch after this list.
  4. Yogurts choose consumers? an amusing but mathematically deep discrete choice paper that begins by demonstrating that the discrete choice problem is more symmetric than one might realise, thereby justifying its inversion of the common trope in industrial organization.
  5. Animal Well Documentary about one of the finest games I’ve played in a long time.
  6. Julian Lage’s acoustic albums - had the good fortune of seeing Lage and Jorge Roeder (on upright bass) at SFJAZZ. Life-affirming levels of virtuosity.
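
Since item 3 alludes to the alphabetic optimality criteria, here is a minimal sketch of the D-optimal version: greedily pick the queries that maximize the log-determinant of the accumulated Fisher information under a logistic annotation model. This is my own illustration, not Shen et al's method; the candidate pool and the pilot estimate `theta` are made up.

```python
import numpy as np

def greedy_d_optimal(pool, theta, k, ridge=1e-6):
    """Greedily select k queries from a finite candidate pool (rows of `pool`
    are query feature vectors) under a logistic annotation model.
    D-optimality: maximize log det of the accumulated Fisher information.
    `theta` is a pilot estimate of the preference parameters."""
    d = pool.shape[1]
    info = ridge * np.eye(d)           # ridge keeps the first log-det finite
    p = 1.0 / (1.0 + np.exp(-pool @ theta))
    w = p * (1.0 - p)                  # per-query Fisher weight sigma(1 - sigma)
    chosen = []
    for _ in range(k):
        gains = np.full(len(pool), -np.inf)
        for i in range(len(pool)):
            if i not in chosen:
                cand = info + w[i] * np.outer(pool[i], pool[i])
                gains[i] = np.linalg.slogdet(cand)[1]
        best = int(np.argmax(gains))
        chosen.append(best)
        info += w[best] * np.outer(pool[best], pool[best])
    return chosen

# toy usage: 1,000 candidate queries in 5 dimensions, pick 10
pool = np.random.default_rng(0).normal(size=(1000, 5))
print(greedy_d_optimal(pool, theta=np.ones(5), k=10))
```

Greedy is a reasonable surrogate for the combinatorial subset-selection problem here because the log-det objective is monotone submodular, which buys the usual (1 - 1/e) approximation guarantee.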

shameless self-promotion

  1. For my own edification, I built lludens as a thin wrapper around Simon Willison’s llm package to instantiate agents powered by various LLMs (which Simon is generous enough to provide plugins for). This departs from the traditional API workflow for many LLMs, which entails a client.create() call that does not preserve user and model responses in the context window and is therefore suited to one-off tasks. Simulating economic agents, e.g. when playing games (not fun ones but strategic ones), requires preserving (state, action, reward) tuples (naively as string tokens; this could be amended to accommodate more bespoke data structures), so I built it in a burst of impatience.

The notebooks directory has an example where I get two LLMs to play a repeated prisoners’ dilemma. Claude does the economically optimal thing of defecting in the last round, while GPT-4o gullibly cooperates throughout.
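
For flavour, a minimal sketch of the state-preserving pattern, using the conversation API that llm exposes; the Agent class, the prompts, and the model ids are illustrative, not lludens’ actual interface.

```python
import llm  # Simon Willison's llm package

class Agent:
    """Wraps an llm conversation so every prompt shares one context window,
    unlike one-off client.create()-style calls that forget prior turns."""

    def __init__(self, model_name: str):
        self.conversation = llm.get_model(model_name).conversation()
        self.history = []  # (state, action, reward) tuples, naively as strings

    def act(self, state: str) -> str:
        action = self.conversation.prompt(state).text()
        self.history.append([state, action, None])
        return action

    def observe(self, reward: str):
        self.history[-1][2] = reward  # backfill the reward for the last action

# toy repeated prisoners' dilemma (model ids depend on installed plugins)
a, b = Agent("gpt-4o"), Agent("claude-3.5-sonnet")
last_a = last_b = "nothing yet"
for t in range(5):
    msg = "Round {r} of 5. Opponent last played {opp}. Reply COOPERATE or DEFECT."
    move_a = a.act(msg.format(r=t + 1, opp=last_b))
    move_b = b.act(msg.format(r=t + 1, opp=last_a))
    a.observe(f"opponent played {move_b}")
    b.observe(f"opponent played {move_a}")
    last_a, last_b = move_a, move_b
```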

  2. synthlearners now supports matrix completion methods for causal inference with panel data (Athey et al. 2021), thanks to a PR from my colleague Ayal; a minimal sketch of the underlying estimator appears after this list.

  3. Does regression produce representative causal rankings. Had some interesting discussions with colleagues at an onsite this week about the properties of various DML estimators and how much stock to put in them when they are applied (often blindly) to an array of many treatments. My short paper provides both a theoretical and an empirical examination of where these discrepancies arise, and under what circumstances they can be consequential. The upshot is that while the partially linear model (PLM) can yield incorrect rankings of treatments by treatment effect and AIPW never does, the level and patterns of heterogeneity required are unusual. Then again, it is always worth checking; the second sketch below illustrates the flip.
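
On item 2: here is a minimal numpy sketch of the nuclear-norm matrix completion estimator in the spirit of Athey et al. (2021), omitting the unit/time fixed effects and cross-validated regularization of the full method (and of whatever synthlearners actually implements).

```python
import numpy as np

def soft_impute(Y, observed, lam, n_iter=500):
    """Nuclear-norm regularized matrix completion (soft-impute).
    Y: units x periods outcome matrix; observed: boolean mask, True on
    control (untreated) cells. Returns a low-rank estimate L of the
    untreated potential outcomes, so Y - L on the treated cells
    estimates the treatment effects."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        filled = np.where(observed, Y, L)        # keep data where observed,
                                                 # current estimate elsewhere
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L = (U * np.maximum(s - lam, 0.0)) @ Vt  # soft-threshold singular values
    return L
```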
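
And on item 3, a small simulation sketching the PLM failure mode: with known nuisances, the PLM coefficient converges to a Var(D|X)-weighted average of tau(X), so a treatment whose effect is concentrated where assignment has little variance gets down-weighted, while AIPW targets the unweighted ATE. All numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=500_000)

# Two hypothetical treatments (all numbers invented for illustration):
tau1, e1 = np.full_like(X, 1.3), np.full_like(X, 0.5)  # flat effect, even propensity
tau2, e2 = 3.0 * X, 0.5 - 0.45 * X                     # effect rises in X, but
                                                       # treatment variance falls in X

def plm_estimand(tau, e):
    # The PLM coefficient converges to a Var(D|X)-weighted average of tau(X);
    # AIPW estimates the unweighted mean E[tau(X)] directly.
    w = e * (1 - e)
    return (w * tau).mean() / w.mean()

print(f"ATE:  tau1={tau1.mean():.2f}  tau2={tau2.mean():.2f}")  # 1.30 < 1.50
print(f"PLM:  tau1={plm_estimand(tau1, e1):.2f}  "
      f"tau2={plm_estimand(tau2, e2):.2f}")                     # 1.30 > 1.22: flipped
```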