Substacks are too much work, just write markdown. Going to try to maintain a linkblog where I preserve links for posterity.
ground-truth labels from human annotators. Chen et al provide an interesting connection to classical experimental design, relating to the alphabet-optimal schemes that correspond to various properties of the design matrix, most famously the Fisher information.

`llm`
package to instantiate agents powered by various LLMs (that Simon is generous enough to provide plugins to interact with). This departs from the traditional API workflow for many LLMs, which entails a `client.create()` function call that does not preserve user and model responses in the context window and is therefore best suited to one-off tasks. Simulation of economic agents, e.g. when playing games (not fun ones but strategic ones), requires preserving `state, action, reward` tuples (naively as string tokens, could be amended to accommodate more bespoke data structures), so I built it in a burst of impatience.

The notebooks directory has an example where I get two LLMs to play a repeated prisoners’ dilemma. Claude does the economically optimal thing of defecting in the last round, while GPT-4o gullibly cooperates throughout.
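
To make the state-preservation idea concrete, here is a minimal sketch of a repeated prisoners' dilemma where each agent carries its own history of state, action, reward tuples and serialises it into the prompt every round. Everything in it is an assumption for illustration: the `Agent` class, the payoff numbers, and the two hard-coded policies are hypothetical stand-ins for real LLM calls, and none of it is the package's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical payoffs (first player's reward, second player's reward).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

Step = Tuple[str, str, int]  # (state, action, reward)


@dataclass
class Agent:
    name: str
    policy: Callable[[str], str]  # stand-in for an LLM call: prompt -> "C" or "D"
    history: List[Step] = field(default_factory=list)

    def prompt(self, round_no: int, n_rounds: int) -> str:
        # Serialise the full history naively as text, one tuple per line.
        lines = [f"Round {round_no} of {n_rounds}."]
        lines += [
            f"opponent_prev={s} action={a} reward={r}" for s, a, r in self.history
        ]
        return "\n".join(lines)

    def act(self, round_no: int, n_rounds: int) -> str:
        return self.policy(self.prompt(round_no, n_rounds))


def always_cooperate(prompt: str) -> str:
    return "C"


def defect_in_last_round(prompt: str) -> str:
    # Read "Round k of n." from the first line of the prompt.
    words = prompt.splitlines()[0].rstrip(".").split()
    k, n = int(words[1]), int(words[3])
    return "D" if k == n else "C"


def play(a: Agent, b: Agent, n_rounds: int = 5) -> Tuple[int, int]:
    prev_a, prev_b = "start", "start"
    for t in range(1, n_rounds + 1):
        act_a, act_b = a.act(t, n_rounds), b.act(t, n_rounds)
        r_a, r_b = PAYOFFS[(act_a, act_b)]
        # Each agent's state is the opponent's previous move.
        a.history.append((prev_b, act_a, r_a))
        b.history.append((prev_a, act_b, r_b))
        prev_a, prev_b = act_a, act_b
    return sum(r for _, _, r in a.history), sum(r for _, _, r in b.history)


if __name__ == "__main__":
    end_gamer = Agent("end_gamer", defect_in_last_round)
    cooperator = Agent("cooperator", always_cooperate)
    print(play(end_gamer, cooperator))
```

Swapping a policy for a function that sends the prompt to a model and parses the reply would turn this into an LLM-backed agent; the history list is what a stateless one-off call would otherwise lose.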
synthlearners now supports matrix completion methods for causal inference with panel data (Athey et al 2021) thanks to a PR from my colleague Ayal.
Does regression produce representative causal rankings: I had some interesting discussions with colleagues at an onsite this week about the properties of various DML estimators and how much stock to put in them when they are applied (often blindly) to an array of many treatments. My short paper provides both a theoretical and empirical examination of where discrepancies between these estimators arise, and under what circumstances they can be consequential. The upshot is that although the partially linear model (PLM) can yield incorrect rankings of treatments by treatment effect while AIPW never does, the level and patterns of heterogeneity required are unusual. Then again, it is always worth checking.
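
For intuition on how a ranking flip can happen, here is a small worked example. It is not the paper's setup: it assumes binary treatments, a single discrete covariate with two strata, and made-up numbers. It leans on the textbook result that the PLM coefficient is a conditional-variance-weighted average of stratum effects, with weights proportional to e(x)(1 - e(x)), whereas AIPW targets the plain average treatment effect.

```python
import numpy as np


def ate(shares: np.ndarray, tau: np.ndarray) -> float:
    # Population average treatment effect: stratum effects weighted by shares.
    return float(np.dot(shares, tau))


def plm_estimand(shares: np.ndarray, propensity: np.ndarray, tau: np.ndarray) -> float:
    # PLM target: weights proportional to share * Var(D | X = x) = share * e(1 - e).
    w = shares * propensity * (1.0 - propensity)
    return float(np.dot(w, tau) / w.sum())


# Hypothetical numbers: two equally sized strata, two treatments A and B.
shares = np.array([0.5, 0.5])

tau_a = np.array([1.0, 3.0])   # A works best where it is rarely assigned
prop_a = np.array([0.5, 0.1])

tau_b = np.array([1.8, 1.8])   # B has a homogeneous effect
prop_b = np.array([0.5, 0.5])

ate_a, ate_b = ate(shares, tau_a), ate(shares, tau_b)
plm_a = plm_estimand(shares, prop_a, tau_a)
plm_b = plm_estimand(shares, prop_b, tau_b)

print(f"ATE: A={ate_a:.3f}, B={ate_b:.3f}")
print(f"PLM: A={plm_a:.3f}, B={plm_b:.3f}")
```

Here A has the larger average effect, but the PLM down-weights the stratum where A is rarely assigned, so it ranks B ahead of A. The flip needs effect heterogeneity that lines up with large differences in propensity, which is consistent with the point that such patterns are unusual.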