self-promotion

  • Lal, Imbens, and Hull is a new paper we wrote analyzing the long-term causal inference problem [wherein one has a randomly assigned treatment, several short-term proxy metrics observed soon after treatment assignment, and a long-term outcome observed much later]. Experimenters typically run the experiment for a short period of time, so treatment effects on the short-term proxies are estimated without bias, but the policymaker / business-leader wants to optimize for the long-term outcome, whose treatment effect is hard to estimate without additional assumptions. We work within a measurement error framework and study the very intuitive inner product estimator: the experimental effects on the proxies dotted with the vector of regularized regression coefficients from regressing the long-term outcome on the proxies in a large observational dataset. We show that if the proxies are a growing set of measurements of a low-dimensional unobserved ‘true surrogate’ [say, customer utility], the simple projection estimator has quantifiable and shrinking bias as the dimensionality of the proxies grows large in ‘informative directions’. You can play with an interactive demo of the method on the GAIN data here
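As an illustration of the estimator’s shape (not the paper’s implementation; all data are simulated and every name below is my own), here is the inner product of experimental proxy effects with observational ridge coefficients:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# --- observational data: long-run outcome Y and k short-term proxies S ---
n_obs, k = 5000, 10
S_obs = rng.normal(size=(n_obs, k))
gamma_true = rng.normal(size=k) / np.sqrt(k)
Y_obs = S_obs @ gamma_true + rng.normal(size=n_obs)

# ridge regression of Y on S gives the regularized coefficients gamma_hat
gamma_hat = Ridge(alpha=1.0).fit(S_obs, Y_obs).coef_

# --- experimental data: randomized D, effects on each proxy ---
n_exp = 4000
D = rng.integers(0, 2, size=n_exp)
tau_S = rng.normal(size=k) * 0.5              # true proxy effects (simulated)
S_exp = rng.normal(size=(n_exp, k)) + np.outer(D, tau_S)
tau_S_hat = S_exp[D == 1].mean(axis=0) - S_exp[D == 0].mean(axis=0)

# inner product estimator of the long-run treatment effect
tau_hat = tau_S_hat @ gamma_hat
```

In this simulated design the true long-run effect is `tau_S @ gamma_true`, so you can check the estimator lands near it.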

  • I’m moving to the model measurement team at OpenAI! Excited to be working on causal and post-training problems in the LLM world.

  • RL-Causal Inference Rosetta Stone is a work-in-progress set of notes on modern policy gradient methods in deep reinforcement learning, written for those well-versed in causal inference. There are striking similarities between the two fields, and I find RL notation intolerable to read, so I wrote this to help myself understand the connections better. The policy gradient section should be mostly complete.
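One connection of exactly this flavour: the REINFORCE score-function estimator reweights sampled rewards much as IPW reweights observed outcomes by functions of the propensity. A minimal sketch of my own (a two-armed sigmoid policy, all numbers simulated, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.3
p = 1 / (1 + np.exp(-theta))          # sigmoid policy: P(arm 1)
r = np.array([0.0, 1.0])              # deterministic arm rewards

n = 200_000
a = rng.binomial(1, p, size=n)        # actions sampled from the policy
score = np.where(a == 1, 1 - p, -p)   # d/dtheta log pi(a)
grad_hat = np.mean(r[a] * score)      # REINFORCE / score-function estimator

grad_true = p * (1 - p)               # analytic gradient of E[r(a)]
```

The Monte Carlo average recovers the analytic gradient `p * (1 - p)` purely by reweighting the sampled rewards with the score term.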

  • pricing interactive simulator is a small interactive essay on the econ 101 of the pricing problem in tech - valuations are unknown, and everyone [if twitter is to be believed] believes their demand is way more inelastic than it actually is. The simulator lets you specify a valuation distribution [including mixtures with two types of consumers], simulates a firm experimenting with prices to arrive at the profit-maximizing optimum, and shows price discrimination under complete information. The upshot is that the optimal price is marginal cost plus the inverse hazard rate of the customer valuation distribution, and firms rediscover this at their peril.
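The inverse-hazard-rate rule is easy to check numerically. A sketch of my own (not the simulator’s code) with exponential valuations, which have constant hazard λ, so the optimal markup over marginal cost is exactly 1/λ:

```python
import numpy as np

c, lam = 2.0, 0.5                                # marginal cost, exponential rate
prices = np.linspace(c, 20, 20_000)
profit = (prices - c) * np.exp(-lam * prices)    # (p - c) * (1 - F(p))
p_star = prices[np.argmax(profit)]

# first-order condition: p - c = (1 - F(p)) / f(p) = 1 / lam
```

With `c = 2` and `lam = 0.5`, the grid search lands on `p_star ≈ 4.0 = c + 1/lam`.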

  • lalten is the open-source codebase of my hobbyist application website lalten.org [lalten means ‘lantern’ in hindi/maithili]. It is deliberately minimal - get a VPS [I use hetzner], buy a domain and point it at the VPS’s IP address, and write applications in basic python+js for personal use. No frameworks, no bloat, just the bare minimum to get things done. Here is something of a hobbyist software developer manifesto. The code is open source if anyone wants to use it as a starting point for their own personal projects. Stuff that I have deployed primarily for my own use:

    • radio is a basic internet radio web interface that accepts radio stream URLs (.pls or .m3u from indie radio websites) and plays them. Built primarily so I can rotate through the venerable Bay Area SomaFM stations without popups, and on my phone. soma SF1033, suburbsofgoa, kcsm, and kexp are all good.
    • linkpull grabs complete URLs matching a regex (say, pdf extensions) from a webpage. Very useful for large data downloads or lecture-note-hoarding from course websites.
    • daylight is a small webapp that calculates the amount of daylight D(t) and its first derivative D’(t) for any location on earth on any date.
    • we also have household shopping lists and a few other small utilities that I won’t link to here because I don’t want to roll my own auth.
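The day-length calculation behind a tool like daylight can be sketched with the standard sunrise-equation approximation (my own simplified version, not the webapp’s code):

```python
import math

def daylight_hours(lat_deg, day_of_year):
    """Approximate hours of daylight D(t) via the sunrise equation."""
    # solar declination in degrees (simple sinusoidal approximation,
    # zero at the March equinox around day 81)
    decl = 23.44 * math.sin(2 * math.pi * (day_of_year - 81) / 365.0)
    lat, dec = math.radians(lat_deg), math.radians(decl)
    x = -math.tan(lat) * math.tan(dec)
    x = max(-1.0, min(1.0, x))          # clamp for polar day / polar night
    return 2 * math.degrees(math.acos(x)) / 15.0

def daylight_rate(lat_deg, day_of_year):
    """D'(t) by central finite difference, in hours per day."""
    return (daylight_hours(lat_deg, day_of_year + 1)
            - daylight_hours(lat_deg, day_of_year - 1)) / 2.0
```

Sanity checks: every latitude gets ~12 hours at the equinox, and high latitudes saturate at 24 hours near the summer solstice.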
  • audio-to-sft-pipeline is a fun little project where I went soup-to-nuts on fine-tuning a small open-source LLM on an audio corpus (which requires transcription with whisper - other models are available). Go from an XML of RSS feeds to transcripts to fine-tuning to generating new fake episodes [if you think you’d like to make a podcast like X, just fine-tune on X and generate some nonsense - the world doesn’t need another podcast]. I used one of my favourite comedy podcasts, The Budpod, for it.
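The fine-tuning-data step of such a pipeline can be sketched as follows. The record shape and function names here are hypothetical illustrations, not the project’s actual format; the transcript text itself would come from whisper upstream:

```python
import json

def transcript_to_sft(episode_title, transcript, chunk_words=400):
    """Split a transcript into chat-style SFT records: each prompt asks
    for a continuation, and the completion is the next chunk."""
    words = transcript.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    records = []
    for prev, nxt in zip(chunks, chunks[1:]):
        records.append({"messages": [
            {"role": "user",
             "content": f"Continue the podcast '{episode_title}':\n{prev}"},
            {"role": "assistant", "content": nxt},
        ]})
    return records

def write_jsonl(records, path):
    """Write one JSON object per line, the usual SFT file format."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

A 1000-word transcript with 400-word chunks yields three chunks and therefore two consecutive-pair training records.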

  • rovingbandit received some major updates: feasible Neyman allocation for precision-oriented adaptive experimentation, LUCB for best-arm identification, and a linear contextual bandit implementation.
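For flavour: feasible Neyman allocation in a two-arm trial assigns the next batch to each arm in proportion to its estimated outcome standard deviation. A minimal sketch of my own, not rovingbandit’s API:

```python
import numpy as np

def neyman_propensity(y, d, floor=0.1):
    """Feasible Neyman allocation for a two-arm trial.

    Assign the next unit to treatment with probability s1 / (s0 + s1),
    where s_a is the estimated outcome std dev in arm a, clipped away
    from 0 and 1 so both arms keep accumulating data.
    """
    s0 = np.std(y[d == 0], ddof=1)
    s1 = np.std(y[d == 1], ddof=1)
    p = s1 / (s0 + s1)
    return float(np.clip(p, floor, 1 - floor))
```

If the treated arm’s outcomes are twice as noisy as control’s, the rule sends two thirds of new units to treatment.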

papers

  • Demirer et al on the market for LLMs (on openrouter). Multihoming is extremely common, there is little lock-in, and people switch models frequently based on price and latency. Interesting read, with serious implications for the economics of model provision.

  • He and Robin on ridge estimation of FE regression [esp with applications to AKM and other decompositions]. The distribution of FEs (and their covariance) is not a nuisance parameter but an object of interest in many labour applications, and the paper provides a clean framework to think about the estimation of the FE distribution and how it relates to the adjacency graph.
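The basic object, ridge on a sparse worker/firm dummy design, can be sketched as follows. This is a toy simulation of my own, not the paper’s estimator, and random assignment of workers to firms sidesteps the mobility-network subtleties the paper actually studies:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_workers, n_firms, n_obs = 50, 10, 2000
worker = rng.integers(0, n_workers, n_obs)
firm = rng.integers(0, n_firms, n_obs)
alpha = rng.normal(size=n_workers)      # worker effects
psi = rng.normal(size=n_firms)          # firm effects
y = alpha[worker] + psi[firm] + rng.normal(size=n_obs)

# sparse dummy design: [worker dummies | firm dummies]
rows, ones = np.arange(n_obs), np.ones(n_obs)
X = sparse.hstack([
    sparse.csr_matrix((ones, (rows, worker)), shape=(n_obs, n_workers)),
    sparse.csr_matrix((ones, (rows, firm)), shape=(n_obs, n_firms)),
]).tocsr()

# ridge breaks the exact collinearity between the two dummy blocks
fe_hat = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
worker_fe, firm_fe = fe_hat[:n_workers], fe_hat[n_workers:]
```

The recovered worker effects track the true ones closely in this well-behaved design; the paper’s contribution is what happens when the adjacency graph is not this friendly.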

  • Liu, Liu, and Xu monograph on conditional marginal effects [~ special case of a CATE], with a characteristically thorough exposition of modern semiparametric causal inference along the way.

  • Coulombe on OLS as attention - a fun way to learn the linear algebra of OLS via its weighted-average-of-outcomes interpretation, which feeds naturally into nonparametric regression and modern attention mechanisms. See also Cosma Shalizi’s now-classic ‘kernel smoothing can do that?’.
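The weighted-average reading is a one-liner: the OLS prediction at a point x0 is w'y with weights w = x0'(X'X)^{-1}X', i.e. one explicit weight per training outcome (toy data, my own sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
x0 = rng.normal(size=k)                    # a test point

# prediction at x0 as a weighted average of training outcomes
w = x0 @ np.linalg.solve(X.T @ X, X.T)     # one weight per training point
yhat_weights = w @ y

# same prediction via the usual coefficient route
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat_direct = x0 @ beta
```

The two predictions agree to machine precision; the weight vector `w` is the “equivalent kernel” that nonparametric regression and attention generalize.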

code, music

  • llama.cpp - probably rediscovering this for the umpteenth time, but it is such a phenomenal piece of software. Building it is a bit intimidating, but once you’re done it runs mid-size models [7-20B] locally with shocking competence. gpt-oss-20b and a quantized qwen3-30b are great on my 16GB Blackwell GPU.

  • opencode is a fantastic open-source coding agent with excellent support for multiple LLM backends, and a clean interface to build custom tools. I even got it to work with llama.cpp locally.

  • Lage trio with John Medeski is sublime. Really looking forward to their new album.