No. 224: Earning While Learning: How to Run Batched Bandit Experiments

Year: 2026
Type: Working Paper

Abstract

Researchers typically collect experimental data sequentially, which allows early outcome observations to inform adaptive treatment assignment and thereby reduce exposure to inferior treatments. This article reviews multi-armed-bandit adaptive experimental designs that balance exploration and exploitation. Because data collected adaptively through bandit algorithms violate standard asymptotic assumptions, inference is challenging. We implement an estimator that yields valid heteroskedasticity-robust confidence intervals in batched bandit designs and compare coverage rates in Monte Carlo simulations. We introduce bbandits for Stata, a tool for designing experiments via simulation, running interactive bandit experiments, and analyzing adaptively collected data. bbandits includes three common assignment algorithms (ε-first, ε-greedy, and Thompson sampling) and supports estimation, inference, and visualization.
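To illustrate the kind of assignment rule the abstract describes, the sketch below shows batched Thompson sampling for a two-arm experiment with binary outcomes. This is a generic illustration in Python, not the bbandits implementation: the Beta-Bernoulli model with a uniform Beta(1, 1) prior, the function name `thompson_batch`, and the example counts are all assumptions made for the sake of the sketch.

```python
import random

def thompson_batch(successes, failures, batch_size, rng=None):
    """Assign a batch of subjects to arms via Thompson sampling.

    successes/failures: per-arm counts of outcomes observed in
    earlier batches (Beta-Bernoulli model, uniform Beta(1, 1) prior).
    Returns a list of arm indices, one per subject in the new batch.
    """
    rng = rng or random.Random(0)
    assignments = []
    for _ in range(batch_size):
        # Draw one posterior sample per arm and assign the arm whose
        # sampled success probability is highest. Arms that look worse
        # still get drawn occasionally, which preserves exploration.
        draws = [rng.betavariate(s + 1, f + 1)
                 for s, f in zip(successes, failures)]
        assignments.append(max(range(len(draws)), key=draws.__getitem__))
    return assignments

# Hypothetical state after an initial burn-in batch of 20 subjects:
successes = [3, 7]   # arm 1 looks better so far
failures = [7, 3]
batch = thompson_batch(successes, failures, batch_size=10)
```

Between batches, the counts are updated with the newly observed outcomes and the next batch is assigned from the updated posterior; ε-first and ε-greedy differ only in how the assignment probabilities are formed, not in this batched update loop.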


Participating Institutions

TRR 266's main locations are Paderborn University (Coordinating University), HU Berlin, and the University of Mannheim. All three locations have been centers of accounting and tax research for many years. They are joined by researchers from LMU Munich, Frankfurt School of Finance and Management, Goethe University Frankfurt, University of Cologne, Leibniz University Hannover, and TU Darmstadt who share the same research agenda.
