MICRO/THEORY: Optimal Allocation Strategies in a Discrete-Time Two-Armed Bandit Problem; Professor Audrey Hu (City University of Hong Kong)
Abstract
This study addresses a two-armed bandit problem with a "safe" arm and a "risky" arm over countably many periods. In each period the agent has one unit of time to allocate between the two arms, with the aim of achieving a "breakthrough." The risky arm's type, either "good" or "bad," is unknown, and a breakthrough depends on proving it to be good. Conditional on the risky arm being good, the breakthrough probability is an exponential function of the allocated time. Departing from the "either-or" binary choices of previous studies, we explore smooth allocation strategies that take values in [0,1]. Our analytical solution reveals that the optimal allocation plan differs significantly from binary strategies and that stopping after any finite number of periods of unsuccessful trials is suboptimal. A technical contribution of this study is a problem transformation that enhances tractability, going beyond the standard Bellman-equation approach to bandit problems.
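To make the learning dynamics concrete, the following is a minimal sketch of how the agent's belief about the risky arm might evolve after unsuccessful periods. It is an illustration under stated assumptions, not the paper's model or method: the abstract says only that the breakthrough probability is "an exponential function" of the allocated time, so the specific form 1 - exp(-lam * x), the rate parameter `lam`, and the function names `breakthrough_prob`, `update_belief`, and `belief_path` are all hypothetical.

```python
import math


def breakthrough_prob(x, lam=1.0):
    """ASSUMED form: P(breakthrough | good arm, allocation x) = 1 - exp(-lam * x).

    x is the share of the period's time unit given to the risky arm, in [0, 1].
    lam is a hypothetical rate parameter, not taken from the source.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("allocation must lie in [0, 1]")
    return 1.0 - math.exp(-lam * x)


def update_belief(p, x, lam=1.0):
    """Bayes update of P(arm is good) after allocating x and seeing no breakthrough.

    A bad arm never produces a breakthrough, so failure has probability 1 under
    'bad' and exp(-lam * x) under 'good' (given the assumed form above).
    """
    no_success_if_good = 1.0 - breakthrough_prob(x, lam)
    return p * no_success_if_good / (p * no_success_if_good + (1.0 - p))


def belief_path(p0, allocations, lam=1.0):
    """Beliefs after each unsuccessful period for a given allocation plan."""
    path = [p0]
    for x in allocations:
        path.append(update_belief(path[-1], x, lam))
    return path


if __name__ == "__main__":
    # Compare an all-or-nothing plan with a smooth plan using the same total time.
    print(belief_path(0.5, [1.0, 0.0]))
    print(belief_path(0.5, [0.5, 0.5]))
```

Under this assumed form, the belief after a run of failures depends only on the total time spent on the risky arm, so the two plans above end at the same posterior. The belief falls with every unsuccessful period but stays strictly positive after any finite number of them.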