Jie Bian, Vincent Y. F. Tan
This paper presents a novel algorithm for best feasible arm identification in linear bandits with a fixed budget, achieving optimal error decay rates using a posterior sampling framework.
In this study, the authors tackle the problem of identifying the best feasible option (or 'arm') from a set of choices when only a fixed budget of observations is available to make the decision. They focus on linear bandits, where the expected outcome of each arm is a linear function of that arm's features and the observations are corrupted by Gaussian noise. Their new algorithm drives the probability of choosing the wrong arm to zero exponentially fast as the budget grows, and the rate of this decay matches the best rate that is theoretically achievable. The method builds on a posterior sampling approach known as Thompson sampling, and in the reported experiments it outperforms existing methods in both accuracy and efficiency.
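To make the setting concrete, the following is a minimal sketch of generic Thompson sampling for a linear bandit with Gaussian noise and a fixed budget. It is not the authors' algorithm: it ignores the feasibility constraint entirely, and the function name `linear_thompson_sampling`, the Gaussian prior, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def linear_thompson_sampling(arms, theta_true, budget, noise_std=1.0,
                             prior_var=1.0, seed=0):
    """Generic linear Thompson sampling with a fixed budget (illustrative).

    Hypothetical helper, not the paper's method: no feasibility constraint.
    arms: (K, d) array of arm feature vectors.
    theta_true: (d,) unknown parameter, used here only to simulate rewards.
    Returns the index of the arm recommended after `budget` pulls.
    """
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    # Assumed prior: theta ~ N(0, prior_var * I).
    precision = np.eye(d) / prior_var   # posterior precision matrix
    b = np.zeros(d)                     # precision-weighted sum of x * reward

    for _ in range(budget):
        cov = np.linalg.inv(precision)
        mean = cov @ b
        # Draw one parameter sample from the Gaussian posterior.
        chol = np.linalg.cholesky(cov)
        theta_sample = mean + chol @ rng.standard_normal(d)
        # Pull the arm that looks best under the sampled parameter.
        idx = int(np.argmax(arms @ theta_sample))
        x = arms[idx]
        reward = x @ theta_true + noise_std * rng.standard_normal()
        # Conjugate Bayesian linear regression update.
        precision += np.outer(x, x) / noise_std**2
        b += x * reward / noise_std**2

    # Recommend the arm with the highest posterior-mean reward.
    posterior_mean = np.linalg.solve(precision, b)
    return int(np.argmax(arms @ posterior_mean))

# Toy instance (made up): three orthogonal arms, arm 0 has the largest mean.
best = linear_thompson_sampling(np.eye(3), np.array([1.0, 0.0, -1.0]),
                                budget=200, noise_std=0.1)
```

In this toy instance the gap between the best arm and the others is large relative to the noise, so the recommendation is expected to be arm 0. Handling feasibility constraints and attaining the optimal error decay rate require the modifications developed in the paper, which this sketch does not attempt to reproduce.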