Kongchang Zhou, Tingyu Zhang, Wei Chen, Fang Kong
The paper introduces a hybrid CMAB-T framework (combinatorial multi-armed bandits with probabilistically triggered arms) that combines offline data with online interaction, and shows it outperforms purely online or purely offline learning.
This research addresses the challenge of learning in environments where each decision involves multiple interconnected choices, a setting known as combinatorial multi-armed bandits. Traditionally, learning in such settings relies either on direct interaction with the environment or on analyzing pre-existing data. Each method has downsides: direct interaction is costly and slow, while relying solely on logged data can yield biased estimates, since the logs may never cover some choices. The authors propose an approach that combines both, using existing data to guide early decisions and online interaction to fill in the gaps. This hybrid approach is shown to be more effective than either method alone.
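To make the hybrid idea concrete, here is a minimal sketch (not the paper's algorithm) of one natural instantiation: a combinatorial UCB-style learner whose per-arm estimates are warm-started from offline logs, then refined online. The function name `hybrid_cucb`, the top-k action structure, and the confidence-radius constant are illustrative assumptions.

```python
import math
import random

def hybrid_cucb(true_means, k, offline_data, horizon, seed=0):
    """Illustrative hybrid learner: offline (arm, reward) logs warm-start
    per-arm estimates; online rounds then play the k base arms with the
    highest UCB indices (a simple top-k combinatorial action)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    sums = [0.0] * n
    # Offline phase: fold logged samples into the empirical estimates.
    for arm, r in offline_data:
        counts[arm] += 1
        sums[arm] += r
    t = sum(counts)
    history = []
    for _ in range(horizon):
        t += 1
        # UCB index per base arm; arms never observed get +inf,
        # forcing online exploration to cover gaps in the offline data.
        ucb = [
            sums[i] / counts[i] + math.sqrt(1.5 * math.log(t) / counts[i])
            if counts[i] > 0 else float("inf")
            for i in range(n)
        ]
        # Combinatorial action: the k arms with the largest indices.
        action = sorted(range(n), key=lambda i: ucb[i], reverse=True)[:k]
        for i in action:
            r = 1.0 if rng.random() < true_means[i] else 0.0
            counts[i] += 1
            sums[i] += r
        history.append(tuple(sorted(action)))
    return history

# Usage: the offline logs cover only arms 0 and 1, so the learner must
# explore arms 2 and 3 online before settling on the best pair.
means = [0.9, 0.8, 0.3, 0.2]
offline = [(0, 1.0), (0, 1.0), (1, 1.0), (1, 0.0)]
history = hybrid_cucb(means, k=2, offline_data=offline, horizon=500)
```

The sketch highlights the two failure modes the paper contrasts: a purely offline learner could never rank arms 2 and 3 (no logged data), while a purely online learner would waste early rounds re-learning what the logs already show about arms 0 and 1.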