A Change Detection Based Framework for Piecewise Stationary Multi Armed Bandit Problem
- Authors
- Liu, Fang; Lee, Joohyun; Shroff, Ness
- Issue Date
- Feb-2018
- Publisher
- AAAI Press
- Citation
- Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3651-3658
- Pages
- 8
- Indexed
- SCOPUS
- Journal Title
- Proceedings of the AAAI Conference on Artificial Intelligence
- Start Page
- 3651
- End Page
- 3658
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/114375
- ISSN
- 2159-5399
2374-3468
- Abstract
- The multi-armed bandit problem has been extensively studied under the stationary assumption. In reality, however, this assumption often does not hold because the reward distributions themselves may change over time. In this paper, we propose a change-detection (CD) based framework for multi-armed bandit problems under the piecewise-stationary setting, and study a class of change-detection based UCB (Upper Confidence Bound) policies, CD-UCB, that actively detect change points and restart the UCB indices. We then develop CUSUM-UCB and PHT-UCB, which belong to the CD-UCB class and use the cumulative sum (CUSUM) test and the Page-Hinkley Test (PHT), respectively, to detect changes. We show that CUSUM-UCB obtains the best known regret upper bound under mild assumptions. We also demonstrate the regret reduction of the CD-UCB policies over arbitrary Bernoulli rewards and Yahoo! datasets of webpage click-through rates. Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
- Appears in
Collections - COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles
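
The abstract describes CD-UCB as a UCB policy that watches each arm with a change detector and restarts that arm's index when a change is flagged. The sketch below illustrates that idea with a two-sided CUSUM test on simulated Bernoulli rewards. It is a minimal illustration, not the authors' implementation: the names `CusumDetector` and `run_cd_ucb`, the parameters `warmup`, `drift`, `threshold`, `explore_prob`, and `xi`, their default values, and the uniform-exploration step are all assumptions made for this example, since the record does not give the algorithm's details.

```python
import math
import random


class CusumDetector:
    """Two-sided CUSUM test on a reward stream (parameter values are illustrative)."""

    def __init__(self, warmup=50, drift=0.05, threshold=10.0):
        self.warmup = warmup        # samples used to estimate the pre-change mean
        self.drift = drift          # tolerance subtracted from each deviation
        self.threshold = threshold  # alarm level for the cumulative statistics
        self.reset()

    def reset(self):
        self.count = 0
        self.baseline_sum = 0.0
        self.g_up = 0.0
        self.g_down = 0.0

    def update(self, reward):
        """Feed one reward; return True if a change is flagged."""
        self.count += 1
        if self.count <= self.warmup:
            self.baseline_sum += reward
            return False
        baseline = self.baseline_sum / self.warmup
        # Accumulate deviations above and below the baseline, clipped at zero.
        self.g_up = max(0.0, self.g_up + reward - baseline - self.drift)
        self.g_down = max(0.0, self.g_down + baseline - reward - self.drift)
        return self.g_up > self.threshold or self.g_down > self.threshold


def run_cd_ucb(means_by_segment, segment_length, explore_prob=0.05, xi=1.0, seed=0):
    """Simulate a change-detection UCB policy on piecewise-stationary Bernoulli arms.

    means_by_segment: one list of arm means per stationary segment (hypothetical input).
    Returns (total_reward, number_of_restarts).
    """
    rng = random.Random(seed)
    n_arms = len(means_by_segment[0])
    counts = [0] * n_arms   # pulls of each arm since its last restart
    sums = [0.0] * n_arms   # reward sum of each arm since its last restart
    detectors = [CusumDetector() for _ in range(n_arms)]
    total_reward = 0.0
    restarts = 0

    for t in range(segment_length * len(means_by_segment)):
        means = means_by_segment[t // segment_length]
        untried = [k for k in range(n_arms) if counts[k] == 0]
        if untried:
            arm = untried[0]                 # pull arms with no valid samples first
        elif rng.random() < explore_prob:
            arm = rng.randrange(n_arms)      # uniform exploration keeps detectors fed
        else:
            n_valid = sum(counts)
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(xi * math.log(n_valid) / counts[k]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_reward += reward
        counts[arm] += 1
        sums[arm] += reward

        if detectors[arm].update(reward):
            # Change flagged: discard this arm's history and restart its index.
            counts[arm] = 0
            sums[arm] = 0.0
            detectors[arm].reset()
            restarts += 1

    return total_reward, restarts


if __name__ == "__main__":
    # Two arms whose means swap halfway through the horizon.
    reward, n_restarts = run_cd_ucb([[0.9, 0.1], [0.1, 0.9]], segment_length=2000)
    print(reward, n_restarts)
```

Swapping `CusumDetector` for a different detector with the same `update`/`reset` interface would give another member of the same family of policies, which is the sense in which the abstract calls CD-UCB a framework.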