Detailed Information


Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling

Full metadata record
DC Field: Value
dc.contributor.author: Zhao, Yu
dc.contributor.author: Lee, Joohyun
dc.contributor.author: Chen, Wei
dc.date.accessioned: 2021-07-28T08:08:16Z
dc.date.available: 2021-07-28T08:08:16Z
dc.date.issued: 2021-06
dc.identifier.issn: 1673-5447
dc.identifier.uri: https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/105735
dc.description.abstract: This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where the environment, such as traffic arrival rates, is not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon constrained Markov decision process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique. We then propose Q-greedyUCB, a variant of Q-learning that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint for the new environment.
dc.format.extent: 12
dc.language: English
dc.language.iso: ENG
dc.publisher: CHINA INST COMMUNICATIONS
dc.title: Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
dc.type: Article
dc.publisher.location: China
dc.identifier.doi: 10.23919/JCC.2021.06.002
dc.identifier.scopusid: 2-s2.0-85108095805
dc.identifier.wosid: 000662047500002
dc.identifier.bibliographicCitation: CHINA COMMUNICATIONS, v.18, no.6, pp. 12-23
dc.citation.title: CHINA COMMUNICATIONS
dc.citation.volume: 18
dc.citation.number: 6
dc.citation.startPage: 12
dc.citation.endPage: 23
dc.type.docType: Article
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Telecommunications
dc.relation.journalWebOfScienceCategory: Telecommunications
dc.subject.keywordPlus: TRANSMISSION
dc.subject.keywordPlus: ALGORITHMS
dc.subject.keywordAuthor: reinforcement learning for average rewards
dc.subject.keywordAuthor: infinite-horizon Markov decision process
dc.subject.keywordAuthor: upper confidence bound
dc.subject.keywordAuthor: queue scheduling
dc.identifier.url: https://ieeexplore.ieee.org/document/9459561
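
The abstract describes a Q-learning variant that scores actions with an added UCB exploration bonus while keeping a small ε-greedy fallback, applied to a Lagrangian-relaxed reward (delay plus a multiplier times energy). Below is a minimal, illustrative Python sketch of that idea on a toy queue. The environment, constants, and update rule here are assumptions for demonstration only (a discounted Q-learning update is used for simplicity, whereas the paper works in the average-reward, infinite-horizon setting), not the authors' actual implementation.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy setup: states are queue lengths 0..Q_MAX, actions are
# "transmit k packets". All names and constants are illustrative, not the paper's.
Q_MAX = 5
ACTIONS = [0, 1, 2]
LAMBDA = 0.5      # Lagrange multiplier trading delay against energy (assumed value)
ALPHA = 0.1       # learning rate
EPSILON = 0.05    # residual epsilon-greedy exploration

Q = defaultdict(float)          # Q[(state, action)] -> value estimate
visits_s = defaultdict(int)     # N(s): visits to each state
visits_sa = defaultdict(int)    # N(s, a): times action a was taken in state s

def ucb_bonus(state, action):
    """Upper-confidence exploration bonus; large for rarely tried actions."""
    n_sa = visits_sa[(state, action)]
    if n_sa == 0:
        return float("inf")     # force each action to be tried at least once
    return math.sqrt(2.0 * math.log(visits_s[state]) / n_sa)

def select_action(state):
    """Q-greedyUCB-style choice: UCB-augmented greedy with an epsilon fallback."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] + ucb_bonus(state, a))

def lagrangian_reward(delay_cost, energy_cost):
    """Relaxed objective: minimize delay + lambda * energy, as negative reward."""
    return -(delay_cost + LAMBDA * energy_cost)

def step(state, action):
    """Toy queue dynamics: random arrival, 'action' packets served per slot."""
    arrival = random.randint(0, 1)
    next_state = min(max(state - action, 0) + arrival, Q_MAX)
    reward = lagrangian_reward(delay_cost=next_state, energy_cost=action ** 2)
    return next_state, reward

state = 0
for _ in range(10_000):
    action = select_action(state)
    visits_s[state] += 1
    visits_sa[(state, action)] += 1
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Standard Q-learning update on the Lagrangian reward.
    Q[(state, action)] += ALPHA * (reward + 0.95 * best_next - Q[(state, action)])
    state = next_state

# Learned greedy policy per queue length.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(Q_MAX + 1)})
```

In this sketch the UCB bonus shrinks as a state-action pair is visited more often, so exploration fades where estimates are reliable, while the small ε keeps occasional exploration alive under a changing environment; the paper's convergence proof and regret comparison concern its own average-reward formulation, not this simplified variant.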
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Joohyun
ERICA College of Engineering (SCHOOL OF ELECTRICAL ENGINEERING)
