Detailed Information


Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling

Full metadata record
DC Field: Value
dc.contributor.author: Zhao, Yu
dc.contributor.author: Lee, Joohyun
dc.contributor.author: Chen, Wei
dc.date.accessioned: 2021-07-28T08:08:16Z
dc.date.available: 2021-07-28T08:08:16Z
dc.date.issued: 2021-06
dc.identifier.issn: 1673-5447
dc.identifier.uri: https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/105735
dc.description.abstract: This paper proposes a reinforcement learning (RL) algorithm to find an optimal scheduling policy that minimizes delay under a given energy constraint in a communication system where the environment, such as traffic arrival rates, is not known in advance and can change over time. For this purpose, the problem is formulated as an infinite-horizon constrained Markov decision process (CMDP). To handle the constrained optimization problem, we first adopt the Lagrangian relaxation technique. We then propose Q-greedyUCB, a variant of Q-learning that combines the ε-greedy and Upper Confidence Bound (UCB) algorithms to solve this constrained MDP. We mathematically prove that the Q-greedyUCB algorithm converges to an optimal solution. Simulation results also show that Q-greedyUCB finds an optimal scheduling strategy and is more efficient than Q-learning with ε-greedy, R-learning, and the Average-payoff RL (ARL) algorithm in terms of cumulative regret. We also show that our algorithm can learn and adapt to changes in the environment, so as to obtain an optimal scheduling strategy under a given power constraint for the new environment.
dc.format.extent: 12
dc.language: English
dc.language.iso: ENG
dc.publisher: CHINA INST COMMUNICATIONS
dc.title: Q-greedyUCB: a New Exploration Policy to Learn Resource-Efficient Scheduling
dc.type: Article
dc.publisher.location: China
dc.identifier.doi: 10.23919/JCC.2021.06.002
dc.identifier.scopusid: 2-s2.0-85108095805
dc.identifier.wosid: 000662047500002
dc.identifier.bibliographicCitation: CHINA COMMUNICATIONS, v.18, no.6, pp. 12-23
dc.citation.title: CHINA COMMUNICATIONS
dc.citation.volume: 18
dc.citation.number: 6
dc.citation.startPage: 12
dc.citation.endPage: 23
dc.type.docType: Article
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Telecommunications
dc.relation.journalWebOfScienceCategory: Telecommunications
dc.subject.keywordPlus: TRANSMISSION
dc.subject.keywordPlus: ALGORITHMS
dc.subject.keywordAuthor: reinforcement learning for average rewards
dc.subject.keywordAuthor: infinite-horizon Markov decision process
dc.subject.keywordAuthor: upper confidence bound
dc.subject.keywordAuthor: queue scheduling
dc.identifier.url: https://ieeexplore.ieee.org/document/9459561
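
The abstract describes a Q-learning variant that scores actions with an added UCB exploration bonus while keeping a small ε-greedy fallback, applied to a Lagrangian-relaxed reward (delay plus a multiplier times energy). Below is a minimal, illustrative Python sketch of that idea on a toy queue. The environment, constants, and update rule here are assumptions for demonstration only (a discounted Q-learning update is used for simplicity, whereas the paper works in the average-reward, infinite-horizon setting), not the authors' actual implementation.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy setup: states are queue lengths 0..Q_MAX, actions are
# "transmit k packets". All names and constants are illustrative, not the paper's.
Q_MAX = 5
ACTIONS = [0, 1, 2]
LAMBDA = 0.5      # Lagrange multiplier trading delay against energy (assumed value)
ALPHA = 0.1       # learning rate
EPSILON = 0.05    # residual epsilon-greedy exploration

Q = defaultdict(float)          # Q[(state, action)] -> value estimate
visits_s = defaultdict(int)     # N(s): visits to each state
visits_sa = defaultdict(int)    # N(s, a): times action a was taken in state s

def ucb_bonus(state, action):
    """Upper-confidence exploration bonus; large for rarely tried actions."""
    n_sa = visits_sa[(state, action)]
    if n_sa == 0:
        return float("inf")     # force each action to be tried at least once
    return math.sqrt(2.0 * math.log(visits_s[state]) / n_sa)

def select_action(state):
    """Q-greedyUCB-style choice: UCB-augmented greedy with an epsilon fallback."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] + ucb_bonus(state, a))

def lagrangian_reward(delay_cost, energy_cost):
    """Relaxed objective: minimize delay + lambda * energy, as negative reward."""
    return -(delay_cost + LAMBDA * energy_cost)

def step(state, action):
    """Toy queue dynamics: random arrival, 'action' packets served per slot."""
    arrival = random.randint(0, 1)
    next_state = min(max(state - action, 0) + arrival, Q_MAX)
    reward = lagrangian_reward(delay_cost=next_state, energy_cost=action ** 2)
    return next_state, reward

state = 0
for _ in range(10_000):
    action = select_action(state)
    visits_s[state] += 1
    visits_sa[(state, action)] += 1
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Standard Q-learning update on the Lagrangian reward.
    Q[(state, action)] += ALPHA * (reward + 0.95 * best_next - Q[(state, action)])
    state = next_state

# Learned greedy policy per queue length.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(Q_MAX + 1)})
```

In this sketch the UCB bonus shrinks as a state-action pair is visited more often, so exploration fades where estimates are reliable, while the small ε keeps occasional exploration alive under a changing environment; the paper's convergence proof and regret comparison concern its own average-reward formulation, not this simplified variant.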
Appears in Collections:
COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Joohyun
ERICA College of Engineering (SCHOOL OF ELECTRICAL ENGINEERING)
