Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

Lee, Haanvid; Lee, Jongmin; Choi, Yunseon; Jeon, Wonseok; Lee, Byung-Jun; Noh, Yung-Kyun; Kim, Kee-Eung

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Haanvid	-
dc.contributor.author	Lee, Jongmin	-
dc.contributor.author	Choi, Yunseon	-
dc.contributor.author	Jeon, Wonseok	-
dc.contributor.author	Lee, Byung-Jun	-
dc.contributor.author	Noh, Yung-Kyun	-
dc.contributor.author	Kim, Kee-Eung	-
dc.date.accessioned	2024-12-04T08:30:18Z	-
dc.date.available	2024-12-04T08:30:18Z	-
dc.date.issued	2022-11	-
dc.identifier.issn	1049-5258	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/199892	-
dc.description.abstract	We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as the prescription of treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for deterministic target policies with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on an analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work goes a step further by handling vector action spaces and optimizing the metric. We show that our estimator is consistent and, through experiments on various domains, that it significantly reduces the MSE compared to baseline OPE methods.	-
dc.format.extent	13	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.wosid	001213811605018	-
dc.identifier.bibliographicCitation	Advances in Neural Information Processing Systems, pp 1 - 13	-
dc.citation.title	Advances in Neural Information Processing Systems	-
dc.citation.startPage	1	-
dc.citation.endPage	13	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.identifier.url	https://proceedings.neurips.cc/paper_files/paper/2022/hash/18fee39e2666f43cf44425138bae9def-Abstract-Conference.html	-
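
The kernel-based estimation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard kernelized importance-sampling form, in which each logged reward is weighted by a kernel on the gap between the logged action and the target policy's action, divided by the behavior policy's density. The function names, the Gaussian kernel, and the fixed unit-determinant metric are illustrative assumptions, not the authors' implementation; in particular, the paper's learned, locally optimal metric is not reproduced here.

```python
import math


def gaussian_kernel(u, metric, bandwidth):
    """Gaussian kernel of an action difference u under a Mahalanobis metric.

    `metric` is a d x d symmetric positive-definite matrix (list of lists).
    Assumption: det(metric) == 1, so the normalizer does not depend on it.
    """
    d = len(u)
    # Quadratic form u^T A u measures the squared distance under the metric.
    quad = sum(u[i] * metric[i][j] * u[j] for i in range(d) for j in range(d))
    norm = (2.0 * math.pi) ** (-d / 2.0) * bandwidth ** (-d)
    return norm * math.exp(-quad / (2.0 * bandwidth ** 2))


def kernel_ope(actions, rewards, behavior_densities, target_actions,
               metric, bandwidth):
    """Kernel-relaxed importance-sampling estimate of a deterministic policy.

    actions[i]            logged action vector a_i
    rewards[i]            observed reward r_i
    behavior_densities[i] behavior policy density at a_i given context x_i
    target_actions[i]     action the target policy would take in context x_i
    """
    total = 0.0
    for a, r, p, t in zip(actions, rewards, behavior_densities, target_actions):
        diff = [ai - ti for ai, ti in zip(a, t)]
        total += gaussian_kernel(diff, metric, bandwidth) * r / p
    return total / len(rewards)


if __name__ == "__main__":
    # Hypothetical two-sample log with 2-D actions and an identity metric.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    estimate = kernel_ope(
        actions=[[0.1, 0.0], [0.5, 0.4]],
        rewards=[1.0, 0.2],
        behavior_densities=[0.3, 0.3],
        target_actions=[[0.0, 0.0], [0.0, 0.0]],
        metric=identity,
        bandwidth=0.5,
    )
    print(estimate)
```

With a non-identity metric, directions in action space are stretched or shrunk before distances are measured, so samples that deviate along a heavily weighted direction contribute less than samples that deviate by the same amount along a lightly weighted one.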

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Science > 1. Journal Articles

Related Researcher

Noh, Yung Kyun: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
