No-regret Shannon entropy regularized neural contextual bandit online learning for robotic grasping
- Authors
- Lee, K.; Choy, J.; Choi, Y.; Kee, H.; Oh, S.
- Issue Date
- Oct-2020
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Citation
- IEEE International Conference on Intelligent Robots and Systems, pp 9620 - 9625
- Pages
- 6
- Journal Title
- IEEE International Conference on Intelligent Robots and Systems
- Start Page
- 9620
- End Page
- 9625
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/59359
- DOI
- 10.1109/IROS45743.2020.9341123
- ISSN
- 2153-0858
- Abstract
- In this paper, we propose Shannon entropy regularized neural contextual bandits (SERN), a novel contextual bandit algorithm that employs a neural network as a reward estimator and uses Shannon entropy regularization to encourage exploration. In many learning-based algorithms for robotic grasping, the lack of real-world data hampers the generalization performance of a model and makes it difficult to apply a trained model to real-world problems. To handle this issue, the proposed method exploits the benefits of online learning. It trains a neural network to predict, from a depth image, the success probability of a given grasp pose, which is called the grasp quality. We theoretically show that SERN has a no-regret property. We empirically demonstrate that SERN outperforms ϵ-greedy in terms of sample efficiency.
- Appears in Collections
- College of Software > Department of Artificial Intelligence > 1. Journal Articles
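
The abstract contrasts Shannon entropy regularized exploration with ϵ-greedy but does not give the update rule. As a rough illustration only, the sketch below uses the standard closed form for maximizing expected reward plus `alpha` times the Shannon entropy over a finite set of candidate actions, which is a softmax of the estimated rewards divided by `alpha`. The finite candidate set, the function names, the temperature `alpha`, and the example grasp-quality values are all assumptions made for illustration; this is not the authors' implementation, and the neural reward estimator is replaced here by a plain list of numbers.

```python
import math
import random


def entropy_regularized_policy(estimated_rewards, alpha):
    """Distribution maximizing <p, r> + alpha * H(p) over the simplex.

    The maximizer is softmax(r / alpha). `alpha` (hypothetical name) sets
    how strongly exploration is encouraged.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    scaled = [r / alpha for r in estimated_rewards]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def epsilon_greedy_policy(estimated_rewards, epsilon):
    """Baseline: uniform with probability epsilon, greedy otherwise."""
    n = len(estimated_rewards)
    best = max(range(n), key=lambda i: estimated_rewards[i])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def sample_action(probabilities, rng=random):
    """Draw one action index according to `probabilities`."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


if __name__ == "__main__":
    # Hypothetical grasp-quality estimates for three candidate grasp poses.
    grasp_quality = [0.2, 0.5, 0.9]
    policy = entropy_regularized_policy(grasp_quality, alpha=0.1)
    print(policy, sample_action(policy))
```

Unlike ϵ-greedy, which spreads its exploration mass uniformly over all candidates, the softmax form gives more probability to candidates whose estimated reward is close to the best one. A larger `alpha` flattens the distribution and a smaller one approaches greedy selection.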