From Language to Grasp: Object Retrieval and Grasping Through Explicit and Implicit Linguistic Commands
- Authors
- Yoon, Dongmin; Cha, Seonghun; Oh, Yoonseon
- Issue Date
- Oct-2024
- Keywords
- grasp detection; multi-modal learning; Robotic object retrieval
- Citation
- International Conference on Control, Automation and Systems, pp. 1565-1566
- Pages
- 2
- Indexed
- SCOPUS
- Journal Title
- International Conference on Control, Automation and Systems
- Start Page
- 1565
- End Page
- 1566
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206472
- DOI
- 10.23919/ICCAS63016.2024.10773029
- ISSN
- 1598-7833
- Abstract
- In human-centered environments, assistive robots must understand verbal commands in order to retrieve and grasp objects within complex scenes. We propose a novel Language Understanding Object Retrieval module (LUOR), built by fine-tuning the CLIP text encoder, to enhance robot manipulators' understanding of both explicit and implicit natural language commands. A new dataset with 712 verb-object pairs is created for training. This dataset includes 78 verbs associated with 244 ImageNet classes, providing a comprehensive range of scenarios. Additionally, 336 verb-object pairs cover 54 verbs for 138 ObjectNet classes, further expanding the model's applicability. Experimental results demonstrate that LUOR outperforms existing baselines in both accuracy and efficiency, particularly in handling implicit commands. The system integrated with the Multi-Task Detection module (MTD) shows strong performance in real-world robotic applications using a Franka Panda manipulator. These findings confirm the practical applicability of our approach and suggest potential for further improvements in robotic grasping and manipulation tasks.
- Appears in Collections
- Seoul College of Engineering > Seoul Department of Electronic Engineering > 1. Journal Articles
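
The abstract describes retrieving an object from a verbal command using a fine-tuned CLIP text encoder, but it does not state how candidates are scored. A common pattern with CLIP-style encoders is to rank candidates by cosine similarity between embeddings; the sketch below illustrates only that generic pattern and is not the authors' implementation. The three-dimensional vectors, the object labels, and the function names `cosine` and `retrieve` are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(command_vec, object_vecs):
    """Return object labels sorted from most to least similar to the command."""
    return sorted(
        object_vecs,
        key=lambda label: cosine(command_vec, object_vecs[label]),
        reverse=True,
    )


# Hypothetical toy embeddings; a real system would obtain these from an encoder.
OBJECT_VECS = {
    "cup": [0.9, 0.1, 0.0],
    "scissors": [0.0, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

# Hypothetical embedding of an implicit command such as "I want to drink something".
COMMAND_VEC = [0.8, 0.2, 0.1]

ranking = retrieve(COMMAND_VEC, OBJECT_VECS)
print(ranking)
```

In this toy setup the implicit command never names an object; the ranking depends entirely on how close the command vector lies to each candidate, which is the property that fine-tuning a text encoder on verb-object pairs would aim to improve.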
