LUOR: A Framework for Language Understanding in Object Retrieval and Grasping

Yoon, Dongmin; Cha, Seonghun; Oh, Yoonseon

doi:10.1007/s12555-024-0527-7

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yoon, Dongmin	-
dc.contributor.author	Cha, Seonghun	-
dc.contributor.author	Oh, Yoonseon	-
dc.date.accessioned	2025-03-05T07:30:13Z	-
dc.date.available	2025-03-05T07:30:13Z	-
dc.date.issued	2025-02	-
dc.identifier.issn	1598-6446	-
dc.identifier.issn	2005-4092	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206683	-
dc.description.abstract	In human-centered environments, assistive robots are required to understand verbal commands to retrieve and grasp objects within complex scenes. Previous research on natural language object retrieval tasks has mainly focused on commands explicitly mentioning an object's name. However, in real-world environments, responding to implicit commands based on an object's function is also essential. To address this problem, we propose a new dataset consisting of 712 verb-object pairs containing 78 verbs for 244 ImageNet classes and 336 verb-object pairs covering 54 verbs for 138 ObjectNet classes. Utilizing this dataset, we propose a novel language understanding object retrieval (LUOR) module by fine-tuning the CLIP text encoder. This approach enables effective learning for the downstream task of object retrieval while preserving the object classification performance. Additionally, we integrate LUOR with a YOLOv3-based multi-task detection (MTD) module for simultaneous object and grasp pose detection. This integration enables the robot manipulator to accurately grasp objects based on verbal commands in complex environments containing multiple objects. Our results demonstrate that LUOR outperforms CLIP in both explicit and implicit retrieval tasks while preserving object classification accuracy for both the ImageNet and ObjectNet datasets. Also, the real-world applicability of the integrated system is demonstrated through experiments with the Franka Panda manipulator.	-
dc.format.extent	11	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Control, Robotics and Systems (제어·로봇·시스템학회)	-
dc.title	LUOR: A Framework for Language Understanding in Object Retrieval and Grasping	-
dc.type	Article	-
dc.publisher.location	Republic of Korea	-
dc.identifier.doi	10.1007/s12555-024-0527-7	-
dc.identifier.scopusid	2-s2.0-85218338733	-
dc.identifier.wosid	001415358700009	-
dc.identifier.bibliographicCitation	International Journal of Control, Automation, and Systems, v.23, no.2, pp. 530-540	-
dc.citation.title	International Journal of Control, Automation, and Systems	-
dc.citation.volume	23	-
dc.citation.number	2	-
dc.citation.startPage	530	-
dc.citation.endPage	540	-
dc.type.docType	Article	-
dc.identifier.kciid	ART003171801	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.description.journalRegisteredClass	kci	-
dc.relation.journalResearchArea	Automation & Control Systems	-
dc.relation.journalWebOfScienceCategory	Automation & Control Systems	-
dc.subject.keywordPlus	Adversarial machine learning	-
dc.subject.keywordPlus	Linguistics	-
dc.subject.keywordPlus	Modular robots	-
dc.subject.keywordPlus	Multi-task learning	-
dc.subject.keywordPlus	Object detection	-
dc.subject.keywordPlus	Object recognition	-
dc.subject.keywordPlus	Problem oriented languages	-
dc.subject.keywordPlus	Robot applications	-
dc.subject.keywordPlus	Robot learning	-
dc.subject.keywordAuthor	Grasp detection	-
dc.subject.keywordAuthor	multi-modal learning	-
dc.subject.keywordAuthor	robotic object retrieval	-
dc.identifier.url	https://link.springer.com/article/10.1007/s12555-024-0527-7	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Oh, Yoonseon: College of Engineering (School of Electronic Engineering)
