GNN-Transformer Task Planning Enhanced with Semantic-Driven Data Augmentation

Jeong, Soojin; Byeon, Seongwan; Kim, Sangwoo; Kwon, HyeokJun; Oh, Yoonseon

doi:10.1609/aaai.v39i14.33598


Authors: Jeong, Soojin; Byeon, Seongwan; Kim, Sangwoo; Kwon, HyeokJun; Oh, Yoonseon

Issue Date: Apr-2025

Publisher: Association for the Advancement of Artificial Intelligence

Citation: Proceedings of the AAAI Conference on Artificial Intelligence, v.39, no.14, pp. 14585-14593

Pages: 9

Indexed: SCOPUS

Journal Title: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 39

Number: 14

Start Page: 14585

End Page: 14593

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/207430

DOI: 10.1609/aaai.v39i14.33598

ISSN: 2159-5399; 2374-3468

Abstract: Natural language is the most intuitive means for humans to interact with robots, making task planning based on natural language commands a longstanding area of research. Large language models (LLMs) have significantly improved task planning by enhancing understanding of language and common sense. However, current methods still face several challenges: they lack a deep understanding of physical environments, their performance relies heavily on prompt examples, LLMs are oversized and not customized for specific tasks, and the planning costs remain high. To overcome these issues, we introduce the GNN-Transformer Task Planner (GTTP), designed to predict task-level actions by leveraging the semantic environment and incorporating historical state data. The GTTP architecture is scalable through the use of GNN layers, while transformer layers facilitate understanding of task progression. In addition, our model uses a text encoder to embed environments, allowing it to be trained on simulated datasets and applied directly in real-world scenarios. We also propose an automated data generation method that includes semantic augmentation, planning verification, and instruction generation via an LLM. This method enables the collection of 14k instruction-annotated tasks in the VirtualHome environment with minimal human effort. The model has been validated across diverse scenes containing up to 715 objects, achieving significantly higher success rates than baseline models. It has also been successfully deployed on a physical mobile manipulator, demonstrating its practical applicability and effectiveness.

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles


Related Researcher: Oh, Yoonseon, College of Engineering (School of Electronic Engineering)
