Detailed Information

Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response (Open Access)

Authors
Ji, Junyung; Kim, Jiwoo; Kim, Younghoon
Issue Date
Oct-2024
Publisher
Multidisciplinary Digital Publishing Institute (MDPI)
Keywords
survey data; item non-response; large language models; prompt engineering
Citation
Future Internet, v.16, no.10, pp. 1-19
Pages
19
Indexed
SCOPUS
ESCI
Journal Title
Future Internet
Volume
16
Number
10
Start Page
1
End Page
19
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/121275
DOI
10.3390/fi16100351
ISSN
1999-5903
Abstract
Survey data play a crucial role in various research fields, including economics, education, and healthcare, by providing insights into human behavior and opinions. However, item non-response, where respondents fail to answer specific questions, presents a significant challenge by creating incomplete datasets that undermine data integrity and can hinder or even prevent accurate analysis. Traditional methods for addressing missing data, such as statistical imputation techniques and deep learning models, often fall short when dealing with the rich linguistic content of survey data. These approaches are also hampered by high time complexity for training and the need for extensive preprocessing or feature selection. In this paper, we introduce an approach that leverages Large Language Models (LLMs) through prompt engineering for predicting item non-responses in survey data. Our method combines the strengths of both traditional imputation techniques and deep learning methods with the advanced linguistic understanding of LLMs. By integrating respondent similarities, question relevance, and linguistic semantics, our approach enhances the accuracy and comprehensiveness of survey data analysis. The proposed method bypasses the need for complex preprocessing and additional training, making it adaptable, scalable, and capable of generating explainable predictions in natural language. We evaluated the effectiveness of our LLM-based approach through a series of experiments, demonstrating its competitive performance against established methods such as Multivariate Imputation by Chained Equations (MICE), MissForest, and deep learning models like TabTransformer. The results show that our approach not only matches but, in some cases, exceeds the performance of these methods while significantly reducing the time required for data processing.
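
The abstract describes combining respondent similarities, question relevance, and the LLM's linguistic understanding through prompt engineering, with no additional model training. As a rough illustration of that idea (not the authors' implementation), the following minimal Python sketch assembles an imputation prompt from the answers of the most similar respondents and the target respondent's own answers to the remaining questions; every identifier (build_imputation_prompt, the toy answer_overlap similarity, the example survey) is a hypothetical assumption, and the finished prompt would be sent to an LLM of one's choice.

```python
# Minimal, hypothetical sketch (not from the paper): build an imputation
# prompt from (a) answers of the most similar respondents and (b) the
# target respondent's answers to the remaining questions, then hand the
# prompt to any LLM. All names and data here are illustrative assumptions.

def answer_overlap(a, b):
    """Toy respondent similarity: fraction of co-answered questions with equal answers."""
    shared = [q for q in a if q in b and a[q] is not None and b[q] is not None]
    return sum(a[q] == b[q] for q in shared) / len(shared) if shared else 0.0

def build_imputation_prompt(target, others, question_texts, missing_q, k=2):
    # Respondent similarity: keep the k respondents most similar to the target.
    neighbours = sorted(others, key=lambda r: answer_overlap(target, r), reverse=True)[:k]
    lines = ["You are imputing a missing answer in a survey.", "", "Survey questions:"]
    lines += [f"- {qid}: {text}" for qid, text in question_texts.items()]
    lines += ["", "Answers of similar respondents:"]
    for i, r in enumerate(neighbours, 1):
        lines.append(f"Respondent {i}: " + ", ".join(f"{q}={r[q]}" for q in question_texts))
    # Question relevance: expose the target's own answers to the other questions.
    known = ", ".join(f"{q}={v}" for q, v in target.items() if v is not None)
    lines += ["", f"Target respondent's known answers: {known}",
              f"Predict the target respondent's answer to {missing_q}. "
              "Reply with one option and a one-sentence justification."]
    return "\n".join(lines)

if __name__ == "__main__":
    question_texts = {
        "Q1": "How satisfied are you with your commute? (low/medium/high)",
        "Q2": "Do you work remotely at least one day a week? (yes/no)",
        "Q3": "How satisfied are you with your work-life balance? (low/medium/high)",
    }
    others = [
        {"Q1": "high", "Q2": "yes", "Q3": "high"},
        {"Q1": "low", "Q2": "no", "Q3": "low"},
        {"Q1": "medium", "Q2": "yes", "Q3": "high"},
    ]
    target = {"Q1": "high", "Q2": "yes", "Q3": None}  # Q3 is the item non-response
    print(build_imputation_prompt(target, others, question_texts, "Q3"))
```

Asking for a justification alongside the predicted option mirrors the abstract's point that the approach can produce explainable predictions in natural language.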
Appears in Collections
COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles


Related Researcher

Kim, Younghoon
ERICA College of Computing (Department of Artificial Intelligence)
