AI-based nanotoxicity data extraction and prediction of nanotoxicity
Open access

Authors
Ha, Eunyong; Ha, Seung Min; Gerelkhuu, Zayakhuu; Kim, Hyun-Yi; Yoon, Tae Hyun
Issue Date
Jan-2025
Publisher
Research Network of Computational and Structural Biotechnology
Keywords
Nanotoxicity; Large Language Models; Data extraction; Prompt engineering; LangChain; Automated machine learning
Citation
Computational and Structural Biotechnology Journal, v.29, pp. 138-148
Pages
11
Indexed
SCIE
SCOPUS
Journal Title
Computational and Structural Biotechnology Journal
Volume
29
Start Page
138
End Page
148
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209982
DOI
10.1016/j.csbj.2025.03.052
ISSN
2001-0370
Abstract
With the growing use of nanomaterials (NMs), assessing their toxicity has become increasingly important. Among toxicity assessment methods, computational models for predicting nanotoxicity are emerging as alternatives to traditional in vitro and in vivo assays, which involve high costs and ethical concerns. Because these models depend on their training data, the quality and quantity of that data are now widely recognized as critical. However, collecting large, high-quality datasets is both time-consuming and labor-intensive. Artificial intelligence (AI)-based data extraction techniques hold significant potential for extracting and organizing information from unstructured text, yet the use of large language models (LLMs) and prompt engineering for nanotoxicity data extraction has not been widely studied. In this study, we developed an AI-based automated data extraction pipeline to facilitate efficient data collection, implemented in Python using LangChain. We used 216 nanotoxicity research articles as training data to refine prompts and evaluate LLM performance. The most suitable LLM, combined with the refined prompts, was then used to extract test data from 605 research articles. On the training data, data extraction performance reached an F1_D.E. (F1 score for data extraction) of 84.6% to 87.6% across the LLMs evaluated. Using the dataset extracted from the test set, we then constructed automated machine learning (AutoML) models that achieved an F1_N.P. (F1 score for nanotoxicity prediction) exceeding 86.1% in predicting nanotoxicity. Additionally, we assessed the reliability and applicability of the models by comparing them in terms of ground truth, dataset size, and data balance. This study highlights the potential of AI-based data extraction and represents a significant contribution to nanotoxicity research.
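The F1_D.E. metric reported above is the harmonic mean of precision and recall over extracted data points. The abstract does not state how extracted values are matched against the ground truth, so the sketch below is only an illustration that assumes exact, set-based matching of hypothetical (article_id, field, value) records. The function name f1_score and the field names are illustrative assumptions, not the paper's code or data schema.

```python
from typing import Hashable, Iterable


def f1_score(predicted: Iterable[Hashable], truth: Iterable[Hashable]) -> float:
    """Set-based F1 over extracted records (assumption: exact matching).

    Each item is one extracted data point, e.g. a hypothetical
    (article_id, field, value) tuple.
    """
    pred, gold = set(predicted), set(truth)
    tp = len(pred & gold)  # records the LLM extracted correctly
    if tp == 0:
        return 0.0
    precision = tp / len(pred)  # share of extracted records that are correct
    recall = tp / len(gold)     # share of ground-truth records recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example records for one article (illustrative field names only).
gold = {
    ("a1", "core_size_nm", "20"),
    ("a1", "cell_line", "HeLa"),
    ("a1", "toxicity", "toxic"),
}
pred = {
    ("a1", "core_size_nm", "20"),
    ("a1", "cell_line", "HeLa"),
    ("a1", "toxicity", "nontoxic"),  # one wrong value
}

print(round(f1_score(pred, gold), 3))  # 0.667
```

Here two of the three records match, so precision and recall are both 2/3 and F1 is about 0.667. A looser matching rule, such as a tolerance on numeric values, would change these counts and therefore the score.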

Related Researcher
Yoon, Tae Hyun
College of Natural Sciences (Department of Chemistry)