Developing an automated framework for eco-label information categorization using web crawling and Natural Language Processing techniques

Authors
Nguyen, Ho Anh Thu; Pham, Duy Hoang; Kim, Byeol; Ahn, Yonghan; Kwon, Nahyun
Issue Date
Jul-2025
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Keywords
Green building material; Eco-label; Information management; Machine learning; Natural language processing
Citation
EXPERT SYSTEMS WITH APPLICATIONS, v.282, pp 1 - 24
Pages
24
Indexed
SCIE
SCOPUS
Journal Title
EXPERT SYSTEMS WITH APPLICATIONS
Volume
282
Start Page
1
End Page
24
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125248
DOI
10.1016/j.eswa.2025.127688
ISSN
0957-4174
1873-6793
Abstract
Eco-labels are extensively employed to assess the environmental performance of building materials. However, their management is often fragmented across disparate online databases with inconsistent data structures, presenting significant challenges for efficient information acquisition and management. This study explores the application of web crawling techniques, Natural Language Processing (NLP), and machine learning (ML) models to collect and categorize eco-label information, with the objective of advancing the automation of information management processes. The results demonstrate that the categorization models exhibit high performance, achieving F1-scores exceeding 0.95 on the test set and at least 0.76 when validating datasets incorporating temporally updated information. However, the limited availability of data for certain eco-labels, such as Forest Stewardship Council certification and Green Screen, substantially degrades model performance with updated data. Notably, traditional ML models leveraging manual feature engineering outperform deep learning models with automatic feature extraction when applied to web-crawled data. Furthermore, the TF-IDF feature extraction technique surpasses other n-gram-based approaches, with model performance declining as n-gram length increases. This study establishes a systematic framework that informs the selection of reliable data sources, feature engineering strategies, and ML algorithms for integrating web crawling, thereby enhancing the automation of eco-label information management.
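
The abstract names TF-IDF features over n-grams feeding traditional ML classifiers but gives no implementation detail. The following is only a minimal standard-library sketch of that general idea, not the authors' pipeline: the toy records, the labels "forest" and "declaration", all function names, and the nearest-centroid classifier are assumptions made for illustration. The parameter n_max sets the longest word n-gram included, which is the setting the abstract reports as hurting performance when increased.

```python
import math
from collections import Counter


def ngrams(text, n_max=1):
    """Return word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def fit_idf(docs, n_max=1):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n_max)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(doc, idf, n_max=1):
    """L2-normalised TF-IDF vector (as a dict); unseen n-grams are ignored."""
    tf = Counter(g for g in ngrams(doc, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def fit_centroids(docs, labels, idf, n_max=1):
    """Sum the TF-IDF vectors of each class (assumed stand-in classifier)."""
    sums = {}
    for doc, label in zip(docs, labels):
        centroid = sums.setdefault(label, Counter())
        for g, v in tfidf(doc, idf, n_max).items():
            centroid[g] += v
    return sums


def predict(doc, centroids, idf, n_max=1):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = tfidf(doc, idf, n_max)

    def score(centroid):
        dot = sum(v * centroid.get(g, 0.0) for g, v in vec.items())
        norm = math.sqrt(sum(v * v for v in centroid.values())) or 1.0
        return dot / norm

    return max(centroids, key=lambda label: score(centroids[label]))


# Hypothetical toy records standing in for web-crawled eco-label text.
train_docs = [
    "fsc certified plywood chain of custody",
    "fsc certified timber flooring forest management",
    "environmental product declaration concrete global warming potential",
    "environmental product declaration steel rebar life cycle assessment",
]
train_labels = ["forest", "forest", "declaration", "declaration"]

idf = fit_idf(train_docs, n_max=1)
centroids = fit_centroids(train_docs, train_labels, idf, n_max=1)
print(predict("fsc certified oak flooring", centroids, idf, n_max=1))
```

Rerunning the last three lines with n_max=2 adds bigram features to the same vocabulary, which is one way to probe how n-gram length affects a classifier on one's own data.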
Appears in Collections
COLLEGE OF ENGINEERING SCIENCES > MAJOR IN ARCHITECTURAL ENGINEERING > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.