Developing an automated framework for eco-label information categorization using web crawling and Natural Language Processing techniques
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Nguyen, Ho Anh Thu | - |
dc.contributor.author | Pham, Duy Hoang | - |
dc.contributor.author | Kim, Byeol | - |
dc.contributor.author | Ahn, Yonghan | - |
dc.contributor.author | Kwon, Nahyun | - |
dc.date.accessioned | 2025-05-16T08:00:35Z | - |
dc.date.available | 2025-05-16T08:00:35Z | - |
dc.date.issued | 2025-07 | - |
dc.identifier.issn | 0957-4174 | - |
dc.identifier.issn | 1873-6793 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125248 | - |
dc.description.abstract | Eco-labels are extensively employed to assess the environmental performance of building materials. However, their management is often fragmented across disparate online databases with inconsistent data structures, presenting significant challenges for efficient information acquisition and management. This study explores the application of web crawling techniques, Natural Language Processing (NLP), and machine learning (ML) models to collect and categorize eco-label information, with the objective of advancing the automation of information management processes. The results demonstrate that the categorization models exhibit high performance, achieving F1-scores exceeding 0.95 on the test set and at least 0.76 when validating datasets incorporating temporally updated information. However, the limited availability of data for certain eco-labels, such as Forest Stewardship Council certification and Green Screen, substantially degrades model performance with updated data. Notably, traditional ML models leveraging manual feature engineering outperform deep learning models with automatic feature extraction when applied to web-crawled data. Furthermore, the TF-IDF feature extraction technique surpasses other n-gram-based approaches, with model performance declining as n-gram length increases. This study establishes a systematic framework that informs the selection of reliable data sources, feature engineering strategies, and ML algorithms for integrating web crawling, thereby enhancing the automation of eco-label information management. | - |
dc.format.extent | 24 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | PERGAMON-ELSEVIER SCIENCE LTD | - |
dc.title | Developing an automated framework for eco-label information categorization using web crawling and Natural Language Processing techniques | - |
dc.type | Article | - |
dc.publisher.location | United Kingdom | - |
dc.identifier.doi | 10.1016/j.eswa.2025.127688 | - |
dc.identifier.scopusid | 2-s2.0-105003373949 | - |
dc.identifier.wosid | 001481749600001 | - |
dc.identifier.bibliographicCitation | EXPERT SYSTEMS WITH APPLICATIONS, v.282, pp 1 - 24 | - |
dc.citation.title | EXPERT SYSTEMS WITH APPLICATIONS | - |
dc.citation.volume | 282 | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 24 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Operations Research & Management Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Operations Research & Management Science | - |
dc.subject.keywordPlus | BUILDING-MATERIALS | - |
dc.subject.keywordPlus | SOCIAL MEDIA | - |
dc.subject.keywordPlus | BIM | - |
dc.subject.keywordPlus | CLASSIFICATION | - |
dc.subject.keywordPlus | INTEGRATION | - |
dc.subject.keywordPlus | MANAGEMENT | - |
dc.subject.keywordPlus | ENERGY | - |
dc.subject.keywordAuthor | Green building material | - |
dc.subject.keywordAuthor | Eco-label | - |
dc.subject.keywordAuthor | Information management | - |
dc.subject.keywordAuthor | Machine learning | - |
dc.subject.keywordAuthor | Natural language processing | - |
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0957417425013107?pes=vor&utm_source=scopus&getft_integrator=scopus | - |
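
The abstract reports that TF-IDF features over short n-grams worked best for categorizing crawled eco-label text. As a rough illustration of that kind of pipeline, and not the authors' implementation, the sketch below computes smoothed TF-IDF weights over word n-grams and assigns a label by cosine similarity to per-class centroids. The training texts, the label names, the nearest-centroid classifier, and every function name here are assumptions made for this example; the record does not say which classifiers or preprocessing steps the paper used.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return all word n-grams of length 1..n as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)]


def fit_idf(docs, n):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc, n)))
    total = len(docs)
    return {t: math.log((1 + total) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf, n):
    """L2-normalized sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(t for t in ngrams(doc, n) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels, idf, n):
    """Sum the TF-IDF vectors of each class (cosine ignores the scale)."""
    centroids = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for t, v in tfidf(doc, idf, n).items():
            centroids[label][t] += v
    return dict(centroids)


def predict(doc, centroids, idf, n):
    """Return the label whose centroid is most similar to the document."""
    vec = tfidf(doc, idf, n)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy records, standing in for crawled eco-label descriptions.
TRAIN_DOCS = [
    "forest stewardship council certified plywood panel",
    "fsc certified timber flooring chain of custody",
    "greenscreen benchmark chemical hazard assessment for sealant",
    "greenscreen certified textile chemical hazard score",
]
TRAIN_LABELS = ["FSC", "FSC", "GreenScreen", "GreenScreen"]
N = 2  # use unigrams and bigrams

IDF = fit_idf(TRAIN_DOCS, N)
CENTROIDS = train_centroids(TRAIN_DOCS, TRAIN_LABELS, IDF, N)

if __name__ == "__main__":
    print(predict("certified timber panel from forest council", CENTROIDS, IDF, N))
    print(predict("chemical hazard assessment of a sealant", CENTROIDS, IDF, N))
```

Raising `N` adds longer n-grams to the vocabulary, most of which occur in only one document; that sparsity is one plausible reason, though not one confirmed by this record, for the reported decline in performance as n-gram length grows.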