A Natural language processing based machine learning approach on building material eco-label databases wrangling
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Pham, Duy Hoang | - |
dc.contributor.author | Park,Sojin | - |
dc.contributor.author | Ahn,Yonghan | - |
dc.date.accessioned | 2024-11-05T07:00:18Z | - |
dc.date.available | 2024-11-05T07:00:18Z | - |
dc.date.issued | 2024-09 | - |
dc.identifier.issn | 2093-761X | - |
dc.identifier.issn | 2093-7628 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120762 | - |
dc.description.abstract | In recent years, databases promoting eco-labeled building materials (EBM) have received increasing attention. However, extracting useful information from vast and disparate EBM databases remains a challenge due to inconsistencies in data formats, terminology, and organization. This research proposes a natural language processing (NLP) based machine learning (ML) approach to streamline the wrangling of EBM databases. The study surveys EBM databases and develops web-scraping and web-crawling tools to collect data from them, resulting in a refined dataset of 64,350 data points. It then applies NLP techniques and ML algorithms to standardize terminology, resolve inconsistencies, and integrate diverse EBM databases into a cohesive database. The Random Forest (RF) algorithm consistently emerged as a top-performing classifier, achieving high AUC scores in models such as “PBTs”, “Cradle to Gate”, and the “UL GREENGUARD label”. For many ecolabels, the RF algorithm also delivered strong F1-scores, for example for the attributes “PBTs” (94.19% in cross-validation, 94.72% in testing) and “C2Gate” (92.78% in cross-validation, 93.25% in testing). The resulting structured representation facilitates efficient querying and analysis, enabling stakeholders to make informed decisions about EBM selection and utilization. By automating the labor-intensive process of EBM data wrangling, our research contributes to the advancement of sustainable construction practices and the broader goal of environmental stewardship in the built environment. © International Journal of Sustainable Building Technology and Urban Development. | - |
dc.format.extent | 14 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Sustainable Building Research Center | - |
dc.title | A Natural language processing based machine learning approach on building material eco-label databases wrangling | - |
dc.type | Article | - |
dc.publisher.location | United Kingdom | - |
dc.identifier.doi | 10.22712/susb.20240026 | - |
dc.identifier.scopusid | 2-s2.0-85207032591 | - |
dc.identifier.bibliographicCitation | International Journal of Sustainable Building Technology and Urban Development, v.15, no.3, pp 367 - 380 | - |
dc.citation.title | International Journal of Sustainable Building Technology and Urban Development | - |
dc.citation.volume | 15 | - |
dc.citation.number | 3 | - |
dc.citation.startPage | 367 | - |
dc.citation.endPage | 380 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordAuthor | data wrangling | - |
dc.subject.keywordAuthor | green building materials | - |
dc.subject.keywordAuthor | machine learning | - |
dc.subject.keywordAuthor | natural language processing | - |
dc.subject.keywordAuthor | text classification | - |
dc.identifier.url | https://www.sbt-durabi.org/articles/article/v8AV/#Information | - |
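
The abstract mentions two steps that a small example can make concrete: standardizing terminology across databases (the same attribute appears as both “Cradle to Gate” and “C2Gate”) and scoring a classifier with the F1 measure. The following is a minimal sketch of both ideas, not the authors' pipeline. The synonym table `CANONICAL`, the function names, and the sample labels are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical synonym table (an assumption, not taken from the paper):
# maps cleaned-up variant spellings to one canonical attribute name.
CANONICAL = {
    "c2gate": "cradle_to_gate",
    "cradle to gate": "cradle_to_gate",
    "pbts": "pbts",
    "ul greenguard": "ul_greenguard",
    "ul greenguard label": "ul_greenguard",
}


def normalize_label(raw: str) -> str:
    """Lowercase, collapse punctuation/whitespace, then map to a canonical name.

    Unknown labels fall back to a snake_case version of the cleaned text.
    """
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return CANONICAL.get(key, key.replace(" ", "_"))


def f1_score(y_true, y_pred) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall).

    Labels are 1 (attribute present) or 0 (absent). Returns 0.0 when there
    are no true positives, which avoids division by zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Variant spellings from different (hypothetical) source databases.
    for raw in ["C2Gate", "Cradle-to-Gate", "  UL GREENGUARD label "]:
        print(repr(raw), "->", normalize_label(raw))

    # Toy ground truth and predictions for one attribute.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print("F1:", round(f1_score(y_true, y_pred), 4))
```

A lookup table like this only covers known variants. The abstract indicates that the study relies on NLP techniques and trained classifiers such as Random Forest for this step, which a real pipeline would need in place of the fixed dictionary.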