Detailed Information

A Natural language processing based machine learning approach on building material eco-label databases wrangling

Authors
Pham, Duy Hoang; Park, Sojin; Ahn, Yonghan
Issue Date
Sep-2024
Publisher
Sustainable Building Research Center
Keywords
data wrangling; green building materials; machine learning; natural language processing; text classification
Citation
International Journal of Sustainable Building Technology and Urban Development, v.15, no.3, pp 367 - 380
Pages
14
Indexed
SCOPUS
Journal Title
International Journal of Sustainable Building Technology and Urban Development
Volume
15
Number
3
Start Page
367
End Page
380
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120762
DOI
10.22712/susb.20240026
ISSN
2093-761X
2093-7628
Abstract
In recent years, databases promoting eco-labeled building materials (EBM) have received increasing attention. However, extracting useful information from vast and disparate EBM databases remains a challenge due to inconsistencies in data formats, terminology, and organization. This research proposes a natural language processing (NLP) based machine learning (ML) approach to streamline the wrangling of EBM databases. The study investigates EBM databases and develops web-scraping and web-crawling tools to collect data from them, resulting in a refined dataset of 64,350 data points. It then leverages NLP techniques and ML algorithms to standardize terminology, resolve inconsistencies, and integrate diverse EBM databases into a cohesive database. The Random Forest (RF) algorithm consistently emerged as a top-performing classifier, achieving high AUC scores in models such as “PBTs”, “Cradle to Gate”, and the “UL GREENGUARD label”. For many ecolabels, the RF algorithm delivered strong performance, exemplified by its F1-scores for attributes like “PBTs” (94.19% in cross-validation, 94.72% in testing) and “C2Gate” (92.78% in cross-validation, 93.25% in testing). This structured representation facilitates efficient querying and analysis, enabling stakeholders to make informed decisions about EBM selection and utilization. By automating the labor-intensive process of EBM data wrangling, our research contributes to the advancement of sustainable construction practices and the broader goal of environmental stewardship in the built environment. © International Journal of Sustainable Building Technology and Urban Development.
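As a rough illustration of two ideas the abstract mentions, terminology standardization and F1-score evaluation, the sketch below uses only the Python standard library. It is not the authors' pipeline (which the abstract describes as NLP plus ML classifiers such as Random Forest); the alias table, the function names `normalize_label` and `f1_score`, and the sample spellings are all hypothetical and chosen only for demonstration.

```python
import re
from typing import Sequence

# Hypothetical alias table: these spellings are illustrative assumptions,
# not values taken from the paper's databases.
CANONICAL_ALIASES = {
    "cradle to gate": "C2Gate",
    "c2gate": "C2Gate",
    "pbts": "PBTs",
    "pbt": "PBTs",
    "ul greenguard": "UL GREENGUARD",
    "greenguard": "UL GREENGUARD",
}


def normalize_label(raw: str) -> str:
    """Map a raw label string to a canonical name when an alias is known.

    The text is lowercased, every run of non-alphanumeric characters is
    replaced by a single space, and the result is looked up in the alias
    table. Unknown labels are returned in their cleaned lowercase form.
    """
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return CANONICAL_ALIASES.get(key, key)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `normalize_label("Cradle-to-Gate")` and `normalize_label("C2Gate")` both resolve to the same canonical name under this assumed table, which is the kind of consistency a merged database needs before per-label classifiers can be compared by F1.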
Appears in Collections
COLLEGE OF ENGINEERING SCIENCES > MAJOR IN ARCHITECTURAL ENGINEERING > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Ahn, Yong Han
ERICA College of Engineering (MAJOR IN ARCHITECTURAL ENGINEERING)
