A Natural language processing based machine learning approach on building material eco-label databases wrangling
- Authors
- Pham, Duy Hoang; Park, Sojin; Ahn, Yonghan
- Issue Date
- Sep-2024
- Publisher
- Sustainable Building Research Center
- Keywords
- data wrangling; green building materials; machine learning; natural language processing; text classification
- Citation
- International Journal of Sustainable Building Technology and Urban Development, v.15, no.3, pp. 367-380
- Pages
- 14
- Indexed
- SCOPUS
- Journal Title
- International Journal of Sustainable Building Technology and Urban Development
- Volume
- 15
- Number
- 3
- Start Page
- 367
- End Page
- 380
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120762
- DOI
- 10.22712/susb.20240026
- ISSN
- 2093-761X; 2093-7628
- Abstract
- In recent years, databases promoting eco-labeled building materials (EBM) have received increasing attention. However, extracting useful information from vast and disparate EBM databases remains a challenge due to inconsistencies in data formats, terminology, and organization. This research proposes a natural language processing (NLP) based machine learning (ML) approach to streamline the wrangling of EBM databases. The study first investigates existing EBM databases and develops web-scraping and web-crawling tools to collect data from them, resulting in a refined dataset of 64,350 data points. It then leverages NLP techniques and ML algorithms to standardize terminology, resolve inconsistencies, and integrate diverse EBM databases into a cohesive database. The Random Forest (RF) algorithm consistently emerged as a top-performing classifier, achieving high AUC scores in models such as “PBTs”, “Cradle to Gate”, and the “UL GREENGUARD label”. For many ecolabels, the RF algorithm delivered strong performance, exemplified by its F1-scores for attributes like “PBTs” (94.19% in cross-validation, 94.72% in testing) and “C2Gate” (92.78% in cross-validation, 93.25% in testing). This structured representation facilitates efficient querying and analysis, enabling stakeholders to make informed decisions about EBM selection and utilization. By automating the labor-intensive process of EBM data wrangling, our research contributes to the advancement of sustainable construction practices and the broader goal of environmental stewardship in the built environment. © International Journal of Sustainable Building Technology and Urban Development.
- Appears in Collections
- COLLEGE OF ENGINEERING SCIENCES > MAJOR IN ARCHITECTURAL ENGINEERING > 1. Journal Articles
