Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Heegyu | - |
| dc.contributor.author | Jeon, Taeyang | - |
| dc.contributor.author | Choi, Seungtaek | - |
| dc.contributor.author | Hong, Ji-hoon | - |
| dc.contributor.author | Jeon, Dong-won | - |
| dc.contributor.author | Baek, Ga-yeon | - |
| dc.contributor.author | Kwak, Gyeong-won | - |
| dc.contributor.author | Lee, Dong-hee | - |
| dc.contributor.author | Bae, Jisu | - |
| dc.contributor.author | Lee, Chi-hoon | - |
| dc.contributor.author | Kim, Yoon-seo | - |
| dc.contributor.author | Choi, Seon-Jin | - |
| dc.contributor.author | Park, Jin-seong | - |
| dc.contributor.author | Cho, Sung-beom | - |
| dc.contributor.author | Cho, Hyunsouk | - |
| dc.date.accessioned | 2025-12-18T05:00:32Z | - |
| dc.date.available | 2025-12-18T05:00:32Z | - |
| dc.date.issued | 2025-11 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209906 | - |
| dc.description.abstract | Materials synthesis remains a critical bottleneck in developing innovations for energy storage, catalysis, electronics, and biomedical devices. Current synthesis design relies heavily on empirical trial-and-error methods guided by expert intuition, limiting the pace of materials discovery. To address this challenge, we present AlchemyBench, a comprehensive benchmark built upon a curated dataset of 17,667 expert-verified synthesis recipes from open-access literature. AlchemyBench provides an end-to-end framework that supports research in large language models (LLMs) applied to materials synthesis prediction. The benchmark encompasses four key tasks: raw materials prediction, equipment prediction, synthesis procedure generation, and characterization outcome forecasting. To enable scalable evaluation, we propose an LLM-as-a-Judge framework that leverages large language models for automated assessment, demonstrating strong agreement with expert evaluations (e.g., Pearson's r = 0.80, Spearman's ρ = 0.78). Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o) achieve scores around 4.0 on well-documented oxide and organic synthesis targets, but performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines trained on 1M samples, while retrieval-augmented generation provides an additional +0.20 improvement when supplied with five high-similarity contexts. AlchemyBench addresses a critical gap in the field by providing the first comprehensive, legally redistributable benchmark for automated materials synthesis prediction. Our contributions establish a foundation for exploring LLM capabilities in predicting and guiding materials synthesis, ultimately accelerating experimental design and innovation in materials science. | - |
| dc.format.extent | 11 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Association for Computing Machinery, Inc | - |
| dc.title | Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1145/3746252.3761359 | - |
| dc.identifier.scopusid | 2-s2.0-105023162332 | - |
| dc.identifier.bibliographicCitation | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pp 1302 - 1312 | - |
| dc.citation.title | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management | - |
| dc.citation.startPage | 1302 | - |
| dc.citation.endPage | 1312 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Automation | - |
| dc.subject.keywordPlus | Biomedical equipment | - |
| dc.subject.keywordPlus | Digital storage | - |
| dc.subject.keywordPlus | Forecasting | - |
| dc.subject.keywordPlus | Interactive computer systems | - |
| dc.subject.keywordPlus | Large datasets | - |
| dc.subject.keywordPlus | Open Data | - |
| dc.subject.keywordPlus | Open systems | - |
| dc.subject.keywordAuthor | benchmark | - |
| dc.subject.keywordAuthor | dataset | - |
| dc.subject.keywordAuthor | human evaluation | - |
| dc.subject.keywordAuthor | large language model | - |
| dc.subject.keywordAuthor | llm-as-a-judge | - |
| dc.subject.keywordAuthor | materials science | - |
| dc.identifier.url | https://dl.acm.org/doi/10.1145/3746252.3761359 | - |
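
The abstract reports agreement between the LLM judge and expert evaluations as Pearson's r and Spearman's ρ. The following is a minimal sketch of how such agreement statistics could be computed from paired scores. It is not the authors' evaluation code: the function names and the example scores are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores on a 1-5 scale; NOT data from the paper.
    expert_scores = [4.0, 3.5, 2.0, 4.5, 3.0, 1.5]
    judge_scores = [4.2, 3.0, 2.5, 4.4, 3.1, 2.0]
    print(f"Pearson r    = {pearson(expert_scores, judge_scores):.2f}")
    print(f"Spearman rho = {spearman(expert_scores, judge_scores):.2f}")
```

Pearson's r measures linear agreement between the two score lists, while Spearman's ρ only compares their rank orderings, which is why the two are usually reported together when judging an automated scorer against human raters.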
