Detailed Information


Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Full metadata record
DC Field | Value | Language
dc.contributor.author | Kim, Heegyu | -
dc.contributor.author | Jeon, Taeyang | -
dc.contributor.author | Choi, Seungtaek | -
dc.contributor.author | Hong, Ji-hoon | -
dc.contributor.author | Jeon, Dong-won | -
dc.contributor.author | Baek, Ga-yeon | -
dc.contributor.author | Kwak, Gyeong-won | -
dc.contributor.author | Lee, Dong-hee | -
dc.contributor.author | Bae, Jisu | -
dc.contributor.author | Lee, Chi-hoon | -
dc.contributor.author | Kim, Yoon-seo | -
dc.contributor.author | Choi, Seon-Jin | -
dc.contributor.author | Park, Jin-seong | -
dc.contributor.author | Cho, Sung-beom | -
dc.contributor.author | Cho, Hyunsouk | -
dc.date.accessioned | 2025-12-18T05:00:32Z | -
dc.date.available | 2025-12-18T05:00:32Z | -
dc.date.issued | 2025-11 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209906 | -
dc.description.abstract | Materials synthesis remains a critical bottleneck in developing innovations for energy storage, catalysis, electronics, and biomedical devices. Current synthesis design relies heavily on empirical trial-and-error methods guided by expert intuition, which limits the pace of materials discovery. To address this challenge, we present AlchemyBench, a comprehensive benchmark built on a curated dataset of 17,667 expert-verified synthesis recipes from open-access literature. AlchemyBench provides an end-to-end framework that supports research on large language models (LLMs) applied to materials synthesis prediction. The benchmark encompasses four key tasks: raw materials prediction, equipment prediction, synthesis procedure generation, and characterization outcome forecasting. To enable scalable evaluation, we propose an LLM-as-a-Judge framework that uses large language models for automated assessment and shows strong agreement with expert evaluations (Pearson's r = 0.80, Spearman's ρ = 0.78). Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o) achieve scores around 4.0 on well-documented oxide and organic synthesis targets, but their performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines trained on 1M samples, while retrieval-augmented generation provides an additional +0.20 improvement when supplied with five high-similarity contexts. AlchemyBench addresses a critical gap in the field by providing the first comprehensive, legally redistributable benchmark for automated materials synthesis prediction. Our contributions establish a foundation for exploring LLM capabilities in predicting and guiding materials synthesis, ultimately accelerating experimental design and innovation in materials science. | -
dc.format.extent | 11 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Association for Computing Machinery, Inc | -
dc.title | Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge | -
dc.type | Article | -
dc.identifier.doi | 10.1145/3746252.3761359 | -
dc.identifier.scopusid | 2-s2.0-105023162332 | -
dc.identifier.bibliographicCitation | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pp 1302 - 1312 | -
dc.citation.title | CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management | -
dc.citation.startPage | 1302 | -
dc.citation.endPage | 1312 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Automation | -
dc.subject.keywordPlus | Biomedical equipment | -
dc.subject.keywordPlus | Digital storage | -
dc.subject.keywordPlus | Forecasting | -
dc.subject.keywordPlus | Interactive computer systems | -
dc.subject.keywordPlus | Large datasets | -
dc.subject.keywordPlus | Open Data | -
dc.subject.keywordPlus | Open systems | -
dc.subject.keywordAuthor | benchmark | -
dc.subject.keywordAuthor | dataset | -
dc.subject.keywordAuthor | human evaluation | -
dc.subject.keywordAuthor | large language model | -
dc.subject.keywordAuthor | llm-as-a-judge | -
dc.subject.keywordAuthor | materials science | -
dc.identifier.url | https://dl.acm.org/doi/10.1145/3746252.3761359 | -
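
The abstract reports agreement between the LLM judge and expert ratings as Pearson's r and Spearman's ρ. The sketch below shows how these two standard coefficients can be computed. It assumes paired numeric scores, with one judge score and one expert score for each generated recipe. The function names (`pearson_r`, `average_ranks`, `spearman_rho`) and the sample scores are hypothetical; this is not the authors' evaluation code. Ties are handled with average ranks, the usual convention for Spearman's ρ.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical scores for five generated recipes (not data from the paper).
    expert = [4, 3, 5, 2, 4]
    judge = [4.2, 3.1, 4.8, 2.5, 3.9]
    print(f"Pearson r = {pearson_r(expert, judge):.2f}")
    print(f"Spearman rho = {spearman_rho(expert, judge):.2f}")
```

Pearson's r measures linear agreement between the raw scores. Spearman's ρ only checks whether the judge orders items the same way the experts do. Reporting both distinguishes calibration-level agreement from ranking-level agreement.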
Appears in Collections
College of Engineering (Seoul) > School of Materials Science and Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Choi, Seon-Jin
COLLEGE OF ENGINEERING (SCHOOL OF MATERIALS SCIENCE AND ENGINEERING)