Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge

Kim, Heegyu; Jeon, Taeyang; Choi, Seungtaek; Hong, Ji-hoon; Jeon, Dong-won; Baek, Ga-yeon; Kwak, Gyeong-won; Lee, Dong-hee; Bae, Jisu; Lee, Chi-hoon; Kim, Yoon-seo; Choi, Seon-Jin; Park, Jin-seong; Cho, Sung-beom; Cho, Hyunsouk

doi:10.1145/3746252.3761359

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Heegyu	-
dc.contributor.author	Jeon, Taeyang	-
dc.contributor.author	Choi, Seungtaek	-
dc.contributor.author	Hong, Ji-hoon	-
dc.contributor.author	Jeon, Dong-won	-
dc.contributor.author	Baek, Ga-yeon	-
dc.contributor.author	Kwak, Gyeong-won	-
dc.contributor.author	Lee, Dong-hee	-
dc.contributor.author	Bae, Jisu	-
dc.contributor.author	Lee, Chi-hoon	-
dc.contributor.author	Kim, Yoon-seo	-
dc.contributor.author	Choi, Seon-Jin	-
dc.contributor.author	Park, Jin-seong	-
dc.contributor.author	Cho, Sung-beom	-
dc.contributor.author	Cho, Hyunsouk	-
dc.date.accessioned	2025-12-18T05:00:32Z	-
dc.date.available	2025-12-18T05:00:32Z	-
dc.date.issued	2025-11	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/209906	-
dc.description.abstract	Materials synthesis remains a critical bottleneck in developing innovations for energy storage, catalysis, electronics, and biomedical devices. Current synthesis design relies heavily on empirical trial-and-error methods guided by expert intuition, limiting the pace of materials discovery. To address this challenge, we present AlchemyBench, a comprehensive benchmark built upon a curated dataset of 17,667 expert-verified synthesis recipes from open-access literature. AlchemyBench provides an end-to-end framework that supports research in large language models (LLMs) applied to materials synthesis prediction. The benchmark encompasses four key tasks: raw materials and equipment prediction, synthesis procedure generation, and characterization outcome forecasting. To enable scalable evaluation, we propose an LLM-as-a-Judge framework that leverages large language models for automated assessment, demonstrating strong agreement with expert evaluations (e.g., Pearson's r = 0.80, Spearman's ρ = 0.78). Our experimental results reveal that reasoning-focused models (Claude 3.7, GPT-4o) achieve scores around 4.0 on well-documented oxide and organic synthesis targets, but performance drops by approximately 0.3 points on electrochemical workflows. Fine-tuning on AlchemyBench data enables a 7B-parameter open-source model to surpass generic baselines trained on 1M samples, while retrieval-augmented generation provides an additional +0.20 improvement when supplied with five high-similarity contexts. AlchemyBench addresses a critical gap in the field by providing the first comprehensive, legally redistributable benchmark for automated materials synthesis prediction. Our contributions establish a foundation for exploring LLM capabilities in predicting and guiding materials synthesis, ultimately accelerating experimental design and innovation in materials science.	-
dc.format.extent	11	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Association for Computing Machinery, Inc	-
dc.title	Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge	-
dc.type	Article	-
dc.identifier.doi	10.1145/3746252.3761359	-
dc.identifier.scopusid	2-s2.0-105023162332	-
dc.identifier.bibliographicCitation	CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pp 1302 - 1312	-
dc.citation.title	CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management	-
dc.citation.startPage	1302	-
dc.citation.endPage	1312	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Automation	-
dc.subject.keywordPlus	Biomedical equipment	-
dc.subject.keywordPlus	Digital storage	-
dc.subject.keywordPlus	Forecasting	-
dc.subject.keywordPlus	Interactive computer systems	-
dc.subject.keywordPlus	Large datasets	-
dc.subject.keywordPlus	Open Data	-
dc.subject.keywordPlus	Open systems	-
dc.subject.keywordAuthor	benchmark	-
dc.subject.keywordAuthor	dataset	-
dc.subject.keywordAuthor	human evaluation	-
dc.subject.keywordAuthor	large language model	-
dc.subject.keywordAuthor	llm-as-a-judge	-
dc.subject.keywordAuthor	materials science	-
dc.identifier.url	https://dl.acm.org/doi/10.1145/3746252.3761359	-
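
The abstract reports agreement between the LLM judge and expert evaluators as Pearson's r and Spearman's ρ. As a minimal sketch of how two such coefficients could be computed from paired scores, the Python below uses only the standard library. The function names and the example scores are hypothetical, chosen for illustration; they are not taken from the paper or from AlchemyBench.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for the same six model outputs (illustration only).
expert_scores = [4.5, 3.0, 2.5, 4.0, 1.5, 3.5]
judge_scores = [4.0, 3.5, 2.0, 4.5, 2.0, 3.0]

print(f"Pearson r    = {pearson(expert_scores, judge_scores):.2f}")
print(f"Spearman rho = {spearman(expert_scores, judge_scores):.2f}")
```

Pearson's r measures linear agreement on the raw scores, while Spearman's ρ measures agreement on their ordering, which is presumably why a judge-versus-expert comparison reports both.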

Appears in Collections: Seoul College of Engineering > Seoul School of Materials Science and Engineering > 1. Journal Articles

Related Researcher

Choi, Seon-Jin: COLLEGE OF ENGINEERING (SCHOOL OF MATERIALS SCIENCE AND ENGINEERING)
