Hardware-software Co-design for Vector Similarity Search on HBM-PIM
- Authors
- Kim, Nahyeon; Kim, Sujin; Jung, Min; Noh, Haechannuri; Kim, Ji-Hoon
- Issue Date
- Dec-2025
- Publisher
- IEEK PUBLICATION CENTER
- Keywords
- Processing-in-memory (PIM); retrieval-augmented generation (RAG); vector similarity search; distance computation; instruction set extension; hardware-software co-design; PIM simulator
- Citation
- JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, v.25, no.6, pp 662 - 669
- Pages
- 8
- Indexed
- SCIE; SCOPUS; KCI
- Journal Title
- JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE
- Volume
- 25
- Number
- 6
- Start Page
- 662
- End Page
- 669
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210365
- DOI
- 10.5573/JSTS.2025.25.6.662
- ISSN
- 1598-1657; 2233-4866
- Abstract
- Vector similarity search is a key component of Retrieval-Augmented Generation (RAG) for large language models (LLMs), requiring memory-intensive computations such as Manhattan distance, Euclidean distance, and cosine similarity. Processing-In-Memory (PIM) architectures offer a promising solution to accelerate these memory-bound operations by reducing data movement between memory and processor. This study presents a hardware-software co-design approach for optimizing distance computation on PIM. We first implemented and evaluated a vector similarity search application on a DRAM-based PIM platform using the developed computation library, achieving 44.2% and 59.0% speed improvements for Euclidean distance and cosine similarity, respectively, compared to the CPU. However, instruction set limitations led to performance bottlenecks despite software-level optimization. To address this, we utilized an HBM-based PIM simulator and proposed two new instructions, AMC and MAN, optimized for Euclidean and Manhattan distance computations. Evaluation using a simulator integrated with DRAMSim2 showed that the proposed instructions reduced the total cycle count for distance computations by up to 44% compared to the baseline, with performance gains increasing for larger input sizes. These results demonstrate that both software-level and instruction-level optimizations are essential to fully exploit the performance potential of PIM architectures for distance computation workloads.
- Appears in Collections
- Seoul College of Engineering > Seoul Department of Electronic Engineering > 1. Journal Articles
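
The abstract names three measures used in vector similarity search: Manhattan distance, Euclidean distance, and cosine similarity. As a rough illustration of what these computations involve, here is a minimal pure-Python sketch of a brute-force search over a small in-memory database. It is not the paper's PIM computation library, and it does not model the proposed AMC or MAN instructions; the function names (`manhattan`, `euclidean`, `cosine_similarity`, `nearest`) and the sample vectors are assumptions made only for this example.

```python
import math


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms.

    Assumes neither vector is all zeros (otherwise the division fails).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, database, distance):
    """Brute-force search: index of the database vector with the smallest distance."""
    return min(range(len(database)), key=lambda i: distance(query, database[i]))


# Hypothetical sample data, chosen only to exercise the functions.
database = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
query = [1.0, 2.0]

best_l1 = nearest(query, database, manhattan)
best_l2 = nearest(query, database, euclidean)
```

Every query reads each element of every stored vector once and does only a few arithmetic operations per element, which is consistent with the abstract's description of these computations as memory-intensive.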
