Detailed Information

Hardware-software Co-design for Vector Similarity Search on HBM-PIM

Authors
Kim, Nahyeon; Kim, Sujin; Jung, Min; Noh, Haechannuri; Kim, Ji-Hoon
Issue Date
Dec-2025
Publisher
IEEK PUBLICATION CENTER
Keywords
Processing-in-memory (PIM); retrieval-augmented generation (RAG); vector similarity search; distance computation; instruction set extension; hardware-software co-design; PIM simulator
Citation
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, v.25, no.6, pp. 662-669
Pages
8
Indexed
SCIE
SCOPUS
KCI
Journal Title
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE
Volume
25
Number
6
Start Page
662
End Page
669
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/210365
DOI
10.5573/JSTS.2025.25.6.662
ISSN
1598-1657
2233-4866
Abstract
Vector similarity search is a key component of Retrieval-Augmented Generation (RAG) for large language models (LLMs), requiring memory-intensive computations such as Manhattan distance, Euclidean distance, and cosine similarity. Processing-In-Memory (PIM) architectures offer a promising solution to accelerate these memory-bound operations by reducing data movement between memory and processor. This study presents a hardware-software co-design approach for optimizing distance computation on PIM. We first implemented and evaluated a vector similarity search application on a DRAM-based PIM platform using the developed computation library, achieving 44.2% and 59.0% speed improvements for Euclidean distance and cosine similarity, respectively, compared to the CPU. However, instruction set limitations led to performance bottlenecks despite software-level optimization. To address this, we utilized an HBM-based PIM simulator and proposed two new instructions, AMC and MAN, optimized for Euclidean and Manhattan distance computations. Evaluation using a simulator integrated with DRAMSim2 showed that the proposed instructions reduced the total cycle count for distance computations by up to 44% compared to the baseline, with performance gains increasing for larger input sizes. These results demonstrate that both software-level and instruction-level optimizations are essential to fully exploit the performance potential of PIM architectures for distance computation workloads.
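For readers unfamiliar with the three metrics named in the abstract, the following is a minimal CPU-side sketch in plain Python of Manhattan distance, Euclidean distance, and cosine similarity, together with a brute-force top-k search. It is an illustration only: the function names (`manhattan_distance`, `euclidean_distance`, `cosine_similarity`, `top_k`) are hypothetical, and the code reflects neither the paper's PIM computation library nor the proposed AMC and MAN instructions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def manhattan_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute element-wise differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Square root of the sum of squared differences (L2 distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: treat a zero vector as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Vector,
    database: Sequence[Vector],
    k: int,
    metric: Callable[[Vector, Vector], float],
    larger_is_better: bool = False,
) -> List[Tuple[float, int]]:
    """Brute-force search: score every stored vector, return the k best.

    Each result is a (score, index) pair. Use larger_is_better=True for
    similarity metrics such as cosine similarity.
    """
    scored = [(metric(query, v), i) for i, v in enumerate(database)]
    scored.sort(key=lambda pair: pair[0], reverse=larger_is_better)
    return scored[:k]


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(top_k([1.0, 1.0], db, k=2, metric=euclidean_distance))
```

Every metric reads each stored vector once and performs only a few arithmetic operations per element, which is why the abstract characterizes these computations as memory-bound.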
Appears in
Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Kim, Ji Hoon
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)