DPCrypto: Acceleration of Post-Quantum Cryptography Using Dot-Product Instructions on GPUs

Authors
Lee, Wai-Kong; Seo, Hwajeong; Hwang, Seong Oun; Achar, Ramachandra; Karmakar, Angshuman; Mera, Jose Maria Bermudo
Issue Date
Sep-2022
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Graphics processing units; Computer architecture; Cryptography; Convolution; Throughput; NIST; Standardization; Post-quantum cryptography; dot-product; polynomial convolution; matrix-multiplication; graphics processing unit; FrodoKEM and Saber
Citation
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, v.69, no.9, pp.3591 - 3604
Journal Title
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
Volume
69
Number
9
Start Page
3591
End Page
3604
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/85432
DOI
10.1109/TCSI.2022.3176966
ISSN
1549-8328
Abstract
Modern NVIDIA GPU architectures offer dot-product instructions (DP2A and DP4A), with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions compute multiple multiply-and-add operations in a single clock cycle, achieving higher throughput than conventional 32-bit integer units. In this paper, we show that the dot-product instruction can also be used to accelerate matrix multiplication and polynomial convolution, operations that are widely used in post-quantum lattice-based cryptographic schemes. In particular, we propose a highly optimized implementation of FrodoKEM in which the matrix multiplication is accelerated by the dot-product instruction. We also present specially designed data structures that allow an efficient implementation of the Saber key-encapsulation mechanism, using the dot-product instruction to speed up the polynomial convolution. The proposed FrodoKEM implementation achieves 4.37x higher throughput than the state-of-the-art implementation on a V100 GPU. This paper also presents the first implementation of Saber on GPU platforms, achieving 124,418, 120,463, and 31,658 key exchanges per second on RTX3080, V100, and T4 GPUs, respectively. Since matrix multiplication and polynomial convolution are the most time-consuming operations in lattice-based cryptographic schemes, we strongly believe that the proposed methods can benefit other lattice-based KEM and signature schemes.
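To make the idea in the abstract concrete, the following is a minimal CPU-side sketch in Python of what a DP4A-style instruction computes and how it could drive a matrix-vector product and a negacyclic polynomial convolution. It is an illustration only, not the paper's implementation: the helper names (`pack4`, `dp4a`, `matvec_dp4a`, `negacyclic_conv_dp4a`) are hypothetical, every value is assumed to fit in a signed 8-bit lane, and the paper's data structures for Saber (whose coefficients are wider than 8 bits) are not reproduced here.

```python
def pack4(vals):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    if len(vals) != 4:
        raise ValueError("pack4 expects exactly four values")
    word = 0
    for i, v in enumerate(vals):
        if not -128 <= v <= 127:
            raise ValueError("value does not fit in a signed 8-bit lane")
        word |= (v & 0xFF) << (8 * i)
    return word


def _lane(word, i):
    """Extract lane i of a packed word as a signed 8-bit integer."""
    b = (word >> (8 * i)) & 0xFF
    return b - 256 if b >= 128 else b


def dp4a(a, b, c):
    """Emulate a signed DP4A: c plus the sum of four 8-bit lane products,
    wrapped to a signed 32-bit result."""
    acc = c + sum(_lane(a, i) * _lane(b, i) for i in range(4))
    acc &= 0xFFFFFFFF
    return acc - (1 << 32) if acc >= (1 << 31) else acc


def matvec_dp4a(matrix, vec):
    """Matrix-vector product where each dp4a call performs four multiply-adds.
    Assumes the vector length is a multiple of 4 and all entries fit in 8 bits."""
    n = len(vec)
    if n % 4 != 0:
        raise ValueError("vector length must be a multiple of 4")
    packed_vec = [pack4(vec[j:j + 4]) for j in range(0, n, 4)]
    out = []
    for row in matrix:
        acc = 0
        for k, j in enumerate(range(0, n, 4)):
            acc = dp4a(pack4(row[j:j + 4]), packed_vec[k], acc)
        out.append(acc)
    return out


def negacyclic_conv_dp4a(a, b):
    """Multiply polynomials a and b modulo x^n + 1 by writing the convolution
    as a matrix-vector product. Coefficients of a must lie in [-127, 127]
    so that their negations still fit in an 8-bit lane."""
    n = len(a)
    rows = []
    for k in range(n):
        # Entry j of row k is a[k-j] when j <= k, and -a[n+k-j] on wrap-around.
        rows.append([a[k - j] if j <= k else -a[n + k - j] for j in range(n)])
    return matvec_dp4a(rows, b)


if __name__ == "__main__":
    print(dp4a(pack4([1, 2, 3, 4]), pack4([5, 6, 7, 8]), 10))
    print(negacyclic_conv_dp4a([1, 2, 3, 4], [0, 1, 0, 0]))
```

On a GPU the inner `dp4a` call would correspond to a single hardware instruction operating on packed 32-bit registers; the Python loop only mirrors the arithmetic, not the performance. A real scheme would additionally reduce the results modulo q.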
Appears in Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles