DPCrypto: Acceleration of Post-Quantum Cryptography Using Dot-Product Instructions on GPUs
- Authors
- Lee, Wai-Kong; Seo, Hwajeong; Hwang, Seong Oun; Achar, Ramachandra; Karmakar, Angshuman; Mera, Jose Maria Bermudo
- Issue Date
- Sep-2022
- Publisher
- IEEE (Institute of Electrical and Electronics Engineers Inc.)
- Keywords
- Graphics processing units; Computer architecture; Cryptography; Convolution; Throughput; NIST; Standardization; Post-quantum cryptography; dot-product; polynomial convolution; matrix-multiplication; graphics processing unit; FrodoKEM and Saber
- Citation
- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, v.69, no.9, pp.3591 - 3604
- Journal Title
- IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS
- Volume
- 69
- Number
- 9
- Start Page
- 3591
- End Page
- 3604
- URI
- https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/85432
- DOI
- 10.1109/TCSI.2022.3176966
- ISSN
- 1549-8328
- Abstract
- Modern NVIDIA GPU architectures offer dot-product instructions (DP2A and DP4A) intended to accelerate machine learning and scientific computing applications. These instructions compute several multiply-and-add operations in a single clock cycle, achieving higher throughput than conventional 32-bit integer units. In this paper, we show that the dot-product instructions can also accelerate matrix multiplication and polynomial convolution, which are widely used in post-quantum lattice-based cryptographic schemes. In particular, we propose a highly optimized implementation of FrodoKEM in which matrix multiplication is accelerated by the dot-product instruction. We also present specially designed data structures that allow an efficient implementation of the Saber key-encapsulation mechanism (KEM), using the dot-product instruction to speed up the polynomial convolution. The proposed FrodoKEM implementation achieves 4.37x higher throughput than the state-of-the-art implementation on a V100 GPU. This paper also presents the first implementation of Saber on GPU platforms, achieving 124,418, 120,463, and 31,658 key exchanges per second on RTX3080, V100, and T4 GPUs, respectively. Because matrix multiplication and polynomial convolution are the most time-consuming operations in lattice-based cryptographic schemes, we believe the proposed methods can also benefit other lattice-based KEM and signature schemes.
- Appears in Collections
- College of IT Convergence > Department of Computer Engineering > 1. Journal Articles
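
The abstract's central idea, expressing polynomial convolution as a series of packed dot products, can be illustrated with a small software model. The following is a minimal sketch, not the authors' GPU implementation: it emulates the semantics of a signed DP4A operation (four 8-bit lane products accumulated into a 32-bit value) in plain Python and uses it to multiply two polynomials modulo x^n + 1. The helper names (`pack4`, `dp4a`, `negacyclic_conv_dp4a`), the choice of a negacyclic ring, the toy size n = 8, and the restriction to coefficients in [-127, 127] are all assumptions made for illustration; this record does not describe the paper's actual data structures for Saber or FrodoKEM.

```python
def pack4(vals):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 = low byte)."""
    assert len(vals) == 4 and all(-128 <= v <= 127 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * i)
    return word


def _lane(word, i):
    """Extract lane i of a packed word as a signed 8-bit integer."""
    byte = (word >> (8 * i)) & 0xFF
    return byte - 256 if byte >= 128 else byte


def dp4a(a, b, c):
    """Software model of signed DP4A: c + sum of four lane products, wrapped to int32."""
    acc = c + sum(_lane(a, i) * _lane(b, i) for i in range(4))
    acc &= 0xFFFFFFFF
    return acc - (1 << 32) if acc >= (1 << 31) else acc


def negacyclic_conv_dp4a(a, b, q):
    """Compute a*b in Z_q[x]/(x^n + 1) with one chain of dp4a calls per output coefficient.

    Assumes n is a multiple of 4 and all coefficients lie in [-127, 127].
    """
    n = len(a)
    assert n == len(b) and n % 4 == 0
    a_words = [pack4(a[i:i + 4]) for i in range(0, n, 4)]
    out = []
    for k in range(n):
        # Row k of the negacyclic matrix of b: entry j is b[k-j],
        # negated when the index wraps around (because x^n = -1).
        row = [b[k - j] if j <= k else -b[k - j + n] for j in range(n)]
        row_words = [pack4(row[i:i + 4]) for i in range(0, n, 4)]
        acc = 0
        for aw, rw in zip(a_words, row_words):
            acc = dp4a(aw, rw, acc)  # four multiply-adds per call
        out.append(acc % q)
    return out


def negacyclic_conv_schoolbook(a, b, q):
    """Reference schoolbook multiplication in Z_q[x]/(x^n + 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] += a[i] * b[j]
            else:
                out[k - n] -= a[i] * b[j]
    return [c % q for c in out]


if __name__ == "__main__":
    print(dp4a(pack4([1, 2, 3, 4]), pack4([5, 6, 7, 8]), 10))  # 80
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [1, -1, 0, 2, -3, 4, 0, 1]
    q = 8192  # toy power-of-two modulus
    print(negacyclic_conv_dp4a(a, b, q) == negacyclic_conv_schoolbook(a, b, q))
```

In this model each `dp4a` call replaces four separate multiply-add steps, which is the source of the throughput gain the abstract describes. On an actual GPU the emulated function would correspond to a hardware instruction (CUDA exposes a `__dp4a` intrinsic), and the packing of operands into lanes would need to match the coefficient widths of the target scheme.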