DPCrypto: Acceleration of Post-Quantum Cryptography Using Dot-Product Instructions on GPUs

Lee, Wai-Kong; Seo, Hwajeong; Hwang, Seong Oun; Achar, Ramachandra; Karmakar, Angshuman; Mera, Jose Maria Bermudo


Cited 2 times in Web of Science

Cited 3 times in Scopus




Issue Date: Sep-2022

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Keywords: Graphics processing units; Computer architecture; Cryptography; Convolution; Throughput; NIST; Standardization; Post-quantum cryptography; dot-product; polynomial convolution; matrix-multiplication; graphics processing unit; FrodoKEM and Saber

Citation: IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 9, pp. 3591–3604

Journal Title: IEEE Transactions on Circuits and Systems I: Regular Papers

Volume: 69

Number: 9

Start Page: 3591

End Page: 3604

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/85432

DOI: 10.1109/TCSI.2022.3176966

ISSN: 1549-8328

Abstract: Modern NVIDIA GPU architectures offer dot-product instructions (DP2A and DP4A) intended to accelerate machine learning and scientific computing applications. These instructions compute several multiply-and-add operations in a single clock cycle, achieving higher throughput than conventional 32-bit integer units. In this paper, we show that the dot-product instructions can also accelerate the matrix-multiplication and polynomial convolution operations that are widely used in post-quantum lattice-based cryptographic schemes. In particular, we propose a highly optimized implementation of FrodoKEM in which the matrix multiplication is accelerated by the dot-product instruction. We also present specially designed data structures that enable an efficient implementation of the Saber key-encapsulation mechanism, using the dot-product instruction to speed up the polynomial convolution. The proposed FrodoKEM implementation achieves 4.37x higher throughput than the state-of-the-art implementation on a V100 GPU. This paper also presents the first implementation of Saber on GPU platforms, achieving 124,418, 120,463, and 31,658 key exchanges per second on RTX3080, V100, and T4 GPUs, respectively. Since matrix multiplication and polynomial convolution are the most time-consuming operations in lattice-based cryptographic schemes, we strongly believe that the proposed methods can also benefit other lattice-based KEM and signature schemes.
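The core idea, that a byte-wise dot-product instruction can compute the inner products inside FrodoKEM's matrix multiplication and Saber's polynomial convolution, can be sketched in portable C. This is a minimal model under stated assumptions, not the paper's implementation. `dp4a_model`, `dot_mod_pow2`, `negacyclic_mul` and `MAX_N` are hypothetical names. `dp4a_model` imitates DP4A in software (four byte products summed into a 32-bit accumulator) and assumes one operand's bytes are read as unsigned and the other's as signed. Splitting each 16-bit coefficient into a low and a high byte, and expanding the convolution row by row, are illustrative choices rather than the data structures proposed in the paper. Because FrodoKEM and Saber use power-of-two moduli, reduction is a bit mask and 32-bit wraparound does not affect the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a 4-way byte dot product with 32-bit accumulation.
 * Assumption: bytes of `a` are unsigned, bytes of `b` are signed
 * (two's complement); the sum wraps modulo 2^32 like a 32-bit register. */
uint32_t dp4a_model(uint32_t a, uint32_t b, uint32_t acc)
{
    for (int i = 0; i < 4; i++) {
        uint32_t ua = (a >> (8 * i)) & 0xFFu;
        int32_t sb = (int32_t)((b >> (8 * i)) & 0xFFu);
        if (sb >= 128)
            sb -= 256;                  /* sign-extend the byte */
        acc += ua * (uint32_t)sb;       /* multiply-add modulo 2^32 */
    }
    return acc;
}

/* Dot product of n coefficients a[k] < 2^16 with small secrets s[k] in
 * [-128, 127], reduced modulo q = 2^logq (1 <= logq <= 16).
 * Each a[k] is split into bytes: sum(a*s) = 256*sum(hi*s) + sum(lo*s). */
uint16_t dot_mod_pow2(const uint16_t *a, const int8_t *s,
                      size_t n, unsigned logq)
{
    assert(n % 4 == 0 && logq >= 1 && logq <= 16);
    uint32_t acc_lo = 0, acc_hi = 0;
    for (size_t k = 0; k < n; k += 4) {
        uint32_t lo = 0, hi = 0, sv = 0;
        for (int j = 0; j < 4; j++) {   /* pack four lanes per operand */
            lo |= (uint32_t)(a[k + j] & 0xFFu) << (8 * j);
            hi |= (uint32_t)(a[k + j] >> 8) << (8 * j);
            sv |= (uint32_t)(uint8_t)s[k + j] << (8 * j);
        }
        acc_lo = dp4a_model(lo, sv, acc_lo);
        acc_hi = dp4a_model(hi, sv, acc_hi);
    }
    uint32_t full = (acc_hi << 8) + acc_lo;   /* 256*hi + lo, mod 2^32 */
    return (uint16_t)(full & ((1u << logq) - 1u));
}

/* Negacyclic product c = a*s in Z_q[x]/(x^n + 1), q = 2^logq, computed as
 * n dot products. Row i holds a[i-j] for j <= i and -a[n+i-j] mod q for
 * j > i (terms that wrap past x^n change sign). MAX_N is a hypothetical cap. */
#define MAX_N 256
void negacyclic_mul(const uint16_t *a, const int8_t *s, uint16_t *c,
                    size_t n, unsigned logq)
{
    assert(n <= MAX_N);
    uint32_t mask = (1u << logq) - 1u;
    uint16_t row[MAX_N];
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (j <= i)
                row[j] = a[i - j];
            else
                row[j] = (uint16_t)((0u - a[n + i - j]) & mask);
        }
        c[i] = dot_mod_pow2(row, s, n, logq);
    }
}
```

For example, with a = {300, 65535, 7, 1000} and s = {2, -1, 3, -4}, `dot_mod_pow2(a, s, 4, 16)` returns 62158, which is -68914 mod 2^16. In this sketch each 16-bit coefficient costs two byte-wise dot products; on a GPU each `dp4a_model` call would correspond to one dot-product instruction (CUDA exposes DP4A through the `__dp4a` intrinsic), assuming the operand signedness used here is available.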

Files in This Item: There are no files associated with this item.

Appears in Collections: College of IT Convergence > Department of Computer Engineering > 1. Journal Articles


Related researcher: Hwang, Seong Oun, College of IT Convergence (School of Computer Engineering, Computer Engineering major)



Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
