TensorCrypto: High Throughput Acceleration of Lattice-based Cryptography Using Tensor Core on GPU

Authors
Lee, Wai-Kong; Seo, Hwajeong; Zhang, Zhenfei; Hwang, Seong Oun
Issue Date
Feb-2022
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
convolution; Cryptography; graphics processing units; information security; lattice-based cryptography; polynomial convolution; tensor core
Citation
IEEE Access, v.10, pp.20616 - 20632
Journal Title
IEEE Access
Volume
10
Start Page
20616
End Page
20632
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/83819
DOI
10.1109/ACCESS.2022.3152217
ISSN
2169-3536
Abstract
The tensor core is a recently introduced hardware unit in NVIDIA GPUs that computes matrix multiplication much faster than the integer and floating-point units. In this paper, we show for the first time that tensor cores can be used to accelerate state-of-the-art lattice-based cryptosystems. We employ tensor cores to speed up polynomial convolution, which is the most time-consuming operation in lattice-based cryptosystems. To that end, we propose several parallel algorithms that allow the tensor core to handle flexible matrix sizes and ephemeral key pairs. Experimental results show that polynomial convolution computed on the tensor core is at least 2× faster than a version implemented with the conventional integer units of the NVIDIA GPU. The proposed tensor-core-based polynomial convolution technique was applied to NTRU, one of the finalists in the NIST post-quantum cryptography (PQC) standardization. Compared to conventional integer-based implementations on a GPU, it achieved 2.02×/1.98× (encapsulation) and 1.56×/1.90× (decapsulation) higher throughput on two parameter sets (ntruhps2048509 and ntruhps2048677). In particular, the proposed implementation achieved throughput of up to 793,651 key encapsulations per second and 505,051 decapsulations per second on an RTX 2060 GPU. To demonstrate the flexibility of the proposed technique, we extend the implementation to other lattice-based cryptosystems that have a small modulus: LAC and two variant parameter sets of FrodoKEM. Because IoT gateway devices and cloud servers need to handle massive numbers of connections from sensor nodes, the proposed high-throughput GPU implementation is useful for securing IoT communication.
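The abstract's core idea, computing polynomial convolution with matrix multiplication hardware, can be illustrated with a small sketch. The code below is not the authors' implementation. It is a minimal CPU-side illustration that assumes cyclic convolution in the ring Z_q[x]/(x^N - 1), the ring used by NTRU, and assumes a circulant-matrix formulation so that a whole batch of convolutions becomes one matrix product. All function names are hypothetical, and the tiling and low-precision operand handling that a real tensor core kernel would need are omitted.

```python
def circulant(a):
    """Return the N x N circulant matrix C with C[i][j] = a[(i - j) mod N]."""
    n = len(a)
    return [[a[(i - j) % n] for j in range(n)] for i in range(n)]


def matmul_mod(A, B, q):
    """Plain matrix product A @ B with every entry reduced modulo q."""
    inner, cols = len(B), len(B[0])
    return [
        [sum(row[k] * B[k][j] for k in range(inner)) % q for j in range(cols)]
        for row in A
    ]


def cyclic_convolve_batch(a, batch, q):
    """Convolve polynomial `a` with each polynomial in `batch`, mod (x^N - 1, q).

    Polynomials are coefficient lists of equal length N, lowest degree first.
    The batch is packed as the columns of one matrix, so all convolutions
    are produced by a single matrix-matrix product (the operation that
    matrix multiplication hardware accelerates).
    """
    n = len(a)
    B = [[poly[k] for poly in batch] for k in range(n)]  # N x batch_size
    C = matmul_mod(circulant(a), B, q)
    # Unpack the columns of the product back into coefficient lists.
    return [[C[i][j] for i in range(n)] for j in range(len(batch))]


def cyclic_convolve_schoolbook(a, b, q):
    """Reference O(N^2) cyclic convolution, used only to check the sketch."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % q
    return out


if __name__ == "__main__":
    q = 2048  # modulus of the ntruhps2048509 parameter set; N is tiny here
    a = [1, 2, 3]
    batch = [[4, 5, 6], [7, 0, 2047]]
    results = cyclic_convolve_batch(a, batch, q)
    for b, r in zip(batch, results):
        assert r == cyclic_convolve_schoolbook(a, b, q)
    print(results)
```

Packing many independent inputs into the columns of one matrix is what makes the formulation attractive for high-throughput settings such as a gateway serving many connections: the per-polynomial work turns into a single large matrix product.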
Appears in
Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Hwang, Seong Oun
College of IT Convergence (School of Computer Engineering, Computer Engineering Major)