TTLG - An Efficient Tensor Transposition Library for GPUs
- Authors
- Vedurada, J.; Suresh, A.; Rajam, A.S.; Kim, J.; Hong, C.; Panyala, A.; Krishnamoorthy, S.; Nandivada, V.K.; Srivastava, R.K.; Sadayappan, P.
- Issue Date
- May-2018
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- GPU; High performance; Tensor Transpose
- Citation
- Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018, pp. 578-588
- Pages
- 11
- Journal Title
- Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
- Start Page
- 578
- End Page
- 588
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/63881
- DOI
- 10.1109/IPDPS.2018.00067
- Abstract
- This paper presents TTLG, a Tensor Transposition Library for GPUs. A distinguishing feature of TTLG is that it also includes a performance prediction model, which can be used by higher-level optimizers that rely on tensor transposition. For example, tensor contractions are often implemented with the TTGT (Transpose-Transpose-GEMM-Transpose) approach: the input tensors are transposed to a suitable layout, a high-performance matrix multiplication (GEMM) is applied, and the result is transposed. TTLG also uses the performance model internally to choose among alternative kernels and slicing/blocking parameters for a given transposition. TTLG is compared with current state-of-the-art alternatives for GPUs: it achieves comparable or better transposition times in the 'repeated-use' scenario and considerably better performance in the 'single-use' scenario. © 2018 IEEE.
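The TTGT approach mentioned in the abstract can be illustrated with a small pure-Python sketch. It is not TTLG's API and does not model its GPU kernels or performance model; the function names (`row_major_strides`, `transpose`, `gemm`, `ttgt_contract`, `contract_direct`) are hypothetical, tensors are assumed to be dense row-major flat lists, and the naive triple-loop `gemm` stands in for a tuned BLAS call. The contraction C[i,j,l] = Σ_k A[k,i,l]·B[j,k] is an arbitrary example chosen for illustration.

```python
from itertools import product

# Illustrative sketch only (hypothetical names, not TTLG's API).
# Tensors are assumed to be dense, row-major, stored as flat Python lists.


def row_major_strides(shape):
    """Element strides of a dense row-major (C-order) tensor with the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose(data, shape, perm):
    """Out-of-place tensor transposition of a flat row-major buffer.

    Output dimension j is input dimension perm[j]. Returns (out_data, out_shape).
    """
    out_shape = [shape[p] for p in perm]
    in_strides = row_major_strides(shape)
    out = [None] * len(data)
    # product() walks the output index space in row-major order (last index
    # fastest), so the enumeration counter is the linear output offset.
    for out_pos, out_idx in enumerate(product(*(range(n) for n in out_shape))):
        src = sum(out_idx[j] * in_strides[perm[j]] for j in range(len(perm)))
        out[out_pos] = data[src]
    return out, out_shape


def gemm(a, b, m, k, n):
    """Naive row-major GEMM, (m x k) @ (k x n) -> (m x n); stands in for a tuned BLAS."""
    c = [0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def ttgt_contract(A, B, I, J, K, L):
    """C[i,j,l] = sum_k A[k,i,l] * B[j,k] via Transpose-Transpose-GEMM-Transpose."""
    At, _ = transpose(A, [K, I, L], [1, 2, 0])  # (k,i,l) -> (i,l,k): an (I*L) x K matrix
    Bt, _ = transpose(B, [J, K], [1, 0])        # (j,k)   -> (k,j):   a K x J matrix
    Ct = gemm(At, Bt, I * L, K, J)              # (I*L) x J, i.e. tensor (i,l,j)
    return transpose(Ct, [I, L, J], [0, 2, 1])  # (i,l,j) -> (i,j,l)


def contract_direct(A, B, I, J, K, L):
    """Reference: the same contraction evaluated with explicit loops."""
    C = [0] * (I * J * L)
    for i, j, l in product(range(I), range(J), range(L)):
        C[(i * J + j) * L + l] = sum(
            A[(k * I + i) * L + l] * B[j * K + k] for k in range(K)
        )
    return C


if __name__ == "__main__":
    I, J, K, L = 2, 3, 4, 5
    A = list(range(K * I * L))
    B = [(3 * x) % 7 for x in range(J * K)]
    C, shape = ttgt_contract(A, B, I, J, K, L)
    assert shape == [I, J, L]
    assert C == contract_direct(A, B, I, J, K, L)
    print("TTGT result matches direct contraction:", shape)
```

Each `transpose` call reads and writes every element once and performs no arithmetic, so on a GPU these steps are memory-bound, and their cost depends on the permutation and on how the index space is sliced and blocked. That is the part of TTGT a transposition library such as TTLG targets.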