Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling
- Authors
- Shim, Kyuhong; Choi, Iksoo; Sung, Wonyong; Choi, Jungwook
- Issue Date
- Nov-2021
- Publisher
- IEEE
- Keywords
- multihead attention; pruning; transformer
- Citation
- Proceedings - International SoC Design Conference 2021, ISOCC 2021, pp. 357-358
- Indexed
- SCOPUS
- Journal Title
- Proceedings - International SoC Design Conference 2021, ISOCC 2021
- Start Page
- 357
- End Page
- 358
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/140374
- DOI
- 10.1109/ISOCC53507.2021.9613933
- ISSN
- 2163-9612
- Abstract
- Recently, the necessity of multiple attention heads in the transformer architecture has been questioned [1]. Removing less important heads from a large network is a promising strategy for reducing computation cost and parameter count. However, pruning attention heads in multihead attention does not evenly reduce the overall load, because the feedforward modules are unaffected. In this study, we apply attention head pruning to the All-Attention [2] transformer, where the computational savings are proportional to the number of pruned heads. This improved computing efficiency comes at the cost of increased pruning sensitivity, which we stabilize with three training techniques. Our attention head pruning achieves a considerably smaller number of parameters with comparable perplexity for transformer-based language modeling.
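To illustrate the idea in the abstract, here is a minimal NumPy sketch of multihead attention with per-head binary gates. It is not the authors' implementation (the paper prunes an All-Attention transformer, which folds the feedforward module into the attention layers); it only shows the mechanism the abstract relies on: zeroing a head's gate removes that head's entire compute and parameter share, so savings scale with the number of pruned heads. All names and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(x, Wq, Wk, Wv, Wo, head_mask):
    """Self-attention with per-head binary gates (illustrative sketch).

    x:         (seq_len, d_model) input sequence
    Wq/Wk/Wv:  (n_heads, d_model, d_head) per-head projections
    Wo:        (n_heads, d_head, d_model) per-head output projections
    head_mask: (n_heads,) of {0, 1}; a 0 prunes that head entirely,
               skipping its projections and attention computation.
    """
    out = np.zeros_like(x)
    for h, keep in enumerate(head_mask):
        if not keep:
            continue  # pruned head: no compute, no contribution
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        out += (attn @ v) @ Wo[h]
    return out

def attention_params(Wq, Wk, Wv, Wo, head_mask):
    # Parameters that survive pruning: proportional to kept heads.
    kept = int(head_mask.sum())
    per_head = Wq[0].size + Wk[0].size + Wv[0].size + Wo[0].size
    return kept * per_head
```

Because each head owns its own slice of the projection matrices, dropping k of n heads removes exactly k/n of the attention layer's parameters and matmuls; in a standard transformer the feedforward block would be untouched, which is the imbalance the abstract points out.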
- Appears in
Collections - College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles