Detailed Information


Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

Authors
Song, Woomin; Oh, Seunghyuk; Mo, Sangwoo; Kim, Jaehyung; Yun, Sukmin; Ha, Jung-Woo; Shin, Jinwoo
Issue Date
Jan-2024
Publisher
IEEE Information Theory Society
Citation
The International Conference on Learning Representations, pp 1 - 19
Pages
19
Indexed
FOREIGN
Journal Title
The International Conference on Learning Representations
Start Page
1
End Page
19
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/119137
DOI
10.48550/arXiv.2404.10308
Abstract
Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications to positional encoding to relax this constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome these limitations. HOMER uses a divide-and-conquer algorithm that splits long inputs into manageable chunks. The chunks are then processed collectively, with a hierarchical strategy that merges adjacent chunks at progressive transformer layers. A token-reduction step precedes each merge, keeping memory usage efficient. We also propose an optimized computational order that makes the memory requirement scale logarithmically with input length, which is especially favorable for environments with tight memory restrictions. Our experiments demonstrate the proposed method's superior performance and memory efficiency, enabling broader use of LLMs in settings that require extended context.
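The divide-and-conquer pattern the abstract describes can be sketched in a few lines. This is a toy illustration, not the authors' implementation: real HOMER merges hidden states inside transformer layers, whereas here "chunks" are plain token lists, and the `reduce_tokens` pruning rule (keep the first half of each chunk) is a placeholder assumption standing in for the paper's token-reduction technique. The point is only the hierarchical structure: chunks merge pairwise level by level, so the number of merge levels grows logarithmically with the number of chunks.

```python
def reduce_tokens(chunk, keep_ratio=0.5):
    """Placeholder token reduction: keep a fixed fraction of each chunk
    before merging (the paper uses a more principled reduction)."""
    k = max(1, int(len(chunk) * keep_ratio))
    return chunk[:k]

def hierarchical_merge(chunks):
    """Merge adjacent chunks pairwise, reducing tokens before each merge,
    until a single chunk remains. Returns (final_chunk, merge_depth)."""
    depth = 0
    while len(chunks) > 1:
        merged = []
        for i in range(0, len(chunks), 2):
            pair = [reduce_tokens(c) for c in chunks[i:i + 2]]
            merged.append([tok for c in pair for tok in c])
        chunks = merged
        depth += 1
    return chunks[0], depth

# A 64-token input split into 8 chunks merges in 3 levels (log2 of 8),
# and token reduction keeps each merged chunk at the original chunk size.
tokens = list(range(64))
chunks = [tokens[i:i + 8] for i in range(0, len(tokens), 8)]
result, depth = hierarchical_merge(chunks)
```

Because each level halves both the number of chunks and the tokens carried per chunk before merging, the working set at any level stays bounded, which is the intuition behind the logarithmic memory scaling claimed in the abstract.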
Appears in
Collections
COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

