Efficient Scheme for Compressing and Transferring Data in Hadoop Clusters
- Authors
- Lee S.; Lee J.; Kim Y.; Park K.; Hong J.; Heo J.
- Issue Date
- Mar-2020
- Publisher
- Association for Computing Machinery
- Keywords
- Data compression; Hadoop; Network bandwidth
- Citation
- Proceedings of the ACM Symposium on Applied Computing, pp. 1256-1263
- Journal Title
- Proceedings of the ACM Symposium on Applied Computing
- Start Page
- 1256
- End Page
- 1263
- URI
- http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/35911
- DOI
- 10.1145/3341105.3374044
- ISSN
- 0000-0000
- Abstract
- The volume of data collected by public institutions and industries is growing explosively. As the data to be processed grows, there is a limit to handling big data simply by using scale-up servers. To address this limitation, distributed cluster computing systems that use scale-out servers have emerged. However, if the network bandwidth is not used efficiently, distributed cluster computing systems cannot maximize the performance of the scale-out servers. In this paper, we propose an efficient scheme for compressing and transferring data in Hadoop clusters. The proposed method selects an appropriate compression algorithm by evaluating a data transfer cost model based on the information entropy of the data and the network bandwidth. Experimental results show that the proposed scheme significantly reduces both the data transfer time and the amount of data transferred between data nodes. © 2020 ACM.
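The abstract's core idea — picking a compression algorithm by comparing estimated transfer costs derived from data entropy and link bandwidth — can be illustrated with a small sketch. This is not the paper's actual cost model; the linear entropy-to-ratio scaling, the codec table, and the function names here are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a data sample, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def choose_codec(sample: bytes, bandwidth_mbps: float,
                 codecs: dict) -> str:
    """Pick the codec minimizing estimated transfer cost (illustrative model).

    codecs maps a codec name to (best_case_ratio, compress_throughput_MBps).
    The expected compressed fraction is scaled linearly with entropy:
    near-random data (entropy close to 8 bits/byte) compresses poorly,
    so sending it uncompressed may be cheaper.
    """
    h = shannon_entropy(sample)
    size_mb = len(sample) / 1e6
    bw_mbytes = bandwidth_mbps / 8  # network bandwidth in MB/s
    # Baseline: send the data uncompressed.
    best, best_cost = "none", size_mb / bw_mbytes
    for name, (best_ratio, comp_mbytes) in codecs.items():
        # Estimated compressed fraction grows with entropy toward 1.0.
        frac = best_ratio + (1.0 - best_ratio) * (h / 8.0)
        cost = size_mb / comp_mbytes + (size_mb * frac) / bw_mbytes
        if cost < best_cost:
            best, best_cost = name, cost
    return best
```

On a slow link with low-entropy (highly compressible) data the model favors compression, while high-entropy data or a fast link tips the decision toward sending raw bytes — the trade-off the paper's cost model is designed to capture.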
- Appears in Collections
- College of Information Technology > School of Computer Science and Engineering > 1. Journal Articles