Detailed Information


Splitting of Composite Neural Networks via Proximal Operator With Information Bottleneck (open access)

Authors
Han, Sang-Il; Nakamura, Kensuke; Hong, Byung-Woo
Issue Date
2024
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Linear programming; Deep learning; Task analysis; Mutual information; Training; Optimization methods; Biological neural networks; information bottleneck; stochastic gradient descent; proximal algorithm
Citation
IEEE ACCESS, v.12, pp 157 - 167
Pages
11
Journal Title
IEEE ACCESS
Volume
12
Start Page
157
End Page
167
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/72668
DOI
10.1109/ACCESS.2023.3346697
ISSN
2169-3536
Abstract
Deep learning has achieved remarkable success in machine learning, enabled by efficient optimization methods such as Stochastic Gradient Descent (SGD) and its variants. In parallel, the Information Bottleneck (IB) theory has been studied as a way to train neural networks and to enhance the performance of optimization methods. However, previous works have focused on specific tasks, and the effect of the IB theory on general deep learning tasks remains unclear. In this study, we introduce a new method inspired by the proximal operator, which sequentially updates the neural network parameters based on bottleneck features defined between the forward and backward networks. Unlike conventional proximal-based methods, we consider the second-order gradients of the objective function to achieve better updates for the forward networks. In contrast to SGD-based methods, our approach accesses the network's internals, incorporating the bottleneck feature update into the parameter update process. From the perspective of the IB theory, the data is thereby well compressed up to the bottleneck feature, while the compressed representation retains sufficient mutual information with the final output. To demonstrate the performance of the proposed approach, we applied the method to various optimizers on several tasks and analyzed the results by training on both the MNIST and CIFAR-10 datasets. We also conducted several ablation studies, modifying the components of the proposed algorithm to further validate its performance.
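The abstract describes the method only at a high level. As a loose illustration of the general splitting idea — not the authors' algorithm (the linear toy model, the proximal weight rho, the closed-form bottleneck update, and all numerical values below are assumptions made for this sketch) — one can split a composite network at a bottleneck feature z and alternate a proximal update of z with gradient updates of the forward and backward sub-networks:

```python
import numpy as np

# Illustrative sketch only: a tiny linear "network" y = W2 @ (W1 @ x),
# split at the bottleneck z = W1 @ x. We alternate (1) a proximal update
# of z and (2) gradient steps on the forward net W1 and backward net W2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))               # 8 samples, 4-dim input
y = rng.normal(size=(2, 8))               # 2-dim targets
W1 = rng.normal(scale=0.1, size=(3, 4))   # forward net: input -> bottleneck
W2 = rng.normal(scale=0.1, size=(2, 3))   # backward net: bottleneck -> output

def mse(W1, W2):
    return float(np.mean((W2 @ (W1 @ x) - y) ** 2))

loss0 = mse(W1, W2)                       # loss before training
lr, rho = 0.01, 1.0                       # step size and proximal weight (assumed)

for _ in range(300):
    # Proximal step on z: argmin_z ||W2 z - y||^2 + rho ||z - W1 x||^2,
    # which has a closed form because this toy model is linear.
    A = W2.T @ W2 + rho * np.eye(3)
    z = np.linalg.solve(A, W2.T @ y + rho * (W1 @ x))
    # Backward net fits the targets from the updated bottleneck;
    # forward net is pulled toward producing that bottleneck.
    W2 -= lr * (W2 @ z - y) @ z.T
    W1 -= lr * rho * (W1 @ x - z) @ x.T

loss = mse(W1, W2)
print(loss0, loss)                        # loss decreases over training
```

The proximity term rho * ||z - W1 x||^2 plays the role of coupling the two sub-networks: z may move ahead of the forward net toward reducing the output loss, and the forward net then catches up — a stand-in for the bottleneck-feature update process the paper interleaves with the parameter updates.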
Appears in Collections
College of Software > Department of Artificial Intelligence > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Hong, Byung-Woo
College of Software (Department of Artificial Intelligence)
