Packet Loss Concealment Based on Deep Neural Networks for Digital Speech Transmission
- Authors
- Lee, Bong-Ki; Chang, Joon-Hyuk
- Issue Date
- Feb-2016
- Publisher
- IEEE Advancing Technology for Humanity
- Keywords
- Adaptive multi-rate wideband; deep neural network (DNN); network speech recognition; packet loss concealment (PLC); regression model; speech quality
- Citation
- IEEE/ACM Transactions on Audio, Speech, and Language Processing, v.24, no.2, pp. 378-387
- Pages
- 10
- Indexed
- SCIE, SCOPUS
- Journal Title
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Volume
- 24
- Number
- 2
- Start Page
- 378
- End Page
- 387
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/24002
- DOI
- 10.1109/TASLP.2015.2509780
- ISSN
- 2329-9290, 2329-9304
- Abstract
- In this paper, we propose a regression-based packet loss concealment (PLC) algorithm for digital speech transmission that uses deep neural networks (DNNs) with a multiple-layer deep architecture. For DNN training, log-power spectra and phases extracted from a large training set are used as input-layer features, enabling a non-linear mapping from the last correctly received frame to the missing frame. The DNN is first initialized by restricted Boltzmann machine (RBM)-based pre-training, and minimum mean square error (MMSE)-based fine-tuning is then performed with the back-propagation algorithm. In the reconstruction stage, the trained DNN is fed with the features of the previous frames to estimate the log-power spectra and phases of the missing frames. Reconstruction is further improved by a cross-fading technique that mitigates the time-domain discontinuity between the reconstructed signal and the good-frame signal. To evaluate the proposed algorithm, it was compared with a hidden Markov model (HMM)-based PLC algorithm and the PLC algorithm standardized in adaptive multi-rate wideband (AMR-WB) Appendix I. The experimental results show that the proposed approach provides better speech quality and speech recognition accuracy than these conventional approaches.
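A minimal sketch of the feature path described in the abstract: log-power spectrum and phase extraction per frame, the inverse mapping from (estimated) features back to a time-domain frame, and a cross-fade into the good-frame signal. The trained DNN is not shown. The frame length (320 samples, i.e. 20 ms at 16 kHz), the plain FFT framing without windowing or overlap, the 1e-12 log floor, the linear fade shape, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: frame length, FFT framing (no window, no overlap),
# the log floor and the linear fade are assumptions, not the paper's settings.
FRAME_LEN = 320    # assumed: 20 ms at 16 kHz
LOG_FLOOR = 1e-12  # avoids log(0) for empty spectral bins


def frame_features(frame):
    """Return the log-power spectrum and phase of one real-valued frame."""
    spectrum = np.fft.rfft(frame)
    log_power = np.log(np.abs(spectrum) ** 2 + LOG_FLOOR)
    phase = np.angle(spectrum)
    return log_power, phase


def frame_from_features(log_power, phase, frame_len=FRAME_LEN):
    """Rebuild a time-domain frame from (estimated) log-power and phase."""
    magnitude = np.sqrt(np.exp(log_power))  # undo log(|X|^2), up to the floor
    spectrum = magnitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=frame_len)


def cross_fade(concealed, good):
    """Linearly fade the concealed signal out and the good-frame signal in."""
    concealed = np.asarray(concealed, dtype=float)
    good = np.asarray(good, dtype=float)
    if concealed.shape != good.shape:
        raise ValueError("segments must have the same length")
    fade_in = np.linspace(0.0, 1.0, len(good))
    return (1.0 - fade_in) * concealed + fade_in * good


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    last_good = rng.standard_normal(FRAME_LEN)
    # Hypothetical stand-in for the DNN: reuse the last good frame's features.
    log_power, phase = frame_features(last_good)
    concealed = frame_from_features(log_power, phase)
    print(np.max(np.abs(concealed - last_good)))  # only round-off error
```

In the method summarized above, the DNN would replace the stand-in by mapping the features of the previous frames to the log_power and phase passed to frame_from_features; cross_fade then smooths the transition to the next correctly received frame.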
- Appears in Collections
- Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles
