Packet Loss Concealment Based on Deep Neural Networks for Digital Speech Transmission

Lee, Bong-Ki; Chang, Joon-Hyuk

doi:10.1109/TASLP.2015.2509780

Cited 20 times in Web of Science

Cited 26 times in Scopus

Authors: Lee, Bong-Ki; Chang, Joon-Hyuk

Issue Date: Feb-2016

Publisher: IEEE

Keywords: Adaptive multi-rate wideband; deep neural network (DNN); network speech recognition; packet loss concealment (PLC); regression model; speech quality

Citation: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 2, pp. 378-387

Pages: 10

Indexed: SCIE, SCOPUS

Journal Title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume: 24

Number: 2

Start Page: 378

End Page: 387

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/24002

DOI: 10.1109/TASLP.2015.2509780

ISSN: 2329-9290, 2329-9304

Abstract: In this paper, we propose a regression-based packet loss concealment (PLC) method for digital speech transmission that uses deep neural networks (DNNs) with a multi-layer deep architecture. For DNN training, log-power spectra and phases extracted from a large training set are used as input-layer features, so that the network learns a non-linear mapping from the frames preceding a loss, up to the last correctly received frame, to the missing frame. Training proceeds in two steps: restricted Boltzmann machine (RBM)-based pre-training initializes the DNN, and minimum mean square error (MMSE)-based fine-tuning is then performed with the back-propagation algorithm. In the reconstruction stage, the trained DNN model is fed the features of the previous frames to estimate the log-power spectra and phases of the missing frames. Reconstruction is further improved by a cross-fading technique that mitigates the time-domain discontinuity between the reconstructed signal and the good-frame signal. To evaluate the proposed algorithm, a hidden Markov model (HMM)-based PLC algorithm and the PLC algorithm standardized in adaptive multi-rate wideband (AMR-WB) Appendix I were used for comparison. The experimental results show that the proposed approach provides better speech quality and speech recognition accuracy than these conventional approaches.
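The signal-processing steps around the DNN can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the small `eps` floor, and the linear fade ramp are assumptions made for illustration, and the trained DNN that predicts the missing frame's features is not reproduced here. The sketch only shows how a frame could be turned into log-power-spectrum and phase features, how a frame could be rebuilt from such features, and how a concealed segment could be cross-faded into a correctly received one.

```python
import numpy as np


def frame_features(frame, eps=1e-12):
    """Return (log-power spectrum, phase) of one time-domain frame.

    eps is an assumed floor that avoids log(0) for empty bins.
    """
    spectrum = np.fft.rfft(frame)
    log_power = np.log(np.abs(spectrum) ** 2 + eps)
    phase = np.angle(spectrum)
    return log_power, phase


def frame_from_features(log_power, phase, frame_len):
    """Rebuild a time-domain frame from log-power and phase features."""
    magnitude = np.sqrt(np.exp(log_power))
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=frame_len)


def cross_fade(concealed, good):
    """Blend two equal-length overlapping segments.

    The weight moves linearly from the concealed signal to the good
    signal; the ramp shape and direction are assumptions, not taken
    from the paper.
    """
    fade_in = np.linspace(0.0, 1.0, len(concealed))
    return (1.0 - fade_in) * concealed + fade_in * good
```

In a full system, the features returned by `frame_features` for the previous frames would be the DNN input, and the DNN output would be passed to `frame_from_features` before cross-fading.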

Related Researcher

Chang, Joon-Hyuk: College of Engineering (School of Electronic Engineering)
