Detailed Information

Cited 20 times in Web of Science; cited 26 times in Scopus

Packet Loss Concealment Based on Deep Neural Networks for Digital Speech Transmission

Authors
Lee, Bong-Ki; Chang, Joon-Hyuk
Issue Date
Feb-2016
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Adaptive multi-rate wideband; deep neural network (DNN); network speech recognition; packet loss concealment (PLC); regression model; speech quality
Citation
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.24, no.2, pp.378 - 387
Indexed
SCIE
SCOPUS
Journal Title
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Volume
24
Number
2
Start Page
378
End Page
387
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/24002
DOI
10.1109/TASLP.2015.2509780
ISSN
2329-9290
Abstract
In this paper, we propose a regression-based packet loss concealment (PLC) method for digital speech transmission that uses deep neural networks (DNNs) with a multi-layer deep architecture. For DNN training, log-power spectra and phases are employed as input-layer features over a large training set, so that the network learns a non-linear mapping from the frames up to the last correctly received frame to the missing frame. The DNN is first initialized by restricted Boltzmann machine (RBM)-based pre-training; minimum mean square error (MMSE)-based fine-tuning is then performed using the back-propagation algorithm. In the reconstruction stage, the trained DNN model is fed the features of the previous frames in order to estimate the log-power spectra and phases of the missing frames. Reconstruction is further improved by a cross-fading technique that mitigates the discontinuity between the reconstructed signal and the good-frame signal in the time domain. To evaluate the proposed algorithm, a hidden Markov model (HMM)-based PLC algorithm and the PLC algorithm standardized in adaptive multi-rate wideband (AMR-WB) Appendix I were used for comparison. The experimental results show that the proposed approach provides better speech quality and speech recognition accuracy than these conventional approaches.
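The signal-processing steps named in the abstract (log-power spectrum and phase features, reconstruction from those features, and time-domain cross-fading) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 320-sample frame (20 ms at 16 kHz), the linear fade ramp, and the small `eps` floor are all assumptions made for illustration, and the trained DNN is replaced by a trivial stand-in that simply repeats the previous frame's features.

```python
import numpy as np

FRAME_LEN = 320  # assumed: 20 ms frame at 16 kHz


def frame_features(frame, eps=1e-10):
    """Return (log-power spectrum, phase) of one time-domain frame."""
    spectrum = np.fft.rfft(frame)
    log_power = np.log(np.abs(spectrum) ** 2 + eps)  # eps avoids log(0)
    phase = np.angle(spectrum)
    return log_power, phase


def reconstruct_frame(log_power, phase, frame_len=FRAME_LEN):
    """Rebuild a time-domain frame from log-power spectrum and phase."""
    magnitude = np.sqrt(np.exp(log_power))
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=frame_len)


def cross_fade(concealed, good):
    """Fade linearly from the concealed signal into the good signal.

    Both inputs cover the same overlap region and have equal length.
    """
    ramp = np.linspace(0.0, 1.0, len(good))
    return (1.0 - ramp) * concealed + ramp * good


def predict_missing_features(prev_log_power, prev_phase):
    """Hypothetical stand-in for the trained DNN: repeat the last frame."""
    return prev_log_power, prev_phase


# Usage: conceal one lost frame that follows a correctly received frame.
t = np.arange(FRAME_LEN) / 16000.0
last_good = np.sin(2 * np.pi * 200.0 * t)
log_power, phase = frame_features(last_good)
est_log_power, est_phase = predict_missing_features(log_power, phase)
concealed = reconstruct_frame(est_log_power, est_phase)
```

In a real system the stand-in predictor would be replaced by the regression DNN, and `cross_fade` would be applied over the overlap between the end of the concealed signal and the start of the next correctly received frame.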
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)