Convolutional Recurrent Neural Network with Auxiliary Stream for Robust Variable-Length Acoustic Scene Classification
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Won-Gook | - |
dc.contributor.author | Chang, Joon-Hyuk | - |
dc.date.accessioned | 2022-12-20T06:25:06Z | - |
dc.date.available | 2022-12-20T06:25:06Z | - |
dc.date.created | 2022-11-02 | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 2308-457X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173089 | - |
dc.description.abstract | Deep learning has proven well suited to acoustic scene classification (ASC), with neural networks yielding significant performance improvements. However, most studies have used convolutional neural networks (CNNs) rather than recurrent neural networks (RNNs) or convolutional recurrent neural networks (CRNNs), even though acoustic scene data is a temporal signal. In practice, CRNNs are rarely adopted and rank lower in recent Detection and Classification of Acoustic Scenes and Events (DCASE) challenges for fixed-length (i.e., 10 s) ASC. In this paper, an auxiliary stream technique is proposed that improves the performance of CRNNs over that of CNNs by controlling the inductive bias of the RNN. The auxiliary stream trains the CNN to extract effective embeddings and is connected only during training, so it does not affect model complexity at inference. Experimental results demonstrate the superiority of the proposed method regardless of the CNN model used in the CRNN. Additionally, the proposed method is robust on variable-length ASC via streaming inference, demonstrating the importance of CRNNs. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | International Speech Communication Association | - |
dc.title | Convolutional Recurrent Neural Network with Auxiliary Stream for Robust Variable-Length Acoustic Scene Classification | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Chang, Joon-Hyuk | - |
dc.identifier.doi | 10.21437/Interspeech.2022-959 | - |
dc.identifier.scopusid | 2-s2.0-85140094712 | - |
dc.identifier.wosid | 000900724502119 | - |
dc.identifier.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, v.2022-September, pp.2418 - 2422 | - |
dc.relation.isPartOf | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.title | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | - |
dc.citation.volume | 2022-September | - |
dc.citation.startPage | 2418 | - |
dc.citation.endPage | 2422 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Acoustics | - |
dc.relation.journalResearchArea | Audiology & Speech-Language Pathology | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Acoustics | - |
dc.relation.journalWebOfScienceCategory | Audiology & Speech-Language Pathology | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordPlus | Convolution | - |
dc.subject.keywordPlus | Convolutional neural networks | - |
dc.subject.keywordPlus | Speech communication | - |
dc.subject.keywordPlus | Recurrent neural networks | - |
dc.subject.keywordPlus | Acoustic scene classification | - |
dc.subject.keywordPlus | Convolutional neural network | - |
dc.subject.keywordPlus | Convolutional recurrent neural network | - |
dc.subject.keywordPlus | Inductive bias | - |
dc.subject.keywordPlus | Neural-networks | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Scene classification | - |
dc.subject.keywordPlus | Streaming | - |
dc.subject.keywordPlus | Temporal signals | - |
dc.subject.keywordPlus | Variable length | - |
dc.subject.keywordAuthor | acoustic scene classification | - |
dc.subject.keywordAuthor | convolutional recurrent neural network | - |
dc.subject.keywordAuthor | streaming | - |
dc.subject.keywordAuthor | variable-length | - |
dc.identifier.url | https://www.isca-speech.org/archive/interspeech_2022/chang22d_interspeech.html | - |
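The abstract's key idea — an auxiliary classification head that gives the CNN front end a direct training signal but is detached at inference, leaving the recurrent path to handle variable-length input — can be illustrated with a toy sketch. This is not the paper's architecture; all dimensions, the single-layer "CNN" projection, the plain RNN, and the mean-pooled auxiliary head are simplified stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper):
# T time frames, F features per frame, H hidden units, C scene classes.
T, F, H, C = 20, 8, 16, 3

# Shared "CNN" front end, reduced here to one per-frame linear projection.
W_cnn = rng.normal(0, 0.1, (F, H))

# Recurrent path: a simple RNN unrolled over the projected frames.
W_in = rng.normal(0, 0.1, (H, H))
W_rec = rng.normal(0, 0.1, (H, H))
W_out = rng.normal(0, 0.1, (H, C))

# Auxiliary stream: classifies the time-pooled CNN embedding directly,
# giving the CNN a short gradient path during training.
W_aux = rng.normal(0, 0.1, (H, C))

def forward(x, training):
    """x: (T, F) spectrogram-like input; T may vary between calls."""
    emb = np.tanh(x @ W_cnn)              # (T, H) frame embeddings
    h = np.zeros(H)
    for t in range(emb.shape[0]):         # streaming-friendly: one frame at a time
        h = np.tanh(emb[t] @ W_in + h @ W_rec)
    main_logits = h @ W_out               # CRNN prediction
    if training:
        aux_logits = emb.mean(axis=0) @ W_aux   # auxiliary head, training only
        return main_logits, aux_logits          # both feed the combined loss
    return main_logits                    # aux head detached: no inference cost

x = rng.normal(size=(T, F))
main_logits, aux_logits = forward(x, training=True)
infer_logits = forward(x, training=False)
```

During training, a weighted sum of the two cross-entropy losses would be backpropagated; at inference only the recurrent path runs, so the auxiliary head adds no parameters or compute, and the per-frame loop accepts any sequence length.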