Detailed Information

CNN을 위한 사투리 음성 데이터의 Spectrogram 이미지 변환 적용의 POC 검증 (POC Analysis of Spectrogram Image Transformation Application to Dialect Speech Data for CNN)

Other Titles
POC Analysis of Spectrogram Image Transformation Application to Dialect Speech Data for CNN
Authors
차병래; 권용
Issue Date
Aug-2024
Publisher
IT연구소
Keywords
Dialect audio data; Spectrogram; Mel-spectrogram; CNN; RNN
Citation
Journal of Information Technology and Applied Engineering, v.14, no.2, pp. 1-7
Pages
7
Indexed
KCI
Journal Title
Journal of Information Technology and Applied Engineering
Volume
14
Number
2
Start Page
1
End Page
7
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/195190
DOI
10.22733/JITAE.2024.14.02.001
ISSN
2234-3326
Abstract
In essence, audio data is time-series data. Audio classification therefore commonly relies on time-series algorithms such as ARIMA (autoregressive integrated moving average) or ES (exponential smoothing) or, on the machine learning side, on an RNN (recurrent neural network). An alternative is to train a CNN (convolutional neural network) on a spectrogram, an image that represents the audio data, rather than on a time-series array of numbers. In this paper, in order to apply a CNN instead of an RNN to time-series analysis, we use spectrogram and mel-spectrogram images of speech data as the input images and propose a CNN model for extracting patterns from dialect speech. We also verify the feasibility of the approach through a POC (proof of concept) built on a Python-based prototype of the proposed model.
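The transformation described in the abstract, turning a one-dimensional waveform into a two-dimensional time-frequency image that a CNN can consume, can be illustrated with a minimal sketch. This is not the authors' prototype: the record does not state their tooling or parameters, so the function names, the frame length, the hop size, the number of mel bands, and the synthetic test tone below are all assumptions chosen for illustration. The sketch uses only the Python standard library; a practical prototype would more likely use a numerical or audio library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def power_spectrogram(signal, frame_len=64, hop=32):
    """Short-time DFT power: one row per frame, bins 0..frame_len // 2."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # twiddle[k][n] = exp(-2j * pi * k * n / frame_len), reused for every frame
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(c * w for c, w in zip(chunk, row))) ** 2
                       for row in twiddle])
    return frames


def hz_to_mel(hz):
    # Common HTK-style mel formula (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, frame_len, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    hz_points = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        # Rising edge from lo to mid, falling edge from mid to hi, zero outside
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_freqs])
    return bank


def to_db(power, floor=1e-10):
    """Convert power to decibels, clamping tiny values to avoid log(0)."""
    return 10.0 * math.log10(max(power, floor))


def mel_spectrogram_db(signal, sample_rate, frame_len=64, hop=32, n_mels=8):
    """Mel-scaled spectrogram in dB: rows are frames, columns are mel bands."""
    spec = power_spectrogram(signal, frame_len, hop)
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    return [[to_db(sum(w * p for w, p in zip(row, frame))) for row in bank]
            for frame in spec]


# Hypothetical stand-in for a speech clip: a 1 kHz tone sampled at 8 kHz
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = power_spectrogram(tone)
mel_db = mel_spectrogram_db(tone, SAMPLE_RATE)
```

Each resulting 2-D array (frames by frequency bins, or frames by mel bands) can be treated as a single-channel image and passed to a CNN in place of the raw sample sequence. The mel variant compresses the frequency axis toward the resolution of human hearing, which also shrinks the image.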
Appears in Collections
Seoul College of Performing Arts and Sport > Seoul Department of Theater and Film > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kwon, Yong
COLLEGE OF PERFORMING ARTS AND SPORT (DEPARTMENT OF THEATER AND FILM)