Subject-Independent Silent Speech Classification Using Three-Axis Accelerometers with z-Axis Vector Rotation-Based Data Augmentation
- Authors
- Jung, Sungmin; Sohn, Jang Jay; Kwon, Jinuk; Im, Chang-Hwan
- Issue Date
- Mar-2026
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Keywords
- Accelerometers; Accuracy; Deep learning; Speech recognition; Calibration; Artificial intelligence; Vectors; Training; Gravity; Data augmentation; Silent speech recognition; inertial measurement unit (IMU); deep learning; vector rotation; human-computer interface (HCI)
- Citation
- IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, v.34, pp 1686 - 1697
- Pages
- 12
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
- Volume
- 34
- Start Page
- 1686
- End Page
- 1697
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/214013
- DOI
- 10.1109/TASLPRO.2026.3671660
- ISSN
- 2998-4173
- Abstract
- Silent speech interfaces (SSIs) offer a promising alternative communication method for individuals with speech impairments and in environments where acoustic speech is not feasible. In this study, we propose a subject-independent silent speech recognition system that utilizes facial muscle movements measured by three-axis accelerometers attached to the facial skin. To address inter-individual variability arising from differences in facial anatomy and sensor placement, we introduce spatial normalization and data augmentation methods. First, a z-alignment process aligns the accelerometer z-axis with the direction of gravity, providing a consistent vertical reference across participants. Subsequently, a yaw augmentation process simulates rotational variability of accelerometers in the perpendicular horizontal plane by applying controlled angular perturbations around the z-axis. These techniques eliminate the need for subject-specific calibration while improving model generalizability. The proposed approach was applied to an accelerometer dataset recorded while 20 participants silently spoke 30 Korean words. The results demonstrated substantial performance improvement, with the proposed method achieving an average classification accuracy of 82.93 ± 4.09%, compared with 75.97 ± 6.06% without the proposed approach. Further evaluation on a selected 20-word subset yielded an accuracy of 92.10 ± 3.28%, demonstrating that high-performance subject-independent SSIs can be implemented using the proposed method.
- Appears in Collections
- Seoul College of Engineering > ETC > 1. Journal Articles
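
The two preprocessing steps named in the abstract, z-alignment and yaw augmentation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`rotation_aligning`, `z_align`, `yaw_rotate`, `yaw_augment`), the use of the per-recording mean as the gravity estimate, the choice of +z as the target direction, and the example angles are all assumptions made for illustration. The record does not state how gravity is estimated or which perturbation angles were used.

```python
import numpy as np


def rotation_aligning(a, b):
    """Return the 3x3 rotation matrix that rotates vector a onto vector b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        # Antiparallel vectors: rotate 180 degrees about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    # Rodrigues' formula written with the cross-product (skew-symmetric) matrix of v.
    vx = np.array([[0.0, -v[2], v[1]],
                   [v[2], 0.0, -v[0]],
                   [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + (vx @ vx) / (1.0 + c)


def z_align(signal):
    """Rotate a (T, 3) accelerometer recording so its estimated gravity lies on +z.

    Assumption: the mean over time approximates the gravity component.
    """
    signal = np.asarray(signal, dtype=float)
    gravity = signal.mean(axis=0)
    rot = rotation_aligning(gravity, [0.0, 0.0, 1.0])
    return signal @ rot.T


def yaw_rotate(signal, angle_deg):
    """Rotate a (T, 3) recording about the z-axis by angle_deg degrees."""
    theta = np.deg2rad(angle_deg)
    rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    return np.asarray(signal, dtype=float) @ rz.T


def yaw_augment(signal, angles_deg=(-10.0, -5.0, 5.0, 10.0)):
    """Return yaw-perturbed copies of a z-aligned recording (angles are hypothetical)."""
    return [yaw_rotate(signal, angle) for angle in angles_deg]
```

In this sketch, a recording would first pass through `z_align` and then through `yaw_augment` to produce additional training samples. Because the yaw rotation leaves the z component unchanged, the augmented copies keep the vertical reference established by the alignment step.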