Efficient Speaker Embedding Extraction Using a Twofold Sliding Window Algorithm for Speaker Diarization

Full metadata record
DC Field: Value
dc.contributor.author: Choi, Jeong-Hwan
dc.contributor.author: Jeoung, Ye-Rin
dc.contributor.author: Kim, Ilseok
dc.contributor.author: Chang, Joon-Hyuk
dc.date.accessioned: 2025-02-13T03:00:10Z
dc.date.available: 2025-02-13T03:00:10Z
dc.date.issued: 2024-09
dc.identifier.issn: 1990-9772
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206483
dc.description.abstract: This paper proposes an efficient speaker embedding (SE) extraction method that employs a twofold sliding window algorithm (SWA) for speaker diarization (SD) systems. Non-overlapping short segments are obtained through the first SWA and fed into the frame-level neural networks of a pre-trained SE model to extract frame-level representations. The neighboring frame-level representations are concatenated along the time axis through the second SWA, which enables an overlap between representations. The concatenated representations are used to extract multiple SEs. Additionally, we propose a fine-tuning strategy that employs a residual adapter and knowledge distillation techniques on a pre-trained SE model to refine the frame-level representation. Experimental results using two SD benchmarks show the effectiveness of the proposed extraction method with a fine-tuned SE model in terms of floating-point operations while maintaining the diarization error rate.
dc.format.extent: 5
dc.language: English
dc.language.iso: ENG
dc.title: Efficient Speaker Embedding Extraction Using a Twofold Sliding Window Algorithm for Speaker Diarization
dc.type: Article
dc.identifier.doi: 10.21437/Interspeech.2024-1874
dc.identifier.scopusid: 2-s2.0-85214823715
dc.identifier.wosid: 001331850103177
dc.identifier.bibliographicCitation: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3749-3753
dc.citation.title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.citation.startPage: 3749
dc.citation.endPage: 3753
dc.type.docType: Proceedings Paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.subject.keywordPlus: Error statistics
dc.subject.keywordPlus: Network embeddings
dc.subject.keywordAuthor: segmentation
dc.subject.keywordAuthor: sliding window algorithm
dc.subject.keywordAuthor: speaker diarization
dc.subject.keywordAuthor: speaker embedding
dc.identifier.url: https://www.isca-archive.org/interspeech_2024/choi24d_interspeech.html
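
The twofold sliding window described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the segment length, the window and hop sizes, the single tanh projection standing in for the frame-level layers of a pre-trained SE model, and the mean/standard-deviation pooling are all assumptions chosen for illustration. The point it shows is structural: the frame-level computation runs once per non-overlapping segment, and overlap is introduced only afterwards, when neighboring representations are concatenated.

```python
import numpy as np

def first_swa(features, seg_len):
    """First window: split (T, D) features into non-overlapping segments.
    Trailing frames that do not fill a whole segment are dropped."""
    n_seg = features.shape[0] // seg_len
    return features[: n_seg * seg_len].reshape(n_seg, seg_len, features.shape[1])

def frame_level_net(segments, weight):
    """Hypothetical stand-in for the frame-level layers of a pre-trained SE model.
    Maps (n_seg, seg_len, D) to (n_seg, seg_len, H), one segment at a time."""
    return np.tanh(segments @ weight)

def second_swa(frame_reps, win, hop):
    """Second window: concatenate `win` neighboring segments along the time axis,
    advancing by `hop` segments, so consecutive windows overlap when hop < win."""
    n_seg, seg_len, hidden = frame_reps.shape
    starts = range(0, n_seg - win + 1, hop)
    if len(starts) == 0:
        raise ValueError("input is shorter than one window of segments")
    return np.stack(
        [frame_reps[s : s + win].reshape(win * seg_len, hidden) for s in starts]
    )

def stats_pool(windows):
    """Assumed pooling step: mean and standard deviation over time per window."""
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=-1)

def extract_embeddings(features, weight, seg_len=50, win=3, hop=1):
    """Run the frame-level stand-in once per segment, then pool overlapping windows."""
    segments = first_swa(features, seg_len)
    frame_reps = frame_level_net(segments, weight)
    return stats_pool(second_swa(frame_reps, win, hop))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((500, 40))   # 500 frames of 40-dim features
    w = rng.standard_normal((40, 16))        # toy frame-level projection
    print(extract_embeddings(feats, w).shape)
```

With 500 frames, 50-frame segments, a window of 3 segments and a hop of 1, the sketch yields 8 overlapping windows built from 10 frame-level passes, whereas sliding the full model over the same 8 overlapping windows would process 24 segments' worth of frames. The stand-in projection has no temporal context, so both routes give identical outputs here; a real frame-level network with temporal context would behave differently at segment boundaries.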
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Related Researcher

Chang, Joon-Hyuk
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)