Detailed Information

BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text

Full metadata record
DC Field | Value | Language
dc.contributor.author | Jung, Jinwoo | -
dc.contributor.author | Kim, Misuk | -
dc.date.accessioned | 2026-06-02T02:00:16Z | -
dc.date.available | 2026-06-02T02:00:16Z | -
dc.date.issued | 2026-04 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212934 | -
dc.description.abstract | Harmful online content, including hate speech, fraud, and phishing, is increasingly disseminated in obfuscated forms designed to evade detection. This creates an urgent need for accurate and efficient real-time de-obfuscation methods to protect users and maintain trust. Existing obfuscation detection methods rely on large auto-regressive models and byte-level fallback tokenizers, which are hindered by slow inference speeds and face difficulties in handling graphemes with multiple code points and out-of-vocabulary (OOV) processing. This study proposes Bidirectionally Aligned Next-Token Denoising (BIND), which integrates character-level token alignment with a novel attention technique to enable precise and efficient corrections at fixed positions. Experiments conducted on a public dataset of obfuscated harmful text demonstrate that BIND outperforms existing methods. BIND has shown strong robustness against various text-based visual, phonetic, and semantic perturbations, proving particularly resilient against emojis and other OOV elements. This research highlights how a task-specific small language model can outperform larger ones, offering a practical solution for real-time harmful content mitigation and contributing to the development of a safer and more responsible web. | -
dc.format.extent | 12 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Association for Computing Machinery | -
dc.title | BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1145/3774904.3792596 | -
dc.identifier.scopusid | 2-s2.0-105038546985 | -
dc.identifier.bibliographicCitation | WWW 2026 - Proceedings of the ACM Web Conference 2026, pp 1705 - 1716 | -
dc.citation.title | WWW 2026 - Proceedings of the ACM Web Conference 2026 | -
dc.citation.startPage | 1705 | -
dc.citation.endPage | 1716 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordAuthor | decoder-as-encoder transformation | -
dc.subject.keywordAuthor | harmful content mitigation | -
dc.subject.keywordAuthor | small language models | -
dc.subject.keywordAuthor | text deobfuscation | -
dc.subject.keywordAuthor | web safety and trust | -
dc.identifier.url | https://dl.acm.org/doi/10.1145/3774904.3792596 | -
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

MISUK, KIM
COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)