BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text (open access)
- Authors
- Jung, Jinwoo; Kim, Misuk
- Issue Date
- Apr-2026
- Publisher
- Association for Computing Machinery
- Keywords
- decoder-as-encoder transformation; harmful content mitigation; small language models; text deobfuscation; web safety and trust
- Citation
- WWW 2026 - Proceedings of the ACM Web Conference 2026, pp 1705 - 1716
- Pages
- 12
- Indexed
- SCOPUS
- Journal Title
- WWW 2026 - Proceedings of the ACM Web Conference 2026
- Start Page
- 1705
- End Page
- 1716
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212934
- DOI
- 10.1145/3774904.3792596
- Abstract
- Harmful online content, including hate speech, fraud, and phishing, is increasingly disseminated in obfuscated forms designed to evade detection. This creates an urgent need for accurate and efficient real-time de-obfuscation methods to protect users and maintain trust. Existing obfuscation detection methods rely on large auto-regressive models and byte-level fallback tokenizers, which are hindered by slow inference speeds and face difficulties in handling graphemes with multiple code points and out-of-vocabulary (OOV) processing. This study proposes Bidirectionally Aligned Next-Token Denoising (BIND), which integrates character-level token alignment with a novel attention technique to enable precise and efficient corrections at fixed positions. Experiments conducted on a public dataset of obfuscated harmful text demonstrate that BIND outperforms existing methods. BIND has shown strong robustness against various text-based visual, phonetic, and semantic perturbations, proving particularly resilient against emojis and other OOV elements. This research highlights how a task-specific small language model can outperform larger ones, offering a practical solution for real-time harmful content mitigation and contributing to the development of a safer and more responsible web.
- Appears in Collections
- Seoul College of Engineering > ETC > 1. Journal Articles
