BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text (open access)

Authors
Jung, Jinwoo; Kim, Misuk
Issue Date
Apr-2026
Publisher
Association for Computing Machinery
Keywords
decoder-as-encoder transformation; harmful content mitigation; small language models; text deobfuscation; web safety and trust
Citation
WWW 2026 - Proceedings of the ACM Web Conference 2026, pp 1705 - 1716
Pages
12
Indexed
SCOPUS
Journal Title
WWW 2026 - Proceedings of the ACM Web Conference 2026
Start Page
1705
End Page
1716
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212934
DOI
10.1145/3774904.3792596
Abstract
Harmful online content, including hate speech, fraud, and phishing, is increasingly disseminated in obfuscated forms designed to evade detection. This creates an urgent need for accurate, efficient, real-time deobfuscation methods that protect users and maintain trust. Existing obfuscation detection methods rely on large autoregressive models and byte-level fallback tokenizers; they suffer from slow inference and struggle with graphemes that span multiple code points and with out-of-vocabulary (OOV) input. This study proposes Bidirectionally Aligned Next-Token Denoising (BIND), which integrates character-level token alignment with a novel attention technique to enable precise and efficient corrections at fixed positions. Experiments on a public dataset of obfuscated harmful text show that BIND outperforms existing methods. BIND is robust to a range of text-based visual, phonetic, and semantic perturbations, and is particularly resilient to emojis and other OOV elements. The results show how a task-specific small language model can outperform larger ones, offering a practical solution for real-time harmful content mitigation and contributing to a safer and more responsible web.
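The abstract's point about graphemes with multiple code points can be illustrated with a minimal sketch. This is not BIND's method or code; the helper name `code_point_and_byte_counts` and the two sample strings are assumptions chosen for illustration. It only shows why a single user-perceived character can expand into several code points and many bytes, which is the situation a byte-level fallback tokenizer has to deal with.

```python
def code_point_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


# Each sample renders as ONE user-perceived character (grapheme).
# Escapes are used so the individual code points are explicit.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man + ZWJ + woman + ZWJ + girl
e_acute = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

for label, text in [("family emoji", family), ("e + combining acute", e_acute)]:
    points, nbytes = code_point_and_byte_counts(text)
    print(f"{label}: {points} code points, {nbytes} UTF-8 bytes")
```

Under these assumptions the family emoji occupies 5 code points and 18 UTF-8 bytes, and the accented letter occupies 2 code points and 3 bytes, although each is a single visible character.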
Appears in Collections
Seoul College of Engineering > ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher
Kim, Misuk (College of Engineering, Department of Intelligence Computing)
