BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jung, Jinwoo | - |
| dc.contributor.author | Kim, Misuk | - |
| dc.date.accessioned | 2026-06-02T02:00:16Z | - |
| dc.date.available | 2026-06-02T02:00:16Z | - |
| dc.date.issued | 2026-04 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212934 | - |
| dc.description.abstract | Harmful online content, including hate speech, fraud, and phishing, is increasingly disseminated in obfuscated forms designed to evade detection. This creates an urgent need for accurate and efficient real-time deobfuscation methods to protect users and maintain trust. Existing obfuscation detection methods rely on large auto-regressive models and byte-level fallback tokenizers, which suffer from slow inference and struggle with graphemes composed of multiple code points and with out-of-vocabulary (OOV) input. This study proposes Bidirectionally Aligned Next-Token Denoising (BIND), which integrates character-level token alignment with a novel attention technique to enable precise and efficient corrections at fixed positions. Experiments conducted on a public dataset of obfuscated harmful text demonstrate that BIND outperforms existing methods. BIND shows strong robustness to various text-based visual, phonetic, and semantic perturbations and is particularly resilient to emojis and other OOV elements. This research highlights how a task-specific small language model can outperform larger ones, offering a practical solution for real-time harmful content mitigation and contributing to the development of a safer and more responsible web. | - |
| dc.format.extent | 12 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Association for Computing Machinery | - |
| dc.title | BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1145/3774904.3792596 | - |
| dc.identifier.scopusid | 2-s2.0-105038546985 | - |
| dc.identifier.bibliographicCitation | WWW 2026 - Proceedings of the ACM Web Conference 2026, pp 1705 - 1716 | - |
| dc.citation.title | WWW 2026 - Proceedings of the ACM Web Conference 2026 | - |
| dc.citation.startPage | 1705 | - |
| dc.citation.endPage | 1716 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordAuthor | decoder-as-encoder transformation | - |
| dc.subject.keywordAuthor | harmful content mitigation | - |
| dc.subject.keywordAuthor | small language models | - |
| dc.subject.keywordAuthor | text deobfuscation | - |
| dc.subject.keywordAuthor | web safety and trust | - |
| dc.identifier.url | https://dl.acm.org/doi/10.1145/3774904.3792596 | - |
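The abstract contrasts byte-level fallback tokenizers, which can split one user-perceived character (a grapheme) into several code points and bytes, with character-level alignment that corrects text at fixed positions. The Python sketch below illustrates both ideas under clearly stated assumptions. `rough_graphemes` is a hypothetical, simplified grapheme segmenter: it handles combining marks, variation selectors, emoji skin-tone modifiers and zero-width-joiner (ZWJ) sequences, but it does not implement the full Unicode UAX #29 rules. `TOY_FIXES` is a hypothetical lookup table standing in for a trained model. This is not the BIND implementation. It only shows why counting bytes or code points misaligns emoji-heavy text and what "one output per input position" means.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner, used inside multi-person emoji sequences


def is_extender(ch: str) -> bool:
    """Rough check for code points that attach to the preceding character."""
    cp = ord(ch)
    return (
        unicodedata.combining(ch) != 0  # combining marks such as U+0301
        or 0xFE00 <= cp <= 0xFE0F       # variation selectors
        or 0x1F3FB <= cp <= 0x1F3FF     # emoji skin-tone modifiers
        or ch == ZWJ
    )


def rough_graphemes(text: str) -> list[str]:
    """Approximate grapheme clusters; NOT a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in text:
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch  # glue onto the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# Hypothetical per-grapheme correction table standing in for a learned model.
TOY_FIXES = {"3": "e", "0": "o", "\U0001F170\ufe0f": "a"}


def denoise_fixed_positions(text: str) -> str:
    """Emit exactly one output grapheme per input grapheme, in place."""
    return "".join(TOY_FIXES.get(g, g) for g in rough_graphemes(text))


if __name__ == "__main__":
    family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
    for s in ["e\u0301", "\U0001F44D\U0001F3FD", family]:
        print(
            f"{s!r}: {len(s.encode('utf-8'))} UTF-8 bytes, "
            f"{len(s)} code points, {len(rough_graphemes(s))} grapheme(s)"
        )
    print(denoise_fixed_positions("fr33 c\U0001F170\ufe0fsh"))  # free cash
```

The family emoji, for instance, is a single grapheme but five code points and 18 UTF-8 bytes. Because the toy corrector emits one output per input grapheme, the corrected string stays position-aligned with its input. The sketch does not regenerate the text token by token. For real segmentation, the third-party `regex` package's `\X` pattern follows the Unicode grapheme cluster rules far more closely than this approximation.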
