BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text

Jung, Jinwoo; Kim, Misuk

doi:10.1145/3774904.3792596


Full metadata record

DC Field	Value	Language
dc.contributor.author	Jung, Jinwoo	-
dc.contributor.author	Kim, Misuk	-
dc.date.accessioned	2026-06-02T02:00:16Z	-
dc.date.available	2026-06-02T02:00:16Z	-
dc.date.issued	2026-04	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212934	-
dc.description.abstract	Harmful online content, including hate speech, fraud, and phishing, is increasingly disseminated in obfuscated forms designed to evade detection. This creates an urgent need for accurate and efficient real-time de-obfuscation methods to protect users and maintain trust. Existing obfuscation detection methods rely on large auto-regressive models and byte-level fallback tokenizers, which are hindered by slow inference speeds and face difficulties in handling graphemes with multiple code points and out-of-vocabulary (OOV) processing. This study proposes Bidirectionally Aligned Next-Token Denoising (BIND), which integrates character-level token alignment with a novel attention technique to enable precise and efficient corrections at fixed positions. Experiments conducted on a public dataset of obfuscated harmful text demonstrate that BIND outperforms existing methods. BIND has shown strong robustness against various text-based visual, phonetic, and semantic perturbations, proving particularly resilient against emojis and other OOV elements. This research highlights how a task-specific small language model can outperform larger ones, offering a practical solution for real-time harmful content mitigation and contributing to the development of a safer and more responsible web.	-
dc.format.extent	12	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Association for Computing Machinery	-
dc.title	BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1145/3774904.3792596	-
dc.identifier.scopusid	2-s2.0-105038546985	-
dc.identifier.bibliographicCitation	WWW 2026 - Proceedings of the ACM Web Conference 2026, pp 1705 - 1716	-
dc.citation.title	WWW 2026 - Proceedings of the ACM Web Conference 2026	-
dc.citation.startPage	1705	-
dc.citation.endPage	1716	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordAuthor	decoder-as-encoder transformation	-
dc.subject.keywordAuthor	harmful content mitigation	-
dc.subject.keywordAuthor	small language models	-
dc.subject.keywordAuthor	text deobfuscation	-
dc.subject.keywordAuthor	web safety and trust	-
dc.identifier.url	https://dl.acm.org/doi/10.1145/3774904.3792596	-
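
The abstract notes that byte-level fallback tokenizers struggle with graphemes made of multiple code points. The sketch below is not the authors' method. It only illustrates that underlying mismatch using the Python standard library, and the helper name describe is hypothetical. It counts how many code points and UTF-8 bytes sit behind a single visible character, which is why a correction "at a fixed position" is ambiguous unless tokens are aligned to characters.

```python
import unicodedata


def describe(text: str) -> dict:
    """Count the units that different tokenization levels would see."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # NFC composes sequences that have a precomposed form.
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


# "e" followed by a combining acute accent: one visible character.
accented = "e\u0301"
# Family emoji: three emoji joined by two zero-width joiners (U+200D).
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

accented_info = describe(accented)
family_info = describe(family)
print(accented_info)
print(family_info)
```

Normalization collapses the accented letter to one code point, but the emoji sequence has no precomposed form, so it stays a multi-code-point grapheme at every level shown here.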

Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Related Researcher

MISUK, KIM: COLLEGE OF ENGINEERING (DEPARTMENT OF INTELLIGENCE COMPUTING)

