Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight (open access)
- Authors
- Shin, Gun-Yoon; Jang, Younghoan; Kim, Dong-Wook; Park, Sungjin; Park, A-Ran; Kim, Younghwan; Han, Myung-Mook
- Issue Date
- Jan-2024
- Publisher
- IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
- Keywords
- Dark Web; Feature extraction; Data models; Analytical models; Text categorization; Graph neural networks; Classification algorithms; dark web analysis; text classification; topic modeling; model explanation
- Citation
- IEEE ACCESS, v.12, pp 36361 - 36371
- Pages
- 11
- Journal Title
- IEEE ACCESS
- Volume
- 12
- Start Page
- 36361
- End Page
- 36371
- URI
- https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/90821
- DOI
- 10.1109/ACCESS.2023.3347737
- ISSN
- 2169-3536
- Abstract
- The Dark Web is an internet domain that ensures user anonymity and has increasingly become a focal point for illegal activities and a repository for information on cyberattacks owing to the challenges in tracking its users. This study examined the classification of the Dark Web in relation to these cyber threats. We processed Dark Web texts to extract vector types suitable for machine learning classification. Traditional methods utilizing the entirety of Dark Web texts to generate features result in vectors including all words found on the Dark Web. However, this approach incorporates extraneous information in the vectors, diminishing learning effectiveness and extending processing duration. The research aimed to optimize the classification process by selectively focusing on keywords within each class, thereby curtailing word vector dimensions. This optimization was facilitated by leveraging the anonymity characteristic of the Dark Web and employing topic-modeling-based weight generation. These methods enabled the creation of word vectors with a constrained feature set, enhancing the distinction of Dark Web classes. To further improve classification performance, we integrated TextCNN with topic modeling weights. For validation, we employed two datasets and compared the performance of the model with other text classification algorithms, where the proposed model demonstrated superior effectiveness in Dark Web classification.
- Appears in Collections
- ETC > 1. Journal Articles
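
The abstract describes restricting word vectors to per-class keywords and weighting them with topic-modeling output. The sketch below illustrates only that general idea and is not the authors' implementation. It assumes whitespace tokenization, and it uses a simple class-relative frequency score as a stand-in for topic-model word probabilities. The function names (`class_keyword_weights`, `build_vocab`, `weighted_vector`), the `top_k` parameter, and the toy documents are all hypothetical. The TextCNN stage is not shown.

```python
from collections import Counter


def class_keyword_weights(docs, labels, top_k=2):
    """Return {label: {word: weight}}, keeping the top_k words per class.

    ASSUMPTION: the weight is a stand-in for a topic-model word probability.
    It multiplies the word's share of the class's tokens by the fraction of
    the word's total occurrences that fall in that class.
    """
    per_class = {}
    total = Counter()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        per_class.setdefault(label, Counter()).update(tokens)
        total.update(tokens)

    weights = {}
    for label, counts in per_class.items():
        n_tokens = sum(counts.values())
        scored = {w: (c / n_tokens) * (c / total[w]) for w, c in counts.items()}
        # Sort by descending score, breaking ties alphabetically.
        top = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        weights[label] = dict(top)
    return weights


def build_vocab(weights):
    """Map each retained keyword to a column index (the reduced feature set)."""
    words = sorted({w for kw in weights.values() for w in kw})
    return {w: i for i, w in enumerate(words)}


def weighted_vector(text, vocab, weights):
    """Encode a text over the reduced vocabulary, adding the keyword weight per hit."""
    best = {}
    for kw in weights.values():
        for w, s in kw.items():
            best[w] = max(best.get(w, 0.0), s)
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += best[tok]
    return vec


# Toy data, invented for illustration only.
docs = [
    "buy stolen cards cheap cards",
    "stolen cards dumps shop",
    "exploit kit zero day exploit",
    "malware exploit kit",
]
labels = ["fraud", "fraud", "hacking", "hacking"]

weights = class_keyword_weights(docs, labels, top_k=2)
vocab = build_vocab(weights)
vec = weighted_vector("cheap stolen cards", vocab, weights)
print(vocab)
print(vec)
```

Words outside the retained keyword set (here, "cheap") contribute nothing to the vector, which is the dimensionality reduction the abstract refers to. In a full pipeline, a proper topic model would replace the stand-in score and the weights would feed a TextCNN rather than a plain list.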