Encoder-Based Multimodal Ensemble Learning for High Compatibility and Accuracy in Phishing Website Detection
- Authors
- Ahn, Jemin; Akhavan, Dorian; Jung, Woohwan; Kang, Kyungtae; Son, Junggab
- Issue Date
- Sep-2025
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Keywords
- Bidirectional Encoder Representations from Transformers (BERT); Encoder Model; Ensemble Learning; Multimodal; Phishing Website Detection
- Citation
- Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST, v.629 LNICST, pp. 347-365
- Pages
- 19
- Indexed
- SCOPUS
- Journal Title
- Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
- Volume
- 629 LNICST
- Start Page
- 347
- End Page
- 365
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/126589
- DOI
- 10.1007/978-3-031-94455-0_16
- ISSN
- 1867-8211; 1867-822X
- Abstract
- Phishing websites pose a significant threat to modern network security. In response, various detection methods have been developed, with deep learning-based approaches recently becoming dominant. As phishing tactics grow increasingly sophisticated, contemporary detection methods must draw on diverse data types and advanced deep learning models. However, integrating multiple data types can cause compatibility issues that pose challenges for deep learning techniques, and there remains room to improve accuracy through careful selection of data types. To address these issues, this paper proposes a novel encoder-based multimodal ensemble learning approach that achieves high compatibility and accuracy in phishing website detection. Our method leverages two features, URLs and text content, extracted from a single data source: HTML. HTML forms the foundation of every website, and these two features most effectively characterize its components; selecting them from a single data source therefore enhances both the reliability and the compatibility of our model. Since both features are text-based and sequential, we employ Bidirectional Encoder Representations from Transformers (BERT) for its superior performance on such data. Comprehensive experiments demonstrate that our model achieves a classification accuracy of 98.9%, surpassing both our baseline models and existing detection methods.
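The abstract describes encoding two text-based features, the URL and the text content extracted from HTML, with BERT and combining them for classification. The sketch below illustrates one way such an encoder-based multimodal setup could look using the Hugging Face `transformers` library; the model names, the late-fusion of [CLS] embeddings, and the classifier head are illustrative assumptions, not the authors' implementation, whose ensembling strategy may differ.

```python
# Minimal sketch (assumptions, not the paper's implementation): two BERT
# encoders, one per modality (URL string and HTML-extracted text content),
# whose [CLS] embeddings are concatenated and passed to a small classifier.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class MultimodalPhishingClassifier(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased"):
        super().__init__()
        # Separate encoders so each modality learns its own representation.
        self.url_encoder = BertModel.from_pretrained(encoder_name)
        self.text_encoder = BertModel.from_pretrained(encoder_name)
        hidden = self.url_encoder.config.hidden_size
        # Simple late-fusion head over the concatenated [CLS] embeddings.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # phishing vs. benign
        )

    def forward(self, url_inputs, text_inputs):
        url_cls = self.url_encoder(**url_inputs).last_hidden_state[:, 0]
        text_cls = self.text_encoder(**text_inputs).last_hidden_state[:, 0]
        return self.classifier(torch.cat([url_cls, text_cls], dim=-1))


if __name__ == "__main__":
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = MultimodalPhishingClassifier()
    url_batch = tokenizer(["http://example.com/login"], return_tensors="pt",
                          truncation=True, padding=True)
    text_batch = tokenizer(["Please verify your account password"],
                           return_tensors="pt", truncation=True, padding=True)
    logits = model(url_batch, text_batch)
    print(logits.shape)  # torch.Size([1, 2])
```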
- Appears in Collections
- COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles
