PhishHaven - An Efficient Real-Time AI Phishing URLs Detection System
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Sameen M. | - |
dc.contributor.author | Han K. | - |
dc.contributor.author | Hwang S.O. | - |
dc.date.available | 2020-10-20T06:42:36Z | - |
dc.date.created | 2020-06-03 | - |
dc.date.issued | 2020-04 | - |
dc.identifier.issn | 2169-3536 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/78496 | - |
dc.description.abstract | Different machine learning and deep learning-based approaches have been proposed for designing defensive mechanisms against various phishing attacks. Recently, researchers showed that phishing attacks can be performed by employing a deep neural network-based phishing URL generating system called DeepPhish. To prevent this kind of attack, we design an ensemble machine learning-based detection system called PhishHaven to identify AI-generated as well as human-crafted phishing URLs. To the best of our knowledge, this is the first study to consider detecting phishing attacks by both AI and human attackers. PhishHaven employs lexical analysis for feature extraction. To further enhance lexical analysis, we introduce URL HTML Encoding to classify URLs proactively and on the fly, and we compare this approach with some existing methods. We also introduce a URL Hit approach to deal with tiny (shortened) URLs, whose detection remains an open problem. Moreover, PhishHaven makes the final classification of URLs with an unbiased voting mechanism, which aims to avoid misclassification when the votes are tied. To speed up the ensemble-based machine learning models, PhishHaven employs a multi-threading approach to execute the classification in parallel, enabling real-time detection. Theoretical analysis of our solution shows that (1) it can always detect tiny URLs, and (2) it can detect future AI-generated phishing URLs based on our selected lexical features with 100% accuracy. We evaluate our solution experimentally on a benchmark dataset of 100,000 phishing and normal URLs. The results show that PhishHaven achieves 98.00% accuracy, outperforming existing lexical-based detection systems for human-crafted phishing URLs. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.relation.isPartOf | IEEE Access | - |
dc.title | PhishHaven - An Efficient Real-Time AI Phishing URLs Detection System | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000549502200142 | - |
dc.identifier.doi | 10.1109/ACCESS.2020.2991403 | - |
dc.identifier.bibliographicCitation | IEEE Access, v.8, pp.83425 - 83443 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85085127890 | - |
dc.citation.endPage | 83443 | - |
dc.citation.startPage | 83425 | - |
dc.citation.title | IEEE Access | - |
dc.citation.volume | 8 | - |
dc.contributor.affiliatedAuthor | Sameen M. | - |
dc.contributor.affiliatedAuthor | Hwang S.O. | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | AI-generated phishing URLs | - |
dc.subject.keywordAuthor | ensemble machine learning | - |
dc.subject.keywordAuthor | human-crafted phishing URLs | - |
dc.subject.keywordAuthor | lexical features | - |
dc.subject.keywordAuthor | multi-threading | - |
dc.subject.keywordAuthor | tiny URLs | - |
dc.subject.keywordAuthor | URL HTML encoding | - |
dc.subject.keywordAuthor | voting | - |
dc.subject.keywordPlus | Computational linguistics | - |
dc.subject.keywordPlus | Computer crime | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Deep neural networks | - |
dc.subject.keywordPlus | Feature extraction | - |
dc.subject.keywordPlus | Information dissemination | - |
dc.subject.keywordPlus | Benchmark datasets | - |
dc.subject.keywordPlus | Defensive mechanism | - |
dc.subject.keywordPlus | Generating system | - |
dc.subject.keywordPlus | Learning-based approach | - |
dc.subject.keywordPlus | Machine learning models | - |
dc.subject.keywordPlus | Misclassifications | - |
dc.subject.keywordPlus | Real-time detection | - |
dc.subject.keywordPlus | Voting mechanism | - |
dc.subject.keywordPlus | Learning systems | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
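The abstract names three mechanisms that combine at classification time: lexical feature extraction, an ensemble of classifiers executed in parallel threads, and a final vote. The sketch below is a minimal illustration under stated assumptions, not PhishHaven's implementation. The feature set, the three rule-based voters (`long_url_rule`, `symbol_rule`, `ip_host_rule`), and their thresholds are hypothetical stand-ins for the paper's trained models; ties are avoided simply by using an odd number of voters, which is not the paper's unbiased voting rule (the abstract does not specify that rule). URL HTML Encoding and URL Hit are not modeled.

```python
# Hypothetical sketch: lexical URL features, a parallel ensemble, and a majority vote.
# Feature choices, voters, and thresholds are illustrative assumptions, not PhishHaven's.
import re
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Extract a few common lexical features from the URL string alone."""
    # Add a scheme if missing so urlparse places the host in netloc.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": host.count("-"),
        "has_at": "@" in url,
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "num_digits": sum(ch.isdigit() for ch in url),
    }


# Hypothetical stand-ins for trained classifiers; each returns 1 (phishing) or 0 (legitimate).
def long_url_rule(f: dict) -> int:
    return int(f["url_length"] > 75)


def symbol_rule(f: dict) -> int:
    return int(f["has_at"] or f["num_hyphens"] > 2)


def ip_host_rule(f: dict) -> int:
    return int(f["has_ip_host"])


# An odd number of binary voters means a strict majority always exists (no ties).
CLASSIFIERS = [long_url_rule, symbol_rule, ip_host_rule]


def classify(url: str) -> int:
    """Return 1 if a strict majority of voters flags the URL as phishing, else 0."""
    features = lexical_features(url)
    # Run each voter in its own worker thread, mirroring the multi-threaded ensemble idea.
    with ThreadPoolExecutor(max_workers=len(CLASSIFIERS)) as pool:
        votes = list(pool.map(lambda clf: clf(features), CLASSIFIERS))
    return int(sum(votes) * 2 > len(votes))
```

For example, `classify("http://192.168.0.1/login@secure-update-account-verify.example")` returns `1` (the `@` and IP-host voters fire), while `classify("https://www.example.com/")` returns `0`. In CPython the global interpreter lock limits the speedup for pure-Python voters like these; threading is more likely to help when each model runs in native code that releases the lock.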
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.