PhishHaven - An Efficient Real-Time AI Phishing URLs Detection System

Sameen M.; Han K.; Hwang S.O.


Cited 19 times in Web of Science

Cited 37 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sameen M.	-
dc.contributor.author	Han K.	-
dc.contributor.author	Hwang S.O.	-
dc.date.available	2020-10-20T06:42:36Z	-
dc.date.created	2020-06-03	-
dc.date.issued	2020-04	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/78496	-
dc.description.abstract	Different machine learning and deep learning-based approaches have been proposed for designing defensive mechanisms against various phishing attacks. Recently, researchers showed that phishing attacks can be performed by employing a deep neural network-based phishing URL generating system called DeepPhish. To prevent this kind of attack, we design an ensemble machine learning-based detection system called PhishHaven to identify AI-generated as well as human-crafted phishing URLs. To the best of our knowledge, this is the first study to consider detecting phishing attacks by both AI and human attackers. PhishHaven employs lexical analysis for feature extraction. To further enhance lexical analysis, we introduce URL HTML Encoding to classify URL on-the-fly and proactively compare with some of the existing methods. We also introduce a URL Hit approach to deal with tiny URLs, which is an open problem yet to be solved. Moreover, the final classification of URLs is made on an unbiased voting mechanism in PhishHaven, which aims to avoid misclassification when the number of votes is equal. To speed up the ensemble-based machine learning models, PhishHaven employs a multi-threading approach to execute the classification in parallel, leading to real-time detection. Theoretical analysis of our solution shows that (1) it can always detect tiny URLs, and (2) it can detect future AI-generated Phishing URLs based on our selected lexical features with 100% accuracy. Through experiments, we analyze our solution with a benchmark dataset of 100,000 phishing and normal URLs. The results show that PhishHaven can achieve 98.00% accuracy, outperforming the existing lexical-based human-crafted phishing URLs detection systems. © 2013 IEEE.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.relation.isPartOf	IEEE Access	-
dc.title	PhishHaven - An Efficient Real-Time AI Phishing URLs Detection System	-
dc.type	Article	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.identifier.wosid	000549502200142	-
dc.identifier.doi	10.1109/ACCESS.2020.2991403	-
dc.identifier.bibliographicCitation	IEEE Access, v.8, pp.83425 - 83443	-
dc.description.isOpenAccess	N	-
dc.identifier.scopusid	2-s2.0-85085127890	-
dc.citation.endPage	83443	-
dc.citation.startPage	83425	-
dc.citation.title	IEEE Access	-
dc.citation.volume	8	-
dc.contributor.affiliatedAuthor	Sameen M.	-
dc.contributor.affiliatedAuthor	Hwang S.O.	-
dc.type.docType	Article	-
dc.subject.keywordAuthor	AI-generated phishing URLs	-
dc.subject.keywordAuthor	ensemble machine learning	-
dc.subject.keywordAuthor	human-crafted phishing URLs	-
dc.subject.keywordAuthor	lexical features	-
dc.subject.keywordAuthor	multi-threading	-
dc.subject.keywordAuthor	tiny URLs	-
dc.subject.keywordAuthor	URL HTML encoding	-
dc.subject.keywordAuthor	voting	-
dc.subject.keywordPlus	Computational linguistics	-
dc.subject.keywordPlus	Computer crime	-
dc.subject.keywordPlus	Deep learning	-
dc.subject.keywordPlus	Deep neural networks	-
dc.subject.keywordPlus	Feature extraction	-
dc.subject.keywordPlus	Information dissemination	-
dc.subject.keywordPlus	Benchmark datasets	-
dc.subject.keywordPlus	Defensive mechanism	-
dc.subject.keywordPlus	Generating system	-
dc.subject.keywordPlus	Learning-based approach	-
dc.subject.keywordPlus	Machine learning models	-
dc.subject.keywordPlus	Misclassifications	-
dc.subject.keywordPlus	Real-time detection	-
dc.subject.keywordPlus	Voting mechanism	-
dc.subject.keywordPlus	Learning systems	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
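
The abstract describes three steps that a short example can make concrete: lexical feature extraction from the URL string, classification by several models running in parallel threads, and a vote that cannot end in a tie. The following is a minimal sketch under stated assumptions, not the authors' implementation. The feature set, the thresholds, and the three rule-based voters are hypothetical stand-ins for trained classifiers, and using an odd number of voters is only one simple way to rule out equal vote counts; the record does not say how PhishHaven itself does this.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Extract a few string-level features (illustrative set, not the paper's)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "dot_count": host.count("."),
        "has_at": "@" in url,
        "hyphen_in_host": "-" in host,
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


# Hypothetical toy voters standing in for trained ensemble members.
def vote_length(feats: dict) -> str:
    return "phishing" if feats["url_length"] > 75 else "normal"


def vote_dots(feats: dict) -> str:
    return "phishing" if feats["dot_count"] > 3 else "normal"


def vote_symbols(feats: dict) -> str:
    suspicious = feats["has_at"] or feats["hyphen_in_host"]
    return "phishing" if suspicious else "normal"


VOTERS = (vote_length, vote_dots, vote_symbols)


def classify(url: str) -> str:
    """Run every voter in its own thread and return the majority label."""
    if len(VOTERS) % 2 == 0:
        # Assumption: an odd voter count keeps a two-class vote from tying.
        raise ValueError("use an odd number of voters to avoid ties")
    feats = lexical_features(url)
    with ThreadPoolExecutor(max_workers=len(VOTERS)) as pool:
        votes = list(pool.map(lambda voter: voter(feats), VOTERS))
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs",
        "http://secure-login.example.com.account.verify.example.net/@update",
    ):
        print(candidate, "->", classify(candidate))
```

In CPython, threads speed up the voting step only when the underlying models release the global interpreter lock during prediction, so the thread pool here illustrates the structure rather than a measured gain.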

Files in This Item: There are no files associated with this item.

Appears in Collections: College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Related Researcher

Hwang, Seong Oun: College of IT Convergence (School of Computer Engineering, Computer Engineering Major)

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
