Detailed Information


A Comparative Study of Feature Importance Algorithms and Feature Selection for Static Feature-Based Ransomware Detection (open access)

Authors
전혜민; 최두섭; 임을규 (Im, Eul Gyu)
Issue Date
Aug-2025
Publisher
Korea Information Processing Society (한국정보처리학회)
Keywords
Ransomware; Feature Importance; Static Feature; Machine Learning
Citation
The Transactions of the Korea Information Processing Society (정보처리학회 논문지), v.14, no.8, pp. 576-587
Pages
12
Indexed
KCI
Journal Title
The Transactions of the Korea Information Processing Society (정보처리학회 논문지)
Volume
14
Number
8
Start Page
576
End Page
587
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211571
DOI
10.3745/TKIPS.2025.14.8.576
ISSN
3022-7011
Abstract
In this paper, we extract 54 static features from ransomware PE files, including header information, section sizes, and virtual memory sizes, and evaluate their importance using four algorithms: Gain Ratio, Information Gain, Gini Importance, and Mutual Information. For each algorithm, we form a reduced feature set from the top-ranked features at a given value of K, and use it to train and validate four classification models: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). Experimental results show that the RF model achieves the highest accuracy, 99.33%, using the 41 features selected by Gain Ratio at K = 0.01. The DT, SVM, and MLP models reach accuracies of 98.67%, 96.67%, and 98.75%, respectively, so all four models exceed 96%. These findings indicate that adjusting the number of features can reduce the training resources required while maintaining high detection accuracy.
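The selection step described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that K acts as a cutoff on the Gain Ratio score and that features have already been discretized; the record states neither detail. The feature names and toy data are hypothetical, and only the 0.01 threshold is taken from the abstract.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the feature's own entropy (split information)."""
    split_info = entropy(feature)
    if split_info == 0:  # a constant feature cannot split the data
        return 0.0
    return information_gain(feature, labels) / split_info


def select_features(columns, labels, threshold):
    """Return (names with gain ratio >= threshold, best first; all scores)."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True), scores


# Hypothetical, already-discretized toy data: 1 = ransomware, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "num_sections_bin": [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "has_debug_dir": [1, 1, 1, 0, 1, 0, 0, 0],     # partially informative
    "machine_type": [0, 0, 0, 0, 0, 0, 0, 0],      # constant, uninformative
    "checksum_parity": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}

selected, scores = select_features(columns, labels, threshold=0.01)
print(selected)
```

The retained columns would then be passed to a classifier such as a Random Forest. Continuous PE attributes (for example, raw section sizes) would need binning before this discrete Gain Ratio calculation applies.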
Appears in Collections
College of Engineering (Seoul) > School of Computer Science (Seoul) > 1. Journal Articles



Related Researcher


Im, Eul Gyu
COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)