A Comparative Study of Feature Importance Algorithms and Feature Selection for Static Feature-Based Ransomware Detection

전혜민; 최두섭; 임을규

doi:10.3745/TKIPS.2025.14.8.576

Title: 정적 특징 기반 랜섬웨어 탐지를 위한 특징 중요도 알고리즘 비교 및 특징 선정 연구 (open access)

Other Titles: A Comparative Study of Feature Importance Algorithms and Feature Selection for Static Feature-Based Ransomware Detection

Authors: 전혜민; 최두섭; 임을규

Issue Date: Aug-2025

Publisher: Korea Information Processing Society (한국정보처리학회)

Keywords: Ransomware; Feature Importance; Static Feature; Machine Learning

Citation: 정보처리학회 논문지, v.14, no.8, pp. 576-587

Pages: 12

Indexed: KCI

Journal Title: 정보처리학회 논문지 (The Transactions of the Korea Information Processing Society)

Volume: 14

Number: 8

Start Page: 576

End Page: 587

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211571

DOI: 10.3745/TKIPS.2025.14.8.576

ISSN: 3022-7011

Abstract: In this paper, we extract 54 static features from ransomware PE files, including header metadata, section sizes, and virtual memory sizes, and evaluate their importance using four algorithms: Gain Ratio, Information Gain, Gini Importance, and Mutual Information. For each algorithm, we build a reduced feature set from the top-ranked features at a given value of K, and use it to train and validate four classification models: Random Forest (RF), Decision Tree (DT), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). In the experiments, the RF model achieves the highest accuracy, 99.33%, using the 41 features selected by Gain Ratio at K = 0.01. The DT, SVM, and MLP models reach 98.67%, 96.67%, and 98.75%, respectively, so all four models exceed 96% accuracy. These results indicate that adjusting the number of features can reduce the resources needed for training while maintaining high detection accuracy.
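The threshold-based selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete feature values (continuous PE fields such as section sizes would first need to be binned), it treats K as a lower bound on the Gain Ratio score, and the feature names and toy data are hypothetical. Gain Ratio is computed here in the usual way, as Information Gain divided by the entropy of the feature itself.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature (split information)."""
    split_info = entropy(feature)
    if split_info == 0.0:  # a constant feature carries no information
        return 0.0
    return information_gain(feature, labels) / split_info


def select_features(columns, labels, threshold):
    """Return the names of features scoring at least `threshold`, best first."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: -scores[name])


# Hypothetical toy data: 1 = ransomware, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "f_noise": [0, 1, 0, 1, 0, 1, 0, 1],     # unrelated to the label
    "f_partial": [1, 1, 1, 0, 0, 0, 0, 0],   # mostly aligned with the label
    "f_constant": [0, 0, 0, 0, 0, 0, 0, 0],  # same value for every sample
    "f_perfect": [1, 1, 1, 1, 0, 0, 0, 0],   # identical to the label
}

selected = select_features(columns, labels, threshold=0.01)
print(selected)
```

On this toy data, only the two informative features pass the 0.01 threshold. The reduced column set would then be passed to a classifier such as a Random Forest, and the same selection step can be repeated with a different scoring function to compare importance algorithms.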

Related Researcher

Im, Eul Gyu: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
