Empirical feature learning in application-based samples: A case study (open access)
- Authors
- Nguyen-Vu, Long; Jung, Souhwan
- Issue Date
- Oct-2022
- Publisher
- ELSEVIER
- Keywords
- Feature learning; Malware detection; Model selection; Neural network; Mobile security
- Citation
- JOURNAL OF COMPUTATIONAL SCIENCE, v.64
- Journal Title
- JOURNAL OF COMPUTATIONAL SCIENCE
- Volume
- 64
- URI
- http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/43336
- DOI
- 10.1016/j.jocs.2022.101839
- ISSN
- 1877-7503
- Abstract
- In machine learning, feature selection is an intrinsic factor in the success of a model. With the rise of deep learning in recent years, this process is sometimes overlooked, because neural networks enable automatic feature extraction: the ability to select relevant features from the feature space without manual intervention. While this is a powerful technique when the data samples are images, it is not straightforward for application-based samples. In this work, we explore the use of machine learning in Android application classification. We start with a research question: "Given a classification problem on the same dataset, why do several proposed models achieve considerably good performance even though they use different training features?" We hypothesize two reasons for this phenomenon: (1) the models overfit the dataset in question, and (2) the features are non-i.i.d. (not independent and identically distributed). We confirm these cases by reviewing previous studies on the same datasets. By analyzing the mapping between components in Android, we conclude that the strong correlations among them allow machine learning models to learn efficiently from just a subset of those features. Experiments show that a feedforward DNN or CNN, when provided with sufficient features, can generalize as well as more complicated architectures. The findings show that it is possible to train application classifiers on fewer features and simpler network architectures with negligible performance degradation.
- Appears in Collections
- ETC > 1. Journal Articles
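The abstract's claim that strongly correlated features let a model learn from only a subset can be illustrated with a toy sketch. Nothing below comes from the paper: the three feature names (`permission`, `api_call`, `intent`), the noise rates, the sample sizes, and the plain logistic regression are all assumptions chosen for illustration, and the data is synthetic rather than drawn from any Android dataset.

```python
import math
import random


def make_samples(n, seed):
    """Synthetic (hypothetical) app samples with three correlated binary features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # toy label, e.g. 1 = malicious
        # Assumed noise rates: the first feature tracks the label,
        # the other two track the first feature (hence the correlation).
        permission = y if rng.random() > 0.10 else 1 - y
        api_call = permission if rng.random() > 0.05 else 1 - permission
        intent = permission if rng.random() > 0.05 else 1 - permission
        rows.append([permission, api_call, intent])
        labels.append(y)
    return rows, labels


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def train_logreg(rows, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with full-batch gradient descent."""
    dim, n = len(rows[0]), len(rows)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(model, rows, labels):
    """Fraction of samples whose predicted class matches the label."""
    w, b = model
    hits = 0
    for x, y in zip(rows, labels):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(rows)


train_x, train_y = make_samples(1000, seed=0)
test_x, test_y = make_samples(1000, seed=1)

# Model trained on all three features vs. one trained on the first feature only.
full_model = train_logreg(train_x, train_y)
subset_model = train_logreg([r[:1] for r in train_x], train_y)

acc_full = accuracy(full_model, test_x, test_y)
acc_subset = accuracy(subset_model, [r[:1] for r in test_x], test_y)
corr = pearson([r[0] for r in train_x], [r[1] for r in train_x])

print(f"correlation(permission, api_call) = {corr:.2f}")
print(f"accuracy, all features  = {acc_full:.3f}")
print(f"accuracy, one feature   = {acc_subset:.3f}")
```

In this construction the second and third features depend on the label only through the first one, so they add little information beyond it, and the two accuracies should come out close. Real application features are far less tidy, so this shows the mechanism only and does not reproduce the paper's experiments.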