Detailed Information

Efficient Features for Function Matching in Multi-Architecture Binary Executables (open access)

Authors
Ullah, Sami; Jin, Wenhui; Oh, Heekuck
Issue Date
Aug-2021
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Feature extraction; Semantics; Optimization; Tools; Syntactics; Malware; Computer architecture; Binary diffing; efficient features; function matching; multi-architecture
Citation
IEEE Access, v.9, pp 104950 - 104968
Pages
19
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
9
Start Page
104950
End Page
104968
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/116254
DOI
10.1109/ACCESS.2021.3099429
ISSN
2169-3536
Abstract
Binary-to-binary function matching underpins many reverse engineering techniques, such as binary diffing, malware analysis, and code plagiarism detection. In the literature, function matching is performed by first extracting syntactic and semantic function features and then using these features as selection criteria to establish an approximate 1:1 correspondence between binary functions. The accuracy of the approximation depends on the selection of efficient features. Although substantial research has been conducted on this topic, we have identified two major drawbacks in previous work: (i) the features are optimized for a single architecture only, and their matching efficiency drops on other architectures; (ii) function matching algorithms focus mainly on the structural properties of a function, which are not inherently resilient to compiler optimizations. To address architecture dependency and compiler optimizations, we use the intermediate representation (IR) of the function assembly and propose a set of syntactic and semantic (embedding-based) features that are efficient across multiple architectures and sensitive to compiler-based optimizations. The proposed function matching algorithm employs one-shot encoding that is flexible to small changes and uses a KNN-based approach to effectively map similar functions. We evaluated the proposed features and algorithms on various binaries compiled for the x86 and ARM architectures, and compared the prototype implementation with Diaphora (an industry-standard tool) and other baseline research. The prototype achieved a matching accuracy of approximately 96%, which is higher than that of the compared tools and consistent across optimizations and multi-architecture binaries.
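The matching step described in the abstract (feature vectors per function, then a nearest-neighbour search to form an approximate 1:1 correspondence) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, function names, cosine similarity measure, `threshold` value, and greedy tie-breaking below are all assumptions chosen for illustration, and the record does not specify the paper's actual features or encoding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_functions(src, dst, k=1, threshold=0.9):
    """Greedy approximate 1:1 matching using each function's k nearest neighbours.

    src and dst map function names to feature vectors (hypothetical features,
    e.g. counts of IR operation categories).
    """
    candidates = []
    for s_name, s_vec in src.items():
        # Rank all target functions by similarity and keep the k best.
        scored = sorted(
            ((cosine(s_vec, d_vec), d_name) for d_name, d_vec in dst.items()),
            reverse=True,
        )
        for score, d_name in scored[:k]:
            if score >= threshold:
                candidates.append((score, s_name, d_name))

    # Accept the most similar pairs first; each function is used at most once.
    candidates.sort(reverse=True)
    matches, used_src, used_dst = {}, set(), set()
    for score, s_name, d_name in candidates:
        if s_name not in used_src and d_name not in used_dst:
            matches[s_name] = d_name
            used_src.add(s_name)
            used_dst.add(d_name)
    return matches


# Hypothetical feature vectors for the same program built for two architectures.
x86_funcs = {"sub_401000": [12, 3, 5, 1], "sub_401080": [2, 9, 0, 4]}
arm_funcs = {
    "sub_8000": [11, 3, 6, 1],
    "sub_80a0": [2, 8, 0, 5],
    "sub_8100": [0, 0, 7, 7],
}

matches = match_functions(x86_funcs, arm_funcs, k=2)
print(matches)
```

In this toy input, each x86 function pairs with the ARM function whose vector is closest, and `sub_8100` stays unmatched because no source function is similar enough to it. A real tool would derive the vectors from the IR and would likely use an indexed nearest-neighbour search instead of the exhaustive comparison shown here.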
Appears in Collections
COLLEGE OF COMPUTING > ERICA Division of Computer Science > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Hee kuck
ERICA College of Software Convergence (ERICA Division of Computer Science)