BinDiff_NN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences

Authors
Ullah, Sami; Oh, Heekuck
Issue Date
Sep-2022
Publisher
Institute of Electrical and Electronics Engineers
Keywords
Asm2Vec; attention network; binary diffing; exact match; Inst2vec; partial match; siamese neural network
Citation
IEEE Transactions on Software Engineering, v.48, no.9, pp. 3442-3466
Pages
25
Indexed
SCIE
SCOPUS
Journal Title
IEEE Transactions on Software Engineering
Volume
48
Number
9
Start Page
3442
End Page
3466
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/112784
DOI
10.1109/TSE.2021.3093926
ISSN
0098-5589 (print)
1939-3520 (electronic)
Abstract
Binary diffing is the process of discovering the differences and similarities in functionality between two binary programs. Previous research approaches binary diffing as a function matching problem: it first formulates an initial 1:1 mapping between functions and then computes a sequence matching ratio to classify each pair of functions as an exact match, a partial match, or no match. Existing techniques are accurate mainly at detecting exact matches and are much less effective at detecting partially changed functions, especially those with minor patches. These drawbacks stem from two major challenges: (i) in the 1:1 mapping phase, a strict policy is used to match function features; and (ii) in the classification phase, an assembly snippet is treated as ordinary text and compared using sequence matching. An instruction has a distinctive structure: mnemonics and registers occupy specific positions within it and are semantically related, which makes assembly code different from general text. Sequence matching works well for general text, but it fails to detect structural and semantic changes at the instruction level; consequently, using it for classification produces many false results. In this research, we address these underlying challenges with a two-fold solution. For the 1:1 mapping phase, we propose computationally inexpensive features, which are compared using distance-based selection criteria to map similar functions and filter out unmatched functions. For the classification phase, we propose a Siamese binary-classification neural network in which each branch is an attention-based embedding network that learns distributed representations of assembly: it learns the semantic similarity among assembly instructions and learns to highlight changes at the instruction level, and a final fully connected layer learns to accurately classify two 1:1-mapped functions as either an exact or a partial match.
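The abstract does not list the paper's mapping features or its exact selection criteria, so the following Python sketch only illustrates the general idea of a distance-based 1:1 mapping under stated assumptions. Each function is summarized by a hypothetical feature tuple (instruction count, basic-block count, call count); all cross-version pairs are ranked by Euclidean distance and accepted greedily so that each function is used at most once, and pairs farther apart than a hypothetical threshold `max_dist` are left unmatched. The feature choice, the greedy strategy, the function names, and the threshold are illustrative stand-ins, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-function features: (instruction count, basic-block count, call count).
Features = Tuple[int, int, int]


def map_functions(
    old: Dict[str, Features], new: Dict[str, Features], max_dist: float
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Greedy distance-based 1:1 mapping (illustrative, not the paper's criteria)."""
    # Rank every cross-version pair by Euclidean distance between feature tuples.
    candidates = sorted(
        (math.dist(fa, fb), name_a, name_b)
        for name_a, fa in old.items()
        for name_b, fb in new.items()
    )
    mapping: Dict[str, str] = {}
    used_new = set()
    for dist, name_a, name_b in candidates:
        if dist > max_dist:
            break  # pairs are sorted, so every remaining pair is too far apart
        if name_a in mapping or name_b in used_new:
            continue  # keep the mapping 1:1
        mapping[name_a] = name_b
        used_new.add(name_b)
    # Functions with no close-enough partner are filtered out as unmatched.
    unmatched_old = [n for n in old if n not in mapping]
    unmatched_new = [n for n in new if n not in used_new]
    return mapping, unmatched_old, unmatched_new


# Hypothetical function names and feature values.
old_funcs = {"f_init": (40, 5, 3), "f_parse": (120, 18, 7), "f_free": (12, 2, 1)}
new_funcs = {"sub_1000": (41, 5, 3), "sub_2000": (118, 17, 7), "sub_3000": (300, 40, 20)}

mapping, only_old, only_new = map_functions(old_funcs, new_funcs, max_dist=5.0)
print(mapping)   # {'f_init': 'sub_1000', 'f_parse': 'sub_2000'}
print(only_old)  # ['f_free']
print(only_new)  # ['sub_3000']
```

Ranking all pairs globally, rather than letting each old function claim its own nearest neighbour, keeps the mapping 1:1 when two old functions share the same nearest candidate; the threshold plays the role of filtering out unmatched functions.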
We used x86 kernel binaries for training and achieved approximately 99% classification accuracy, which is higher than that of existing binary diffing techniques and tools.
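To illustrate why treating assembly as plain text can mislead a classifier, the sketch below computes a sequence matching ratio with Python's `difflib.SequenceMatcher`, comparing each instruction line as an opaque token. This is an illustrative baseline, not the exact procedure of any specific diffing tool, and the thresholds in the hypothetical `classify` helper are assumptions. The `renamed` variant computes the same result as `original` using a different register, while `patched` replaces `add` with `sub` and therefore behaves differently.

```python
from difflib import SequenceMatcher
from typing import List


def seq_match_ratio(func_a: List[str], func_b: List[str]) -> float:
    """Similarity in [0, 1]; each instruction string is compared as an opaque token."""
    return SequenceMatcher(None, func_a, func_b).ratio()


def classify(ratio: float, partial_threshold: float = 0.5) -> str:
    # Hypothetical thresholds, chosen only for illustration.
    if ratio == 1.0:
        return "exact"
    if ratio >= partial_threshold:
        return "partial"
    return "no-match"


original = ["mov eax, edi", "add eax, esi", "ret"]
# Same computation (returns edi + esi) with a different register allocation.
renamed = ["mov edx, edi", "add edx, esi", "mov eax, edx", "ret"]
# Minor patch that changes behaviour: add -> sub.
patched = ["mov eax, edi", "sub eax, esi", "ret"]

for label, candidate in [("renamed", renamed), ("patched", patched)]:
    r = seq_match_ratio(original, candidate)
    print(label, round(r, 3), classify(r))
# renamed 0.286 no-match
# patched 0.667 partial
```

The semantically equivalent variant scores lower than the semantically changed one, which is the kind of false result the abstract attributes to sequence matching.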

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Oh, Hee kuck
ERICA College of Computing (ERICA School of Computer Science)
