BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ullah, Sami | - |
dc.contributor.author | Oh, Heekuck | - |
dc.date.accessioned | 2023-05-03T09:48:15Z | - |
dc.date.available | 2023-05-03T09:48:15Z | - |
dc.date.issued | 2022-09 | - |
dc.identifier.issn | 0098-5589 | - |
dc.identifier.issn | 1939-3520 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/112784 | - |
dc.description.abstract | Binary diffing is the process of discovering the differences and similarities in functionality between two binary programs. Previous research treats binary diffing as a function matching problem: it first formulates an initial 1:1 mapping between functions, and then computes a sequence matching ratio to classify two functions as an exact match, a partial match, or no match. Existing techniques are most accurate when detecting exact matches and are not effective at detecting partially changed functions, especially those with minor patches. These drawbacks stem from two major challenges: (i) in the 1:1 mapping phase, a strict policy is used to match function features; (ii) in the classification phase, an assembly snippet is treated as ordinary text and compared using sequence matching. An instruction has a unique structure, i.e., mnemonics and registers occupy specific positions in an instruction and are semantically related, which makes assembly code different from general text. Sequence matching performs best on general text but fails to detect structural and semantic changes at the instruction level; thus, its use for classification produces many false results. In this research, we address these challenges with a two-fold solution. For the 1:1 mapping phase, we propose computationally inexpensive features, which are compared using distance-based selection criteria to map similar functions and filter out unmatched functions. For the classification phase, we propose a Siamese binary-classification neural network in which each branch is an attention-based distributed learning embedding neural network that learns the semantic similarity among assembly instructions and learns to highlight changes at the instruction level; a final fully connected layer learns to accurately classify two 1:1 mapped functions as either an exact or a partial match. We used x86 kernel binaries for training and achieved approximately 99% classification accuracy, which is higher than existing binary diffing techniques and tools. | - |
dc.format.extent | 25 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.title | BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/TSE.2021.3093926 | - |
dc.identifier.scopusid | 2-s2.0-85111022125 | - |
dc.identifier.wosid | 000854591500014 | - |
dc.identifier.bibliographicCitation | IEEE Transactions on Software Engineering, v.48, no.9, pp 3442 - 3466 | - |
dc.citation.title | IEEE Transactions on Software Engineering | - |
dc.citation.volume | 48 | - |
dc.citation.number | 9 | - |
dc.citation.startPage | 3442 | - |
dc.citation.endPage | 3466 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Asm2Vec | - |
dc.subject.keywordAuthor | attention network | - |
dc.subject.keywordAuthor | binary diffing | - |
dc.subject.keywordAuthor | exact match | - |
dc.subject.keywordAuthor | Inst2vec | - |
dc.subject.keywordAuthor | partial match | - |
dc.subject.keywordAuthor | siamese neural network | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9470904 | - |
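
The abstract describes a baseline in which a sequence matching ratio classifies two mapped functions as an exact match, a partial match, or no match. The following is a minimal sketch of that idea, not the method of the paper or of any specific tool. It assumes that functions are represented as lists of instruction strings and that a hypothetical threshold `no_match_below` of 0.5 separates partial matches from non-matches; the function names and the sample instructions are illustrative only.

```python
from difflib import SequenceMatcher


def match_ratio(func_a, func_b):
    """Sequence matching ratio between two instruction lists (0.0 to 1.0)."""
    # autojunk=False disables difflib's popularity heuristic so every
    # instruction is treated as a comparable element.
    return SequenceMatcher(None, func_a, func_b, autojunk=False).ratio()


def classify(func_a, func_b, no_match_below=0.5):
    """Label a function pair using only the textual matching ratio.

    no_match_below is a hypothetical cutoff chosen for this sketch.
    """
    ratio = match_ratio(func_a, func_b)
    if ratio == 1.0:
        return "exact match"
    if ratio >= no_match_below:
        return "partial match"
    return "no-match"


# Illustrative x86 snippets (assumed, not taken from the paper's data set).
original = ["push ebp", "mov ebp, esp", "mov eax, [ebp+8]",
            "add eax, 1", "pop ebp", "ret"]

# Same behavior, different register allocation.
renamed = ["push ebp", "mov ebp, esp", "mov ecx, [ebp+8]",
           "add ecx, 1", "mov eax, ecx", "pop ebp", "ret"]

# Different behavior: a one-instruction patch (add becomes sub).
patched = ["push ebp", "mov ebp, esp", "mov eax, [ebp+8]",
           "sub eax, 1", "pop ebp", "ret"]

if __name__ == "__main__":
    for name, other in [("original", original),
                        ("renamed", renamed),
                        ("patched", patched)]:
        print(name, round(match_ratio(original, other), 3),
              classify(original, other))
```

In this sketch the register-renamed variant and the patched variant both receive the label "partial match", and the behavior-changing patch scores closer to the original than the behavior-preserving variant does. This illustrates the limitation the abstract attributes to treating assembly as ordinary text: the ratio reflects textual overlap rather than instruction semantics.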