7 articles found
Binary Code Similarity Detection: Retrospective Review and Future Directions
1
Authors: Shengjia Chang, Baojiang Cui, Shaocong Feng. Computers, Materials & Continua, 2025, No. 12, pp. 4345-4374 (30 pages)
Binary Code Similarity Detection (BCSD) is vital for vulnerability discovery, malware detection, and software security, especially when source code is unavailable. Yet it faces challenges from semantic loss, recompilation variations, and obfuscation. Recent advances in artificial intelligence, particularly natural language processing (NLP), graph representation learning (GRL), and large language models (LLMs), have markedly improved accuracy, enabling better recognition of code variants and deeper semantic understanding. This paper presents a comprehensive review of 82 studies published between 1975 and 2025, systematically tracing the historical evolution of BCSD and analyzing the progressive incorporation of artificial intelligence (AI) techniques. Particular emphasis is placed on the role of LLMs, which have recently emerged as transformative tools in advancing semantic representation and enhancing detection performance. The review is organized around five central research questions: (1) the chronological development and milestones of BCSD; (2) the construction of AI-driven technical roadmaps that chart methodological transitions; (3) the design and implementation of general analytical workflows for binary code analysis; (4) the applicability, strengths, and limitations of LLMs in capturing semantic and structural features of binary code; and (5) the persistent challenges and promising directions for future investigation. By synthesizing insights across these dimensions, the study demonstrates how LLMs reshape the landscape of binary code analysis, offering unprecedented opportunities to improve accuracy, scalability, and adaptability in real-world scenarios. This review not only bridges a critical gap in the existing literature but also provides a forward-looking perspective, serving as a valuable reference for researchers and practitioners aiming to advance AI-powered BCSD methodologies and applications.
Keywords: Binary code similarity detection; semantic code representation; graph-based modeling; representation learning; large language models
Gradient-Guided Assembly Instruction Relocation for Adversarial Attacks Against Binary Code Similarity Detection
2
Authors: Ran Wei, Hui Shu. Computers, Materials & Continua, 2026, No. 1, pp. 1372-1394 (23 pages)
Transformer-based models have significantly advanced binary code similarity detection (BCSD) by leveraging their semantic encoding capabilities for efficient function matching across diverse compilation settings. Although adversarial examples can strategically undermine the accuracy of BCSD models and protect critical code, existing techniques predominantly depend on inserting artificial instructions, which incur high computational costs and offer limited diversity of perturbations. To address these limitations, we propose AIMA, a novel gradient-guided assembly instruction relocation method. Our method decouples the detection model into tokenization, embedding, and encoding layers to enable efficient gradient computation. Since token IDs of instructions are discrete and nondifferentiable, we compute gradients in the continuous embedding space to evaluate the influence of each token. The most critical tokens are identified by calculating the L2 norm of their embedding gradients. We then establish a mapping between instructions and their corresponding tokens to aggregate token-level importance into instruction-level significance. To maximize adversarial impact, a sliding window algorithm selects the most influential contiguous segments for relocation, ensuring optimal perturbation with minimal length. This approach efficiently locates critical code regions without expensive search operations. The selected segments are relocated outside their original function boundaries via a jump mechanism, which preserves runtime control flow and functionality while introducing “deletion” effects in the static instruction sequence. Extensive experiments show that AIMA reduces similarity scores by up to 35.8% in state-of-the-art BCSD models. When incorporated into training data, it also enhances model robustness, achieving a 5.9% improvement in AUROC.
Keywords: Assembly instruction relocation; adversary attack; binary code similarity detection
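The selection step described in this abstract (L2 norms of token embedding gradients, aggregated per instruction, then a sliding window over contiguous instructions) can be illustrated with a minimal sketch. This is not the AIMA implementation: the function names are hypothetical, summing token scores per instruction and using a fixed window width are assumptions the abstract does not confirm, and the gradients are supplied as plain lists rather than computed from a model.

```python
import math
from typing import List, Sequence, Tuple


def token_importance(grads: Sequence[Sequence[float]]) -> List[float]:
    """Return the L2 norm of each token's embedding gradient."""
    return [math.sqrt(sum(g * g for g in vec)) for vec in grads]


def instruction_importance(
    token_scores: Sequence[float],
    token_to_instr: Sequence[int],
    num_instr: int,
) -> List[float]:
    """Aggregate token scores per instruction (assumption: by summation)."""
    scores = [0.0] * num_instr
    for score, instr in zip(token_scores, token_to_instr):
        scores[instr] += score
    return scores


def best_window(scores: Sequence[float], width: int) -> Tuple[int, float]:
    """Find the contiguous window of `width` instructions with the largest
    total score. Returns (start index, total score)."""
    if not 1 <= width <= len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    current = sum(scores[:width])
    best_start, best_sum = 0, current
    for start in range(1, len(scores) - width + 1):
        # Slide the window one step: add the new right element, drop the old left one.
        current += scores[start + width - 1] - scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum


# Hypothetical data: five tokens with 2-dimensional gradients, three instructions.
grads = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [6.0, 8.0]]
token_to_instr = [0, 0, 1, 2, 2]
instr_scores = instruction_importance(token_importance(grads), token_to_instr, 3)
start, total = best_window(instr_scores, width=2)
```

In this sketch the window is found in a single linear pass, which matches the abstract's point that critical regions can be located without an expensive search.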
Unleashing the power of pseudo-code for binary code similarity analysis
3
Authors: Weiwei Zhang, Zhengzi Xu, Yang Xiao, Yinxing Xue. Cybersecurity (EI, CSCD), 2023, No. 2, pp. 44-61 (18 pages)
Code similarity analysis has become more popular due to its significant applications, including vulnerability detection, malware detection, and patch analysis. Since the source code of software is difficult to obtain under most circumstances, binary-level code similarity analysis (BCSA) has received much attention. In recent years, many BCSA studies incorporating AI techniques have focused on deriving semantic information from binary functions with code representations such as assembly code, intermediate representations, and control flow graphs to measure similarity. However, due to the impact of different compilers, architectures, and obfuscations, binaries compiled from the same source code may vary considerably, which is the major obstacle preventing these works from obtaining robust features. In this paper, we propose a solution named UPPC (Unleashing the Power of Pseudo-code), which takes the pseudo-code of a binary function as input to address the binary code similarity analysis challenge, since pseudo-code has a higher level of abstraction and is platform-independent compared to binary instructions. UPPC selectively inlines functions to capture the full function semantics across different compiler optimization levels and uses a deep pyramidal convolutional neural network to obtain the semantic embedding of the function. We evaluated UPPC on a data set containing vulnerabilities and a data set including different architectures (X86, ARM), different optimization options (O0-O3), different compilers (GCC, Clang), and four obfuscation strategies. The experimental results show that the accuracy of UPPC in function search is 33.2% higher than that of existing methods.
Keywords: Binary code similarity; Machine learning; Software security; Pseudo-code
WGO: a similarly encoded whale-goshawk optimization algorithm for uncertain cloud manufacturing service composition
4
Authors: Kezhou Chen, Tao Wang, Huimin Zhuo, Lianglun Cheng. Autonomous Intelligent Systems, 2025, No. 1, pp. 302-314 (13 pages)
Service Composition and Optimization Selection (SCOS) is crucial in Cloud Manufacturing (CMfg), but the uncertainties in service states and working environments pose challenges for existing QoS-based methods. Recently, digital twins have gained prominence in CMfg due to their predictive capabilities, enhancing the reliability of service composition. Heuristic algorithms are widely used in this field for their flexibility and compatibility with uncertain environments. This paper proposes the Whale-Goshawk Optimization Algorithm (WGO), which combines the Whale Optimization Algorithm (WOA) and Northern Goshawk Optimization Algorithm (NGO). A novel similar integer coding method, incorporating spatial feature information, addresses the limitations of traditional integer coding, while a whale-optimized prey generation strategy improves NGO’s global optimization efficiency. Additionally, a local search method based on similar integer coding enhances WGO’s local search ability. Experimental results demonstrate the effectiveness of the proposed approach.
Keywords: Service composition; Similar integer coding; QoS; Cloud manufacturing; Uncertain environment
Eth2Vec: Learning contract-wide code representations for vulnerability detection on Ethereum smart contracts (Cited by: 1)
5
Authors: Nami Ashizawa, Naoto Yanai, Jason Paul Cruz, Shingo Okamura. Blockchain (Research and Applications), 2022, No. 4, pp. 109-122 (14 pages)
Ethereum smart contracts are computer programs that are deployed and executed on the Ethereum blockchain to enforce agreements among untrusting parties. Being the most prominent platform that supports smart contracts, Ethereum has been targeted by many attacks and plagued by security incidents. Consequently, many smart contract vulnerabilities have been discovered in the past decade. To detect and prevent such vulnerabilities, different security analysis tools, including static and dynamic analysis tools, have been created, but their performance decreases drastically when codes to be analyzed are constantly being rewritten. In this paper, we propose Eth2Vec, a machine-learning-based static analysis tool that detects smart contract vulnerabilities. Eth2Vec maintains its robustness against code rewrites; i.e., it can detect vulnerabilities even in rewritten codes. Other machine-learning-based static analysis tools require features, which analysts create manually, as inputs. In contrast, Eth2Vec uses a neural network for language processing to automatically learn the features of vulnerable contracts. In doing so, Eth2Vec can detect vulnerabilities in smart contracts by comparing the similarities between the codes of a target contract and those of the learned contracts. We performed experiments with existing open databases, such as Etherscan, and Eth2Vec was able to outperform a recent model based on support vector machine in terms of well-known metrics, i.e., precision, recall, and F1-score.
Keywords: Ethereum; Smart contracts; Blockchain; Neural networks; Static analysis; Code similarity; Vulnerability detection
Detection of semantically similar code (Cited by: 1)
6
Authors: Tiantian WANG, Kechao WANG, Xiaohong SU, Peijun MA. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 6, pp. 996-1011 (16 pages)
Traditional similar code detection approaches are limited in detecting semantically similar code, impeding their application in practice. In this paper, we improve the traditional metrics-based approach as well as the graph-based approach and present a combined metrics-based and graph-based approach. First, source code is represented as augmented system dependence graphs. Then, metrics-based candidate similar code extraction is performed to filter out most of the dissimilar code pairs so as to lower the computational complexity. After that, code normalization is performed on the candidate similar code to remove code variations so as to detect similar code at the semantic level. Finally, program matching is performed on the normalized control dependence trees to output semantically similar code. Experimental results show that our approach can detect similar code with code variations, and it can be applied to large software.
Keywords: similar code detection; system dependence graph; code normalization; semantically equivalent
Learning Human-Written Commit Messages to Document Code Changes
7
Authors: Yuan Huang, Nan Jia, Hao-Jie Zhou, Xiang-Ping Chen, Zi-Bin Zheng, Ming-Dong Tang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2020, No. 6, pp. 1258-1277 (20 pages)
Commit messages are important complementary information used in understanding code changes. To address message scarcity, some approaches have been proposed for automatically generating commit messages. However, most of these approaches focus on generating a summary of the changed software entities at a superficial level, without considering the intent behind the code changes (e.g., the existing approaches cannot generate a message such as "fixing 'null' pointer exception"). Considering that developers often describe the intent behind the code change when writing messages, we propose ChangeDoc, an approach that reuses existing messages in version control systems for automatic commit message generation. Our approach includes syntax, semantic, pre-syntax, and pre-semantic similarities. For a given commit without a message, it is able to discover the most similar past commit from a large commit repository and recommend that commit's message as the message of the given commit. Our repository contains half a million commits that were collected from SourceForge. We evaluate our approach on the commits from 10 projects. The results show that 21.5% of the messages recommended by ChangeDoc can be directly used without modification, and 62.8% require minor modifications. In order to evaluate the quality of the commit messages recommended by ChangeDoc, we performed two empirical studies involving a total of 40 participants (10 professional developers and 30 students). The results indicate that the recommended messages are very good approximations of the ones written by developers and often include important intent information that is not included in the messages generated by other tools.
Keywords: commit message recommendation; code syntax similarity; code semantic similarity; code change comprehension
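The retrieval idea in this abstract (find the most similar past commit and reuse its message) can be illustrated with a minimal sketch. This is not ChangeDoc: the abstract names syntax, semantic, pre-syntax, and pre-semantic similarities without defining them, so Jaccard similarity over identifier tokens is swapped in here as a plainly simpler stand-in, and all function names and sample data are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokens(diff: str) -> Set[str]:
    """Extract the set of identifier-like tokens from a code change."""
    return set(re.findall(r"[A-Za-z_]\w*", diff))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_message(
    new_diff: str, history: List[Tuple[str, str]]
) -> Tuple[str, float]:
    """Return the message of the most similar past commit and its score.

    `history` is a list of (diff, message) pairs from past commits.
    """
    if not history:
        raise ValueError("history must contain at least one past commit")
    query = tokens(new_diff)
    scored = ((msg, jaccard(query, tokens(diff))) for diff, msg in history)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical commit history and a new commit that lacks a message.
history = [
    ("if (user == null) return;", "fix null pointer exception"),
    ("int count = items.size();", "add item counter"),
]
message, score = recommend_message("if (order == null) return;", history)
```

A linear scan like this would be slow on a repository of half a million commits; an index or a cheap pre-filter would be a natural next step, though the abstract does not say how ChangeDoc handles scale.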