期刊文献+
共找到6篇文章
< 1 >
每页显示 20 50 100
HCRVD: A Vulnerability Detection System Based on CST-PDG Hierarchical Code Representation Learning 被引量:1
1
作者 Zhihui Song Jinchen Xu +1 位作者 Kewei Li Zheng Shan 《Computers, Materials & Continua》 SCIE EI 2024年第6期4573-4601,共29页
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representation... Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations.However,due to limitations in code representation and neural network design,the validity and practicality of the model still need to be improved.Additionally,due to differences in programming languages,most methods lack cross-language detection generality.To address these issues,in this paper,we analyze the shortcomings of previous code representations and neural networks.We propose a novel hierarchical code representation that combines Concrete Syntax Trees(CST)with Program Dependence Graphs(PDG).Furthermore,we introduce a Tree-Graph-Gated-Attention(TGGA)network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection(HCRVD)system.This system enables cross-language vulnerability detection at the function-level.The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities.It benefits from the hierarchical code representation learning method,and outperforms baseline in cross-language vulnerability detection by 9.772%and 11.819%in the C/C++and Java datasets,respectively.Moreover,HCRVD has certain ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects.HCRVD shows good validity,generality and practicality. 展开更多
关键词 Vulnerability detection deep learning CST-PDG code representation tree-graph-gated-attention network CROSS-LANGUAGE
在线阅读 下载PDF
C-CORE:Clustering by Code Representation to Prioritize Test Cases in Compiler Testing
2
作者 Wei Zhou Xincong Jiang Chuan Qin 《Computer Modeling in Engineering & Sciences》 SCIE EI 2024年第5期2069-2093,共25页
Edge devices,due to their limited computational and storage resources,often require the use of compilers for program optimization.Therefore,ensuring the security and reliability of these compilers is of paramount impo... Edge devices,due to their limited computational and storage resources,often require the use of compilers for program optimization.Therefore,ensuring the security and reliability of these compilers is of paramount importance in the emerging field of edge AI.One widely used testing method for this purpose is fuzz testing,which detects bugs by inputting random test cases into the target program.However,this process consumes significant time and resources.To improve the efficiency of compiler fuzz testing,it is common practice to utilize test case prioritization techniques.Some researchers use machine learning to predict the code coverage of test cases,aiming to maximize the test capability for the target compiler by increasing the overall predicted coverage of the test cases.Nevertheless,these methods can only forecast the code coverage of the compiler at a specific optimization level,potentially missing many optimization-related bugs.In this paper,we introduce C-CORE(short for Clustering by Code Representation),the first framework to prioritize test cases according to their code representations,which are derived directly from the source codes.This approach avoids being limited to specific compiler states and extends to a broader range of compiler bugs.Specifically,we first train a scaled pre-trained programming language model to capture as many common features as possible from the test cases generated by a fuzzer.Using this pre-trained model,we then train two downstream models:one for predicting the likelihood of triggering a bug and another for identifying code representations associated with bugs.Subsequently,we cluster the test cases according to their code representations and select the highest-scoring test case from each cluster as the high-quality test case.This reduction in redundant testing cases leads to time savings.Comprehensive evaluation results reveal that code representations are better at distinguishing test capabilities,and C-CORE significantly enhances testing efficiency.Across four datasets,C-CORE increases the average of the percentage of faults detected(APFD)value by 0.16 to 0.31 and reduces test time by over 50% in 46% of cases.When compared to the best results from approaches using predicted code coverage,C-CORE improves the APFD value by 1.1% to 12.3% and achieves an overall time-saving of 159.1%. 展开更多
关键词 Compiler testing test case prioritization code representation
在线阅读 下载PDF
A survey of binary code representation technology
3
作者 Taiyan WANG Qingsong XIE +2 位作者 Lu YU Zulie PAN Min ZHANG 《Frontiers of Information Technology & Electronic Engineering》 2025年第5期671-694,共24页
Binary analysis, as an important foundational technology, provides support for numerous applications in the fields of software engineering and security research. With the continuous expansion of software scale and the... Binary analysis, as an important foundational technology, provides support for numerous applications in the fields of software engineering and security research. With the continuous expansion of software scale and the complex evolution of software architecture, binary analysis technology is facing new challenges. To break through existing bottlenecks, researchers have applied artificial intelligence (AI) technology to the understanding and analysis of binary code. The core lies in characterizing binary code, i.e., how to use intelligent methods to generate representation vectors containing semantic information for binary code, and apply them to multiple downstream tasks of binary analysis. In this paper, we provide a comprehensive survey of recent advances in binary code representation technology, and introduce the workflow of existing research in two parts, i.e., binary code feature selection methods and binary code feature embedding methods. The feature selection section includes mainly two parts: definition and classification of features, and feature construction. First, the abstract definition and classification of features are systematically explained, and second, the process of constructing specific representations of features is introduced in detail. In the feature embedding section, based on the different intelligent semantic understanding models used, the embedding methods are classified into four categories based on the usage of text-embedding models and graph-embedding models. Finally, we summarize the overall development of existing research and provide prospects for some potential research directions related to binary code representation technology. 展开更多
关键词 Binary analysis Binary code representation Binary code feature selection Binary code feature embedding
原文传递
Bug localization based on syntactical and semantic information of source code
4
作者 YAN Xuefeng CHENG Shasha GUO Liqin 《Journal of Systems Engineering and Electronics》 SCIE EI CSCD 2023年第1期236-246,共11页
The existing software bug localization models treat the source file as natural language, which leads to the loss of syntactical and structure information of the source file. A bug localization model based on syntactic... The existing software bug localization models treat the source file as natural language, which leads to the loss of syntactical and structure information of the source file. A bug localization model based on syntactical and semantic information of source code is proposed. Firstly, abstract syntax tree(AST) is divided based on node category to obtain statement sequence. The statement tree is encoded into vectors to capture lexical and syntactical knowledge at the statement level.Secondly, the source code is transformed into vector representation by the sequence naturalness of the statement. Therefore,the problem of gradient vanishing and explosion caused by a large AST size is obviated when using AST to the represent source code. Finally, the correlation between bug reports and source files are comprehensively analyzed from three aspects of syntax, semantics and text to locate the buggy code. Experiments show that compared with other standard models, the proposed model improves the performance of bug localization, and it has good advantages in mean reciprocal rank(MRR), mean average precision(MAP) and Top N Rank. 展开更多
关键词 bug report abstract syntax tree code representation software bug localization
在线阅读 下载PDF
A Depth Video Coding In-Loop Median Filter Based on Joint Weighted Sparse Representation
5
作者 Lü Haitao YIN Cao +1 位作者 CUI Zongmin HU Jinhui 《Wuhan University Journal of Natural Sciences》 CAS CSCD 2016年第4期351-357,共7页
The existing depth video coding algorithms are generally based on in-loop depth filters, whose performance are unstable and easily affected by the outliers. In this paper, we design a joint weighted sparse representat... The existing depth video coding algorithms are generally based on in-loop depth filters, whose performance are unstable and easily affected by the outliers. In this paper, we design a joint weighted sparse representation-based median filter as the in-loop filter in depth video codec. It constructs depth candidate set which contains relevant neighboring depth pixel based on depth and intensity similarity weighted sparse coding, then the median operation is performed on this set to select a neighboring depth pixel as the result of the filtering. The experimental results indicate that the depth bitrate is reduced by about 9% compared with anchor method. It is confirmed that the proposed method is more effective in reducing the required depth bitrates for a given synthesis quality level. 展开更多
关键词 depth video coding virtual view synthesis joint weighted sparse representation
原文传递
Binary Code Similarity Detection:Retrospective Review and Future Directions
6
作者 Shengjia Chang Baojiang Cui Shaocong Feng 《Computers, Materials & Continua》 2025年第12期4345-4374,共30页
Binary Code Similarity Detection(BCSD)is vital for vulnerability discovery,malware detection,and software security,especially when source code is unavailable.Yet,it faces challenges from semantic loss,recompilation va... Binary Code Similarity Detection(BCSD)is vital for vulnerability discovery,malware detection,and software security,especially when source code is unavailable.Yet,it faces challenges from semantic loss,recompilation variations,and obfuscation.Recent advances in artificial intelligence—particularly natural language processing(NLP),graph representation learning(GRL),and large language models(LLMs)—have markedly improved accuracy,enabling better recognition of code variants and deeper semantic understanding.This paper presents a comprehensive review of 82 studies published between 1975 and 2025,systematically tracing the historical evolution of BCSD and analyzing the progressive incorporation of artificial intelligence(AI)techniques.Particular emphasis is placed on the role of LLMs,which have recently emerged as transformative tools in advancing semantic representation and enhancing detection performance.The review is organized around five central research questions:(1)the chronological development and milestones of BCSD;(2)the construction of AI-driven technical roadmaps that chart methodological transitions;(3)the design and implementation of general analytical workflows for binary code analysis;(4)the applicability,strengths,and limitations of LLMs in capturing semantic and structural features of binary code;and(5)the persistent challenges and promising directions for future investigation.By synthesizing insights across these dimensions,the study demonstrates how LLMs reshape the landscape of binary code analysis,offering unprecedented opportunities to improve accuracy,scalability,and adaptability in real-world scenarios.This review not only bridges a critical gap in the existing literature but also provides a forward-looking perspective,serving as a valuable reference for researchers and practitioners aiming to advance AI-powered BCSD methodologies and applications. 展开更多
关键词 Binary code similarity detection semantic code representation graph-based modeling representation learning large language models
在线阅读 下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部