Smart contracts are widely used on blockchains to implement complex transactions, such as decentralized applications on Ethereum. Effective vulnerability detection for large-scale smart contracts is critical, as attacks on smart contracts often cause huge economic losses. Since smart contracts are difficult to repair and update, vulnerabilities need to be found before deployment. However, code analysis, which requires path traversal, and learning methods, which require training on many features, are too time-consuming for detecting vulnerabilities in large-scale on-chain contracts. Unlike code analysis methods such as symbolic execution, learning-based methods obtain detection models from a feature space. However, existing features offer little interpretability of the detection results and the trained model, and the large feature space also reduces detection efficiency. This paper focuses on improving detection efficiency by reducing the dimension of the features with the help of expert knowledge. A feature extraction model, Block-gram, is proposed to form low-dimensional knowledge-based features from bytecode. First, the metadata is separated and the runtime code is converted into a sequence of opcodes, which is divided into segments based on certain instructions (jumps, etc.). Then, scalable Block-gram features, including 4-dimensional block features and 8-dimensional attribute features, are mined for training the learning-based model. Finally, feature contributions are calculated from SHAP values to measure the relationship between the features and the results of the detection model. In addition, six types of vulnerability labels are made on a dataset containing 33,885 contracts, and these knowledge-based features are evaluated using seven state-of-the-art learning algorithms. The results show that average detection is 25× to 650× faster than with features extracted by N-gram, and that the features also enhance the interpretability of the detection model.
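The opcode segmentation step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact boundary rules, so the set of block-ending opcodes, the function names `disassemble` and `split_blocks`, and the tiny opcode table are all assumptions made for illustration; metadata separation is omitted.

```python
from typing import List

# Illustrative subset of EVM opcodes; a real disassembler needs the full table.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x14: "EQ", 0x35: "CALLDATALOAD",
    0x50: "POP", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST",
    0xF3: "RETURN", 0xFD: "REVERT",
}

# Assumption: a segment ends at a jump or a halting instruction.
BLOCK_END = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT"}


def disassemble(runtime_hex: str) -> List[str]:
    """Convert runtime bytecode (hex string) into a list of opcode names."""
    if runtime_hex.startswith("0x"):
        runtime_hex = runtime_hex[2:]
    code = bytes.fromhex(runtime_hex)
    ops, i = [], 0
    while i < len(code):
        byte = code[i]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            width = byte - 0x5F
            ops.append("PUSH%d" % width)
            i += 1 + width
        else:
            ops.append(OPCODES.get(byte, "UNKNOWN_%02x" % byte))
            i += 1
    return ops


def split_blocks(ops: List[str]) -> List[List[str]]:
    """Split an opcode sequence into segments at jump-related instructions."""
    blocks, current = [], []
    for op in ops:
        # A JUMPDEST starts a new segment if one is already open.
        if op == "JUMPDEST" and current:
            blocks.append(current)
            current = []
        current.append(op)
        if op in BLOCK_END:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Hand-written example bytecode, not taken from any real contract.
example_ops = disassemble("6001600a57005b60005500")
example_blocks = split_blocks(example_ops)
```

Per-block statistics (for example, block length or counts of storage and call opcodes) could then be aggregated into a fixed-size vector; which statistics make up the 4 block and 8 attribute dimensions is not stated in the abstract.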
RESTful APIs have been adopted as the standard way of developing web services, allowing for smooth communication between clients and servers. Their simplicity, scalability, and compatibility have made them crucial to modern web environments. However, the increased adoption of RESTful APIs has simultaneously exposed these interfaces to significant security threats that jeopardize the availability, confidentiality, and integrity of web services. This survey focuses exclusively on RESTful APIs, providing an in-depth perspective distinct from studies addressing other API types such as GraphQL or SOAP. We highlight concrete threats, such as injection attacks and insecure direct object references (IDOR), to illustrate the evolving risk landscape. Our work systematically reviews state-of-the-art detection methods, including static code analysis and penetration testing, and proposes a novel taxonomy that categorizes vulnerabilities such as authentication and authorization issues. Unlike existing taxonomies focused on general web or network-level threats, our taxonomy emphasizes API-specific design flaws and operational dependencies, offering a more granular and actionable framework for RESTful API security. By critically assessing current detection methodologies and identifying key research gaps, we offer a structured framework that advances the understanding and mitigation of RESTful API vulnerabilities. Ultimately, this work aims to drive significant advancements in API security, thereby enhancing the resilience of web services against evolving cyber threats.
Source code vulnerabilities present significant security threats, necessitating effective detection techniques. Traditional static analysis tools are built on rigid rule sets and pattern matching, which drown developers in false positives and miss context-sensitive vulnerabilities. Artificial intelligence (AI) approaches, in particular Large Language Models (LLMs) like BERT, show promise but frequently lack transparency. To overcome these issues with model interpretability, this work proposes a BERT-based LLM strategy for vulnerability detection that incorporates Explainable AI (XAI) methods such as SHAP and attention heatmaps. Furthermore, to ensure auditable and comprehensible decisions, we present a transparency obligation structure that covers the whole LLM lifecycle. Our experiments on the extensive DiverseVul source code dataset show that the proposed method attains 92.3% detection accuracy, surpassing CodeT5 (89.4%), GPT-3.5 (85.1%), and GPT-4 (88.7%) under the same evaluation scenario. Through integrated SHAP analysis, the method improves detection capability while preserving explainability, which is a crucial advantage over black-box LLM alternatives in security contexts. The XAI analysis identifies crucial predictive tokens such as "susceptible" and "function" through the SHAP framework. Furthermore, the local token interactions that support the model's decision-making process are graphically highlighted via attention heatmaps. This method provides a workable solution for reliable vulnerability identification in software systems by effectively combining high detection accuracy with model explainability. Our findings imply that transparent AI models are capable of successfully detecting security flaws while preserving interpretability for human analysts.
Smart contracts are self-executing programs on blockchains that manage complex business logic with transparency and integrity. However, their immutability after deployment makes programming errors particularly critical, as such errors can be exploited to compromise blockchain security. Existing vulnerability detection methods often rely on fixed rules or target specific vulnerabilities, limiting their scalability and adaptability to diverse smart contract scenarios. Furthermore, natural language processing approaches for source code analysis frequently fail to capture program flow, which is essential for identifying structural vulnerabilities. To address these limitations, we propose a novel model that integrates textual and structural information for smart contract vulnerability detection. Our approach employs the CodeBERT NLP model for textual analysis, augmented with structural insights derived from control flow graphs created using the abstract syntax tree and opcode of smart contracts. Each graph node is embedded using Sent2Vec, and centrality analysis is applied to highlight critical paths and nodes within the code. The extracted features are normalized and combined into a prompt for a large language model to detect vulnerabilities effectively. Experimental results demonstrate the superiority of our model, achieving an accuracy of 86.70%, a recall of 84.87%, a precision of 85.24%, and an F1-score of 84.46%. These outcomes surpass existing methods, including CodeBERT alone (accuracy: 81.26%, F1-score: 79.84%) and CodeBERT combined with abstract syntax tree analysis (accuracy: 83.48%, F1-score: 79.65%). The findings underscore the effectiveness of incorporating graph structural information alongside text-based analysis, offering improved scalability and performance in detecting diverse vulnerabilities.
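As a rough illustration of the centrality analysis mentioned in this abstract, the sketch below ranks the nodes of a small control flow graph by normalized degree centrality. The abstract does not say which centrality measure is used, so degree centrality is an assumption here, and the graph, node names, and the function `degree_centrality` are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: (in-degree + out-degree) / (n - 1)} for a directed graph."""
    nodes = set()
    degree = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        degree[src] += 1
        degree[dst] += 1
    # Guard against division by zero for a single-node graph.
    scale = max(len(nodes) - 1, 1)
    return {node: degree[node] / scale for node in nodes}


# Hypothetical control flow graph of a small contract function.
cfg_edges = [
    ("entry", "check"),
    ("check", "transfer"),
    ("check", "revert"),
    ("transfer", "exit"),
]
scores = degree_centrality(cfg_edges)
# Highest score first; ties are broken alphabetically for a stable order.
ranked = sorted(scores, key=lambda node: (-scores[node], node))
```

In this toy graph the branching node `check` ranks first, which matches the intuition that nodes where control flow diverges deserve more attention.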
The widespread adoption of blockchain technology has led to the exploration of its numerous applications in various fields. Cryptographic algorithms and smart contracts are critical components of blockchain security. Despite the benefits of virtual currency, vulnerabilities in smart contracts have resulted in substantial losses to users. While researchers have identified these vulnerabilities and developed tools for detecting them, the accuracy of these tools is still far from satisfactory, with high false positive and false negative rates. In this paper, we propose a new method for detecting vulnerabilities in smart contracts using the BERT pre-trained model, which can process smart contracts and detect vulnerabilities quickly and effectively. More specifically, we preprocess the contract and apply symbol substitution, which helps the pre-trained model capture contract features. We evaluate our method on four datasets and compare its performance with other deep learning models and vulnerability detection tools, demonstrating its superior accuracy.
In recent years, the number of smart contracts deployed on blockchains has exploded. However, vulnerabilities have caused incalculable losses. Due to the irreversibility and immutability of smart contracts, vulnerability detection has become particularly important. With the widespread use of neural network models, deep learning-based methods and tools are increasingly used to identify vulnerabilities in smart contracts. This paper begins with a succinct overview of the prevalent categories of vulnerabilities found in smart contracts. It then categorizes and surveys contemporary deep learning-based tools developed for smart contract vulnerability detection. These tools are categorized by their open-source status, the data format, and the type of feature extraction they employ. We then conduct a comprehensive comparative analysis of these tools, selecting representative tools for experimental validation and comparing them with traditional tools in terms of detection coverage and accuracy. Finally, based on the insights gained from the experimental results and the current state of research on smart contract vulnerability detection tools, we aim to provide a reference standard for developers of such tools. Forward-looking research directions are also proposed for deep learning-based smart contract vulnerability detection.
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of such models still need to be improved. Additionally, due to differences between programming languages, most methods lack cross-language detection generality. To address these issues, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
The detection of vulnerabilities in software written in C and C++ attracts considerable attention today. This paper proposes a new framework called DrCSE to improve software vulnerability detection. It uses an intelligent computation technique based on the combination of two methods, data rebalancing and representation learning, to analyze and evaluate the code property graph (CPG) of the source code for detecting abnormal behavior caused by software vulnerabilities. To do that, DrCSE combines three main processing techniques: (i) building the source code feature profiles, (ii) rebalancing data, and (iii) contrastive learning. Technique (i) extracts the source code's features based on the vertices and edges of the CPG. Data rebalancing supports the training process by balancing the experimental dataset. Finally, contrastive learning learns the important features of the source code by pulling similar samples together while pushing outliers away. The experimental part of this paper demonstrates the superiority of the DrCSE framework for detecting source code security vulnerabilities on the Verum dataset. The proposed method performs well on all metrics, notably with Precision and Recall scores of 39.35% and 69.07%, respectively. It performs better than other approaches, with a 5% boost in Precision and a 5% boost in Recall. According to our survey to date, this is the best research result for the software vulnerability detection problem on the Verum dataset.
Software vulnerabilities are the root cause of various information security incidents, and dynamic taint analysis is an emerging program analysis technique. In this paper, to make full use of the technique for detecting software vulnerabilities, we present SwordDTA, a tool that performs dynamic taint analysis on binaries. The tool is flexible and extensible, and it works with commodity software and hardware. It can be used to detect software vulnerabilities through vulnerability modeling and taint checking. We evaluate it with a number of commonly used real-world applications. The experimental results show that SwordDTA is capable of detecting at least four kinds of software vulnerabilities, including buffer overflow, integer overflow, division by zero, and use-after-free, and is applicable to a wide range of software.
In the context of modern software development characterized by increasing complexity and compressed development cycles, traditional static vulnerability detection methods face prominent challenges, including high false positive rates and missed detections of complex logic, due to their over-reliance on rule templates. This paper proposes a Syntax-Aware Hierarchical Attention Network (SAHAN) model, which achieves high-precision vulnerability detection through grammar-rule-driven multi-granularity code slicing and hierarchical semantic fusion mechanisms. The SAHAN model first generates Syntax Independent Units (SIUs) by slicing the code based on the Abstract Syntax Tree (AST) and predefined grammar rules, retaining vulnerability-sensitive contexts. Then, through a hierarchical attention mechanism, the local syntax-aware layer encodes fine-grained patterns within SIUs, while the global semantic correlation layer captures vulnerability chains across SIUs, achieving synergistic modeling of syntax and semantics. Experiments show that on benchmark datasets such as QEMU, SAHAN improves detection performance by 4.8% to 13.1% on average compared to baseline models such as Devign and VulDeePecker.
Smart contracts hold billions of dollars in digital currency, and their security vulnerabilities have drawn a lot of attention in recent years. Traditional methods for detecting smart contract vulnerabilities rely primarily on symbolic execution, which makes them time-consuming and prone to high false positive rates. Recently, deep learning approaches have alleviated these issues but still face several major limitations, such as lack of interpretability and susceptibility to evasion techniques. In this paper, we propose a feature selection method for uplifting modeling. The core of this method is a feature selection algorithm that uses interpretation outcomes to select critical features, thereby reducing the number of features. The learning process can be accelerated significantly because of the reduced feature size. The experiment shows that our proposed model performs well on six types of vulnerability detection. The accuracy for each type is higher than 93%, and the average detection time per smart contract is less than 1 ms. Notably, with our proposed feature selection algorithm, the training time for each type of vulnerability is reduced by nearly 80% compared with the original.
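The idea of using interpretation outcomes to select critical features can be sketched as follows. This is only one plausible reading: it assumes per-sample attribution scores (for example, SHAP values computed elsewhere) are already available and keeps the features with the largest mean absolute attribution. The function name `select_features`, the feature names, and the numbers are hypothetical and are not taken from the paper.

```python
def select_features(attributions, names, k):
    """Keep the k features with the largest mean absolute attribution.

    attributions: list of rows, one row of per-feature scores per sample.
    names: feature names, aligned with the columns of each row.
    """
    sample_count = len(attributions)
    importance = [
        sum(abs(row[j]) for row in attributions) / sample_count
        for j in range(len(names))
    ]
    # Sort column indices by descending importance and keep the first k.
    order = sorted(range(len(names)), key=lambda j: -importance[j])
    return [names[j] for j in order[:k]]


# Hypothetical attribution scores for three contracts and four opcode features.
feature_names = ["CALL", "SSTORE", "ADD", "TIMESTAMP"]
attribution_rows = [
    [0.4, -0.1, 0.00, 0.3],
    [-0.5, 0.2, 0.01, 0.1],
    [0.3, -0.3, 0.00, 0.3],
]
selected = select_features(attribution_rows, feature_names, k=2)
```

A model retrained on only the selected columns has fewer inputs to process, which is the mechanism the abstract credits for the shorter training time.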
Ethereum smart contracts are computer programs that are deployed and executed on the Ethereum blockchain to enforce agreements among untrusting parties. Being the most prominent platform that supports smart contracts, Ethereum has been targeted by many attacks and plagued by security incidents. Consequently, many smart contract vulnerabilities have been discovered in the past decade. To detect and prevent such vulnerabilities, different security analysis tools, including static and dynamic analysis tools, have been created, but their performance decreases drastically when the code to be analyzed is constantly being rewritten. In this paper, we propose Eth2Vec, a machine-learning-based static analysis tool that detects smart contract vulnerabilities. Eth2Vec is robust against code rewrites; i.e., it can detect vulnerabilities even in rewritten code. Other machine-learning-based static analysis tools require features, which analysts create manually, as inputs. In contrast, Eth2Vec uses a neural network for language processing to automatically learn the features of vulnerable contracts. In doing so, Eth2Vec can detect vulnerabilities in smart contracts by comparing the code of a target contract with that of the learned contracts for similarity. We performed experiments with existing open databases, such as Etherscan, and Eth2Vec outperformed a recent model based on a support vector machine in terms of well-known metrics, i.e., precision, recall, and F1-score.
It is difficult to formalize the causes of vulnerabilities, and there is no effective model to reveal their causes and characteristics. In this paper, a vulnerability model construction method is proposed to describe vulnerability attributes and construct a vulnerability model. A vulnerability model based on the chemical abstract machine (CHAM) is constructed to give a CHAM description of the vulnerability model, and the framework of the vulnerability model is also discussed. A case study is carried out to verify the feasibility and effectiveness of the proposed model. In addition, a prototype system is designed and implemented based on the proposed vulnerability model. Experimental results show that the proposed model is more effective than other methods in detecting software vulnerabilities.
Smart contracts have led to more efficient development in finance and healthcare, but vulnerabilities in contracts pose high risks to their future applications. Current vulnerability detection methods for contracts are either based on fixed expert rules, which are inefficient, or rely on simplistic deep learning techniques that do not fully leverage contract semantic information. Therefore, there is ample room for improvement in detection precision. To solve these problems, this paper proposes a vulnerability detector based on deep learning techniques, graph representation, and Transformer, called GRATDet. The method first performs swapping, insertion, and symbolization operations on contract functions, augmenting the small amount of sample data. Each line of code is then treated as a basic semantic element, and information such as control and data relationships is extracted to construct a new representation in the form of a Line Graph (LG), which exposes more structural features than the serialized presentation of the contract. Finally, the node information and edge information of the graph are jointly learned using an improved Transformer-GP model to extract information globally and locally, and the fused features are used for vulnerability detection. The effectiveness of the method in reentrancy vulnerability detection is verified in experiments, where the F1 score reaches 95.16%, exceeding state-of-the-art methods.
Software vulnerability mining is an important way to detect whether loopholes exist in software, and also an important way to ensure the security of information systems. With the rapid development of information technology and the software industry, most software has not been rigorously tested before being put into use, so hidden vulnerabilities in software can be exploited by attackers. Therefore, actively detecting software vulnerabilities is of great significance for the security maintenance of information systems. In this paper, we first study some of the commonly used vulnerability detection methods and tools, and analyze the advantages and disadvantages of each method in different scenarios. Second, we design a set of evaluation criteria for assessing different vulnerability mining methods. Third, we propose and design an integration testing framework on which typical static analysis methods and dynamic mining methods can be tested and compared, giving an intuitive comparative analysis of the experimental results. Finally, we report an experimental analysis that verifies the feasibility and effectiveness of the proposed evaluation method and testing framework. The results show that the final test results can serve as guidance for selecting the most appropriate and effective method or tool in vulnerability detection.
With the development of the 5th generation of mobile communication (5G) networks and artificial intelligence (AI) technologies, the use of the Internet of Things (IoT) has expanded throughout industry. Although IoT networks have improved industrial productivity and convenience, they are highly dependent on nonstandard protocol stacks and open-source-based, poorly validated software, resulting in several security vulnerabilities. However, conventional AI-based software vulnerability discovery technologies cannot be applied to IoT because they require excessive memory and computing power. This study developed a technique for optimizing training data size to detect software vulnerabilities rapidly while maintaining learning accuracy. Experimental results using a software vulnerability classification dataset showed that different optimal data sizes did not affect the learning performance of the learning models. Moreover, the minimal data size required to train a model without performance degradation could be determined in advance. For example, the random forest model saved 85.18% of memory and improved latency by 97.82% while maintaining a learning accuracy similar to that achieved when using 100% of the data, despite using only 1%.
Smart contracts have suffered significant losses due to various types of vulnerabilities. However, traditional vulnerability detection methods rely extensively on expert rules, resulting in low detection accuracy and poor adaptability to novel attacks. To address these problems, in this paper, deep learning methods are combined with smart contract vulnerability code detection approaches. Abstract syntax trees (ASTs), which are special isomorphic graph structures, are an important bridge between source code and graph neural networks. By learning the AST, the model can understand the semantics of the source code. Moreover, graph neural networks have an increasing ability to address complex heterogeneous graphs. Therefore, control flow graphs are fused with data flow graphs on the basis of the ASTs to build heterogeneous graphs with richer code semantics. Furthermore, multigranularity analysis of the vulnerability detection results is performed, including coarse-grained contract-level vulnerability detection and fine-grained line-level vulnerability detection. Through this multigranularity detection approach, vulnerabilities in contracts can be identified and analysed more comprehensively, providing a richer perspective and more solutions for vulnerability detection. The experimental results show that the proposed multigranularity vulnerability detection method based on heterogeneous graphs (MVD-HG) improves both the accuracy and the range of detected vulnerability types in contract-level vulnerability detection tasks; moreover, in the line-level vulnerability detection task, the MVD-HG model achieves significant results and addresses the shortcomings of existing methods. In addition, based on code generation methods used in related fields, a data enhancement method based on the source code is developed, which effectively expands the experimental dataset to address the reduced credibility of the results due to insufficient amounts of data.
By analyzing vulnerabilities of Android native system services, we find that some vulnerabilities are caused by inconsistent data transmission and inconsistent data processing logic between client and server. Existing research cannot find these two types of vulnerabilities, and their test cases suffer from low coverage. In this paper, we propose a test case extraction method based on the native system services of the client and design a case construction method that supports multi-parameter mutation based on a genetic algorithm and a priority strategy. Based on these methods, we implement a detection tool, BArcherFuzzer, to detect vulnerabilities of Android native system services. The experimental results show that BArcherFuzzer found four vulnerabilities among hundreds of exception messages; all of them were confirmed by Google, and one was assigned a Common Vulnerabilities and Exposures (CVE) number (CVE-2020-0363).
The most resource-intensive and laborious part of debugging is finding the exact location of a fault among a large number of code snippets. Many machine intelligence models offer effective localization of defects. Some models can locate faults with more than 95% accuracy, which creates demand for trustworthy fault localization models. Confidence and trustworthiness in machine intelligence-based software models can only be achieved via explainable artificial intelligence in Fault Localization (XFL). The current study presents a model for generating counterfactual interpretations of the fault localization model's decisions. Neural system approximations and a distributed representation of the input information may be achieved by building a nonlinear neural network model, which demonstrates a high level of proficiency in transfer learning even with minimal training data. The proposed XFL makes the decision-making transparent without impacting the model's performance. It ranks the software program statements based on a vulnerability score approximated from the training data. The model's performance is further evaluated using various metrics, such as the number of assessed statements, the confidence level of fault localization, and Top-N evaluation strategies.
SQL injection poses a major threat to the application-level security of databases, and there is no systematic solution to these attacks. Unlike traditional run-time security strategies such as IDS and firewalls, this paper focuses on a solution at the outset: it presents a method to find vulnerabilities by analyzing source code. The concept of a validated tree is developed to track variables referenced by database operations in scripts. By checking whether these variables are influenced by outside inputs, the database operations are shown to be secure or not. This method offers high accuracy and efficiency as well as low cost, and it applies to any type of web application platform. It is implemented in the software code vulnerabilities of SQL injection detector (CVSID). Its validity and efficiency are demonstrated with an example.
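The check described in this abstract, whether variables used by a database operation are influenced by outside inputs, can be approximated by simple taint propagation over a straight-line script. The sketch below is a simplified stand-in and not the paper's validated tree: the statement encoding and the function `find_unsafe_queries` are hypothetical, and branches, loops, and function calls are ignored.

```python
def find_unsafe_queries(statements):
    """Return indices of query statements that use externally influenced variables.

    Each statement is a tuple (all forms are assumptions of this sketch):
      ("input", var)               var is read from an outside input
      ("sanitize", target, source) target holds a validated copy of source
      ("assign", target, sources)  target is computed from the listed variables
      ("query", variables)         a database operation referencing the variables
    """
    tainted = set()
    unsafe = []
    for index, statement in enumerate(statements):
        kind = statement[0]
        if kind == "input":
            tainted.add(statement[1])
        elif kind == "sanitize":
            # The validated copy is treated as clean.
            tainted.discard(statement[1])
        elif kind == "assign":
            _, target, sources = statement
            if any(source in tainted for source in sources):
                tainted.add(target)
            else:
                tainted.discard(target)
        elif kind == "query":
            if any(variable in tainted for variable in statement[1]):
                unsafe.append(index)
    return unsafe


# Hypothetical script: one query built from validated data, one from raw input.
script = [
    ("input", "user"),
    ("input", "pid"),
    ("sanitize", "safe_pid", "pid"),
    ("assign", "q1", ["safe_pid"]),
    ("assign", "q2", ["user"]),
    ("query", ["q1"]),
    ("query", ["q2"]),
]
unsafe_indices = find_unsafe_queries(script)
```

Only the query at index 6 is flagged, because `q2` is derived from the unvalidated input `user`, while `q1` is built from the validated copy `safe_pid`.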
Funding: Partially supported by the National Natural Science Foundation (62272248); the Open Project Fund of State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCHA202108, CARCH201905); the Natural Science Foundation of Tianjin (20JCZDJC00610); and Zhejiang Lab (2021KF0AB04).
Abstract: Smart contracts are widely used on the blockchain to implement complex transactions, such as decentralized applications on Ethereum. Effective vulnerability detection for large-scale smart contracts is critical, as attacks on smart contracts often cause huge economic losses. Since smart contracts are difficult to repair and update, vulnerabilities must be found before deployment. However, code analysis, which requires path traversal, and learning methods, which require many features to be trained, are too time-consuming for detecting large-scale on-chain contracts. Unlike code analysis methods such as symbolic execution, learning-based methods obtain detection models from a feature space. However, existing features offer little interpretability of the detection results and the trained model, and the large feature space also reduces detection efficiency. This paper focuses on improving detection efficiency by reducing the dimension of the features, combined with expert knowledge. A feature extraction model, Block-gram, is proposed to form low-dimensional knowledge-based features from bytecode. First, the metadata is separated and the runtime code is converted into a sequence of opcodes, which is divided into segments based on certain instructions (jumps, etc.). Then, scalable Block-gram features, including 4-dimensional block features and 8-dimensional attribute features, are mined for training the learning-based model. Finally, feature contributions are calculated from SHAP values to measure the relationship between the features and the results of the detection model. In addition, six types of vulnerability labels are made on a dataset containing 33,885 contracts, and these knowledge-based features are evaluated using seven state-of-the-art learning algorithms. The results show that the average detection latency improves by a factor of 25× to 650× compared with features extracted by N-gram, and that the features can also enhance the interpretability of the detection model.
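The opcode-segmentation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Block-gram implementation: the opcode table is a small subset of the EVM instruction set, the choice of block terminators is an assumption, and the input is assumed to be runtime bytecode with the metadata already stripped. The function names `disassemble` and `split_blocks` are hypothetical.

```python
# Small subset of the EVM opcode table; a real disassembler needs all opcodes.
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}
# Assumption: these instructions end a block (jumps and halting instructions).
TERMINATORS = {"STOP", "JUMP", "JUMPI", "RETURN", "REVERT"}


def disassemble(runtime_hex):
    """Convert metadata-free runtime bytecode (hex string) to opcode names."""
    code = bytes.fromhex(runtime_hex)
    ops, i = [], 0
    while i < len(code):
        byte = code[i]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 carry 1..32 immediate bytes, which are skipped.
            width = byte - 0x5F
            ops.append(f"PUSH{width}")
            i += 1 + width
        else:
            ops.append(OPCODES.get(byte, f"UNKNOWN_{byte:02x}"))
            i += 1
    return ops


def split_blocks(ops):
    """Split an opcode sequence into blocks at jumps, halts, and JUMPDESTs."""
    blocks, current = [], []
    for op in ops:
        if op == "JUMPDEST" and current:
            # A jump target starts a new block.
            blocks.append(current)
            current = []
        current.append(op)
        if op in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# PUSH1 01, PUSH1 02, ADD, PUSH1 0a, JUMPI, STOP, JUMPDEST, STOP
example_ops = disassemble("6001600201600a57005b00")
example_blocks = split_blocks(example_ops)
print(example_blocks)
```

Per-block statistics (for example, block length or counts of selected opcodes) could then be aggregated into a fixed-size vector; the 4-dimensional block features and 8-dimensional attribute features themselves are defined in the paper, not in this sketch.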
Abstract: RESTful APIs have been adopted as the standard way of developing web services, allowing for smooth communication between clients and servers. Their simplicity, scalability, and compatibility have made them crucial to modern web environments. However, the increased adoption of RESTful APIs has simultaneously exposed these interfaces to significant security threats that jeopardize the availability, confidentiality, and integrity of web services. This survey focuses exclusively on RESTful APIs, providing an in-depth perspective distinct from studies addressing other API types such as GraphQL or SOAP. We highlight concrete threats, such as injection attacks and insecure direct object references (IDOR), to illustrate the evolving risk landscape. Our work systematically reviews state-of-the-art detection methods, including static code analysis and penetration testing, and proposes a novel taxonomy that categorizes vulnerabilities such as authentication and authorization issues. Unlike existing taxonomies focused on general web or network-level threats, our taxonomy emphasizes API-specific design flaws and operational dependencies, offering a more granular and actionable framework for RESTful API security. By critically assessing current detection methodologies and identifying key research gaps, we offer a structured framework that advances the understanding and mitigation of RESTful API vulnerabilities. Ultimately, this work aims to drive significant advancements in API security, thereby enhancing the resilience of web services against evolving cyber threats.
Abstract: Source code vulnerabilities present significant security threats, necessitating effective detection techniques. Traditional static analysis tools are built on rigid rule sets and pattern matching, which drown developers in false positives and miss context-sensitive vulnerabilities. Artificial intelligence (AI) approaches, in particular Large Language Models (LLMs) like BERT, show promise but frequently lack transparency. To address the problem of model interpretability, this work proposes a BERT-based LLM strategy for vulnerability detection that incorporates Explainable AI (XAI) methods such as SHAP and attention heatmaps. Furthermore, to ensure auditable and comprehensible decisions, we present a transparency obligation structure that covers the whole LLM lifecycle. Our experiments on the extensive DiverseVul source code dataset show that the proposed method outperforms the baselines, attaining 92.3% detection accuracy and surpassing CodeT5 (89.4%), GPT-3.5 (85.1%), and GPT-4 (88.7%) under the same evaluation scenario. With integrated SHAP analysis, the method improves detection capability while preserving explainability, which is a crucial advantage over black-box LLM alternatives in security contexts. The XAI analysis identifies crucial predictive tokens, such as "susceptible" and "function", through the SHAP framework. Furthermore, the local token interactions that support the model's decision-making process are graphically highlighted via attention heatmaps. This method provides a workable solution for reliable vulnerability identification in software systems by combining high detection accuracy with model explainability. Our findings imply that transparent AI models are capable of successfully detecting security flaws while preserving interpretability for human analysts.
Funding: Supported by the Seoul Business Agency (SBA), funded by the Seoul Metropolitan Government, through the Seoul R&BD Program (FB240022); by the Korea Institute for Advancement of Technology (KIAT), funded by the Korea Government (MOTIE) (RS-2024-00406796), through the HRD Program for Industrial Innovation; and by the Excellent Researcher Support Project of Kwangwoon University in 2024.
Abstract: Smart contracts are self-executing programs on blockchains that manage complex business logic with transparency and integrity. However, their immutability after deployment makes programming errors particularly critical, as such errors can be exploited to compromise blockchain security. Existing vulnerability detection methods often rely on fixed rules or target specific vulnerabilities, limiting their scalability and adaptability to diverse smart contract scenarios. Furthermore, natural language processing approaches for source code analysis frequently fail to capture program flow, which is essential for identifying structural vulnerabilities. To address these limitations, we propose a novel model that integrates textual and structural information for smart contract vulnerability detection. Our approach employs the CodeBERT NLP model for textual analysis, augmented with structural insights derived from control flow graphs created using the abstract syntax tree and opcodes of smart contracts. Each graph node is embedded using Sent2Vec, and centrality analysis is applied to highlight critical paths and nodes within the code. The extracted features are normalized and combined into a prompt for a large language model to detect vulnerabilities effectively. Experimental results demonstrate the superiority of our model, achieving an accuracy of 86.70%, a recall of 84.87%, a precision of 85.24%, and an F1-score of 84.46%. These outcomes surpass existing methods, including CodeBERT alone (accuracy: 81.26%, F1-score: 79.84%) and CodeBERT combined with abstract syntax tree analysis (accuracy: 83.48%, F1-score: 79.65%). The findings underscore the effectiveness of incorporating graph structural information alongside text-based analysis, offering improved scalability and performance in detecting diverse vulnerabilities.
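As a rough illustration of the centrality analysis mentioned in this abstract, the sketch below computes normalized degree centrality over a toy control flow graph. The abstract does not say which centrality measure the authors use, so degree centrality is an assumption chosen for simplicity; the node names and the helper `degree_centrality` are hypothetical.

```python
def degree_centrality(edges):
    """Normalized degree centrality (in-degree + out-degree) / (n - 1)."""
    nodes = set()
    degree = {}
    for src, dst in edges:
        nodes.update((src, dst))
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    scale = max(len(nodes) - 1, 1)  # avoid division by zero on tiny graphs
    return {node: degree[node] / scale for node in nodes}


# Toy control flow graph of a guarded transfer (hypothetical node names).
cfg_edges = [
    ("entry", "check"),
    ("check", "transfer"),
    ("check", "revert"),
    ("transfer", "exit"),
    ("revert", "exit"),
]
centrality = degree_centrality(cfg_edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

In this toy graph the branching node `check` scores highest, which matches the intuition that decision points in the control flow deserve attention.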
Funding: Supported by the National Key Research and Development Plan in China (Grant No. 2020YFB1005500).
Abstract: The widespread adoption of blockchain technology has led to the exploration of its numerous applications in various fields. Cryptographic algorithms and smart contracts are critical components of blockchain security. Despite the benefits of virtual currency, vulnerabilities in smart contracts have resulted in substantial losses to users. While researchers have identified these vulnerabilities and developed tools for detecting them, the accuracy of these tools is still far from satisfactory, with high false positive and false negative rates. In this paper, we propose a new method for detecting vulnerabilities in smart contracts using the BERT pre-training model, which can quickly and effectively process and detect smart contracts. More specifically, we preprocess the contract and apply symbol substitution, which helps the pre-training model capture contract features. We evaluate our method on four datasets and compare its performance with other deep learning models and vulnerability detection tools, demonstrating its superior accuracy.
Funding: Funded by the Major Public Welfare Special Fund of Henan Province (No. 201300210200) and the Major Science and Technology Research Special Fund of Henan Province (No. 221100210400).
Abstract: In recent years, the number of smart contracts deployed on blockchains has exploded. However, vulnerabilities have caused incalculable losses. Due to the irreversibility and immutability of smart contracts, vulnerability detection has become particularly important. With the widespread use of neural network models, deep learning-based methods and tools are increasingly used to identify vulnerabilities in smart contracts. This paper begins with a succinct overview of prevalent categories of vulnerabilities found in smart contracts. It then categorizes and presents an overview of contemporary deep learning-based tools developed for smart contract detection. These tools are categorized by their open-source status, their data format, and the type of feature extraction they employ. We then conduct a comprehensive comparative analysis of these tools, selecting representative tools for experimental validation and comparing them with traditional tools in terms of detection coverage and accuracy. Finally, based on the insights gained from the experimental results and the current state of research on smart contract vulnerability detection tools, we aim to provide a reference standard for developers of contract vulnerability detection tools. Forward-looking research directions are also proposed for deep learning-based smart contract vulnerability detection.
Funding: Funded by the Major Science and Technology Projects in Henan Province, China, Grant No. 221100210600.
Abstract: Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of the models still need to be improved. Additionally, due to differences in programming languages, most methods lack cross-language detection generality. To address these issues, in this paper, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
Abstract: The detection of vulnerabilities in software written in C and C++ attracts considerable attention and interest today. This paper proposes a new framework called DrCSE to improve software vulnerability detection. It uses an intelligent computation technique that combines two methods, data rebalancing and representation learning, to analyze and evaluate the code property graph (CPG) of the source code in order to detect the abnormal behavior of software vulnerabilities. To do that, DrCSE combines three main processing techniques: (i) building the source code feature profiles, (ii) rebalancing data, and (iii) contrastive learning. Method (i) extracts the source code's features based on the vertices and edges of the CPG. The data rebalancing method supports the training process by balancing the experimental dataset. Finally, contrastive learning techniques learn the important features of the source code by finding and pulling similar ones together while pushing the outliers away. The experimental part of this paper demonstrates the superiority of the DrCSE framework for detecting source code security vulnerabilities using the Verum dataset. The proposed method performs well on all metrics, in particular with Precision and Recall scores of 39.35% and 69.07%, respectively, demonstrating the efficiency of the DrCSE framework. It performs better than other approaches, with a 5% boost in Precision and a 5% boost in Recall. Overall, according to our survey to date, this is the best research result for the software vulnerability detection problem using the Verum dataset.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA012902) and the “HGJ” National Major Technological Projects (2013ZX01045-004).
Abstract: Software vulnerabilities are the root cause of various information security incidents, while dynamic taint analysis is an emerging program analysis technique. In this paper, to maximize the use of the technique to detect software vulnerabilities, we present SwordDTA, a tool that can perform dynamic taint analysis for binaries. The tool is flexible and extensible, so it can work with commodity software and hardware. It can be used to detect software vulnerabilities through vulnerability modeling and taint checking. We evaluate it with a number of commonly used real-world applications. The experimental results show that SwordDTA is capable of detecting at least four kinds of software vulnerabilities, including buffer overflow, integer overflow, division by zero, and use-after-free, and is applicable to a wide range of software.
Funding: Supported by the research start-up funds for invited doctor of Lanzhou University of Technology under Grant 14/062402.
Abstract: In the context of modern software development, characterized by increasing complexity and compressed development cycles, traditional static vulnerability detection methods face prominent challenges, including high false positive rates and missed detections of complex logic due to their over-reliance on rule templates. This paper proposes a Syntax-Aware Hierarchical Attention Network (SAHAN) model, which achieves high-precision vulnerability detection through grammar-rule-driven multi-granularity code slicing and hierarchical semantic fusion mechanisms. The SAHAN model first generates Syntax Independent Units (SIUs) by slicing the code based on the Abstract Syntax Tree (AST) and predefined grammar rules, retaining vulnerability-sensitive contexts. Then, through a hierarchical attention mechanism, the local syntax-aware layer encodes fine-grained patterns within SIUs, while the global semantic correlation layer captures vulnerability chains across SIUs, achieving synergistic modeling of syntax and semantics. Experiments show that on benchmark datasets such as QEMU, SAHAN improves detection performance by 4.8% to 13.1% on average compared to baseline models such as Devign and VulDeePecker.
Funding: Funded by the National Natural Science Foundation of China, No. 61902157 and No. 62002139.
Abstract: Smart contracts hold billions of dollars in digital currency, and their security vulnerabilities have drawn a lot of attention in recent years. Traditional methods for detecting smart contract vulnerabilities rely primarily on symbolic execution, which makes them time-consuming and prone to high false positive rates. Recently, deep learning approaches have alleviated these issues but still face several major limitations, such as lack of interpretability and susceptibility to evasion techniques. In this paper, we propose a feature selection method for uplifting modeling. The core of this method is a feature selection algorithm that uses interpretation outcomes to select critical features, thereby reducing the number of features. Reducing the feature size accelerates the learning process significantly. The experiment shows that our proposed model performs well in six types of vulnerability detection. The accuracy for each type is higher than 93%, and the average detection time per smart contract is less than 1 ms. Notably, with our proposed feature selection algorithm, the training time for each type of vulnerability is reduced by nearly 80% compared with the original.
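The idea of selecting features from interpretation outcomes can be sketched as ranking features by their mean absolute attribution and keeping the top k. This is only an illustration under assumptions: the abstract does not specify the interpretation method or the selection rule, the attribution values below are made-up toy numbers, and `select_top_features` and the feature names are hypothetical.

```python
def select_top_features(attributions, feature_names, k):
    """Keep the k features with the largest mean absolute attribution.

    attributions: one row per sample, one attribution score per feature
    (for example, scores produced by a SHAP-style explainer).
    """
    n_samples = len(attributions)
    mean_abs = [
        sum(abs(row[j]) for row in attributions) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(range(len(feature_names)),
                    key=lambda j: mean_abs[j], reverse=True)
    return [feature_names[j] for j in ranked[:k]]


# Toy attribution scores for three opcode-count features (illustrative only).
toy_names = ["CALL", "SSTORE", "ADD"]
toy_attributions = [
    [0.5, -0.1, 0.0],
    [-0.3, 0.2, 0.05],
]
selected = select_top_features(toy_attributions, toy_names, k=2)
print(selected)
```

Retraining the detector on only the selected columns is what would shrink the feature space and, as the abstract reports, shorten training time.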
Funding: This research was supported in part by the Japan Society for the Promotion of Science KAKENHI Number 22H03591 and the MEXT "Innovation Platform for Society 5.0" Program Grant Number JPMXP0518071489.
Abstract: Ethereum smart contracts are computer programs that are deployed and executed on the Ethereum blockchain to enforce agreements among untrusting parties. As the most prominent platform that supports smart contracts, Ethereum has been targeted by many attacks and plagued by security incidents. Consequently, many smart contract vulnerabilities have been discovered in the past decade. To detect and prevent such vulnerabilities, different security analysis tools, including static and dynamic analysis tools, have been created, but their performance decreases drastically when the code to be analyzed is constantly being rewritten. In this paper, we propose Eth2Vec, a machine-learning-based static analysis tool that detects smart contract vulnerabilities. Eth2Vec remains robust against code rewrites; i.e., it can detect vulnerabilities even in rewritten code. Other machine-learning-based static analysis tools require features, which analysts create manually, as inputs. In contrast, Eth2Vec uses a neural network for language processing to automatically learn the features of vulnerable contracts. In doing so, Eth2Vec can detect vulnerabilities in smart contracts by comparing the similarities between the code of a target contract and that of the learned contracts. We performed experiments with existing open databases, such as Etherscan, and Eth2Vec was able to outperform a recent model based on support vector machines in terms of well-known metrics, i.e., precision, recall, and F1-score.
Funding: Supported by the National Natural Science Foundation of China (61202110 and 61502205) and the Project of Jiangsu Provincial Six Talent Peaks (XYDXXJS-016).
Abstract: It is difficult to formalize the causes of vulnerabilities, and there is no effective model to reveal their causes and characteristics. In this paper, a vulnerability model construction method is proposed to describe vulnerability attributes and construct a vulnerability model. A vulnerability model based on the chemical abstract machine (CHAM) is constructed to give a CHAM description of the vulnerability model, and the framework of the vulnerability model is also discussed. A case study is carried out to verify the feasibility and effectiveness of the proposed model. In addition, a prototype system is designed and implemented based on the proposed vulnerability model. Experimental results show that the proposed model is more effective than other methods in the detection of software vulnerabilities.
Funding: Supported by the Science and Technology Program Project (No. 2020A02001-1) of Xinjiang Autonomous Region, China.
Abstract: Smart contracts have led to more efficient development in finance and healthcare, but vulnerabilities in contracts pose high risks to their future applications. Current vulnerability detection methods for contracts are either based on fixed expert rules, which are inefficient, or rely on simplistic deep learning techniques that do not fully leverage contract semantic information. Therefore, there is ample room for improvement in detection precision. To solve these problems, this paper proposes a vulnerability detector based on deep learning techniques, graph representation, and Transformer, called GRATDet. The method first performs swapping, insertion, and symbolization operations on contract functions, increasing the amount of small-sample data. Each line of code is then treated as a basic semantic element, and information such as control and data relationships is extracted to construct a new representation in the form of a Line Graph (LG), which exposes more structural features than the serialized presentation of the contract. Finally, the node information and edge information of the graph are jointly learned using an improved Transformer-GP model to extract information globally and locally, and the fused features are used for vulnerability detection. The effectiveness of the method in reentrancy vulnerability detection is verified in experiments, where the F1 score reaches 95.16%, exceeding state-of-the-art methods.
Funding: Partly supported by the National Natural Science Foundation of China (NSFC grant numbers: 61202110 and 61502205) and the Project of Jiangsu Provincial Six Talent Peaks (Grant number: XYDXXJS-016).
Abstract: Software vulnerability mining is an important way to detect whether loopholes exist in software, and also an important way to ensure the security of information systems. With the rapid development of information technology and the software industry, most software has not been rigorously tested before being put into use, so hidden vulnerabilities in software can be exploited by attackers. Therefore, actively detecting software vulnerabilities is of great significance for the security maintenance of information systems. In this paper, we first studied some of the commonly used vulnerability detection methods and detection tools, and analyzed the advantages and disadvantages of each method in different scenarios. Second, we designed a set of evaluation criteria for assessing different mining methods. Third, we proposed and designed an integration testing framework on which typical static analysis methods and dynamic mining methods can be tested and compared, yielding an intuitive comparative analysis of the experimental results. Finally, we reported an experimental analysis to verify the feasibility and effectiveness of the proposed evaluation method and the testing framework. The results show that the final test results can serve as guidance for selecting the most appropriate and effective methods or tools in vulnerability detection.
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science and ICT (MSIT) (No. 2020R1F1A1061107); the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korean Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialists); and the MSIT under the ICAN (ICT Challenge and Advanced Network of HRD) program (No. IITP-2022-RS-2022-00156310), supervised by the Institute of Information & Communication Technology Planning and Evaluation (IITP).
Abstract: With the development of the 5th generation of mobile communication (5G) networks and artificial intelligence (AI) technologies, the use of the Internet of Things (IoT) has expanded throughout industry. Although IoT networks have improved industrial productivity and convenience, they are highly dependent on nonstandard protocol stacks and open-source-based, poorly validated software, resulting in several security vulnerabilities. However, conventional AI-based software vulnerability discovery technologies cannot be applied to IoT because they require excessive memory and computing power. This study developed a technique for optimizing training data size to detect software vulnerabilities rapidly while maintaining learning accuracy. Experimental results using a software vulnerability classification dataset showed that different optimal data sizes did not affect the learning performance of the learning models. Moreover, the minimal data size required to train a model without performance degradation could be determined in advance. For example, the random forest model saved 85.18% of memory and improved latency by 97.82% while using only 1% of the data, maintaining a learning accuracy similar to that achieved with 100% of the data.
Funding: Supported by the Major Program of Natural Science Foundation of Zhejiang Province (No. LD22F020002); the National Natural Science Foundation of China (Nos. 62372410, U22B2028); the Zhejiang Provincial Natural Science Foundation of China (No. LZ23F020011); the Fundamental Research Funds for the Provincial Universities of Zhejiang (No. RF-A2023009); and the Key R&D Projects in Zhejiang Province (No. 2021C01117).
Abstract: Smart contracts have suffered significant losses due to various types of vulnerabilities. However, traditional vulnerability detection methods rely extensively on expert rules, resulting in low detection accuracy and poor adaptability to novel attacks. To address these problems, in this paper, deep learning methods are combined with smart contract vulnerability code detection approaches. Abstract syntax trees (ASTs), which are special isomorphic graph structures, are an important bridge between source code and graph neural networks. By learning the AST, the model can understand the semantics of the source code. Moreover, graph neural networks have an increasing ability to address complex heterogeneous graphs. Therefore, control flow graphs are fused with data flow graphs on the basis of the ASTs to build heterogeneous graphs with richer code semantics. Furthermore, multigranularity analysis of the vulnerability detection results is performed, including coarse-grained contract-level vulnerability detection and fine-grained line-level vulnerability detection. Through this multigranularity detection approach, vulnerabilities in contracts can be identified and analysed more comprehensively, providing a richer perspective and more solutions for vulnerability detection. The experimental results show that the proposed multigranularity vulnerability detection method based on heterogeneous graphs (MVD-HG) improves both the accuracy and the range of detected vulnerability types in contract-level vulnerability detection tasks; moreover, in the line-level vulnerability detection task, the MVD-HG model achieves significant results and addresses the shortcomings of existing methods. In addition, based on code generation methods used in related fields, a data enhancement method based on the source code is developed, which effectively expands the experimental dataset to address the reduced credibility of the results due to insufficient amounts of data.
Funding: This work was supported by the National Key R&D Program of China (2023YFB3106800) and the National Natural Science Foundation of China (Grant No. 62072051). We are overwhelmed in all humbleness and gratefulness to acknowledge our debt to all those who have helped us to put these ideas.
Abstract: By analyzing vulnerabilities in Android native system services, we find that some vulnerabilities are caused by inconsistent data transmission and inconsistent data processing logic between client and server. Existing research cannot find these two types of vulnerabilities, and the corresponding test cases suffer from low coverage. In this paper, we propose a test case extraction method based on the client side of the native system services and design a case construction method that supports multi-parameter mutation based on a genetic algorithm and a priority strategy. Based on this method, we implement a detection tool, BArcherFuzzer, to detect vulnerabilities in Android native system services. The experimental results show that BArcherFuzzer found four vulnerabilities among hundreds of exception messages; all of them were confirmed by Google, and one was assigned a Common Vulnerabilities and Exposures (CVE) number (CVE-2020-0363).
Abstract: The most resource-intensive and laborious part of debugging is finding the exact location of the fault among a large number of code snippets. Plenty of machine intelligence models have offered effective localization of defects. Some models can precisely locate the fault with more than 95% accuracy, resulting in demand for trustworthy models in fault localization. Confidence and trustworthiness within machine-intelligence-based software models can only be achieved via explainable artificial intelligence in Fault Localization (XFL). The current study presents a model for generating counterfactual interpretations for the fault localization model's decisions. Neural system approximations and distributed representation of input information may be achieved by building a nonlinear neural network model, which demonstrates a high level of proficiency in transfer learning, even with minimal training data. The proposed XFL would make the decision-making transparent without impacting the model's performance. The proposed XFL ranks the software program statements based on the possible vulnerability score approximated from the training data. The model's performance is further evaluated using various metrics such as the number of assessed statements, the confidence level of fault localization, and Top-N evaluation strategies.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60574087); the Hi-Tech Research and Development Program of China (Nos. 2007AA01Z475, 2007AA01Z480, 2007AA01Z464); and the 111 International Collaboration Program of China.
Abstract: SQL injection poses a major threat to the application-level security of databases, and there is no systematic solution to these attacks. Unlike traditional run-time security strategies such as IDS and firewalls, this paper focuses on a solution at the outset; it presents a method to find vulnerabilities by analyzing the source code. The concept of a validated tree is developed to track variables referenced by database operations in scripts. By checking whether these variables are influenced by outside inputs, the database operations are shown to be secure or not. This method has the advantages of high accuracy and efficiency as well as low cost, and it applies to any type of web application platform. It is implemented in the software code vulnerabilities of SQL injection detector (CVSID). The validity and efficiency are demonstrated with an example.
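The check described in this abstract, whether a variable referenced by a database operation is influenced by outside input, can be approximated by a small dependency walk. This is a simplified sketch and not the paper's validated tree or the CVSID tool: the dependency map, the variable names, and the function `is_tainted` are hypothetical, and sanitization is modeled as a plain set of variables assumed to be validated.

```python
def is_tainted(var, assignments, external_inputs, sanitized, _seen=None):
    """Return True if var transitively depends on unvalidated outside input.

    assignments maps each variable to the variables it is computed from.
    """
    seen = _seen if _seen is not None else set()
    if var in sanitized:
        return False  # validated values cut the dependency chain
    if var in external_inputs:
        return True
    if var in seen:
        return False  # already visited; avoids infinite loops on cycles
    seen.add(var)
    return any(
        is_tainted(dep, assignments, external_inputs, sanitized, seen)
        for dep in assignments.get(var, ())
    )


# Hypothetical script: two queries built from request parameters.
deps = {
    "query": ["prefix", "name"],
    "name": ["raw_name"],
    "safe_query": ["prefix", "clean_id"],
    "clean_id": ["raw_id"],
}
outside = {"raw_name", "raw_id"}
validated = {"clean_id"}

print(is_tainted("query", deps, outside, validated))
print(is_tainted("safe_query", deps, outside, validated))
```

Here `query` is flagged because it reaches `raw_name` without validation, while `safe_query` is not, because its only path to outside input passes through the validated `clean_id`.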