Code obfuscation is a crucial technique for protecting software against reverse engineering and security attacks. Among various obfuscation methods, opaque predicates, which are recognized as flexible and promising, are widely used to increase control-flow complexity. However, traditional opaque predicates are increasingly vulnerable to Dynamic Symbolic Execution (DSE) attacks, which can efficiently identify and eliminate them. To address this issue, this paper proposes a novel approach for anti-DSE opaque predicates that effectively resists symbolic execution-based deobfuscation. Our method introduces two key techniques: single-way function opaque predicates, which leverage hash functions and logarithmic transformations to prevent constraint solvers from generating feasible inputs, and path-explosion opaque predicates, which generate an excessive number of execution paths, overwhelming symbolic execution engines. To evaluate the effectiveness of our approach, we implemented a prototype obfuscation tool and tested it against prominent symbolic execution engines. Experimental results demonstrate that our approach significantly increases resilience against symbolic execution attacks while maintaining acceptable performance overhead. This paper provides a robust and scalable obfuscation technique, contributing to the enhancement of software protection strategies in adversarial environments.
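To make the idea of an opaque predicate concrete, the following is a minimal sketch and not the construction used in the paper. It contrasts a classic always-true predicate with a variant that first passes the input through SHA-256, so that a solver would have to reason about the hash to prove the branch is constant. The function names (`plain_opaque`, `hashed_opaque`, `guarded`) are hypothetical, and the choice of SHA-256 is an assumption made only for illustration.

```python
import hashlib


def plain_opaque(x: int) -> bool:
    # x*x + x == x*(x+1) is a product of consecutive integers, so it is
    # always even. The branch is constant, but easy for a solver to prove.
    return (x * x + x) % 2 == 0


def hashed_opaque(x: int) -> bool:
    # Illustrative assumption: route the input through SHA-256 before the
    # same always-true test. The outcome is still constant, yet a symbolic
    # executor has to model the hash to establish that.
    digest = hashlib.sha256(str(x).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    return (h * h + h) % 2 == 0


def guarded(x: int) -> str:
    # The "decoy" branch is dead code that only inflates the control flow.
    if hashed_opaque(x):
        return "real"
    return "decoy"
```

In this sketch the obfuscation value comes entirely from how hard the predicate is to analyse, not from its runtime behaviour, which is the same for every input.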
Since the advent of smart contracts, security vulnerabilities have remained a persistent challenge, compromising both the reliability of contract execution and the overall stability of the virtual currency market. Consequently, the academic community has devoted increasing attention to these security risks. However, conventional approaches to vulnerability detection frequently exhibit limited accuracy. To address this limitation, the present study introduces a novel vulnerability detection framework called GNNSE that integrates symbolic execution with graph neural networks (GNNs). The proposed method first constructs semantic graphs to comprehensively capture the control flow and data flow dependencies within smart contracts. These graphs are subsequently processed using GNNs to efficiently identify contracts with a high likelihood of vulnerabilities. For these high-risk contracts, symbolic execution is employed to perform fine-grained, path-level analysis, thereby improving overall detection precision. Experimental results on a dataset comprising 10,079 contracts demonstrate that the proposed method achieves detection precisions of 93.58% for reentrancy vulnerabilities and 92.73% for timestamp-dependent vulnerabilities.
Symbolic execution is widely used in many code analysis, testing, and verification tools. Because symbolic execution exhaustively explores all feasible paths, it is quite time consuming. To handle the problem, researchers have parallelized existing symbolic execution tools (e.g., KLEE). In particular, Cloud9 is a widely used parallel symbolic execution tool, and researchers have used it to analyze real code. However, researchers criticize that tools such as Cloud9 still cannot analyze large-scale code. In this paper, we conduct a field study on Cloud9, in which we use KLEE and Cloud9 to analyze benchmarks in C. Our results confirm the criticism. Based on the results, we identify three bottlenecks that hinder the performance of Cloud9: the communication time gap, the job transfer policy, and the cache management of the solved constraints. To handle these problems, we tune the communication time gap with better parameters, modify the job transfer policy, and implement an approach for cache management of solved constraints. We conduct two evaluations on our benchmarks and a real application to understand our improvements. Our results show that our tuned Cloud9 reduces the execution time significantly, both on our benchmarks and on the real application. Furthermore, our evaluation results show that our tuning techniques improve the effectiveness on all the devices, and the improvement can reach up to five times, depending on a tuning value of our approach and the behaviour of the program under test.
Software security analysts typically only have access to the executable program and cannot directly access its source code. This poses significant challenges to security analysis. While it is crucial to identify vulnerabilities in such programs without source code, only a limited set of generalized tools exists, due to the low versatility of current vulnerability mining methods, and these tools suffer from some shortcomings. In terms of targeted fuzzing, the path searching for target points is not streamlined enough, and completely random testing leads to an excessively large search space. Additionally, in code similarity analysis, code feature extraction is incomplete, which may result in information loss. In this paper, we propose a cross-platform and cross-architecture approach to exploit vulnerabilities using neural network obfuscation techniques. By leveraging the Angr framework, a deobfuscation technique is introduced, along with the adoption of a VEX-IR-based intermediate language conversion method. This combination allows for the unified handling of binary programs across various architectures, compilers, and compilation options. Subsequently, binary programs are processed to extract multi-level spatial features using a combination of a skip-gram model with a self-attention mechanism and a bidirectional Long Short-Term Memory (LSTM) network. Finally, the graph embedding network is utilized to evaluate the similarity of program functionalities. Based on these similarity scores, a target function is determined, and symbolic execution is applied to solve the target function. The solved content serves as the initial seed for targeted fuzzing. In summary, the binary program is processed using the deobfuscation technique and the intermediate language transformation method, the similarity of program functions is then evaluated using a graph embedding network, and symbolic execution is performed based on these similarity scores. This approach facilitates cross-architecture analysis of executable programs without their source code and concurrently reduces the risk of symbolic execution path explosion.
Symbolic execution is an effective way of systematically exploring the search space of a program, and is often used for automatic software testing and bug finding. The program to be analyzed is usually compiled into a binary or an intermediate representation, on which symbolic execution is carried out. During this process, compiler optimizations influence the effectiveness and efficiency of symbolic execution. However, to the best of our knowledge, there exists no work on compiler optimization recommendation for symbolic execution with respect to (w.r.t.) modified condition/decision coverage (MC/DC), which is an important testing coverage criterion widely used for mission-critical software. This study describes our use of a state-of-the-art symbolic execution tool to carry out extensive experiments to study the impact of compiler optimizations on symbolic execution w.r.t. MC/DC. The results indicate that instruction combining (IC) optimization is the important and dominant optimization for symbolic execution w.r.t. MC/DC. We designed and implemented a support vector machine based optimization recommendation method w.r.t. IC (denoted as auto). The experiments on two standard benchmarks (Coreutils and NECLA) showed that auto achieves the best MC/DC on 67.47% of Coreutils programs and 78.26% of NECLA programs.
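For readers unfamiliar with MC/DC, the sketch below shows what the criterion asks of a test set, using the common unique-cause interpretation: each condition needs a pair of tests that differ only in that condition and produce different decision outcomes. It is an illustration only and is unrelated to the tool or the recommendation method in the study; the helper `mcdc_covered` and the example decision are hypothetical.

```python
from itertools import combinations


def mcdc_covered(decision, tests):
    """Return the indices of conditions shown to independently affect the outcome.

    A condition counts as covered when two tests differ only in that
    condition and the decision evaluates differently for them.
    """
    covered = set()
    for t1, t2 in combinations(tests, 2):
        diff = [i for i, (p, q) in enumerate(zip(t1, t2)) if p != q]
        if len(diff) == 1 and decision(*t1) != decision(*t2):
            covered.add(diff[0])
    return covered


def decision(a, b, c):
    # Example decision with three conditions.
    return a and (b or c)


# Four tests (number of conditions + 1) that cover all three conditions.
tests = [
    (True, True, False),
    (False, True, False),
    (True, False, False),
    (True, False, True),
]
full = mcdc_covered(decision, tests)
partial = mcdc_covered(decision, tests[:3])
```

Here `full` contains all three condition indices, whereas dropping the last test leaves condition `c` without an independence pair.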
Constraint-based program analysis is widely used in program validation, program vulnerability analysis, and similar tasks. This paper proposes a temporal correlation function to protect programs from such analysis. The temporal correlation function can be applied to resist both static and dynamic function summaries as well as concolic testing. Moreover, the temporal correlation function can produce different outputs even for the same input. This feature can be used to undermine the premise of function summaries and to prevent the concolic testing process from running a new branch with a new input. Experimental results show that this method can reduce the efficiency and path coverage of concolic testing, while greatly increasing the difficulty of constraint-based program analysis.
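The property that the same input can yield different outputs can be illustrated with a small stateful function. This is only a toy sketch under stated assumptions: the abstract does not describe how the temporal correlation function is built, so the class `TemporalFunction` and the call-counter mechanism are hypothetical stand-ins.

```python
class TemporalFunction:
    """Toy function whose result depends on how many times it has been called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: int) -> int:
        # Hidden state changes on every call, so f(x) is not a pure
        # function of x and cannot be summarized as an input/output table.
        self.calls += 1
        return x + self.calls


def summarize(fn, inputs):
    # A naive "function summary": record one output per input.
    return {x: fn(x) for x in inputs}


f = TemporalFunction()
summary = summarize(f, [5])
replayed = f(5)  # same input, later call
```

In this sketch the recorded summary no longer predicts the later call, which is the kind of mismatch that would invalidate a summary-based or replay-based analysis.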
As software complexity increases, the security threats faced by software are also growing day by day, so more and more attention is being paid to the mining of software vulnerabilities. Because source code has rich semantics and is highly comprehensible, source code vulnerability mining has been widely used and has developed significantly. However, due to the protection of commercial interests and intellectual property rights, source code is difficult to obtain. Therefore, research on vulnerability mining technology for binary code has strong practical value. Based on an investigation of related technologies, this article first introduces the current typical binary vulnerability analysis frameworks, and then briefly introduces the research background and significance of intermediate languages. With the rise of artificial intelligence, a large number of machine learning methods have been applied to the problem of binary vulnerability mining. This article divides current binary vulnerability mining technology into traditional mining technology and machine learning mining technology, introduces the basic principles, research status, and existing problems of each, and briefly summarizes them. Finally, based on existing research, this article discusses the prospects for future research on binary program vulnerability mining.
Debugging software code has been a challenge for software developers since the early days of computer programming. The need is simple: the world is run by software, so perhaps the biggest engineering challenge is finding ways to make software more reliable. This review provides an overview of techniques developed over time in the field of software model checking to solve the problem of detecting errors in program code. In addition, the challenges posed by this technology are discussed, and ways to mitigate them in future research and applications are proposed. A comprehensive examination of the various model verification methods used to detect program code errors is intended to lay the foundation for future research in this area.
An application programming interface (API) usage specification, which includes the conditions, calling sequences, and semantic relationships of the API, is important for verifying its correct usage, which is in turn critical for ensuring the security and availability of the target program. However, existing techniques either mine the co-occurring relationships of multiple APIs without considering their semantic relationships, or use data flow and control flow information to extract semantic beliefs on API pairs, which is difficult to apply when mining specifications for multiple APIs. Hence, we propose an API specification mining approach that efficiently extracts a relatively complete list of the API combinations and semantic relationships between APIs. This approach analyzes a target program in two stages. The first stage uses frequent API set mining based on frequent common API identification and filtration to extract the maximal set of frequent context-sensitive API sequences. In the second stage, the API relationship graph is constructed using three semantic relationships extracted from the symbolic path information, and the specifications containing semantic relationships for multiple APIs are mined. The experimental results on six popular open-source code bases of different scales show that the proposed two-stage approach not only yields better results than existing typical approaches, but can also effectively discover the specifications along with the semantic relationships for multiple APIs. Instance analysis shows that the analysis of security-related API call violations can assist in the cause analysis and patching of software vulnerabilities.
Memory leaks are a common type of defect that is hard to detect manually. Existing memory leak detection tools suffer from a lack of precise interprocedural analysis and path-sensitivity. To address this problem, we present a static interprocedural analysis algorithm that performs fully path-sensitive analysis and captures precise function behaviors to detect memory leaks in C programs. The proposed algorithm uses path-sensitive symbolic execution to track memory actions in different program paths guarded by path conditions. A novel analysis model called the memory state transition graph (MSTG) is proposed to describe the tracking process and its results. To perform interprocedural analysis, the proposed algorithm generates a summary for each procedure from the MSTG and applies the summary at the procedure's call sites. A prototype tool called Melton implements this algorithm. Melton was applied to five open source C programs and found 41 leaks. More than 90% of these leaks were subsequently confirmed and fixed by their maintainers. For comparison with other tools, Melton was also applied to some programs in the Standard Performance Evaluation Corporation (SPEC) CPU 2000 benchmark suite and detected more leaks than the state-of-the-art approaches.
To enhance training in software development, we argue that students of software engineering should be exposed to software development activities early in the curriculum. This entails meeting the challenge of engaging students in software development before they take the software engineering course. In this paper, we propose a method to connect courses in the software engineering curriculum by setting comprehensive development projects to students in prerequisite courses for software development. Using the Discrete Mathematics (DM) course as an example, we describe the implementation of the proposed method and teaching practices using several practical and comprehensive projects derived from topics in discrete mathematics. Detailed descriptions of the sample projects, their application, and training results are given. Results and lessons learned from applying these practices show that it is a promising way to connect courses in the software engineering curriculum.
Static buffer overflow detection techniques tend to report too many false positives, fundamentally due to the lack of software execution information. It is very time consuming to manually inspect all the static warnings. In this paper, we propose BovInspector, a framework for automatically validating static buffer overflow warnings and providing suggestions for automatic repair of true buffer overflow warnings for C programs. Given the program source code and the static buffer overflow warnings, BovInspector first performs warning reachability analysis. Then, BovInspector executes the source code symbolically under the guidance of reachable warnings. Each reachable warning is validated and classified by checking whether all the path conditions and the buffer overflow constraints can be satisfied simultaneously. For each validated true warning, BovInspector provides suggestions to automatically repair it with 11 repair strategies. BovInspector is complementary to prior static buffer overflow discovery schemes. Experimental results on real open source programs show that BovInspector can automatically validate on average 60% of total warnings reported by static tools.
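The validation step described above, checking whether the path conditions and the overflow constraint can hold at the same time, can be sketched as follows. This is a toy stand-in and not BovInspector's implementation: a brute-force search over a small integer domain replaces a real constraint solver, and the names `validate_warning`, `BUF_SIZE`, `unchecked`, and `checked` are hypothetical.

```python
def validate_warning(path_conditions, overflow_constraint, domain):
    """Return a witness input that reaches the access and overflows, or None.

    Assumption: a bounded brute-force search stands in for an SMT solver.
    """
    for value in domain:
        if all(cond(value) for cond in path_conditions) and overflow_constraint(value):
            return value
    return None


BUF_SIZE = 8


def overflow(i):
    # Overflow constraint for an access buf[i] on a buffer of BUF_SIZE cells.
    return i >= BUF_SIZE


# Path reaching the access with only a loose bound on i: a true warning.
unchecked = [lambda i: i >= 0, lambda i: i < 16]
# Path guarded by a proper bounds check: a false positive.
checked = [lambda i: i >= 0, lambda i: i < BUF_SIZE]

witness = validate_warning(unchecked, overflow, range(-32, 32))
no_witness = validate_warning(checked, overflow, range(-32, 32))
```

A non-`None` witness corresponds to a warning classified as true, while an unsatisfiable combination within the searched domain corresponds to a warning that cannot be triggered along that path.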
Exploitability assessment of vulnerabilities is important for both defenders and attackers. The ultimate way to assess exploitability is to craft a working exploit. However, this usually takes many hours and significant manual effort. To address this issue, automated techniques can be adopted. Existing solutions usually explore in depth the crashing paths, i.e., paths taken by proof-of-concept (PoC) inputs triggering vulnerabilities, and assess exploitability by finding exploitable states along the paths. However, exploitable states do not always exist in crashing paths. Moreover, existing solutions rely heavily on symbolic execution and are not scalable in path exploration and exploit generation. In this paper, we propose a novel solution to generate exploits for userspace programs or facilitate the process of crafting a kernel use-after-free (UAF) exploit. Technically, we utilize oriented fuzzing to explore diverging paths from the vulnerability point. For userspace programs, we adopt a control-flow stitching solution to stitch crashing paths and diverging paths together to generate an exploit. For kernel UAF, we leverage lightweight symbolic execution to identify, analyze, and evaluate the system calls that are valuable and useful for exploiting vulnerabilities. We have developed a prototype system and evaluated it on a set of 19 CTF (capture the flag) programs and 15 real-world Linux kernel UAF vulnerabilities. Experimental results showed that it could generate exploits for most of the userspace test set, and that it could also facilitate security mitigation bypassing and exploitability evaluation for the kernel test set.
Decompiling, as a means of analysing and understanding software, has great practical value. This paper presents a decompiling method developed by the authors, in which the techniques of library-function pattern recognition, intermediate language, symbolic execution, rule-based data type recovery, program transformation, and knowledge engineering are separately applied to different phases of decompiling. It then discusses how techniques for developing expert systems are adopted to build a decompiling system shell that is independent of the knowledge of the language and the program running environment. The shell becomes a real decompiler as soon as the new knowledge of the application environment is interactively acquired.
Automated test generation tools enable test automation and further alleviate the low efficiency caused by writing hand-crafted test cases. However, existing automated tools are not mature enough to be widely used by software testing groups. This paper conducts an empirical study on the state-of-the-art automated tools for Java, i.e., EvoSuite, Randoop, JDoop, JTeXpert, T3, and Tardis. We design a test workflow to facilitate the process, which can automatically run tools for test generation, collect data, and evaluate various metrics. Furthermore, we conduct an empirical analysis of these six tools and their related techniques from different aspects, i.e., code coverage, mutation score, test suite size, readability, and real fault detection ability. We discuss the benefits and drawbacks of hybrid techniques based on experimental results. In addition, we describe our experience in setting up and executing these tools, and summarize their usability and user-friendliness. Finally, we give some insights into automated tools in terms of test suite readability improvement, meaningful assertion generation, test suite reduction for random testing tools, and symbolic execution integration.
Funding (anti-DSE opaque predicates paper): Supported by the Open Foundation of the Key Laboratory of Cyberspace Security, Ministry of Education of China (No. KLCS20240211), and the Henan Science and Technology Major Project (No. 241110210100).
Funding (GNNSE smart contract vulnerability detection paper): Supported by the National Key Research and Development Program of China (2020YFB1005704).
Funding (compiler optimization recommendation for MC/DC paper): Project supported by the National Key R&D Program of China (No. 2017YFB1001802) and the National Natural Science Foundation of China (Nos. 61472440, 61632015, 61690203, and 61532007).
Funding: Supported by the National Natural Science Foundation of China (No. 61121061) and the National Key Technology R&D Program (Nos. 2012BAH38B02 and 2012BAH06B00).
Abstract: Constraint-based program analysis is widely used in program validation, program vulnerability analysis, etc. This paper proposes a temporal correlation function to protect programs from analysis. The temporal correlation function can be applied to resist both static and dynamic function summaries as well as concolic testing. Moreover, the temporal correlation function can produce different outputs even with the same input. This feature can be used to undermine the premise of function summaries and to prevent concolic testing from driving execution into a new branch with a new input. Experimental results show that this method can reduce the efficiency and path coverage of concolic testing, while greatly increasing the difficulty of constraint-based program analysis.
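The property that the same input can yield different outputs can be illustrated with a small stateful function. This is only a sketch of the general idea under the assumption that the output depends on hidden call history; the class name, the seed, and the linear congruential update are hypothetical and are not the construction proposed in the paper.

```python
class TemporalFunction:
    """Hypothetical stateful function: the result depends on the argument and
    on how many times the function has been called before."""

    def __init__(self, seed=7):
        self._state = seed

    def __call__(self, x):
        # Advance the hidden state on every call (linear congruential step).
        self._state = (self._state * 1103515245 + 12345) % (2 ** 31)
        return (x + self._state) % 256


f = TemporalFunction()
first = f(10)
second = f(10)  # same argument, different hidden state

# A function summary records "input 10 -> first" and reuses it later,
# which no longer matches the real behaviour on the second call.
summary = {10: first}
summary_is_stale = summary[10] != second
```

Because a summary maps inputs to outputs, and concolic testing assumes that replaying an input replays its path, both rely on the determinism that such a function removes.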
Funding: This paper is based on the funding of the following two projects: Research on Key Technologies of User Location Privacy Protection and Data Integrity Verification under Mobile P2P Architecture (Project No. 619QN193) and Research on Security Vulnerability Detection Technology of Open Source Software Based on Deep Learning (Project No. ZDYF2020212).
Abstract: As software complexity increases, the security threats faced by software also grow day by day, so the mining of software vulnerabilities is attracting more and more attention. Because source code has rich semantics and is easy to comprehend, source-code vulnerability mining has been widely used and has developed significantly. However, due to the protection of commercial interests and intellectual property rights, source code is often difficult to obtain. Therefore, research on vulnerability mining technology for binary code has strong practical value. Based on an investigation of related technologies, this article first introduces the current typical binary vulnerability analysis frameworks, and then briefly introduces the research background and significance of intermediate languages. With the rise of artificial intelligence, a large number of machine learning methods have been applied to the problem of binary vulnerability mining. This article divides current binary vulnerability mining technology into traditional mining technology and machine learning mining technology, introduces the basic principles, research status, and existing problems of each, and briefly summarizes them. Finally, based on the existing research work, this article outlines prospects for future research on binary program vulnerability mining technology.
Abstract: Debugging software code has been a challenge for software developers since the early days of computer programming. The need is simple: the world is run by software, so perhaps the biggest engineering challenge is finding ways to make software more reliable. This review provides an overview of techniques developed over time in the field of software model checking to solve the problem of detecting errors in program code. In addition, the challenges posed by this technology are discussed, and ways to mitigate them in future research and applications are proposed. A comprehensive examination of the various model checking methods used to detect program code errors is intended to lay the foundation for future research in this area.
Abstract: An application programming interface (API) usage specification, which includes the conditions, calling sequences, and semantic relationships of the API, is important for verifying its correct usage, which is in turn critical for ensuring the security and availability of the target program. However, existing techniques either mine the co-occurring relationships of multiple APIs without considering their semantic relationships, or they use data flow and control flow information to extract semantic beliefs on API pairs but are difficult to extend when mining specifications for multiple APIs. Hence, we propose an API specification mining approach that efficiently extracts a relatively complete list of the API combinations and semantic relationships between APIs. This approach analyzes a target program in two stages. The first stage uses frequent API set mining based on frequent common API identification and filtration to extract the maximal set of frequent context-sensitive API sequences. In the second stage, the API relationship graph is constructed using three semantic relationships extracted from the symbolic path information, and the specifications containing semantic relationships for multiple APIs are mined. The experimental results on six popular open-source code bases of different scales show that the proposed two-stage approach not only yields better results than existing typical approaches, but also can effectively discover the specifications along with the semantic relationships for multiple APIs. Instance analysis shows that the analysis of security-related API call violations can assist in the cause analysis and patching of software vulnerabilities.
Funding: This work was partially supported by the 973 Program of China (2014CB340701) and the National Natural Science Foundation of China (Grant No. 61003026).
Abstract: Memory leaks are a common type of defect that is hard to detect manually. Existing memory leak detection tools suffer from a lack of precise interprocedural analysis and path-sensitivity. To address this problem, we present a static interprocedural analysis algorithm that performs fully path-sensitive analysis and captures precise function behaviors to detect memory leaks in C programs. The proposed algorithm uses path-sensitive symbolic execution to track memory actions in different program paths guarded by path conditions. A novel analysis model called the memory state transition graph (MSTG) is proposed to describe the tracking process and its results. To perform interprocedural analysis, the proposed algorithm generates a summary for each procedure from the MSTG and applies the summary at the procedure's call sites. A prototype tool called Melton implements the algorithm. Melton was applied to five open source C programs and found 41 leaks. More than 90% of these leaks were subsequently confirmed and fixed by their maintainers. For comparison with other tools, Melton was also applied to some programs in the Standard Performance Evaluation Corporation (SPEC) CPU 2000 benchmark suite and detected more leaks than the state-of-the-art approaches.
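Tracking memory actions along paths guarded by path conditions can be sketched on a toy program representation. The tuple-based statement format, the `explore` helper, and the two-value memory state are assumptions made for illustration; this is not the MSTG model or the summary mechanism of Melton.

```python
def explore(stmts, state=None, cond=()):
    """Yield (path_condition, final_memory_state) for every path through stmts."""
    state = dict(state or {})
    for i, stmt in enumerate(stmts):
        op = stmt[0]
        if op == "malloc":
            state[stmt[1]] = "allocated"
        elif op == "free":
            state[stmt[1]] = "freed"
        elif op == "if":
            # Fork: each branch continues with the remaining statements and
            # extends the path condition with the branch predicate.
            _, predicate, then_branch, else_branch = stmt
            rest = stmts[i + 1:]
            yield from explore(then_branch + rest, state, cond + (predicate,))
            yield from explore(else_branch + rest, state, cond + ("!" + predicate,))
            return
        elif op == "return":
            break
    yield cond, state


# Toy model of: p = malloc(...); if (err) return; free(p);
program = [
    ("malloc", "p"),
    ("if", "err", [("return",)], []),
    ("free", "p"),
]

leaks = [
    (cond, var)
    for cond, final_state in explore(program)
    for var, status in final_state.items()
    if status == "allocated"
]
```

Here `leaks` names the pointer that is still allocated together with the path condition under which it leaks, which is the kind of path-specific report that a path-insensitive analysis cannot give.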
Funding: Supported in part by the National Key R&D Program of China (No. 2018YFB1004202).
Abstract: To enhance training in software development, we argue that students of software engineering should be exposed to software development activities early in the curriculum. This entails meeting the challenge of engaging students in software development before they take the software engineering course. In this paper, we propose a method to connect courses in the software engineering curriculum by assigning comprehensive development projects to students in prerequisite courses for software development. Using the Discrete Mathematics (DM) course as an example, we describe the implementation of the proposed method and teaching practices using several practical and comprehensive projects derived from topics in discrete mathematics. Detailed descriptions of the sample projects, their application, and training results are given. Results and lessons learned from applying these practices show that it is a promising way to connect courses in the software engineering curriculum.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 62032010 and partially by the Postgraduate Research and Practice Innovation Program of Jiangsu Province of China.
Abstract: Static buffer overflow detection techniques tend to report too many false positives, fundamentally due to the lack of software execution information. It is very time-consuming to manually inspect all the static warnings. In this paper, we propose BovInspector, a framework for automatically validating static buffer overflow warnings and providing suggestions for automatic repair of true buffer overflow warnings for C programs. Given the program source code and the static buffer overflow warnings, BovInspector first performs warning reachability analysis. Then, BovInspector executes the source code symbolically under the guidance of reachable warnings. Each reachable warning is validated and classified by checking whether all the path conditions and the buffer overflow constraints can be satisfied simultaneously. For each validated true warning, BovInspector provides suggestions to automatically repair it with 11 repair strategies. BovInspector is complementary to prior static buffer overflow discovery schemes. Experimental results on real open source programs show that BovInspector can automatically validate on average 60% of total warnings reported by static tools.
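The validation step, checking whether the path conditions and the buffer overflow constraint can hold at the same time, can be sketched as follows. To stay dependency-free, the sketch replaces a constraint solver with an exhaustive search over a small integer range; the buffer size, the guards, and the helper name are hypothetical and do not reflect BovInspector's implementation.

```python
def find_witness(path_conditions, overflow_constraint, domain):
    """Return an input that satisfies every path condition and the overflow
    constraint, or None if no value in the searched domain does."""
    for n in domain:
        if all(pc(n) for pc in path_conditions) and overflow_constraint(n):
            return n
    return None


BUF_SIZE = 16  # hypothetical: char buf[16]; buf[n] = 0;


def overflow(n):
    """The write buf[n] is out of bounds."""
    return n < 0 or n >= BUF_SIZE


search_space = range(-64, 65)

# Warning A: the guard is "0 <= n && n <= 16" (off by one), so n == 16 overflows.
true_warning = find_witness([lambda n: n >= 0, lambda n: n <= BUF_SIZE],
                            overflow, search_space)

# Warning B: the guard is "0 <= n && n < 16", so the constraints conflict.
false_warning = find_witness([lambda n: n >= 0, lambda n: n < BUF_SIZE],
                             overflow, search_space)
```

A witness classifies the warning as a true positive, whereas an unsatisfiable combination marks it as a false positive on that path. A real tool would hand the same conjunction to a solver instead of enumerating values.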
Funding: This work is supported by the Key Laboratory of Network Assessment Technology, Chinese Academy of Sciences, and the Beijing Key Laboratory of Network Security and Protection Technology, as well as the Beijing Municipal Science and Technology Project (No. Z181100002718002), the National Natural Science Foundation of China (Nos. 61572481, 61602470, 61772308, 61472209, 61502536, and U1736209), and the Young Elite Scientists Sponsorship Program by CAST (No. 2016QNRC001).
Abstract: Exploitability assessment of vulnerabilities is important for both defenders and attackers. The ultimate way to assess exploitability is to craft a working exploit. However, this usually takes many hours and significant manual effort. To address this issue, automated techniques can be adopted. Existing solutions usually explore in depth the crashing paths, i.e., paths taken by proof-of-concept (PoC) inputs triggering vulnerabilities, and assess exploitability by finding exploitable states along the paths. However, exploitable states do not always exist in crashing paths. Moreover, existing solutions rely heavily on symbolic execution and are not scalable in path exploration and exploit generation. In this paper, we propose a novel solution to generate exploits for userspace programs or facilitate the process of crafting a kernel use-after-free (UAF) exploit. Technically, we utilize oriented fuzzing to explore diverging paths from the vulnerability point. For userspace programs, we adopt a control-flow stitching solution to stitch crashing paths and diverging paths together to generate an exploit. For kernel UAF, we leverage lightweight symbolic execution to identify, analyze, and evaluate the system calls that are valuable and useful for exploiting vulnerabilities. We have developed a prototype system and evaluated it on a set of 19 CTF (capture the flag) programs and 15 real-world Linux kernel UAF vulnerabilities. Experimental results showed that it could generate exploits for most of the userspace test set, and that it could also facilitate security mitigation bypassing and exploitability evaluation for the kernel test set.
Abstract: Decompiling, as a means of analysing and understanding software, has great practical value. This paper presents a decompiling method proposed by the authors, in which the techniques of library-function pattern recognition, intermediate language, symbolic execution, rule-based data type recovery, program transformation, and knowledge engineering are separately applied to different phases of decompiling. It then discusses how the techniques of developing expert systems are adopted to build a decompiling system shell independent of the knowledge of language and program running environment. The shell becomes a real decompiler once new knowledge of the application environment is interactively acquired.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62072225 and 62025202.
Abstract: Automated test generation tools enable test automation and further alleviate the low efficiency caused by writing hand-crafted test cases. However, existing automated tools are not mature enough to be widely used by software testing groups. This paper conducts an empirical study on the state-of-the-art automated tools for Java, i.e., EvoSuite, Randoop, JDoop, JTeXpert, T3, and Tardis. We design a test workflow to facilitate the process, which can automatically run tools for test generation, collect data, and evaluate various metrics. Furthermore, we conduct empirical analysis on these six tools and their related techniques from different aspects, i.e., code coverage, mutation score, test suite size, readability, and real fault detection ability. We discuss the benefits and drawbacks of hybrid techniques based on experimental results. In addition, we describe our experience in setting up and executing these tools, and summarize their usability and user-friendliness. Finally, we give some insights into automated tools in terms of test suite readability improvement, meaningful assertion generation, test suite reduction for random testing tools, and symbolic execution integration.