Understanding source code and its regular expressions is very difficult when they are written in an unfamiliar language. Pseudo-code explains and describes the content of the code without relying on the syntax or technologies of a particular programming language. However, writing pseudo-code for every code instruction is laborious. Recently, neural machine translation has been used to generate textual descriptions of source code. In this paper, a novel deep learning-based transformer (DLBT) model is proposed for automatic pseudo-code generation from source code. The proposed model uses deep learning based on Neural Machine Translation (NMT) to work as a language translator. The DLBT is based on the transformer, which is an encoder-decoder structure. It has three major components: tokenizer and embeddings, transformer, and post-processing. Each code line is tokenized into a dense vector. The transformer then captures the relatedness between the source code and the matching pseudo-code without the need for a Recurrent Neural Network (RNN). In the post-processing step, the generated pseudo-code is optimized. The proposed model is assessed on a real Python dataset containing more than 18,800 lines of Python source code. The experiments show promising performance compared with other machine translation methods such as the Recurrent Neural Network (RNN). The proposed DLBT achieves an accuracy of 47.32 and a BLEU score of 68.49.
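The abstract does not say how the DLBT tokenizer works, so the following is only a minimal sketch of the general idea of turning one line of Python into tokens and integer ids that an embedding layer could later map to dense vectors. It assumes Python's standard `tokenize` module as the lexer; the helper names `tokenize_code_line`, `build_vocab`, and `encode` are hypothetical and not taken from the paper.

```python
import io
import tokenize
from typing import Dict, Iterable, List

# Token types that carry no lexical content for a single line.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
         tokenize.INDENT, tokenize.DEDENT}


def tokenize_code_line(line: str) -> List[str]:
    """Split one line of Python source into its lexical token strings.

    Assumption: the line is lexically complete (for example, brackets are
    balanced); otherwise the standard tokenizer raises an error.
    """
    reader = io.StringIO(line.strip() + "\n").readline
    return [tok.string
            for tok in tokenize.generate_tokens(reader)
            if tok.type not in _SKIP]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in order of first use."""
    vocab = {"<pad>": 0, "<unk>": 1}  # hypothetical special tokens
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; unseen tokens fall back to the <unk> id."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


if __name__ == "__main__":
    tokens = tokenize_code_line("x = foo(1, 2)")
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab))
```

In a full model, the id sequence would index a learned embedding table to produce the dense vectors that the abstract mentions; that step is omitted here because the paper's embedding configuration is not given.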
Code similarity analysis has become more popular because of its significant applications, including vulnerability detection, malware detection, and patch analysis. Since the source code of software is difficult to obtain under most circumstances, binary-level code similarity analysis (BCSA) has received much attention. In recent years, many BCSA studies incorporating AI techniques have focused on deriving semantic information from binary functions, using code representations such as assembly code, intermediate representations, and control flow graphs to measure similarity. However, because of the impact of different compilers, architectures, and obfuscations, binaries compiled from the same source code may vary considerably, which is the major obstacle to obtaining robust features in these works. In this paper, we propose a solution named UPPC (Unleashing the Power of Pseudo-code), which takes the pseudo-code of a binary function as input to address the binary code similarity analysis challenge, since pseudo-code has a higher level of abstraction than binary instructions and is platform-independent. UPPC selectively inlines functions to capture the full function semantics across different compiler optimization levels and uses a deep pyramidal convolutional neural network to obtain the semantic embedding of the function. We evaluated UPPC on a data set containing vulnerabilities and on a data set covering different architectures (X86, ARM), different optimization options (O0-O3), different compilers (GCC, Clang), and four obfuscation strategies. The experimental results show that the accuracy of UPPC in function search is 33.2% higher than that of existing methods.
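To make the function-search setting concrete, the sketch below ranks candidate functions by the similarity of their embeddings to a query embedding. The abstract does not state which similarity measure UPPC uses, so cosine similarity is an assumption here, and the names `cosine_similarity` and `rank_candidates` as well as the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; real function embeddings would come from a trained model.
    query = [1.0, 0.0]
    candidates = {"func_a": [1.0, 0.0], "func_b": [0.0, 1.0], "func_c": [1.0, 1.0]}
    for name, score in rank_candidates(query, candidates):
        print(name, round(score, 3))
```

Under this setup, a search is counted as correct when the function compiled from the same source as the query appears at (or near) the top of the ranked list.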
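The signal-in-space (SIS) quality of GNSS signals directly affects the accuracy of the positioning, velocity, and timing (PVT) services available to users. However, because authorized signals are kept confidential, their pseudo-code sequences are unknown, which makes it difficult to assess the quality of authorized signals in satellite navigation systems. This paper analyzes the coherent adaptive subcarrier modulation (CASM) signal at the L1 frequency of the GPS BIIF-5 satellite. Matched-filter theory is used to recover, from the collected data, the pseudo-code symbols of the two authorized signal components, the P(Y) code and the M code, and maximum-likelihood estimation combined with the distribution characteristics of the signal is used to accurately solve for the power allocation among the signal components. The analysis focuses on the correlation performance of the P(Y)-code and M-code signals, including correlation curves, correlation loss, and S-curve zero-crossing bias (S-curve bias), and quantitatively evaluates the signal-in-space quality of the authorized signals. A complete method for assessing the quality of authorized signals at the GPS L1 frequency is proposed, and the results can serve as a reference for assessing authorized-signal quality in other satellite navigation systems.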