Funding: This work was supported in part by the National Natural Science Foundation of China (No. 61632020, UI936209) and the Beijing National Science Foundation (No. 4192067).
Abstract: The advent of CUDA-enabled GPUs makes it possible to provide cloud applications with high-performance data security services. Unfortunately, recent studies have shown that GPU-based applications are also susceptible to side-channel attacks. These published works studied the side-channel vulnerabilities of GPU-based AES implementations by taking advantage of cache sharing among multiple threads or the high parallelism of GPUs. Therefore, for GPU-based bitsliced cryptographic implementations, which are immune to the cache-based attacks referred to above, only a power analysis method based on the high parallelism of GPUs may be effective. However, the leakage model used in that power analysis is not efficient in practice. In light of this, we investigate the electromagnetic (EM) side-channel vulnerabilities of a GPU-based bitsliced AES implementation from the perspectives of bit-level parallelism and thread-level parallelism, in order to make the most of the localization effect of EM leakage under parallelism. Specifically, we propose efficient multi-bit and multi-thread combinational analysis techniques based on the intrinsic properties of bitsliced ciphers and the effect of multi-thread parallelism of GPUs, respectively. The experimental results show that the proposed combinational analysis methods perform better than non-combinational and intuitive ones. Our research suggests that multi-thread leakages can be used to improve attacks if the multi-thread leakages are not synchronous in the time domain.
Funding: Project supported by the National Key Technology R&D Program of China (No. 2016YFB0200401).
Abstract: High-performance computing (HPC) is essential for both traditional and emerging scientific fields, enabling scientific activities to make progress. With the development of high-performance computing, it is foreseeable that exascale computing will be put into practice around 2020. As Moore's law approaches its limit, high-performance computing will face severe challenges when moving from exascale to zettascale, making the next 10 years after 2020 a vital period to develop key HPC techniques. In this study, we discuss the challenges of enabling zettascale computing with respect to both hardware and software. We then present a perspective of future HPC technology evolution and revolution, leading to our main recommendations in support of zettascale computing in the coming future.
Funding: Partially supported by the National Natural Science Foundation of China (Grant No. 62102389).
Abstract: The performance of high-performance computing (HPC) and other real-world applications is becoming unpredictable as the micro-architecture of the modern central processing unit (CPU) grows more and more complex. As a consequence, predicting the execution time of a code snippet is notoriously difficult. A basic block throughput predictor is a crucial feature of a static code analyzer. It offers a ubiquitous method for predicting the execution time of a basic block. In this article, we build a workflow to faithfully run, collect, and analyze basic blocks from real-world applications. Several static code analyzers are introduced, compared, and optimized to show which one performs better on accuracy and other metrics on a Kunpeng 920 processor. Through extensive experiments, we achieve a state-of-the-art 86.7% accuracy in predicting the throughput of all basic blocks. Moreover, we showcase the potential applications of our optimized static code analyzer in two aspects: (1) guiding an application's optimization through bottleneck analysis, and (2) exploiting the potential bottleneck of a CPU on a certain workload through fast hardware pre-evaluation.