期刊文献+
共找到4篇文章
< 1 >
每页显示 20 50 100
Parallelized Hashing via <i>j</i>-Lanes and <i>j</i>-Pointers Tree Modes, with Applications to SHA-256
1
作者 Shay Gueron 《Journal of Information Security》 2014年第3期91-113,共23页
j-lanes tree hashing is a tree mode that splits an input message into?j?slices, computes?j?independent digests of each slice, and outputs the hash value of their concatenation.?j-pointers tree hashing is a similar tre... j-lanes tree hashing is a tree mode that splits an input message into?j?slices, computes?j?independent digests of each slice, and outputs the hash value of their concatenation.?j-pointers tree hashing is a similar tree mode that receives, as input,?j?pointers to?j?messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes expose parallelization opportunities in a hashing process that is otherwise serial by nature. As a result, they have a performance advantage on modern processor architectures. This paper provides precise specifications for these hashing modes, proposes appropriate IVs, and demonstrates their performance on the latest processors. Our hope is that it would be useful for standardization of these modes. 展开更多
关键词 TREE mode HASHING SHA-256 simd Architecture Advanced Vector Extensions Architectures AVX AVX2
暂未订购
A <i>j</i>-Lanes Tree Hashing Mode and <i>j</i>-Lanes SHA-256
2
作者 Shay Gueron 《Journal of Information Security》 2013年第1期7-11,共5页
j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-l... j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation IntelR CoreTM. For messages whose lengths range from 2 KB to 132 KB, we show that the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly available implementation that we are aware of, and between ~2 to ~2.5 times faster than the OpenSSL 1.0.1c implementation. For long messages, there is no significant performance difference between different choices of j. We show that the 4-lanes SHA-256 is faster than the two SHA3 finalists (BLAKE and Keccak) that have a published tree mode implementation. Finally, we explain why j-lanes hashing will be faster on the coming AVX2 architecture that facilitates using 256 bits registers. These results suggest that standardizing a tree mode for hash functions (SHA-256 in particular) could be useful for performance hungry applications. 展开更多
关键词 TREE mode HASHING SHA-256 SHA3 Competition simd Architecture Advanced Vector Extensions Architectures AVX AVX2
暂未订购
Flex-DMA:支持多模式高效传输的DMA系统设计 被引量:1
3
作者 李德建 冯曦 +4 位作者 王国旋 谭浪 沈冲飞 范志华 李文明 《微电子学与计算机》 2024年第6期103-114,共12页
随着数据密集型科学和高通量应用的迅速发展,专用集成电路设计不断涌现,传输系统不再只有数据传输的需求。现有的一些直接存储器访问(Data Memory Access,DMA)设计可以支持高效的矩阵转置传输,但这些设计不能满足复杂的访存模式,也不具... 随着数据密集型科学和高通量应用的迅速发展,专用集成电路设计不断涌现,传输系统不再只有数据传输的需求。现有的一些直接存储器访问(Data Memory Access,DMA)设计可以支持高效的矩阵转置传输,但这些设计不能满足复杂的访存模式,也不具有灵活的可配置性,从而降低计算效率。针对这些问题设计了一种可配置的多模式传输系统Flex-DMA,该系统包含可配置的寄存器以及传输通道,拥有基础模式和单指令多数据(Single Instruction Multiple Data,SIMD)模式。因此,Flex-DMA可根据不同的数据传输需求选择不同的传输模式,灵活配置数据规模和数据格式,支持数据向量化转换、矩阵转置传输等功能。在大规模并行模拟框架中对Flex-DMA做性能评估,其结果表明,Flex-DMA在数据向量化处理中可以获得平均5.14倍的加速比。此外,与MT-DMA结构相比,Flex-DMA在矩阵转置传输中可以获得平均2.52倍性能提升。实验证明:Flex-DMA能满足复杂的访存模式和传输需求,在低传输时延下实现数据的重组和预处理。 展开更多
关键词 直接存储器访问 simd模式 数据传输 矩阵转置 预处理
在线阅读 下载PDF
基于Intel Xeon Phi的激光等离子体粒子模拟研究 被引量:1
4
作者 姚文科 杜云飞 +1 位作者 吴强 杨灿群 《计算机工程与科学》 CSCD 北大核心 2014年第5期809-813,共5页
激光等离子体粒子模拟广泛用于探索极端物质状态下的科学问题。将一种基于粒子云网格方法的三维等离子体粒子模拟程序LARED-P移植到Intel Xeon Phi协处理器上。在移植的过程中,综合运用了Native和Offload两种编程模式:首先运用Native模... 激光等离子体粒子模拟广泛用于探索极端物质状态下的科学问题。将一种基于粒子云网格方法的三维等离子体粒子模拟程序LARED-P移植到Intel Xeon Phi协处理器上。在移植的过程中,综合运用了Native和Offload两种编程模式:首先运用Native模式对LARED-P程序中热点计算任务进行优化研究,通过采用SIMD扩展指令使该计算任务获得了4.61倍的加速;然后运用Offload模式将程序移植到CPU-Intel Xeon Phi异构系统上,并通过使用异步数据传输和双缓冲技术分别提升了程序性能9.8%和21.8%。 展开更多
关键词 LARED-P INTEL XEON PHI Native模式 Offload模式 512位simd扩展指令 异步数据传输 双缓冲
在线阅读 下载PDF
上一页 1 下一页 到第
使用帮助 返回顶部