3 articles found
Single-particle 3D reconstruction on specialized stream architecture and comparison with GPGPUs
1
Authors: 段勃, Wang Wendi (+1 more author), Tan Guangming, Meng Dan. High Technology Letters (EI, CAS), 2014, No. 4, pp. 333-345 (13 pages)
The wide acceptance and data deluge in medical imaging processing require faster and more efficient systems to be built. Due to recent advances in heterogeneous architectures, there has been a resurgence in research aimed at FPGA-based as well as GPGPU-based accelerator design. This paper quantitatively analyzes the workload, computational intensity and memory performance of a single-particle 3D reconstruction application called EMAN and parallelizes it on CUDA GPGPU architectures, decoupling the memory operations from the computing flow and orchestrating the thread-data mapping to reduce the overhead of off-chip memory operations. It then exploits the trend towards FPGA-based accelerator design by offloading computing-intensive kernels to dedicated hardware modules. Furthermore, a customized memory subsystem is designed to facilitate the decoupling and optimization of computing-dominated data access patterns. The paper evaluates the proposed accelerator design strategies by comparing them with a parallelized program on a 4-core CPU. The CUDA version on a GTX480 shows a speedup of about 6 times. The performance of the stream architecture implemented on a Xilinx Virtex LX330 FPGA is justified by the reported speedup of 2.54 times. Meanwhile, measured in terms of power efficiency, the FPGA-based accelerator outperforms the 4-core CPU and the GTX480 by 7.3 times and 3.4 times, respectively.
Keywords: stream architecture; general purpose graphic processing unit (GPGPU); field programmable gate array (FPGA); cryo-EM
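The speedup and power-efficiency figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that "power efficiency" means performance per watt on the same workload, which the abstract does not state explicitly, and derives the power ratios those figures would imply. The function name and the derived ratios are illustrative only; they are not values reported by the paper.

```python
# Figures quoted in the abstract; speedups are relative to the 4-core CPU baseline.
GPU_SPEEDUP = 6.0        # CUDA version on a GTX480 ("about 6 times")
FPGA_SPEEDUP = 2.54      # stream architecture on a Xilinx Virtex LX330
FPGA_EFF_VS_CPU = 7.3    # FPGA power-efficiency advantage over the CPU
FPGA_EFF_VS_GPU = 3.4    # FPGA power-efficiency advantage over the GTX480


def implied_power_ratio(speedup_ratio: float, efficiency_ratio: float) -> float:
    """Return P_a / P_b, assuming efficiency = performance / power.

    speedup_ratio    = perf_a / perf_b
    efficiency_ratio = (perf_a / P_a) / (perf_b / P_b)
    """
    return speedup_ratio / efficiency_ratio


# Hypothetical derived quantities, valid only under the assumption above.
fpga_vs_cpu_power = implied_power_ratio(FPGA_SPEEDUP, FPGA_EFF_VS_CPU)
fpga_vs_gpu_power = implied_power_ratio(FPGA_SPEEDUP / GPU_SPEEDUP, FPGA_EFF_VS_GPU)

print(f"implied FPGA/CPU power ratio: {fpga_vs_cpu_power:.3f}")
print(f"implied FPGA/GPU power ratio: {fpga_vs_gpu_power:.3f}")
```

Under that assumption, the FPGA would draw roughly a third of the CPU's power and about an eighth of the GPU's, which is consistent with it being slower than the GTX480 yet more power-efficient.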
Multiple-Morphs Adaptive Stream Architecture (cited by: 3)
2
Authors: Mei Wen, Nan Wu, Hai-Yan Li, Chun-Yuan Zhang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2005, No. 5, pp. 635-646 (12 pages)
In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1 cm^2 chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the applications suited for typical stream architecture are limited. This paper presents the definition of regular stream and irregular stream, and then describes the MASA (Multiple-morphs Adaptive Stream Architecture) prototype system, which supports different execution models according to applications' stream characteristics. The paper first discusses the MASA architecture and stream model, and then explores the features and advantages of MASA through mapping stream applications to hardware. Finally, MASA is evaluated by ten benchmarks. The result is encouraging.
Keywords: stream architecture; stream application; stream execution model
PODALA: Power-Efficient Object Detection Accelerator With Customized Layer Fusion Engine
3
Authors: TING YUE, LIANG CHANG (+3 more authors), HAOBO XU, CHENGZHI WANG, SHUISHENG LIN, JUN ZHOU. Integrated Circuits and Systems, 2024, No. 4, pp. 196-205 (10 pages)
The object detection algorithm based on convolutional neural networks (CNNs) significantly enhances accuracy by expanding network scale. As network parameters increase, large-scale networks demand substantial memory resources, making deployment on hardware challenging. Although most neural network accelerators utilize off-chip storage, frequent access to external memory restricts processing speed, hindering the ability to meet the frame rate requirements for embedded systems. This creates a trade-off in which the speed and accuracy of embedded target detection accelerators cannot be simultaneously optimized. In this paper, we propose PODALA, an energy-efficient accelerator developed through the algorithm-hardware co-design methodology. For the object detection algorithm, we develop an optimized algorithm that combines the inverse-residual structure with depthwise separable convolution, effectively reducing network parameters while preserving high detection accuracy. For the hardware accelerator, we develop a custom layer fusion technique for PODALA to minimize memory access requirements. The overall design employs a streaming hardware architecture that combines a computing array with a refined ping-pong output buffer to execute different layer fusion computing modes efficiently. Our approach substantially reduces memory usage through optimizations in both algorithmic and hardware design. Evaluated on the Xilinx ZCU102 FPGA platform, PODALA achieves 78 frames per second (FPS) and 79.73 GOPS/W energy efficiency, underscoring its superiority over state-of-the-art solutions.
Keywords: customized layer fusion; lightweight network; object detection; streaming architecture
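The abstract credits depthwise separable convolution with reducing network parameters. As a rough illustration of why, the sketch below compares weight counts for a standard convolution and its depthwise separable counterpart using the textbook formulas. The layer shape (3x3 kernel, 64 input channels, 128 output channels) is a hypothetical example, biases are ignored, and none of this is taken from PODALA's actual network.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
dws = depthwise_separable_params(k, c_in, c_out)

print(f"standard: {std} weights, depthwise separable: {dws} weights")
print(f"reduction factor: {std / dws:.2f}x")
```

For this shape the separable form needs 8,768 weights instead of 73,728. Fewer weights also mean less weight data to fetch from external memory, which is the bottleneck the abstract highlights.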