High Performance General-Purpose Microprocessors: Past and Future (cited by: 5)
1
Authors: 胡伟武, 侯锐, 肖俊华, 章隆宾. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2006, Issue 5, pp. 631-640 (10 pages)
A look back shows that processor architecture has advanced in a spiral, shifting from simple to complex and back from complex to simple. We now face another shift from complex to simple, and innovative architectures will emerge to exploit continuously growing transistor budgets. The growing importance of wire delay, changing workloads, power consumption, and design and verification complexity will drive the coming era of chip multiprocessors (CMPs). Typical CMP projects from both industry and academia are surveyed. By examining in depth some of the primary theoretical and implementation problems of CMPs, the paper presents and discusses the major challenges and opportunities facing future CMPs. Finally, the Godson series of microprocessors designed in China is introduced.
Keywords: high performance general-purpose microprocessor; instruction level parallelism; data level parallelism; thread level parallelism; chip multiprocessors; Godson processor
An FFT Performance Model for Optimizing General-Purpose Processor Architecture
2
Authors: 李玲, 陈云霁, 刘道福, 钱诚, 胡伟武. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2011, Issue 5, pp. 875-889 (15 pages)
The general-purpose processor (GPP) is an important platform for the fast Fourier transform (FFT) because of its flexibility, reliability, and practicality. FFT is a representative application that is intensive in both computation and memory access, so optimizing the FFT performance of a GPP also benefits many other applications. To facilitate the analysis of FFT, this paper proposes a theoretical model of FFT processing. The model gives a tight lower bound on the runtime of FFT on a GPP and also guides architecture optimization for GPPs. Based on the model, two theorems on the optimization of architecture parameters are deduced, concerning the lower bounds on register count and memory bandwidth. Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model. These investigations were adopted in the development of Godson-3B, an industrial GPP. The optimization techniques deduced from the performance model improve FFT performance by about 40% while incurring only 0.8% additional area cost. Consequently, Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 W power consumption and, as far as the authors know, has the highest performance per watt in complex FFT among processors. This work could benefit the optimization of other GPPs as well.
Keywords: fast Fourier transform (FFT); general-purpose processor (GPP); performance prediction model; vector unit; DMA
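As a rough sanity check on the figures quoted in the FFT abstract above, the following sketch converts the reported 0.368 μs runtime and roughly 40 W power draw into throughput numbers. It assumes the conventional 5·N·log2(N) nominal operation count for an N-point complex FFT, which the abstract itself does not state; the function and constant names are illustrative only.

```python
import math


def fft_flops(n: int) -> float:
    """Nominal flop count for an n-point complex FFT.

    Assumption: the common 5 * n * log2(n) benchmarking convention,
    not a figure taken from the abstract.
    """
    return 5.0 * n * math.log2(n)


def gflops(n: int, seconds: float) -> float:
    """Nominal throughput in GFLOPS for one n-point FFT taking `seconds`."""
    return fft_flops(n) / seconds / 1e9


# Figures quoted in the abstract.
N_POINTS = 1024
RUNTIME_S = 0.368e-6
POWER_W = 40.0

rate = gflops(N_POINTS, RUNTIME_S)
per_watt = rate / POWER_W

if __name__ == "__main__":
    print(f"{rate:.1f} GFLOPS, {per_watt:.2f} GFLOPS/W")
```

Under that assumed operation count, the quoted runtime works out to roughly 139 GFLOPS, or about 3.5 GFLOPS per watt.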
Real-time flow-based video abstraction using OpenCL
3
Authors: Yong-jin PARK, Jin-woo KIM, Jin-hong PARK, Tack-don HAN. Journal of Measurement Science and Instrumentation (CAS), 2012, Issue 1, pp. 46-50 (5 pages)
Non-photorealistic rendering is a family of methods that produce effects different from those of realistic image generation. Among these techniques, flow-based image abstraction conveys shape and color features well and produces a stylized visual abstraction. However, real-time rendering is not possible on a CPU, because the method applies various filtering and iteration steps. In this paper, we present real-time processing methods for video abstraction using the open computing language (OpenCL), a technique for general-purpose computing on graphics processing units (GPGPU). With GPU acceleration, video abstraction is shown to run at 16 frames per second (FPS) or more.
Keywords: non-photorealistic rendering; video abstraction; general-purpose computing on graphics processing units (GPGPU); open computing language (OpenCL)
Toward Artificial General Intelligence: Deep Reinforcement Learning Method to AI in Medicine
4
Authors: Daniel Schilling Weiss Nguyen, Richard Odigie. Journal of Computer and Communications, 2023, Issue 9, pp. 84-120 (37 pages)
Artificial general intelligence (AGI) is the ability of an artificial intelligence (AI) agent to solve somewhat-arbitrary tasks in somewhat-arbitrary environments. Despite being a long-standing goal in the field of AI, AGI remains elusive. In this study, we empirically assessed the generalizability of AI agents by applying a deep reinforcement learning (DRL) approach to the medical domain. Our investigation examined how modifying the agent's structure, task, and environment affects its generality. Sample: an NIH chest X-ray dataset with 112,120 images and 15 medical conditions. We evaluated the agent's performance on binary and multiclass classification tasks using a baseline model, a convolutional neural network model, a deep Q network model, and a proximal policy optimization model. Results: our results suggest that DRL agents with the algorithmic flexibility to autonomously vary their macro/microstructures can generalize better across given tasks and environments.
Keywords: Artificial Intelligence; Deep Learning; General-Purpose Learning Agent; Generalizability; Algorithmic Flexibility; Internal Autonomy
CUSMART: effective parallelization of string matching algorithms using GPGPU accelerators
5
Authors: Adnan OZSOY, Mengu NAZLI, Onur CANKUR, Cagri SAHIN. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 6, pp. 877-895 (19 pages)
This study presents a parallel version of the string matching algorithms research tool (SMART) library, implemented on NVIDIA's compute unified device architecture (CUDA) platform. It uses general-purpose computing on graphics processing unit (GPGPU) programming concepts to enhance performance and to gain insight into the parallel versions of these algorithms. We have developed the CUDA-enhanced SMART (CUSMART) library, which incorporates parallelized versions of 64 string matching algorithms built on the CUDA application programming interface. The performance of these algorithms has been assessed across various scenarios to ensure a comprehensive and impartial comparison, allowing their strengths and weaknesses in specific application contexts to be identified. We have explored and established optimization techniques and gauged their influence on the performance of these algorithms. The results highlight the potential of GPGPU computing in string matching applications through the scalability of the algorithms, suggesting significant performance improvements. Furthermore, we have identified the best and worst performing algorithms in various scenarios.
Keywords: String matching; Parallel programming; Graphics processing unit (GPU) programming; General-purpose computing on GPU (GPGPU); NVIDIA; Compute unified device architecture (CUDA); String matching algorithms research tool (SMART)
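To make the idea of parallelized string matching in the CUSMART abstract more concrete, here is a minimal sketch of position-parallel matching: the set of candidate start positions is split into ranges, and each worker checks its own range independently. This is an illustrative assumption about how such a decomposition can look, using a naive comparison and Python threads. It is not the CUSMART implementation and not a CUDA kernel, and all names (`match_at`, `find_chunk`, `parallel_find`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def match_at(text: bytes, pattern: bytes, start: int) -> bool:
    """Naive check: does `pattern` occur in `text` at offset `start`?"""
    return text[start:start + len(pattern)] == pattern


def find_chunk(text: bytes, pattern: bytes, lo: int, hi: int) -> list:
    """Test start positions lo..hi-1. A comparison may read past `hi`,
    so matches that straddle a range boundary are still found."""
    return [i for i in range(lo, hi) if match_at(text, pattern, i)]


def parallel_find(text: bytes, pattern: bytes, workers: int = 4) -> list:
    """Return all match offsets, splitting start positions across workers."""
    m = len(pattern)
    last = len(text) - m + 1  # number of valid start positions
    if m == 0 or last <= 0:
        return []
    step = -(-last // workers)  # ceiling division
    ranges = [(lo, min(lo + step, last)) for lo in range(0, last, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so offsets come back sorted
        parts = pool.map(lambda r: find_chunk(text, pattern, *r), ranges)
        return [i for part in parts for i in part]


if __name__ == "__main__":
    print(parallel_find(b"abracadabra", b"abra"))
```

On a GPU the same decomposition would typically map one thread to each start position or small range. The Python threads here only illustrate the partitioning and are not expected to yield a speedup.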
Implementing a 1GHz Four-Issue Out-of-Order Execution Microprocessor in a Standard Cell ASIC Methodology (cited by: 14)
6
Authors: 胡伟武, 赵继业, 钟石强, 杨旭, Elio Guidetti, 吴永强. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2007, Issue 1, pp. 1-14 (14 pages)
This paper introduces the microarchitecture and physical implementation of the Godson-2E processor, a four-issue superscalar RISC processor that supports the 64-bit MIPS instruction set. Aggressive out-of-order execution and memory hierarchy techniques help Godson-2E achieve high performance. The processor has been physically designed in a 7-metal 90nm CMOS process using a cell-based methodology, with some bit-sliced manual placement and a number of crafted cells and macros. The processor runs at 1GHz and achieves a SPEC CPU2000 rate higher than 500.
Keywords: general-purpose processor; superscalar pipeline; out-of-order execution; non-blocking cache; physical design; synthesis flow; bit-sliced placement; crafted cell; performance evaluation
Fast OBJ file importing and parsing in CUDA (cited by: 2)
7
Authors: Aidan L. Possemiers, Ickjai Lee. Computational Visual Media, 2015, Issue 3, pp. 229-238 (10 pages)
Alias-Wavefront OBJ meshes are a common text file type for transferring 3D mesh data between applications made by different vendors. However, as meshes become more complex and denser, the files become larger and slower to import. This paper explores the use of GPUs to accelerate the importing and parsing of OBJ files by studying file read time, runtime, and load resistance. We propose a new method of reading and parsing that circumvents GPU architecture limitations and improves performance, with the new GPU method outperforming CPU methods by 6× to 8×. When running on a heavily loaded system, the new method took only an 80% performance hit, compared with the 160% hit taken by the CPU methods. The loaded GPU speedup was 3.5× relative to unloaded CPU methods and 8× relative to loaded CPU methods. These results demonstrate that the time is right for further research into the use of data-parallel GPU acceleration beyond computer graphics and high performance computing.
Keywords: parsing; OBJ; vertex buffer object (VBO); general-purpose programming on the graphics processing unit (GPGPU); compute unified device architecture (CUDA)
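For readers unfamiliar with the OBJ format discussed in the abstract above, the following is a minimal sequential sketch of what parsing involves: reading `v` (vertex position) and `f` (face) records from the text file. It is an illustrative CPU-side baseline under simplifying assumptions, not the GPU method proposed in the paper. It ignores texture coordinates, normals, and negative (relative) indices, and the names `parse_obj` and `SAMPLE` are hypothetical.

```python
def parse_obj(text: str):
    """Parse vertex positions and faces from OBJ text.

    Simplifying assumptions: only `v` and `f` records are handled, and
    face indices are positive. OBJ indices are 1-based and are converted
    to 0-based here.
    """
    vertices = []
    faces = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] == "v":
            vertices.append(tuple(float(x) for x in fields[1:4]))
        elif fields[0] == "f":
            # Face entries look like "v", "v/vt", "v//vn", or "v/vt/vn";
            # keep only the vertex index before the first slash.
            faces.append(tuple(int(f.split("/")[0]) - 1 for f in fields[1:]))
    return vertices, faces


SAMPLE = """\
# one triangle
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""

if __name__ == "__main__":
    verts, tris = parse_obj(SAMPLE)
    print(len(verts), "vertices,", len(tris), "faces")
```

Because records vary in length and appear line by line, a parser of this kind is inherently sequential as written, which is the sort of limitation a data-parallel approach has to work around.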