Research on Image Blur Algorithm Optimization Using OpenCL (基于OpenCL的图像模糊化算法优化研究)

Cited by: 6
Abstract: Modern GPUs generally provide dedicated hardware (such as texture units, rasterization units, and various on-chip caches) to accelerate the processing and display of 2D images. The corresponding programming models (CUDA, OpenCL) define specific programming interfaces (texture memory in CUDA, image objects in OpenCL) so that image applications can use this hardware. Taking the optimization of a typical image blur algorithm on AMD GPUs as an example, the paper examines where OpenCL image objects are suitable for optimizing image algorithms, and in particular analyzes their advantages and disadvantages relative to the more general approach of optimizing with global memory plus on-chip local memory. The experimental results show that image objects yield a clear performance improvement only when the image has four channels and the amount of data that must be cached during computation is small; in all other cases, global memory combined with local memory achieves better performance. The optimized algorithm achieves a speedup of 200x to 1000x over a carefully implemented CPU version, and a speedup of 1.3x to 5x over the corresponding functions in the NVIDIA NPP library.
Source: Computer Science (《计算机科学》), CSCD, PKU Core Journal, 2012, No. 3, pp. 260-264 (5 pages)
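Funding: Supported by the National Natural Science Foundation of China (60303020), the Key Program of the National Natural Science Foundation of China (60533020), and the National High Technology Research and Development Program of China (2006AA01A102).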
Keywords: AMD GPU, Blur, OpenCL, Image object
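
To make the comparison described in the abstract more concrete, the following is a minimal CPU-side sketch in Python; it is not the paper's OpenCL code. It assumes a box (mean) filter with clamped borders as the blur, and it models the "global memory plus local memory" approach by copying each tile and its halo into a small list before filtering. The function names, the `TILE` size, and the flat row-major pixel layout are all assumptions made for illustration.

```python
# Illustrative sketch only: plain Python standing in for a GPU kernel.
# All names here (clamp, box_blur_direct, box_blur_tiled, TILE) are hypothetical.

TILE = 4  # assumed work-group edge length, chosen arbitrarily


def clamp(v, lo, hi):
    """Clamp v into [lo, hi]; used for clamp-to-edge border handling."""
    return max(lo, min(v, hi))


def box_blur_direct(img, width, height, channels, radius):
    """Every output pixel reads its full window straight from `img`,
    the stand-in for global memory."""
    out = [0.0] * (width * height * channels)
    taps = (2 * radius + 1) ** 2
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                acc = 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        sy = clamp(y + dy, 0, height - 1)
                        sx = clamp(x + dx, 0, width - 1)
                        acc += img[(sy * width + sx) * channels + c]
                out[(y * width + x) * channels + c] = acc / taps
    return out


def box_blur_tiled(img, width, height, channels, radius):
    """Same result, but each TILE x TILE block first stages its pixels plus a
    halo of `radius` pixels into `local` (the stand-in for on-chip local
    memory) and then filters from that buffer only."""
    out = [0.0] * (width * height * channels)
    taps = (2 * radius + 1) ** 2
    side = TILE + 2 * radius  # tile edge including the halo on both sides
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            # Stage the tile and its halo, clamping reads at the image border.
            local = [0.0] * (side * side * channels)
            for ly in range(side):
                for lx in range(side):
                    sy = clamp(ty + ly - radius, 0, height - 1)
                    sx = clamp(tx + lx - radius, 0, width - 1)
                    for c in range(channels):
                        local[(ly * side + lx) * channels + c] = \
                            img[(sy * width + sx) * channels + c]
            # Filter the pixels of this tile using only the staged buffer.
            for y in range(ty, min(ty + TILE, height)):
                for x in range(tx, min(tx + TILE, width)):
                    for c in range(channels):
                        acc = 0.0
                        for dy in range(-radius, radius + 1):
                            for dx in range(-radius, radius + 1):
                                ly = y - ty + radius + dy
                                lx = x - tx + radius + dx
                                acc += local[(ly * side + lx) * channels + c]
                        out[(y * width + x) * channels + c] = acc / taps
    return out
```

In this sketch the staged buffer holds `(TILE + 2 * radius) ** 2 * channels` values, so it grows quickly with the blur radius. That is one way to read the abstract's condition on "the amount of data to be cached"; how this maps onto real image-object or local-memory limits on a given GPU is not something the sketch measures.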