2 articles found
An Approach to Parallelization of SIFT Algorithm on GPUs for Real-Time Applications (cited by: 4)
1
Authors: Raghu Raj Prasanna Kumar, Suresh Muknahallipatna, John McInroy. Journal of Computer and Communications, 2016, No. 17, pp. 18-50 (33 pages)
The Scale Invariant Feature Transform (SIFT) algorithm is a widely used computer vision algorithm that detects and extracts local feature descriptors from images. SIFT is computationally intensive, making it infeasible for a single-threaded implementation to extract local feature descriptors from high-resolution images in real time. In this paper, an approach to parallelizing the SIFT algorithm is demonstrated using NVIDIA's Graphics Processing Unit (GPU). The parallelization design for SIFT on GPUs is divided into two stages: a) algorithm design, comprising generic design strategies that focus on data, and b) implementation design, comprising architecture-specific design strategies that focus on optimally using GPU resources for maximum occupancy. Increasing memory latency hiding, eliminating branches, and data blocking achieve a significant decrease in average computational time. Furthermore, it is observed via Paraver tools that our approach to parallelization, while optimizing for maximum occupancy, allows the GPU to execute the memory-bound SIFT algorithm at optimal levels.
Keywords: Scale Invariant Feature Transform (SIFT); Parallel Computing; GPU; GPU Occupancy; Portable Parallel Programming; CUDA
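The abstract's goal of "maximum occupancy" can be made concrete with the usual definition of theoretical occupancy: the ratio of warps resident on a multiprocessor to the maximum it supports. The sketch below is a simplified illustration, not the authors' method. The `SMLimits` values and the function name `theoretical_occupancy` are assumptions chosen for the example, and real GPUs also round register and shared-memory allocations to hardware-specific granularities that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-multiprocessor limits. Illustrative assumptions only,
    not the specification of any particular GPU."""
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 49152
    warp_size: int = 32


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block, sm=SMLimits()):
    """Return resident warps / max warps for one multiprocessor."""
    # Warps needed by one block (ceiling division).
    warps_per_block = -(-threads_per_block // sm.warp_size)

    # How many blocks fit under each resource limit.
    by_warps = sm.max_warps // warps_per_block
    by_regs = sm.registers // (regs_per_thread * warps_per_block * sm.warp_size)
    by_smem = (sm.shared_mem_bytes // smem_per_block
               if smem_per_block else sm.max_blocks)

    # The scarcest resource decides how many blocks are resident.
    blocks = min(sm.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / sm.max_warps


if __name__ == "__main__":
    print(theoretical_occupancy(256, 32, 0))      # register use fits fully
    print(theoretical_occupancy(256, 64, 0))      # registers become the limit
    print(theoretical_occupancy(256, 32, 16384))  # shared memory becomes the limit
```

Under these assumed limits, doubling register use per thread or adding a large shared-memory tile per block lowers the number of resident blocks, which is the kind of trade-off an occupancy-oriented implementation design has to balance.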
Extending OP2 framework to support portable parallel programming of complex applications
2
Authors: Zongjing Chen, Kangjin Huang, Yonggang Che, Chuanfu Xu, Jian Zhang, Zhe Dai, Ming Li. CCF Transactions on High Performance Computing, 2024, No. 3, pp. 330-342 (13 pages)
Current HPC hardware is heterogeneous and diverse, which makes portable parallel programming technologies attractive to application developers. OP2 is a domain-specific programming framework for unstructured applications. It supports unified programming and automatic code generation for multiple hardware platforms. However, the current OP2 implementation has difficulty with applications that use complex data structures and function calls. To address this issue, we improve the implementation of the OP2 framework in this paper. We modified the source-to-source translator and the runtime library of OP2, making it possible to automatically support applications with complex data structures and function calls during the generation of serial, OpenMP, CUDA, and MPI versions of the code. This avoids a tedious manual code rewriting process for OP2 application developers. HOUR2D, a high-order and complex unstructured CFD application, is used as an example to verify the applicability of our extension to the OP2 framework. The results show that our extension enables OP2 to support portable programming for complex unstructured applications without changing its programming mode, ensures the correctness of the results, and achieves comparable or even better performance than manual parallelizations on Intel Xeon Gold CPU, HUAWEI Kunpeng CPU, and NVIDIA V100 GPU.
Keywords: Unstructured mesh applications; Portable parallel programming; OP2; Complex applications; Applicability; Performance