Journal Articles
962 articles found
1. An incompressible flow solver on a GPU/CPU heterogeneous architecture parallel computing platform (cited 1 time)
Authors: Qianqian Li, Rong Li, Zixuan Yang. Theoretical & Applied Mechanics Letters (CSCD), 2023, No. 5, pp. 387-393 (7 pages).
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid points. To solve the Poisson equation, the conjugate gradient method is used as the basic solver, and a Chebyshev method combined with a Jacobi sub-preconditioner is used as the preconditioner. The developed CFD solver shows good parallel efficiency, exceeding 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than 208³. In the acceleration test, a simulation with 1040³ grid points on 125 GPU cards runs 203.6x faster than on the same number of CPU cores. The solver is then tested on a two-dimensional lid-driven cavity flow and a three-dimensional Taylor-Green vortex flow; the results are consistent with previous results in the literature.
Keywords: GPU acceleration, parallel computing, Poisson equation, preconditioner
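For illustration, a minimal CUDA sketch of the Jacobi sub-preconditioner step (z = D⁻¹r) that such a preconditioned conjugate-gradient Poisson solver applies once per iteration might look as follows; the kernel name, grid size, and the constant-diagonal assumption for a uniform 7-point stencil are ours, not code from the paper.

    #include <cuda_runtime.h>

    // z = D^{-1} r : the Jacobi sub-preconditioner step applied once per CG
    // iteration.  For a uniform-grid 7-point Poisson stencil the diagonal is
    // the constant -6/h^2, so its inverse is a single scale factor; a
    // spatially varying diagonal would be passed in as an extra array.
    __global__ void jacobiPrecond(const double* r, double* z,
                                  double invDiag, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) z[i] = invDiag * r[i];
    }

    int main()
    {
        const int n = 256 * 256 * 256;        // one GPU card's share of the grid
        const double h = 1.0 / 256.0;         // grid spacing (assumed)
        const double invDiag = -h * h / 6.0;  // inverse diagonal of the Laplacian
        double *r, *z;
        cudaMalloc(&r, n * sizeof(double));
        cudaMalloc(&z, n * sizeof(double));
        cudaMemset(r, 0, n * sizeof(double)); // stand-in for the CG residual
        jacobiPrecond<<<(n + 255) / 256, 256>>>(r, z, invDiag, n);
        cudaDeviceSynchronize();
        cudaFree(r); cudaFree(z);
        return 0;
    }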
2. Study of a GPU-based parallel computing method for the Monte Carlo program (cited 2 times)
Authors: 罗志飞, 邱睿, 李明, 武祯, 曾志, 李君利. Nuclear Science and Techniques (SCIE, CAS, CSCD), 2014, No. A01, pp. 27-30 (4 pages).
Keywords: parallel computing method, Monte Carlo program, GPU, GEANT4, simulation program, Monte Carlo method, parallel processing capability, graphics processing unit
3. Regularized focusing inversion for large-scale gravity data based on GPU parallel computing
Authors: WANG Haoran, DING Yidan, LI Feida, LI Jing. Global Geology, 2019, No. 3, pp. 179-187 (9 pages).
Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods lack the capacity to process massive data or to be applied in practice. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up computation and reducing memory consumption, thus obtaining fast and reliable inversion results for large complex models. In this paper, equivalent storage of a geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfers, access restrictions, and instruction restrictions as well as by latency hiding, greatly reduces memory usage, speeds up the calculation, and makes fast inversion of large models possible. By comparing the computing speed of the traditional single-threaded CPU method with CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which offers a path to practical application for theoretical inversion methods otherwise restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the severe skin effect and the ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data more clearly depicts the boundary position of the anomalous body and delineates its specific shape.
Keywords: large-scale gravity data, GPU parallel computing, CUDA, equivalent geometric trellis, focusing inversion
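As a hypothetical sketch of the thread-per-observation pattern such an inversion relies on, the kernel below applies a dense sensitivity matrix to a model vector; all names are assumptions, and the paper's equivalent-storage trellis scheme (which avoids storing the matrix explicitly) is not reproduced here.

    #include <cuda_runtime.h>

    // Forward step d = G m : one thread accumulates one row of the
    // sensitivity matrix G (nData x nModel, row-major) against the model
    // vector m.  This dense version only illustrates the parallel pattern.
    __global__ void forwardGravity(const float* G, const float* m, float* d,
                                   int nData, int nModel)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nData) return;
        float acc = 0.0f;
        for (int j = 0; j < nModel; ++j)
            acc += G[(size_t)row * nModel + j] * m[j];
        d[row] = acc;
    }

    int main()
    {
        const int nData = 4096, nModel = 32768;
        float *G, *m, *d;
        cudaMalloc(&G, (size_t)nData * nModel * sizeof(float));
        cudaMalloc(&m, nModel * sizeof(float));
        cudaMalloc(&d, nData * sizeof(float));
        // ... fill G from the gravity kernel and m with the current model ...
        forwardGravity<<<(nData + 255) / 256, 256>>>(G, m, d, nData, nModel);
        cudaDeviceSynchronize();
        cudaFree(G); cudaFree(m); cudaFree(d);
        return 0;
    }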
4. A Hybrid Parallel Strategy for Isogeometric Topology Optimization via CPU/GPU Heterogeneous Computing
Authors: Zhaohui Xia, Baichuan Gao, Chen Yu, Haotian Han, Haobo Zhang, Shuting Wang. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 2, pp. 1103-1137 (35 pages).
This paper aims to solve large-scale, complex isogeometric topology optimization problems that consume significant computational resources. A novel isogeometric topology optimization method with a hybrid CPU/GPU parallel strategy is proposed, and the hybrid parallel strategies for stiffness matrix assembly, equation solving, sensitivity analysis, and design variable update are discussed in detail. To ensure high CPU/GPU computing efficiency, a workload balancing strategy is presented for optimally distributing the workload between CPU and GPU. To illustrate the advantages of the proposed method, three benchmark examples are tested to verify the hybrid parallel strategy. The results show that the hybrid method is faster than both serial CPU and parallel GPU implementations, with speedups of up to two orders of magnitude.
Keywords: topology optimization, high efficiency, isogeometric analysis, CPU/GPU parallel computing, hybrid OpenMP-CUDA
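A minimal sketch of the workload-splitting idea, assuming separate device and host copies of the element data and a GPU share obtained from profiling; the function names and the 80/20 split are illustrative, not the paper's implementation.

    #include <cuda_runtime.h>
    #include <omp.h>
    #include <cstdlib>

    __global__ void scaleOnGpu(float* x, int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count) x[i] *= 2.0f;       // stand-in for per-element work
    }

    // Split one element loop between GPU and CPU by a workload ratio; the
    // kernel launch is asynchronous, so the OpenMP loop runs concurrently.
    void hybridScale(float* devX, float* hostX, int n, float gpuShare)
    {
        int split = (int)(n * gpuShare);   // first `split` elements -> GPU
        scaleOnGpu<<<(split + 255) / 256, 256>>>(devX, split);
        #pragma omp parallel for           // CPU threads take the remainder
        for (int i = split; i < n; ++i)
            hostX[i] *= 2.0f;
        cudaDeviceSynchronize();           // join the two partitions
    }

    int main()
    {
        const int n = 1 << 20;
        float* devX; cudaMalloc(&devX, n * sizeof(float));
        cudaMemset(devX, 0, n * sizeof(float));
        float* hostX = (float*)calloc(n, sizeof(float));
        hybridScale(devX, hostX, n, 0.8f); // 80/20 split, e.g. from profiling
        cudaFree(devX); free(hostX);
        return 0;
    }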
5. A Rayleigh Wave Globally Optimal Full Waveform Inversion Framework Based on GPU Parallel Computing
Authors: Zhao Le, Wei Zhang, Xin Rong, Yiming Wang, Wentao Jin, Zhengxuan Cao. Journal of Geoscience and Environment Protection, 2023, No. 3, pp. 327-338 (12 pages).
Conventional gradient-based full waveform inversion (FWI) is a local optimization, highly dependent on the initial model and prone to becoming trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; each individual is independent of the others and contains forward modeling and a cost-function evaluation. The framework is suitable for a variety of globally optimal algorithms, and we test it with the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
Keywords: full waveform inversion, finite-difference method, globally optimal framework, GPU parallel computing, particle swarm optimization
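A brief CUDA sketch of the particle swarm update that such a framework could run between the independent per-individual forward-modeling steps; the standard PSO update rule and all names here are assumptions, not the paper's code.

    #include <cuda_runtime.h>

    // One PSO iteration's velocity/position update, one thread per
    // coordinate.  The expensive part of the framework -- an independent
    // forward model and misfit evaluation per individual -- runs between
    // calls to this kernel.
    __global__ void psoUpdate(float* pos, float* vel, const float* pbest,
                              const float* gbest, const float* r1,
                              const float* r2, int nInd, int nDim,
                              float w, float c1, float c2)
    {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= nInd * nDim) return;
        int d = t % nDim;                       // coordinate within a model
        vel[t] = w * vel[t]
               + c1 * r1[t] * (pbest[t] - pos[t])  // toward personal best
               + c2 * r2[t] * (gbest[d] - pos[t]); // toward global best
        pos[t] += vel[t];
    }

    int main()
    {
        const int nInd = 400, nDim = 64, n = nInd * nDim;
        float *pos, *vel, *pbest, *gbest, *r1, *r2;
        cudaMalloc(&pos, n * sizeof(float));   cudaMalloc(&vel, n * sizeof(float));
        cudaMalloc(&pbest, n * sizeof(float)); cudaMalloc(&gbest, nDim * sizeof(float));
        cudaMalloc(&r1, n * sizeof(float));    cudaMalloc(&r2, n * sizeof(float));
        // ... initialize the swarm and draw r1, r2 in [0,1), e.g. with cuRAND ...
        psoUpdate<<<(n + 255) / 256, 256>>>(pos, vel, pbest, gbest, r1, r2,
                                            nInd, nDim, 0.7f, 1.5f, 1.5f);
        cudaDeviceSynchronize();
        return 0;
    }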
6. Large-Eddy Simulation of Airflow over a Steep, Three-Dimensional Isolated Hill with Multi-GPUs Computing
Author: Takanori Uchida. Open Journal of Fluid Dynamics, 2018, No. 4, pp. 416-434 (19 pages).
The present research attempted a large-eddy simulation (LES) of airflow over a steep, three-dimensional isolated hill using the latest multi-core, multi-CPU systems. It was found that 1) turbulence simulations using approximately 50 million grid points are feasible, and 2) this system achieved a high computation speed, exceeding the speed of parallel computation attained by a single CPU on one of the latest supercomputers. Furthermore, LES was conducted on multi-GPU systems. These simulations revealed that 1) a multi-GPU environment using the NVIDIA Tesla M2090 or M2075 could simulate turbulence in a model with approximately 50 million grid points, and 2) the computation speed achieved by the multi-GPU environments exceeded that of parallel computation using four to six CPUs of one of the latest supercomputers.
Keywords: LES, isolated hill, multi-core multi-CPU computing, multi-GPU computing
7. A survey of confidential computing on heterogeneous CPU-GPU systems
Authors: 郝萌, 李佳勇, 杨洪伟, 张伟哲. 信息网络安全 (Netinfo Security; PKU Core), 2025, No. 11, pp. 1658-1672 (15 pages).
With the spread of data-intensive applications such as artificial intelligence, heterogeneous computing systems built around CPUs and GPUs have become critical infrastructure. However, in untrusted environments such as the cloud and the edge, sensitive data face severe security threats during processing, against which traditional encryption is powerless. Confidential computing, which uses hardware trusted execution environments (TEEs), provides an effective way to protect data in use, but existing techniques focus mainly on the CPU side. Seamlessly extending the TEE security boundary to the GPU, the core computing engine, has become a focus of both academia and industry. This article presents a systematic survey of confidential computing techniques for CPU-GPU heterogeneous systems. It first reviews the basic concepts of confidential computing and analyzes typical attack vectors against GPUs. It then classifies existing GPU confidential computing schemes, covering hardware-assisted, hardware-software co-designed, and purely software-based approaches. Finally, it summarizes the key challenges facing the field and outlines directions for future research.
Keywords: confidential computing, trusted execution environment, heterogeneous computing, GPU
8. A full-workflow accelerated design method for topology optimization based on GPU parallel computing
Authors: 张长东, 吴奕凡, 周铉华, 李旭东, 肖息, 张自来. 航空制造技术 (Aeronautical Manufacturing Technology; PKU Core), 2025, No. 12, pp. 34-41, 67 (9 pages).
Driven by the development needs of large aerospace equipment, efficient, high-precision, large-scale topology optimization has become a focus of the field. To address the enormous computational cost and low efficiency of existing large-scale topology optimization, this work develops a full-workflow accelerated design method based on GPU parallel computing. Mesh generation, stiffness matrix computation and assembly, and the finite element solve are all parallelized, achieving efficient, high-precision voxel meshing and an efficient finite element solution. In addition, to meet the acceleration needs of the optimization loop itself, the sensitivity filtering step is also parallelized. Taking an attitude-thruster model with 3 million voxel elements as the design object, the proposed method achieves a speedup 1259% higher than the parallel accelerated topology optimization of Abaqus 2022, while the results of the two methods are highly similar, verifying the effectiveness and practicality of the proposed approach.
Keywords: topology optimization, parallel computing, GPU acceleration, signed distance field, sparse matrix, mesh generation
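As a hedged illustration of parallel sensitivity filtering on a voxel grid, the kernel below implements the classic cone-weighted filter with one thread per voxel; the filter form and all names are assumed, not taken from the paper.

    #include <cuda_runtime.h>

    // Cone-weighted sensitivity filter, one thread per voxel:
    // dcf_i = sum_j w_ij x_j dc_j / (x_i sum_j w_ij), w_ij = rmin - dist_ij.
    __global__ void filterSensitivity(const float* dc, const float* x,
                                      float* dcf, int nx, int ny, int nz,
                                      float rmin)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nx * ny * nz) return;
        int ix = i % nx, iy = (i / nx) % ny, iz = i / (nx * ny);
        int r = (int)ceilf(rmin);
        float num = 0.0f, den = 0.0f;
        for (int dz = -r; dz <= r; ++dz)
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int jx = ix + dx, jy = iy + dy, jz = iz + dz;
                    if (jx < 0 || jy < 0 || jz < 0 ||
                        jx >= nx || jy >= ny || jz >= nz) continue;
                    float w = rmin - sqrtf((float)(dx*dx + dy*dy + dz*dz));
                    if (w <= 0.0f) continue;   // outside the filter radius
                    int j = (jz * ny + jy) * nx + jx;
                    num += w * x[j] * dc[j];
                    den += w;
                }
        dcf[i] = num / (fmaxf(x[i], 1e-3f) * den);
    }

    int main()
    {
        const int nx = 128, ny = 64, nz = 64, n = nx * ny * nz;
        float *dc, *x, *dcf;
        cudaMalloc(&dc, n * sizeof(float));
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&dcf, n * sizeof(float));
        // ... fill x (densities) and dc (raw sensitivities) from the FE solve ...
        filterSensitivity<<<(n + 255) / 256, 256>>>(dc, x, dcf, nx, ny, nz, 2.5f);
        cudaDeviceSynchronize();
        cudaFree(dc); cudaFree(x); cudaFree(dcf);
        return 0;
    }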
9. CPU-GPU-based numerical simulation of the N-S equations for supersonic flow fields
Authors: 卢志伟, 张皓茹, 刘锡尧, 王亚东, 张卓凯, 张君安. 中国机械工程 (China Mechanical Engineering; PKU Core), 2025, No. 9, pp. 1942-1950 (9 pages).
To analyze the characteristics of supersonic flow fields in depth and improve numerical efficiency, an efficient acceleration algorithm is designed. The algorithm fully exploits a heterogeneous CPU-GPU parallel model, using asynchronous streams for data transfer and processing, which significantly accelerates the numerical simulation of supersonic flow fields. The results show that GPU parallel computation is markedly faster than serial CPU computation, and the speedup grows noticeably as the flow-field grid is refined. GPU parallel computing can effectively increase the computation speed for supersonic flow fields, providing a powerful parallel computing approach for the design, optimization, performance evaluation, and development of supersonic vehicles.
Keywords: supersonic flow field, CPU-GPU, heterogeneous computing, finite difference
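A minimal sketch of the asynchronous-stream pattern the abstract describes: chunked transfers on two CUDA streams let host-device copies overlap kernel execution. The kernel body and the sizes are placeholders, not the paper's solver.

    #include <cuda_runtime.h>

    __global__ void stepFlux(float* f, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) f[i] += 1.0f;       // stand-in for a finite-difference update
    }

    int main()
    {
        const int n = 1 << 22, nChunks = 4, chunk = n / nChunks;
        float *h, *d;
        cudaMallocHost(&h, n * sizeof(float)); // pinned memory enables async copies
        cudaMalloc(&d, n * sizeof(float));
        cudaStream_t s[2];
        cudaStreamCreate(&s[0]); cudaStreamCreate(&s[1]);
        for (int c = 0; c < nChunks; ++c) {
            cudaStream_t st = s[c % 2];        // alternate streams so chunk
            float* hc = h + (size_t)c * chunk; // c+1's copy overlaps chunk
            float* dc = d + (size_t)c * chunk; // c's kernel
            cudaMemcpyAsync(dc, hc, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, st);
            stepFlux<<<(chunk + 255) / 256, 256, 0, st>>>(dc, chunk);
            cudaMemcpyAsync(hc, dc, chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, st);
        }
        cudaDeviceSynchronize();
        cudaStreamDestroy(s[0]); cudaStreamDestroy(s[1]);
        cudaFreeHost(h); cudaFree(d);
        return 0;
    }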
10. Programming for scientific computing on peta-scale heterogeneous parallel systems (cited 1 time)
Authors: 杨灿群, 吴强, 唐滔, 王锋, 薛京灵. Journal of Central South University (SCIE, EI, CAS), 2013, No. 5, pp. 1189-1203 (15 pages).
Peta-scale high-performance computing systems are increasingly built with heterogeneous CPU and GPU nodes to achieve higher power efficiency and computation throughput. While providing unprecedented capabilities for computational experiments of historic significance, these systems are presently difficult to program. The users, who are domain experts rather than computer experts, prefer programming models closer to their domains (e.g., physics and biology) rather than MPI and OpenMP. This has led to the development of domain-specific programming models that provide domain-specific interfaces while abstracting away performance-critical architectural details. Based on experience in designing large-scale computing systems, a hybrid programming framework for scientific computing on heterogeneous architectures is proposed in this work. Its design philosophy is to provide a collaborative mechanism for domain experts and computer experts so that both domain-specific knowledge and performance-critical architectural details can be adequately exploited. Two real-world scientific applications have been evaluated on TH-1A, a peta-scale CPU-GPU heterogeneous system that is currently the 5th fastest supercomputer in the world. The experimental results show that the proposed framework is well suited for developing large-scale scientific computing applications on peta-scale heterogeneous CPU/GPU systems.
Keywords: heterogeneous parallel system, programming framework, scientific computing, GPU computing, molecular dynamics
11. Managing Computing Infrastructure for IoT Data (cited 1 time)
Authors: Sapna Tyagi, Ashraf Darwish, Mohammad Yahiya Khan. Advances in Internet of Things, 2014, No. 3, pp. 29-35 (7 pages).
Digital data have become a torrent engulfing every area of business, science, and engineering, gushing into every economy, every organization, and every user of digital technology. In the age of big data, deriving value and insight from big data with rich analytics is important for achieving competitiveness, success, and leadership in every field. The Internet of Things (IoT) is causing the number and types of products that emit data to grow at an unprecedented rate. Heterogeneity, scale, timeliness, complexity, and privacy problems with large data impede progress at all phases of the pipeline that creates value from data. With the push of such massive data, we are entering a new era of computing driven by novel and groundbreaking research innovation in elastic parallelism, partitioning, and scalability. Designing a scalable system for analysing, processing, and mining huge real-world datasets has become one of the challenging problems facing both systems researchers and data-management researchers. In this paper, we give an overview of computing infrastructure for IoT data processing, focusing on architectural and major challenges of massive data. We briefly discuss emerging computing infrastructure and technologies that are promising for improving massive data management.
Keywords: big data, cloud computing, data analytics, elastic scalability, heterogeneous computing, GPU, PCM, massive data processing
12. EG-STC: An Efficient Secure Two-Party Computation Scheme Based on Embedded GPU for Artificial Intelligence Systems
Authors: Zhenjiang Dong, Xin Ge, Yuehua Huang, Jiankuo Dong, Jiang Xu. Computers, Materials & Continua (SCIE, EI), 2024, No. 6, pp. 4021-4044 (24 pages).
This paper presents a comprehensive exploration of the integration of the Internet of Things (IoT), big data analysis, cloud computing, and artificial intelligence (AI), which has led to an unprecedented era of connectivity. We delve into the emerging trend of machine learning on embedded devices, which enables such tasks in resource-limited environments. However, the widespread adoption of machine learning raises significant privacy concerns, necessitating privacy-preserving techniques. One such technique, secure multi-party computation (MPC), allows collaborative computation without exposing private inputs. Despite its potential, complex protocols and communication interactions hinder performance, especially on resource-constrained devices. Efforts have been made to improve efficiency, but scalability remains a challenge. Given the success of GPUs in deep learning, leveraging embedded GPUs, such as those offered by NVIDIA, emerges as a promising solution. We therefore propose an Embedded GPU-based Secure Two-party Computation (EG-STC) framework for AI systems. To the best of our knowledge, this work is the first to fully implement machine learning model training based on secure two-party computation on an embedded GPU platform. Our experimental results demonstrate the effectiveness of EG-STC. On an embedded GPU with a power draw of 5 W, our implementation achieved a secure two-party matrix multiplication throughput of 5881.5 kilo-operations per millisecond (kops/ms), with an energy efficiency ratio of 1176.3 kops/ms/W. Furthermore, leveraging the EG-STC framework, we achieved an overall time acceleration of 5-6x compared with solutions running on server-grade CPUs. Our solution also required only 60% to 70% of the runtime of the previously best-known methods on the same platform. In summary, our research advances secure and efficient machine learning on resource-constrained embedded devices, paving the way for broader adoption of AI technologies in various applications.
Keywords: secure two-party computation, embedded GPU acceleration, privacy-preserving machine learning, edge computing
13. Research on a ROACH2-GPU-based cluster correlator: application of the Hashpipe software in the X-engine module
Authors: 张科, 王钊, 李吉夏, 吴锋泉, 田海俊, 牛晨辉, 张巨勇, 陈志平, 陈学雷. 贵州师范大学学报(自然科学版) (Journal of Guizhou Normal University, Natural Sciences; PKU Core), 2025, No. 2, pp. 114-121 (8 pages).
As more and more interferometric arrays are built and operated worldwide, they provide abundant observational data for probing the mysteries of the unknown universe, but they also bring enormous difficulties in the real-time processing of high-rate, dense data streams, posing severe challenges to traditional data processing techniques. Driven by the real-time correlation requirements of the first phase of China's Tianlai project, and exploiting the strengths of GPUs in high-performance parallel computing, a ROACH2-GPU cluster correlator is designed and implemented for the Tianlai cylinder pathfinder array, with an in-depth study of the application of Hashpipe (High Availability Shared Pipeline Engine) in the correlator's X-engine module. This paper first introduces the overall architecture of the ROACH2-GPU cluster correlator, then examines Hashpipe's core functions and data processing methods, implementing complete distributed heterogeneous processing and optimizing Hashpipe's control and parameter interfaces. Program parameters can be modified to match actual observing needs, allowing correlator configurations with different channel counts and reducing the difficulty and cost of back-end hardware and software design. Finally, after software correctness tests, observations of strong radio astronomical sources were carried out and processed, yielding accurate interference fringes.
Keywords: ROACH2-GPU, Hashpipe, cluster correlator, X-engine module, parallel computing
14. A general stencil autotuning framework for GPU platforms
Authors: 孙庆骁, 杨海龙. 计算机研究与发展 (Journal of Computer Research and Development; PKU Core), 2025, No. 10, pp. 2622-2634 (13 pages).
Stencil computations are widely adopted in scientific applications, and many high-performance computing (HPC) platforms exploit the high compute capability of GPUs to accelerate them. In recent years, stencil computations have become more complex in their order, memory access, and computation patterns. To map stencil computations onto GPU architectures, various optimization techniques based on streaming and tiling have been proposed. Because of the diversity of stencil patterns and GPU architectures, no single optimization technique fits all stencil instances, so researchers have proposed stencil autotuning mechanisms that search the parameters of a given combination of optimization techniques. However, existing mechanisms introduce large offline profiling costs and online prediction overheads, and cannot flexibly generalize to arbitrary stencil patterns. To address these problems, this paper proposes GeST, a general stencil autotuning framework that achieves extreme performance optimization of stencil computations on GPU platforms. Specifically, GeST builds the global search space through a zero-padding format and quantifies parameter correlations via the coefficient of variation to generate parameter groups; it then iteratively selects parameter values from these groups, adjusting the sampling ratio according to a reward policy and avoiding redundant executions through hash coding. The experimental results show that, compared with other state-of-the-art autotuning works, GeST can identify better-performing parameter settings in a short time.
Keywords: stencil computation, GPU, autotuning, performance optimization, parameter search
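GeST's search machinery (zero padding, coefficient-of-variation grouping, reward-driven sampling, hash coding) is far richer than what fits here; the sketch below shows only the basic timing loop at the bottom of any GPU stencil autotuner, with an assumed 1-D stencil and assumed block-size candidates.

    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void stencil1d(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > 0 && i < n - 1)
            out[i] = 0.25f * in[i-1] + 0.5f * in[i] + 0.25f * in[i+1];
    }

    int main()
    {
        const int n = 1 << 24;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));
        cudaMemset(in, 0, n * sizeof(float));
        int candidates[] = {64, 128, 256, 512, 1024}; // one tuning dimension
        for (int bs : candidates) {
            cudaEvent_t t0, t1;
            cudaEventCreate(&t0); cudaEventCreate(&t1);
            stencil1d<<<(n + bs - 1) / bs, bs>>>(in, out, n);   // warm-up
            cudaEventRecord(t0);
            for (int k = 0; k < 50; ++k)                        // timed runs
                stencil1d<<<(n + bs - 1) / bs, bs>>>(in, out, n);
            cudaEventRecord(t1);
            cudaEventSynchronize(t1);
            float ms; cudaEventElapsedTime(&ms, t0, t1);
            printf("block=%4d  %.3f ms/iter\n", bs, ms / 50);
            cudaEventDestroy(t0); cudaEventDestroy(t1);
        }
        cudaFree(in); cudaFree(out);
        return 0;
    }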
15. A high-performance parallel acceleration method for the Chinese public-key cryptosystem SM2 on domestic GPUs
Authors: 吴雯, 董建阔, 刘鹏博, 董振江, 胡昕, 张品昌, 肖甫. 通信学报 (Journal on Communications; PKU Core), 2025, No. 5, pp. 15-28 (14 pages).
To meet the national strategic requirement that information security be independently controllable, and to ensure algorithmic transparency and security, a high-performance parallel acceleration method for the SM2 digital signature algorithm on domestic GPUs is proposed. First, low-level functions tailored to field arithmetic are designed to optimize finite-field operations, with reduction using two rounds of carry resolution to resist timing attacks. Second, point addition and point doubling are implemented in Jacobian coordinates, fully exploiting the characteristics of registers and global memory, and offline/online precomputation tables are designed to improve the efficiency of point multiplication. Finally, experiments are designed around the characteristics of the Hygon deep computing unit (DCU), achieving high-performance SM2 signing and verification with throughputs of 6816 kops/s and 1385 kops/s, respectively. The study verifies the feasibility and effectiveness of the SM2 digital signature algorithm on domestic GPUs, providing important technical support for independently controllable information security in China.
Keywords: Chinese commercial cryptography, digital signature, graphics processing unit, heterogeneous computing
16. Discontinuous finite element numerical simulation of seismic waves under complex geological conditions and its GPU acceleration
Authors: 韩德超, 刘卫华, 张春丽, 袁媛, 白鹏. 石油物探 (Geophysical Prospecting for Petroleum; PKU Core), 2025, No. 4, pp. 639-652 (14 pages).
The discontinuous Galerkin finite element method (DGFEM) offers high simulation accuracy, but it is difficult to program, and no unified computational scheme has been available for the wave equations of the various complex media. To address this, based on triangular unstructured meshes and the local Lax-Friedrichs numerical flux, DGFEM programming matrices are constructed for simulating wave equations in complex media, and a general computational scheme for a single wavefield component applicable to all such media is derived. This general scheme effectively improves the extensibility of DGFEM programming. Based on the scheme, a method for building a general CUDA kernel for DGFEM is given, forming a CPU+GPU two-dimensional DGFEM parallel computing framework; the general CUDA kernel further extends the DGFEM algorithm to more complex media and to three dimensions. Numerical experiments on theoretical models and a complex mountainous model show that the general scheme and CUDA kernel accurately simulate the P waves, S waves, and slow P waves described by the acoustic, elastic, viscoelastic, and poroelastic wave equations. Compared with single-core CPU simulation, the two-dimensional DGFEM elastic wave GPU computation achieves an average speedup of about 100x. Meanwhile, the elastic, viscoelastic, and poroelastic simulations take about 1.7, 2.3, and 3.0 times as long as the acoustic simulation, respectively; this result can guide multi-process load balancing when simulating coupled conditions in complex media.
Keywords: discontinuous Galerkin finite element method, elastic wave, viscoelastic wave, poroelastic wave, numerical simulation, GPU parallel computing
17. Implementation and optimization of a high-performance GPU-based FFT library
Authors: 杜振鹏, 徐建良, 张先轶, 黄强. 数据与计算发展前沿(中英文) (Frontiers of Data & Computing), 2025, No. 6, pp. 124-135 (12 pages).
[Objective] This study aims to optimize the performance of an FFT library on GPUs and to fill the gap of a high-performance FFT library for domestic GPUs. [Methods] The main optimization strategies are as follows. First, building on the GPU's parallel strengths, the mathematical properties of the DFT are fully exploited through a blocked, hierarchical computation scheme. Second, a new butterfly network structure combining breadth-first and depth-first traversal is proposed that eliminates bit reversal. Third, shared memory and blocking strategies are adopted for multi-batch data. [Results] Compared with the CPU-side FFTW library, the speedup on large-scale data exceeds 2. Compared with the state-of-the-art clFFT library, on small-scale data with 128 and 256 batches the average speedups are 1.47 and 1.58 for power-of-two sizes and 3.58 and 4.07 for non-power-of-two sizes; on large-scale data the average speedups are 2.04 and 2.38 for power-of-two sizes and 5.39 and 5.28 for non-power-of-two sizes. [Conclusions] The experiments show that the GPU significantly outperforms the CPU on large-scale data and that PerfFFT outperforms clFFT across data sizes, verifying the effectiveness of the optimization strategies.
Keywords: fast Fourier transform, graphics processing unit, OpenCL, parallel computing
18. Research on large-scale UAV swarm simulation driven by GPU parallel computing
Authors: 余玲, 郭川东, 王国庆, 刘艳菊, 曹立佳. 计算机工程与应用 (Computer Engineering and Applications; PKU Core), 2025, No. 22, pp. 114-122 (9 pages).
As the scale of UAV swarm simulation keeps growing, traditional CPU-based serial computation can no longer meet the practical demands of large-scale simulation for high efficiency, real-time behavior, and low cost. This paper proposes a large-scale UAV swarm simulation method driven by GPU parallel computing, systematically building, at the method-design level, a parallel simulation mechanism matched to GPU execution characteristics. Based on CUDA and exploiting the independence of the simulated objects, the method adopts a fine-grained data-parallel strategy, distributing simulation tasks evenly across parallel threads according to a static load-balancing principle and a thread-mapping model, thereby using the computing resources efficiently. To verify the effectiveness of the proposed method, a parallel simulation system for large-scale swarms is built and compared with a traditional CPU simulation, with performance evaluated in terms of real-time behavior and scalability. The experimental results show that, while preserving simulation accuracy and system stability, the method achieves real-time simulation of up to 2322 UAV models on an NVIDIA GeForce GTX 1650 graphics card, providing a feasible path for efficient UAV swarm simulation and a technical reference for large-scale parallel computing problems in other domains.
Keywords: GPU computing, UAV swarm, parallel simulation, real-time performance
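A minimal CUDA sketch of the fine-grained one-thread-per-UAV mapping the paper describes; the state layout and the simple kinematic update are placeholder assumptions, not the paper's vehicle model.

    #include <cuda_runtime.h>

    struct Uav { float x, y, z, vx, vy, vz; };

    // One thread integrates one UAV: all vehicles are independent within a
    // time step, so the mapping is purely data-parallel.
    __global__ void stepSwarm(Uav* u, int n, float dt)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        u[i].x += u[i].vx * dt;
        u[i].y += u[i].vy * dt;
        u[i].z += u[i].vz * dt;
    }

    int main()
    {
        const int n = 2322;            // swarm size reported in the abstract
        Uav* d;
        cudaMalloc(&d, n * sizeof(Uav));
        cudaMemset(d, 0, n * sizeof(Uav));
        for (int step = 0; step < 1000; ++step)
            stepSwarm<<<(n + 255) / 256, 256>>>(d, n, 0.01f);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }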
19. Current status of GPU-accelerated demodulation techniques for satellite DSSS telemetry signals
Authors: 陈其敏, 焦义文, 吴涛, 李雪健, 冯浩. 航天工程大学学报 (Journal of Space Engineering University), 2025, No. 5, pp. 66-73 (8 pages).
To address the low processing efficiency of direct sequence spread spectrum (DSSS) signal demodulation in highly dynamic satellite tracking, telemetry, and control scenarios, and the limited flexibility of traditional field-programmable gate array (FPGA) platforms, this paper analyzes and studies the development of graphics processing unit (GPU)-accelerated DSSS telemetry signal demodulation, and, in combination with the GPU heterogeneous computing architecture and the Compute Unified Device Architecture (CUDA) programming model, discusses its current status, shortcomings, and directions for improvement.
Keywords: direct sequence spread spectrum telemetry signal, acquisition and tracking, graphics processing unit, parallel computing, real-time demodulation
20. A Kirchhoff approximation integral method for target acoustic scattering based on GPU parallel computing
Authors: 杨晨轩, 安俊英, 孙阳, 张毅. 声学技术 (Technical Acoustics; PKU Core), 2025, No. 4, pp. 499-505 (7 pages).
To improve the computational efficiency of mid-to-high frequency acoustic scattering from underwater targets, this paper establishes a Kirchhoff approximation integral model for target scattering based on graphics processing unit (GPU) parallel computing. First, for the constant-element model and the exact facet-integration model of the Kirchhoff approximation integral method, a parallelization scheme based on GPU thread allocation is established, yielding parallel algorithm models. Then, taking a rigid sphere of radius 1 m as the target, its scattering target strength is computed with the GPU parallel model, and the algorithm's accuracy is verified by comparison with the analytical solution. Finally, taking the Benchmark model as the target, the scattering target strength under different conditions is simulated, and the speedup of the GPU parallel model is analyzed. The results show that the GPU parallel implementation of the constant-element model is 4-5 times faster than traditional serial computation, and that of the exact facet-integration model is 8-11 times faster. The GPU parallelization scheme markedly accelerates the Kirchhoff approximation integral method, and the GPU advantage grows as the number of facets increases.
Keywords: Kirchhoff approximation integral, graphics processing unit (GPU), parallel computing, target scattering
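As a hedged sketch of the constant-element parallelization, the kernel below lets each thread add one facet's complex Kirchhoff contribution, accumulated with atomic adds; the monostatic phase term, the precomputed amplitudes, and all names are assumptions rather than the paper's formulation.

    #include <cuda_runtime.h>

    // Monostatic Kirchhoff sum sketch: each thread adds one facet's complex
    // contribution  A_f * exp(i * 2k * d_f), where d_f is the facet range
    // along the incident direction.  Amplitudes A_f (facet area times an
    // obliquity factor) are assumed precomputed on the host.
    __global__ void kirchhoffSum(const float* amp, const float* dist,
                                 int nFacets, float k, float* re, float* im)
    {
        int f = blockIdx.x * blockDim.x + threadIdx.x;
        if (f >= nFacets) return;
        float phase = 2.0f * k * dist[f];
        atomicAdd(re, amp[f] * cosf(phase));
        atomicAdd(im, amp[f] * sinf(phase));
    }

    int main()
    {
        const int n = 1 << 16;
        float *amp, *dist, *re, *im;
        cudaMalloc(&amp, n * sizeof(float));
        cudaMalloc(&dist, n * sizeof(float));
        cudaMalloc(&re, sizeof(float));
        cudaMalloc(&im, sizeof(float));
        cudaMemset(re, 0, sizeof(float));
        cudaMemset(im, 0, sizeof(float));
        // ... fill amp/dist from the surface mesh ...
        // k ~ 4.19 rad/m: wavenumber of a 1 kHz wave in 1500 m/s water
        kirchhoffSum<<<(n + 255) / 256, 256>>>(amp, dist, n, 4.19f, re, im);
        cudaDeviceSynchronize();
        cudaFree(amp); cudaFree(dist); cudaFree(re); cudaFree(im);
        return 0;
    }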