584 articles found
The inversion of density structure by graphic processing unit (GPU) and identification of igneous rocks in Xisha area (cited by: 1)
1
Authors: Lei Yu, Jian Zhang, Wei Lin, Rongqiang Wei, Shiguo Wu. Earthquake Science, 2014, No. 1, pp. 117-125 (9 pages)
Organic reefs, the targets of deep-water petroleum exploration, developed widely in the Xisha area. However, concealed igneous rocks lie undersea, and their wave impedance is nearly equal to that of the organic reefs. Because the two have similar seismic reflection characteristics, the igneous rocks interfere with future exploration. The density and magnetism of organic reefs, however, differ markedly from those of igneous rocks, so gravity and magnetic data have clear advantages for telling the two apart. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The horizontal distribution of igneous rocks can then be derived from the high-frequency field, the low-frequency field, and the rocks' physical properties. Next, 3D forward modeling of the gravitational field was carried out to establish a density model of the area, with reference to rock physical properties from earlier research. Furthermore, 3D inversion of the gravity anomaly was performed in the Xisha target area using a genetic algorithm with graphic processing unit (GPU) parallel processing, yielding the 3D density structure of the area. In this way, the igneous rocks can be confined to a certain depth according to their density. Frequency decomposition combined with GPU-parallel genetic-algorithm 3D inversion of the gravity anomaly proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can thus be distinguished, which provides advance information for further exploration.
Keywords: Xisha area; organic reefs and igneous rocks; frequency decomposition of potential field; 3D inversion of the graphic processing unit (GPU) parallel processing
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
2
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, No. 5, pp. 536-545 (10 pages)
A personal desktop platform with teraflops peak performance and thousands of cores can be realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. Techniques for implementing CUDA kernels, a double-layered thread hierarchy, and a varied memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated on a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved on a single-GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Optimization of a precise integration method for seismic modeling based on graphic processing unit (cited by: 2)
3
Authors: Jingyu Li, Genyang Tang, Tianyue Hu. Earthquake Science (CSCD), 2010, No. 4, pp. 387-393 (7 pages)
General-purpose graphic processing unit (GPU) computing is increasingly used in various fields. Its single-instruction, multiple-thread mode is well suited to seismic numerical simulation, which involves a huge quantity of data and many calculation steps. In this study, we introduce a GPU-based parallel implementation of the precise integration method (PIM) for seismic forward modeling. Compared with single-core CPU calculation, GPU parallel calculation fully preserves the features of PIM (small bandwidth, high accuracy, and the capability of modeling complex substructures) while bringing high computational efficiency, which means that high-performance GPU parallel calculation can bring seismic forward modeling closer to real seismic records.
Keywords: precise integration method; seismic modeling; general purpose GPU; graphic processing unit
TIME-DOMAIN INTERPOLATION ON GRAPHICS PROCESSING UNIT (cited by: 1)
4
Authors: XIQI LI, GUOHUA SHI, YUDONG ZHANG. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2011, No. 1, pp. 89-95 (7 pages)
The signal processing speed of spectral domain optical coherence tomography (SD-OCT) has become a bottleneck in many medical applications. Recently, a time-domain interpolation method was proposed. Compared with the commonly used zero-padding interpolation method, it achieves a better signal-to-noise ratio (SNR) with much less signal processing time in SD-OCT data processing. Additionally, each resampled data point can be obtained from a few data points and coefficients within the cutoff window, so many interpolations can be performed simultaneously, which makes the method suitable for parallel computing. Using a graphics processing unit (GPU) and the compute unified device architecture (CUDA) programming model, time-domain interpolation can be accelerated significantly. The computing capability exceeds 250,000, 200,000, and 160,000 A-lines per second for 2,048-pixel OCT when the cutoff length is L=11, L=21, and L=31, respectively. A frame of SD-OCT data (400 A-lines × 2,048 pixels per line) is acquired and processed on the GPU in real time. The results show that signal processing of the SD-OCT frame can be finished in 6.223 ms when the cutoff length is L=21, which is much faster than on a central processing unit (CPU). Real-time signal processing of the acquired data can thus be realized.
Keywords: optical coherence tomography; real-time signal processing; graphics processing unit; GPU; CUDA
A graphics processing unit-based robust numerical model for solute transport driven by torrential flow condition (cited by: 1)
5
Authors: Jing-ming HOU, Bao-shan SHI, Qiu-hua LIANG, Yu TONG, Yong-de KANG, Zhao-an ZHANG, Gang-gang BAI, Xu-jun GAO, Xiao YANG. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2021, No. 10, pp. 835-850 (16 pages)
Solute transport simulations are important in water pollution events. This paper introduces a finite volume Godunov-type model for solving a 4×4 matrix form of the hyperbolic conservation laws consisting of the 2D shallow water equations and transport equations. The model adopts the Harten-Lax-van Leer-contact (HLLC) approximate Riemann solver to calculate the cell interface fluxes. It handles changes in the wet-dry interfaces over real complex terrain well and has a strong shock-capturing ability. Monotonic upstream-centred scheme for conservation laws (MUSCL) linear reconstruction with finite slope, combined with Runge-Kutta time integration, achieves second-order accuracy. At the same time, graphics processing unit (GPU)-accelerated computing greatly increases the computing speed. The model is validated against multiple benchmarks, and the results agree well with analytical solutions and other published numerical predictions. In the third test case, the GPU and central processing unit (CPU) models take 3.865 s and 13.865 s, respectively, indicating that the GPU model increases the calculation speed by a factor of 3.6. In the fourth test case, the GPU model is 9.8–44.6 times more efficient than the traditional CPU model across grids of different resolution. The model therefore has better potential than previous models for large-scale simulation of solute transport in water pollution incidents, and it can provide a reliable theoretical basis and strong data support for the rapid assessment and early warning of water pollution accidents.
Keywords: solute transport; shallow water equations; Godunov-type scheme; Harten-Lax-van Leer-contact (HLLC) Riemann solver; graphics processing unit (GPU) acceleration technology; torrential flow
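The speedup quoted for the third test case can be checked with one line of arithmetic. The following is a minimal Python sketch that uses only the two run times given in the abstract; the function name `speedup` is illustrative and does not come from the paper.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("run times must be positive")
    return cpu_seconds / gpu_seconds


# Run times reported for the third test case in the abstract.
ratio = speedup(cpu_seconds=13.865, gpu_seconds=3.865)
print(f"speedup = {ratio:.2f}x")  # prints "speedup = 3.59x"
```

Rounded to one decimal place this gives the factor of 3.6 stated in the abstract.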
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
6
Authors: Chenggong LI, J.P.Y. MAA. Applied Mathematics and Mechanics (English Edition) (SCIE, EI, CSCD), 2017, No. 5, pp. 707-722 (16 pages)
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with Reynolds numbers up to 1 × 10^7. To improve the computational efficiency of the lattice Boltzmann method (LBM) in numerical simulations of turbulent flows, the massively parallel computing power of a graphic processing unit (GPU) with the compute unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well compared with results from others, with a 76-fold increase in computational efficiency. It appears that the higher the Reynolds number is, the smaller the Smagorinsky constant should be if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large.
Keywords: large eddy simulation (LES); multi-relaxation-time (MRT); lattice Boltzmann equation (LBE); two-dimensional nine velocity components (D2Q9); Smagorinsky model; graphic processing unit (GPU); compute unified device architecture (CUDA)
Graphic Processing Unit Based Phase Retrieval and CT Reconstruction for Differential X-Ray Phase Contrast Imaging
7
Authors: 陈晓庆, 王宇杰, 孙建奇. Journal of Shanghai Jiaotong University (Science) (EI), 2014, No. 5, pp. 550-554 (5 pages)
Compared with conventional X-ray absorption imaging, X-ray phase-contrast imaging shows higher contrast on samples with low attenuation coefficients, such as blood vessels and soft tissues. Among the modalities of phase-contrast imaging, grating-based phase-contrast imaging has been widely accepted because it supports a wide range of samples and does not require a coherent source. The downside, however, is the substantially larger amount of data generated by the phase-stepping method, which slows down the reconstruction process. A graphic processing unit (GPU) allows parallel computing, which is very useful for processing large quantities of data. In this paper, a compute unified device architecture (CUDA) C program running on the GPU is introduced to accelerate the phase retrieval and filtered back projection (FBP) algorithms for grating-based tomography. Depending on the size of the data, the CUDA C program shows different amounts of speedup over the standard C program on the same Visual Studio 2010 platform, and the speedup ratio increases as the data size increases.
Keywords: grating-based phase contrast imaging; parallel computing; graphic processing unit (GPU); compute unified device architecture (CUDA); filtered back projection (FBP)
Complex hexagonal close-packed dendritic growth during alloy solidification by graphics processing unit-accelerated three-dimensional phase-field simulations: demo for Mg–Gd alloy
8
Authors: Sheng-Lan Yang, Jing Zhong, Kai Wang, Xun Kang, Jian-Bao Gao, Jiong Wang, Qian Li, Li-Jun Zhang. Rare Metals (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 3468-3484 (17 pages)
In this study, insights into the effect of interfacial anisotropy on complex hexagonal close-packed (hcp) dendritic growth during alloy solidification were gained through graphics processing unit (GPU)-accelerated three-dimensional (3D) phase-field simulations, as demonstrated for a Mg-Gd alloy. An anisotropic phase-field model with finite interface dissipation was developed by incorporating the contribution of the anisotropy of interfacial energy into the total free energy functional. The modified spherical harmonic anisotropy function was then chosen for the hcp crystal. A GPU parallel computing algorithm was implemented in the present phase-field model, and a corresponding code was developed on the compute unified device architecture parallel computing platform. Benchmark tests indicated that the calculation efficiency of a single TESLA V100 GPU could be ~80 times that of open multi-processing (OpenMP) with eight central processing unit cores. By coupling the phase-field model with reliable thermodynamic and interfacial energy descriptions, the 3D phase-field simulation of α-Mg dendritic growth in the Mg-6Gd (in wt%) alloy during solidification was performed. Various two-dimensional dendrite morphologies were revealed by cutting the simulated 3D dendrite along different crystallographic planes. Typical sixfold equiaxed and butterflied microstructures observed in experiments were well reproduced.
Keywords: interfacial anisotropy; dendrite solidification; phase-field model; graphics processing unit (GPU); Mg–Gd
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
9
Authors: 温程璐, 潘伟, 陈晓熹, 祝青园. Journal of Donghua University (English Edition) (EI, CAS), 2012, No. 1, pp. 5-8 (4 pages)
A graphic processing unit (GPU)-accelerated biological species recognition method using a partially connected neural evolutionary network model is introduced in this paper. The partially connected neural evolutionary network adopted here overcomes the disadvantage that traditional neural networks accept only small inputs. The whole image is taken as the input of the neural network, so the maximum number of features is kept for recognition. To speed up recognition, a fast implementation of the partially connected neural network was built on an NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation against its serial CPU counterpart. The experimental results showed that the GPU implementation works effectively in terms of both recognition rate and speed, gaining a speedup of 343 over the CPU implementation. Compared with a feature-based recognition method on the same task, the method also achieved an acceptable correct rate of 84.6% when tested on the eight biological species.
Keywords: graphic processing unit (GPU); compute unified device architecture (CUDA); neural network; species recognition
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
10
Authors: LI Bingchao, WEI Jizeng, GUO Wei, SUN Jizhou. Journal of Shanghai Jiaotong University (Science) (EI), 2021, No. 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel while allowing each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may take different paths after conditional branches. Such divergent control flow leaves some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected and compacted into idle lanes to improve SIMD lane utilization. However, this mechanism induces extra barrier synchronizations, since warps have to stall while waiting for other warps to be compacted, so that in some cases no warps can be scheduled. In this paper, we propose an approach to reduce the overhead of barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after branches. Moreover, warps waiting for a compaction can also bypass it when no warps are ready for issuing. In addition, a compaction is canceled if it cannot reduce the number of idle lanes. The experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with massive divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: graphics processing unit (GPU); single instruction multiple data (SIMD); thread; warps; bypass
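To make the idea of divergence and thread compaction concrete, the toy Python model below counts issue slots for a set of warps after one branch. It is only an illustrative sketch, not the hardware mechanism from the paper: the warp size of 4, the function names, and the assumption that any thread can move to any lane are all simplifications introduced here, and the "cancel" and "no warp ready" rules from the abstract are not modeled.

```python
from math import ceil

WARP_SIZE = 4  # toy value chosen for readability; real warps are wider


def issue_slots_baseline(warps):
    """Without compaction, a warp issues once per distinct path its threads take."""
    return sum(len(set(warp)) for warp in warps)


def issue_slots_compacted(warps, warp_size=WARP_SIZE):
    """Non-divergent warps bypass compaction; divergent warps pool threads by path."""
    slots = 0
    pooled = {}  # path label -> number of threads waiting to be regrouped
    for warp in warps:
        if len(set(warp)) == 1:
            slots += 1  # all threads agree, so the warp issues directly (bypass)
        else:
            for path in warp:
                pooled[path] = pooled.get(path, 0) + 1
    # Threads on the same path are packed into as few full warps as possible.
    slots += sum(ceil(count / warp_size) for count in pooled.values())
    return slots


def simd_utilization(warps, slots, warp_size=WARP_SIZE):
    """Fraction of lanes doing useful work across all issue slots."""
    active_threads = sum(len(warp) for warp in warps)
    return active_threads / (slots * warp_size)


# Two divergent warps and one warp whose threads all take the same path.
warps = [["T", "T", "F", "F"], ["T", "T", "F", "F"], ["T", "T", "T", "T"]]
base = issue_slots_baseline(warps)
packed = issue_slots_compacted(warps)
print(base, simd_utilization(warps, base))      # 5 0.6
print(packed, simd_utilization(warps, packed))  # 3 1.0
```

In this toy case the third warp never waits for a compaction barrier, which is the bypass behaviour the abstract describes for warps whose threads all jump to the same path.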
Graphic Processing Unit-Accelerated Mutual Information-Based 3D Image Rigid Registration
11
Authors: 李冠华, 欧宗瑛, 苏铁明, 韩军. Transactions of Tianjin University (EI, CAS), 2009, No. 5, pp. 375-380 (6 pages)
Mutual information (MI)-based image registration is effective in registering medical images, but it is computationally expensive. This paper accelerates MI-based image registration by dividing the computation of mutual information into spatial transformation and histogram-based calculation, and performing the 3D spatial transformation and trilinear interpolation on a graphic processing unit (GPU). The 3D floating image is downloaded to the GPU as a flat 3D texture, and then fetched and interpolated for each new voxel location in a fragment shader. The transformed results are rendered to textures using the frame buffer object (FBO) extension and then read back to main memory for the remaining computation on the CPU. Experimental results show that the GPU-accelerated method achieves a speedup of about an order of magnitude, with a better registration result, compared with the software implementation on a single-core CPU.
Keywords: image registration; mutual information; graphic processing unit (GPU)
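The histogram-based half of the computation, which the abstract says remains on the CPU, can be illustrated with the standard definition of mutual information over a joint histogram. The sketch below is plain Python and assumes the two images have already been resampled onto the same grid and quantized into discrete intensity bins; the function name and the flat-list representation are choices made here, not details from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(reference_bins, floating_bins):
    """Mutual information (in bits) of two equal-length sequences of bin indices."""
    if len(reference_bins) != len(floating_bins) or not reference_bins:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference_bins)
    joint = Counter(zip(reference_bins, floating_bins))  # joint histogram
    ref_hist = Counter(reference_bins)                   # marginal histograms
    flo_hist = Counter(floating_bins)
    mi = 0.0
    for (r, f), count in joint.items():
        # p(r,f) * log2( p(r,f) / (p(r) * p(f)) ), written with raw counts
        mi += (count / n) * log2(count * n / (ref_hist[r] * flo_hist[f]))
    return mi


# Perfectly aligned binary images share 1 bit; unrelated ones share 0 bits.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

A registration loop would evaluate this measure after each candidate rigid transform and keep the transform that maximizes it.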
Numerical simulation of the N-S equations for supersonic flow fields based on CPU-GPU
12
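Authors: 卢志伟, 张皓茹, 刘锡尧, 王亚东, 张卓凯, 张君安. 《中国机械工程》 (PKU Core), 2025, No. 9, pp. 1942-1950 (9 pages)
To analyze the characteristics of supersonic flow fields in depth and improve numerical computation efficiency, an efficient acceleration algorithm was designed. The algorithm makes full use of the central processing unit-graphics processing unit (CPU-GPU) heterogeneous parallel mode and uses asynchronous streams for data transfer and processing, which significantly accelerates the numerical simulation of supersonic flow fields. The results show that GPU parallel computation is markedly faster than CPU serial computation, and that the speedup ratio rises markedly as the flow-field grid size grows. GPU parallel computing can effectively increase the computation speed for supersonic flow fields, providing a powerful parallel computing method for the design, optimization, performance evaluation, and development of supersonic aircraft.
Keywords: supersonic flow field; central processing unit-graphics processing unit; heterogeneous computing; finite difference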
Fast modeling of gravity gradients from topographic surface data using GPU parallel algorithm (cited by: 1)
13
Authors: Xuli Tan, Qingbin Wang, Jinkai Feng, Yan Huang, Ziyan Huang. Geodesy and Geodynamics (CSCD), 2021, No. 4, pp. 288-297 (10 pages)
The gravity gradient is the second derivative of the gravity potential and contains more high-frequency information about Earth's gravity field. Gravity gradient observation data require their prior and intrinsic parts to be deducted to obtain more variational information. A model generated from a topographic surface database is more appropriate for representing gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangle prism method, namely an analytic integration of Newtonian potential integrals, is a reliable and commonly used approach to modeling the gravity gradient, but its computing efficiency is extremely low. A modified rectangle prism method and a graphical processing unit (GPU) parallel algorithm were proposed to speed up the modeling process. The modified method avoids massive redundant computations by transforming the formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelizes the method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, which is attributed to precision differences between single-precision and double-precision floating-point numbers. The parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm in experiments, demonstrating its effectiveness in speeding up the modeling process. Further analysis indicates that both the modified method and GPU parallelism contributed to the proposed algorithm's performance in the experiments.
Keywords: gravity gradient; topographic surface data; rectangle prism method; parallel computation; graphical processing unit (GPU)
GPU-based parallel implementation of the M-ary despreading algorithm for OMCSS underwater acoustic communication
14
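Authors: 彭海源, 王巍, 李德瑞, 刘彦君, 李宇, 迟骋, 田亚男. 《系统工程与电子技术》 (PKU Core), 2025, No. 3, pp. 978-986 (9 pages)
To meet the need for fast processing of received signals in orthogonal multi-carrier spread spectrum (OMCSS) underwater acoustic communication systems, a parallel implementation of the M-ary despreading algorithm based on a graphic processing unit (GPU) is proposed. First, the feasibility of implementing the M-ary despreading algorithm on a GPU platform is analyzed, and the basic computational units inside the algorithm are optimized for parallel execution. Then, to further increase the GPU's parallel execution speed, an M-ary parallel despreading architecture based on concurrent kernel execution is designed. The algorithm's performance is tested on a central processing unit (CPU) + GPU heterogeneous platform. The test results show that the designed M-ary parallel despreading algorithm improves running speed by up to 90.47% compared with the M-ary serial despreading algorithm, with a maximum speedup ratio of 10.5.
Keywords: orthogonal multi-carrier spread spectrum; underwater acoustic communication; M-ary despreading; graphic processing unit; parallel implementation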
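The two figures at the end of the abstract look consistent if the 90.47% improvement is read as a reduction in running time, since a speedup of S removes a fraction 1 - 1/S of the original time. That reading is an assumption made here, not something the abstract states. A small Python sketch of the conversion, with illustrative function names:

```python
def time_reduction_from_speedup(speedup: float) -> float:
    """Fraction of the serial running time saved by a given speedup ratio."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def speedup_from_time_reduction(reduction: float) -> float:
    """Speedup ratio implied by saving a given fraction of the running time."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(f"{time_reduction_from_speedup(10.5):.2%}")     # 90.48%
print(f"{speedup_from_time_reduction(0.9047):.2f}")   # 10.49
```

Under this reading the two reported numbers agree to within rounding.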
Current status of GPU-accelerated demodulation techniques for satellite DSSS telemetry signals
15
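Authors: 陈其敏, 焦义文, 吴涛, 李雪健, 冯浩. 《航天工程大学学报》, 2025, No. 5, pp. 66-73 (8 pages)
Demodulation of direct sequence spread spectrum (DSSS) signals in satellite telemetry, tracking, and command faces problems such as low processing efficiency in high-dynamic scenarios and the insufficient flexibility and adaptability of traditional field programmable gate array (FPGA) platforms. To address them, the development status of graphics processing unit (GPU)-accelerated DSSS telemetry signal demodulation is analyzed and studied. Drawing on the GPU heterogeneous computing architecture and the Compute Unified Device Architecture (CUDA) programming model, the paper discusses the current status, shortcomings, and directions for improvement.
Keywords: direct sequence spread spectrum telemetry signal; acquisition and tracking; graphics processing unit; parallel computing; real-time demodulation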
Kirchhoff approximation integral method for target acoustic scattering based on GPU parallel computing
16
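Authors: 杨晨轩, 安俊英, 孙阳, 张毅. 《声学技术》 (PKU Core), 2025, No. 4, pp. 499-505 (7 pages)
To improve the computational efficiency of mid- and high-frequency acoustic scattering from underwater targets, this paper establishes a Kirchhoff approximation integral model for target acoustic scattering based on graphics processing unit (GPU) parallel computing. First, for the constant-element model and the exact facet-integration model of the Kirchhoff approximation integral method, a parallelization scheme based on GPU thread allocation is established, forming algorithm models that can be computed in parallel. Then, taking a rigid sphere of radius 1 m as the target, the GPU parallel model is used to compute its acoustic target strength, and the accuracy of the algorithm is verified by comparison with the analytical solution. Finally, taking the Benchmark model as the target, target strength under different conditions is simulated, and the speedup of the GPU parallel model is analyzed. The results show that GPU parallel computation is 4 to 5 times more efficient than traditional serial computation for the constant-element model and 8 to 11 times more efficient for the exact facet-integration model. GPU-based parallelization clearly accelerates the Kirchhoff approximation integral method for target acoustic scattering, and the GPU's advantage grows as the number of facets increases.
Keywords: Kirchhoff approximation integral; graphics processing unit (GPU); parallel computing; target scattering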
Architecture design of a seeker signal processing module based on GP-GPU technology
17
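Authors: 马啸龙, 许新鹏, 任书磊, 李晨, 崔闪. 《空天防御》, 2025, No. 2, pp. 84-92 (9 pages)
To address the low efficiency and weak real-time performance of current signal-level modeling and simulation of active seekers, an architecture for the seeker signal processing module based on General-Purpose Computing on Graphics Processing Units (GP-GPU) parallel acceleration is proposed. CUDA programming is used to build the GPU-accelerated architecture and design the interfaces for the signal processing module as a whole and for its submodules. The resulting parallel module architecture is simulated and its run time is compared with an all-CPU configuration to verify the reliability and acceleration performance of the architecture. The simulation results show that the GPU-based parallel module architecture runs 12.67 times as fast as the all-CPU architecture, which provides an initial verification of the feasibility and acceleration efficiency of the architecture.
Keywords: seeker simulation system; graphics processing unit; heterogeneous parallelism; signal processing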
Resource optimization for multi-beam satellite communication based on a GPU-accelerated improved particle swarm algorithm
18
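Authors: 宋自阳, 张廷尧, 赵家庆, 慕忠成, 黄益新, 付哲楷. 《上海航天(中英文)》, 2025, No. 5, pp. 121-130 (10 pages)
As low Earth orbit (LEO) constellation multi-beam satellite communication systems are widely applied in broadband access, the Internet of Things (IoT), and other fields, scenarios in which the constellation communicates with the ground dynamically and selects service nodes dynamically are increasingly prominent, and the efficiency and optimization quality of resource scheduling have become key factors in system performance. When faced with joint optimization over multi-dimensional decision variables such as beams, power, and bandwidth, traditional optimization algorithms suffer from limited encoding expressiveness, cumbersome constraint handling, and slow convergence. This paper therefore proposes an improved particle swarm algorithm (PPSO) based on a hybrid stick-breaking encoding mechanism and graphics processing unit (GPU) parallel acceleration to efficiently solve the intelligent resource allocation problem in LEO constellation multi-beam satellite systems. The method reconstructs the particles' solution space through hybrid stick-breaking encoding, so that normalized variables such as power and bandwidth satisfy the global constraints by construction, avoiding the complex constraint-repair operations of traditional methods. At the same time, the GPU is used to parallelize particle swarm evolution and fitness evaluation, which significantly improves running efficiency while preserving solution quality. Experimental results show that the proposed method outperforms existing methods on key performance indicators such as total system delay, packet loss rate, and energy consumption, and that it shows good scalability and computational advantages especially in large-scale constellation scenarios that require dynamic ground communication and dynamic node selection.
Keywords: low Earth orbit (LEO) constellation; dynamic scheduling; graphics processing unit (GPU) acceleration; optimization design; particle swarm optimization (PSO)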
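The claim that stick-breaking encoding satisfies a sum constraint by construction can be illustrated with the textbook form of the transform. The Python sketch below shows only that generic form; the paper's hybrid variant is not described in the abstract, and the function name and the example of splitting a power budget across beams are assumptions made here.

```python
def stick_breaking(fractions):
    """Map K-1 break fractions in [0, 1] to K non-negative shares that sum to 1."""
    remaining = 1.0  # length of the stick not yet handed out
    shares = []
    for v in fractions:
        if not 0.0 <= v <= 1.0:
            raise ValueError("each break fraction must lie in [0, 1]")
        piece = v * remaining  # break off a fraction of what is left
        shares.append(piece)
        remaining -= piece
    shares.append(remaining)   # the last share takes whatever is left
    return shares


# Hypothetical example: a particle position of two fractions becomes
# normalized power shares for three beams.
print(stick_breaking([0.5, 0.5]))  # [0.5, 0.25, 0.25]
```

Because every particle position inside the unit box decodes to a valid allocation, an optimizer using such an encoding has no sum constraint left to repair after each update, which matches the motivation given in the abstract.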
Modeling and approximation algorithm for the two-stage workload scheduling problem on GPUs (cited by: 7)
19
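Authors: 孙景昊, 邓庆绪, 孟亚坤. 《软件学报》 (EI, CSCD, PKU Core), 2014, No. 2, pp. 298-313 (16 pages)
As hardware functionality grows richer and software development environments mature, the GPU (graphics processing unit) is increasingly applied to general-purpose computing and plays a crucial role in significantly improving the performance of many computing systems, especially embedded systems. In GPU-based computing systems, large-scale parallel workloads often transfer and load data at the same time, and data transfer latency can no longer be ignored in the global optimization of system performance. Considering both the transfer time and the execution time of workloads, and taking minimization of the total workload makespan as the global optimization objective, this paper studies the joint "transfer-execution" scheduling problem for workloads on a GPU. First, the timing information and the number of parallel tasks of each workload are mapped to a two-dimensional rectangular region, establishing a 2D two-layer rectangle model of the workloads. Then, the GPU workload scheduling problem is reduced to a class of strip-packing problems. Finally, a polynomial-time approximation algorithm with approximation ratio 3 is given based on a greedy strategy, with complexity O(n log n). The core of the approximation algorithm is sorted scheduling of workloads in the data transfer stage. This proves, at the theoretical level, the effectiveness of two-stage "transfer-execution" scheduling in GPU systems: sorted scheduling in the data transfer stage and first-come-first-serve (FCFS) scheduling in the execution stage can bring GPU performance to the global optimum or near it.
Keywords: GPU (graphics processing unit); data transfer; workload sorting; strip-packing; approximation algorithm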
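Why the order of the transfer stage matters for the makespan can be seen in a heavily simplified model. The Python sketch below assumes one transfer channel and one execution unit, ignores the parallel-task dimension that the paper handles through strip packing, and sorts by shortest transfer time; the abstract does not state the paper's actual sort key, so that choice and all names here are assumptions.

```python
def two_stage_makespan(loads):
    """Makespan for (transfer_time, exec_time) loads processed in the given order.

    Transfers run back to back on one channel; execution is first-come-first-serve
    and a load can start executing only after its own transfer has finished.
    """
    transfer_done = 0
    exec_done = 0
    for transfer_time, exec_time in loads:
        transfer_done += transfer_time
        exec_done = max(exec_done, transfer_done) + exec_time
    return exec_done


loads = [(4, 1), (1, 4)]  # hypothetical (transfer, execute) times
print(two_stage_makespan(loads))  # 9

# Assumed sort key: shortest transfer first, so execution can start early.
by_transfer = sorted(loads, key=lambda load: load[0])
print(two_stage_makespan(by_transfer))  # 6
```

Even in this two-load example, reordering the transfer stage while leaving execution FCFS shortens the schedule, which is the effect the abstract attributes to sorted scheduling in the transfer stage.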
GPU-based real-time forward-mapping rendering algorithm for depth images (cited by: 7)
20
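Authors: 刘保权, 刘学慧, 吴恩华. 《软件学报》 (EI, CSCD, PKU Core), 2007, No. 6, pp. 1531-1542 (12 pages)
A real-time depth image rendering pipeline that runs entirely on the GPU (graphics processing unit) is proposed. The method uses the GPU's parallel computing capability to accelerate depth image rendering. A 3D forward-mapping method that runs in the vertex shader is derived; it forward-maps the input pixels to obtain higher rendering performance and uses the rasterization stage of the graphics hardware pipeline to perform interpolation-based image reconstruction efficiently, producing continuous, hole-free result images. Per-pixel lighting is computed in the pixel shader to generate high-quality lighting effects. Experiments show that the method can render full-screen images at high speed while accurately preserving object silhouettes and correct occlusion relationships. A real-time walkthrough system based on the method was also implemented. The system can render multiple objects represented by cylindrical depth images in real time and apply view-dependent dynamic LOD (level of detail) operations to them.
Keywords: graphics hardware; GPU (graphics processing unit); real-time rendering; depth image; image-based rendering; per-pixel lighting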