978 articles found
Optimization of a precise integration method for seismic modeling based on graphic processing unit (Cited by: 2)
1
Authors: Jingyu Li, Genyang Tang, Tianyue Hu. Earthquake Science (CSCD), 2010, No. 4, pp. 387-393 (7 pages)
General-purpose graphics processing unit (GPU) computing is increasingly used in various fields. Its single-instruction, multiple-thread mode suits seismic numerical simulation, which involves a huge quantity of data and many calculation steps. In this study, we introduce a GPU-based parallel calculation method of the precise integration method (PIM) for seismic forward modeling. Compared with single-core CPU calculation, GPU parallel calculation fully preserves the features of PIM, namely small bandwidth, high accuracy, and the capability of modeling complex substructures, while delivering high computational efficiency, which means that high-performance GPU parallel calculation can bring seismic forward modeling closer to real seismic records.
Keywords: precise integration method; seismic modeling; general purpose GPU; graphic processing unit
Complex hexagonal close-packed dendritic growth during alloy solidification by graphics processing unit-accelerated three-dimensional phase-field simulations: demo for Mg–Gd alloy (Cited by: 1)
2
Authors: Sheng-Lan Yang, Jing Zhong, Kai Wang, Xun Kang, Jian-Bao Gao, Jiong Wang, Qian Li, Li-Jun Zhang. Rare Metals (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 3468-3484 (17 pages)
In this study, insights into the effect of interfacial anisotropy on complex hexagonal close-packed (hcp) dendritic growth during alloy solidification were gained by graphics processing unit (GPU)-accelerated three-dimensional (3D) phase-field simulations, as demonstrated for a Mg-Gd alloy. An anisotropic phase-field model with finite interface dissipation was developed by incorporating the contribution of the anisotropy of interfacial energy into the total free energy functional. The modified spherical harmonic anisotropy function was then chosen for the hcp crystal. The GPU parallel computing algorithm was implemented in the present phase-field model, and a corresponding code was developed on the compute unified device architecture (CUDA) parallel computing platform. Benchmark tests indicated that the calculation efficiency of a single Tesla V100 GPU could be ~80 times that of open multi-processing (OpenMP) with eight central processing unit (CPU) cores. By coupling the phase-field model with reliable thermodynamic and interfacial energy descriptions, the 3D phase-field simulation of α-Mg dendritic growth in the Mg-6Gd (in wt%) alloy during solidification was performed. Various two-dimensional dendrite morphologies were revealed by cutting the simulated 3D dendrite along different crystallographic planes. Typical sixfold equiaxed and butterflied microstructures observed in experiments were well reproduced.
Keywords: Interfacial anisotropy; Dendrite solidification; Phase-field model; Graphics processing unit (GPU); Mg–Gd
TIME-DOMAIN INTERPOLATION ON GRAPHICS PROCESSING UNIT (Cited by: 1)
3
Authors: XIQI LI, GUOHUA SHI, YUDONG ZHANG. Journal of Innovative Optical Health Sciences (SCIE, EI, CAS), 2011, No. 1, pp. 89-95 (7 pages)
The signal processing speed of spectral domain optical coherence tomography (SD-OCT) has become a bottleneck in many medical applications. Recently, a time-domain interpolation method was proposed. Compared with the commonly used zero-padding interpolation method, it achieves a better signal-to-noise ratio (SNR) with much-reduced signal processing time in SD-OCT data processing. Additionally, each resampled data point can be obtained from a few data points and coefficients in the cutoff window, so many interpolations can be performed simultaneously and the method is suitable for parallel computing. By using a graphics processing unit (GPU) and the compute unified device architecture (CUDA) programming model, time-domain interpolation can be accelerated significantly. The computing capability reaches more than 250,000, 200,000, and 160,000 A-lines per second for 2,048-pixel OCT when the cutoff length is L=11, L=21, and L=31, respectively. One frame of SD-OCT data (400 A-lines × 2,048 pixels per line) is acquired and processed on the GPU in real time. The results show that signal processing of the SD-OCT data can be finished in 6.223 ms when the cutoff length is L=21, which is much faster than on a central processing unit (CPU). Real-time signal processing of acquired data can thus be realized.
Keywords: Optical coherence tomography; real-time signal processing; graphics processing unit; GPU; CUDA
A graphics processing unit-based robust numerical model for solute transport driven by torrential flow condition (Cited by: 1)
4
Authors: Jing-ming HOU, Bao-shan SHI, Qiu-hua LIANG, Yu TONG, Yong-de KANG, Zhao-an ZHANG, Gang-gang BAI, Xu-jun GAO, Xiao YANG. Journal of Zhejiang University-Science A (Applied Physics & Engineering) (SCIE, EI, CAS, CSCD), 2021, No. 10, pp. 835-850 (16 pages)
Solute transport simulations are important in water pollution events. This paper introduces a finite volume Godunov-type model for solving a 4×4 matrix form of the hyperbolic conservation laws consisting of the 2D shallow water equations and transport equations. The model adopts the Harten-Lax-van Leer-contact (HLLC) approximate Riemann solution to calculate the cell interface fluxes. It deals well with the changes in the dry and wet interfaces in actual complex terrain, and it has a strong shock-wave capturing ability. Using the monotonic upstream-centred scheme for conservation laws (MUSCL) linear reconstruction with finite slope together with the Runge-Kutta time integration method achieves second-order accuracy. At the same time, the introduction of graphics processing unit (GPU)-accelerated computing technology greatly increases the computing speed. The model is validated against multiple benchmarks, and the results are in good agreement with analytical solutions and other published numerical predictions. In the third test case, the GPU and central processing unit (CPU) calculation models take 3.865 s and 13.865 s, respectively, indicating that the GPU calculation model increases the calculation speed by a factor of 3.6. In the fourth test case, which compares the GPU-based numerical model with the traditional CPU-based numerical model, the calculation efficiencies of the GPU model under different grid resolutions are 9.8–44.6 times higher than those of the CPU model. Therefore, the model has better potential than previous models for large-scale simulation of solute transport in water pollution incidents. It can provide a reliable theoretical basis and strong data support for the rapid assessment and early warning of water pollution accidents.
Keywords: Solute transport; Shallow water equations; Godunov-type scheme; Harten-Lax-van Leer-contact (HLLC) Riemann solver; Graphics processing unit (GPU) acceleration technology; Torrential flow
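The speed-up quoted for the third test case of entry 4 can be checked with a short calculation. The following is a minimal Python sketch; the function name `speedup` is a hypothetical helper introduced here for illustration, and only the two timings come from the abstract.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("gpu_seconds must be positive")
    return cpu_seconds / gpu_seconds


# Timings quoted in the abstract for the third test case.
ratio = speedup(cpu_seconds=13.865, gpu_seconds=3.865)
print(f"speed-up: {ratio:.2f}x")
```

Rounded to one decimal place, the ratio agrees with the factor of 3.6 stated in the abstract.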
The inversion of density structure by graphic processing unit (GPU) and identification of igneous rocks in Xisha area (Cited by: 1)
5
Authors: Lei Yu, Jian Zhang, Wei Lin, Rongqiang Wei, Shiguo Wu. Earthquake Science, 2014, No. 1, pp. 117-125 (9 pages)
Organic reefs, the targets of deep-water petroleum exploration, developed widely in the Xisha area. However, there are concealed igneous rocks undersea, whose wave impedance is nearly equal to that of the organic rocks. The igneous rocks therefore interfere with future exploration because they have similar seismic reflection characteristics. Yet the density and magnetism of organic reefs are very different from those of igneous rocks, so gravity and magnetic data have obvious advantages for distinguishing organic reefs from igneous rocks. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The distribution of igneous rocks in the horizontal direction can thus be acquired according to the high-frequency field, the low-frequency field, and their physical properties. Then, 3D forward modeling of the gravitational field was carried out to establish the density model of this area with reference to the physical properties of rocks reported in previous research. Furthermore, 3D inversion of the gravity anomaly by a genetic algorithm with graphic processing unit (GPU) parallel processing was applied in the Xisha target area, and the 3D density structure of this area was obtained. In this way, the igneous rocks can be confined to a certain depth according to their density. The frequency decomposition and the GPU-parallel genetic-algorithm 3D inversion of the gravity anomaly proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can thus be identified, which provides predictive information for further exploration.
Keywords: Xisha area; Organic reefs and igneous rocks; Frequency decomposition of potential field; 3D inversion of the graphic processing unit (GPU) parallel processing
Graphical Processing Unit Based Time-Parallel Numerical Method for Ordinary Differential Equations (Cited by: 1)
6
Authors: Sumathi Lakshmiranganatha, Suresh S. Muknahallipatna. Journal of Computer and Communications, 2020, No. 2, pp. 39-63 (25 pages)
On-line transient stability analysis of a power grid is crucial in determining whether the power grid will traverse to a steady-state stable operating point after a disturbance. The transient stability analysis involves computing, in near real-time, the solutions of the algebraic equations modeling the grid network and the ordinary differential equations modeling the dynamics of the electrical components of the grid, such as synchronous generators, exciters, and governors. In this research, we investigate the use of a time-parallel approach, in particular an implementation of the Parareal algorithm on a Graphical Processing Unit using Compute Unified Device Architecture, to compute solutions of ordinary differential equations. The numerical solution accuracy and computation time of the Parareal algorithm executing on the GPU are demonstrated on the single machine infinite bus test system. Two types of dynamic model of the single synchronous generator, namely the classical and detailed models, are studied. The numerical solutions of the ordinary differential equations computed by the Parareal algorithm are compared with those computed using the modified Euler's method, demonstrating the accuracy of the Parareal algorithm executing on the GPU. Simulations are performed with varying numerical integration time steps, and the suitability of the Parareal algorithm for computing near real-time solutions of ordinary differential equations is presented. Speedups of 25× and 31× are achieved with the Parareal algorithm for the classical and detailed dynamic models of the synchronous generator, respectively, compared with the sequential modified Euler's method. The weak scaling efficiency of the Parareal algorithm when required to solve a large number of ordinary differential equations at each time step, due to the increase in sequential computations and the associated memory transfer latency between the CPU and GPU, is discussed.
Keywords: Time-parallel; Differential equation; Numerical integration; Graphic processing unit
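To make the time-parallel idea in entry 6 concrete, here is a minimal serial Python sketch of the Parareal iteration for a scalar ordinary differential equation. It is not the authors' implementation: the abstract does not state which coarse and fine propagators were used, so this sketch assumes a single explicit Euler step as the coarse propagator and several modified Euler (Heun) sub-steps as the fine propagator. All function names are hypothetical, and the per-slice fine solves, which are the part a GPU would run in parallel, are written as an ordinary list comprehension.

```python
import math
from typing import Callable, List

Rhs = Callable[[float, float], float]  # right-hand side f(t, y) of dy/dt = f(t, y)


def euler_step(f: Rhs, t: float, y: float, h: float) -> float:
    """Coarse propagator (assumed): one explicit Euler step across a slice."""
    return y + h * f(t, y)


def heun_steps(f: Rhs, t: float, y: float, h: float, m: int) -> float:
    """Fine propagator (assumed): m modified Euler (Heun) sub-steps across a slice."""
    dt = h / m
    for _ in range(m):
        k1 = f(t, y)
        k2 = f(t + dt, y + dt * k1)
        y += 0.5 * dt * (k1 + k2)
        t += dt
    return y


def parareal(f: Rhs, y0: float, t_end: float, n_slices: int,
             n_iters: int, m_fine: int = 50) -> List[float]:
    """Return the solution at the slice boundaries after n_iters corrections."""
    h = t_end / n_slices
    ts = [i * h for i in range(n_slices + 1)]
    # Initial guess: one sequential sweep of the cheap coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(euler_step(f, ts[n], u[n], h))
    for _ in range(n_iters):
        # These solves depend only on the previous iterate, so every slice
        # is independent; this is the work that can be done in parallel.
        fine = [heun_steps(f, ts[n], u[n], h, m_fine) for n in range(n_slices)]
        coarse_old = [euler_step(f, ts[n], u[n], h) for n in range(n_slices)]
        # Sequential correction sweep: new coarse value plus fine-minus-coarse.
        new = [y0]
        for n in range(n_slices):
            new.append(euler_step(f, ts[n], new[n], h) + fine[n] - coarse_old[n])
        u = new
    return u


def decay(t: float, y: float) -> float:
    """Test problem dy/dt = -y with exact solution exp(-t)."""
    return -y


solution = parareal(decay, y0=1.0, t_end=2.0, n_slices=10, n_iters=10)
print(solution[-1], math.exp(-2.0))
```

With as many iterations as slices, the result matches the sequential fine solution up to rounding; the practical benefit comes when far fewer iterations are enough, since the sequential coarse sweep is what remains on the critical path.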
Simulation of fluid-structure interaction in a microchannel using the lattice Boltzmann method and size-dependent beam element on a graphics processing unit
7
Authors: Vahid Esfahanian, Esmaeil Dehdashti, Amir Mehdi Dehrouye-Semnani. Chinese Physics B (SCIE, EI, CAS, CSCD), 2014, No. 8, pp. 389-395 (7 pages)
Fluid-structure interaction (FSI) problems in microchannels play a prominent role in many engineering applications. The present study is an effort toward the simulation of flow in a microchannel considering FSI. The bottom boundary of the microchannel is simulated by size-dependent beam elements for the finite element method (FEM) based on a modified couple stress theory. The lattice Boltzmann method (LBM) using the D2Q13 LB model is coupled to the FEM in order to solve the fluid part of the FSI problem. Because the LBM generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallel computing. The simulations are carried out on graphics processing units (GPUs) using compute unified device architecture (CUDA). In the present study, the governing equations are non-dimensionalized and the set of dimensionless groups is exhibited to show their effects on micro-beam displacement. The numerical results show that the displacements of the micro-beam predicted by the size-dependent beam element are smaller than those predicted by the classical beam element.
Keywords: fluid-structure interaction; graphics processing unit; lattice Boltzmann method; size-dependent beam element
Multi-relaxation-time lattice Boltzmann simulations of lid driven flows using graphics processing unit
8
Authors: Chenggong LI, J.P.Y. MAA. Applied Mathematics and Mechanics (English Edition) (SCIE, EI, CSCD), 2017, No. 5, pp. 707-722 (16 pages)
Large eddy simulation (LES) using the Smagorinsky eddy viscosity model is added to the two-dimensional nine velocity components (D2Q9) lattice Boltzmann equation (LBE) with multi-relaxation-time (MRT) to simulate incompressible turbulent cavity flows with Reynolds numbers up to 1 × 10^7. To improve the computation efficiency of LBM in numerical simulations of turbulent flows, the massively parallel computing power of a graphic processing unit (GPU) with the compute unified device architecture (CUDA) is introduced into the MRT-LBE-LES model. The model performs well compared with results from others, with an increase of 76 times in computation efficiency. It appears that the higher the Reynolds number is, the smaller the Smagorinsky constant should be if the lattice number is fixed. Also, for a selected high Reynolds number and a selected proper Smagorinsky constant, there is a minimum requirement for the lattice number so that the Smagorinsky eddy viscosity will not be excessively large.
Keywords: large eddy simulation (LES); multi-relaxation-time (MRT); lattice Boltzmann equation (LBE); two-dimensional nine velocity components (D2Q9); Smagorinsky model; graphic processing unit (GPU); compute unified device architecture (CUDA)
Exploiting Parallelism in the Simulation of General Purpose Graphics Processing Unit Program
9
Authors: 赵夏, 马胜, 陈微, 王志英. Journal of Shanghai Jiaotong University (Science) (EI), 2016, No. 3, pp. 280-288 (9 pages)
Simulation is an important means of evaluating the performance of a computer architecture. Nowadays, the serial simulation of general purpose graphics processing unit (GPGPU) architecture is the main bottleneck for simulation speed. To address this issue, we propose intra-kernel parallelization on a multicore processor and inter-kernel parallelization on a multiple-machine platform. We apply these two methods to the GPGPU-sim simulator. The intra-kernel parallelization method first parallelizes the serial simulation of multiple compute units in one cycle. It then parallelizes the timing and functional simulation to reduce the performance loss caused by the synchronization between different compute units. The inter-kernel parallelization method divides the multiple kernels of a CUDA program into several groups and distributes these groups across multiple simulation hosts to perform the simulation. Experimental results show that the intra-kernel parallelization method achieves a speed-up of up to 12 with a maximum error rate of 0.0094% on a 32-core machine, and the inter-kernel parallelization method accelerates the simulation by a factor of up to 3.9 with a maximum error rate of 0.11% on four simulation hosts. The orthogonality of these two methods allows us to combine them on multiple multi-core hosts to obtain further performance improvements.
Keywords: general purpose graphics processing unit (GPGPU); multicore; intra-kernel; inter-kernel; parallel
Compute Unified Device Architecture Implementation of Euler/Navier-Stokes Solver on Graphics Processing Unit Desktop Platform for 2-D Compressible Flows
10
Authors: Zhang Jiale, Chen Hongquan. Transactions of Nanjing University of Aeronautics and Astronautics (EI, CSCD), 2016, No. 5, pp. 536-545 (10 pages)
A personal desktop platform with thousands of cores and teraflops peak performance can be realized at the price of a conventional workstation using programmable graphics processing units (GPUs). A GPU-based parallel Euler/Navier-Stokes solver is developed for 2-D compressible flows by using NVIDIA's Compute Unified Device Architecture (CUDA) programming model in the CUDA Fortran programming language. The techniques for implementing CUDA kernels, the double-layered thread hierarchy, and the various levels of the memory hierarchy are presented to form the GPU-based algorithm for the Euler/Navier-Stokes equations. The resulting parallel solver is validated by a set of typical test flow cases. The numerical results show that a speedup of dozens of times relative to a serial CPU implementation can be achieved using a single GPU desktop platform, which demonstrates that a GPU desktop can serve as a cost-effective parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.
Keywords: graphics processing unit (GPU); GPU parallel computing; compute unified device architecture (CUDA) Fortran; finite volume method (FVM); acceleration
Graphic Processing Unit Based Phase Retrieval and CT Reconstruction for Differential X-Ray Phase Contrast Imaging
11
Authors: 陈晓庆, 王宇杰, 孙建奇. Journal of Shanghai Jiaotong University (Science) (EI), 2014, No. 5, pp. 550-554 (5 pages)
Compared with conventional X-ray absorption imaging, X-ray phase-contrast imaging shows higher contrast on samples with a low attenuation coefficient, such as blood vessels and soft tissues. Among the modalities of phase-contrast imaging, grating-based phase contrast imaging has been widely accepted owing to its wide range of sample selections and because it does not require a coherent source. However, the downside is the substantially larger amount of data generated by the phase-stepping method, which slows down the reconstruction process. The graphic processing unit (GPU) allows parallel computing, which is very useful for processing large quantities of data. In this paper, a compute unified device architecture (CUDA) C program based on the GPU is introduced to accelerate the phase retrieval and filtered back projection (FBP) algorithms for grating-based tomography. Depending on the size of the data, the CUDA C program shows different amounts of speed-up over the standard C program on the same Visual Studio 2010 platform, and the speed-up ratio increases as the size of the data increases.
Keywords: grating-based phase contrast imaging; parallel computing; graphic processing unit (GPU); compute unified device architecture (CUDA); filtered back projection (FBP)
Bypass-Enabled Thread Compaction for Divergent Control Flow in Graphics Processing Units
12
Authors: LI Bingchao, WEI Jizeng, GUO Wei, SUN Jizhou. Journal of Shanghai Jiaotong University (Science) (EI), 2021, No. 2, pp. 245-256 (12 pages)
Graphics processing units (GPUs) employ single instruction multiple data (SIMD) hardware to run threads in parallel and allow each thread to maintain an arbitrary control flow. Threads running concurrently within a warp may jump to different paths after conditional branches. Such divergent control flow leaves some lanes idle and hence reduces the SIMD utilization of GPUs. To alleviate the waste of SIMD lanes, threads from multiple warps can be collected together and compacted into idle lanes to improve SIMD lane utilization. However, this mechanism induces extra barrier synchronizations, since warps have to be stalled to wait for other warps for compaction, so that in some cases no warps can be scheduled. In this paper, we propose an approach to reduce the overhead of the barrier synchronizations induced by compactions. In our approach, a compaction is bypassed by warps whose threads all jump to the same path after branches. Moreover, warps waiting for a compaction can also bypass it when no warps are ready for issuing. In addition, a compaction is canceled if it cannot reduce the number of idle lanes. The experimental results demonstrate that our approach provides an average improvement of 21% over the baseline GPU for applications with massive divergent branches, while recovering the performance loss induced by compactions by 13% on average for applications with many non-divergent control flows.
Keywords: graphics processing unit (GPU); single instruction multiple data (SIMD); thread; warps; bypass
Graphic Processing Unit-Accelerated Neural Network Model for Biological Species Recognition
13
Authors: 温程璐, 潘伟, 陈晓熹, 祝青园. Journal of Donghua University (English Edition) (EI, CAS), 2012, No. 1, pp. 5-8 (4 pages)
A graphic processing unit (GPU)-accelerated biological species recognition method using a partially connected neural evolutionary network model is introduced in this paper. The partially connected neural evolutionary network adopted in the paper can overcome the small-input limitation of traditional neural networks. The whole image is taken as the input of the neural network, so the maximum number of features can be kept for recognition. To speed up the recognition process, a fast implementation of the partially connected neural network was built on an NVIDIA Tesla C1060 using the NVIDIA compute unified device architecture (CUDA) framework. Image sets of eight biological species were obtained to test the GPU implementation and its serial CPU counterpart. The experimental results showed that the GPU implementation works effectively in terms of both recognition rate and speed, and gained a speedup of 343 over its CPU counterpart. Compared with feature-based recognition methods on the same recognition task, the method also achieved an acceptable correct rate of 84.6% when tested on eight biological species.
Keywords: graphic processing unit (GPU); compute unified device architecture (CUDA); neural network; species recognition
Graphic Processing Unit-Accelerated Mutual Information-Based 3D Image Rigid Registration
14
Authors: 李冠华, 欧宗瑛, 苏铁明, 韩军. Transactions of Tianjin University (EI, CAS), 2009, No. 5, pp. 375-380 (6 pages)
Mutual information (MI)-based image registration is effective in registering medical images, but it is computationally expensive. This paper accelerates MI-based image registration by dividing the computation of mutual information into spatial transformation and histogram-based calculation, and performing the 3D spatial transformation and trilinear interpolation on a graphic processing unit (GPU). The 3D floating image is downloaded to the GPU as a flat 3D texture, and then fetched and interpolated for each new voxel location in the fragment shader. The transformed results are rendered to textures by using the frame buffer object (FBO) extension, and then read back to main memory for the remaining computation on the CPU. Experimental results show that the GPU-accelerated method achieves a speedup of about an order of magnitude, with a better registration result, compared with the software implementation on a single-core CPU.
Keywords: image registration; mutual information; graphic processing unit (GPU)
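The histogram-based half of the computation described in entry 14 can be illustrated in a few lines. The sketch below is an assumption-laden example rather than the paper's code: it takes two already-aligned images flattened into sequences of integer intensity bins, builds the joint histogram with `collections.Counter`, and returns the mutual information in nats. The function name `mutual_information` is hypothetical, and the GPU-side spatial transformation and interpolation are not shown.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) of two equally sized, binned images."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of intensity pairs
    hist_a = Counter(a)         # marginal histogram of image a
    hist_b = Counter(b)         # marginal histogram of image b
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to count * n / (hist_a[x] * hist_b[y]).
        mi += (count / n) * math.log(count * n / (hist_a[x] * hist_b[y]))
    return mi


reference = [0, 0, 1, 1]
print(mutual_information(reference, reference))     # image compared with itself
print(mutual_information(reference, [0, 1, 0, 1]))  # statistically independent pattern
```

A registration loop would evaluate this measure for each candidate rigid transform and keep the transform that maximizes it, which is why offloading the per-voxel transformation to the GPU pays off.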
NGP-ERGAS: Revisit Instant Neural Graphics Primitives with the Relative Dimensionless Global Error in Synthesis
15
Authors: Dongheng Ye, Heping Li, Ning An, Jian Cheng, Liang Wang. Computers, Materials & Continua, 2025, No. 8, pp. 3731-3747 (17 pages)
The newly emerging neural radiance fields (NeRF) methods can implicitly fulfill three-dimensional (3D) reconstruction by training a neural network to render novel-view images of a given scene from given posed images. The Instant Neural Graphics Primitives (Instant-NGP) method further improves the position encoding of NeRF and obtains state-of-the-art efficiency. However, only a local pixel-wise loss is considered when training Instant-NGP, and the nonlocal structural information between pixels is overlooked. Despite good quantitative results, this leads to poor visual quality, especially in terms of completeness. Inspired by the stochastic structural similarity (S3IM) method, which exploits the nonlocal structural information of groups of pixels, this paper proposes a new method to improve the completeness of fast novel view synthesis. The proposed method first extends the thread-wise processing of Instant-NGP to processing in a custom thread block (i.e., a group of threads). Then, the relative dimensionless global error in synthesis, i.e., Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS), of the group of pixels corresponding to a group of threads is computed and incorporated into the loss function. Extensive experiments validate the proposed method. It obtains better quantitative results than the original Instant-NGP with fewer iteration steps, and PSNR is increased by 1%. Strong qualitative results are obtained, especially for delicate structures and details such as lines and continuous structures. With the dramatic improvements in visual quality, our method can boost the practicability of implicit 3D reconstruction in applications such as self-driving and augmented reality.
Keywords: Neural radiance fields; novel view synthesis; 3D reconstruction; graphic processing unit
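For readers unfamiliar with the metric named in entry 15, the following minimal Python sketch computes ERGAS in its commonly used form, 100 · ratio · sqrt(mean over channels of (RMSE_k / mean_k)^2). This is the standard definition from the image-fusion literature, not necessarily the exact loss term of the paper, which the abstract does not spell out. The function name `ergas`, the channel-list layout, and the default `ratio=1.0` (no resolution difference between reference and prediction) are assumptions made for illustration.

```python
import math
from typing import Sequence


def ergas(reference: Sequence[Sequence[float]],
          predicted: Sequence[Sequence[float]],
          ratio: float = 1.0) -> float:
    """ERGAS over a group of pixels; each argument is a list of flat channels."""
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("inputs must have the same, non-zero number of channels")
    total = 0.0
    for ref_channel, pred_channel in zip(reference, predicted):
        if len(ref_channel) != len(pred_channel) or len(ref_channel) == 0:
            raise ValueError("channels must be non-empty and of equal size")
        n = len(ref_channel)
        # Mean squared error of this channel (RMSE squared).
        mse = sum((r - p) ** 2 for r, p in zip(ref_channel, pred_channel)) / n
        mean = sum(ref_channel) / n
        if mean == 0:
            raise ValueError("reference channel mean must be non-zero")
        total += mse / mean ** 2
    return 100.0 * ratio * math.sqrt(total / len(reference))


# One channel, two pixels: each prediction is off by 0.1 around a mean of 1.0.
score = ergas([[1.0, 1.0]], [[1.1, 0.9]])
print(score)
```

Because each channel's error is normalized by that channel's mean, the score is dimensionless and reflects the group of pixels as a whole rather than any single pixel, which is the nonlocal property the abstract contrasts with a pixel-wise loss.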
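Adaptive SpMV optimization method for sparse diagonal matrices on GPUs
16
Authors: 王宇华, 何俊飞, 张宇琪, 兰海燕, 曹林琳. 《计算机工程》 (Computer Engineering) (PKU Core), 2026, No. 3, pp. 332-345 (14 pages)
Sparse matrix-vector multiplication (SpMV) is the computational core and bottleneck of sparse linear systems. Its efficiency affects the overall performance of iterative solvers, and its optimization has long been a research hotspot in scientific computing and engineering applications. The discretization of partial differential equations produces sparse diagonal matrices, and because their nonzero distributions vary widely, no single method achieves the best time performance on all matrices. To address this problem, an adaptive SpMV optimization method for sparse diagonal matrices on graphics processing units (GPUs), named AST (Adaptive SpMV Tuning), is proposed. The method designs a feature space and builds a feature extractor to extract fine-grained structural features of a matrix. Through in-depth analysis of the correlation between these features and SpMV methods, it establishes an extensible set of candidate methods, forms a mapping between features and the optimal method, and builds a performance prediction tool that efficiently predicts the optimal method for a matrix. Experimental results show that AST achieves a prediction accuracy of 85.8% with an average time performance loss of 0.09. Compared with DIA (Diagonal), HDIA (Hacked DIA), HDC (Hybrid of DIA and Compressed Sparse Row), DIA-Adaptive, and DRM (Divide-Rearrange and Merge), it obtains average kernel runtime speedups of 20.19, 1.86, 3.06, 3.72, and 1.53 times and floating-point performance speedups of 1.05, 1.28, 12.45, 1.94, and 0.97 times, respectively.
Keywords: sparse matrix-vector multiplication; sparse diagonal matrix; graphics processing unit; adaptive optimization method; matrix structural features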
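As background for entry 16, the baseline DIA (diagonal) storage format it compares against can be sketched briefly. The Python example below is a plain serial illustration under stated assumptions, not the AST method: the matrix is square, diagonals are stored row-aligned so that `data[d][i]` holds the entry in row `i` and column `i + offsets[d]`, and out-of-range slots are padded with zeros. The function name `dia_spmv` is hypothetical.

```python
from typing import List, Sequence


def dia_spmv(offsets: Sequence[int],
             data: Sequence[Sequence[float]],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a square matrix A stored in row-aligned DIA form."""
    n = len(x)
    y = [0.0] * n
    for offset, diagonal in zip(offsets, data):
        # On a GPU, each row i would typically be handled by its own thread.
        for i in range(n):
            j = i + offset  # column index of this diagonal's entry in row i
            if 0 <= j < n:
                y[i] += diagonal[i] * x[j]
    return y


# Tridiagonal example: A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]].
offsets = [-1, 0, 1]
data = [
    [0.0, -1.0, -1.0],  # sub-diagonal (first slot is padding)
    [2.0, 2.0, 2.0],    # main diagonal
    [-1.0, -1.0, 0.0],  # super-diagonal (last slot is padding)
]
result = dia_spmv(offsets, data, [1.0, 2.0, 3.0])
print(result)
```

The padding in sparsely populated diagonals is the kind of structural detail that makes one storage format better or worse for a given matrix, which is the motivation the abstract gives for selecting a method per matrix.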
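Research and implementation of GPU sharing technology in container cloud environments
17
Authors: 吴阳阳, 吴恒, 唐震, 张文博. 《广西大学学报(自然科学版)》 (Journal of Guangxi University (Natural Science Edition)) (PKU Core), 2026, No. 1, pp. 177-187 (11 pages)
In traditional time-slicing-based graphics processing unit (GPU) sharing schemes, a container occupies the GPU exclusively within its time slice, which wastes resources when the task load is low. To address this problem, a GPU sharing framework (TQShare) is proposed. TQShare integrates kernel execution time prediction with a kernel time quota management mechanism, allowing multiple containers to run deep learning tasks concurrently within the same time slice and thereby improving resource utilization. Meanwhile, by dynamically scheduling and managing the kernels launched by tasks, it achieves effective resource isolation. Experimental results show that, compared with KubeShare, TQShare raises average GPU utilization by 13.4 percentage points and shortens the completion time of deep learning workloads by 14.7%, with an average performance overhead of only 2.41%.
Keywords: container; graphics processing unit sharing; resource isolation; graphics processing unit utilization; deep learning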
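GPU implementation and performance optimization of a near-field three-dimensional CZT beamforming algorithm
18
Authors: 徐浚洋, 刘祖延, 于晓阳, 周天, 陈宝伟. 《应用声学》 (Journal of Applied Acoustics) (PKU Core), 2026, No. 2, pp. 434-443 (10 pages)
Planar-array beamforming involves a heavy computational load that makes real-time imaging difficult. To address this problem, this paper uses a graphics processing unit (GPU) on the Visual Studio 2019 platform to accelerate the three-dimensional chirp Z-transform (CZT) beamforming algorithm. The algorithm is parallelized, with targeted design of the storage structure and of data memory access, so that the single-instruction, multiple-thread characteristics of the GPU are used effectively; these improvements raise the running efficiency of the algorithm. Measured data show that, for the same sonar data, the computational efficiency of GPU parallel processing exceeds that of serial CPU processing by a factor of more than 38, and that when the number of sampling points is small, the three-dimensional CZT beamforming algorithm is clearly more efficient than the conventional phase-shift beamforming algorithm. These findings confirm that the method has broad application prospects in small sonar devices and has practical value.
Keywords: CZT beamforming; graphics processing unit; planar array; parallel computing; algorithm optimization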
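High-fidelity simulation method for non-homogeneous clutter of airborne early warning radar based on multi-source information
19
Authors: 谢锴欣, 舒汀, 何劲, 郁文贤. 《现代雷达》 (Modern Radar) (PKU Core), 2026, No. 3, pp. 94-100 (7 pages)
To meet the need to improve the cognitive detection performance of new-system airborne early warning radar in complex geographic environments, and to address the insufficient fidelity of traditional clutter modeling methods and their low simulation efficiency in complex scenes, this paper proposes a high-fidelity modeling method for non-homogeneous clutter based on multi-source prior information. The method establishes a general mapping between clutter patches and multi-source prior information in the aircraft coordinate system, achieving a unified representation of multi-source prior information and high-precision modeling of non-homogeneous clutter. On this basis, the fidelity of the simulated clutter is verified and analyzed in detail against measured clutter data, and a general fast simulation architecture for non-homogeneous clutter is designed on a heterogeneous central processing unit + graphics processing unit platform. Experimental results show that the simulated clutter produced by the proposed modeling method fits the measured clutter well and has high fidelity, and that the fast simulation architecture significantly improves the computational efficiency of non-homogeneous clutter simulation in complex large scenes, giving it high engineering value.
Keywords: airborne early warning radar; non-homogeneous clutter; multi-source information; high fidelity; graphics processing unit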
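Computation of a marine gravity gradient model based on the Stokes planar approximation function and GPU parallelism
20
Authors: 卜靖宇, 叶周润, 梁星辉, 刘金钊, 柳林涛, 王嘉琛. 《合肥工业大学学报(自然科学版)》 (Journal of Hefei University of Technology (Natural Science)) (PKU Core), 2026, No. 2, pp. 253-259 (7 pages)
Compared with other gravity field elements, the disturbing gravity gradient reflects more of the high-frequency information produced by the varying, irregular Earth. When computing the disturbing gravity gradient, the Stokes integral is complicated, so the integrand is complex and difficult to evaluate directly with the Newton-Leibniz formula, and the volume of data is so large that the computation takes too long. To solve this problem effectively, this paper uses Gaussian numerical integration to handle the complex integrand and uses the compute unified device architecture (CUDA) to implement parallel computation on the graphics processing unit (GPU). The accuracy of the results can be checked against the Laplace equation, and gravity anomaly data at sea level over a 3°×2° area of a certain sea region were selected for the computation. The results show that the method combining Gaussian numerical integration with CUDA parallel computing provides accurate results while also improving computational efficiency.
Keywords: disturbing gravity gradient; gravity anomaly; CUDA parallel computing; graphics processing unit (GPU); Gaussian numerical integration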