Funding: Supported by the National Key R&D Program of China (2017YFC0602204-01) and the NSFC (Grant Nos. 41530321 and 41104083).
Abstract: Reverse time migration (RTM) is an indispensable but computationally intensive seismic exploration technique. Graphics processing units (GPUs) by NVIDIA® offer parallel computation and speed improvements in such high-density processes. As the seismic imaging space grows, the problems associated with multi-GPU techniques need to be addressed. We propose an efficient scheme for multi-GPU programming based on features of the Compute Unified Device Architecture (CUDA) and the GPU hardware, including concurrent kernel execution, CUDA streams, and peer-to-peer (P2P) communication between different GPUs. In addition, by adjusting the imaging computation time during RTM, the data communication time between GPUs becomes negligible, so the overall computational efficiency improves linearly as the number of GPUs increases. We introduce the multi-GPU scheme using acoustic wave propagation and then describe the implementation of RTM in tilted transversely isotropic (TTI) media. Next, we compare the multi-GPU and unified-memory schemes. The results suggest that the proposed multi-GPU scheme is superior and that its computational efficiency scales linearly with the number of GPUs.
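The scheme above rests on domain decomposition: each GPU owns a slab of the model, and only thin boundary (halo) layers must be exchanged between neighbors each time step, which is what makes the P2P traffic small enough to hide behind computation. The following is a minimal NumPy sketch of that halo-exchange pattern for a 1D explicit stencil; the function names are hypothetical and plain arrays stand in for per-GPU buffers (the real implementation would use CUDA P2P copies).

```python
import numpy as np

def split_with_halos(field, n_dev, halo=1):
    """Split a 1D field into per-device slabs, each zero-padded with halo cells."""
    return [np.pad(chunk, halo) for chunk in np.array_split(field, n_dev)]

def exchange_halos(slabs, halo=1):
    """Copy boundary cells between neighboring slabs (stand-in for P2P copies)."""
    for i in range(len(slabs) - 1):
        slabs[i][-halo:] = slabs[i + 1][halo:2 * halo]   # right halo <- neighbor's first interior cells
        slabs[i + 1][:halo] = slabs[i][-2 * halo:-halo]  # left halo  <- this slab's last interior cells

def stencil_step(slab, halo=1):
    """One explicit Jacobi-style Laplacian update on a slab's interior cells."""
    interior = slice(halo, -halo)
    slab[interior] = slab[interior] + 0.1 * (
        np.roll(slab, 1)[interior] - 2 * slab[interior] + np.roll(slab, -1)[interior]
    )
```

Because each step only needs one fresh halo exchange, the decomposed update reproduces the single-domain result exactly, and on real hardware the copies for step n+1 can run in a separate CUDA stream while step n's interior kernel executes.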
Abstract: The present research attempted a large-eddy simulation (LES) of airflow over a steep, three-dimensional isolated hill using the latest multi-core, multi-CPU systems. It was found that 1) turbulence simulations using approximately 50 million grid points are feasible, and 2) this system achieved a high computation speed, exceeding that of parallel computation by a single CPU on one of the latest supercomputers. Furthermore, LES was conducted using multi-GPU systems. These simulations revealed the following: 1) a multi-GPU environment using the NVIDIA® Tesla M2090 or M2075 could simulate turbulence in a model with approximately 50 million grid points, and 2) the computation speed achieved by the multi-GPU environments exceeded that of parallel computation using four to six CPUs of one of the latest supercomputers.
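A back-of-the-envelope memory estimate shows why a grid of this size calls for multiple GPUs. The field count and time-level count below are assumptions for illustration (the abstract does not state them), but even this conservative tally approaches the 6 GB on a single Tesla M2090, before counting flux, metric, and subgrid-scale work arrays:

```python
# Rough memory estimate for an LES grid of ~50 million points.
# Assumed: 4 primary fields (u, v, w, p), double precision, 2 time levels.
grid_points = 50_000_000
fields = 4
bytes_per_value = 8   # float64
time_levels = 2
total_gb = grid_points * fields * bytes_per_value * time_levels / 1e9
print(f"~{total_gb:.1f} GB")  # prints "~3.2 GB"
```

At roughly 3.2 GB for the primary fields alone, adding the auxiliary arrays a finite-difference LES solver needs quickly exceeds a single card's memory, which is consistent with the paper's use of multi-GPU environments.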
Abstract: In recent years, large language models (LLMs) have demonstrated outstanding performance in natural language processing and related fields. However, in memory-constrained industrial settings, extending LLMs to multiple downstream tasks often makes it difficult to balance resource consumption against performance, limiting their wider application. To address this, we propose CCoE (Compact LLM with Collaboration of Experts), a structurally compact multi-expert collaborative architecture. Its modular design efficiently and flexibly integrates multiple domain experts into a unified LLM, significantly reducing the GPU-memory overhead of multi-expert deployment while preserving performance. In addition, CCoE introduces a rule-based gating mechanism and an expert-planning module to achieve precise task assignment and inter-expert collaboration, effectively supporting complex reasoning tasks. Experiments on five domain datasets show that CCoE matches the performance of existing domain-specific LLMs across all tasks. Moreover, compared with existing model-ensembling approaches, CCoE reduces GPU-memory usage by 61.3% while maintaining performance, and improves inference throughput by 76.4% over parameter-efficient multi-expert integration methods.
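The rule-based gating described above can be pictured as a deterministic dispatcher that matches a query against per-expert rules before any model is invoked, so only one expert's weights need to be active. The sketch below is purely illustrative: the expert names, keyword rules, and function are hypothetical stand-ins, not the paper's actual gating rules.

```python
# Hypothetical rule-based gate: keyword rules route a query to one domain
# expert; only the matched expert would then be loaded and run.
EXPERT_RULES = {
    "law":     ("contract", "liability", "statute"),
    "medical": ("diagnosis", "symptom", "dosage"),
    "code":    ("function", "compile", "bug"),
}

def route(query, default="general"):
    """Return the first expert whose keyword rules match the query."""
    q = query.lower()
    for expert, keywords in EXPERT_RULES.items():
        if any(keyword in q for keyword in keywords):
            return expert
    return default

print(route("Is this dosage safe for children?"))  # prints "medical"
```

A gate like this costs essentially nothing at inference time, which is one plausible reason a rule-based design (rather than a learned router) helps the reported throughput in memory-constrained deployments.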