Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U23A2069, 12372288, 12388101, and 92152301) and the Jilin Province Science and Technology Development Program, China (Grant No. 20220301013GX).
Abstract: Traditional aerodynamic optimization coupled with computational fluid dynamics is associated with a high computational cost. Surrogate models based on deep learning methods can rapidly predict flow fields from the grid input but often suffer from poor accuracy and generalizability. This study introduces a modified Fourier neural operator for flow field prediction. Unlike most convolution-based models, the Fourier neural operator learns the solution operator directly in the function space, enhancing predictive accuracy and generalizability. The proposed model incorporates a shallow feature extractor, a boundary variable finetuner, and several physical priors, including the initial flow field and boundary conditions. The model is trained on uniformly parameterized algebraic grids to accelerate grid generation in aerodynamic optimization. The prediction error for the flow field and force coefficients on the validation and test sets is reduced by 70% to 90% compared with that of the previous convolutional model. The proposed model can make precise predictions for supercritical airfoils under typical working conditions, with a drag coefficient error of approximately 1 drag count on the validation set, and generalizes better than previous convolution-based methods do on extrapolative inflow conditions and airfoils.
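To make the phrase "learns the solution operator directly in the function space" concrete, here is a minimal sketch of one generic Fourier neural operator layer in 1D using NumPy. It is not the modified model described in the abstract: the feature extractor, boundary variable finetuner, and physical priors are omitted, and the names `spectral_conv_1d`, `fourier_layer`, `weights`, `w_local`, and `n_modes` are hypothetical choices made for illustration.

```python
import numpy as np

def spectral_conv_1d(u, weights, n_modes):
    """Mix channels mode-by-mode in Fourier space (generic FNO building block).

    u:       real array, shape (channels_in, n_points)
    weights: complex array, shape (channels_in, channels_out, n_modes)
    n_modes: number of low-frequency modes kept; higher modes are zeroed
    """
    u_hat = np.fft.rfft(u, axis=-1)  # shape (channels_in, n_points // 2 + 1)
    out_hat = np.zeros((weights.shape[1], u_hat.shape[-1]), dtype=complex)
    # Per-mode linear map from input channels (i) to output channels (o).
    out_hat[:, :n_modes] = np.einsum("ik,iok->ok", u_hat[:, :n_modes], weights)
    return np.fft.irfft(out_hat, n=u.shape[-1], axis=-1)

def fourier_layer(u, weights, w_local, n_modes):
    """One layer: spectral path plus pointwise linear path, then ReLU."""
    spectral = spectral_conv_1d(u, weights, n_modes)
    local = w_local @ u  # (channels_out, channels_in) @ (channels_in, n_points)
    return np.maximum(spectral + local, 0.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((2, 3, 8)) + 1j * rng.standard_normal((2, 3, 8))
    w_local = rng.standard_normal((3, 2))
    # The same parameters apply to inputs sampled at different resolutions.
    for n_points in (64, 128):
        u = rng.standard_normal((2, n_points))
        print(n_points, fourier_layer(u, weights, w_local, 8).shape)
```

Because the learned weights are indexed by Fourier mode rather than by grid point, the same parameters can be applied to inputs sampled at different resolutions. This is one common reason such layers are described as operating in function space.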
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61925208, U22A2028, 62302483, 62222214, 62341411, 62102399, and 62372436; the Chinese Academy of Sciences (CAS) Project for Young Scientists in Basic Research under Grant No. YSBR-029; the Youth Innovation Promotion Association of CAS; and the Xplore Prize.
Abstract: In this paper, we present a comprehensive overview of artificial intelligence (AI) computing systems for large language model (LLM) training. The rapid advancement of LLMs in recent years, coupled with the widespread adoption of algorithms and applications such as BERT, ChatGPT, and DeepSeek, has sparked significant interest in this field. We classify LLMs into encoder-only, encoder-decoder, and decoder-only models, and briefly analyze their training and inference processes to emphasize their substantial need for computational resources. These operations depend heavily on AI-specific accelerators such as GPUs (graphics processing units), TPUs (tensor processing units), and MLUs (machine learning units). However, as the gap widens between the increasing complexity of LLMs and the current capabilities of accelerators, it becomes essential to adopt heterogeneous computing systems optimized for distributed environments to manage the growing computational and memory requirements of LLMs. We delve into the execution and scheduling of LLM algorithms, underlining the critical role of distributed computing strategies, memory management enhancements, and improvements in computational efficiency. This paper clarifies the complex relationship between algorithm design, hardware infrastructure, and software optimization, and provides an in-depth understanding of both the software and hardware infrastructure supporting LLM training, offering insights into the challenges and potential avenues for future development and deployment.
Abstract: The emergence of supercomputers has brought rapid development to human life and scientific research. Today, the new wave of artificial intelligence (AI) not only brings convenience to people's lives but is also changing high-performance computing in engineering and science. AI technologies provide more efficient and accurate computing methods for many fields. These ongoing changes pose new challenges to the design of computing infrastructures, which this survey addresses in detail. The survey first describes the notable progress in combining AI and high-performance computing (HPC) in scientific computation, analyzes several typical scenarios, and summarizes the characteristics of the corresponding computing resource requirements. On this basis, it lists four general methods for integrating AI computing with conventional HPC, together with their key features and application scenarios. Finally, it introduces the design strategy of the Peng Cheng Cloud Brain II Supercomputing Center for improving AI computing capability and cluster communication efficiency, which helped it win first place in the IO500 and AIPerf rankings.