Further improving railway innovation capacity and technological strength is an important goal of the 14th Five-Year Plan for railway scientific and technological innovation. It includes promoting the deep integration of cutting-edge technologies with railway systems, strengthening the research and application of intelligent railway technologies, applying green computing technologies, and advancing the collaborative sharing of transportation big data. High-speed rail system tasks must process huge amounts of data under heavy workloads while meeting ultra-fast response requirements. It is therefore of great necessity to improve computational efficiency by applying High Performance Computing (HPC) to high-speed rail systems. HPC techniques are a promising means of improving the performance, efficiency, and safety of high-speed rail systems. In this review, we introduce and analyze application research on high performance computing technology in the field of high-speed railways. These HPC applications are cataloged into four broad categories: fault diagnosis, network and communication, management systems, and simulation. Moreover, challenges and open issues are discussed and further directions are suggested.
High-Performance Computing (HPC) has advanced tremendously over the past several decades, with immense technological developments revolutionizing the ability to model, simulate, and analyze large amounts of data. Over time, these HPC systems have become much larger and more complex, featuring multi-/many-core processors with thousands of cores that are often connected to form a large-scale computing system.
A new direct method for solving unsymmetrical sparse linear systems (USLS) arising from meshless methods is introduced. Certain meshless methods, such as the meshless local Petrov-Galerkin (MLPG) method, require the solution of large USLS. The proposed method performs the factorization symmetrically on the upper and lower triangular portions of the matrix, which differs from previous work based on a general unsymmetrical process and attains higher performance. It is shown that the solution algorithm for USLS can be derived simply from existing approaches for the symmetrical case. The new matrix factorization algorithm can be implemented easily by modifying a standard JKI symmetrical matrix factorization code. Multi-blocked out-of-core strategies were also developed to expand the solution scale. The approach convincingly increases the speed of the solution process, as demonstrated by numerical tests.
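The symmetric treatment of the upper and lower triangular portions can be illustrated with a dense, Doolittle-style LU sketch (illustrative only; the reviewed method is sparse, JKI-ordered, and out-of-core, and the function name here is hypothetical):

```python
import numpy as np

def lu_symmetric_process(A):
    """Factorize A = L @ U, treating the upper and lower triangular
    portions symmetrically: at step k, row k of U and column k of L
    are produced by mirrored update formulas."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        # Row k of U (upper portion) and column k of L (lower portion),
        # each correcting A by the already-factored leading block.
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k+1:, k] = (A[k+1:, k] - L[k+1:, :k] @ U[:k, k]) / U[k, k]
    return L, U
```

The mirrored structure is what lets a symmetrical-case code be adapted with small modifications, as the abstract claims.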
In previous papers, a high performance sparse static solver with two-level unrolling based on a cell-sparse storage scheme was reported. Although the solver reaches quite a high efficiency for a large share of finite element analysis benchmark tests, the MFLOPS (million floating-point operations per second) of LDL^T factorization on a Dell Pentium IV 850 MHz machine vary from 100 to 456, depending on the average size of the super-equations, i.e., on the average depth of unrolling. In this paper, a new sparse static solver with two-level unrolling is proposed that employs the concept of master-equations and searches for an appropriate depth of unrolling. The new solver delivers higher MFLOPS for LDL^T factorization of the benchmark tests and therefore speeds up the solution process.
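The LDL^T recurrence that the solver accelerates can be sketched in dense form (a minimal reference version for a symmetric matrix; the cell-sparse storage, super-equations, and unrolling that give the paper its speed are omitted):

```python
import numpy as np

def ldlt(A):
    """Plain LDL^T factorization: A = L @ diag(d) @ L.T,
    with L unit lower triangular."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Diagonal entry after subtracting contributions of prior columns.
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d
```

The inner dot products over `:j` are exactly where unrolling depth (the "size of the super-equations") determines the achievable MFLOPS.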
This paper analyzes the physical potential, computing performance benefit, and power consumption of optical interconnects. Compared with electrical interconnections, optical ones show clear advantages based on an analysis of physical factors. At the same time, since recent developments raise the question of whether optical interconnect technologies with higher bandwidth but higher cost are worth deploying, a computing performance comparison is carried out. To meet the increasing demand of large-scale parallel and multi-processor computing tasks, an analytic method to evaluate the parallel computing performance of interconnect systems is proposed. Both a bandwidth-limited model and a full-bandwidth model are investigated. Speedup and efficiency are selected to represent the parallel performance of an interconnect system. Using the proposed models, we depict the performance gap between optically and electrically interconnected systems. A further investigation of the power consumption of commercial products shows that deploying parallel interconnections reduces the unit power consumption. From this analysis of computing performance and power dissipation, we find that parallel optical interconnects offer a valuable combination of high performance and low energy consumption. For the data centers now under construction, substantial power could be saved if parallel optical interconnect technologies were used.
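A toy version of such a speedup/efficiency model can be written down directly (the linear form, the function name, and its parameters are assumptions for illustration, not the paper's actual formulas):

```python
def parallel_speedup(t_comp, comm_bytes, bandwidth, n):
    """Bandwidth-limited toy model: compute time divides across n
    nodes, while the interconnect adds comm_bytes / bandwidth of
    transfer time that does not shrink with n."""
    t_serial = t_comp
    t_parallel = t_comp / n + comm_bytes / bandwidth
    speedup = t_serial / t_parallel
    efficiency = speedup / n
    return speedup, efficiency
```

Raising `bandwidth` (the optical case) moves the model toward the full-bandwidth ideal, which is the gap the paper quantifies.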
1. Introduction. The rapid expansion of satellite constellations in recent years has resulted in the generation of massive amounts of data. This surge in data, coupled with diverse application scenarios, underscores the escalating demand for high-performance computing over space. Computing over space entails the deployment of computational resources on platforms such as satellites to process large-scale data under constraints such as high radiation exposure, restricted power consumption, and minimized weight.
With the global trend toward clean energy and decarbonization, power systems have been evolving at a pace never seen before in the history of electrification. This evolution makes the power system more dynamic and more distributed, with higher uncertainty. These new power system behaviors bring significant challenges to power system modeling and simulation, as more data must be analyzed for larger systems and more complex models must be solved in shorter time periods. Conventional computing approaches will not suffice for future power systems. This paper provides a historical review of computing for power system operation and planning, discusses technology advancements in high performance computing (HPC), and describes the drivers for employing HPC techniques. Several HPC application examples using different techniques, including the latest quantum computing, are also presented to show how HPC can help us meet the requirements of power system computing in a clean energy future.
As a powerful enabler of scientific research, engineering, and many other fields, High Performance Computing plays an increasingly important role in the world today. Supercomputers are deployed in many countries and are used across a wide range of areas. The technology for designing, building, managing, and utilizing high performance computers has been developing dramatically. With the debut of Frontier at ORNL, the world has stepped into the Exascale era.
Asynchronous task-based programming models are gaining popularity as a way to address the programmability and performance challenges of contemporary large-scale high performance computing systems. In this paper we present AceMesh, a task-based, data-driven language extension targeting legacy MPI applications. Its language features include data-centric parallelizing templates and aggregated task dependences for parallel loops. These features not only relieve the programmer of tedious refactoring details but also enable structured execution of complex task graphs, data locality exploitation based on data tile templates, and reduced system complexity for complex array sections. We present the prototype implementation, including task shifting, data management, and communication-related analysis and transformations. The language extension is evaluated on two supercomputing platforms. Comparing AceMesh with existing programming models, NPB/MG achieves at most 1.2X and 1.85X speedups on TaihuLight and TH-2, respectively, and the Tend_lin benchmark attains more than 2X speedup on average, with peaks of 3.0X and 2.2X on the two platforms, respectively.
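The core idea of data-driven task execution, a task runs only after the tasks it depends on have produced their results, can be sketched in a few lines (a toy stand-in using Python threads; AceMesh's actual templates, MPI integration, and aggregated loop dependences are far richer, and all names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def run_task_graph(tasks, deps):
    """Run a dict of named callables; deps maps a task name to the
    list of task names whose results it consumes as arguments."""
    futures = {}
    with ThreadPoolExecutor() as pool:
        def submit(name):
            if name in futures:          # each task is submitted once
                return futures[name]
            # Resolve dependences first; their results become inputs.
            inputs = [submit(d).result() for d in deps.get(name, [])]
            futures[name] = pool.submit(tasks[name], *inputs)
            return futures[name]
        return {name: submit(name).result() for name in tasks}
```

A language extension like AceMesh generates this kind of dependence wiring from annotations instead of requiring the programmer to refactor by hand.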
Due to their customizable on-chip resources, reconfigurable computing platforms such as FPGAs are able to achieve better time-to-solution and energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. Since FPGAs have become promising options for current and future high performance computing, this report summarises and analyses recent FPGA-related efforts, including the latest industrial approaches, the state-of-the-art reconfigurable solutions, and various issues such as on-chip resources and development productivity.
With the rapid development of high-throughput sequencing technologies, the scale of sequencing data continues to increase at unprecedented speed. In the field of genomics, high performance computing (HPC) is urgently needed to process these large-scale sequencing data; it uses supercomputers and parallel processing technologies to solve complex computing problems and performs intensive computing operations across massive resources. High performance computing now plays an important role in data-driven sciences and is widely used in genomics research. However, when processing massive multi-dimensional genomics data with high performance computing, many challenges still limit its wide application, such as high data complexity, huge memory requirements, and low parallel computing performance. In this paper, we review the irreplaceable applications of high performance computing in genomics, especially in pangenome, single-cell transcriptome, and large-scale population sequencing studies. In the future, with developing methods for hardware acceleration and algorithm optimization, high performance computing will become even more indispensable to complex, large-scale genomics studies.
High performance computing (HPC) plays an essential role in enabling first-principles calculations based on Kohn–Sham density functional theory (KS-DFT) for investigating the quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry, and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations containing tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on the Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed for modern heterogeneous supercomputers, PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations), and DGDFT (Discontinuous Galerkin Density Functional Theory), based respectively on three different types of basis sets (plane waves, numerical atomic orbitals, and adaptive local basis functions). We then describe the theoretical algorithms and parallel implementations of these three packages in detail. Finally, we conclude the review and propose several promising research directions for future large-scale KS-DFT calculations toward exascale supercomputers.
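For context, all three packages solve the same Kohn–Sham eigenvalue problem and differ chiefly in the basis used to discretize it; in its standard textbook form (Hartree atomic units):

```latex
\left[-\tfrac{1}{2}\nabla^2 + v_{\mathrm{eff}}(\mathbf{r})\right]\psi_i(\mathbf{r})
  = \varepsilon_i\,\psi_i(\mathbf{r}),
\qquad
v_{\mathrm{eff}}(\mathbf{r})
  = v_{\mathrm{ext}}(\mathbf{r}) + v_{\mathrm{H}}[\rho](\mathbf{r}) + v_{\mathrm{xc}}[\rho](\mathbf{r}).
```

Expanding \(\psi_i\) in plane waves, numerical atomic orbitals, or adaptive local basis functions yields the different parallelization structures of PWDFT, HONPAS, and DGDFT, respectively.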
With the advent of the Big Data era, the demand for storing, processing, and analyzing a vast and growing amount of data has emerged in the market. Since a single node cannot cope with these requirements, high-performance systems typically operate in a distributed environment. Today, high-performance distributed computing systems are the foundations of many important infrastructures. However, the diverse development of hardware platforms, the spurt of data growth, and the rapid changes in applications increasingly challenge resource management, energy efficiency, performance tuning, scalability, and fault tolerance.
GraphX is a graph computing library based on the Spark system, where fault tolerance is a necessary guarantee of high availability. However, existing fault tolerance methods are mostly implemented in a pessimistic way and are aimed at general computing tasks. Considering the characteristics of iterative computation, this paper presents a method combining optimistic fault tolerance and checkpointing for recovering data under different failure conditions. First, for single-node failure, we propose an optimistic fault tolerance mechanism based on a compensation function. It adds no fault tolerance measures in advance and incurs no additional cost when there are no failures. Second, for multiple-node failures, we propose an automatic checkpoint management strategy based on RDD importance. It comprehensively considers an RDD's lineage length, dependency relationships, and computation time, so that checkpoints can be set on the proper RDDs. Finally, we implement our proposals in GraphX on Spark 3.5.1 and evaluate the performance using representative iterative graph algorithms on a high performance computing cluster. The results verify the correctness of the mechanism's iteration results and show that, when recovering RDD partitions, the mechanism and strategy substantially reduce job execution time.
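An importance-based checkpoint selection of this kind might look as follows (a sketch: the linear scoring form, the equal weights, and all names are assumptions, not the paper's actual strategy):

```python
def checkpoint_score(lineage_len, num_dependents, compute_secs,
                     w_lineage=1.0, w_deps=1.0, w_time=1.0):
    """Combine the three factors the strategy weighs for an RDD:
    lineage length, dependency relationships, and computation time.
    Higher score = more valuable to checkpoint."""
    return (w_lineage * lineage_len
            + w_deps * num_dependents
            + w_time * compute_secs)

def pick_checkpoints(rdds, k=1):
    """Given (name, lineage_len, num_dependents, compute_secs) tuples,
    checkpoint the k highest-scoring RDDs."""
    ranked = sorted(rdds, key=lambda r: checkpoint_score(*r[1:]),
                    reverse=True)
    return [name for name, *_ in ranked[:k]]
```

Favoring long-lineage, expensive, widely-depended-on RDDs bounds the recomputation needed after a multi-node failure.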
The high-performance computing paradigm needs high-speed switching fabrics to handle the heavy traffic generated by its applications. These switching fabrics are driven efficiently by the deployed scheduling algorithms. In this paper, we propose two scheduling algorithms for input-queued switches whose operation is based on ranking procedures. First, we propose a Simple 2-Bit (S2B) scheme, which uses a binary ranking procedure and queue size for scheduling packets. Here, the Virtual Output Queue (VOQ) set with the maximum number of empty queues receives a higher rank than the other VOQ sets. Through simulation, we show that S2B has better throughput performance than Highest Ranking First (HRF) arbitration under uniform and non-uniform traffic patterns. To further improve the throughput-delay performance, an Enhanced 2-Bit (E2B) approach is proposed. This approach adopts an integer rank, namely the number of empty queues in a VOQ set. The simulation results show that E2B outperforms the S2B and HRF scheduling algorithms, achieving the best throughput-delay performance. Furthermore, the algorithms are simulated under hotspot traffic, where E2B again proves more efficient.
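The E2B ranking rule is simple enough to state directly in code (a sketch of the rank computation only; the tie-breaking policy and function names are assumptions, and the full switch arbitration loop is omitted):

```python
def e2b_rank(voq_set):
    """E2B rank of a VOQ set: the number of empty queues in it,
    kept as an integer (versus S2B's coarser binary ranking)."""
    return sum(1 for q in voq_set if len(q) == 0)

def select_input(voq_sets):
    """Grant the input whose VOQ set has the highest E2B rank.
    Ties go to the lowest index (an assumed policy)."""
    return max(range(len(voq_sets)), key=lambda i: e2b_rank(voq_sets[i]))
```

The integer rank lets the scheduler discriminate among VOQ sets that a single-bit rank would treat as equal, which is the source of E2B's throughput-delay gain.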
This paper proposes an algorithm for strengthening virtual machine security in cloud computing. Imbalance between load and energy has been a drawback of older server-hosting methods: if two virtual servers are active on one host and the energy load on that host grows too large, it appropriates energy from other (virtual) hosts to keep itself stable, which typically leads to hardware overflow errors and user dissatisfaction. Cloud processing methods mitigate this problem but do not remove it entirely; the proposed algorithm therefore not only establishes a suitable security foundation but also divides energy consumption and load properly among the virtual servers. The proposed algorithm is compared with several previously proposed security strategies, including SC-PSSF, PSSF, and DEEAC. The comparisons show that the proposed method offers high computing performance and efficiency while consuming less energy in the network.
Today, PC-class machines are quite popular in the HPC area, especially for problems that require good cost/performance ratios. One of the drawbacks of these machines is poor memory throughput, and one reason for that is the limited mapping capability of the TLB, a buffer that accelerates virtual memory access. In this report, I show that the mapping capability, and with it the performance, can be improved using the multi-granularity TLB feature that some processors provide. I also show that the new TLB handling routine can be incorporated into the demand paging system of Linux.
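Why multiple page granularities help is a matter of arithmetic: TLB coverage is the sum over page sizes of entries times page size, so dedicating even a few entries to large pages multiplies the mapped memory. A sketch (the entry counts and page sizes below are illustrative, not measurements from the report):

```python
def tlb_coverage(entries_by_page_size):
    """Total virtual memory a TLB can map, in bytes: for each page
    size, (number of entries) * (page size)."""
    return sum(n * size for size, n in entries_by_page_size.items())

# Hypothetical TLB with 64 entries: all 4 KiB pages, versus
# trading 8 entries for 4 MiB pages.
small_only = tlb_coverage({4096: 64})                  # 256 KiB mapped
mixed = tlb_coverage({4096: 56, 4 * 1024 * 1024: 8})   # ~32 MiB mapped
```

The two-orders-of-magnitude jump in coverage is what reduces TLB misses for large working sets on such machines.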
The integration of clusters, grids, clouds, edges, and other computing platforms results in the contemporary technology of jungle computing. This novel technique can tackle high performance computation and manages the usage of all these platforms at once. Federated learning is a collaborative machine learning approach that works without centralized training data. The proposed system effectively detects intrusion attacks without human intervention, spots anomalous deviations in device communication behavior potentially caused by malicious adversaries, and can cope with new and unknown attacks. The main objective is to learn the overall behavior of an intruder while it attacks the assumed target service. The updated system model is then sent to the centralized server in the jungle computing environment to detect its pattern. Federated learning helps the machine learn the type of attack seen at each device, giving broad coverage of malicious behaviors. In our proposed work, we have implemented an intrusion detection system that is highly accurate, has a low False Positive Rate (FPR), and is scalable and versatile for the jungle computing environment. The execution time to complete a round is less than two seconds, with an accuracy rate of 96%.
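The server-side aggregation step of such a federated setup is typically FedAvg-style weight averaging (a minimal sketch; the abstract does not specify the aggregation rule, so this particular form is an assumption):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: combine per-client model weights,
    weighted by each client's local data size, so the server never
    sees the raw device-communication training data."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(dims)]
```

Each device trains locally on its own traffic and uploads only its model update; the averaged model is what lets every node benefit from attacks observed elsewhere.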
Cloud computing is expanding widely in the world of IT infrastructure, due partly to the cost-saving effect of economies of scale. Fair market conditions can in theory provide a healthy environment that reflects the most reasonable costs of computation. While fixed cloud pricing provides an attractive low entry barrier for compute-intensive applications, both the consumer and the supplier of computing resources can see high efficiency for their investments by participating in auction-based exchanges. There are huge incentives for cloud providers to offer auctioned resources; from the consumer perspective, however, using these resources is a sparsely discussed challenge. This paper reports a methodology and framework designed to address the challenges of running HPC (High Performance Computing) applications on auction-based cloud clusters. The authors focus on HPC applications and describe a method for determining bid-aware checkpointing intervals. They extend a theoretical model for determining checkpoint intervals using statistical analysis of pricing histories. The latest developments in the SpotHPC framework are also introduced, which aim at facilitating the managed execution of real MPI applications in auction-based cloud environments. The authors use their model to simulate a set of algorithms with different computing and communication densities. The results show the complex interactions between optimal bidding strategies and parallel application performance.
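A classical starting point for such checkpoint-interval models is Young's approximation; a bid-aware variant might treat the out-bid (revocation) rate estimated from pricing history as the failure rate. A sketch (the bid-aware function is a hypothetical illustration, not the paper's extended model):

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """Young's approximation for the optimal checkpoint interval:
    sqrt(2 * C * MTBF), with C the cost of taking one checkpoint."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def bid_aware_interval(checkpoint_cost, out_of_bid_rate):
    """Hypothetical bid-aware variant: the estimated rate of losing
    the auctioned instance (revocations per hour) plays the role of
    the failure rate, so MTBF = 1 / rate."""
    mtbf = 1.0 / out_of_bid_rate
    return young_interval(checkpoint_cost, mtbf)
```

As the pricing history predicts more frequent revocations, the interval shrinks and the application checkpoints more aggressively.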
基金supported in part by the Talent Fund of Beijing Jiaotong University(2023XKRC017)in part by Research and Development Project of China State Railway Group Co.,Ltd.(P2022Z003).
文摘Further improving the railway innovation capacity and technological strength is the important goal of the 14th Five-Year Plan for railway scientific and technological innovation.It includes promoting the deep integration of cutting-edge technologies with the railway systems,strengthening the research and application of intelligent railway technologies,applying green computing technologies and advancing the collaborative sharing of transportation big data.The high-speed rail system tasks need to process huge amounts of data and heavy workload with the requirement of ultra-fast response.Therefore,it is of great necessity to promote computation efficiency by applying High Performance Computing(HPC)to high-speed rail systems.The HPC technique is a great solution for improving the performance,efficiency,and safety of high-speed rail systems.In this review,we introduce and analyze the application research of high performance computing technology in the field of highspeed railways.These HPC applications are cataloged into four broad categories,namely:fault diagnosis,network and communication,management system,and simulations.Moreover,challenges and issues to be addressed are discussed and further directions are suggested.
文摘High-Performance Computing(HPC)has advanced tremendously over the past several decades,with immense technological developments revolutionizing the ability to model,simulate,and analyze large amounts of data.Over time,these HPC systems have become much larger and more complex,featuring multi-/many-core processors with thousands of cores that are often connected to form a large-scale computing system.
基金Project supported by the National Natural Science Foundation of China (Nos. 10232040, 10572002 and 10572003)
文摘A new direct method for solving unsymmetrical sparse linear systems(USLS) arising from meshless methods was introduced. Computation of certain meshless methods such as meshless local Petrov-Galerkin (MLPG) method need to solve large USLS. The proposed solution method for unsymmetrical case performs factorization processes symmetrically on the upper and lower triangular portion of matrix, which differs from previous work based on general unsymmetrical process, and attains higher performance. It is shown that the solution algorithm for USLS can be simply derived from the existing approaches for the symmetrical case. The new matrix factorization algorithm in our method can be implemented easily by modifying a standard JKI symmetrical matrix factorization code. Multi-blocked out-of-core strategies were also developed to expand the solution scale. The approach convincingly increases the speed of the solution process, which is demonstrated with the numerical tests.
基金Project supported by the Research Fund for the Doctoral Program of Higher Education (No.20030001112).
文摘In the previous papers, a high performance sparse static solver with two-level unrolling based on a cell-sparse storage scheme was reported. Although the solver reaches quite a high efficiency for a big percentage of finite element analysis benchmark tests, the MFLOPS (million floating operations per second) of LDL^T factorization of benchmark tests vary on a Dell Pentium IV 850 MHz machine from 100 to 456 depending on the average size of the super-equations, i.e., on the average depth of unrolling. In this paper, a new sparse static solver with two-level unrolling that employs the concept of master-equations and searches for an appropriate depths of unrolling is proposed. The new solver provides higher MFLOPS for LDL^T factorization of benchmark tests, and therefore speeds up the solution process.
基金supported in part by National 863 Program (2009AA01Z256,No.2009AA01A345)National 973 Program (2007CB310705)the NSFC (60932004),P.R.China
文摘This paper analyzes the physical potential, computing performance benefi t and power consumption of optical interconnects. Compared with electrical interconnections, optical ones show undoubted advantages based on physical factor analysis. At the same time, since the recent developments drive us to think about whether these optical interconnect technologies with higher bandwidth but higher cost are worthy to be deployed, the computing performance comparison is performed. To meet the increasing demand of large-scale parallel or multi-processor computing tasks, an analytic method to evaluate parallel computing performance ofinterconnect systems is proposed in this paper. Both bandwidth-limit model and full-bandwidth model are under our investigation. Speedup and effi ciency are selected to represent the parallel performance of an interconnect system. Deploying the proposed models, we depict the performance gap between the optical and electrically interconnected systems. Another investigation on power consumption of commercial products showed that if the parallel interconnections are deployed, the unit power consumption will be reduced. Therefore, from the analysis of computing influence and power dissipation, we found that parallel optical interconnect is valuable combination of high performance and low energy consumption. Considering the possible data center under construction, huge power could be saved if parallel optical interconnects technologies are used.
基金supported in part by the National Natural Science Foundation of China(62025404)in part by the National Key Research and Development Program of China(2022YFB3902802)+1 种基金in part by the Beijing Natural Science Foundation(L241013)in part by the Strategic Priority Research Program of the Chinese Academy of Sciences(XDA000000).
文摘1.Introduction The rapid expansion of satellite constellations in recent years has resulted in the generation of massive amounts of data.This surge in data,coupled with diverse application scenarios,underscores the escalating demand for high-performance computing over space.Computing over space entails the deployment of computational resources on platforms such as satellites to process large-scale data under constraints such as high radiation exposure,restricted power consumption,and minimized weight.
基金the support from U.S.Department of Energy through its Advanced Grid Modeling program,Exascale Computing Program(ECP)The Grid Modernization Laboratory Consortium(GMLC)+1 种基金Advanced Research Projects Agency-Energy(ARPA-E),The National Quantum Information Science Research Centers,Co-design Center for Quantum Advantage(C2QA)the Office of Advanced Scientific Computing Research(ASCR).
文摘With the global trend of pursuing clean energy and decarbonization,power systems have been evolving in a fast pace that we have never seen in the history of electrification.This evolution makes the power system more dynamic and more distributed,with higher uncertainty.These new power system behaviors bring significant challenges in power system modeling and simulation as more data need to be analyzed for larger systems and more complex models to be solved in a shorter time period.The conventional computing approaches will not be sufficient for future power systems.This paper provides a historical review of computing for power system operation and planning,discusses technology advancements in high performance computing(HPC),and describes the drivers for employing HPC techniques.Some high performance computing application examples with different HPC techniques,including the latest quantum computing,are also presented to show how HPC techniques can help us be well prepared to meet the requirements of power system computing in a clean energy future.
文摘As a great power in supporting scientific research,engineering,and many other fields,High Performance Computing plays a more and more important role is the current world.Supercomputers are deployed in many countries around the world,and have been used in a wide range of areas.Technology of designing,building,managing,and utilizing high performance computers has been developing dramatically.With the debut of Frontier in ORNL,the world has stepped into the Exascale era.
基金supported by National Key R&D Program of China(Grant No.2017YFB02-02002)the Innovation Research Group of NSFC(Grant No.61521092).
文摘Asynchronous task-based programming models are gaining popularity to address the programmability and performance challenges of contemporary large scale high performance computing systems.In this paper we present AceMesh,a taskbased,data-driven language extension targeting legacy MPI applications.Its language features include data-centric parallelizing template,aggregated task dependence for parallel loops.These features not only relieve the programmer from tedious refactoring details but also provide possibility for structured execution of complex task graphs,data locality exploitation upon data tile templates,and reducing system complexity incurred by complex array sections.We present the prototype implementation,including task shifting,data management and communication-related analysis and transformations.The language extension is evaluated on two supercomputing platforms.We compare the performance of AceMesh with existing programming models,and the results show that NPB/MG achieves at most 1.2X and 1.85X speedups on TaihuLight and TH-2,respectively,and the Tend_lin benchmark attains more than 2X speedup on average and attain at most 3.0X and 2.2X speedups on the two platforms,respectively.
Funding: Supported in part by the National Key R&D Program of China (Grant Nos. 2016YFA0602200 and 2017YFA0604500), the National Natural Science Foundation of China (Grant Nos. 61672312, 41374113, 91530323, U1839206, and 61962051), and the Center for High Performance Computing and System Simulation, Pilot National Laboratory for Marine Science and Technology (Qingdao).
Abstract: Due to their customizable on-chip resources, reconfigurable computing platforms such as FPGAs can achieve better time-to-solution and energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. Since FPGAs have become promising options for current and future high performance computing, this report summarises and analyses recent FPGA-related efforts, including the latest industrial approaches, state-of-the-art reconfigurable solutions, and issues such as on-chip resources and development productivity.
Funding: Supported by the National Key Research Program of China [2017YFC0907503 to J.X. and 2016YFC0901903 to Z.D.], the Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030400 to J.X.], the National Natural Science Foundation of China [31970634 and 31771465 to J.X.], and the CAS Key Technology Talent Program [to Z.D.].
Abstract: With the rapid development of high-throughput sequencing technologies, the scale of sequencing data continues to increase at an unprecedented speed. In genomics, high performance computing (HPC) is urgently needed to process these large-scale sequencing data; it uses supercomputers and parallel processing technologies to solve complex computing problems, performing intensive computing operations across massive resources. High performance computing now plays an important role in data-driven sciences and is widely used in genomics research. However, when dealing with massive multi-dimensional genomics data, many challenges still limit the wide application of HPC, such as high data complexity, huge memory requirements, and low parallel computing performance. In this paper, we review the irreplaceable applications of high performance computing in genomics, especially in pangenome, single-cell transcriptome, and large-scale population sequencing studies. In the future, with advancing hardware acceleration and algorithm optimization methods, high performance computing will become even more indispensable to complex, large-scale genomics studies.
Funding: Supported by the National Natural Science Foundation of China (21688102, 21803066, 22003061, 22173093), the Hefei National Laboratory for Physical Sciences at the Microscale (KF2020003 and SK2340002001), the Chinese Academy of Sciences Pioneer Hundred Talents Program (KJ2340000031), the Anhui Initiative in Quantum Information Technologies (AHY090400), the CAS Project for Young Scientists in Basic Research (YSBR-005), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC01040100), the Fundamental Research Funds for the Central Universities (WK2340000091, WK2060000018), and the Research Start-Up Grants (KY2340000094) and Academic Leading Talents Training Program (KY2340000103) from the University of Science and Technology of China.
Abstract: High performance computing (HPC) plays an essential role in enabling first-principles calculations based on Kohn–Sham density functional theory (KS-DFT) for investigating the quantum structural and electronic properties of large-scale molecules and solids in condensed matter physics, quantum chemistry, and materials science. This review focuses on recent advances in HPC software development for large-scale KS-DFT calculations involving tens of thousands of atoms on modern heterogeneous supercomputers, especially HPC software with independent intellectual property rights supported on Chinese domestic exascale supercomputers. We first introduce three types of DFT software developed for modern heterogeneous supercomputers: PWDFT (Plane-Wave Density Functional Theory), HONPAS (Hefei Order-N Packages for Ab initio Simulations), and DGDFT (Discontinuous Galerkin Density Functional Theory), based respectively on three different types of basis sets (plane waves, numerical atomic orbitals, and adaptive local basis functions). We then describe the theoretical algorithms and parallel implementations of these three packages on modern heterogeneous supercomputers in detail. Finally, we conclude the review and propose several promising research directions for future large-scale KS-DFT calculations towards exascale supercomputers.
Abstract: With the advent of the Big Data era, the market demand for storing, processing, and analyzing this vast and growing amount of data has emerged. Since a single node cannot cope with the complexity of these requirements, high-performance systems typically operate in a distributed environment. Today, high-performance distributed computing systems underpin many important infrastructures. However, the diverse development of hardware platforms, the spurt of data growth, and rapid changes in applications increasingly challenge resource management, energy efficiency, performance tuning, scalability, and fault tolerance.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2021YFB0301200), the Hunan Natural Science Foundation Project (Grant No. 2023JJ40555), the Hunan Provincial Graduate Student Research and Innovation Project (Grant No. LXBZZ2024035), and the Hunan Provincial Department of Education Scientific Research Project (Grant No. 22B0451).
Abstract: GraphX is a graph computing library built on Spark, where fault tolerance is a necessary guarantee of high availability. However, existing fault tolerance methods are mostly implemented pessimistically and are aimed at general computing tasks. Considering the characteristics of iterative computation, this paper presents a method combining optimistic fault tolerance with checkpointing to recover data under different failure conditions. First, for single-node failures, we propose an optimistic fault tolerance mechanism based on a compensation function. It adds no fault tolerance measures in advance and incurs no additional cost when there are no failures. Second, for multiple-node failures, we propose an automatic checkpoint management strategy based on RDD importance. It comprehensively considers the lineage length of an RDD, its dependency relationships, and its computation time, and so can place checkpoints at the proper RDDs. Finally, we implement our proposals in GraphX on Spark 3.5.1 and evaluate their performance using representative iterative graph algorithms on a high performance computing cluster. The results verify the correctness of the iteration results under the mechanism and show that, when recovering RDD partitions, the mechanism and strategy substantially reduce job execution time.
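The "RDD importance" idea can be sketched as a weighted score over the three factors the abstract names: lineage length, downstream dependents, and recomputation cost. The weights, field names, and example values below are illustrative assumptions, not the paper's actual formula.

```python
# Hypothetical sketch of scoring RDDs for automatic checkpoint selection.
from dataclasses import dataclass

@dataclass
class RDDInfo:
    lineage_length: int      # depth of the lineage chain behind this RDD
    dependents: int          # how many later RDDs depend on it
    compute_seconds: float   # time needed to (re)compute its partitions

def importance(rdd: RDDInfo, w=(0.4, 0.3, 0.3)) -> float:
    """Weighted score: a long lineage, many dependents, and an expensive
    recomputation all make an RDD a better checkpoint candidate."""
    return (w[0] * rdd.lineage_length
            + w[1] * rdd.dependents
            + w[2] * rdd.compute_seconds)

def pick_checkpoints(rdds: dict, k: int = 1) -> list:
    """Return the names of the k highest-importance RDDs."""
    return sorted(rdds, key=lambda name: importance(rdds[name]), reverse=True)[:k]

graph = {
    "msgs":   RDDInfo(lineage_length=1, dependents=1, compute_seconds=2.0),
    "ranks5": RDDInfo(lineage_length=5, dependents=3, compute_seconds=9.0),
    "ranks9": RDDInfo(lineage_length=9, dependents=2, compute_seconds=15.0),
}
print(pick_checkpoints(graph, k=1))  # the long-lineage, costly RDD wins
```

Checkpointing only the top-scoring RDDs caps the lineage any failure has to replay while keeping checkpoint I/O bounded, which is the trade-off the strategy manages.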
Abstract: The high-performance computing paradigm needs high-speed switching fabrics to handle the heavy traffic generated by its applications, and these fabrics are driven by the deployed scheduling algorithms. In this paper, we propose two ranking-based scheduling algorithms for input-queued switches. First, we propose a Simple 2-Bit (S2B) scheme which uses a binary ranking procedure and queue size to schedule packets. Here, the Virtual Output Queue (VOQ) set with the maximum number of empty queues receives a higher rank than other VOQ sets. Through simulation, we show that S2B has better throughput performance than Highest Ranking First (HRF) arbitration under uniform and non-uniform traffic patterns. To further improve throughput-delay performance, an Enhanced 2-Bit (E2B) approach is proposed. This approach adopts an integer rank, namely the number of empty queues in a VOQ set. Simulation results show that E2B outperforms the S2B and HRF scheduling algorithms, achieving the best throughput-delay performance. Furthermore, the algorithms are simulated under hotspot traffic, where E2B proves to be more efficient.
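The two ranking ideas can be contrasted with a small sketch. Each input port keeps a VOQ set (one queue per output); E2B ranks a port directly by its integer count of empty queues, while S2B collapses that count into a coarse binary rank. The data layout and the S2B threshold below are assumptions for illustration, not the paper's exact scheme.

```python
# Illustrative ranking functions for VOQ sets (lists of per-output queues).
def empty_count(voq_set):
    """Number of empty virtual output queues at this input port."""
    return sum(1 for q in voq_set if len(q) == 0)

def e2b_rank(voq_set):
    # E2B: the rank IS the number of empty queues (integer-valued),
    # so nearly-drained ports are favored with fine granularity.
    return empty_count(voq_set)

def s2b_rank(voq_set, threshold):
    # S2B flavor: a coarse binary rank -- 1 if the port has at least
    # `threshold` empty queues, else 0 (threshold is an assumption here).
    return 1 if empty_count(voq_set) >= threshold else 0

ports = {
    "in0": [[1, 2], [], [], []],    # 3 empty queues out of 4
    "in1": [[5], [7, 8], [], [9]],  # 1 empty queue out of 4
}
# E2B distinguishes the two ports; a coarse binary rank may tie them.
print({p: e2b_rank(v) for p, v in ports.items()})  # {'in0': 3, 'in1': 1}
```

The finer-grained integer rank is what lets E2B break ties that S2B cannot, which is consistent with its reported throughput-delay advantage.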
Abstract: This paper proposes an algorithm for strengthening virtual machine security in cloud computing. Imbalance between load and energy has been a drawback of older approaches to server provisioning and hosting: if two virtual servers are active on one host and the energy load on that host grows, it allocates the energy of other (virtual) hosts to itself to stay stable, which usually leads to hardware overflow errors and user dissatisfaction. Cloud-based methods have reduced this problem, but not eliminated it. The proposed algorithm therefore not only establishes a suitable security foundation but also properly divides energy consumption and balances load among virtual servers. It is compared with several previously proposed security strategies, including SC-PSSF, PSSF, and DEEAC. Comparisons show that the proposed method offers high performance computing, efficiency, and lower energy consumption in the network.
Abstract: Today, PC-class machines are quite popular in the HPC area, especially for problems that require good cost/performance ratios. One drawback of these machines is their poor memory throughput, and one reason for that is the limited mapping capability of the TLB, a buffer that accelerates virtual memory access. In this report, I show that the mapping capability, and with it the performance, can be improved using the multi-granularity TLB feature that some processors provide. I also show that the new TLB handling routine can be incorporated into the demand paging system of Linux.
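Why multiple page granularities help can be seen from a back-of-the-envelope calculation: a TLB's mapping capability (coverage) is entries times page size, so redirecting even a few entries to large pages expands coverage enormously. The entry counts and page sizes below are made-up examples, not figures from the report.

```python
# Back-of-the-envelope TLB coverage with and without large pages.
def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes of virtual memory the TLB can map without a miss."""
    return entries * page_size

KB, MB = 1024, 1024 * 1024

# 64 entries, all mapping 4 KiB pages:
small_only = tlb_coverage(64, 4 * KB)
# Same 64 entries, but 8 of them mapping 4 MiB pages:
mixed = tlb_coverage(56, 4 * KB) + tlb_coverage(8, 4 * MB)

print(small_only // KB, "KiB")  # 256 KiB with small pages only
print(mixed // MB, "MiB")       # about 32 MiB once large-page entries exist
```

A working set larger than the coverage thrashes the TLB on every stride, which is exactly the memory-throughput penalty the multi-granularity handling routine is meant to avoid.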
Abstract: The integration of clusters, grids, clouds, edges, and other computing platforms results in the contemporary paradigm of jungle computing. This novel technique can tackle high performance computation systems and manage the usage of all computing platforms at once. Federated learning is a collaborative machine learning approach without centralized training data. The proposed system effectively detects intrusion attacks without human intervention, identifying anomalous deviations in device communication behavior potentially caused by malicious adversaries, and it can cope with new and unknown attacks. The main objective is to learn the overall behavior of an intruder while it attacks the assumed target service. The updated model of each device is sent to the centralized server in jungle computing, where its pattern is detected. Federated learning lets the system learn the type of attack seen by each device, paving the way toward full coverage of malicious behaviors. In our proposed work, we implement an intrusion detection system that is highly accurate, has a low False Positive Rate (FPR), and is scalable and versatile for the jungle computing environment. The execution time for a round is less than two seconds, with an accuracy rate of 96%.
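The aggregation step that the abstract describes (devices send model updates, not raw traffic, to the central server) is typically realized with a federated-averaging style rule. The sketch below shows that generic FedAvg-style update; it is standard background under assumed names and sizes, not the paper's exact aggregation rule.

```python
# Generic federated averaging: the server combines per-client parameter
# vectors, weighted by how much local data each client trained on.
def federated_average(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dims)
    ]

# Two devices with locally trained 3-parameter detection models:
w_a = [0.2, 0.4, 0.6]   # device A, 100 local samples
w_b = [0.6, 0.0, 0.2]   # device B, 300 local samples
global_w = federated_average([w_a, w_b], client_sizes=[100, 300])
print(global_w)  # pulled toward w_b, which saw 3x more local data
```

Because only these parameter vectors travel to the server, each device's raw communication logs stay local, which is the privacy property that makes federated learning attractive for distributed intrusion detection.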
Funding: This paper is an extended version of "SpotMPI: a framework for auction-based HPC computing using Amazon spot instances," published in the International Symposium on Advances of Distributed Computing and Networking (ADCN 2011). This research is supported in part by National Science Foundation grant CNS 0958854 and educational resource grants from Amazon.com.
Abstract: Cloud computing is expanding widely in the world of IT infrastructure, due partly to the cost-saving effect of economies of scale. Fair market conditions can in theory provide a healthy environment that reflects the most reasonable costs of computation. While fixed cloud pricing provides an attractively low entry barrier for compute-intensive applications, both the consumer and the supplier of computing resources can see high efficiency on their investments by participating in auction-based exchanges. There are large incentives for cloud providers to offer auctioned resources; from the consumer perspective, however, using these resources is a sparsely discussed challenge. This paper reports a methodology and framework designed to address the challenges of running HPC (High Performance Computing) applications on auction-based cloud clusters. The authors focus on HPC applications and describe a method for determining bid-aware checkpointing intervals, extending a theoretical model for checkpoint intervals using statistical analysis of pricing histories. The latest developments in the SpotHPC framework are also introduced, aimed at facilitating the managed execution of real MPI applications in auction-based cloud environments. The authors use their model to simulate a set of algorithms with different computing and communication densities. The results show complex interactions between optimal bidding strategies and parallel application performance.
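The paper's bid-aware model is not reproduced here, but the classical starting point for such models is Young's approximation for the optimal checkpoint interval, sketched below. In the spot-instance setting, the mean time between failures stands in for the expected time until the instance is lost (e.g., out-bid); the numbers are illustrative.

```python
# Young's approximation for the optimal checkpoint interval:
#   T_opt ~= sqrt(2 * C * MTBF)
# where C is the cost of taking one checkpoint and MTBF is the mean time
# between failures (here: expected spot-instance lifetime).
import math

def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Optimal seconds of work between checkpoints under Young's model."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# A 30 s checkpoint on an instance expected to survive about 2 hours:
t = young_interval(checkpoint_cost_s=30.0, mtbf_s=2 * 3600.0)
print(f"checkpoint every ~{t / 60:.0f} min")  # roughly every 11 minutes
```

A bid-aware extension replaces the fixed MTBF with an estimate derived from the price history and the user's bid, shortening the interval when the bid sits close to the current spot price.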