Funding: The National High Technology Research and Development Program (863 Program) of China under contract No. 2013AA09A505.
Abstract: A parallel algorithm for a circulation numerical model, based on the Message Passing Interface (MPI), is developed using serialization and an irregular rectangle decomposition scheme. A neighboring point exchange strategy (NPES) is adopted to further enhance computational efficiency. Two experiments are conducted on an HP C7000 Blade System. The numerical results show that the parallel version with NPES (PVN) is more efficient than the original parallel version (PV). In the second experiment, the PVN achieves a parallel efficiency in excess of 0.9 when the number of processors increases to 100, while the efficiency of the PV drops rapidly to 0.39. The PVN of the ocean circulation model is used in a fine-resolution regional simulation, which produces better results. Because the algorithm can be implemented generally, it is potentially applicable to many other ocean models.
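The abstract reports parallel efficiency figures (above 0.9 for the PVN and 0.39 for the PV on 100 processors) without stating the formula. A minimal sketch, assuming the conventional definition (speedup divided by processor count), is given below. The function name and the timings are hypothetical and are not taken from the paper.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Conventional parallel efficiency: E = T_1 / (p * T_p).

    t_serial   -- wall-clock time of the run on one processor (T_1)
    t_parallel -- wall-clock time of the run on n_procs processors (T_p)
    n_procs    -- number of processors (p)
    """
    if t_parallel <= 0 or n_procs <= 0:
        raise ValueError("t_parallel and n_procs must be positive")
    return t_serial / (n_procs * t_parallel)


# Hypothetical timings, for illustration only (not measurements from the paper).
efficiency = parallel_efficiency(t_serial=1000.0, t_parallel=12.5, n_procs=100)
print(efficiency)
```

Under this definition, an efficiency of 0.9 on 100 processors corresponds to a speedup of about 90, and 0.39 to a speedup of about 39.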
Funding: This work was supported by the Key Research and Development Project of Guangdong Province (No. 2021B0101310002), the National Key Research and Development Program of China (No. 2021YFF1200104), the Strategic Priority CAS Project (No. XDB38050100), the National Natural Science Foundation of China (No. 62272449), the Shenzhen Basic Research Fund (Nos. RCYX20200714114734194, JCYJ20210324102007021, and KQTD20200820113106007), the Open Fund of Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) (No. GML-KF-22-13), and the Shenzhen Key Laboratory of Intelligent Bioinformatics (No. ZDSYS20220422103800001).
Abstract: Spiking Neural Network (SNN) simulation is important for studying brain function and validating hypotheses in neuroscience, and it can also be used in artificial intelligence. Recently, GPU-based simulators have been developed to support real-time SNN simulation. However, the performance and scale of these simulators are severely limited by random memory access patterns and global communication between devices. We therefore propose SWsnn, an efficient distributed heterogeneous SNN simulator based on the Sunway accelerators (including SW26010 and SW26010pro). SWsnn supports accurate simulation with a small time step (1/16 ms), random delay sizes for synapses, and larger-scale network computing. Compared with existing GPUs, the Local Dynamic Memory (LDM) in Sunway, which is similar to a cache, is much bigger (4 MB or 16 MB in each core group). To improve simulation performance, we redesign the network data storage structure and the synaptic plasticity flow so that most random accesses occur in LDM. SWsnn hides Message Passing Interface (MPI)-related operations to reduce communication costs by separating the general SNN workflow. In addition, SWsnn relies on the parallel Compute Processing Elements (CPEs) rather than the serial Manage Processing Element (MPE) to control the communication buffers, using Register-Level Communication (RLC) and Direct Memory Access (DMA). SWsnn is further optimized using vectorization and DMA hiding techniques. Experimental results show that SWsnn runs 1.4 to 2.2 times faster than the state-of-the-art GPU-based SNN simulator GPU-enhanced Neuronal Networks (GeNN) and supports much larger-scale real-time simulation.
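To illustrate what a fixed time step of 1/16 ms combined with per-synapse delays implies for a simulator, the sketch below uses a generic ring buffer of future input slots, a common technique in SNN simulators. It is not SWsnn's implementation, which the abstract does not describe at this level. The class name `DelayRing`, its methods, and the 1.5 ms example delay are all hypothetical.

```python
DT_MS = 1.0 / 16.0  # time step from the abstract: 1/16 ms


class DelayRing:
    """Ring buffer that delivers synaptic input after a delay given in steps."""

    def __init__(self, max_delay_steps, n_neurons):
        # One slot per future step, plus one for the current step.
        self.size = max_delay_steps + 1
        self.slots = [[0.0] * n_neurons for _ in range(self.size)]
        self.now = 0  # index of the slot for the current step

    def schedule(self, target, weight, delay_steps):
        """Add `weight` to neuron `target`, arriving `delay_steps` steps from now."""
        if not 1 <= delay_steps < self.size:
            raise ValueError("delay out of range")
        self.slots[(self.now + delay_steps) % self.size][target] += weight

    def advance(self):
        """Return the input arriving at the current step, then move one step on."""
        current = self.slots[self.now]
        self.slots[self.now] = [0.0] * len(current)  # recycle the slot
        self.now = (self.now + 1) % self.size
        return current


# A hypothetical 1.5 ms synaptic delay expressed in 1/16 ms steps.
delay_steps = round(1.5 / DT_MS)  # 24 steps
ring = DelayRing(max_delay_steps=32, n_neurons=2)
ring.schedule(target=1, weight=0.5, delay_steps=delay_steps)
inputs = [ring.advance() for _ in range(delay_steps + 1)]
print(inputs[delay_steps])  # the input arrives exactly 24 steps after scheduling
```

Writes into such a buffer land at effectively random addresses. This is the kind of random memory access pattern the abstract identifies as a bottleneck and aims to keep inside LDM.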
Funding: This work was supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z117 and the National Basic Research 973 Program of China under Grant No. 2007CB310900.
Abstract: A high performance computer (HPC) is a large, complex system whose architecture design faces increasing difficulties and risks. Traditional methods, such as theoretical analysis, component-level simulation, and sequential simulation, are not applicable to system-level simulation of HPC systems. Even parallel simulation on large-scale parallel machines faces many difficulties in scalability, reliability, generality, and efficiency. To meet the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform, ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSA), and entities. Second, we describe the key techniques of ArchSim, including the synchronization protocol, the communication mechanism, and the distributed checkpointing/restart mechanism. We then test the main performance indices of ArchSim with the phold benchmark and analyze the extra overhead generated by ArchSim. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance test and the HPC system simulations demonstrate that ArchSim achieves a high speedup ratio and high scalability on a parallel host machine and supports system-level simulation for the architecture design of HPC systems.
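The phold benchmark mentioned above is a standard synthetic workload for discrete-event simulators: a fixed population of events circulates among logical processes (LPs), and each processed event schedules exactly one new event at a random LP at a later time. The sketch below is a sequential, single-queue version for illustration only. It does not reproduce ArchSim's parallel synchronization protocol, and the function name, parameters, and default values are assumptions.

```python
import heapq
import random


def run_phold(n_lps, events_per_lp, end_time, lookahead=0.1, mean_delay=1.0, seed=0):
    """Sequential PHOLD-style run; returns (events processed per LP, pending events)."""
    rng = random.Random(seed)
    queue = []  # min-heap of (timestamp, sequence number, destination LP)
    seq = 0     # tie-breaker so heap entries always compare deterministically

    # Seed every LP with its initial event population.
    for lp in range(n_lps):
        for _ in range(events_per_lp):
            t0 = lookahead + rng.expovariate(1.0 / mean_delay)
            heapq.heappush(queue, (t0, seq, lp))
            seq += 1

    processed = [0] * n_lps
    while queue and queue[0][0] <= end_time:
        t, _, lp = heapq.heappop(queue)
        processed[lp] += 1
        # Each processed event spawns one new event at a random LP,
        # at least `lookahead` time units in the future.
        dest = rng.randrange(n_lps)
        t_next = t + lookahead + rng.expovariate(1.0 / mean_delay)
        heapq.heappush(queue, (t_next, seq, dest))
        seq += 1

    return processed, len(queue)


processed, pending = run_phold(n_lps=4, events_per_lp=2, end_time=10.0)
print(sum(processed), pending)
```

Because each processed event creates exactly one successor, the number of pending events stays equal to `n_lps * events_per_lp`. The minimum `lookahead` is what conservative parallel simulators typically exploit to let LPs advance safely without waiting on one another.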