Molecular Dynamics (MD) simulation based on Interatomic Potential (IAP) computation is an important High-Performance Computing (HPC) application. MD simulations of systems of experimental relevance take a very long time to compute, even on expensive high-end servers. This paper proposes heterogeneous computing, which combines a Field Programmable Gate Array (FPGA) with a host computer, as a way to run MD simulations efficiently. Such heterogeneous computation requires communication between the FPGA and the computer. The MD simulation considered in this paper is the Artificial Neural Network (ANN)-based IAP computation for gold nanoparticles (Au₁₄₇ and Au₃₀₉). It calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉. The design uses the Universal Asynchronous Receiver-Transmitter (UART) and Ethernet protocols for communication between the FPGA and the host computer. To reduce the latency of MD simulation through heterogeneous computing, both protocols were evaluated on MD simulations of 50,000 cycles. For Au₁₄₇ nanoparticles, the computation time was 17.54 h with UART and 18.70 h with Ethernet. A conventional server took 29 h, so these times are reductions of about 39.5% and 35.5%, respectively. These results pave the way for the development of a Lab-on-a-chip application.
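The abstract states that the ANN-based IAP produces the forces between atoms and the total energy of the system. This per-step evaluation is the work that would be offloaded to the FPGA. The sketch below is for illustration only. It assumes a toy Behler-Parrinello-style network, which is a common ANN-IAP form. The abstract does not give the paper's actual architecture, descriptors, or weights.

In the sketch, one radial symmetry function summarizes each atom's environment. A small shared network maps that value to an atomic energy. The total energy is the sum of these atomic energies, and the forces are the negative finite-difference gradients of the total. All parameter values (`R_CUT`, `ETA`, `R_S`, the weights) and all function names are hypothetical, and energies are in arbitrary units.

```python
import math

# Hypothetical toy parameters (NOT from the paper).
R_CUT = 6.0   # cutoff radius (angstrom)
ETA = 0.5     # width of the Gaussian radial symmetry function (1/angstrom^2)
R_S = 2.9     # centre of the Gaussian (angstrom)

# Hypothetical weights of a 1-3-1 network shared by all Au atoms (arbitrary units).
W1 = [0.8, -0.5, 0.3]
B1 = [0.1, 0.0, -0.2]
W2 = [-1.2, 0.7, 0.4]
B2 = -0.5


def cutoff(r):
    """Cosine cutoff that decays smoothly to zero at R_CUT."""
    if r >= R_CUT:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / R_CUT) + 1.0)


def descriptor(positions, i):
    """Radial symmetry function G_i summarising atom i's neighbourhood."""
    g = 0.0
    for j, pj in enumerate(positions):
        if j != i:
            r = math.dist(positions[i], pj)
            g += math.exp(-ETA * (r - R_S) ** 2) * cutoff(r)
    return g


def atomic_energy(g):
    """Tiny feed-forward network: descriptor -> tanh hidden layer -> energy."""
    hidden = [math.tanh(w * g + b) for w, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def total_energy(positions):
    """Total energy = sum of per-atom network outputs."""
    return sum(atomic_energy(descriptor(positions, i)) for i in range(len(positions)))


def forces(positions, h=1e-5):
    """Forces F = -dE/dx, estimated here by central finite differences."""
    result = []
    for i in range(len(positions)):
        f_i = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            f_i.append(-(total_energy(plus) - total_energy(minus)) / (2 * h))
        result.append(f_i)
    return result


# Usage on a hypothetical 4-atom cluster (coordinates in angstrom), not Au147.
cluster = [[0.0, 0.0, 0.0], [2.9, 0.0, 0.0], [0.0, 2.9, 0.0], [0.0, 0.0, 2.9]]
energy = total_energy(cluster)
F = forces(cluster)
```

The energy depends only on interatomic distances, so the forces should sum to roughly zero; this gives a quick sanity check. A production implementation would use analytic gradients instead of finite differences, because finite differences need 6N energy evaluations per step.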
The shared-buffer-based combined input and output queued (CIOQ) switching architecture suffers from head-of-line (HOL) blocking at the virtual output queue (VOQ) level. It also suffers from packet loss and from congestion spreading caused by buffer overflow. To address these issues and to improve the architecture's performance and stability, this study proposes a de-blocking adaptive feedback control (AFC) design. The credit timeout detection mechanism (CTDM) enables the CIOQ switch to reach a theoretically 100% non-blocking state, which effectively eliminates the impact of HOL blocking. The study also proposes a VOQ dynamic regulation algorithm (VDRA) and a threshold dynamic adaptive algorithm (TDAA). Together, these two algorithms reduce the risk of congestion spreading caused by buffer overflow and thereby improve the overall performance of the system. Theoretical analysis and experimental results show that, under typical traffic conditions, the design achieves a maximum throughput of 1499.66 Gb/s and a minimum latency of 83 ns. The effective throughput ratio reaches 96.94%, the data link layer packet (DLLP) loss ratio is only 0.61%, and the packet loss rate is as low as 0.6%. Compared with traditional CIOQ and input queued (IQ) switch architectures, the proposed design improves throughput by 15.12% and 20.55%, respectively. It also reduces forwarding latency by 26.9% and 54.7%, respectively, and provides stronger system stability. These results indicate that it can meet the demands of data exchange in complex scenarios.
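The abstract does not explain how CTDM, VDRA, or TDAA work internally. The sketch below is a hedged illustration of the two underlying ideas. It models one input port whose VOQs share a single buffer.

- **Admission** uses the classic dynamic-threshold rule of Choudhury and Hahne: a queue may grow only while its length is below `alpha` times the free buffer space. This rule is substituted here as a stand-in for TDAA.
- **Sending** is gated by per-output credits. A VOQ is flagged as stalled when it has waited for credits longer than a timeout, loosely in the spirit of CTDM.

The class name, all parameters, and the detect-only response to a timeout are assumptions, not the paper's AFC design.

```python
from collections import deque


class SharedBufferPort:
    """Toy input port: one VOQ per output, all sharing `capacity` buffer cells.

    Hypothetical model for illustration; not the paper's AFC design.
    """

    def __init__(self, num_outputs, capacity, alpha=1.0, credits=4, timeout=10):
        self.voqs = [deque() for _ in range(num_outputs)]
        self.capacity = capacity            # total shared buffer cells
        self.alpha = alpha                  # dynamic-threshold scaling factor
        self.credits = [credits] * num_outputs
        self.timeout = timeout              # cycles allowed without credits
        self.waiting_since = [0] * num_outputs
        self.dropped = 0

    def occupancy(self):
        """Cells currently used by all VOQs together."""
        return sum(len(q) for q in self.voqs)

    def threshold(self):
        """Dynamic threshold: alpha times the currently free buffer space."""
        return self.alpha * (self.capacity - self.occupancy())

    def enqueue(self, out, pkt):
        """Admit pkt to VOQ `out` only while that VOQ is below the threshold."""
        if len(self.voqs[out]) >= self.threshold():
            self.dropped += 1
            return False
        self.voqs[out].append(pkt)
        return True

    def try_send(self, out, now):
        """Forward the head of VOQ `out` if a credit is available."""
        if not self.voqs[out] or self.credits[out] == 0:
            return None
        self.credits[out] -= 1
        if self.credits[out] == 0:
            self.waiting_since[out] = now   # credit wait starts now
        return self.voqs[out].popleft()

    def credit_return(self, out):
        """Downstream has freed a cell for output `out`."""
        self.credits[out] += 1

    def stalled_voqs(self, now):
        """VOQs holding data that have waited for credits longer than `timeout`."""
        return [
            out
            for out, q in enumerate(self.voqs)
            if q and self.credits[out] == 0
            and now - self.waiting_since[out] > self.timeout
        ]
```

With `alpha = 1`, a single hot VOQ can occupy only about half of the buffer, which leaves room for packets bound to other outputs. This is how a dynamic threshold stops one overloaded queue from starving the rest.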
Funding: This work was supported by the National Key Research and Development Program of China (No. 2022YFB4500900).