Abstract: This paper addresses two problems in multicast scheduling as high-speed switches scale: the unbounded speedup and exponentially growing number of virtual queues required to reach 100% throughput, and the low multicast throughput obtained without speedup or with a fixed crosspoint buffer size. Inspired by the load-balanced two-stage Birkhoff-von Neumann architecture, which can provide 100% throughput for all kinds of unicast traffic, a novel three-stage architecture is proposed: the first stage performs multicast fan-out splitting, the second performs load balancing, and the last performs switching (FSLBS). A dedicated multicast fan-out splitting to unicast (M2U) scheduling algorithm is developed for the first stage, while the last two stages are scheduled with periodic permutation matrices. Using the dedicated M2U algorithm and periodic permutation matrix scheduling, FSLBS can achieve 100% throughput for integrated unicast and multicast traffic without speedup. The operation is validated theoretically using a fluid model.
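As a minimal sketch of periodic permutation matrix scheduling, the Python below assumes the cyclic-shift pattern commonly used in load-balanced Birkhoff-von Neumann switches, in which input i is connected to output (i + t) mod N in time slot t. The abstract does not give the exact sequence used in FSLBS, so the pattern and the names `permutation_matrix`, `periodic_schedule`, and `is_permutation` are illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]


def permutation_matrix(n: int, t: int) -> Matrix:
    """Return the n x n 0/1 matrix connecting input i to output (i + t) % n.

    The cyclic shift is an assumed pattern, not taken from the abstract.
    """
    return [[1 if j == (i + t) % n else 0 for j in range(n)] for i in range(n)]


def periodic_schedule(n: int) -> List[Matrix]:
    """One full period: n slots, after which the pattern repeats."""
    return [permutation_matrix(n, t) for t in range(n)]


def is_permutation(m: Matrix) -> bool:
    """True if every row and every column contains exactly one 1."""
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(col) == 1 for col in zip(*m))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for t, m in enumerate(periodic_schedule(4)):
        print(f"slot {t}: {m}")
```

Under this assumed pattern, every input is connected to every output exactly once per period of N slots, so each input-output pair is served in a fixed 1/N share of the slots regardless of the traffic pattern.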
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2029014 and CCF-2008907.
Abstract: With the surge of big data applications and the worsening of the memory-wall problem, the memory system, rather than the computing unit, has become the commonly recognized major concern of computing. However, this "memory-centric" common understanding had a humble beginning. More than three decades ago, the memory-bounded speedup model was the first model to recognize memory as the bound of computing, and it provided a general bound on speedup and a computing-memory trade-off formulation. The memory-bounded model was well received even then. It was immediately introduced in several advanced computer architecture and parallel computing textbooks in the 1990s as a must-know for scalable computing. These include Prof. Kai Hwang's book "Scalable Parallel Computing", in which he introduced the memory-bounded speedup model as Sun-Ni's Law, alongside Amdahl's Law and Gustafson's Law. Over the years, the impact of this model has grown far beyond parallel processing and into the fundamentals of computing. In this article, we revisit the memory-bounded speedup model and discuss its progress and impact in depth to make a unique contribution to this special issue, to stimulate new solutions for big data applications, and to promote data-centric thinking and rethinking.
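The relation between the three laws named in this abstract can be sketched numerically. The function below uses the commonly cited form of the memory-bounded (Sun-Ni) speedup, S(p) = (f + (1 - f) G(p)) / (f + (1 - f) G(p) / p), where f is the sequential fraction of the work and G(p) describes how the parallel workload grows when p nodes contribute p times the memory. The abstract itself gives no formula, so this form and the names `f_seq` and `g` are assumptions for illustration.

```python
from typing import Callable


def memory_bounded_speedup(f_seq: float, p: int, g: Callable[[int], float]) -> float:
    """Memory-bounded (Sun-Ni) speedup on p processors.

    f_seq: sequential fraction of the original workload (0 <= f_seq <= 1).
    g:     workload growth function G(p); assumed, not given in the abstract.
    """
    scaled_parallel = (1.0 - f_seq) * g(p)
    return (f_seq + scaled_parallel) / (f_seq + scaled_parallel / p)


def amdahl(f_seq: float, p: int) -> float:
    """Fixed-size speedup: the special case G(p) = 1."""
    return memory_bounded_speedup(f_seq, p, lambda _p: 1.0)


def gustafson(f_seq: float, p: int) -> float:
    """Fixed-time speedup: the special case G(p) = p."""
    return memory_bounded_speedup(f_seq, p, lambda q: float(q))


if __name__ == "__main__":
    f_seq, p = 0.1, 10
    print("Amdahl      :", amdahl(f_seq, p))
    print("Gustafson   :", gustafson(f_seq, p))
    # G(p) = p ** 1.5 is a hypothetical growth rate (work growing faster
    # than memory); it is chosen only to illustrate the formula.
    print("Memory-bound:", memory_bounded_speedup(f_seq, p, lambda q: q ** 1.5))
```

Passing G as a function keeps the two classical laws as special cases of one expression, which is the sense in which the memory-bounded model sits alongside them.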
Funding: Supported by the National Natural Science Foundation for Distinguished Young Scholars of China (Grant No. 60625204), the Key Project of the National Natural Science Foundation of China (Grant No. 60496324), the National 973 Fundamental Research and Development Program of China (Grant No. 2002CB312004), the National 863 High-Tech Project of China (Grant No. 2006AA01Z155), and the Knowledge Innovation Program of the Chinese Academy of Sciences and MADIS.
Abstract: The role that quantum entanglement plays in quantum computation speedup has been widely disputed. Some believe that a speedup of quantum computation over classical computation is impossible if entanglement is absent, while others claim that the presence of entanglement is not a necessary condition for some quantum algorithms. This paper discusses the problem systematically. The simulation of quantum computation with classical resources is analyzed, and entanglement in known algorithms is reviewed. It is concluded that the presence of entanglement is a necessary but not sufficient condition for speedup in pure-state or pseudo-pure-state quantum computation. The mixed-state case remains open. Further work on quantum computation will benefit from the presented results.
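Abstract: Numerical simulation based on response coefficients is one of the commonly used methods for assessing the environmental capacity of bays, but common ocean models currently lack a tracer module that can compute the response coefficient fields of multiple release points simultaneously without mutual interference. To suit the response coefficient method, this study improves the tracer module (dye tracking, DYE) of the three-dimensional hydrodynamic ocean model FVCOM (Finite-Volume Community Ocean Model): several independent modules with the same functionality as the original DYE module are added alongside it, that is, multiple DYE modules are computed in parallel, so that FVCOM can compute multiple non-interfering conservative tracer modules at the same time. The approach was tested on an idealized rectangular-topography case and an idealized-topography case of Xiangshan Bay. The results show that the advection-diffusion processes of multi-point-source tracers simulated by the improved algorithm do not affect one another, and that the simulated response coefficient fields agree with those of the traditional algorithm. Compared with the traditional algorithm, the improved algorithm takes less computation time, improving computational efficiency by up to 85% for the idealized rectangular case and by up to 78% for the Xiangshan Bay case. Under parallel computation, the improved algorithm also makes better use of CPU processes. Computing response coefficient fields with the improved DYE module can shorten the overall time needed for marine environmental capacity assessment.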
Abstract: Row fixation is an MPI-based parallel algorithm that can be implemented on high-performance computer systems. It preserves the characteristics of the matrices because the computations for each row are fixed on a particular node. Locality of computation is therefore realized effectively, and a good speedup (acceleration ratio) is obtained for large-scale parallel computations such as solving linear equations by Gaussian elimination, LU decomposition of matrices, and computing the m-th power of a matrix.
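The idea of fixing each row's computations on one node can be sketched without MPI. The Python below simulates `num_nodes` ranks in a single process, assumes a cyclic mapping (row r belongs to rank r % num_nodes), and performs Gaussian elimination without pivoting. The abstract does not specify the mapping or the pivoting strategy, so both are assumptions, and the names `owner` and `row_fixed_gaussian_solve` are hypothetical. In a real MPI program the pivot row would be broadcast by its owning rank at each step.

```python
from typing import List


def owner(row: int, num_nodes: int) -> int:
    """Rank that permanently holds a row (assumed cyclic mapping)."""
    return row % num_nodes


def row_fixed_gaussian_solve(a: List[List[float]], b: List[float],
                             num_nodes: int) -> List[float]:
    """Solve a x = b; each row is only ever updated by its owning rank.

    No pivoting is done, so every pivot must stay nonzero.
    """
    n = len(a)
    # Augmented rows; rank r conceptually stores rows with owner(i) == r.
    rows = [list(a[i]) + [b[i]] for i in range(n)]

    for k in range(n - 1):
        pivot_row = rows[k]  # stands in for a broadcast from owner(k)
        for rank in range(num_nodes):  # ranks would run concurrently
            for i in range(k + 1, n):
                if owner(i, num_nodes) != rank:
                    continue  # row i is fixed on another rank
                factor = rows[i][k] / pivot_row[k]
                for j in range(k, n + 1):
                    rows[i][j] -= factor * pivot_row[j]

    # Back substitution, done serially here for brevity.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - s) / rows[i][i]
    return x


if __name__ == "__main__":
    a = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
    b = [8.0, -11.0, -3.0]
    print(row_fixed_gaussian_solve(a, b, num_nodes=2))
```

Because a row never moves between ranks, the only data exchanged per elimination step is the pivot row, which is the locality property the abstract describes.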