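Abstract: Numerical simulation based on response coefficients is one of the common methods for assessing the environmental capacity of bays, but the ocean models in common use lack a tracer module that can compute the response coefficient fields of several release points simultaneously without mutual interference. To suit the response coefficient method, this study improves the tracer module (dyeing tracking, DYE) of the three-dimensional hydrodynamic ocean model FVCOM (Finite-Volume Community Ocean Model). Several independent modules with the same functionality as the original DYE module are added alongside it, so that multiple DYE modules are computed in parallel and FVCOM can compute several mutually independent conservative tracers at the same time. The method was tested on an idealized rectangular-basin case and an idealized-topography Xiangshan Bay case. The results show that the advection and diffusion of the multi-point-source tracers simulated by the improved algorithm do not affect one another, and that the simulated response coefficient fields agree with those of the conventional algorithm. Compared with the conventional algorithm, the improved algorithm takes less computation time, raising computational efficiency by up to 85% for the idealized rectangular case and by up to 78% for the Xiangshan Bay case. Under parallel execution, the improved algorithm also makes fuller use of the CPU processes. Computing response coefficient fields with the improved DYE module can shorten the overall time needed for marine environmental capacity assessment.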
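The independence property described in this abstract can be illustrated with a toy model. The sketch below is not FVCOM code: it assumes a 1D periodic grid, a constant velocity, an explicit upwind advection plus central diffusion scheme, and a unit pulse released at each source cell. All function names and parameter values are hypothetical. It only shows that advancing several conservative tracers inside one time loop gives the same fields as running each release point separately.

```python
def step(c, u, d, dx, dt):
    """One explicit step on a periodic 1D grid: upwind advection (u >= 0) plus central diffusion."""
    n = len(c)
    return [
        c[i]
        - u * dt / dx * (c[i] - c[i - 1])  # c[-1] wraps around, giving periodicity
        + d * dt / dx**2 * (c[(i + 1) % n] - 2 * c[i] + c[i - 1])
        for i in range(n)
    ]


def run_single(source, n=50, steps=100, u=1.0, d=0.1, dx=1.0, dt=0.2):
    """Conventional approach in this toy: one model run per release point."""
    c = [0.0] * n
    c[source] = 1.0  # unit pulse at the release cell (toy assumption)
    for _ in range(steps):
        c = step(c, u, d, dx, dt)
    return c


def run_multi(sources, n=50, steps=100, u=1.0, d=0.1, dx=1.0, dt=0.2):
    """Improved approach in this toy: all tracers advance inside a single time loop."""
    fields = [[0.0] * n for _ in sources]
    for field, s in zip(fields, sources):
        field[s] = 1.0
    for _ in range(steps):
        # Each tracer is updated from its own field only, so tracers cannot interfere.
        fields = [step(f, u, d, dx, dt) for f in fields]
    return fields


if __name__ == "__main__":
    multi = run_multi([5, 20])
    print(multi[0] == run_single(5), multi[1] == run_single(20))
```

In a full ocean model the flow field would be computed once per time step and shared by every tracer, which is one plausible reason a single multi-tracer run costs less than repeated single-tracer runs. The abstract itself does not state the mechanism.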
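Abstract: To address the low computational efficiency of two-dimensional fluid-thermal coupled simulation of oil-immersed transformers, a parallel computing method based on a hybrid finite element method is proposed. First, serial implementations of the dimensionless least-squares finite element method and the upwind finite element method are written in C++ in Visual Studio 2019. Then, parallel computation of the fluid field is implemented on a graphics processing unit (GPU). For a single-partition model with individually resolved winding turns, the parallel efficiency of different GPU cards is compared under different mesh conditions, and the analysis shows that the larger the data size and the more stream processors the GPU card has, the better the parallel performance. Next, parallel computation of the two-dimensional temperature field is implemented with the Intel Math Kernel Library (Intel MKL) combined with shared-memory parallel programming (Open Multi-Processing, OpenMP), and the effect of mesh count on parallel efficiency is compared. Finally, on this basis a hybrid parallel computing method that adapts to different simulation conditions is proposed and applied to two-dimensional temperature-rise hot-spot analysis of a large oil-immersed transformer winding model. The results show that, relative to the serial program, the hybrid finite element parallel method achieves a speedup of 69.5, and experimental measurements further verify the accuracy of the parallel results. The work lays a foundation for fast computation of fluid-thermal coupling problems in large oil-immersed transformers.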
Abstract: As high-speed switches scale up, achieving 100% throughput for multicast scheduling demands unbounded speedup and an exponentially growing number of virtual queues. This paper addresses the resulting low multicast throughput when there is no speedup or when the crosspoint buffer size is fixed. Inspired by the load-balanced two-stage Birkhoff-von Neumann architecture, which can provide 100% throughput for all kinds of unicast traffic, a novel three-stage architecture is proposed, consisting of a first stage for multicast fan-out splitting, a second stage for load balancing, and a last stage for switching (FSLBS). A dedicated multicast fan-out splitting to unicast (M2U) scheduling algorithm is developed for the first stage, while the scheduling algorithms in the last two stages adopt the periodic permutation matrix. By employing the dedicated M2U and periodic permutation matrix scheduling algorithms, FSLBS can achieve 100% throughput for integrated unicast and multicast traffic without speedup. The operation is validated theoretically using the fluid model.
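The two building blocks named in this abstract can be sketched in a few lines. The sketch assumes the usual periodic connection pattern of load-balanced Birkhoff-von Neumann switches, in which input i is connected to output (i + t) mod N in time slot t. The `split_fanout` helper is a hypothetical illustration of turning one multicast cell into unicast copies. It is not the paper's M2U algorithm, whose details the abstract does not give.

```python
def periodic_permutation(n, t):
    """Connection pattern in slot t: input i is connected to output (i + t) % n."""
    return [(i + t) % n for i in range(n)]


def split_fanout(src, destinations):
    """Hypothetical helper: turn one multicast cell into one unicast copy per destination."""
    return [(src, dst) for dst in sorted(destinations)]


def coverage(n):
    """Count how often each (input, output) pair is connected over n consecutive slots."""
    counts = [[0] * n for _ in range(n)]
    for t in range(n):
        for i, o in enumerate(periodic_permutation(n, t)):
            counts[i][o] += 1
    return counts


if __name__ == "__main__":
    for t in range(4):
        print(t, periodic_permutation(4, t))
    print(split_fanout(0, {3, 1}))
```

Over any N consecutive slots each input meets each output exactly once, so the pattern needs no per-slot matching computation and spreads traffic evenly across the middle stage.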
Abstract: With the rise of image data and the increased complexity of edge detection tasks, conventional artificial intelligence techniques have been severely impacted. To solve the larger problems of the future, learning algorithms must maintain high speed and accuracy through economical means. Traditional edge detection approaches cannot detect edges in images in a timely manner due to memory and computational time constraints. In this work, a novel parallelized ant colony optimization technique in a distributed framework provided by the Hadoop/Map-Reduce infrastructure is proposed to improve edge detection capabilities. Moreover, a filtering technique is applied to reduce the noisy background of images, which significantly improves the accuracy of edge detection. The implementation of the proposed algorithm is examined closely and demonstrated through experiments. Results reveal high classification accuracy and significant improvements in speedup, scaleup and sizeup compared to the standard algorithms.
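The abstract does not define speedup, scaleup and sizeup. The sketch below assumes the definitions commonly used when evaluating parallel data-processing algorithms. The function names are hypothetical, and the timings in the usage lines are made-up placeholders, not results from the paper.

```python
def speedup(t_one_node, t_m_nodes):
    """Same data set: time on one node divided by time on m nodes (ideal value is m)."""
    return t_one_node / t_m_nodes


def sizeup(t_base_data, t_k_times_data):
    """Same cluster: time on k-times-larger data divided by time on the base data (ideal value is at most k)."""
    return t_k_times_data / t_base_data


def scaleup(t_one_node_base_data, t_m_nodes_m_times_data):
    """Data and nodes both grow by m: base time divided by scaled time (ideal value is 1.0)."""
    return t_one_node_base_data / t_m_nodes_m_times_data


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    print(speedup(120.0, 40.0))
    print(sizeup(40.0, 100.0))
    print(scaleup(120.0, 150.0))
```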