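Abstract: To address the resource-scheduling and data-transfer bottlenecks that arise when deep neural network (DNN) models are handled with conventional slicing and mapping methods, an efficient dynamic-slicing and intelligent-mapping optimization algorithm for DNNs on network-on-chip (NoC) accelerators is proposed. The algorithm uses dynamic slicing to flexibly partition the computational tasks of a DNN model, and combines this with an intelligent mapping strategy to optimize task allocation and data-flow management in the NoC architecture. Experimental results show that, compared with conventional methods, the algorithm significantly improves computational throughput, NoC transfer latency, the number of external memory accesses, and computational energy efficiency, with the gains most pronounced on complex models.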
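Abstract: With the rapid development of integrated circuit technology, integration density and complexity keep rising, making chip power consumption an increasingly serious problem. This article proposes a power management bus that is compatible with network-on-chip (NoC) buses. It applies low-power management to different power domains and uses a power-domain switching protocol to synchronize power-domain state with transaction activity, without affecting the operation of other parts of the system. Experimental results show that the power management bus is low-cost, lightweight, simple in protocol, and highly compatible.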
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (KJCX2-YW-N27) and the CAS Center for Excellence in Particle Physics (CCEPP).
Abstract: The Water Cherenkov Detector Array (WCDA) is an important part of the Large High Altitude Air Shower Observatory (LHAASO), which is in a research and development phase. The central scientific goal of LHAASO is to explore the origin of high-energy cosmic rays and to push forward the frontier of new physics. To simplify the WCDA's readout electronics, a prototype front-end readout application-specific integrated circuit (ASIC) is designed based on the time-over-threshold method to achieve charge-to-time conversion. High-precision time and charge measurements are required over the full dynamic range of 1 to 4000 photoelectrons (P.E.). To evaluate the performance of this ASIC, a test system is designed that comprises a front-end ASIC test module, a digitization module, and test software. The first module must be customized for each ASIC version, whereas the digitization module and test software are general-purpose. The digitization module contains a field-programmable gate array (FPGA)-based time-to-digital converter with a bin size of 333 ps, integrates an inter-integrated circuit (I2C) interface to configure the ASIC test module, and provides a universal serial bus (USB) interface to transfer data to a remote computer. Test results indicate that the time resolution is better than 0.5 ns, and that the charge resolution is better than 30% root mean square (RMS) at 1 P.E. and 3% RMS at 4000 P.E., both of which exceed the application requirements.
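The time-over-threshold measurement described in this abstract can be illustrated with a minimal sketch. Only the 333 ps bin size comes from the abstract; the function names and the idea that the converter reports leading-edge and trailing-edge times as integer bin counts are assumptions made for illustration, and the calibration that maps pulse width back to charge is not given in the source and is not modeled here.

```python
TDC_BIN_PS = 333  # TDC bin size in picoseconds, as stated in the abstract


def tdc_code_to_ps(code: int) -> int:
    """Convert an integer TDC count to a time in picoseconds (hypothetical helper)."""
    return code * TDC_BIN_PS


def time_over_threshold_ps(leading_code: int, trailing_code: int) -> int:
    """Pulse width above threshold, from leading- and trailing-edge TDC counts.

    Assumes both counts come from the same counter with no wrap-around.
    """
    if trailing_code < leading_code:
        raise ValueError("trailing edge must not precede leading edge")
    return tdc_code_to_ps(trailing_code - leading_code)


if __name__ == "__main__":
    # A pulse that crosses the threshold at count 10 and falls back at count 40
    print(time_over_threshold_ps(10, 40), "ps")
```

In a time-over-threshold scheme the measured width grows with the input charge, so a per-channel calibration curve (not shown) would be needed to turn this width into a charge value.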
Abstract: As more and more intellectual property (IP) cores are integrated on a single chip, traditional bus-based systems-on-chip (SoCs) face several design difficulties, such as low scalability, high power consumption, packet latency, and clock tree problems. As a promising solution, the network-on-chip (NoC) has been proposed and widely studied. In this work, a novel algorithm for NoC topology synthesis, the decomposing and cluster refinement (DCR) algorithm, is proposed to minimize the total power consumption of application-specific NoCs. The algorithm consists of two stages: decomposing with cluster generation, and cluster refinement. In the first stage, an initial low-power solution for the NoC topology is generated. In the cluster refinement stage, the clustering is optimized by floorplanning to further reduce power consumption. A good tradeoff between power consumption and CPU time can also be achieved. Experimental results show that the proposed method outperforms existing work.
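Abstract: A modular approach is used to analyze mathematically the hardware overhead and latency of a centrally arbitrated shared bus and a two-dimensional mesh network on chip (NoC). On this basis, both on-chip communication structures are described at the RTL level in synthesizable Verilog, and a cycle-accurate functional verification and performance analysis environment is built for both. The results show that, under the same process technology, the shared bus occupies considerably less area than the NoC; for large-scale system-on-chip communication, however, the NoC's throughput efficiency and bandwidth are clearly better than those of the shared bus.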
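As a rough illustration of the kind of quantity a latency analysis of a two-dimensional mesh involves, the sketch below computes the mean hop count between distinct nodes of a k x k mesh by brute force. This is not the model used in the paper, which the abstract does not give; uniform random traffic, minimal (Manhattan-distance) routing, and the function name are all assumptions made for illustration.

```python
from itertools import product


def mean_hops_2d_mesh(k: int) -> float:
    """Mean Manhattan distance between distinct nodes of a k x k mesh.

    Assumes every ordered source/destination pair is equally likely and
    that packets follow a minimal path. Requires k >= 2.
    """
    if k < 2:
        raise ValueError("mesh needs at least 2 nodes per side")
    nodes = list(product(range(k), repeat=2))
    total_hops = 0
    pair_count = 0
    for (x1, y1), (x2, y2) in product(nodes, repeat=2):
        if (x1, y1) == (x2, y2):
            continue  # skip a node sending to itself
        total_hops += abs(x1 - x2) + abs(y1 - y2)
        pair_count += 1
    return total_hops / pair_count


if __name__ == "__main__":
    for side in (2, 3, 4):
        print(side, mean_hops_2d_mesh(side))
```

A shared bus reaches any destination in one transfer but serves one transfer at a time, whereas a mesh pays a hop count that grows with its side length while carrying many transfers in parallel, which is consistent with the throughput result the abstract reports for large systems.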