Journal Articles
5 articles found
Adaptive implementation of multi-branch convolution with fusion coefficients based on reconfigurable array
1
Authors: Liu Dongyue, Jiang Lin, Wang Mei, Li Yuancheng, Hao Juan. High Technology Letters, 2026, No. 1, pp. 39-48 (10 pages).
Reconfigurable array architectures have become an important hardware platform for edge-side deployment of convolutional neural networks due to their high parallelism and flexible programmability. However, traditional multi-branch convolutional networks suffer from computational redundancy, high memory-access overhead, and inefficient branch fusion. This paper therefore proposes an adaptive multi-branch convolutional module (AMBC) built on software-hardware co-optimization. During training, learnable fusion coefficients are introduced to enable adaptive fusion of multi-scale features; in the inference phase, the multiple branches and their normalization parameters are merged, together with the fusion coefficients, into a single 3×3 convolutional kernel through operator fusion. On the SIREA-288 reconfigurable platform, compared with unoptimized multi-branch networks, the proposed AMBC reduces external memory accesses by 47.91% and inference latency by 47.20%, achieving a 1.90× speedup. This approach maximizes the utilization of the reconfigurable logic while minimizing both reconfiguration and data-movement overheads in edge inference.
Keywords: reconfigurable array processor; structural re-parameterization; model compression; fusion coefficients; edge-side inference acceleration; hardware-software co-optimization
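The inference-time merge described in the AMBC abstract follows the general structural re-parameterization recipe. The Python sketch below illustrates that recipe under assumptions that do not come from the paper: three hypothetical branches (a 3×3 conv with BN, a 1×1 conv with BN, and a BN-only identity branch), a single input and output channel, and fixed scalar fusion coefficients `alpha`. Each BN is folded into its branch kernel, the 1×1 and identity branches are written as centre-only 3×3 kernels, and the branches are summed with the coefficients; the script then checks that the single fused kernel reproduces the multi-branch output. It is an illustration, not the AMBC implementation.

```python
import math
import random

def conv3x3(x, k, b):
    """'Same' 3x3 cross-correlation with zero padding on a single-channel map."""
    h, w = len(x), len(x[0])
    out = [[b] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    ii, jj = i + di - 1, j + dj - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += k[di][dj] * x[ii][jj]
    return out

def bn(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch normalization with fixed statistics."""
    s = gamma / math.sqrt(var + eps)
    return [[s * (v - mean) + beta for v in row] for row in y]

def fold_bn(k, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN into the preceding conv: k' = s*k, b' = s*(b - mean) + beta."""
    s = gamma / math.sqrt(var + eps)
    return [[s * v for v in row] for row in k], s * (b - mean) + beta

def center_kernel(v):
    """3x3 kernel that is zero except for value v at the centre tap."""
    return [[0.0, 0.0, 0.0], [0.0, v, 0.0], [0.0, 0.0, 0.0]]

def merge_branches(k3, w1, bn3, bn1, bn_id, alpha):
    """Collapse three BN-folded branches into one 3x3 kernel and one bias."""
    folded = [
        fold_bn(k3, 0.0, *bn3),                    # 3x3 conv + BN
        fold_bn(center_kernel(w1), 0.0, *bn1),     # 1x1 conv + BN, padded to 3x3
        fold_bn(center_kernel(1.0), 0.0, *bn_id),  # identity + BN
    ]
    kernel = [[sum(a * k[i][j] for a, (k, _) in zip(alpha, folded)) for j in range(3)]
              for i in range(3)]
    bias = sum(a * b for a, (_, b) in zip(alpha, folded))
    return kernel, bias

def multi_branch(x, k3, w1, bn3, bn1, bn_id, alpha):
    """Training-style forward pass: run each branch separately, then fuse with alpha."""
    y3 = bn(conv3x3(x, k3, 0.0), *bn3)
    y1 = bn([[w1 * v for v in row] for row in x], *bn1)
    yid = bn(x, *bn_id)
    return [[alpha[0] * p + alpha[1] * q + alpha[2] * r for p, q, r in zip(a, b, c)]
            for a, b, c in zip(y3, y1, yid)]

random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
w1 = 0.7
bn3 = (1.2, 0.1, 0.05, 0.9)    # (gamma, beta, running_mean, running_var)
bn1 = (0.8, -0.2, 0.0, 1.1)
bn_id = (1.0, 0.3, -0.1, 0.5)
alpha = (0.5, 0.3, 0.2)        # hypothetical learned fusion coefficients

kernel, bias = merge_branches(k3, w1, bn3, bn1, bn_id, alpha)
reference = multi_branch(x, k3, w1, bn3, bn1, bn_id, alpha)
fused = conv3x3(x, kernel, bias)
max_err = max(abs(r - f) for rr, fr in zip(reference, fused) for r, f in zip(rr, fr))
print(f"max |multi-branch - fused| = {max_err:.2e}")
```

Because convolution is linear in its kernel and bias, the coefficient-weighted sum of folded kernels matches the multi-branch output up to floating-point rounding, which is why the extra branches cost nothing at inference time.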
Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection (cited by: 1)
2
Authors: Youfeng Wu, Yong-Fong Lee. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2005, No. 5, pp. 665-675 (11 pages).
Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. This paper proposes a new profiling technique that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests to the hardware as a few bits of information in branch instructions, and the processor executes the profiling operations asynchronously in available free slots or on dedicated hardware. The compiler instrumentation for this technique is implemented in an Itanium research compiler. The results show that accurate block profiling incurs very little overhead for the user program in terms of program scheduling cycles; for example, the average overhead is 0.6% on the SPECint95 benchmarks. The hardware support required for the new profiling is practical. The technique is also extended to collect edge profiles for continuous phase transition detection. The authors believe this hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
Keywords: runtime profiling; dynamic optimizations; phase transition detection; hardware-software collaboration
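A toy model can make the two mechanisms in the profiling abstract concrete. The Python sketch below is not the authors' design: it assumes a hypothetical 4-wide issue pipeline, a small profile-request queue, and a nonzero `profile_id` tag standing in for the compiler-supplied bits in a branch instruction. Queued counter updates retire only in issue slots the program leaves free, so profiling adds cycles only when the queue overflows or must be drained at the end. The `phase_changes` function uses a common heuristic (Manhattan distance between normalized per-interval edge profiles) as a stand-in for phase transition detection; its interval length and threshold are arbitrary.

```python
from collections import Counter, deque

ISSUE_WIDTH = 4  # hypothetical issue slots per cycle
QUEUE_CAP = 8    # hypothetical capacity of the profile-request queue

def drain(queue, counts, slots):
    """Retire up to `slots` queued profiling requests as counter increments."""
    for _ in range(min(slots, len(queue))):
        counts[queue.popleft()] += 1

def run(trace):
    """Simulate a trace of (slots_used, profile_id) cycles.

    profile_id stands in for the few bits the compiler places in a branch
    (0 = no request). Returns (block counts, total cycles, profiling stall cycles).
    """
    counts, queue = Counter(), deque()
    cycles = stalls = 0
    for used, pid in trace:
        if pid:
            while len(queue) >= QUEUE_CAP:  # queue full: stall one cycle to drain it
                drain(queue, counts, ISSUE_WIDTH)
                cycles += 1
                stalls += 1
            queue.append(pid)
        cycles += 1
        drain(queue, counts, ISSUE_WIDTH - used)  # use only slots the program left free
    while queue:  # leftover requests after the program ends cost extra cycles
        drain(queue, counts, ISSUE_WIDTH)
        cycles += 1
        stalls += 1
    return counts, cycles, stalls

def phase_changes(edge_trace, interval=100, threshold=0.5):
    """Return interval start indices whose normalized edge profile differs from
    the previous interval's by more than `threshold` (Manhattan distance)."""
    changes, prev = [], None
    for start in range(0, len(edge_trace), interval):
        window = Counter(edge_trace[start:start + interval])
        total = sum(window.values())
        profile = {edge: n / total for edge, n in window.items()}
        if prev is not None:
            keys = profile.keys() | prev.keys()
            if sum(abs(profile.get(k, 0.0) - prev.get(k, 0.0)) for k in keys) > threshold:
                changes.append(start)
        prev = profile
    return changes

# 3 of 4 slots busy every cycle; every other cycle carries a request for block 1, 2 or 3.
trace = [(3, (i % 3) + 1 if i % 2 == 0 else 0) for i in range(1000)]
counts, cycles, stalls = run(trace)
print(dict(counts), cycles, stalls)

edges = [("A", "B")] * 300 + [("C", "D")] * 300
print(phase_changes(edges))
```

With one free slot per cycle and a request every other cycle, the queue never fills, so the counts are exact and no profiling cycles are added; raising the slot usage to the full width forces stalls, mirroring the trade-off the abstract describes between free slots and dedicated hardware.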
TikTak: A Scalable Simulator of Wireless Sensor Networks Including Hardware/Software Interaction
3
Authors: Francesco Menichelli, Mauro Olivieri. Wireless Sensor Network, 2010, No. 11, pp. 815-822 (8 pages).
We present a simulation framework for wireless sensor networks, developed to enable design exploration and complete microprocessor-instruction-level debugging of network formation, data congestion, and node interaction, all in one simulation environment. A particularly innovative feature is the co-emulation of selected nodes at the clock-cycle-accurate hardware processing level, allowing code debugging and exact execution-latency evaluation (covering both the protocol stack and the application), alongside other nodes simulated at an abstract protocol level; this combination meets a designer's needs for simulation speed, scalability, and reliability. The simulator is centered on the Zigbee protocol and can be retargeted to different node micro-architectures.
Keywords: WSN; Simulation; hardware-software co-emulation
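The mixed-abstraction idea in the TikTak abstract, where some nodes are modeled cycle by cycle and others only at protocol level, can be sketched as a discrete-event simulator in which every node exposes the same `processing_time` interface. In the Python sketch below, the clock rate, instruction costs, link delay, and the per-byte handler loop are all hypothetical. TikTak itself co-emulates real node hardware rather than summing a cycle table, and none of these names are its API.

```python
import heapq

CLOCK_HZ = 8_000_000  # hypothetical MCU clock for the cycle-accurate node

class AbstractNode:
    """Protocol-level model: packet handling costs a fixed, estimated delay."""
    def __init__(self, name, delay_s):
        self.name = name
        self.delay_s = delay_s

    def processing_time(self, payload):
        return self.delay_s

class CycleAccurateNode:
    """Instruction-level model: handler latency derived from per-instruction cycles."""
    def __init__(self, name, loop_body):
        self.name = name
        self.loop_body = loop_body  # hypothetical list of (mnemonic, cycles)

    def processing_time(self, payload):
        # Hypothetical handler: runs loop_body once per payload byte.
        cycles = sum(c for _, c in self.loop_body) * len(payload)
        return cycles / CLOCK_HZ

def simulate(nodes, sends, link_delay_s=0.001):
    """Discrete-event run. sends: (time, src, dst, payload) tuples.
    Returns (delivery_time, dst, src) per packet, in the order packets reach dst."""
    events = []
    for seq, (t, src, dst, payload) in enumerate(sends):
        heapq.heappush(events, (t, seq, "tx", src, dst, payload))
    seq = len(sends)  # unique tie-breaker so payloads are never compared
    delivered = []
    while events:
        t, _, kind, src, dst, payload = heapq.heappop(events)
        if kind == "tx":
            # Sender processes at its own abstraction level, then the link adds delay.
            arrival = t + nodes[src].processing_time(payload) + link_delay_s
            heapq.heappush(events, (arrival, seq, "rx", src, dst, payload))
            seq += 1
        else:
            # Receiver processing completes the delivery.
            delivered.append((t + nodes[dst].processing_time(payload), dst, src))
    return delivered

nodes = {
    "sensor": CycleAccurateNode("sensor", [("LOAD", 2), ("ADD", 1), ("STORE", 2)]),
    "sink": AbstractNode("sink", 0.0005),
}
sends = [(0.0, "sensor", "sink", b"abcd")]
for when, dst, src in simulate(nodes, sends):
    print(f"{src} -> {dst} delivered at {when * 1e3:.4f} ms")
```

Keeping one interface for both fidelities lets a designer promote a single node to the expensive cycle-level model while the rest of the network stays cheap, which is the speed/accuracy balance the abstract describes.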
Feasibility study of large-scale mass customization 3D printing framework system with a case study on Nanjing Happy Valley East Gate
4
Authors: Philip F. Yuan, Hooi Shan Beh, Xuezhou Yang, Liming Zhang, Tianyi Gao. Frontiers of Architectural Research (CSCD), 2022, No. 4, pp. 670-680 (11 pages).
At present, the development and implementation of digital transformation are key to promoting high-quality industry development. Robotic 3D printing, a new digital fabrication method, is being studied by many researchers as a way to tackle the declining productivity of traditional construction methods. Although many studies have been done, most current 3D printing projects remain limited in scale. To bridge this gap, this article proposes a mass customization 3D printing framework system for large-scale projects. It discusses how mass customization is made possible through the joint operation of the FUROBOT software and 3D printing hardware. Taking the east gate of Nanjing Happy Valley Plaza as a case study, the article examines and demonstrates the feasibility of the large-scale mass customization 3D printing framework system.
Keywords: Mass customization; 3D printing; hardware-software integration; Human-machine collaboration; Digital fabrication
FASS-pruner: customizing a fine-grained CNN accelerator-aware pruning framework via intra-filter splitting and inter-filter shuffling
5
Authors: Xiaohui Wei, Xinyang Zheng, Chenyang Wang, Guangli Li, Hengshan Yue. CCF Transactions on High Performance Computing, 2023, No. 3, pp. 292-303 (12 pages).
Nowadays, with the increasing depth of CNNs, computation and weight-storage requirements grow significantly, preventing their wide deployment in resource-constrained scenarios such as embedded systems. To improve the efficiency of deep CNN inference, researchers have explored weight pruning on CNN accelerators (e.g., systolic arrays) to avoid storing and computing unimportant weights. However, these attempts either incur expensive extra hardware costs to encode/decode the irregular sparse weight pattern on accelerators or bring limited performance improvement due to the modest compression ratio of structured pruning. To address this challenge, this paper proposes FASS-Pruner, a Fine-grained Accelerator-aware pruning framework via intra-filter Splitting and inter-filter Shuffling: (1) considering the round-by-round execution behavior of CNN accelerators, FASS-Pruner splits filters into multiple rounds to perform column-wise weight pruning; (2) leveraging the computational independence across filters on CNN accelerators, FASS-Pruner shuffles the filters to prune unimportant row-wise weights. Combining the sparse pattern of the pruned CNN with the dataflow of the systolic array, we modify the systolic-array-based accelerator so that it executes pruned sparse CNNs with better performance and lower energy consumption. By condensing the pruned sparse weights in systolic arrays, FASS-Pruner achieves a comparable pruning ratio while preserving the original dataflow of CNN accelerators, thereby achieving significant performance and energy savings.
Keywords: CNN accelerator; Model pruning; hardware-software co-design
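The two pruning ideas in the FASS-Pruner abstract can be illustrated on a tiny weight matrix. The Python sketch below assumes a hypothetical mapping that the abstract does not spell out: each filter is flattened to K weights, an array with `rows` rows processes that many weights of a filter per round, and `cols` filters occupy the array columns at once. `prune_columns` zeros the weakest filter segments in each round (column-wise pruning after intra-filter splitting), and `shuffle_and_prune_rows` reorders filters so that filters with the same weak weight positions share a column group, then drops rows that are weak across the whole group. The sorting key, threshold, and keep ratio are illustrative choices, not FASS-Pruner's actual criteria.

```python
def split_rounds(filt, rows):
    """Intra-filter splitting: cut one flattened filter into round-sized segments."""
    return [filt[i:i + rows] for i in range(0, len(filt), rows)]

def prune_columns(filters, rows, keep_ratio):
    """In each round, keep only the strongest filter segments (by L1 norm) and zero
    the rest. Returns the pruned filters and, per round, the surviving filter
    indices that would be condensed side by side onto the array columns."""
    segs = [split_rounds(f, rows) for f in filters]
    n_keep = max(1, int(keep_ratio * len(filters)))
    survivors = []
    for r in range(len(segs[0])):
        ranked = sorted(range(len(filters)), key=lambda f: -sum(abs(w) for w in segs[f][r]))
        keep = set(ranked[:n_keep])
        for f in range(len(filters)):
            if f not in keep:
                segs[f][r] = [0.0] * len(segs[f][r])
        survivors.append(sorted(keep))
    return [[w for seg in s for w in seg] for s in segs], survivors

def shuffle_and_prune_rows(filters, cols, threshold):
    """Inter-filter shuffling: order filters by which weight positions are significant,
    so filters with the same weak positions share a `cols`-wide group, then zero every
    row (weight position) whose L1 norm across the whole group is below threshold."""
    order = sorted(range(len(filters)),
                   key=lambda f: tuple(abs(w) >= threshold for w in filters[f]))
    pruned = [list(f) for f in filters]
    dropped = []
    for g in range(0, len(order), cols):
        group = order[g:g + cols]
        rows = [k for k in range(len(filters[0]))
                if sum(abs(filters[f][k]) for f in group) < threshold]
        for f in group:
            for k in rows:
                pruned[f][k] = 0.0
        dropped.append((tuple(group), rows))
    return order, pruned, dropped

filters = [
    [0.90, 0.01, 0.80, 0.02],
    [0.02, 0.70, 0.01, 0.60],
    [0.85, 0.03, 0.90, 0.01],
    [0.01, 0.80, 0.02, 0.90],
]
col_pruned, survivors = prune_columns(filters, rows=2, keep_ratio=0.5)
order, pruned, dropped = shuffle_and_prune_rows(filters, cols=2, threshold=0.1)
print("per-round surviving filters:", survivors)
print("filter order:", order, "dropped rows per group:", dropped)
```

In the sample data, shuffling places filters 1 and 3 together and filters 0 and 2 together, so each group can skip two of its four rows; grouping the filters in their original order (0 with 1, 2 with 3) would leave no row prunable.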