Journal articles
3 articles found
1. SAPER-AI accelerator: a systolic array-based power-efficient reconfigurable AI accelerator
Authors: Fahad Bin MUSLIM, Kashif INAYAT, Muhammad Zain SIDDIQI, Safiullah KHAN, Tayyeb MAHMOOD, Ihtesham ul ISLAM. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 9, pp. 1624-1636 (13 pages)
Deep learning (DL) accelerators are critical for handling the growing computational demands of modern neural networks. Systolic array (SA)-based accelerators consist of a 2D mesh of processing elements (PEs) working cooperatively to accelerate matrix multiplication. The power efficiency of such accelerators is of primary importance, especially in the edge AI regime. This work presents the SAPER-AI accelerator, an SA accelerator whose power intent is specified via a unified power format representation in a simplified manner, with negligible microarchitectural optimization effort. Our proposed accelerator switches off rows and columns of PEs in a coarse-grained manner, yielding an SA microarchitecture that complies with the varying computational requirements of modern DL workloads. Our analysis demonstrates enhanced power efficiency ranging from 10% to 25% for the best-case 32×32 and 64×64 SA designs, respectively. Additionally, the power-delay product (PDP) exhibits a progressive improvement of around 6% for larger SA sizes. Moreover, a performance comparison between the MobileNet and ResNet50 models indicates generally better SA performance for the ResNet50 workload. This is due to the more regular convolutions in ResNet50, which SAs favor, with the performance gap widening as the SA size increases.
Keywords: artificial intelligence (AI) accelerators; application-specific integrated circuit (ASIC) design; systolic arrays; low-power designs
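The coarse-grained gating idea in this abstract can be illustrated with a minimal occupancy model, not taken from the paper: whole rows and columns of PEs that a matmul tile does not use are assumed power-gated, and the gated fraction approximates the saving opportunity. The function names and the weight-stationary mapping are illustrative assumptions.

```python
# Sketch (not the paper's code): coarse-grained row/column power gating
# in a systolic array. Assumes a tile mapping where a matmul using m rows
# and n columns leaves the remaining rows/columns switched off.

def active_pes(sa_rows: int, sa_cols: int, m: int, n: int) -> int:
    """PEs left powered on for a tile that needs m rows and n columns."""
    used_rows = min(m, sa_rows)
    used_cols = min(n, sa_cols)
    return used_rows * used_cols

def gated_fraction(sa_rows: int, sa_cols: int, m: int, n: int) -> float:
    """Fraction of the array that can be power-gated for this tile."""
    total = sa_rows * sa_cols
    return 1.0 - active_pes(sa_rows, sa_cols, m, n) / total

# A 32x32 array running a tile that only fills 24 of 32 columns
# leaves 8 full columns (25% of the array) gated.
print(gated_fraction(32, 32, 32, 24))  # → 0.25
```

In this model, irregular layers (e.g. MobileNet's depthwise convolutions) map to narrower tiles and thus larger gated fractions, which is consistent with the abstract's observation that regular ResNet50 convolutions keep larger arrays better utilized.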
2. LLMs at home? An evaluation on the feasibility of popularising On-device-ANI capable hardware in consumer grade devices
Author: Yiding Wang. Advances in Engineering Innovation, 2024, Issue 5, pp. 26-29 (4 pages)
Artificial Narrow Intelligences (ANI) are rapidly becoming an integral part of everyday consumer technology. With products like ChatGPT, Midjourney, and Stable Diffusion gaining widespread popularity, the demand for local hosting of neural networks has significantly increased. However, the typical 'always-online' nature of these services presents several limitations, including dependence on reliable internet connections, privacy concerns, and ongoing operational costs. This essay explores potential hardware solutions for popularizing on-device inferencing of ANI on consumer hardware and speculates on the future of the industry.
Keywords: ANI; neural network; LLM; AI accelerator; chip design; component efficiency; consumer applications
3. An Efficient Multiplier-Less Processing Element on Power-of-2 Dictionary-Based Data Quantization
Authors: JIAXIANG LI, MASAO YANAGISAWA, YOUHUA SHI. Integrated Circuits and Systems, 2024, Issue 1, pp. 53-62 (10 pages)
Large-scale neural networks have brought incredible shocks to the world, changing people's lives and offering vast prospects. However, they also come with enormous demands for computational power and storage; the core of their computational requirements lies in matrix multiplication units dominated by multiplication operations. To address this issue, we propose an area- and power-efficient multiplier-less processing element (PE) design. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model and confirm that this quantization method preserves the accuracy of the original model. In hardware design, we present a standard PE architecture and a 'bi-sign' variant. Our evaluation results demonstrate that a systolic array implementing our standard multiplier-less PE achieves approximately 38% lower power-delay product and a 13% smaller core area compared to a conventional multiplication-and-accumulation PE, while the bi-sign PE design saves 37% core area and 38% computation energy. Furthermore, the applied quantization reduces the model size and operand bit-width, leading to decreased on-chip memory usage and energy consumption for memory accesses. Additionally, the hardware schematic facilitates expansion to support other sparsity-aware, energy-efficient techniques.
Keywords: AI accelerators; approximate computing; efficient computing; model quantization; multiplier-less processing element
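The core trick behind a multiplier-less PE on power-of-2 quantized weights can be sketched as follows. This is a minimal illustration of the general technique, not the paper's design: the dictionary format, the exponent range, and the function names are all assumptions. Once a weight is snapped to ±2^k, each multiply in the MAC collapses to a bit shift plus a sign-controlled add.

```python
# Sketch (not the paper's code): multiplier-less MAC on weights quantized
# to signed powers of two. Exponent clamp range [-4, 3] is an assumed
# dictionary; real designs choose it from the weight distribution.
import math

def quantize_pow2(w: float, k_min: int = -4, k_max: int = 3):
    """Snap a weight to (sign, k) with value sign * 2**k, rounding in the
    log domain and clamping k to the dictionary's exponent range."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    k = round(math.log2(abs(w)))
    k = max(k_min, min(k_max, k))
    return sign, k

def pe_mac(acc: int, x: int, sign: int, k: int) -> int:
    """One MAC step: acc += sign * 2**k * x, using only a shift and an
    add/subtract (in hardware, a barrel shifter plus an adder)."""
    term = x << k if k >= 0 else x >> -k
    return acc + (term if sign >= 0 else -term)

print(pe_mac(0, 5, 1, 2))      # → 20  (5 * +2^2)
print(quantize_pow2(0.9))      # → (1, 0), i.e. 0.9 ≈ +2^0
```

Removing the array multiplier in favor of a shifter is what drives the area and PDP reductions the abstract reports; the accuracy cost is bounded by how well the power-of-2 dictionary covers the trained weight distribution.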