Journal articles
4 articles found
1. COPPER: a combinatorial optimization problem solver with processing-in-memory architecture
Authors: Qiankun WANG, Xingchen LI, Bingzhe WU, Ke YANG, Wei HU, Guangyu SUN, Yuchao YANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2023, Issue 5, pp. 731-741.
The combinatorial optimization problem (COP), which aims to find the optimal solution in a discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete and require much more time to solve as the problem scale increases. Because of this, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning approaches have been proposed. Some works have proposed chaotic simulated annealing (CSA) based on the Hopfield neural network, with good results. However, CSA is not something that current general-purpose processors can handle easily, and no specialized hardware exists for it. To perform CSA efficiently, we propose a software/hardware co-design. In software, we quantize the weights and outputs using appropriate bit widths, and then modify the calculations that are unsuitable for hardware implementation. In hardware, we design a specialized processing-in-memory architecture named COPPER based on the memristor. COPPER can efficiently run the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER performs CSA remarkably well in terms of both speed and energy.
Keywords: combinatorial optimization; chaotic simulated annealing; processing-in-memory
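The CSA dynamics the abstract builds on can be sketched in a few lines. Below is a minimal, non-quantized Chen-Aihara-style sketch, not the paper's modified algorithm: all parameter values (`k`, `alpha`, `z0`, `beta`, `I0`, `eps`) are illustrative defaults, and `W`/`bias` are assumed to encode the COP's energy function.

```python
import numpy as np

def csa(W, bias, steps=1000, k=0.9, alpha=0.015, z0=0.08,
        beta=0.002, I0=0.65, eps=0.05, seed=0):
    """Chaotic simulated annealing on a Hopfield network (sketch).

    The self-feedback strength z starts at z0 and decays by beta each
    step, driving the network from chaotic search toward convergence.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    y = rng.uniform(-1.0, 1.0, n) * 0.01       # internal neuron states
    z = z0
    x = 1.0 / (1.0 + np.exp(-y / eps))         # neuron outputs in (0, 1)
    for _ in range(steps):
        y = k * y + alpha * (W @ x + bias) - z * (x - I0)
        x = 1.0 / (1.0 + np.exp(-y / eps))
        z *= 1.0 - beta                         # anneal the chaotic term
    return x
```

The weighted sum `W @ x` is the operation a memristor crossbar evaluates in place, which is why the paper targets this loop for PIM acceleration.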
2. ARCHER: a ReRAM-based accelerator for compressed recommendation systems
Authors: Xinyang SHEN, Xiaofei LIAO, Long ZHENG, Yu HUANG, Dan CHEN, Hai JIN. Frontiers of Computer Science (SCIE, EI, CSCD), 2024, Issue 5, pp. 147-160.
Modern recommendation systems are widely used in data centers. Random, sparse embedding lookup operations are the main performance bottleneck when processing recommendation systems on traditional platforms, as they induce abundant data movement between the computing units and memory. ReRAM-based processing-in-memory (PIM) can resolve this problem by processing embedding vectors where they are stored. However, the embedding table can easily exceed the capacity limit of a monolithic ReRAM-based PIM chip, inducing off-chip accesses that may offset the PIM benefits. Since model decomposition shrinks the embedding table enough to fit on-chip, we deploy the decomposed model on-chip and leverage the high computing efficiency of ReRAM to compensate for the decompression performance loss. In this paper, we propose ARCHER, a ReRAM-based PIM architecture that implements fully on-chip recommendation under resource constraints. First, we conduct a full analysis of the computation and access patterns on the decomposed table. Based on the computation pattern, we unify the operations of each layer of the decomposed model as multiply-and-accumulate operations. Based on the access pattern, we propose a hierarchical mapping scheme and a specialized hardware design to maximize resource utilization. Under the unified computation and mapping strategy, we can coordinate the inter-processing-element pipeline. The evaluation shows that ARCHER outperforms the state-of-the-art GPU-based DLRM system, the state-of-the-art near-memory processing recommendation system RecNMP, and the ReRAM-based recommendation accelerator REREC by 15.79×, 2.21×, and 1.21× in performance, and 56.06×, 6.45×, and 1.71× in energy savings, respectively.
Keywords: recommendation system; ReRAM; processing-in-memory; embedding layer
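The abstract does not specify which decomposition ARCHER uses, so as an illustration only, the sketch below uses a truncated-SVD low-rank factorization; the key point it shows is that a decomposed lookup becomes a multiply-and-accumulate, the primitive a ReRAM crossbar executes natively. Function names are hypothetical.

```python
import numpy as np

def decompose_embedding(table, rank):
    """Compress a V x d embedding table into low-rank factors A (V x r)
    and B (r x d) via truncated SVD. A stand-in for the paper's
    decomposition scheme, which may differ."""
    U, S, Vt = np.linalg.svd(table, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]

def lookup(A, B, idx):
    """Reconstruct one embedding row as a multiply-and-accumulate,
    mapping directly onto a crossbar MAC operation."""
    return A[idx] @ B
```

The trade-off the abstract describes is visible here: the factors occupy `(V + d) * r` cells instead of `V * d`, at the cost of an extra MAC per lookup, which ReRAM performs cheaply in place.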
3. ReCSA: a dedicated sort accelerator using ReRAM-based content addressable memory (cited by 2)
Authors: Huize LI, Hai JIN, Long ZHENG, Yu HUANG, Xiaofei LIAO. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, Issue 2, pp. 1-13.
With the increasing amount of data, there is an urgent need for efficient sorting algorithms to process large data sets. Hardware sorting algorithms have attracted much attention because they can exploit the parallelism of different hardware. However, traditional hardware sort accelerators suffer from the "memory wall" problem because of their multiple rounds of data transfer between memory and the processor. In this paper, we utilize the in-situ processing ability of the ReRAM crossbar to design a new ReCAM array that can perform matrix-vector multiplication and vector-scalar comparison in the same array simultaneously. Using this ReCAM array, we present ReCSA, the first dedicated ReCAM-based sort accelerator. Besides the hardware design, we also develop algorithms that maximize memory utilization and minimize memory exchanges to improve sorting performance. The sorting algorithm in ReCSA can process various data types, such as integer, float, double, and string. We also present experiments evaluating its performance and energy efficiency against state-of-the-art sort accelerators. The experimental results show that ReCSA achieves 90.92×, 46.13×, 27.38×, 84.57×, and 3.36× speedups over CPU-, GPU-, FPGA-, NDP-, and PIM-based platforms when processing numeric data sets. ReCSA also achieves 24.82×, 32.94×, and 18.22× performance improvements over CPU-, GPU-, and FPGA-based platforms when processing string data sets.
Keywords: ReCAM; parallel sorting; architecture design; processing-in-memory
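The vector-scalar comparison primitive the abstract mentions supports a simple parallel sorting pattern: each key's output position equals the number of keys smaller than it. The sketch below models this in NumPy; it is not the paper's algorithm, just an illustration of comparison-driven ranking, and it assumes distinct keys (ties would need a tie-breaking rule).

```python
import numpy as np

def rank_sort(keys):
    """Comparison-count ('rank') sort sketch. The all-pairs comparison
    matrix is the work a CAM array can evaluate in parallel, one
    broadcast compare per key. Assumes distinct keys."""
    keys = np.asarray(keys)
    # ranks[i] = number of keys strictly smaller than keys[i]
    ranks = (keys[:, None] > keys[None, :]).sum(axis=1)
    out = np.empty_like(keys)
    out[ranks] = keys                 # scatter each key to its final slot
    return out
```

On conventional hardware this is O(n²) comparisons; the point of an in-memory CAM design is that each broadcast compare costs one array operation regardless of n.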
4. PRAP-PIM: a weight pattern reusing aware pruning method for ReRAM-based PIM DNN accelerators (cited by 1)
Authors: Zhaoyan Shen, Jinhao Wu, Xikun Jiang, Yuhao Zhang, Lei Ju, Zhiping Jia. High-Confidence Computing, 2023, Issue 2, pp. 50-59.
Resistive Random-Access Memory (ReRAM) based Processing-in-Memory (PIM) frameworks have been proposed to accelerate DNN models by eliminating data movement between the computing and memory units. To further reduce space and energy consumption, DNN weight sparsity and weight pattern repetition are exploited to optimize these ReRAM-based accelerators. However, most existing works focus on only one aspect of this software/hardware co-design framework and optimize it individually, which leaves the design far from optimal. In this paper, we propose PRAP-PIM, which jointly exploits weight sparsity and weight pattern repetition through a weight pattern reusing aware pruning method. By relaxing the weight pattern reusing precondition, we propose a similarity-based weight pattern reusing method that achieves a higher weight pattern reusing ratio. Experimental results show that PRAP-PIM achieves a 1.64× performance improvement and a 1.51× energy efficiency improvement on popular deep learning benchmarks, compared with state-of-the-art ReRAM-based DNN accelerators.
Keywords: resistive random-access memory; processing-in-memory; deep neural network; model compression
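The "similarity-based weight pattern reusing" idea can be illustrated with a greedy sketch: map each crossbar column to the first already-kept column within a similarity tolerance, so only unique patterns need to be programmed into the ReRAM array. The distance metric and `tol` value here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def reuse_patterns(W, tol=0.05):
    """Greedy similarity-based pattern reuse (sketch). Returns the kept
    representative columns and, for each original column, the index of
    the representative it reuses."""
    reps, assign = [], []
    for col in W.T:
        for j, rep in enumerate(reps):
            # reuse rep if col is within a relative distance tolerance
            if np.linalg.norm(col - rep) <= tol * (np.linalg.norm(rep) + 1e-12):
                assign.append(j)
                break
        else:
            reps.append(col)                 # no close match: keep as new pattern
            assign.append(len(reps) - 1)
    return np.stack(reps, axis=1), np.asarray(assign)
```

Relaxing `tol` above zero is the "relaxed precondition" trade-off in spirit: more columns collapse onto shared patterns (higher reuse ratio) at the cost of a small weight approximation error.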