4 articles found
a-Tucker: fast input-adaptive and matricization-free Tucker decomposition of higher-order tensors on GPUs
1
Authors: Lian Duan, Chuanfu Xiao, Min Li, Mingshuo Ding, Chao Yang. CCF Transactions on High Performance Computing, 2023, Issue 1, pp. 12-25 (14 pages)
Tucker decomposition is one of the most popular models for analyzing and compressing large-scale tensorial data. Existing Tucker decomposition algorithms are usually based on a single solver that computes the factor matrices and intermediate tensor in a predetermined order, and are not flexible enough to adapt to the diversity of input data and hardware. Moreover, to exploit highly efficient matrix multiplication kernels, most Tucker decomposition implementations rely on explicit matricizations, which can introduce extra data-conversion costs. In this paper, we present a-Tucker, a new framework for input-adaptive and matricization-free Tucker decomposition of higher-order tensors on GPUs. A two-level flexible Tucker decomposition algorithm is proposed to enable switching between different calculation orders and different factor solvers, and a machine-learning adaptive order-solver selector is applied to cope automatically with changes in application scenarios. To further improve performance, we implement a-Tucker in a fully matricization-free manner, without any conversion between tensors and matrices. Experiments show that a-Tucker can substantially outperform existing works while keeping similar accuracy on a variety of synthetic and real-world tensors.
Keywords: tensor computation; Tucker decomposition; higher-order singular value decomposition; input-adaptive; matricization-free; GPU computing
Should we consider a new approach? Detecting grain deviation caused by knots within stems
2
Author: Ping XU. Forestry Studies in China (CAS), 2010, Issue 2, pp. 101-105 (5 pages)
This article describes the importance of detecting grain deviation caused by knots and reviews the main methods used to measure grain orientation surrounding knots. It discusses the potential of using Diffusion Tensor Magnetic Resonance Imaging to track and map the grain deviation caused by knots.
Keywords: knot; grain deviation; grain orientation; destructive testing; non-destructive testing; computed tomography imaging; Diffusion Tensor Magnetic Resonance Imaging
A high-performance tensor computing unit for deep learning acceleration
3
Authors: Qiang Zhou, Tieli Sun, Taoran Shen, York Xue. Chip, 2025, Issue 2, pp. 75-84 (10 pages)
The increasing complexity of neural network applications has led to a demand for higher computational parallelism and more efficient synchronization in artificial intelligence (AI) chips. To achieve higher performance and lower power, a comprehensive and efficient approach is required to compile neural networks for implementation on dedicated hardware. Our first-generation deep learning accelerator, the tensor computing unit, was presented with hardware and software solutions. It offered dedicated very long instruction word (VLIW) instructions and multi-level repeatable direct memory access (DMA). The former lowers the instruction bandwidth requirement and makes it easier to parallelize the index and vector computations. The latter reduces the communication latency between the compute core and the asynchronous DMA, and also greatly reduces the programming complexity. For operator implementation and optimization, the compiler-based data-flow generator and the instruction macro generator first produced a set of parameterized operators. Then, the tuner configuration generator pruned the search space, and the distributed tuner framework selected the best data-flow pattern and corresponding parameters. Our tensor computing unit supports all the convolution parameters with full-shape dimensions. It can readily select proper operators to achieve 96% of the chip peak performance under certain shapes and find the best-performing implementation within limited power. The evaluation of a large number of convolution shapes on our tensor computing unit chip shows that the generated operators significantly outperform the handwritten ones, achieving 9% higher normalized performance than CUDA according to the silicon data.
Keywords: deep learning accelerator; programming model; VLIW; DMA; tensor computing unit
Normalization in Riemann Tensor Polynomial Ring (cited by: 3)
4
Author: LIU Jiang. Journal of Systems Science & Complexity (SCIE, EI, CSCD), 2018, Issue 2, pp. 569-580 (12 pages)
Determining the equivalence of Riemann tensor indexed polynomials is one of the oldest research topics in computer algebra. However, it remains a challenging problem, since Gröbner basis theory is not yet powerful enough to deal with ideals that cannot be finitely generated. This paper solves the problem by extending Gröbner basis theory. First, the polynomials are described via an infinitely generated free commutative monoid ring. The authors then provide a decomposed form of the Gröbner basis of the defining syzygy set in each restricted ring. The canonical form proves to be the normal form with respect to the Gröbner basis in the fundamental restricted ring, which allows one to determine the equivalence of polynomials. Finally, in order to simplify the computation of the canonical form, the authors find the minimal restricted ring.
Keywords: canonical form; Einstein summation convention; monoid ring; nD symbolic computation; Riemann tensor