Funding: The support of the New Zealand Foundation for Research, Science and Technology (Contract No. C04X0705) is gratefully acknowledged.
Abstract: This article describes the importance of detecting grain deviation caused by knots and reviews the main methods used in measuring grain orientation around knots. It discusses the potential of using Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) to track and map the grain deviation caused by knots.
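As a minimal sketch of how DT-MRI voxel data could be turned into a grain-deviation map (the article reviews the idea but gives no implementation; the array layout, the stem-axis convention, and both function names below are assumptions), the local fibre direction can be taken as the principal eigenvector of each voxel's diffusion tensor, since water diffuses fastest along the grain:

import numpy as np

def grain_directions(tensors):
    # tensors: (..., 3, 3) array, one symmetric diffusion tensor per voxel.
    # eigh returns eigenvalues in ascending order, so the last column of
    # the eigenvector matrix belongs to the largest eigenvalue: the
    # direction of fastest diffusion, i.e. the local grain direction.
    _, eigvecs = np.linalg.eigh(tensors)
    return eigvecs[..., :, -1]

def grain_deviation_deg(directions, stem_axis=np.array([0.0, 0.0, 1.0])):
    # Angle in degrees between the local grain and a nominal stem axis;
    # large values flag the knot-induced deviation the article targets.
    # abs() makes the measure sign-invariant, since fibres are undirected.
    cos = np.abs(directions @ stem_axis)
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

Voxels near a knot would then appear as a connected region of elevated deviation angles, which is the kind of map the article envisions.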
Abstract: The increasing complexity of neural network applications has led to a demand for higher computational parallelism and more efficient synchronization in artificial intelligence (AI) chips. To achieve higher performance and lower power, a comprehensive and efficient approach is required to compile neural networks for implementation on dedicated hardware. Our first-generation deep learning accelerator, the tensor computing unit, is presented together with its hardware and software solutions. It offers dedicated very long instruction word (VLIW) instructions and multi-level repeatable direct memory access (DMA). The former lowers the instruction bandwidth requirement and makes it easier to parallelize the index and vector computations; the latter reduces the communication latency between the compute core and the asynchronous DMA and greatly alleviates programming complexity. For operator implementation and optimization, the compiler-based data-flow generator and the instruction macro generator first produce a set of parameterized operators. The tuner-configuration generator then prunes the search space, and the distributed tuner framework selects the best data-flow pattern and its corresponding parameters. Our tensor computing unit supports all convolution parameters with full-shape dimensions. It can readily select proper operators to achieve 96% of the chip's peak performance for certain shapes and find the best-performing implementation within a limited power budget. The evaluation of a large number of convolution shapes on our tensor computing unit chip shows that the generated operators significantly outperform the handwritten ones, achieving 9% higher normalized performance than CUDA according to the silicon data.
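The auto-tuning flow described above (generate parameterized operators, prune the configuration space, then measure the surviving candidates in parallel) can be sketched as follows; the tiling parameters, the pruning rule, and the measure_latency callable are hypothetical stand-ins for chip-specific details the abstract does not give:

import itertools
from concurrent.futures import ThreadPoolExecutor

def generate_configs():
    # Enumerate hypothetical data-flow parameters for one operator,
    # mimicking the output of the parameterized-operator generators.
    return [{"tile_m": m, "tile_n": n, "unroll": u}
            for m, n, u in itertools.product([16, 32, 64, 128],
                                             [16, 32, 64, 128],
                                             [1, 2, 4])]

def prune(configs, shape):
    # Stand-in for the tuner-configuration generator: discard
    # configurations whose tiles exceed the output extents.
    return [c for c in configs
            if c["tile_m"] <= shape["out_h"] * shape["out_w"]
            and c["tile_n"] <= shape["out_c"]]

def tune(shape, measure_latency, workers=8):
    # Distributed-tuner sketch: time each surviving candidate
    # (on hardware or a simulator) and keep the fastest one.
    candidates = prune(generate_configs(), shape)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(measure_latency, candidates))
    return min(zip(latencies, candidates), key=lambda t: t[0])[1]

In practice, the pruning stage is what keeps exhaustive measurement affordable: only configurations that can plausibly use the hardware well are ever timed.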
Funding: Supported by the National Natural Science Foundation of China under Grant No. 11701370 and the Natural Science Foundation of Shanghai under Grant No. 15ZR1401600.
Abstract: Determining the equivalence of Riemann tensor indexed polynomials is one of the oldest research topics in computer algebra. It remains a challenging problem, however, since Gröbner basis theory is not yet powerful enough to deal with ideals that cannot be finitely generated. This paper solves the problem by extending Gröbner basis theory. First, the polynomials are described via an infinitely generated free commutative monoid ring. The authors then provide a decomposed form of the Gröbner basis of the defining syzygy set in each restricted ring. The canonical form proves to be the normal form with respect to the Gröbner basis in the fundamental restricted ring, which allows one to determine the equivalence of polynomials. Finally, in order to simplify the computation of the canonical form, the authors find the minimal restricted ring.
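For concreteness, the relations that make this equivalence test nontrivial are the classical index symmetries of the Riemann tensor, which generate the syzygies (the notation below is standard, not taken from the paper):

\[
R_{abcd} = -R_{bacd} = -R_{abdc} = R_{cdab},
\qquad
R_{abcd} + R_{acdb} + R_{adbc} = 0 \quad \text{(first Bianchi identity)}.
\]

Two indexed polynomials P and Q are then equivalent exactly when the normal form of P - Q with respect to the Gröbner basis of the syzygy ideal vanishes; because these symmetries hold under every renaming of the unboundedly many dummy indices, the ideal is not finitely generated, which is precisely the obstruction the restricted rings are designed to overcome.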