Abstract: Sparse matrix-vector multiplication (SpMV) is one of the most important operations in linear algebra, and predicting its performance on GPUs has attracted growing attention in recent years. In 2012, Guo and Wang proposed a model for predicting the performance of SpMV on GPUs. However, their model does not fully account for matrix structure, so the execution times it predicts tend to be inaccurate for general sparse matrices. To address this problem, we propose two new, similar models that take the structure of the matrices into account and make the performance prediction more accurate. We also use the new models to predict the execution time of SpMV for the CSR-V, CSR-S, ELL, and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that, for most general matrices, the prediction accuracy of our models is on average 1.69 times better than that of Guo and Wang's model.
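For readers unfamiliar with the storage formats named above, the following is a minimal Python sketch of SpMV over the CSR (compressed sparse row) layout. It is only an illustration of the operation whose run time the models predict, not the authors' CUDA kernels or their prediction model. The function name `spmv_csr` and the small example matrix are assumptions made here. In GPU SpMV work, a scalar CSR kernel commonly assigns one thread per row and a vector CSR kernel one warp per row; it is assumed, not confirmed by the abstract, that CSR-S and CSR-V refer to these.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # In a scalar-style GPU kernel, this inner loop is the work of one thread.
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical example matrix (not from the paper):
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Because the number of nonzeros per row drives the inner loop length, rows of very different lengths produce uneven work, which is one reason matrix structure matters for execution time.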
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62071015 and 62171264).
Abstract: Toeplitz matrix-vector multiplication is widely used in various fields, including optimal control, systolic finite field multipliers, and multidimensional convolution. In this paper, we first present a non-asymptotic quantum algorithm for Toeplitz matrix-vector multiplication with time complexity O(κ polylog n), where κ and 2n are the condition number and the dimension, respectively, of the circulant matrix extended from the Toeplitz matrix. For the case with an unknown generating function, we also give a corresponding non-asymptotic quantum version that eliminates the dependency on the L_1-norm ρ of the displacement of the structured matrices. Because they exploit the special properties of Toeplitz matrices, the proposed quantum algorithms are sufficiently accurate and efficient compared with existing quantum algorithms under certain circumstances.
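The extension of an n×n Toeplitz matrix to a circulant matrix of dimension 2n can be illustrated classically. The sketch below is not the quantum algorithm from the abstract. It only shows the standard embedding, with the hypothetical function name `toeplitz_matvec` and inputs `c` (first column) and `r` (first row, with `r[0] == c[0]`) chosen here for illustration. The circulant product is written as a direct circular convolution for clarity; an FFT would reduce it to O(n log n).

```python
def toeplitz_matvec(c, r, x):
    """Compute T @ x, where T is Toeplitz with first column c and first row r.

    T is embedded in a 2n x 2n circulant matrix whose first column is
    v = [c[0..n-1], 0, r[n-1], ..., r[1]]; the first n entries of the
    circulant product with the zero-padded x equal T @ x.
    """
    n = len(x)
    v = list(c) + [0.0] + list(r[:0:-1])  # length 2n
    m = 2 * n
    xp = list(x) + [0.0] * n              # zero-pad x to length 2n
    # Circulant matvec = circular convolution of v and xp (direct O(n^2) form).
    return [sum(v[(i - j) % m] * xp[j] for j in range(m)) for i in range(n)]


# Hypothetical example:
# T = [[1, 4, 5],
#      [2, 1, 4],
#      [3, 2, 1]]
c = [1.0, 2.0, 3.0]
r = [1.0, 4.0, 5.0]
y = toeplitz_matvec(c, r, [1.0, 2.0, 3.0])
```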
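Abstract: High-performance computing applications such as the solution of sparse linear systems often involve computing the sequence of sparse matrix-vector multiplications (SpMV) Ax, A^2x, …, A^sx. This sequence of SpMV operations is also known as the sparse matrix power kernel (MPK). Because MPK performs multiple SpMVs while the sparse matrix stays unchanged, reusing the sparse matrix in cache avoids loading A from main memory for every SpMV, which alleviates the memory-bound nature of SpMV and improves MPK performance. However, cache data reuse introduces data dependencies between adjacent SpMV operations, and existing MPK optimizations either target a single SpMV call or incur excessive overhead when implementing data reuse. This paper proposes cache-aware MPK (Ca-MPK). Based on the dependency graph of the sparse matrix, it uses an architecture-aware recursive partitioning method to divide the dependency graph into subgraphs/submatrices that fit the cache size, decouples data dependencies by constructing separator subgraphs, and schedules SpMV on the submatrices in a specific order to achieve cache data reuse. Test results show that Ca-MPK achieves average speedups of up to about 1.57x and 1.40x over the Intel OneMKL library and the latest MPK implementation, respectively.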
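As a point of reference for the sequence Ax, A^2x, …, A^sx, the following minimal sketch computes the matrix power kernel in the naive way, with one full SpMV per power. It is a baseline illustration only and does not implement the Ca-MPK partitioning or scheduling. The function names and the CSR example matrix are assumptions made here.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute A @ x for a matrix A stored in CSR form."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def matrix_power_kernel(row_ptr, col_idx, values, x, s):
    """Return [A x, A^2 x, ..., A^s x] using s back-to-back SpMVs.

    Each iteration reads all of A again; when A does not fit in cache,
    this is the repeated main-memory traffic that cache-aware schemes
    try to avoid.
    """
    results = []
    v = list(x)
    for _ in range(s):
        v = spmv_csr(row_ptr, col_idx, values, v)
        results.append(v)
    return results


# Hypothetical example: A = [[2, 0], [1, 3]], x = [1, 1], s = 3
powers = matrix_power_kernel([0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0], [1.0, 1.0], 3)
```

Note that each power depends on the complete result of the previous one, which is the data dependency between adjacent SpMV operations that a blocked, cache-resident schedule has to respect.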
Funding: This work was partially supported by the National Key Research and Development Project of China (No. 2018YFB2201901), the National Natural Science Foundation of China (Grant Nos. 61805090 and 62075075), the Shenzhen Science and Technology Innovation Commission (No. SGDX2019081623060558), and the Research Grants Council of Hong Kong SAR (No. PolyU152241/18E).
Abstract: As an important computing operation, photonic matrix-vector multiplication is widely used in photonic neural networks and signal processing. However, conventional incoherent matrix-vector multiplication focuses on real-valued operations, which do not work well for complex-valued neural networks and the discrete Fourier transform. In this paper, we propose a systematic solution that extends the matrix computation of microring arrays from the real-valued field to the complex-valued field, and from small-scale (i.e., 4×4) to large-scale (i.e., 16×16) matrix computation. By combining matrix decomposition and matrix partition, our photonic complex matrix-vector multiplier chip can support arbitrarily large-scale, complex-valued matrix computation. We further demonstrate the Walsh-Hadamard transform, the discrete cosine transform, the discrete Fourier transform, and image convolutional processing. Our scheme provides a path towards breaking the limits of complex-valued computing accelerators in conventional incoherent optical architectures. More importantly, our results reveal that an integrated photonic platform has great potential for large-scale, complex-valued artificial intelligence computing and signal processing.
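One standard way to obtain a complex-valued product from real-valued hardware is to split the matrix and the vector into real and imaginary parts and combine four real matrix-vector products: (A_re + i A_im)(x_re + i x_im) = (A_re x_re − A_im x_im) + i (A_re x_im + A_im x_re). The abstract does not spell out the decomposition used on the chip, so the sketch below is an assumption offered only to make the idea concrete. All function and variable names are hypothetical.

```python
def real_matvec(M, v):
    """Real-valued dense matrix-vector product (stand-in for one real-valued multiplier pass)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def complex_matvec_via_real(A_re, A_im, x_re, x_im):
    """Complex product y = (A_re + i A_im)(x_re + i x_im) from four real matvecs."""
    y_re = [p - q for p, q in zip(real_matvec(A_re, x_re), real_matvec(A_im, x_im))]
    y_im = [p + q for p, q in zip(real_matvec(A_re, x_im), real_matvec(A_im, x_re))]
    return y_re, y_im


# Hypothetical example: A = [[1+2j, 0], [3, 1j]], x = [1+1j, 2]
A_re = [[1.0, 0.0], [3.0, 0.0]]
A_im = [[2.0, 0.0], [0.0, 1.0]]
y_re, y_im = complex_matvec_via_real(A_re, A_im, [1.0, 2.0], [1.0, 0.0])
```

The same splitting applies block by block, so a large matrix can in principle be partitioned into tiles that fit a small array and the partial products summed.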
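Abstract: This paper proposes a fuzzy optimal-margin distribution matrix classifier (FODMC). By integrating fuzzy membership theory with a margin distribution optimization mechanism, the model effectively extracts matrix structural information and handles outliers robustly. Specifically, FODMC uses a margin-distribution-based loss function to optimize the classification boundary, applies nuclear norm regularization to preserve the low-rank property of the matrix, and uses the alternating direction method of multipliers (ADMM) to train the model efficiently. Experimental results on multiple benchmark datasets show that, compared with existing methods, FODMC offers significant advantages in classification accuracy, robustness, and generalization ability, providing an effective solution for matrix data classification.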