Funding: Funded by the National Key Research and Development Program of China (2023ZD0120604), the National Key Research and Development Program of China (2024YFB4504103), and the Major Science and Technology Special Projects in Henan Province (241111212300).
Abstract: Convolution algorithms based on the Winograd implementation reduce computational complexity and are widely used in CNNs. The DCU, an emerging GPU-like accelerator, already offers some performance optimization for the Winograd algorithm, but existing implementations do not fully exploit the DCU's Matrix Cores to further improve the efficiency of Winograd convolution. This paper proposes an improved fused Winograd convolution optimization scheme that integrates all transformation stages into a single kernel and is specifically designed around the characteristics of Matrix Cores. In the input transformation stage, we design an efficient data reuse mechanism that reduces redundant global memory accesses. In the element-wise matrix multiplication stage, we transform Hadamard products into batched GEMMs, which raises computational intensity and satisfies the data layout requirements of Matrix Cores. During kernel fusion, we eliminate shared memory bank conflicts by reorganizing the thread layout, and we introduce software pipelining to hide memory access latency. The results show that, under FP16 mode, our method achieves average speedups of 1.35× and 1.72× (up to 1.81× and 2.78×) over the Winograd and Implicit GEMM algorithms in MIOpen, respectively, and 1.22× and 1.53× (up to 1.55× and 1.88×) under FP32 mode.
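The step of turning Hadamard products into batched GEMMs can be illustrated with a small reference sketch. It is not the paper's DCU kernel: it assumes the common F(2×2, 3×3) tile size with the standard transform matrices, runs on the CPU in plain Python, and every function name (`matmul`, `sandwich`, `winograd_batched_gemm`, `direct_conv`) is hypothetical. For each of the 16 positions in the transformed 4×4 tile, the per-channel element-wise products are gathered into one (filters × channels) by (channels × tiles) matrix product.

```python
# Standard F(2x2, 3x3) Winograd transform matrices (assumed tile size;
# the abstract does not state which tile size the paper uses).
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sandwich(left, m):
    """Compute left @ m @ left^T, the shape shared by all three transforms."""
    return matmul(matmul(left, m), [list(r) for r in zip(*left)])


def winograd_batched_gemm(tiles, filters):
    """tiles[p][c] is a 4x4 input tile, filters[k][c] a 3x3 filter.

    Returns out[k][p], the 2x2 output tile of filter k on tile p.
    """
    P, C, K = len(tiles), len(tiles[0]), len(filters)
    # Input and filter transforms: V = B^T d B, U = G g G^T.
    V = [[sandwich(BT, tiles[p][c]) for c in range(C)] for p in range(P)]
    U = [[sandwich(G, filters[k][c]) for c in range(C)] for k in range(K)]
    M = [[[[0.0] * 4 for _ in range(4)] for _ in range(P)] for _ in range(K)]
    # 16 independent GEMMs, one per tile position (i, j): the channel-wise
    # sum of Hadamard products becomes a (K x C) @ (C x P) product.
    for i in range(4):
        for j in range(4):
            u_ij = [[U[k][c][i][j] for c in range(C)] for k in range(K)]
            v_ij = [[V[p][c][i][j] for p in range(P)] for c in range(C)]
            m_ij = matmul(u_ij, v_ij)
            for k in range(K):
                for p in range(P):
                    M[k][p][i][j] = m_ij[k][p]
    # Output transform: Y = A^T M A.
    return [[sandwich(AT, M[k][p]) for p in range(P)] for k in range(K)]


def direct_conv(tiles, filters):
    """Reference sliding-window (cross-correlation) result for comparison."""
    P, C, K = len(tiles), len(tiles[0]), len(filters)
    return [[[[sum(tiles[p][c][y + r][x + s] * filters[k][c][r][s]
                   for c in range(C) for r in range(3) for s in range(3))
               for x in range(2)] for y in range(2)]
             for p in range(P)] for k in range(K)]
```

Calling `winograd_batched_gemm` and `direct_conv` on the same tiles and filters should give matching 2×2 outputs up to floating-point rounding. On an accelerator, the 16 small GEMMs are the part that would map onto matrix-multiply hardware, which is why the layout of `u_ij` and `v_ij` matters there.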
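Abstract: Concept-cognitive learning is an emerging interdisciplinary research area that aims to keep acquiring new knowledge by imitating the human cognitive process. However, existing concept-cognitive learning models usually overlook the local differences among objects within a concept, the redundancy of the concept space, and the interpretability of concepts, which leads to cognitive bias and underuse of useful information. This paper therefore proposes a fuzzy concept-cognitive learning model integrating membership degree and coverage (IMDC). First, to improve the representational power of concept extents, a membership function with an offset threshold is introduced to measure the correlation between objects and concepts; a membership matrix is constructed, and the concept space is then converted into a fuzzy covering. Second, a fuzzy β-cut is used to select highly correlated objects, and the coverage rate is used to assess the importance of different concepts, so that a core concept space is built, which reduces redundancy in the concept space and improves learning efficiency. Third, concepts are classified according to the similarity between clues and core concepts. Finally, using ten-fold cross-validation, the proposed model is compared with four machine learning algorithms and two concept-cognitive learning algorithms. Experimental results show that the model achieves higher average accuracy than all compared algorithms on 14 datasets, with the smallest performance fluctuation across datasets; it also leads in precision, recall, and F1 score, which supports the feasibility and effectiveness of the model.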