443 articles found
Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs (cited by: 1)
1
Authors: Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications, 2017, No. 6, pp. 65-83 (19 pages)
Sparse matrix-vector multiplication (SpMV) is one of the most essential operations in linear algebra, and predicting its performance on GPUs has attracted growing attention in recent years. In 2012, Guo and Wang put forward a new idea for predicting the performance of SpMV on GPUs. However, they did not fully consider the matrix structure, so the execution time predicted by their model tends to be inaccurate for general sparse matrices. To address this problem, we propose two new, similar models that take the structure of the matrices into account and make the performance prediction more accurate. In addition, we use the new models to predict the execution time of SpMV for the CSR-V, CSR-S, ELL and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that the prediction accuracy of our models is on average 1.69 times better than that of Guo and Wang’s model for most general matrices.
Keywords: sparse matrix-vector multiplication; performance prediction; GPU; normal distribution; uniform distribution
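For readers unfamiliar with the CSR storage format named in the abstract above, the following is a minimal sketch of SpMV over CSR arrays. It is an illustration only, not the authors' GPU kernels or prediction models; the function name `spmv_csr` and the array names `row_ptr`, `col_idx` and `values` are assumptions chosen for this example.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative sketch)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The inner loop length varies from row to row, which is one reason the nonzero distribution of a matrix matters when estimating SpMV execution time.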
Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
2
Authors: 刘力 (Liu Li), Yang Guangwen. High Technology Letters (EI, CAS), 2013, No. 4, pp. 339-345 (7 pages)
This paper focuses on how to optimize the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies the cache misses into two categories: one is caused by the irregular distribution pattern of the multiplier matrix, and the other is caused by the multiplicand. The paper puts forward an optimization method for each of them. The first, hash-based method effectively removes cache misses of the first category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the second category, it proposes a new cache replacement algorithm, which achieves a much higher cache hit rate than other algorithms based on historical knowledge and is applicable on CELL and GPU. To further verify the effectiveness of our methods, we implement our algorithm on GPU, and the performance scales perfectly with the size of on-chip storage.
Keywords: sparse matrix multiplication; cache miss; scalability; multi-core CPU; GPU
Multiple Endmember Hyperspectral Sparse Unmixing Based on Improved OMP Algorithm (cited by: 1)
3
Authors: Chunhui Zhao, Haifeng Zhu, Shiling Cui, Bin Qi. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2015, No. 5, pp. 97-104 (8 pages)
In the conventional linear spectral mixture analysis model, a class is represented by a single endmember. However, the intra-class spectral variability is usually very large, which makes it difficult to represent a class and leads to incorrect unmixing results. Some proposed algorithms help to overcome the endmember variability, but they suffer from shortcomings such as heavy computation and unsatisfactory unmixing results. Recently, sparse regression has been applied to unmixing, assuming each mixed pixel can be expressed as a linear combination of only a few spectra in a spectral library. This is essentially the same as multiple endmember spectral unmixing. OMP (orthogonal matching pursuit), a sparse reconstruction algorithm, has the advantages of simple structure and high efficiency. However, it does not take into account the abundance non-negativity and abundance sum-to-one constraints (ANC and ASC), leading to undesirable unmixing results. To solve these issues, this paper presents an improved OMP algorithm (fully constrained OMP, FOMP) for multiple endmember hyperspectral sparse unmixing. The proposed algorithm overcomes the shortcomings of OMP and also solves the problem of endmember variability. The ANC and ASC constraints are first added into the OMP algorithm, then the endmember set is refined by the relative increase in root-mean-square error (RMSE) to avoid over-fitting, and finally pixels are unmixed by their optimal endmember set. Experiments on simulated and real hyperspectral data show that the unmixing results of FOMP are comparable, that its abundance RMSE is much lower than that of OMP and simple spectral mixture analysis (sSMA), and that it has strong anti-noise performance. This shows that multiple endmember spectral mixture analysis is more reasonable.
Keywords: hyperspectral image; sparse representation; multiple endmember spectral unmixing; OMP; ANC and ASC
Polar Coded Iterative Multiuser Detection for Sparse Code Multiple Access System (cited by: 2)
4
Authors: Hang MU, Youhua Tang, Li Li, Zheng Ma, Pingzhi Fan, Weiqiang Xu. China Communications (SCIE, CSCD), 2018, No. 11, pp. 51-61 (11 pages)
A polar coded sparse code multiple access (SCMA) system is conceived in this paper. A simple but new iterative multiuser detection framework is proposed, which consists of a message passing algorithm (MPA) based multiuser detector and a soft-input soft-output (SISO) successive cancellation (SC) polar decoder. In particular, the SISO polar decoding process is realized by a specifically designed soft re-encoder, which is concatenated to the original SC decoder. This soft re-encoder is capable of reconstructing the soft information of the entire polar codeword based on previously detected log-likelihood ratios (LLRs) of information bits. Benefiting from the soft re-encoding algorithm, the resultant iterative detection strategy is able to obtain a salient coding gain. Our simulation results demonstrate that significant improvement in error performance is achieved by the proposed polar-coded SCMA in additive white Gaussian noise (AWGN) channels, where the conventional SISO belief propagation (BP) polar decoder aided SCMA, the turbo coded SCMA and the low-density parity-check (LDPC) coded SCMA are employed as benchmarks.
Keywords: iterative multiuser receiver; polar code; sparse code multiple access (SCMA)
A quantum algorithm for Toeplitz matrix-vector multiplication
5
Authors: 高尚, 杨宇光. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 248-253 (6 pages)
Toeplitz matrix-vector multiplication is widely used in various fields, including optimal control, systolic finite field multipliers, multidimensional convolution, etc. In this paper, we first present a non-asymptotic quantum algorithm for Toeplitz matrix-vector multiplication with time complexity O(κ polylog n), where κ and 2n are the condition number and the dimension of the circulant matrix extended from the Toeplitz matrix, respectively. For the case with an unknown generating function, we also give a corresponding non-asymptotic quantum version that eliminates the dependency on the L_1-norm ρ of the displacement of the structured matrices. Because they make good use of the special properties of Toeplitz matrices, the proposed quantum algorithms are sufficiently accurate and efficient compared to the existing quantum algorithms under certain circumstances.
Keywords: quantum algorithm; Toeplitz matrix-vector multiplication; circulant matrix
A NEW SUFFICIENT CONDITION FOR SPARSE RECOVERY WITH MULTIPLE ORTHOGONAL LEAST SQUARES
6
Authors: Haifeng LI, Jing ZHANG. Acta Mathematica Scientia (SCIE, CSCD), 2022, No. 3, pp. 941-956 (16 pages)
As a greedy algorithm for the recovery of sparse signals, multiple orthogonal least squares (MOLS) has recently attracted considerable attention. In this paper, we consider the number of iterations required by the MOLS algorithm to recover a K-sparse signal x ∈ R^n. We show that MOLS provides stable reconstruction of all K-sparse signals x from y = Ax + w in ⌈6K/M⌉ iterations when the matrix A satisfies the restricted isometry property (RIP) with isometry constant δ_{7K} ≤ 0.094. Compared with the existing results, our sufficient condition does not depend on the sparsity level K.
Keywords: sparse signal recovery; multiple orthogonal least squares (MOLS); sufficient condition; restricted isometry property (RIP)
Sparse Code Multiple Access-Towards Massive Connectivity and Low Latency 5G Communications (cited by: 3)
7
Authors: Lei Wang, Xiuqiang Xu, Yiqun Wu, Shuangshuang Xing, Yan Chen. 《电信网技术》, 2015, No. 5, pp. 6-15 (10 pages)
Sparse code multiple access (SCMA) is a novel non-orthogonal multiple access technology considered a key component in 5G air interface design. In SCMA, the incoming bits are directly mapped to multi-dimensional constellation vectors known as SCMA codewords, which are then mapped onto blocks of physical resource elements in a sparse manner. The number of codewords that can be non-orthogonally multiplexed in each SCMA block is much larger than the number of resource elements therein, so the system is overloaded and can support a larger number of users. The joint optimization of multi-dimensional modulation and low density spreading in SCMA codebook design enables the SCMA receiver to recover the coded bits with high reliability and low complexity. The flexibility in design and the robustness in performance further prove SCMA to be a promising technology to meet 5G communication demands such as massive connectivity and low latency transmissions.
Keywords: SCMA; telecommunications technology; multiple access; coding
Modified Iterative Method for Recovery of Sparse Multiple Measurement Problems
8
Authors: Sina Mortazavi, Reza Hosseini. Journal of Electrical Engineering, 2018, No. 2, pp. 124-128 (5 pages)
We consider the problem of constructing one sparse signal from a few measurements. This problem has been extensively addressed in the literature, providing many sub-optimal methods that assure convergence to a locally optimal solution under specific conditions. There are a few measurements associated with every signal, where the size of each measurement vector is less than the sparse signal's size. All of the sparse signals have the same unknown support. We generalize an existing algorithm for the recovery of one sparse signal from a single measurement to this problem and analyze its performance through simulations. We also compare the construction performance with other existing algorithms. Finally, the proposed method also shows advantages over the OMP (Orthogonal Matching Pursuit) algorithm in terms of computational complexity.
Keywords: sparse signal recovery; iterative methods; multiple measurements
Nonlinear industrial process fault diagnosis with latent label consistency and sparse Gaussian feature learning (cited by: 1)
9
Authors: LI Xian-ling, ZHANG Jian-feng, ZHAO Chun-hui, DING Jin-liang, SUN You-xian. Journal of Central South University (SCIE, EI, CAS, CSCD), 2022, No. 12, pp. 3956-3973 (18 pages)
With the increasing complexity of industrial processes, high-dimensional industrial data exhibit strong nonlinearity, bringing considerable challenges to the fault diagnosis of industrial processes. To efficiently extract deep meaningful features that are crucial for fault diagnosis, a sparse Gaussian feature extractor (SGFE) is designed to learn a nonlinear mapping that projects the raw data into a feature space with the fault label dimension. The feature space is described by the one-hot encoding of the fault category label as an orthogonal basis. In this way, the deep sparse Gaussian features related to fault categories can be gradually learned from the raw data by SGFE. In the feature space, the sparse Gaussian (SG) loss function is designed to constrain the distribution of features to multiple sparse multivariate Gaussian distributions. The sparse Gaussian features are linearly separable in the feature space, which is conducive to improving the accuracy of the downstream fault classification task. The feasibility and practical utility of the proposed SGFE are verified on the MNIST handwritten digits benchmark and the Tennessee-Eastman (TE) benchmark process, respectively.
Keywords: nonlinear fault diagnosis; multiple multivariate Gaussian distributions; sparse Gaussian feature learning; Gaussian feature extractor
Sparse channel estimation for MIMO-OFDM systems using distributed compressed sensing (cited by: 1)
10
Authors: 刘翼, 梅文博, 杜慧茜, 汪宏宇. Journal of Beijing Institute of Technology (EI, CAS), 2016, No. 4, pp. 540-546 (7 pages)
A sparse channel estimation method is proposed for doubly selective channels in multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. Based on the basis expansion model (BEM) of the channel, the joint sparsity of MIMO-OFDM channels is described. The sparse characteristics enable us to cast the channel estimation as a distributed compressed sensing (DCS) problem. Then, a low complexity DCS-based estimation scheme is designed. Compared with conventional compressed channel estimators based on compressed sensing (CS) theory, the DCS-based method is more efficient because it reconstructs the MIMO channels jointly rather than addressing them separately. Furthermore, the group-sparse structure of each single channel is also described. To use this additional structure of the sparsity pattern effectively, the DCS algorithm is modified. The modified algorithm can further enhance the estimation performance. Simulation results demonstrate the superiority of our method over fast fading channels in MIMO-OFDM systems.
Keywords: multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM); distributed compressed sensing; doubly selective channel; group-sparse; basis expansion model
Semi-Supervised Dimensionality Reduction of Hyperspectral Image Based on Sparse Multi-Manifold Learning
11
Authors: Hong Huang, Fulin Luo, Zezhong Ma, Hailiang Feng. Journal of Computer and Communications, 2015, No. 11, pp. 33-39 (7 pages)
In this paper, we propose a new semi-supervised multi-manifold learning method, called semi-supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both the labeled and unlabeled data to adaptively find neighbors of each sample from the same manifold by using an optimization program based on sparse representation, and naturally gives relative importance to the labeled ones through a graph-based methodology. Then it tries to extract discriminative features on each manifold such that the data points in the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on real hyperspectral images.
Keywords: hyperspectral image classification; dimensionality reduction; multiple manifolds structure; sparse representation; semi-supervised learning
A novel EO-based optimum random beamforming method in mmWave-NOMA systems with sparse antenna array
12
Authors: Fatemeh Asghari Azhiri, Behzad Mozaffari Tazehkand, Reza Abdolee. Digital Communications and Networks (CSCD), 2024, No. 5, pp. 1313-1321 (9 pages)
Millimeter-wave (mmWave) Non-Orthogonal Multiple Access (NOMA) with random beamforming is a promising technology to guarantee massive connectivity and low latency transmissions of future generations of mobile networks. In this paper, we introduce a cost-effective and energy-efficient mmWave-NOMA system that exploits sparse antenna arrays in the transmitter. Our analysis shows that utilizing low-weight and small-sized sparse antennas in the Base Station (BS) leads to better outage probability performance. We also introduce an optimum low complexity Equilibrium Optimization (EO)-based algorithm to further improve the outage probability. The simulation and analysis results show that systems equipped with sparse antenna arrays that use optimum beamforming vectors outperform conventional systems with uniform linear arrays in terms of outage probability and sum rates.
Keywords: beamforming; millimeter-wave communication; non-orthogonal multiple access; sparse antenna arrays
Low complexity MIMO sonar imaging using a virtual sparse linear array
13
Authors: Xionghou Liu, Chao Sun, Yixin Yang, Jie Zhuo, Yina Han. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2016, No. 2, pp. 370-378 (9 pages)
A multiple-input multiple-output (MIMO) sonar can synthesize a large-aperture virtual uniform linear array (ULA) from a small number of physical elements. However, the large aperture is obtained at the cost of a great number of matched filters and a heavy computational load. To reduce the computational load, a MIMO sonar imaging method using a virtual sparse linear array (SLA) is proposed, which consists of offline and online processing. In the offline processing, the virtual ULA of the MIMO sonar is thinned to a virtual SLA by the simulated annealing algorithm, and matched filters corresponding to inactive virtual elements are removed. In the online processing, outputs of matched filters corresponding to active elements are collected for further multibeam processing; hence, the number of matched filters in the echo processing procedure is effectively reduced. Numerical simulations show that the proposed method can reduce the computational load effectively while achieving imaging performance similar to that of the traditional method.
Keywords: multiple-input multiple-output (MIMO) sonar; simulated annealing; sonar imaging; sparse arrays
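An adaptive SpMV optimization method for sparse diagonal matrices on GPUs
14
Authors: 王宇华, 何俊飞, 张宇琪, 兰海燕, 曹林琳. 《计算机工程》 (PKU Core), 2026, No. 3, pp. 332-345 (14 pages)
Sparse matrix-vector multiplication (SpMV) is the computational core and bottleneck of sparse linear systems. Its efficiency affects the overall performance of iterative solvers, and its optimization has long been a research focus in scientific computing and engineering applications. Discretizing partial differential equations produces sparse diagonal matrices, and because their nonzero distributions vary widely, no single method achieves the best time performance on all matrices. To address this problem, an adaptive SpMV optimization method for sparse diagonal matrices on graphics processing units (GPUs), AST (Adaptive SpMV Tuning), is proposed. The method designs a feature space and builds a feature extractor to extract fine-grained structural features of a matrix. By analyzing in depth the correlation between the features and SpMV methods, it establishes an extensible set of candidate methods, forms a mapping between features and the optimal method, and builds a performance prediction tool that efficiently predicts the optimal method for a matrix. Experimental results show that AST achieves a prediction accuracy of 85.8% with an average time performance loss of 0.09. Compared with DIA (Diagonal), HDIA (Hacked DIA), HDC (Hybrid of DIA and Compressed Sparse Row), DIA-Adaptive and DRM (Divide-Rearrange and Merge), it achieves average kernel runtime speedups of 20.19, 1.86, 3.06, 3.72 and 1.53 times and floating-point performance speedups of 1.05, 1.28, 12.45, 1.94 and 0.97 times, respectively.
Keywords: sparse matrix-vector multiplication; sparse diagonal matrix; graphics processing unit; adaptive optimization method; matrix structural features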
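A multiple mediation effect model for rocket engine quality assessment based on the SGL penalty function
15
Authors: 王育辉, 徐思宁, 程湘钧, 杨乾军. 《航空兵器》 (PKU Core), 2026, No. 1, pp. 123-134 (12 pages)
To address the complex correlations among multidimensional indicators, the unclear transmission paths and the difficulty of screening key factors in the quality assessment of liquid rocket engines, a structural equation model integrating the sparse group Lasso (SGL) penalty function with multiple mediation effects is constructed. Key mediation paths are screened through the three-level progressive mechanism of the SGL penalty function, parameter estimation and path identification are carried out with partial least squares structural equation modeling (PLS-SEM), and the model is verified on a case study. The results show that the model has a coefficient of determination R^2 = 0.726, explaining 72.6% of the variance in mission reliability margin, and identifies three key mediation paths: combustion performance → thermal management performance → mission reliability margin; structural integrity → thermal management performance → mission reliability margin; and combustion performance → control system performance → mission reliability margin. Compared with traditional PLS-SEM, traditional Lasso and other methods, it significantly improves both model explanatory power and mediator screening accuracy. Further validation on real data from a launch vehicle core-stage engine (liquid oxygen-kerosene), a tactical missile propulsion unit (dinitrogen tetroxide-unsymmetrical dimethylhydrazine) and a deep-space lander engine (liquid oxygen-methane) shows that the model fits well under conventional operating conditions at normal temperature and pressure, and fits less well under extreme low-gravity conditions. Focused on liquid rocket engines, the model offers a new approach to improving the accuracy and interpretability of their quality assessment.
Keywords: sparse group Lasso; liquid rocket engine; quality assessment; multiple mediation effect; structural equation model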
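A Radon-domain sparsity enhancement algorithm based on three-dimensional deconvolution theory
16
Authors: 郭梦欣, 陈思远, 时伟, 王维红. 《石油物探》 (PKU Core), 2026, No. 2, pp. 224-233 (10 pages)
The Radon transform is one of the important algorithms for suppressing multiples and achieving high-precision imaging of seismic reflections, and how well it focuses affects the results of some seismic data processing steps. The traditional least-squares Radon transform, based on an L2-norm constraint in the frequency-curvature domain, is affected by the finite aperture and tends to produce scissor-like smearing artifacts. The sparse Radon transform obtains better-focused Radon-domain data through a sparsity assumption in the transform domain, but its energy clusters do not converge sufficiently. Therefore, a deconvolution operator is constructed jointly from the seismic wavelet and a smoothing function along the curvature direction, and three-dimensional deconvolution is applied to the existing Radon data. With iterative solution by the alternating direction method of multipliers (ADMM), a Radon-domain sparsity enhancement algorithm based on three-dimensional deconvolution theory is proposed. Following sparse inversion theory, the algorithm compresses energy along the curvature dimension through a deblurring mechanism, markedly improving the concentration and resolution of the energy clusters. Numerical simulations and field data applications show that, compared with the conventional least-squares Radon transform and the time-domain sparse Radon transform, the energy clusters in the Radon domain are easier to distinguish after deconvolution-based sparsity enhancement, which improves the accuracy of multiple identification and suppression.
Keywords: Radon transform; deconvolution; polynomial amplitude preservation; multiple suppression; sparse inversion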
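swDaCe: design and implementation of a data-centric parallel programming model on Sunway many-core processors
17
Authors: 沈沛祺, 陈俊仕, 安虹. 《小型微型计算机系统》 (PKU Core), 2026, No. 3, pp. 751-759 (9 pages)
High-performance scientific computing is the core application domain of supercomputers, covering key tasks such as particle simulation and climate analysis. However, as Moore's law gradually loses force, supercomputer architectures are becoming increasingly heterogeneous and complex, which makes scientific computing applications harder to develop and optimize. To address this problem, this paper proposes and implements swDaCe, a data-centric parallel programming model for the new-generation Sunway supercomputing platform. By decoupling dataflow graph optimization from the original program, the model lets programmers describe computation logic in Python and ultimately generates high-performance C++ code adapted to the Sunway many-core architecture. In addition, the paper proposes a series of dataflow optimizations for the Sunway architecture, including task mapping to the slave cores, vectorized parallelism and DMA memory access optimization, to make full use of the computing power of Sunway many-core processors. Experimental results show that code generated by swDaCe achieves significant performance improvements in typical applications such as sparse matrix computation, with a speedup of more than 25 times on a single core group, which verifies the effectiveness of the framework on the Sunway architecture.
Keywords: new-generation Sunway platform; heterogeneous many-core processor; dataflow programming; parallel computing; sparse matrix multiplication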
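BHDC: a blocked hybrid storage format for efficient SpMV on quasi-diagonal matrices
18
Authors: 徐悦竹, 赵泽煊, 邰宇浩, 王宇华. 《计算机应用与软件》 (PKU Core), 2026, No. 2, pp. 118-126 (9 pages)
Solving practical engineering problems with scientific computing can often be reduced to solving large systems of linear equations, and the most frequently invoked step in this process is sparse matrix-vector multiplication. For the sparse quasi-diagonal matrices common in engineering, a blocked hybrid storage format, BHDC, is proposed that combines the advantages of DIA and CSR. The original matrix is divided into several row segments, and dense diagonal regions and scattered points are stored separately according to a threshold. This exploits the good floating-point performance of the DIA format while using the CSR format to avoid the performance loss caused by a sharp increase in the number of diagonals. Tests on a number of sparse matrices on the CUDA platform show time and space performance better than that of either format alone, and time performance better than that of the unblocked hybrid format HDC.
Keywords: quasi-diagonal matrix; sparse matrix-vector multiplication; blocked storage; CUDA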
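As background for the DIA format that BHDC builds on, the following is a minimal sketch of SpMV over diagonal storage for a square matrix. It shows plain DIA only, not the BHDC format itself; the function name `spmv_dia` and the layout convention (`data[d][i]` holds the entry in row `i` on the diagonal with offset `offsets[d]`, padded with zeros where the diagonal leaves the matrix) are assumptions made for this example.

```python
def spmv_dia(offsets, data, x):
    """Compute y = A @ x for a square matrix A stored by diagonals (illustrative sketch)."""
    n = len(x)
    y = [0.0] * n
    for d, off in enumerate(offsets):
        for i in range(n):
            j = i + off  # column index of the entry in row i on this diagonal
            if 0 <= j < n:
                y[i] += data[d][i] * x[j]
    return y


# Hypothetical tridiagonal example:
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]]
offsets = [-1, 0, 1]
data = [
    [0.0, -1.0, -1.0],  # sub-diagonal, padded in row 0
    [2.0, 2.0, 2.0],    # main diagonal
    [-1.0, -1.0, 0.0],  # super-diagonal, padded in row 2
]
y = spmv_dia(offsets, data, [1.0, 2.0, 3.0])
```

Every stored diagonal costs a full row of `n` slots, so a few scattered nonzeros off the main diagonals inflate storage quickly. That is the cost the abstract says is avoided by storing scattered points in CSR instead.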
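A Tensor Core-accelerated conjugate gradient solver on GPUs
19
Authors: 卢玥辰, 袁雨萧, 杨德闯, 刘伟峰. 《电子科技大学学报》 (PKU Core), 2026, No. 2, pp. 244-251 (8 pages)
The conjugate gradient method (CG) and the biconjugate gradient stabilized method (BiCGSTAB) are two classic and efficient iterative methods for solving sparse linear systems and are widely used in scientific computing and engineering problems. Although parallel processors such as GPUs have improved the parallelism of the two methods, the newest hardware unit, the Tensor Core, and its computing power have not yet been used in them. This paper designs a Tensor Core-accelerated CG solver that uses Tensor Cores to compute the key components of CG and BiCGSTAB, namely sparse matrix-vector multiplication (SpMV) and dot products, so as to exploit the computing power of Tensor Cores and improve the overall performance of both methods. Experimental results on NVIDIA A100 and H100 GPUs show that the two Tensor Core-accelerated methods achieve significant speedups on multiple sparse matrices over baseline versions that call the official CUDA libraries.
Keywords: sparse matrix-vector multiplication; dot product; conjugate gradient method; biconjugate gradient stabilized method; Tensor Core; graphics processing unit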
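To show where the SpMV and dot-product components named in the abstract above sit inside CG, the following is a minimal textbook sketch of the method for a symmetric positive definite system. It is a CPU illustration in plain Python, not the paper's Tensor Core solver; the names `conjugate_gradient`, `matvec`, `tol` and `max_iter` are assumptions chosen for this example.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given matvec(p) = A @ p."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)               # one matrix-vector product per iteration
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 SPD example; the exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
x = conjugate_gradient(lambda p: [dot(row, p) for row in A], [1.0, 2.0])
```

Each iteration performs one matrix-vector product and two dot products, which matches the abstract's choice of those two operations as the components to accelerate.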
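An embedded image object recognition algorithm based on CNN and non-negative sparse representation
20
Authors: 秦川, 高翔, 龚道庆, 邓雪莲. 《吉林大学学报(理学版)》 (PKU Core), 2026, No. 2, pp. 387-393 (7 pages)
Embedded systems have limited processor speed and memory, which restricts the efficiency and performance of image object recognition algorithms running on them. To address this problem, a high-performance embedded image object recognition algorithm is proposed that combines a convolutional neural network (CNN) with non-negative sparse representation. First, a CNN is used to mine embedded image features; parameter sharing and local perception reduce the parameter count and computational complexity of the model and improve computational efficiency. Second, the embedded image is convolved with a Roberts cross-gradient filter, preliminary feature mining results are obtained with the Sigmoid function, and a nonlinear pooling method then downsamples the results to reduce their dimensionality, completing the image feature mining task. Finally, an object recognition model is built with non-negative sparse representation, and the sparse coefficient vector is solved with a multiplicative iterative algorithm. The object region is determined through kernel function computation and minimum class residual computation. Experimental results show that the F1 scores of the recognition results for all image groups remain above 0.98 and that the frame rate for embedded image object recognition is high, indicating that the method can run efficiently on embedded systems while maintaining high recognition accuracy.
Keywords: convolutional neural network; non-negative sparse representation; embedded image; multiplicative iterative algorithm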