Journal articles
431 articles found
Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs (cited by 1)
1
Authors: Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications, 2017, No. 6, pp. 65-83 (19 pages)
As one of the most essential operations in linear algebra, sparse matrix-vector multiplication (SpMV) on GPUs, and in particular the prediction of its performance, has attracted increasing attention in recent years. In 2012, Guo and Wang put forward a new idea for predicting the performance of SpMV on GPUs. However, they did not fully consider the matrix structure, so the execution time predicted by their model tends to be inaccurate for general sparse matrices. To address this problem, we propose two new, similar models that take the structure of the matrices into account and make performance prediction more accurate. In addition, we use the new models to predict the execution time of SpMV for the CSR-V, CSR-S, ELL and JAD sparse matrix storage formats on the CUDA platform. Our experimental results show that, for most general matrices, the prediction accuracy of our models is on average 1.69 times better than that of Guo and Wang's model.
Keywords: sparse matrix-vector multiplication; performance prediction; GPU; normal distribution; uniform distribution
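To make concrete why matrix structure matters for a format such as ELL, here is a minimal sketch (plain Python, not the authors' prediction model) that converts CSR arrays to ELL, runs SpMV, and reports the fraction of padded slots. The function names and the padding-ratio statistic are illustrative assumptions.

```python
def csr_to_ell(indptr, indices, data):
    """Convert CSR arrays to ELL (ELLPACK) arrays padded to the longest row."""
    n_rows = len(indptr) - 1
    width = max(indptr[i + 1] - indptr[i] for i in range(n_rows))
    # Padding slots use column 0 with value 0.0, so they add nothing to y.
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    for i in range(n_rows):
        for k, p in enumerate(range(indptr[i], indptr[i + 1])):
            ell_cols[i][k] = indices[p]
            ell_vals[i][k] = data[p]
    return ell_cols, ell_vals

def ell_spmv(ell_cols, ell_vals, x):
    """y = A @ x in ELL format: every row performs `width` multiply-adds."""
    return [sum(v * x[c] for c, v in zip(cols, vals))
            for cols, vals in zip(ell_cols, ell_vals)]

def padding_ratio(indptr):
    """Fraction of ELL slots that are padding (a hypothetical structural feature)."""
    n_rows = len(indptr) - 1
    width = max(indptr[i + 1] - indptr[i] for i in range(n_rows))
    return 1.0 - indptr[-1] / (n_rows * width)
```

A single long row forces every row to be padded to its length, so two matrices with the same number of nonzeros can cost ELL very different amounts of work.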
Cache performance optimization of irregular sparse matrix multiplication on modern multi-core CPU and GPU
2
Authors: 刘力 (Liu Li), Yang Guangwen. High Technology Letters (EI, CAS), 2013, No. 4, pp. 339-345 (7 pages)
This paper focuses on optimizing the cache performance of sparse matrix-matrix multiplication (SpGEMM). It classifies cache misses into two categories: those caused by the irregular distribution pattern of the multiplier matrix, and those caused by the multiplicand. For each category, the paper puts forward an optimization method. The first, hash-based method effectively removes cache misses of the 1st category and improves performance by a factor of 6 on an Intel 8-core CPU in the best cases. For cache misses of the 2nd category, it proposes a new cache replacement algorithm that achieves a much higher cache hit rate than other algorithms based on historical knowledge; the algorithm is applicable to CELL and GPU. To further verify the effectiveness of the methods, the algorithm is implemented on a GPU, and its performance scales perfectly with the size of on-chip storage.
Keywords: sparse matrix multiplication; cache miss; scalability; multi-core CPU; GPU
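The hash-based idea can be illustrated with Gustavson's row-by-row SpGEMM using a hash map as the per-row accumulator. This is a generic sketch on CSR arrays, assumed for illustration only; it does not model the paper's cache-aware design or its replacement algorithm, and a Python `dict` says nothing about real cache behavior.

```python
def spgemm_hash(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """C = A @ B for CSR inputs: Gustavson's row-by-row scheme with a hash-map
    (dict) accumulator for each output row. Returns C as CSR arrays."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # column index -> partial sum for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Row k of B, scaled by A[i, k], is merged into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):  # emit row i with sorted column indices
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```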
Multiple Endmember Hyperspectral Sparse Unmixing Based on Improved OMP Algorithm (cited by 1)
3
Authors: Chunhui Zhao, Haifeng Zhu, Shiling Cui, Bin Qi. Journal of Harbin Institute of Technology (New Series) (EI, CAS), 2015, No. 5, pp. 97-104 (8 pages)
In the conventional linear spectral mixture analysis model, a class is represented by a single endmember. However, intra-class spectral variability is usually very large, which makes it difficult for one endmember to represent a class and leads to incorrect unmixing results. Some proposed algorithms help overcome endmember variability, but they suffer from shortcomings such as heavy computation and unsatisfactory unmixing results. Recently, sparse regression has been applied to unmixing, assuming that each mixed pixel can be expressed as a linear combination of only a few spectra in a spectral library; this is essentially the same as multiple endmember spectral unmixing. Orthogonal matching pursuit (OMP), a sparse reconstruction algorithm, has the advantages of a simple structure and high efficiency. However, it does not take into account the abundance non-negativity and abundance sum-to-one constraints (ANC and ASC), which leads to undesirable unmixing results. To solve these issues, this paper presents an improved OMP algorithm, fully constrained OMP (FOMP), for multiple endmember hyperspectral sparse unmixing. The proposed algorithm overcomes the shortcomings of OMP and, at the same time, addresses endmember variability. The ANC and ASC constraints are first added to the OMP algorithm; the endmember set is then refined using the relative increase in root-mean-square error (RMSE) to avoid over-fitting; finally, each pixel is unmixed with its optimal endmember set. Experiments on simulated and real hyperspectral data show that FOMP achieves good unmixing results, with an abundance RMSE much lower than that of OMP and simple spectral mixture analysis (sSMA), and strong noise robustness.
This shows that multiple endmember spectral mixture analysis is more reasonable.
Keywords: hyperspectral image; sparse representation; multiple endmember; spectral unmixing; OMP; ANC and ASC
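For reference, the baseline that FOMP builds on is plain OMP over a spectral library. The sketch below (NumPy assumed) shows only that unconstrained baseline: it deliberately omits the ANC/ASC constraints and the RMSE-based endmember refinement that the abstract attributes to FOMP, and the function name `omp` is illustrative.

```python
import numpy as np

def omp(A, y, k):
    """Plain orthogonal matching pursuit: greedily select k columns (library
    spectra) of A and fit y by unconstrained least squares on that set."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf  # never re-select a column
        support.append(int(np.argmax(scores)))
        # Unconstrained least squares on the support: no ANC/ASC, unlike FOMP.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```

Nothing here stops an abundance from going negative or the abundances from summing to something other than one, which is the gap the abstract says FOMP closes.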
Polar Coded Iterative Multiuser Detection for Sparse Code Multiple Access System (cited by 2)
4
Authors: Hang MU, Youhua Tang, Li Li, Zheng Ma, Pingzhi Fan, Weiqiang Xu. China Communications (SCIE, CSCD), 2018, No. 11, pp. 51-61 (11 pages)
A polar-coded sparse code multiple access (SCMA) system is conceived in this paper. A simple but new iterative multiuser detection framework is proposed, consisting of a message passing algorithm (MPA) based multiuser detector and a soft-input soft-output (SISO) successive cancellation (SC) polar decoder. In particular, SISO polar decoding is realized by a specifically designed soft re-encoder concatenated to the original SC decoder. This soft re-encoder reconstructs the soft information of the entire polar codeword from the previously detected log-likelihood ratios (LLRs) of the information bits. Benefiting from the soft re-encoding algorithm, the resulting iterative detection strategy obtains a salient coding gain. Our simulation results demonstrate that the proposed polar-coded SCMA achieves a significant improvement in error performance in additive white Gaussian noise (AWGN) channels, with conventional SISO belief propagation (BP) polar decoder aided SCMA, turbo-coded SCMA and low-density parity-check (LDPC) coded SCMA employed as benchmarks.
Keywords: iterative multiuser receiver; polar code; sparse code multiple access (SCMA)
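One standard way to turn bit LLRs into codeword LLRs is to run the polar encoding butterfly with XOR replaced by the "boxplus" operation (the LLR of an XOR of independent bits). The sketch below assumes the convention LLR = log P(0)/P(1) and no bit-reversal permutation. It is an illustration of soft re-encoding in general, not the specific re-encoder designed in the paper, and all names are hypothetical.

```python
import math

def polar_encode(u):
    """Hard polar encoding x = u * F^(tensor n), F = [[1, 0], [1, 1]], no bit reversal."""
    x = list(u)
    half = 1
    while half < len(x):
        for start in range(0, len(x), 2 * half):
            for i in range(start, start + half):
                x[i] ^= x[i + half]  # butterfly: upper branch takes the XOR
        half *= 2
    return x

def boxplus(la, lb):
    """LLR of (a XOR b) for independent bits a, b (LLR = log P(0)/P(1))."""
    t = math.tanh(la / 2) * math.tanh(lb / 2)
    t = max(min(t, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite for saturated LLRs
    return 2 * math.atanh(t)

def soft_polar_reencode(llr_u):
    """Soft re-encoding: the same butterfly as polar_encode, with XOR replaced by
    boxplus, mapping LLRs of u to LLRs of the codeword bits x."""
    l = list(llr_u)
    half = 1
    while half < len(l):
        for start in range(0, len(l), 2 * half):
            for i in range(start, start + half):
                l[i] = boxplus(l[i], l[i + half])
        half *= 2
    return l
```

Each butterfly combines LLRs that depend on disjoint sets of input bits, so the independence assumption behind boxplus holds at every stage.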
A novel EO-based optimum random beamforming method in mmWave-NOMA systems with sparse antenna array
5
Authors: Fatemeh Asghari Azhiri, Behzad Mozaffari Tazehkand, Reza Abdolee. Digital Communications and Networks (CSCD), 2024, No. 5, pp. 1313-1321 (9 pages)
Millimeter-wave (mmWave) non-orthogonal multiple access (NOMA) with random beamforming is a promising technology for guaranteeing massive connectivity and low-latency transmission in future generations of mobile networks. In this paper, we introduce a cost-effective and energy-efficient mmWave-NOMA system that exploits sparse antenna arrays at the transmitter. Our analysis shows that using low-weight, small-sized sparse antenna arrays at the base station (BS) leads to better outage probability performance. We also introduce an optimal, low-complexity Equilibrium Optimization (EO) based algorithm to further improve the outage probability. The simulation and analysis results show that systems equipped with sparse antenna arrays and optimal beamforming vectors outperform conventional systems with uniform linear arrays in terms of outage probability and sum rate.
Keywords: beamforming; millimeter-wave communication; non-orthogonal multiple access; sparse antenna arrays
A quantum algorithm for Toeplitz matrix-vector multiplication
6
Authors: 高尚, 杨宇光. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 248-253 (6 pages)
Toeplitz matrix-vector multiplication is widely used in various fields, including optimal control, systolic finite field multipliers and multidimensional convolution. In this paper, we first present a non-asymptotic quantum algorithm for Toeplitz matrix-vector multiplication with time complexity O(κ polylog n), where κ and 2n are the condition number and the dimension, respectively, of the circulant matrix extended from the Toeplitz matrix. For the case of an unknown generating function, we also give a corresponding non-asymptotic quantum version that eliminates the dependence on the L1-norm ρ of the displacement of the structured matrices. By exploiting the special properties of Toeplitz matrices, the proposed quantum algorithms are sufficiently accurate and efficient compared with existing quantum algorithms under certain circumstances.
Keywords: quantum algorithm; Toeplitz matrix-vector multiplication; circulant matrix
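The circulant extension mentioned in the abstract has a well-known classical counterpart: embed the n x n Toeplitz matrix in a 2n x 2n circulant matrix and multiply with FFTs in O(n log n). The sketch below (NumPy assumed) illustrates only this embedding; it is not a quantum algorithm, and the function name is hypothetical.

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Compute T @ x for the n x n Toeplitz matrix with first column c and first
    row r (r[0] must equal c[0]) by embedding T in a 2n x 2n circulant matrix."""
    c, r, x = (np.asarray(v, dtype=float) for v in (c, r, x))
    n = len(x)
    # First column of the circulant: c, one padding zero, then r reversed without r[0].
    v = np.concatenate([c, [0.0], r[:0:-1]])
    z = np.concatenate([x, np.zeros(n)])
    # The DFT diagonalises circulant matrices, so C @ z is a circular convolution.
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(z))
    return y[:n].real  # the top n entries equal T @ x
```

For example, with `c = [1, 2, 3]`, `r = [1, 4, 5]` and `x = [1, 0, 2]`, the result matches the dense product with `[[1, 4, 5], [2, 1, 4], [3, 2, 1]]`.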
A New Sufficient Condition for Sparse Recovery with Multiple Orthogonal Least Squares
7
Authors: Haifeng LI, Jing ZHANG. Acta Mathematica Scientia (SCIE, CSCD), 2022, No. 3, pp. 941-956 (16 pages)
Multiple orthogonal least squares (MOLS), a greedy algorithm for the recovery of sparse signals, has recently attracted considerable attention. In this paper, we consider the number of iterations the MOLS algorithm requires to recover a K-sparse signal x ∈ R^n. We show that MOLS provides stable reconstruction of all K-sparse signals x from y = Ax + w in ⌈6K/M⌉ iterations, where M is the number of indices MOLS selects per iteration, when the matrix A satisfies the restricted isometry property (RIP) with isometry constant δ_{7K} ≤ 0.094. Compared with existing results, our sufficient condition does not depend on the sparsity level K.
Keywords: sparse signal recovery; multiple orthogonal least squares (MOLS); sufficient condition; restricted isometry property (RIP)
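A minimal MOLS sketch (NumPy assumed) may help in reading the iteration bound: each iteration adds the M columns whose individual inclusion would shrink the least-squares residual the most. The selection rule uses the identity noted in the comments; the function name, `max_iter` and `tol` are illustrative choices, not taken from the paper.

```python
import numpy as np

def mols(A, y, M, max_iter, tol=1e-10):
    """Multiple orthogonal least squares: each iteration adds the M columns whose
    individual inclusion would most reduce the least-squares residual."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    n = A.shape[1]
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(max_iter):
        if np.linalg.norm(residual) <= tol:
            break
        if support:
            Q, _ = np.linalg.qr(A[:, support])
            A_perp = A - Q @ (Q.T @ A)  # columns projected off span(A_S)
        else:
            A_perp = A
        norms = np.linalg.norm(A_perp, axis=0)
        scores = np.full(n, -np.inf)
        ok = norms > 1e-12
        # ||P_perp(S+{i}) y||^2 = ||r||^2 - (a_i^T r)^2 / ||P_perp(S) a_i||^2,
        # so the best candidates maximise |a_i^T r| / ||P_perp(S) a_i||.
        scores[ok] = np.abs(A[:, ok].T @ residual) / norms[ok]
        if support:
            scores[support] = -np.inf
        new = [int(i) for i in np.argsort(scores)[::-1][:M] if np.isfinite(scores[i])]
        if not new:
            break
        support += new
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    x[support] = coef
    return x
```

In an experiment one might set `max_iter` from the iteration count stated in the abstract. The sketch makes no claim that the RIP condition holds for any particular A.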
Sparse Code Multiple Access: Towards Massive Connectivity and Low Latency 5G Communications (cited by 3)
8
Authors: Lei Wang, Xiuqiang Xu, Yiqun Wu, Shuangshuang Xing, Yan Chen. 《电信网技术》 (Telecommunications Network Technology), 2015, No. 5, pp. 6-15 (10 pages)
Sparse code multiple access (SCMA) is a novel non-orthogonal multiple access technology considered a key component of 5G air interface design. In SCMA, incoming bits are mapped directly to multi-dimensional constellation vectors, known as SCMA codewords, which are then mapped onto blocks of physical resource elements in a sparse manner. The number of codewords that can be non-orthogonally multiplexed in each SCMA block is much larger than the number of resource elements in it, so the system is overloaded and can support a larger number of users. Joint optimization of multi-dimensional modulation and low-density spreading in SCMA codebook design enables the SCMA receiver to recover the coded bits with high reliability and low complexity. Its flexibility in design and robustness in performance further show SCMA to be a promising technology for meeting 5G communication demands such as massive connectivity and low-latency transmission.
Keywords: SCMA; telecommunications technology; multiple access; coding
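The overloading described above can be seen in a toy resource mapping. The sketch assumes a commonly cited configuration, 4 resource elements per block and 2 occupied resources per codeword, which gives one user per resource pair: 6 users on 4 resources (150% overload). The codeword values here are placeholders; real SCMA codewords are jointly designed multi-dimensional constellation points, and none of these names come from the paper.

```python
from itertools import combinations

K_RESOURCES = 4  # resource elements per SCMA block (assumed example value)
N_NONZERO = 2    # resources occupied by each user's codeword (assumed example value)

# One user (layer) per distinct pair of resources: C(4, 2) = 6 users on 4 resources.
USER_RESOURCES = list(combinations(range(K_RESOURCES), N_NONZERO))

def overload_ratio():
    """Number of multiplexed users divided by the number of resource elements."""
    return len(USER_RESOURCES) / K_RESOURCES

def superimpose(codewords):
    """Sum the users' sparse codewords on the shared resource elements.
    codewords[u] holds the N_NONZERO complex values user u places on its resources."""
    block = [0j] * K_RESOURCES
    for u, values in enumerate(codewords):
        for res, val in zip(USER_RESOURCES[u], values):
            block[res] += val
    return block
```

With this mapping each resource element carries 3 of the 6 users, which is the sparse structure an MPA receiver exploits.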
Modified Iterative Method for Recovery of Sparse Multiple Measurement Problems
9
Authors: Sina Mortazavi, Reza Hosseini. Journal of Electrical Engineering, 2018, No. 2, pp. 124-128 (5 pages)
We consider the problem of reconstructing a sparse signal from a few measurements. This problem has been addressed extensively in the literature, which offers many sub-optimal methods that guarantee convergence to a locally optimal solution under specific conditions. Here, a few measurements are associated with every signal, and each measurement vector is shorter than the sparse signal. All of the sparse signals share the same unknown support. We generalize an existing algorithm for recovering one sparse signal from a single measurement to this problem and analyze its performance through simulations. We also compare its reconstruction performance with that of other existing algorithms. Finally, the proposed method also shows an advantage over the OMP (orthogonal matching pursuit) algorithm in terms of computational complexity.
Keywords: sparse signal recovery; iterative methods; multiple measurements
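The shared-support model can be illustrated with simultaneous OMP (SOMP), a standard multiple-measurement-vector baseline. SOMP is swapped in here only to show how a common support is exploited; it is not the iterative method the authors generalize, which the abstract does not name. NumPy is assumed.

```python
import numpy as np

def somp(A, Y, k):
    """Simultaneous OMP for the multiple-measurement-vector model Y = A X, where all
    columns of X share one unknown support of size k."""
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    R = Y.copy()
    support, coef = [], np.zeros((0, Y.shape[1]))
    for _ in range(k):
        # Score each column of A by its correlation with all residual columns at once.
        scores = np.linalg.norm(A.T @ R, axis=1)
        if support:
            scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        # One least-squares fit per measurement vector, all on the shared support.
        coef, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        R = Y - A[:, support] @ coef
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support, :] = coef
    return X
```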
Design Framework of Unsourced Multiple Access for 6G Massive IoT (cited by 3)
10
Authors: Chunlin Yan, Siying Lyu, Sen Wang, Yuhong Huang, Xiaodong Xu. China Communications (SCIE, CSCD), 2024, No. 1, pp. 1-12 (12 pages)
In this paper, ambient IoT is used as a typical use case of massive connectivity for sixth-generation (6G) mobile communications, from which we derive performance requirements to facilitate the evaluation of technical solutions. A fairly complete design of unsourced multiple access is proposed, whose two key parts, a compressed sensing module for active user detection and a sparse interleaver-division multiple access (SIDMA) module, are simulated side by side on the same platform at balanced signal-to-noise ratio (SNR) operating points. With a proper combination of compressed sensing matrix, convolutional encoder and receiver algorithms, the simulated performance appears superior to the state-of-the-art benchmark, with relatively less complicated processing.
Keywords: channel coding; compressed sensing; massive Internet-of-Things (IoT); sparse interleaver-division multiple access (SIDMA); sixth-generation (6G) mobile communications; unsourced multiple access
Sparse channel estimation for MIMO-OFDM systems using distributed compressed sensing (cited by 1)
11
Authors: 刘翼, 梅文博, 杜慧茜, 汪宏宇. Journal of Beijing Institute of Technology (EI, CAS), 2016, No. 4, pp. 540-546 (7 pages)
A sparse channel estimation method is proposed for doubly selective channels in multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems. Based on the basis expansion model (BEM) of the channel, the joint sparsity of MIMO-OFDM channels is described. This sparsity allows channel estimation to be cast as a distributed compressed sensing (DCS) problem, for which a low-complexity DCS-based estimation scheme is designed. Compared with conventional compressed channel estimators based on compressed sensing (CS) theory, the DCS-based method is more efficient because it reconstructs the MIMO channels jointly rather than separately. Furthermore, the group-sparse structure of each individual channel is also described. To exploit this additional structure of the sparsity pattern effectively, the DCS algorithm is modified, and the modified algorithm further enhances estimation performance. Simulation results demonstrate the superiority of the method over fast-fading channels in MIMO-OFDM systems.
Keywords: multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM); distributed compressed sensing; doubly selective channel; group-sparse; basis expansion model
Semi-Supervised Dimensionality Reduction of Hyperspectral Image Based on Sparse Multi-Manifold Learning
12
Authors: Hong Huang, Fulin Luo, Zezhong Ma, Hailiang Feng. Journal of Computer and Communications, 2015, No. 11, pp. 33-39 (7 pages)
In this paper, we propose a new semi-supervised multi-manifold learning method, called semi-supervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both labeled and unlabeled data to adaptively find the neighbors of each sample from the same manifold using an optimization program based on sparse representation, and naturally gives relative importance to the labeled samples through a graph-based methodology. It then extracts discriminative features on each manifold such that data points on the same manifold become closer. The effectiveness of the proposed multi-manifold learning algorithm is demonstrated and compared through experiments on real hyperspectral images.
Keywords: hyperspectral image classification; dimensionality reduction; multiple manifolds structure; sparse representation; semi-supervised learning
Nonlinear industrial process fault diagnosis with latent label consistency and sparse Gaussian feature learning
13
Authors: LI Xian-ling, ZHANG Jian-feng, ZHAO Chun-hui, DING Jin-liang, SUN You-xian. Journal of Central South University (SCIE, EI, CAS, CSCD), 2022, No. 12, pp. 3956-3973 (18 pages)
With the increasing complexity of industrial processes, high-dimensional industrial data exhibit strong nonlinearity, which poses considerable challenges for the fault diagnosis of industrial processes. To efficiently extract the deep, meaningful features that are crucial for fault diagnosis, a sparse Gaussian feature extractor (SGFE) is designed to learn a nonlinear mapping that projects the raw data into a feature space whose dimension equals the number of fault labels. The feature space is described by the one-hot encodings of the fault category labels as an orthogonal basis. In this way, SGFE gradually learns deep sparse Gaussian features related to the fault categories from the raw data. In the feature space, a sparse Gaussian (SG) loss function is designed to constrain the distribution of features to multiple sparse multivariate Gaussian distributions. The sparse Gaussian features are linearly separable in the feature space, which helps improve the accuracy of the downstream fault classification task. The feasibility and practical utility of the proposed SGFE are verified on the MNIST handwritten digits benchmark and the Tennessee-Eastman (TE) benchmark process, respectively.
Keywords: nonlinear fault diagnosis; multiple multivariate Gaussian distributions; sparse Gaussian feature learning; Gaussian feature extractor
Low complexity MIMO sonar imaging using a virtual sparse linear array
14
Authors: Xionghou Liu, Chao Sun, Yixin Yang, Jie Zhuo, Yina Han. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2016, No. 2, pp. 370-378 (9 pages)
A multiple-input multiple-output (MIMO) sonar can synthesize a large-aperture virtual uniform linear array (ULA) from a small number of physical elements. However, the large aperture comes at the cost of a large number of matched filters and a heavy computational load. To reduce the computational load, a MIMO sonar imaging method using a virtual sparse linear array (SLA) is proposed, consisting of offline and online processing. In the offline processing, the virtual ULA of the MIMO sonar is thinned to a virtual SLA by the simulated annealing algorithm, and the matched filters corresponding to inactive virtual elements are removed. In the online processing, the outputs of the matched filters corresponding to active elements are collected for further multibeam processing, so the number of matched filters in echo processing is effectively reduced. Numerical simulations show that the proposed method effectively reduces the computational load while achieving imaging performance similar to that of the traditional method.
Keywords: multiple-input multiple-output (MIMO) sonar; simulated annealing; sonar imaging; sparse arrays
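The link between virtual elements and matched filters can be sketched with the usual far-field phase-centre approximation: each transmit/receive pair acts as one virtual element at the sum of the two positions and needs one matched filter. The sketch builds the filled virtual ULA and applies a thinning mask. The mask is a hypothetical input standing in for the simulated-annealing result, which is not implemented here; spacing `d` in wavelengths is an assumed parameter.

```python
def virtual_array(tx_positions, rx_positions):
    """Virtual element positions of a colocated MIMO array under the far-field
    phase-centre approximation: one virtual element (and one matched filter)
    per transmit/receive pair, located at the sum of the two positions."""
    return sorted(t + r for t in tx_positions for r in rx_positions)

def mimo_ula(n_tx, n_rx, d=0.5):
    """Receivers spaced d and transmitters spaced n_rx * d yield a filled virtual
    ULA of n_tx * n_rx elements with spacing d (d in wavelengths, assumed)."""
    tx = [i * n_rx * d for i in range(n_tx)]
    rx = [j * d for j in range(n_rx)]
    return virtual_array(tx, rx)

def thin(virtual_positions, keep_mask):
    """Keep only the virtual elements flagged in keep_mask (hypothetical mask, e.g.
    produced offline by simulated annealing); one matched filter per kept element."""
    active = [p for p, keep in zip(virtual_positions, keep_mask) if keep]
    return active, len(active)
```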
Research on a Joint Power-Domain and Code-Domain NOMA-OTFS System (cited by 1)
15
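Authors: 周围, 陈黎, 向曾, 黎飞雨, 余明明. 《南京邮电大学学报(自然科学版)》 (Journal of Nanjing University of Posts and Telecommunications, Natural Science Edition), PKU core journal, 2025, No. 2, pp. 13-19 (7 pages)
Orthogonal time frequency space (OTFS) modulation is a new 6G candidate waveform that can provide reliable communication in high-mobility scenarios. To better meet future 6G requirements, high spectral efficiency must also be achieved in high-mobility scenarios. However, existing power-domain NOMA-OTFS (PD-NOMA-OTFS) and code-domain NOMA-OTFS (SCMA-OTFS) systems suffer from a low sum rate, a low overload ratio (number of users / number of resources) and a small number of supportable users. To address this problem, a joint power-domain and code-domain NOMA-OTFS (PD-SCMA-OTFS) system is proposed. At the transmitter, users are divided into a strong and a weak user group according to their channel gains; users within a group are superimposed using conventional SCMA, while the two groups are superimposed following the power-domain NOMA principle. At the receiver, the message passing algorithm (MPA) and successive interference cancellation (SIC) are used for joint detection. A comparison of different systems shows that the proposed system achieves a higher sum rate, that its overload ratio is 1.5 to 2 times that of conventional SCMA-OTFS and PD-NOMA-OTFS systems, and that it supports 1.2 to 6 times as many users as conventional systems.
Keywords: orthogonal time frequency space; power-domain non-orthogonal multiple access; sparse code multiple access; sum rate; overload ratio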
Supervised-Learning-Based Selection of Sparse Matrix Multiplication Algorithms
16
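Authors: 彭林, 张鹏, 陈俊峰, 唐滔, 黄春. 《计算机工程与科学》 (Computer Engineering & Science), PKU core journal, 2025, No. 3, pp. 381-391 (11 pages)
SPA, HASH and ESC, the three mainstream implementations of sparse matrix multiplication based on the row-by-row formulation, perform very differently on different sparse matrices: no single algorithm always achieves the best performance across different numbers of nonzeros, and any single algorithm falls noticeably short of the optimal choice. To address this, a machine-learning-based model for selecting the optimal sparse matrix multiplication algorithm is proposed. Using a given matrix collection as the data source, features of the sparse matrices are extracted, and performance data measured with SPA, HASH and ESC are used for training and validation. The resulting model can select the best algorithm for a new dataset using only the features of the sparse matrices. Experimental results show that the model achieves a prediction accuracy above 91%, with average performance reaching 98% of the optimal choice and at least 1.55 times that of any single algorithm. It can also be used in real library functions, showing good generalization ability and practical value.
Keywords: sparse matrix multiplication; SpGEMM; SPA algorithm; HASH algorithm; ESC algorithm; machine learning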
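To show how the row-by-row candidates differ, here is a sketch of computing one output row with ESC (expand-sort-compress) and with a dense sparse accumulator (SPA); the HASH variant replaces the dense arrays with a hash map. These are generic textbook formulations in plain Python, assumed for illustration. They are not the paper's implementations, and the learned selection model is not shown.

```python
def spgemm_esc_row(a_row, b_rows):
    """One output row of C = A @ B with expand-sort-compress (ESC).
    a_row: list of (k, a_ik); b_rows[k]: list of (j, b_kj)."""
    # Expand: form every intermediate product as a (column, value) pair.
    products = [(j, a_ik * b_kj) for k, a_ik in a_row for j, b_kj in b_rows[k]]
    # Sort: bring products with equal column indices together.
    products.sort(key=lambda p: p[0])
    # Compress: sum runs of equal column indices.
    out = []
    for j, v in products:
        if out and out[-1][0] == j:
            out[-1] = (j, out[-1][1] + v)
        else:
            out.append((j, v))
    return out

def spgemm_spa_row(a_row, b_rows, n_cols):
    """The same row with a sparse accumulator (SPA): a dense value array plus a list
    of touched columns, so each product is accumulated in O(1)."""
    values = [0.0] * n_cols
    occupied = [False] * n_cols
    touched = []
    for k, a_ik in a_row:
        for j, b_kj in b_rows[k]:
            if not occupied[j]:
                occupied[j] = True
                touched.append(j)
            values[j] += a_ik * b_kj
    return [(j, values[j]) for j in sorted(touched)]
```

ESC pays for sorting all intermediate products, while SPA pays for a dense array as wide as B. This kind of trade-off depends on the matrix, which is why a feature-based selector can help.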
A Target Recognition Method Based on Joint Decision over Multi-View SAR Images
17
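Authors: 姚培娟, 赵小龙, 李思逸, 邵开丽, 付辉, 张亚娟. 《探测与控制学报》 (Journal of Detection & Control), PKU core journal, 2025, No. 5, pp. 137-143 (7 pages)
Taking multi-view SAR images as input, a synthetic aperture radar (SAR) target recognition method with adaptively weighted decision fusion is proposed. Joint sparse representation is used to represent the multiple views involved in recognition, yielding the corresponding reconstruction error vectors. The error vectors of the different views are analyzed with entropy theory to assess their uncertainty, and the corresponding weights are defined from this assessment. The error vectors of the different views are then fused with these adaptive weights, and the target class of the multi-view SAR images is determined from the final error distribution. The adaptive weights make better use of each view's contribution to a correct decision and thus help improve SAR target recognition performance. Comparative experiments under four types of scenarios on the MSTAR dataset verify the effectiveness of the proposed method.
Keywords: SAR target recognition; adaptive weights; multi-view; joint sparse representation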
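The fusion step can be illustrated as follows, under explicit assumptions: each view's error vector is mapped to a distribution over classes (smaller error means higher probability, via inverse errors), and the view's weight is one minus its normalized entropy, so more certain views count more. The abstract does not give the exact weighting formula, so this mapping is hypothetical, as are the function names. The sketch assumes at least two classes.

```python
import math

def view_weight(errors):
    """Hypothetical certainty weight for one view: convert reconstruction errors to
    a distribution (smaller error -> larger probability), return 1 - normalised entropy."""
    scores = [1.0 / (e + 1e-12) for e in errors]
    total = sum(scores)
    p = [s / total for s in scores]
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return 1.0 - entropy / math.log(len(p))  # assumes at least two classes

def fuse_and_classify(error_vectors):
    """Weight each view's error vector, average them, and pick the class with the
    smallest fused error. Returns (class index, per-view weights)."""
    weights = [view_weight(e) for e in error_vectors]
    total = sum(weights) or 1.0
    n_classes = len(error_vectors[0])
    fused = [sum(w * e[c] for w, e in zip(weights, error_vectors)) / total
             for c in range(n_classes)]
    return min(range(n_classes), key=fused.__getitem__), weights
```

A view whose errors are nearly equal across classes has entropy close to the maximum and therefore contributes almost nothing to the decision.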
Optimization of Sparse Matrix-Vector Multiplication Sequences Based on Cache Data Reuse
18
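Authors: 徐传福, 邱昊中, 车永刚. 《计算机研究与发展》 (Journal of Computer Research and Development), PKU core journal, 2025, No. 6, pp. 1434-1442 (9 pages)
High-performance computing applications such as solving sparse linear systems often involve computing a sequence of sparse matrix-vector multiplications (SpMV), Ax, A^2 x, …, A^s x. This sequence of SpMV operations is also called the matrix power kernel (MPK). Because MPK performs SpMV repeatedly while the sparse matrix stays unchanged, reusing the sparse matrix in cache avoids loading A from main memory for every SpMV, which alleviates the memory-bound nature of SpMV and improves MPK performance. However, cache data reuse introduces data dependences between adjacent SpMV operations, and existing MPK optimizations mostly target a single SpMV call or introduce excessive overhead when implementing data reuse. This paper proposes a cache-aware MPK (Ca-MPK). Based on the dependency graph of the sparse matrix, an architecture-aware recursive partitioning method is designed that divides the dependency graph into subgraphs/submatrices that fit in cache; data dependences are decoupled by constructing separator subgraphs, and SpMV is scheduled on the submatrices in a specific order to achieve cache data reuse. Test results show that, compared with the Intel oneMKL library and the latest MPK implementation, Ca-MPK achieves average speedups of up to about 1.57× and 1.40×, respectively.
Keywords: sparse matrix-vector multiplication; matrix power kernel; cache data reuse; data dependence; sparse linear system solving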
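For orientation, the baseline that Ca-MPK improves on is simply s back-to-back SpMV calls, each of which streams all of A again. The sketch below shows that baseline in plain Python with CSR arrays. It does not implement the paper's partitioning or scheduling; the function names are illustrative.

```python
def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A in CSR format."""
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]

def matrix_power_kernel(indptr, indices, data, x, s):
    """Baseline MPK: return [A x, A^2 x, ..., A^s x] via s independent SpMV calls.
    Row i of A^(k+1) x needs the entries of A^k x at row i's column indices; this
    dependence is what blocks naive reuse of cached parts of A across powers."""
    out = []
    v = x
    for _ in range(s):
        v = csr_spmv(indptr, indices, data, v)
        out.append(v)
    return out
```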
A Massive Grant-Free Multiple Access Scheme Based on Block Compressed Sensing
19
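Authors: 张晶, 马林, 何艳. 《南京邮电大学学报(自然科学版)》 (Journal of Nanjing University of Posts and Telecommunications, Natural Science Edition), PKU core journal, 2025, No. 2, pp. 20-29 (10 pages)
For scenarios in which massive numbers of machine terminals transmit sporadically in bursts, a massive grant-free multiple access scheme based on block compressed sensing and adaptive matching pursuit is proposed. First, the uplink multiple-access signal is modeled as a compressed sensing equation with frame-wise sparse structure. Then, the uplink multiple-access signal reconstruction problem is transformed into compressed-sensing multiuser detection with a block-sparse structure. Finally, a block-sparse-model adaptive matching pursuit algorithm is proposed to perform multiple-access signal detection; it introduces three adaptive strategies (dynamic step size, dynamic pruning and dynamic iteration) to improve multiuser signal detection and reconstruction performance. Simulation results show that the proposed scheme greatly reduces the bit error rate of uplink grant-free multiple-access transmission and improves the overloaded access capability of wireless networks.
Keywords: massive machine-type communication; grant-free multiple access; compressed-sensing multiuser detection; block-sparse model; adaptive matching pursuit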
Research on a New Matrix Storage Format and Sparse Matrix-Vector Multiplication (SpMV) Algorithm for the SW26010-Pro Many-Core Processor
20
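Authors: 王萃, 刘芳芳, 马文静, 赵玉文, 胡力娟. 《计算机学报》 (Chinese Journal of Computers), PKU core journal, 2025, No. 6, pp. 1290-1304 (15 pages)
Sparse matrix-vector multiplication (SpMV) is a key operation in high-performance computing and large AI models, and its performance usually has an important impact on overall application performance. An efficient sparse matrix storage format is an important factor in SpMV performance. However, existing sparse matrix storage formats mainly reduce memory accesses by compressing away zero elements and do not fully exploit regularities in the values of the nonzero elements, so there is still room for further compression and optimization. By further compressing the repeated elements in the nonzero-value array of the Compressed Sparse Row (CSR) storage format, this paper proposes a new sparse matrix storage format, Further Compressed Sparse Row (FCSR), and designs a heterogeneous parallel algorithm for converting from CSR to FCSR that minimizes the overhead of format conversion. For the SW26010-Pro many-core processor, a heterogeneous parallel SpMV algorithm based on FCSR is designed, with fine-grained task partitioning and parallel optimization of SpMV; five ways of indirectly accessing the vector x are explored, and the algorithm is further optimized with double buffering. Finally, sparse matrices from the SuiteSparse collection are used for testing. Experimental results show that the heterogeneous many-core SpMV algorithm based on FCSR clearly outperforms the master-core-only SpMV version, with a maximum speedup of 43.11 and an average speedup of 7.56; the highest bandwidth utilization among the test matrices reaches 91.13%, with an average of 26.27%. The paper also compares FCSR-based and CSR-based SpMV: with both fully optimized, FCSR-based SpMV achieves an average speedup of 1.19 over CSR-based SpMV.
Keywords: sparse matrix-vector multiplication; SW26010-Pro many-core processor; new matrix storage format; parallel optimization; double buffering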
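The abstract describes FCSR as further compressing repeated entries of CSR's nonzero-value array but does not give the layout. The sketch below therefore assumes one plausible scheme: store each distinct value once in a table and keep a small per-nonzero index into that table. The names and the layout are hypothetical, and the SW26010-Pro parallelization and double buffering are not modeled.

```python
def csr_to_value_indexed(data):
    """Hypothetical FCSR-like compression of a CSR value array: store each distinct
    value once and replace per-nonzero values with indices into that table."""
    table, slot, codes = [], {}, []
    for v in data:
        if v not in slot:
            slot[v] = len(table)
            table.append(v)
        codes.append(slot[v])
    return table, codes

def spmv_value_indexed(indptr, indices, table, codes, x):
    """CSR-style SpMV that looks up each nonzero's value through the value table."""
    return [sum(table[codes[p]] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]
```

A layout like this only saves memory traffic when the number of distinct values is small enough for the codes to fit in fewer bits than the original floating-point values. Python lists do not show that saving.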