Abstract: The K-means algorithm is one of the most widely used algorithms in cluster analysis. To address the problems caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on a similarity matrix. The improved algorithm avoids random selection of the initial center points, so it provides effective initial points for the clustering process and reduces the fluctuation in clustering results caused by the choice of initial points, which yields better clustering quality. The experimental results also show that the F-measure of the improved K-means algorithm is greatly improved and that the clustering results are more stable.
Funding: Sponsored by the Huo Ying-Dong Education Foundation of China (91101).
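The K-means abstract above does not say how the similarity matrix is turned into initial centers, so the following is only a minimal sketch of one plausible deterministic seeding rule, not the paper's method. The similarity `1 / (1 + distance)`, the "densest point first, least similar point next" rule, and the function names are all assumptions made for illustration.

```python
import math


def similarity_matrix(points):
    """Pairwise similarity 1 / (1 + Euclidean distance); an assumed choice."""
    n = len(points)
    return [
        [1.0 / (1.0 + math.dist(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]


def pick_initial_centers(points, k):
    """Choose k initial centers deterministically from the similarity matrix.

    Assumed rule (hypothetical, for illustration only):
    1. the first center is the point with the largest total similarity
       to all points, i.e. the most "central" point of a dense region;
    2. each further center is the point whose highest similarity to the
       already chosen centers is smallest, i.e. the most dissimilar point.
    """
    sim = similarity_matrix(points)
    n = len(points)
    chosen = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        chosen.append(
            min(candidates, key=lambda i: max(sim[i][c] for c in chosen))
        )
    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(pick_initial_centers(data, 2))
```

Because no random numbers are involved, repeated runs on the same data return the same centers, which is the property the abstract credits for the more stable clustering results. The returned centers would then be passed to an ordinary K-means loop in place of randomly drawn ones.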
Abstract: A web page clustering algorithm called PageCluster and an improved algorithm, ImPageCluster, which addresses overlapping, are proposed. These methods not only take the web structure and page hyperlinks into account, but also consider the importance of each page, which is described by its in-weight and out-weight. Experiments show that, compared with traditional clustering methods, the proposed algorithms have shorter runtimes and improved accuracy.
Abstract: An information system is a type of knowledge representation, and attribute reduction is crucial in big data, machine learning, data mining, and intelligent systems. There are several ways of solving attribute reduction problems, but they all require a common categorization. Selecting features is a challenge for researchers in most scientific studies. When working with huge datasets, selecting all available attributes is not an option because it frequently complicates the study and decreases performance. On the other hand, neglecting some attributes might jeopardize data accuracy. In this case, rough set theory provides a useful approach for identifying superfluous attributes that may be ignored without sacrificing any significant information; nonetheless, investigating all available combinations of attributes will result in some problems. Furthermore, because attribute reduction is primarily a mathematical issue, technical progress in reduction depends on the advancement of mathematical models. Because the focus of this study is on the mathematical side of attribute reduction, we propose some methods for reducing information systems according to classical rough set theory, the strength of rules, and the similarity matrix. We apply the proposed methods to several examples and calculate the reduction for each case. These methods expand the options for attribute reduction available to researchers.
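Abstract: Multidimensional indicators that combine variety, balance, and disparity have become important tools for measuring interdisciplinarity. However, how the discipline similarity measure underlying disparity affects interdisciplinarity measurement has not been discussed systematically. This article therefore treats discipline classification level, normalization coefficient, citation mode, and time window as four key elements of discipline similarity measurement, and systematically analyzes the effect of each element on interdisciplinarity measurement results by constructing the corresponding discipline similarity matrices. Under different element settings, discipline similarity matrices are built from research papers indexed in Web of Science from 1981 to 2020, together with their references and citing literature, and papers from eight representative journals are selected for empirical analysis. The results show that classification schemes at different levels significantly affect discipline similarity measurement, which in turn leads to systematic differences in interdisciplinarity measurement results; that among the four normalization coefficients, the Ochiai cosine yields lower discipline similarity and higher discrimination of interdisciplinarity; and that under different citation modes and time window settings, the discipline similarity results have no significant effect on interdisciplinarity measurement.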
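For readers unfamiliar with the Ochiai cosine named in the abstract above, a minimal sketch follows. It uses the standard Ochiai form, co-occurrence count divided by the geometric mean of the two totals. How the study actually derives the counts from Web of Science citation data is not stated in the abstract, so the input layout, the function name, and the sample numbers are assumptions for illustration.

```python
import math


def ochiai_similarity(cooccurrence, totals):
    """Ochiai cosine: s_ij = c_ij / sqrt(n_i * n_j).

    cooccurrence[i][j] is an assumed count of links shared by disciplines
    i and j; totals[i] is the assumed total count for discipline i.
    The diagonal is set to 1.0 by convention.
    """
    n = len(totals)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                sim[i][j] = 1.0
            elif totals[i] > 0 and totals[j] > 0:
                sim[i][j] = cooccurrence[i][j] / math.sqrt(totals[i] * totals[j])
    return sim


if __name__ == "__main__":
    # Hypothetical counts for three disciplines.
    counts = [[0, 40, 5], [40, 0, 10], [5, 10, 0]]
    totals = [100, 400, 25]
    for row in ochiai_similarity(counts, totals):
        print([round(x, 3) for x in row])
```

Dividing by the geometric mean keeps a large discipline from appearing similar to everything merely because it has many links, which is why the choice of normalization coefficient can change the measured similarity.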
Abstract: A formula for computing the similarity between two audio feature vectors is proposed, which maps any pair of vectors of equal dimension to [0,1). For the task of audio segmentation, a self-similarity matrix is computed to reveal the inner structure of the audio clip to be segmented. Because the final result must be consistent with subjective evaluation and adaptable to some special applications, a set of weights is adopted, which can be modified through relevance feedback techniques. Experiments show that satisfactory results can be achieved with the algorithm proposed in this paper.
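The abstract above does not give the similarity formula, so the sketch below only illustrates the general idea of a weighted self-similarity matrix over audio frames. The stand-in similarity `1 / (1 + weighted Euclidean distance)` is an assumption; note that its range is (0, 1], not the [0,1) range of the paper's own formula. The function names and the toy feature vectors are likewise hypothetical.

```python
import math


def weighted_similarity(u, v, weights):
    """Stand-in similarity (assumed, not the paper's formula)."""
    d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))
    return 1.0 / (1.0 + d)


def self_similarity_matrix(frames, weights):
    """Compare every frame's feature vector with every other frame's."""
    return [[weighted_similarity(a, b, weights) for b in frames] for a in frames]


if __name__ == "__main__":
    # Toy feature vectors: two frames from one segment, two from another.
    frames = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1)]
    weights = (1.0, 1.0)  # the abstract adjusts such weights via relevance feedback
    for row in self_similarity_matrix(frames, weights):
        print([round(x, 3) for x in row])
```

Frames from the same segment give high values near the diagonal, and frames from different segments give low values, so segment boundaries show up as the edges of bright square blocks in the matrix.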
Abstract: For the problem of intuitionistic fuzzy cluster analysis, this paper proposes a new similarity degree formula that incorporates the attribute weight of each index. We conduct a fuzzy cluster analysis based on a new intuitionistic fuzzy similarity matrix, which is constructed with this weighted similarity degree method and can be transformed into a fuzzy similarity matrix. An example is given to demonstrate the feasibility and validity of the method.
Abstract: For a particular clustering problem, selecting the best clustering method is challenging. Research suggests that integrating multiple clusterings can greatly improve clustering accuracy. A new clustering ensemble approach based on the similarities in 2-mode networks is proposed in this paper. First, the data objects and the initial clusters are transformed into a 2-mode network. The similarities in the 2-mode network are then used to iteratively calculate the similarity between different clusters and refine the adjacency matrix. Finally, the K-means algorithm is used to obtain the final clustering result. The method makes effective use of the similarity between different clusters, and an example shows its feasibility.
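Abstract: In data-sparse scenarios, sharing-based cross-domain recommendation models fail to capture and transfer cross-domain information effectively, so user preferences are not fully transferred and recommendation efficiency drops. To address this problem, a cross-domain recommendation model based on graph neural network and multi-head attention mechanism (GMACDR) is proposed. First, the node to vector (Node2Vec) algorithm is used for graph embedding: random walks generate node propagation paths, and the Skip-gram model generates highly similar node representations. Then, during information propagation in the graph convolutional neural network, a multi-head attention mechanism dynamically adjusts the propagation weights of information from different domains and captures deep connections between users and items. Finally, a dynamic weighting mechanism computes a weighted sum of the domain features to improve feature fusion and generate more representative user embeddings. The results show that GMACDR outperforms traditional methods on multiple cross-domain recommendation datasets; compared with the best baseline model, the hit rate improves by up to 4.61%, normalized discounted cumulative gain by up to 10.47%, and mean reciprocal rank by up to 9.71%. These results indicate that the model can provide users with more accurate recommendations.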