Abstract
Software engineering data are large in volume, have many attributes, and are mostly discrete. To mine such data more efficiently with a faster and more effective clustering method, this paper applies the kernel-based fuzzy C-means (KFCM) algorithm to source code mining. Because KFCM cannot cluster text data directly, the TF-IDF method is used to process the discrete text data. KFCM is then combined with a genetic algorithm to overcome its limitation of finding only a local minimum. Experimental results show that the improved KFCM algorithm achieves good clustering performance and high efficiency when mining software engineering data.
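The pipeline named in the abstract (TF-IDF vectors fed to KFCM) can be illustrated with a minimal sketch. It is not the paper's implementation: the TF-IDF variant, the Gaussian kernel, the parameters `m` and `sigma`, and the function names `tfidf` and `kfcm` are all assumptions made for illustration, and the genetic-algorithm step is left out. The sketch starts from randomly chosen data points, so it can stop at a local minimum, which is the weakness the paper addresses with a genetic algorithm.

```python
import math
import random
from collections import Counter

def tfidf(docs):
    """Turn non-empty token lists into dense TF-IDF vectors (one common variant)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per token
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] / len(d) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def gaussian_kernel(x, v, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2); the kernel choice is an assumption."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-d2 / sigma ** 2)

def kfcm(X, c, m=2.0, sigma=1.0, iters=100, tol=1e-6, seed=0):
    """Kernel fuzzy C-means with centres kept in the input space."""
    rng = random.Random(seed)
    centres = [list(x) for x in rng.sample(X, c)]  # random start: local minima possible
    U = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to (1 - K(x_k, v_i))^(-1/(m-1)).
        U = []
        for x in X:
            d = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) for v in centres]
            w = [di ** (-1.0 / (m - 1.0)) for di in d]
            s = sum(w)
            U.append([wi / s for wi in w])
        # Centre update: mean of the points weighted by u_ik^m * K(x_k, v_i).
        shift = 0.0
        for i in range(c):
            coef = [U[k][i] ** m * gaussian_kernel(X[k], centres[i], sigma)
                    for k in range(len(X))]
            total = sum(coef)
            new = [sum(coef[k] * X[k][j] for k in range(len(X))) / total
                   for j in range(len(X[0]))]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, centres[i])))
            centres[i] = new
        if shift < tol:
            break
    return U, centres

# Hypothetical token lists standing in for source code fragments.
docs = [["for", "i", "loop", "print"], ["while", "loop", "print", "i"],
        ["class", "method", "self", "return"], ["class", "self", "method", "init"]]
vocab, X = tfidf(docs)
U, centres = kfcm(X, c=2, sigma=0.5)
```

Each row of `U` holds one fragment's degrees of membership in the `c` clusters and sums to 1. A genetic algorithm, as combined with KFCM in the paper, could replace the random choice of starting centres; how the paper encodes and evolves candidates is not stated in the abstract.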
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2010, Issue 10, pp. 2249-2252 (4 pages)
Keywords
source code mining
feature space
kernel function
genetic algorithm
objective function