Abstract
Poorly differentiated tumors are difficult to diagnose by conventional histopathology, whereas combining it with genetic testing can accurately identify the disease-causing genes of a specific tumor. Gene selection has therefore become a key problem in tumor classification and clinical treatment. Tumor gene expression data have few samples but very high dimensionality (usually thousands of genes), and the classification accuracy and computational efficiency of existing gene selection algorithms still need improvement. Building on fuzzy rough set theory, this paper fuzzifies the discernibility matrix and uses the result to design an attribute reduction algorithm based on the fuzzy discernibility matrix. The classical discernibility matrix only records which attributes distinguish two objects. The fuzzy discernibility matrix also reflects how strongly each attribute distinguishes them, so attributes with a higher degree of discernibility can be selected, which gives better classification. Numerical experiments show that the method improves the classification accuracy on tumor gene data and reduces computation time. In a feature gene selection experiment on colon cancer data (Colon Microarray) with a kNN classifier, five key genes related to colon cancer were selected from 2000 feature genes, with a classification accuracy of 88.06%.
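The abstract contrasts a classical discernibility matrix, whose entries only say which attributes distinguish two objects, with a fuzzy one that says how strongly each attribute distinguishes them. The following minimal Python/NumPy sketch illustrates that contrast and is not the authors' algorithm. Three parts of it are our own assumptions: the per-attribute fuzzy similarity R_a(x, z) = 1 - |a(x) - a(z)| / (max a - min a), the use of 1 - R_a as the discernibility degree, and the greedy rule in the hypothetical `greedy_reduct` function. That rule repeatedly adds the attribute that most increases the total best-so-far discernibility over all pairs of objects with different class labels.

```python
import numpy as np


def fuzzy_discernibility(X, y):
    """Return (pairs, D): D[k, a] is the degree to which attribute a discerns
    the k-th pair of objects that carry different decision labels.

    Assumption (illustrative only): R_a(x, z) = 1 - |a(x) - a(z)| / (max a - min a),
    so the discernibility degree 1 - R_a is the range-normalized absolute difference.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # a constant attribute gets degree 0 on every pair
    pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y)) if y[i] != y[j]]
    D = np.array([np.abs(X[i] - X[j]) / span for i, j in pairs]).reshape(len(pairs), X.shape[1])
    return pairs, D


def greedy_reduct(D, max_attrs=None, tol=1e-9):
    """Hypothetical greedy attribute selection over a fuzzy discernibility matrix."""
    n_pairs, n_attrs = D.shape
    covered = np.zeros(n_pairs)  # best discernibility reached so far for each pair
    selected = []
    limit = n_attrs if max_attrs is None else max_attrs
    while len(selected) < limit:
        # Gain of each attribute = increase in total per-pair coverage if it were added.
        gains = np.maximum(D, covered[:, None]).sum(axis=0) - covered.sum()
        gains[selected] = -np.inf  # never pick an attribute twice
        best = int(np.argmax(gains))
        if gains[best] <= tol:
            break  # no remaining attribute improves discernibility
        selected.append(best)
        covered = np.maximum(covered, D[:, best])
    return selected


# Toy data: attribute 2 separates the classes fully, attribute 0 almost fully,
# attribute 1 varies only within classes.
X = [[1.0, 5.0, 0.0],
     [1.1, 1.0, 0.0],
     [3.0, 5.0, 1.0],
     [2.9, 1.0, 1.0]]
y = [0, 0, 1, 1]

pairs, D = fuzzy_discernibility(X, y)
classical = D > 0  # crisp matrix: which attributes discern each pair at all
print(greedy_reduct(D))  # -> [2]
```

On this toy data the crisp matrix (`D > 0`) lists both attribute 0 and attribute 2 for every between-class pair, so it cannot prefer one over the other. The fuzzy degrees rank attribute 2 (degree 1.0 on every pair) above attribute 0 (0.9 to 1.0), and the greedy step selects attribute 2 alone.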
Authors
李藤
杨田
代建华
陈鸰
Li Teng; Yang Tian; Dai Jianhua; Chen Ling (College of Logistics and Transportation, Central South University of Forestry and Technology, Changsha 410004, China; Hunan Provincial Science and Technology Project Foundation, Hunan Normal University, Changsha 410081, China; Xiangya Hospital of Central South University, Changsha 410008, China; College of Systems Engineering, University of Defense Science and Technology, Changsha 410073, China)
Source
Journal of Nanjing University (Natural Science)
CAS
CSCD
Peking University Core Journal
2019, No. 4, pp. 633-643 (11 pages)
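Funding
China Postdoctoral Science Foundation (2017T100795)
Natural Science Foundation of Hunan Province (2017JJ2408)
Key Research and Development Program of Hunan Province (2018SK2129)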
Keywords
fuzzy rough sets
rough sets
fuzzy discernibility matrix
gene selection