
Improved classification method based on imbalanced data of TCGA (cited by: 2)
Abstract: To address the rise in false-negative rate caused by class imbalance in DNA methylation data from The Cancer Genome Atlas (TCGA), this paper proposes an improved classification method for imbalanced TCGA data. The synthetic minority oversampling technique (SMOTE) and the Tomek Link algorithm are combined into a hybrid sampling step that corrects the data imbalance. The training set, after feature selection, is then fed into the improved model for training, learning, and classification. Experiments on DNA methylation data for six cancer types in the TCGA database show that the improved method significantly raises classification performance on minority-class samples and also gives some improvement on majority-class samples.
Authors: HOU Weiyan (侯维岩); LIU Chao (刘超); SONG Yang (宋杨); SUN Yi (孙燚) (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China; School of Mechatronic and Automation, Shanghai University, Shanghai 200072, China)
Source: Journal of Anhui University (Natural Science Edition) (《安徽大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2020, No. 1, pp. 37-43 (7 pages)
Funding: National Natural Science Foundation of China (61573237).
Keywords: DNA methylation; data imbalance; TCGA; Tomek Link algorithm
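
The hybrid sampling step named in the abstract (SMOTE oversampling followed by Tomek Link cleaning) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`smote`, `tomek_links`, `smote_tomek`), the neighbour count `k`, the Euclidean distance, the choice to oversample until the classes are equal in size, and the choice to drop both members of each Tomek link are all assumptions made for illustration. The abstract does not specify these details, and the feature selection and classifier stages are not covered here.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def smote_tomek(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class to parity with SMOTE, then remove
    both members of every Tomek link (an assumed cleaning policy)."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(0, len(X) - 2 * len(minority))  # binary-label assumption
    X_res = list(X) + smote(minority, n_new, k, seed)
    y_res = list(y) + [minority_label] * n_new
    drop = {i for pair in tomek_links(X_res, y_res) for i in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


if __name__ == "__main__":
    # Toy two-feature data: label 0 is the majority, label 1 the minority.
    X = [(5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (7, 7),
         (0, 0), (0, 1), (1, 0)]
    y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
    X_res, y_res = smote_tomek(X, y, minority_label=1)
    print(len(X_res), y_res.count(0), y_res.count(1))
```

The sketch assumes binary labels and at least two minority samples. A common variant of the cleaning step removes only the majority-class member of each link. In practice the resampling would be applied to the training split only, so that synthetic samples do not leak into evaluation.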