Abstract
Predicting the subcellular localization of apoptosis proteins is an important way to study their biological function and an active area of bioinformatics research. The main goal of such work is to improve the accuracy and practicality of the prediction model. This study proposes an ensemble classification algorithm that uses the fuzzy K-nearest neighbor classifier as its base classifier. Each protein sequence is represented by a basic feature set made up of dipeptide compositions computed at different residue intervals within the sequence, and binary particle swarm optimization is used as the feature selection method to extract the effective sequence features. The selected features serve as the input vector of each base classifier in the ensemble. Under the Jackknife test on two commonly used datasets, the algorithm achieves a prediction accuracy of 91.5% on the CL317 dataset and 88.0% on the ZW225 dataset. Compared with previously reported apoptosis protein subcellular localization prediction algorithms evaluated on the same datasets, the proposed method achieves comparatively good accuracy.
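The abstract names three building blocks without giving formulas: dipeptide composition at different residue intervals, a fuzzy K-nearest neighbor base classifier, and Jackknife (leave-one-out) validation. The Python sketch below illustrates one common way to write each of them. It is not the authors' implementation: the function names, the crisp-label membership initialisation, the fuzzifier `m = 2`, and the Euclidean distance are all assumptions, and the binary particle swarm feature selection and the ensemble combination step are left out.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered residue pairs


def gapped_dipeptide_composition(seq, gap):
    """Frequencies of pairs (seq[i], seq[i + gap + 1]); gap=0 gives adjacent dipeptides."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard residues such as X
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def fuzzy_knn_memberships(train_x, train_y, query, k=3, m=2.0):
    """Class memberships of `query` from its k nearest neighbours.

    Assumption: training samples carry crisp labels, and each neighbour is
    weighted by 1 / distance**(2 / (m - 1)).
    """
    scored = sorted(
        ((dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda item: item[0],
    )[:k]
    weights = {}
    for d, label in scored:
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)  # small constant avoids division by zero
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def jackknife_accuracy(xs, ys, k=3, m=2.0):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        memberships = fuzzy_knn_memberships(rest_x, rest_y, xs[i], k, m)
        if max(memberships, key=memberships.get) == ys[i]:
            correct += 1
    return correct / len(xs)
```

In an ensemble of the kind the abstract describes, one base classifier could be trained per interval value, each on its own selected subset of the 400 features, with the memberships then combined; how the paper combines them is not stated in the abstract.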
Source
Computers and Applied Chemistry (《计算机与应用化学》)
CAS
CSCD
PKU Core (北大核心)
2010, Issue 5, pp. 645-648 (4 pages)
Keywords
apoptosis protein
subcellular localization prediction
binary particle swarm optimization
feature selection