Abstract
In bioinformatics, discovering and identifying new genes is a bridging step: it builds on earlier work such as genome sequencing and lays the foundation for research in the post-genome era. In silico gene cloning is a computational method for discovering and identifying new genes, and the SiClone software implements it. This paper proposes a parallel processing scheme for the database that SiClone operates on and describes in detail PSiClone, an optimized parallel version implemented on MPI (Message Passing Interface). The performance of PSiClone is measured on an available EST database. The number of EST sequences in this test database is only a very small fraction of the NCBI (National Center for Biotechnology Information) dbEST database, which suggests that the parallel version holds even more promise for comparison and computation on large databases.
Source
Journal of Biomathematics (《生物数学学报》)
CSCD
Peking University Core Journal (北大核心)
2006, No. 4, pp. 619-626 (8 pages)
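Funding
National Natural Science Foundation of China, "Basic Research on the Application of Parallel Algorithms for Contemporary Parallel Machines" (No. 60533020)
National Natural Science Foundation of China, "Research on Parallel Algorithms for Feature Information Discovery and Their Implementation" (No. 60673064)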