Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 70671089 and 30871521) and the State Key Program of National Natural Science of China (Grant No. 10635040).
Abstract: In this paper, we attempt to understand complex network evolution from the underlying evolutionary relationship between biological organisms. First, we construct a Pfam domain interaction network for each of the 470 completely sequenced organisms, so that each organism is associated with a specific Pfam domain interaction network. Second, we infer the evolutionary relationship of these organisms with the nearest neighbour joining method. Third, we use the evolutionary relationship between organisms obtained in the second step as the evolutionary course of the Pfam domain interaction networks constructed in the first step. Analysis of this evolutionary course shows that: (i) there is a conserved sub-network structure in network evolution; in this sub-network, nodes with lower degree tend to keep their connectivity unchanged, and hubs tend to maintain their role as hubs and are preferentially attached to newly added nodes; (ii) few nodes are conserved as hubs, while most other nodes are conserved as nodes of very low degree; (iii) in the course of network evolution, new nodes are added to the network individually in most cases, or as clusters with relatively high clustering coefficients in a very few cases.
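The abstract above reasons about node degree, hubs, and clustering coefficients. As a minimal sketch of those two quantities only (not the authors' pipeline), the snippet below computes degree and the local clustering coefficient on a toy undirected network. The node names and the adjacency dictionary are hypothetical placeholders, not Pfam data.

```python
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours there are no pairs to test.
        return 0.0
    links = sum(1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy network (assumption): each key maps to its set of neighbours.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

# Degree of a node is the number of its neighbours.
degrees = {node: len(neighbours) for node, neighbours in adj.items()}

for node in sorted(adj):
    print(node, degrees[node], round(clustering_coefficient(adj, node), 3))
```

In this toy case "A" has the highest degree but a low clustering coefficient, while "B" and "C" sit in a fully connected triangle with "A".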
Abstract: Alzheimer's disease has been described as type 3 diabetes because the two conditions share metabolic profiles. As in our previous research on Alzheimer's disease and other neurodegenerative diseases, a systematic analysis of diabetes- and glucose metabolism-related proteins also helps in the treatment of Alzheimer's patients. Some interesting results indicate that diabetes-related proteins (DRPs) are rich in Lys, and that Trp content in particular can distinguish between type 1 and type 2 diabetes mellitus, while glucose metabolism-related proteins (GMRPs) are Leu-rich and Trp-poor. Moreover, codon usage bias depends to a great extent on GC content, consistent with all codons of the highly expressed genes ending in C/G. In particular, the deficit of CpG dinucleotides is largely attributed to the hypermutability of methylated CpGs to UpGs under mutational pressure. Besides the common node, the insulin receptor, there are some similar node proteins, such as a glucose transporter member, protein tyrosine phosphatase, and an adipose metabolism signal protein. The shared proteins are involved in the glucagon, amylin, insulin, PPARγ, angiopoietin, PC-1/ENPP1, and adiponectin-mediated signal pathways. Meanwhile, the gene sequences of the node proteins contain binding sites for 37 transcription factors, which fall into four superclasses. Additionally, the BAD complex can integrate the pathways of glucose metabolism and apoptosis through the BH3 domain of BAD interacting directly with GK, and through GK binding to the consensus motif [G]-[1]-[K]-[2]-[S/T] or [L/M]-[R/K]-[2]-[T] of PP1 or WAVE1. These findings facilitate therapies for diabetes mellitus as well as Alzheimer's disease.
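The abstract above mentions GC content, codons ending in C/G, and a deficit of CpG dinucleotides. The sketch below shows one common way to quantify these on a nucleotide sequence. It is an illustration under assumptions, not the authors' procedure: the function names are hypothetical, and the CpG measure used is the widely used observed/expected ratio, (number of CpG × length) / (number of C × number of G).

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_content(seq):
    """Fraction of codons whose third base is G or C (sequence read in frame 0)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    third_bases = seq[2:usable:3]
    return sum(1 for base in third_bases if base in "GC") / len(third_bases)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio; values well below 1 indicate a CpG deficit."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


# Made-up example sequences (assumption), chosen only to exercise the functions.
print(gc_content("ACGTCGGA"))
print(gc3_content("ATGAAACCC"))
print(cpg_obs_exp("ACGTCGGA"))
```

Real analyses would run these over coding sequences of the proteins of interest. The toy strings here are far too short to say anything biological.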
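Abstract: Protein complexes play a key role in cellular biological processes, and identifying them is essential for understanding cell function and biological processes. Identifying protein complexes by network clustering in protein-protein interaction (PPI) networks has become a research hotspot in data mining and bioinformatics, and various computational methods have been proposed for this task. However, most methods mine dense subgraphs or subnetworks from the original network only, and do not overcome the limitation of conventional graph structures in representing interactions among multiple nodes. To address the many-to-many complex interactions that are common in biological networks, a Protein Complex Identification Method Based on Hypergraph Network Embedding (PCIHNE) is proposed. The algorithm first converts the original PPI network into a hypergraph network, exploiting the ability of hypergraphs to model multi-way relations directly. Second, a hierarchical compression strategy recursively compresses the hypergraph into several smaller hypergraphs at different levels, which builds a multi-scale analysis framework. Third, hypergraph convolution is applied at each level to obtain a representation of every node at each scale, and these representations are concatenated to give the complete node embedding. Based on the node embeddings, a weighted PPI network is constructed on the low-order original network. Finally, a core-attachment strategy is applied to the weighted PPI network to obtain the predicted protein complexes. The proposed algorithm is compared with other protein complex identification algorithms on several real yeast and human datasets, and the experimental results show that it achieves good identification performance in terms of F-measure and Accuracy.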
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 61232001 and 61073036).
Abstract: Evidence shows that biological systems are composed of separable functional modules. Identifying protein complexes is essential for understanding the principles of cellular functions. Many methods have been proposed to mine protein complexes from protein-protein interaction networks. However, these algorithms do not perform well enough, because the protein-protein interactions detected in experiments are incomplete and noisy. This paper presents an analysis of the topological properties of protein complexes, showing that although proteins from the same complex are more highly connected than proteins from different complexes, many protein complexes are not very dense (where very dense means density ≥0.8). A method is then given to mine protein complexes that are relatively dense (density ≥0.4). In the first step, a topological property is used to identify proteins that are probably in the same complex. Then, a possible boundary for the protein complex is calculated based on a minimum vertex cut. The final complex is formed by the proteins within the boundary. The method is validated on a yeast protein-protein interaction network. The results show that this method performs better in terms of sensitivity and specificity than other methods. The functional consistency is also good.
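The density thresholds quoted above (≥0.8 and ≥0.4) can be made concrete with the usual definition of subgraph density for an undirected graph, 2|E| / (|V|(|V| − 1)). The sketch below assumes that definition, which the abstract does not spell out. The edge list and protein names are hypothetical.

```python
def subgraph_density(edges, members):
    """Density of the subgraph induced by `members`: 2|E| / (|V| * (|V| - 1))."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    # Keep each undirected edge once, and only if both endpoints are members.
    inside = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in members and v in members
    }
    return 2.0 * len(inside) / (n * (n - 1))


# Hypothetical toy interaction network (assumption).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]

for candidate in (["a", "b", "c"], ["a", "b", "c", "d"], ["c", "d", "e"]):
    d = subgraph_density(edges, candidate)
    label = "very dense" if d >= 0.8 else "relatively dense" if d >= 0.4 else "sparse"
    print(candidate, round(d, 3), label)
```

Under this definition the triangle a-b-c has density 1.0, while adding d drops the density to about 0.667. That candidate passes the 0.4 threshold but not the 0.8 one.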
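Abstract: Current methods for predicting essential proteins do not analyse biological function in sufficient depth. To address this, an algorithm named RWP (random walk method for predicting essential proteins) is proposed. It uses protein complex information and is based on a random walk model combined with data such as the edge clustering coefficient of the protein interaction network. On the Saccharomyces cerevisiae protein interaction network, RWP is compared with five methods for predicting essential proteins (betweenness centrality, degree centrality, information centrality, the CSC algorithm, and the LIDC algorithm), using five statistical measures as evaluation criteria: sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. The results show that RWP outperforms these five measures in aspects such as the essential-protein identification rate, and that it predicts essential proteins well.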
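The five evaluation measures named above follow from the counts of true and false positives and negatives. The sketch below shows the standard definitions only. The protein identifiers and the predicted set are made up, and nothing here reproduces RWP itself.

```python
def evaluate(predicted, essential, all_proteins):
    """Return sensitivity, specificity, PPV, NPV and accuracy for a prediction."""
    predicted, essential, all_proteins = set(predicted), set(essential), set(all_proteins)
    tp = len(predicted & essential)                  # essential, predicted essential
    fp = len(predicted - essential)                  # non-essential, predicted essential
    fn = len(essential - predicted)                  # essential, missed
    tn = len(all_proteins - predicted - essential)   # non-essential, not predicted

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical data (assumption): ten proteins, four truly essential, three predicted.
all_proteins = {f"p{i}" for i in range(1, 11)}
essential = {"p1", "p2", "p3", "p4"}
predicted = {"p1", "p2", "p5"}

scores = evaluate(predicted, essential, all_proteins)
for name, value in scores.items():
    print(name, round(value, 3))
```

Centrality-based methods usually rank all proteins and take the top-ranked ones as predictions, so in practice `predicted` would be the top k proteins of such a ranking.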