Protein-protein interactions (PPIs) are central to understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, the one-core PPI network for CD9, and the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
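The three stages described in this abstract (fused feature vector, L1-RLR feature selection, GTB classification under five-fold cross-validation) can be sketched with scikit-learn. This is a minimal illustration and not the authors' implementation, which is available at the GitHub link above. The synthetic data stands in for the fused PseAAC, PsePSSM, RSIV, and AD features, and the function names, the feature scaling step, and all hyperparameters (`C`, `n_estimators`, the selection threshold) are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_features(seed=0):
    """Synthetic stand-in for a fused feature matrix (hypothetical data)."""
    return make_classification(
        n_samples=300, n_features=40, n_informative=8,
        n_redundant=10, random_state=seed,
    )


def build_pipeline(seed=0):
    """Scaling -> L1-regularized logistic regression selection -> GTB."""
    # Features whose L1 coefficient is shrunk to (near) zero are dropped.
    selector = SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
        threshold=1e-5,
    )
    booster = GradientBoostingClassifier(n_estimators=100, random_state=seed)
    return make_pipeline(StandardScaler(), selector, booster)


def five_fold_accuracy(X, y, seed=0):
    """Per-fold accuracy from stratified five-fold cross-validation."""
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(build_pipeline(seed), X, y, cv=folds,
                           scoring="accuracy")


if __name__ == "__main__":
    X, y = make_toy_features()
    scores = five_fold_accuracy(X, y)
    print("fold accuracies:", scores)
    print("mean accuracy:", scores.mean())
```

Placing the selector inside the pipeline means the feature subset is re-selected on each training fold, so the held-out fold does not influence which features are kept. Accuracies obtained on the synthetic data say nothing about the figures reported in the abstract.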
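To investigate the sequence characteristics of the chicken tumor necrosis factor alpha-induced protein 8-like 1 (TNFAIP8L1) gene, and the effects of folic acid (FA) and methotrexate (MTX) on TNFAIP8L1 expression in chick embryo liver tissue at different developmental stages, 240 Langya chicken breeding eggs incubated to day 7 were divided into a saline group (injected with 0.1 mL saline), an FA group [injected with 0.1 mL FA solution (90 μg FA dissolved in 0.1 mL saline)], an MTX group [injected with 0.1 mL MTX solution (5 μg MTX dissolved in 0.1 mL saline)], and an FA+MTX group [injected with 0.1 mL FA+MTX mixture (90 μg FA + 5 μg MTX dissolved in 0.1 mL saline)], with 4 replicates per group and 15 eggs per replicate. The open reading frame (ORF) of the chicken TNFAIP8L1 gene was cloned by PCR and subjected to bioinformatic analysis, and quantitative real-time PCR was used to analyze the effects of FA and MTX on TNFAIP8L1 expression in the liver at embryonic day 10 (10E), day 13 (13E), day 16 (16E), day 19 (19E), and day 1 after hatching (1D). The results showed that the cloned Langya chicken TNFAIP8L1 gene sequence was 726 bp long, of which the ORF was 561 bp, encoding 186 amino acids. Multiple amino acid sequence alignment showed that the Langya chicken sequence had the highest identity with mallard (97.85%) and the lowest with horse (64.52%). Chicken TNFAIP8L1 is a hydrophilic, stable protein; its predicted secondary structure consists mainly of α-helix (72.04%) and random coil (23.66%), and its tertiary structure is similar to the template 5jxd.1, with a sequence homology of 62.37%. At 10E and 16E, the relative expression of TNFAIP8L1 in the FA and MTX groups was significantly lower than in the saline group (P<0.05); at 19E and 1D, the relative expression in the FA and MTX groups did not differ significantly from the saline group (P>0.05). These results suggest that FA and MTX may inhibit TNFAIP8L1 expression during the early stage of chick embryo development.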
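The reported ORF length and protein length can be checked against each other with simple codon arithmetic. The sketch below assumes that the 561 bp ORF includes the terminal stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def encoded_residues(orf_length_bp: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # Assumption: the stop codon is counted in the ORF length but encodes no residue.
    return codons - 1 if includes_stop_codon else codons


if __name__ == "__main__":
    print(encoded_residues(561))   # 561 / 3 = 187 codons, minus the stop codon
    print(726 - 561)               # cloned bases lying outside the ORF
```

Under that assumption, 561 bp corresponds to 187 codons and 186 residues, which matches the reported protein length, and 165 bp of the 726 bp cloned sequence lie outside the ORF.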
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61863010), the Key Research and Development Program of Shandong Province of China (Grant No. 2019GGX101001), and the Natural Science Foundation of Shandong Province of China (Grant No. ZR2018MC007).