Funding: Supported by the National Natural Science Foundation of China (32260097) and the National Guidance of Local Science and Technology Development Fund of China ([2023]009).
Abstract: Ericaceae is a diverse family of flowering plants distributed nearly worldwide. It includes 126 genera and more than 4,000 species. In the present study, we developed The Ericaceae Genome Resource (TEGR, http://www.tegr.com.cn) as a comprehensive, user-friendly, web-based functional genomic database based on 16 published genomes from 16 Ericaceae species. The TEGR database contains information on many important functional genes, including 763 auxin genes, 2,407 flowering genes, 20,432 resistance genes, 617 anthocyanin-related genes, and 470 N⁶-methyladenosine (m⁶A) modification genes. We identified a total of 599,174 specific CRISPR guide sequences in the TEGR database. Gene duplication events, synteny, and orthologous relationships among the 16 Ericaceae species were analyzed using the TEGR database. The TEGR database contains 614,821 functional genes annotated using the GO, Nr, Pfam, TrEMBL, and Swiss-Prot databases. It provides Primer Design, Hmmsearch, Synteny, BLAST, and JBrowse tools to help users perform comprehensive comparative genome analyses. All high-quality reference genome sequences, genomic features, gene annotations, and bioinformatics results can be downloaded from the TEGR database. In the future, we will continue to update the TEGR database with the latest data sets as they become available, so that it remains a useful resource for comparative genomic studies.
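The abstract reports 599,174 specific CRISPR guide sequences but does not say how they were called. As a rough illustration only, the sketch below assumes the common SpCas9 design rule. Under that rule, a guide is a 20-nt protospacer immediately 5' of an NGG PAM, searched on both strands. The sketch also treats "specific" as "occurs exactly once among all candidate guides." A production guide-design pipeline would normally also score off-target sites with mismatches, which this exact-match filter ignores. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical helpers; the SpCas9 NGG PAM and 20-nt guide length are assumptions.
Hit = Tuple[str, int, str, str]  # (strand, forward-strand start, guide, PAM)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N maps to N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_guides(seq: str, guide_len: int = 20) -> List[Hit]:
    """Return every guide_len-nt protospacer followed by an NGG PAM on either strand.

    The start coordinate is 0-based and always refers to the leftmost base of the
    protospacer on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits: List[Hit] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the first base of the 3-nt PAM; the guide occupies s[i - guide_len:i].
        for i in range(guide_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] != "GG":
                continue
            guide = s[i - guide_len:i]
            # On the minus strand, map the reverse-complement index back to forward coordinates.
            start = i - guide_len if strand == "+" else n - i
            hits.append((strand, start, guide, pam))
    return hits


def unique_guides(hits: List[Hit]) -> List[Hit]:
    """Crude specificity filter: keep guides whose exact sequence occurs only once."""
    counts = Counter(guide for _, _, guide, _ in hits)
    return [h for h in hits if counts[h[2]] == 1]
```

For example, `find_spcas9_guides("A" * 20 + "TGG")` returns a single forward-strand hit at position 0 with PAM `TGG`. Running `unique_guides` over hits collected from a whole genome would then discard any guide sequence that appears at more than one site.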
Funding: Supported in part by the National Natural Science Foundation of China (Grant Nos. 62072243 and 61772273 to Dong-Jun Yu), the Natural Science Foundation of Jiangsu, China (Grant No. BK20201304 to Dong-Jun Yu), the Foundation of National Defense Key Laboratory of Science and Technology, China (Grant No. JZX7Y202001SY000901 to Dong-Jun Yu), the China Scholarship Council (Grant No. 201906840041 to Yi-Heng Zhu), the National Institute of Environmental Health Sciences, USA (Grant No. P30ES017885 to Gilbert S. Omenn), the National Cancer Institute, USA (Grant No. U24CA210967 to Gilbert S. Omenn), the National Institute of General Medical Sciences, USA (Grant Nos. GM136422 and S10OD026825 to Yang Zhang), the National Institute of Allergy and Infectious Diseases, USA (Grant No. AI134678 to Peter L. Freddolino and Yang Zhang), and the National Science Foundation, USA (Grant Nos. IIS1901191, DBI2030790, and MTM2025426 to Yang Zhang). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation, USA (Grant No. ACI1548562).
Abstract: Gene Ontology (GO) has been widely used to annotate the functions of genes and gene products. Here, we propose a new method, TripletGO, to deduce GO terms of protein-coding and noncoding genes. It integrates four complementary pipelines built on transcript expression profiles, genetic sequence alignment, protein sequence alignment, and naïve probability. TripletGO was tested on a large set of 5754 genes from 8 species (human, mouse, Arabidopsis, rat, fly, budding yeast, fission yeast, and nematode). It was also tested on 2433 proteins with available expression data from the third Critical Assessment of Protein Function Annotation challenge (CAFA3). Experimental results show that TripletGO achieves significantly higher function annotation accuracy than current state-of-the-art approaches. Detailed analyses show that the major advantage of TripletGO lies in coupling a new triplet network-based profiling method with a feature space mapping technique, which can accurately recognize function patterns from transcript expression profiles. Meanwhile, combining multiple complementary models, especially those based on transcript expression and protein-level alignments, improves the coverage and accuracy of the final GO annotation results. The standalone package and an online server of TripletGO are freely available at https://zhanggroup.org/TripletGO/.
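The abstract credits much of TripletGO's accuracy to a triplet network that embeds transcript expression profiles. The architecture is not described here, so the sketch below illustrates only two things. The first is the standard triplet margin loss that such networks are trained with, max(d(a, p) - d(a, n) + margin, 0). The second is a hypothetical nearest-neighbour step that copies GO terms from the closest reference gene in the learned space. The Euclidean distance, the margin of 1.0, the choice of positives and negatives, and all function names are assumptions, not the published TripletGO settings.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Standard triplet loss: max(d(anchor, positive) - d(anchor, negative) + margin, 0).

    Assumption: the positive shares GO terms with the anchor gene and the negative
    does not. The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def nearest_neighbour_go(query: Vector,
                         references: List[Tuple[Vector, Set[str]]]) -> Set[str]:
    """Hypothetical annotation step: return the GO terms of the closest reference embedding."""
    _, terms = min(references, key=lambda ref: euclidean(query, ref[0]))
    return terms
```

With anchor (0, 0) and positive (0, 1), a distant negative at (3, 4) gives zero loss. A "hard" negative at (0, 1.5) gives a loss of 0.5, so training would push it farther away.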
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 30570424, 60601010, and 30600367), the National High-Tech Research and Development Program of China (Grant No. 2007AA02Z329), the Key Science and Technology Program of Heilongjiang Province (Grant No. GB03C602-4), the Natural Science Foundation of Heilongjiang Province (Grant No. F2008-02), the Youth Science Foundation of Harbin Medical University (Grant No. 060045), and the Science Foundation of Heilongjiang Province Education Department (Grant Nos. 11531113 and 1152hq28).
Abstract: High-throughput single nucleotide polymorphism (SNP) detection technology and existing knowledge provide strong support for mining disease-related haplotypes and genes. In this study, we first apply four haplotype block identification methods (Confidence Intervals, the Four Gamete Test, Solid Spine of LD, and a fusion method for haplotype blocks) to high-throughput SNP genotype data to identify blocks. We then use cluster analysis to verify the effectiveness of the four methods and select alcoholism-related SNP haplotypes through risk analysis. Second, we establish a mapping from haplotypes to alcoholism-related genes. Third, we query the NCBI SNP and Gene databases to locate the blocks and identify candidate genes. Finally, we annotate gene functions using the KEGG, BioCarta, and GO databases. We find 159 haplotype blocks on chromosomes 1 to 22 that are most likely related to alcoholism. These blocks comprise 227 haplotypes, of which 102 SNP haplotypes may increase the risk of alcoholism. We obtain 121 alcoholism-related genes and verify their reliability through biological functional annotation. In summary, we combine our novel strategies for mining alcoholism-related haplotypes and genes with the existing knowledge framework. This combination lets us not only handle SNP data easily but also locate disease-related genes precisely.
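Of the four block-definition methods listed, the Four Gamete Test is the simplest to state. It looks at the four two-locus haplotypes (gametes) of a SNP pair. If all four occur at or above a small frequency (1% is a commonly used cutoff), historical recombination is assumed between the SNPs, and a block boundary falls between them. The sketch below assumes already-phased haplotypes coded 0/1 and a greedy left-to-right block extension. It illustrates the technique only and is not the authors' implementation; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

# The four possible two-locus gametes for biallelic SNPs coded 0/1.
GAMETES = list(product((0, 1), repeat=2))


def has_four_gametes(haps: Sequence[Sequence[int]], i: int, j: int,
                     min_freq: float = 0.01) -> bool:
    """True if every gamete at SNPs i and j reaches min_freq, i.e. the
    Four Gamete Test infers historical recombination between them."""
    counts = Counter((h[i], h[j]) for h in haps)
    n = len(haps)
    return all(counts[g] / n >= min_freq for g in GAMETES)


def four_gamete_blocks(haps: Sequence[Sequence[int]],
                       min_freq: float = 0.01) -> List[Tuple[int, int]]:
    """Greedily split consecutive SNPs into blocks in which no SNP pair shows
    all four gametes. Returns inclusive (start, end) SNP indices; single-SNP
    blocks are kept rather than discarded (an assumption of this sketch)."""
    n_snps = len(haps[0])
    blocks: List[Tuple[int, int]] = []
    start = 0
    for j in range(1, n_snps + 1):
        # Close the current block at the end of the data, or when the new SNP j
        # shows four gametes with any SNP already in the block.
        if j == n_snps or any(has_four_gametes(haps, k, j, min_freq)
                              for k in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    return blocks
```

For the four phased haplotypes `(0,0,0,1)`, `(0,0,1,0)`, `(1,1,0,0)`, and `(1,1,1,1)`, SNPs 0 and 1 show only two gametes and stay in one block. SNP 2 shows all four gametes with SNP 0, so it starts a new block. With the default cutoff, the result is `[(0, 1), (2, 2), (3, 3)]`.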