Chitinases catalyze the hydrolysis of chitin, a linear homopolymer of β-(1,4)-linked N-acetylglucosamine. The broad range of applications of chitinolytic enzymes makes their identification and study very promising. Metagenomic approaches offer access to functional genes in uncultured representatives of the microbiota and hold great potential for the discovery of novel enzymes, but tools to explore these data extensively are still scarce. In this study, we develop a chitinase mining pipeline to facilitate the comprehensive search for these enzymes in environmental metagenomic databases and to explore phylogenetic relationships among the retrieved sequences. UniProtKB fungal and bacterial chitinase sequences belonging to glycoside hydrolase (GH) families 18, 19 and 20 were used to generate 15 reference datasets, which were then used to generate high-quality seed alignments with the MAFFT program. Profile hidden Markov models (pHMMs) were built from each seed alignment using the hmmbuild program of the HMMER v3.0 package. The best-hit sequences returned by hmmsearch against two environmental metagenomic databases, Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) and Integrated Microbial Genomes (IMG/M), were retrieved and further analyzed. The neighbor-joining (NJ) trees generated for each chitinase dataset showed some variability in the catalytic domain region of the metagenomic sequences and revealed common sequence patterns among all the trees. Scanning the retrieved metagenomic sequences for conserved chitinase domains/signatures using both the InterPro and RPS-BLAST tools confirmed the efficacy and sensitivity of our pHMM-based approach in detecting putative chitinase sequences. These analyses provide insight into the potential reservoir of novel molecules in metagenomic databases while supporting the chitinase mining pipeline developed in this work. By using our chitinase mining pipeline, a larger number of previously unannotated metagenomic chitinase sequences can be classified, enabling further studies on these enzymes.
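The align, build and search steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the file names, the function names `pipeline_commands` and `parse_tblout`, and the 1e-5 E-value cutoff are all assumptions, since the abstract does not state the options used. The sketch only assembles the command lines (it does not run them) and filters hits from HMMER's tabular `--tblout` output.

```python
from typing import List, NamedTuple


def pipeline_commands(ref_fasta: str, metagenome_fasta: str, prefix: str) -> List[List[str]]:
    """Return argv lists for the three steps; nothing is executed here.

    File naming is hypothetical. MAFFT writes its alignment to stdout, so
    the caller is expected to redirect the first command into the seed file.
    """
    seed = f"{prefix}.aln.fasta"   # seed alignment (MAFFT stdout redirected here)
    hmm = f"{prefix}.hmm"          # profile HMM built from the seed alignment
    tbl = f"{prefix}.tbl"          # per-sequence hit table from hmmsearch
    return [
        ["mafft", "--auto", ref_fasta],
        ["hmmbuild", hmm, seed],
        ["hmmsearch", "--tblout", tbl, hmm, metagenome_fasta],
    ]


class Hit(NamedTuple):
    target: str    # metagenomic sequence name
    query: str     # name of the profile HMM
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse hmmsearch --tblout text and keep hits at or below max_evalue.

    The cutoff is an assumed value chosen for illustration only.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        # Columns: 0 target name, 2 query name, 4 full-sequence E-value, 5 score
        hit = Hit(fields[0], fields[2], float(fields[4]), float(fields[5]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```

In practice each command list could be passed to `subprocess.run`, with the MAFFT output redirected to the seed alignment file before `hmmbuild` is called. One profile would be built per reference dataset, and the surviving hits passed on to tree building and domain scanning.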
Multiple-size units-based acoustic modeling has been proposed for large vocabulary speech recognition systems to improve recognition accuracy with limited training data. By introducing a limited number of long-size units into the unit set, this modeling scheme can achieve better acoustic model precision than modeling with short-size units alone, without losing model trainability. However, such a multiple-size unit acoustic modeling paradigm does not always bring reliable improvement in recognition performance: when a large number of long-size units are added, the amount of training data for short-size units decreases, resulting in insufficiently trained models. In this paper, a modified Baum-Welch training method is proposed, which uses product hidden Markov models (PHMMs) to couple units of different sizes and enables them to share the same portions of training data. Experimental results support the validity of the proposed method.
Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 60605016), the National Key Basic Research Program of China (Nos. 2004CB318005 and 2004CB318105), and the National High Technology Research and Development Program of China (No. 2006AA010103).