Funding: Supported by the Beijing Municipal Science & Technology Commission (Grant No. Z161100000516021), the National Key R&D Program of China (Grant No. 2016YFC1200804), and the National Natural Science Foundation of China (Grant Nos. 81571956 and 81702038).
Abstract: Identifying antimicrobial-resistant (AMR) bacteria in metagenomic samples is essential for public health and food safety. Next-generation sequencing (NGS) technology has provided a powerful tool for identifying genetic variation and establishing correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, a powerful bioinformatic tool for identifying genetic polymorphisms or copy number variations (CNVs) in given genes is lacking. Here we provide a Bayesian framework for genotype estimation in mixtures of multiple bacteria, named Genetic Polymorphisms Assignments (GPA). Simulation results showed that GPA reduced the false discovery rate (FDR) and mean absolute error (MAE) in CNV and single nucleotide variant (SNV) identification. The framework was validated using whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae with multiple bacteria mixture models, and showed high accuracy in detecting the allele fractions of CNVs and SNVs in AMR genes between two populations. A quantitative study of the changes in AMR gene fractions between two samples showed good consistency with the AMR pattern observed in the individual strains. The framework, together with genome annotation and population comparison tools, has also been integrated into an application that could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.
Funding: Supported by grants from the National Natural Science Foundation of China (Grant No. 31530092) and the Ministry of Science and Technology of China (Grant No. 2015AA020108), awarded to LW.
Abstract: Mosaic variants resulting from postzygotic mutations are prevalent in the human genome and play important roles in human diseases. However, except for cancer-related variants, there is no collection of postzygotic mosaic variants in noncancer disease-related and healthy individuals. Here, we present MosaicBase, a comprehensive database that includes 6698 mosaic variants related to 266 noncancer diseases and 27,991 mosaic variants identified in 422 healthy individuals. Genomic and phenotypic information for each variant was manually extracted and curated from 383 publications. MosaicBase supports querying variants by Online Mendelian Inheritance in Man (OMIM) entries, genomic coordinates, gene symbols, or Entrez IDs. We also provide an integrated genome browser for users to easily access mosaic variants and their related annotations for any genomic region. By analyzing the variants collected in MosaicBase, we find that mosaic variants that directly contribute to disease phenotype show features distinct from those of variants in individuals with mild or no phenotypes, in terms of their genomic distribution, mutation signatures, and fraction of mutant cells. MosaicBase will not only assist clinicians in genetic counseling and diagnosis but also provide a useful resource for understanding the genomic baseline of postzygotic mutations in the general human population. MosaicBase is publicly available at http://mosaicbase.com/ or http://49.4.21.8:8000.