Funding: Supported by the National Key Research and Development Program of China (No. 2016YFD0200-308), the National Key Basic Research Program of China (No. 2015CB150501), and the Project of Priority and Key Areas, Institute of Soil Science, Chinese Academy of Sciences (Nos. ISSASIP1605 and ISSASIP1640).
Abstract: Soil metaproteomics has excellent potential as a tool to elucidate the structural and functional changes in soil microbial communities in response to environmental alterations. However, it is hindered by several challenges and gaps. Soil microbial communities are extremely complex in composition and include many uncultured microorganisms that lack whole-genome sequences. Thus, selecting a suitable protein sequence database remains challenging in soil metaproteomics. In this study, the Public database and the Meta-database were constructed using protein sequences from public databases and metagenomics, respectively. We comprehensively analyzed and compared the soil metaproteomic results obtained with these two kinds of protein sequence databases for protein identification, based on published soil metaproteomic raw data. The results demonstrated that many more proteins, higher sequence coverage, and even more microbial species and functional annotations could be identified using the Meta-database than using the Public database. These findings indicated that the Meta-database was more specific as a protein sequence database. However, the follow-up in-depth metaproteomic analyses yielded similar main results regardless of the database used. The microbial community composition at the genus level was similar between the two databases, especially for the species annotations with high peptide-spectrum match counts and high abundance. The functional analyses in response to stress, such as the gene ontology enrichment of biological process and molecular function terms and the key functional microorganisms, were also similar regardless of the database. Our analysis revealed that the Public database could also meet the demand to explore the functional responses of microbial proteins to some extent. This study provides valuable insights into the choice of protein sequence databases and their impacts on subsequent bioinformatic analysis in soil metaproteomic research and will facilitate the optimization of experimental design for different purposes.
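The abstract does not say how genus-level similarity between the two databases was measured. As one possible illustration, the sketch below computes a Bray-Curtis similarity between two genus-to-PSM-count tables. The function name, the genus names, and every count are hypothetical and are not taken from the study.

```python
def bray_curtis_similarity(a, b):
    """Return 1 minus the Bray-Curtis dissimilarity of two genus -> PSM count tables."""
    genera = set(a) | set(b)
    # Abundance shared by both tables: the smaller count for each genus.
    shared = sum(min(a.get(g, 0), b.get(g, 0)) for g in genera)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 1.0


# Hypothetical PSM counts per genus, invented for illustration only.
public_db = {"Bradyrhizobium": 120, "Streptomyces": 80, "Pseudomonas": 40}
meta_db = {"Bradyrhizobium": 150, "Streptomyces": 90, "Pseudomonas": 35, "Nitrospira": 25}

similarity = bray_curtis_similarity(public_db, meta_db)
print(f"Genus-level similarity: {similarity:.3f}")
```

A value near 1 would indicate that the two databases give nearly the same genus-level profile; genera found with only one database lower the score in proportion to their PSM counts.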
Funding: Supported by the National Social Science Foundation of China (No. 16BGL183).
Abstract: Many high-quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, so their value has not been fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduced the main medical public databases and described the steps, tasks, and models of data mining in simple language. Additionally, we described data-mining methods along with their practical applications. The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology to clinical big data in order to promote the production of research results that are beneficial to doctors and patients.
Funding: This study was supported by grants from the National Natural Science Foundation of China (Nos. 72074161 and 81873197) and the Project of Medical Management Center of National Health Commission, China (No. ZX2022020).
Abstract: To the Editor: Transparency is acknowledged to be crucial for improving interdisciplinary research quality and reproducibility. The International Committee of Medical Journal Editors (ICMJE) has introduced requirements for prospective registration, reporting of results, and data sharing to make trial registration in public databases a standard practice.[1,2] Transparency of systematic reviews can also be evaluated from prospective registration of schemes and/or publicly available protocols. However, no framework exists for evaluating the transparency of clinical practice guidelines (CPGs), and definitions of guideline transparency remain unclear.
Funding: Supported by the National Science and Technology Infrastructure of China (NPRC-32) and the construction of protein fingerprints of rare pathogens (KFYJ-2022-039).
Abstract: Introduction: Streptococcus agalactiae, or group B Streptococcus (GBS), can cause severe infections in humans, yet comprehensive genomic characterization from China remains limited. This study presents an extensive genomic analysis of GBS isolates collected in China from 1998 to 2024. Methods: GBS genomes were obtained from public databases and through de novo sequencing. Serotype confirmation was conducted via pan-genomic analysis, phylogenetic relationships were established using maximum-likelihood methodology, and virulence and antibiotic resistance genes were identified through the Virulence Factor Database and Comprehensive Antibiotic Resistance Database. Statistical analyses were performed using SPSS 26.0, primarily employing Fisher’s exact tests. Results: Analysis of 747 GBS genomes revealed eight serotypes (Ia, Ib, II, III, IV, V, VI, VII) and nontypeable strains. Serotypes III, Ib, Ia, V, and II constituted 96.65% of all isolates. GBS prevalence remained low from 1998 to 2011 but increased substantially after 2012. Geographic distribution demonstrated significant regional heterogeneity. Phylogenetic analysis categorized the 747 genomes into five distinct lineages, with lineage 5 being predominant. Six virulence factor categories encompassing 56 virulence-associated genes were identified, with 33 genes present in nearly all genomes. Twenty-seven antibiotic resistance genes spanning nine drug classes were detected, particularly those conferring resistance to peptides and macrolide antibiotics, indicating widespread antimicrobial resistance mechanisms in GBS. Conclusions: GBS infections in China exhibit serotype distributions similar to global patterns but with notable regional variations. This comprehensive genomic characterization provides critical insights for developing targeted prevention strategies and treatment approaches for GBS infections in China.
Funding: Supported by the Research Project of Maternal and Child Health Hospital of Hubei Province (No. 2023SFYM008) and the Key Project of Hubei Provincial Natural Science Foundation (No. JCZRLH202500304).
Abstract: Objective: ZW10 interacting kinetochore protein (ZWINT) has been demonstrated to play a pivotal role in the growth, invasion, and migration of cancers. Nevertheless, whether the expression levels of ZWINT are significantly correlated with clinicopathological characteristics and prognostic outcomes of patients with breast cancer remains elusive. This study systematically investigated the clinical significance of ZWINT expression in breast cancer through integrated molecular subtyping and survival analysis. Methods: We systematically characterized the spatial expression pattern of ZWINT across various breast cancer subtypes and assessed its prognostic significance using an integrated bioinformatics approach that involved multi-omics analysis. The approach included the Breast Cancer Gene-Expression Miner v5.1 (bc-GenExMiner v5.1), TNMplot, MuTarget, PrognoScan database, and Database for Annotation, Visualization, and Integrated Discovery (DAVID). Results: Our analysis revealed consistent upregulation of ZWINT mRNA and protein expression across distinct clinicopathological subtypes of breast cancer. ZWINT overexpression demonstrated significant co-occurrence with truncating mutations in cadherin 1 (CDH1) and tumor protein p53 (TP53), suggesting potential functional crosstalk in tumor progression pathways. The overexpression of ZWINT correlated with adverse clinical outcomes, showing 48% increased mortality risk (overall survival: HR 1.48, 95% CI 1.23–1.79), 66% higher recurrence probability (relapse-free survival: HR 1.66, 95% CI 1.50–1.84), and 63% elevated metastasis risk (distant metastasis-free survival: HR 1.63, 95% CI 1.39–1.90). Multivariate Cox regression incorporating TNM staging and molecular subtypes confirmed ZWINT as an independent prognostic determinant (P<0.001, Harrell’s C-index=0.7827), which was validated through bootstrap resampling (1000 iterations). Conclusion: ZWINT may serve as a potential biomarker for prognosis and a possible therapeutic target alongside TP53/CDH1 in breast cancer.
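The percentages quoted in the Results follow directly from the hazard ratios: a hazard ratio of 1.48 corresponds to a (1.48 − 1) × 100 = 48% higher hazard. The short sketch below reproduces that arithmetic for the three reported endpoints and checks whether each 95% CI excludes 1. The helper names are illustrative and not part of the study; only the numbers come from the abstract.

```python
# Hazard ratios and 95% confidence intervals as reported in the abstract:
# endpoint -> (hazard ratio, CI lower bound, CI upper bound)
REPORTED = {
    "overall survival": (1.48, 1.23, 1.79),
    "relapse-free survival": (1.66, 1.50, 1.84),
    "distant metastasis-free survival": (1.63, 1.39, 1.90),
}


def percent_increase(hr):
    """Relative increase in hazard implied by a hazard ratio, in percent."""
    return round((hr - 1.0) * 100)


def ci_excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below HR = 1."""
    return lower > 1.0 or upper < 1.0


summary = {
    endpoint: (percent_increase(hr), ci_excludes_one(lower, upper))
    for endpoint, (hr, lower, upper) in REPORTED.items()
}

for endpoint, (pct, excludes_one) in summary.items():
    print(f"{endpoint}: {pct}% higher hazard, 95% CI excludes 1: {excludes_one}")
```

Strictly speaking, a hazard ratio describes a relative instantaneous rate of events rather than a cumulative probability, so "48% increased mortality risk" is best read as a 48% higher hazard of death.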
Funding: Supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.
Abstract: Natural products, historically major resources for drug discovery, have recently been gaining more attention owing to advances in genomic sequencing and other technologies, which make them attractive and amenable to drug candidate screening. Collecting and mining the bioactivity information of natural products is extremely important for accelerating the drug development process by reducing cost. Lately, a number of publicly accessible databases have been established to facilitate access to chemical biology data for small molecules, including natural products. Thus, it is imperative for scientists in related fields to exploit these resources in order to expedite their research on natural products as drug leads/candidates for disease treatment. PubChem, as a public database, contains large amounts of natural products associated with bioactivity data. In this review, we introduce the information system provided at PubChem and systematically describe the applications of a set of PubChem web services for rapid data retrieval, analysis, and downloading of natural products. We hope this work can serve as a starting point for researchers to perform data mining on natural products using PubChem.
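The abstract does not name the specific web services it covers. As one hedged illustration of programmatic retrieval, the sketch below builds a request URL for PubChem's PUG REST interface, looking up a compound by name and asking for two properties as JSON. The choice of PUG REST, the helper names, and the example compound (quercetin) are assumptions made here, not details from the review.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(name, properties=("MolecularFormula", "MolecularWeight")):
    """Build a PUG REST URL that returns the given properties of a named compound as JSON."""
    encoded_name = quote(name, safe="")  # escape spaces and slashes in compound names
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def fetch_properties(name):
    """Download and parse the property table (needs network access; not called here)."""
    with urlopen(compound_property_url(name), timeout=30) as response:
        return json.load(response)


# Quercetin is used only as an example of a natural product.
url = compound_property_url("quercetin")
print(url)
```

Building the URL separately from fetching it keeps the example checkable offline; calling `fetch_properties("quercetin")` would perform the actual request.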