Funding: Supported by the Third Xinjiang Scientific Expedition Program, the National Key Research and Development Program of China (grant 2022xjkk020603), the Key Research and Development Project of Xinjiang Uygur Autonomous Region of China (grants 2023B02034 and 2023B02034-2), the National Natural Science Foundation of China (grants U2003305 and 31860018), and the Tianshan Young Top Talents-Basic Research Talents (2024TSYCJU0002).
Abstract: Microbial community studies have established the pivotal catalytic roles of enzymes in ecosystem metabolism, yet cultivation-dependent methods cannot exploit the enzyme resources of uncultured microbes. Metagenomics overcomes this by directly accessing microbial genetic information, but the massive volume of data it generates makes precise enzyme identification difficult, owing to (1) restricted applicability across varied sample types and (2) a narrow functional scope in target enzyme discovery. To address these limitations, we developed Gene Surfing, a bioinformatics workflow platform based on Snakemake. It integrates modules for data quality control (Fastp), genome assembly (MEGAHIT), assembly evaluation (QUAST and MetaQUAST), functional annotation (Prokka), and homologous sequence retrieval (MMseqs2). Gene Surfing offers scalability, reproducibility, and efficiency, addressing key challenges in enzyme identification. Validation results are as follows. Cellulose-degrading enzymes (GH5 family): 1,311,316 potential lignocellulolytic enzyme sequences were identified, and 127 sequences were functionally validated (84.25% activity rate). Polyethylene-degrading enzymes: 705 candidate sequences were found, 38 of which were heterologously expressed, showing an 81.5% activity rate (31/38). Endonucleases (HNH superfamily): 585 potential sequences were retrieved, and 4 of the 7 tested showed activity (57.1% success rate).
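Snakemake infers execution order from the input and output files of its rules. The sketch below does not reproduce Gene Surfing's Snakefile. It is a plain-Python dry run that wires the five stages named above into a dependency graph and prints one illustrative command per tool in an order that respects those dependencies. The sample name `soil01`, the target database `db/gh5_refs.fasta`, and the directory layout are hypothetical. Evaluation is shown with MetaQUAST only, and the flags are standard options of each tool, not Gene Surfing's actual parameters.

```python
"""Dry-run sketch of a Gene Surfing-style stage chain; sample names, paths,
and the target database are hypothetical placeholders."""
from graphlib import TopologicalSorter


def build_stages(sample: str, target_db: str) -> dict[str, dict]:
    """Map each stage to its upstream stages and an illustrative shell command."""
    raw1, raw2 = f"raw/{sample}_R1.fq.gz", f"raw/{sample}_R2.fq.gz"
    qc1, qc2 = f"qc/{sample}_R1.fq.gz", f"qc/{sample}_R2.fq.gz"
    asm_dir = f"assembly/{sample}"
    contigs = f"{asm_dir}/final.contigs.fa"  # MEGAHIT's default contig file
    ann_dir = f"annotation/{sample}"
    proteins = f"{ann_dir}/{sample}.faa"  # Prokka names its outputs <prefix>.*
    return {
        "qc": {"deps": [],
               "cmd": f"fastp -i {raw1} -I {raw2} -o {qc1} -O {qc2}"},
        "assembly": {"deps": ["qc"],
                     "cmd": f"megahit -1 {qc1} -2 {qc2} -o {asm_dir}"},
        "evaluation": {"deps": ["assembly"],
                       "cmd": f"metaquast.py {contigs} -o quast/{sample}"},
        "annotation": {"deps": ["assembly"],
                       "cmd": f"prokka --metagenome --outdir {ann_dir} "
                              f"--prefix {sample} {contigs}"},
        "homology": {"deps": ["annotation"],
                     "cmd": f"mmseqs easy-search {proteins} {target_db} "
                            f"hits/{sample}.m8 tmp"},
    }


def execution_order(stages: dict[str, dict]) -> list[str]:
    """Order stages so that each one comes after every stage it depends on."""
    graph = {name: set(spec["deps"]) for name, spec in stages.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    stages = build_stages("soil01", "db/gh5_refs.fasta")
    for name in execution_order(stages):
        # Print only; nothing is executed, similar in spirit to a dry run.
        print(f"[{name}] {stages[name]['cmd']}")
```

Running the script only prints the commands, much like a Snakemake dry run (`snakemake -n`). In a real Snakefile each stage would be a rule whose `input:` and `output:` files encode these same edges, and Snakemake would derive the order itself.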
Funding: Supported by funding from the National Institutes of Health, United States (Grant Nos. 2PO1CA163227 and P01CA250959).
Abstract: Chromatin immunoprecipitation sequencing (ChIP-seq) and the Assay for Transposase-Accessible Chromatin with high-throughput sequencing (ATAC-seq) have become essential technologies for measuring protein–DNA interactions and chromatin accessibility. However, there is a need for a scalable and reproducible pipeline that incorporates proper normalization between samples, correction for copy number variations, and integration of new downstream analysis tools. Here we present the Containerized Bioinformatics workflow for Reproducible ChIP/ATAC-seq Analysis (CoBRA), a modular computational workflow that quantifies ChIP-seq and ATAC-seq peak regions and performs unsupervised and supervised analyses. CoBRA provides a comprehensive, state-of-the-art ChIP-seq and ATAC-seq analysis pipeline that can be used by scientists with limited computational experience. It enables researchers to gain rapid insight into protein–DNA interactions and chromatin accessibility through sample clustering, differential peak calling, motif enrichment, comparison of sites to a reference database, and pathway analysis. CoBRA is publicly available online at https://bitbucket.org/cfce/cobra.