Funding: Supported by the National Natural Science Foundation of China (32470676 and 32170236), the Central Guidance on Local Science and Technology Development Fund of Hebei Province (246Z2508G), the Hebei Natural Science Foundation (C2020209064), the Tangshan Science and Technology Program Project (21130217C), and the Key Research Project of North China University of Science and Technology (ZD-YG-202313-23).
Abstract: Oleaceae is a high-value eudicot family, and the genomes of many of its well-known horticultural crops have been deciphered. However, there is currently no bioinformatics platform focused on empowering genome research in Oleaceae. Herein, we developed the first comprehensive Oleaceae Genome Research Platform (OGRP, https://oleaceae.cgrpoee.top/). OGRP provides 70 genomes of 10 Oleaceae species and 46 eudicots, together with 366 transcriptomes covering 18 Oleaceae plant tissues. We built 34 window-operated bioinformatics tools, collected 38 professional practical software programs, and proposed 3 new pipelines, namely ancient polyploidization identification, ancestral karyotype reconstruction, and gene family evolution. Employing these pipelines to reanalyze the Oleaceae genomes, we clarified the polyploidization events, reconstructed the ancestral karyotypes, and explored the effects of paleogenome evolution on genes with specific biological regulatory roles. Significantly, we generated a series of comparative genomic resources focusing on the Oleaceae, comprising 108 genomic synteny dot plots, 1,952,225 collinear gene pairs, multiple genome alignments, and imprints of paleochromosome rearrangements. Moreover, in Oleaceae genomes, researchers can efficiently search for 1,785,987 functional annotations, 22,584 orthogroups, 29,582 important trait genes from 74 gene families, 12,664 transcription factor-related genes, 9,178,872 transposable elements, and all involved regulatory pathways. In addition, we provide downloads and usage instructions for the tools, a species encyclopedia, ecological resources, relevant literature, and external database links. In short, OGRP integrates rich data resources and powerful analytical tools and is continuously updated, which can efficiently empower genome research and agricultural breeding in Oleaceae and other plants.
Funding: Supported by the National Natural Science Foundation of China (31971865), the Zhejiang Natural Science Foundation (LZ17C130001), the Innovation Method Project of China (2018IM0301002), and the Jiangsu Collaborative Innovation Center for Modern Crop Production.
Abstract: Rice is a cereal crop and a model species for monocots. Since the release of the first draft rice genome sequences in 2002, considerable progress has been achieved in rice genomic research, thanks to the rapid development and efficient utilization of bioinformatics methods and tools. In this review, we summarize the progress of studies on rice genome sequencing and other omics, and introduce the well-maintained bioinformatics databases and tools developed for rice genome resources and breeding. After reviewing the history of rice bioinformatics, we use single-cell sequencing and machine learning as examples to show how bioinformatics integrates emerging technologies and how it continues to develop for future rice research.
Abstract: Polyploidy is common among agriculturally important crops. Popular genetic methods and their implementations cannot always be applied to polyploid genetic data. We give an overview of the available tools and their limitations in terms of ploidy level and of auto- and allopolyploidy. The main classes of tools are genotype calling, linkage mapping, and haplotyping. The usability of the tools is discussed with a focus on their applicability to data sets produced by state-of-the-art technologies. We show that many challenges remain before the toolset for polyploids provides functionality similar to that already available for diploids. Some tools were developed over a decade ago and are now outdated. In addition, we discuss the steps necessary to overcome this shortage in the future.
Abstract: The massive growth in biological data has created a need for user-friendly bioinformatics tools that can be used for routine biological data manipulation. Bioanalyzer is a simple analytical software package that implements a variety of tools for common analyses of different biological data types and databases. It covers general aspects of data analysis such as handling nucleotide data, fetching information in different data formats, NGS quality control, data visualization, multiple sequence alignment, and sequence BLAST. These tools accept common biological data formats and produce human-readable output files that can be stored on local machines. Bioanalyzer has a user-friendly graphical user interface that simplifies the analysis of massive biological data while consuming less memory and processing power. The Bioanalyzer source code was written in Python, which provides low memory usage and a short initial startup time. Bioanalyzer is free and open-source software whose code can be modified, extended, or integrated into different bioinformatics pipelines. Bioinformatics produces huge volumes of data in FASTA and GenBank formats, from which a great deal of annotation information can be derived. Python's flexibility and simplicity in data analysis make it well suited to building bioinformatics tools, which inspired us to develop a new multi-tool software package able to manipulate FASTA and GenBank files. The goal was to develop software that uses genomic data files to produce annotated data. The software was written in Python using the Biopython package.
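The kind of routine nucleotide handling described in the Bioanalyzer abstract can be illustrated with a minimal sketch in plain Python. This is not Bioanalyzer's code and does not use Biopython; the function names `parse_fasta` and `gc_content` and the example records are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping.

    Hypothetical helper for illustration; real tools would normally rely
    on an established parser instead.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; join them and normalise case.
            records[current_id] += line.upper()
    return records


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in an uppercase sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Made-up example input, not data from the paper.
example = """>seq1 demo record
ATGC
GGCC
>seq2
attta
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq), round(gc_content(seq), 2))
```

Keeping the parser independent of any file handle means the same function works on an open file, a list of strings, or a network stream.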
Abstract: In this editorial preface, I briefly review cancer bioinformatics and introduce the four articles in this special issue, which highlight important applications of the field: detection of chromatin states; detection of SNP-containing motifs and their association with transcription factor-binding sites; improvements in functional enrichment modules; and gene association studies on aging and cancer. We expect this issue to provide bioinformatics scientists, cancer biologists, and clinical doctors with a better understanding of how cancer bioinformatics can be used to identify candidate biomarkers and targets and to conduct functional analysis.
Funding: Supported by grants from the Wuhan University Medical Faculty Innovation Seed Fund Cultivation Project (No. TFZZ2018025), the Xiao-ping CHEN Foundation for the Development of Science and Technology of Hubei Province (No. CXPJJH12000001-2020313), and the National Natural Science Foundation of China (No. 81670123 and No. 81670144).
Abstract: Objective: This study aims to investigate the expression, prognostic value, and function of kinesin superfamily 4A (KIF4A) in cervical cancer. Methods: Cervical cancer cell lines (HeLa and SiHa) and TCGA data were used for experimental and bioinformatic analyses. Overall survival (OS) and progression-free survival (PFS) were compared between patients with high and low KIF4A expression. Copy number variation (CNV) and somatic mutations were visualized, and GISTIC 2.0 was used to identify significantly altered sites. The function of KIF4A was also explored by transcriptome analysis and validated experimentally. Chemotherapeutic and immunotherapeutic benefits were inferred using multiple reference databases and algorithms. Results: Patients with high KIF4A expression had better OS and PFS. KIF4A could inhibit proliferation and migration and induce G1 arrest in cervical cancer cells. A higher CNV load was observed in patients with low KIF4A expression, while the group with low KIF4A expression displayed more significantly altered sites. A total of 13 genes, including NOTCH1 and PUM1, were mutated more frequently in the low KIF4A expression group. The analysis revealed that low KIF4A expression may indicate an immune escape phenotype, and patients in this group may benefit more from immunotherapy. With respect to chemotherapy, patients with high KIF4A expression may respond better to cisplatin and gemcitabine, while patients with low KIF4A expression may respond better to 5-fluorouracil and other agents. Conclusion: KIF4A is a tumor suppressor gene in cervical cancer and can be used as a prognostic and therapeutic biomarker.
Abstract: Severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2 are thought to be transmitted to humans via wild mammals, especially bats. However, evidence for direct bat-to-human transmission is lacking. The involvement of intermediate hosts is considered a reason for SARS-CoV-2 transmission to humans and the emergence of the outbreak. Tropical territories such as Brazil harbor great biodiversity. Along these lines, this study aimed to predict potential coronavirus hosts among Brazilian wild mammals based on angiotensin-converting enzyme 2 (ACE2) sequences using evolutionary bioinformatics. The cougar, maned wolf, and bush dog were predicted as potential hosts for coronavirus. These indigenous carnivores are phylogenetically closer to the known SARS-CoV/SARS-CoV-2 hosts and presented low ACE2 divergence. A new coronavirus transmission chain was developed in which the white-tailed deer, a susceptible SARS-CoV-2 host, occupies the central position. The cougar plays an important role because of the low divergence of its ACE2 from that of deer and humans. The discovery of these potential coronavirus hosts will be useful for epidemiological surveillance and for developing interventions that can help break the transmission chain.
Funding: Supported by grants CNTC-110202101039(JY-16) and YNTC-2022530000241008.
Abstract: Bioinformatics analysis often requires filtering multiple datasets by frequency or occurrence to decide which entries to retain or delete. Existing tools for this purpose are often difficult to install or require custom coding, which impedes efficient data processing. To address this issue, Filterx, a user-friendly command-line tool written in C, was developed to support multi-condition filtering based on frequency or occurrence. The tool enables users to complete data processing tasks with a simple command line, greatly reducing both workload and data processing time. In addition, future development of this tool could facilitate its integration into various bioinformatics data analysis pipelines.
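The idea of occurrence-based filtering across multiple datasets can be sketched in a few lines of Python. This is only an illustration of the general technique under simple assumptions; it is not Filterx's implementation (which is written in C) and does not reflect its command-line interface. The function name `filter_by_occurrence`, the `min_datasets` parameter, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def filter_by_occurrence(
    datasets: Dict[str, Iterable[str]], min_datasets: int
) -> List[str]:
    """Keep items that occur in at least `min_datasets` of the datasets.

    Each item is counted at most once per dataset, so repeated entries
    inside a single dataset do not inflate its occurrence count.
    """
    counts: Counter = Counter()
    for items in datasets.values():
        counts.update(set(items))  # de-duplicate within each dataset
    return sorted(item for item, n in counts.items() if n >= min_datasets)


# Made-up example data, not taken from the paper.
datasets = {
    "sample_a": ["geneA", "geneB", "geneC"],
    "sample_b": ["geneB", "geneC", "geneC"],
    "sample_c": ["geneC", "geneD"],
}

kept = filter_by_occurrence(datasets, min_datasets=2)
print(kept)
```

Counting presence per dataset rather than raw frequency is one possible reading of "occurrence"; a frequency-based variant would simply drop the `set(...)` call.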
Funding: Supported in part by the Clinical and Translational Science Award (Grant No. UL1TR001117) to Duke University from the National Institutes of Health (NIH), United States.
Abstract: Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. The development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline's brief history and past accomplishments as well as its current foci, and concludes with predictions of future directions in the field.
Abstract: Realizing personalized medicine requires integrating diverse data types with bioinformatics. At present, the most vital data are individuals' genomic information generated by advanced next-generation sequencing (NGS) technologies. These technologies continue to advance in terms of both decreasing cost and sequencing speed, with a concomitant increase in the amount and complexity of the data. The prodigious data, together with the computational pipelines required for data analysis and interpretation, strain both the IT infrastructure and the scientists conducting the work. Bioinformatics is increasingly becoming the rate-limiting step, with numerous challenges to be overcome in translating NGS data for personalized medicine. We review some key bioinformatics tasks, issues, and challenges in the contexts of IT requirements, data quality, analysis tools and pipelines, and validation of biomarkers.
Abstract: Natural products are among the most important sources of lead molecules for drug discovery. With the development of affordable whole-genome sequencing technologies and other 'omics tools, the field of natural products research is currently undergoing a paradigm shift. While for decades mainly analytical and chemical methods gave access to this group of compounds, genomics-based methods now offer complementary approaches to find, identify, and characterize such molecules. This paradigm shift has also resulted in a high demand for computational tools to assist researchers in their daily work. In this context, this review summarizes the tools and databases currently available to mine, identify, and characterize natural product biosynthesis pathways and their producers based on 'omics data. A web portal called the Secondary Metabolite Bioinformatics Portal (SMBP at http://www.secondarymetabolites.org) is introduced to provide a one-stop catalog of, and links to, these bioinformatics resources. In addition, an outlook is presented on how the existing tools and those yet to be developed will influence synthetic biology approaches in the natural products field.
Abstract: In the first 2017 issue of this journal, Genomes, Proteomes and Bioinformatics, a special database article entitled "GSA: Genome Sequence Archive" is published. This article provides a brief introduction to the platform developed by the authors from the BIG Data Center (BIGD) of the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS). The aim of the GSA project is to collect, integrate, and archive raw sequence data submitted by domestic and international users. It is one of the major activities being carried out by a team of around 50 young bioinformaticians at BIGD. In addition to the GSA system, they are also working on several bioinformatics service-oriented projects, as described in one of their recent publications.
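Abstract: Zona pellucida glycoproteins (Zps) enriched in the adhesive egg envelope of the rare minnow (Gobiocypris rarus) were identified by proteome sequencing of the egg envelope, and rare minnow zp gene mutants and transgenic zebrafish (Danio rerio) were established for research on egg types. Rare minnow embryos were collected at 1 hour post fertilization (hpf), a sufficient quantity of egg envelopes was isolated, and the envelope proteins were extracted for sequencing analysis. The sequencing data were assembled and aligned against cyprinid fish proteins, and the genes with the highest match scores and abundance, zp2l2 and zp4l, were selected for experiments. CRISPR/Cas9 knockout target sites were designed in the first exon of zp2l2 and the sixth exon of zp4l, respectively, to construct zp gene mutants. The upstream promoter sequence of the zebrafish zp3.2 gene (2874 base pairs, bp) and the coding sequences of zp2l2 and zp4l (1137 and 3009 bp) were amplified. The zp3.2 promoter was ligated into a plasmid carrying Tol2 transposable elements by seamless cloning (Infusion cloning) to yield plasmid pT2AL-zp3.2-eGPF; the zp2l2 and zp4l coding sequences were then cloned into pT2AL-zp3.2-eGPF to obtain the transgenic plasmids pT2AL-zp3.2-zp2l2-eGPF and pT2AL-zp3.2-zp4l-eGPF for embryo injection. Transposase mRNA synthesized by in vitro transcription was mixed with the constructed transgenic plasmids and microinjected into zebrafish embryos. Through microinjection of rare minnow embryos and identification over three generations, zp2l2 and zp4l were successfully knocked out in the rare minnow, yielding stably inherited homozygous mutants. Zebrafish carrying the rare minnow zp transgenes were identified across generations, and the stably inherited transgenic lines Tg(zp3.2:zp2l2-eGPF) and Tg(zp3.2:zp4l-eGPF) were obtained; in both lines, green fluorescent protein began to be expressed in the ovary and liver at 15 days post fertilization (dpf). The methods used here to construct zp2l2 and zp4l rare minnow mutants and transgenic zebrafish provide a feasible experimental approach and technical route for studying the mechanisms of egg-type formation in cyprinid fishes.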
Funding: Supported by the National Key R&D Program of China (2022YFC2704304 and 2021YFF0702000), the National Natural Science Foundation of China (32341020 and 32341021), the Hubei Innovation Group Project (2021CFA005), and the Research Core Facilities for Life Science (HUST).
Abstract: Word cloud visualization is a compelling graphical representation that visually depicts the frequency of words within a given text or dataset [1]. Research on word clouds focuses on two main aspects. The first emphasizes processing words, such as using the latent Dirichlet allocation (LDA) algorithm to uncover topics in the documents [2], while the second involves visual impact through striking word arrangements [3,4]. In the realm of extensive biomedical data, effective knowledge delivery to biologists is crucial.
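The word-frequency counts that a word cloud renders can be computed with a short Python sketch. This shows only the counting step, under simple assumptions (lowercase ASCII tokens, a user-supplied stopword set); it is not the method of the cited work, and the function name `word_frequencies` and the sample sentence are hypothetical.

```python
import re
from collections import Counter
from typing import AbstractSet


def word_frequencies(text: str, stopwords: AbstractSet[str] = frozenset()) -> Counter:
    """Count how often each word occurs in `text`, ignoring stopwords.

    Tokens are runs of ASCII letters, compared case-insensitively.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Made-up example sentence for illustration.
text = "Gene expression data and gene networks; expression of the gene."
freqs = word_frequencies(text, stopwords={"and", "of", "the"})

# In a word cloud, the most frequent words are drawn largest.
print(freqs.most_common(2))
```

A rendering library would then map each count to a font size; that layout step is outside the scope of this sketch.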