Funding: Supported by grants from the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX2-EW-R-01-04); the Natural Science Foundation of China (Grant Nos. 90919024 and 30900831); the Ministry of Science and Technology of China as the National Science and Technology Key Project (Grant No. 2008ZX10004-013); the Special Foundation Work Program (Grant No. 2009FY120100); and the National Basic Research Program (Grant No. 2011CB944100).
Abstract: Although strand-biased gene distribution (SGD) was described some two decades ago, its underlying molecular mechanisms and their relationships remain elusive. Its facets include, but are not limited to, the degree of bias, the strand preference of genes, and the influence of variation in background nucleotide composition. Using a dataset of 364 non-redundant bacterial genomes, we sought to illustrate our current understanding of SGD. First, when we divided the bacterial genomes into non-polC and polC groups according to their possession of DnaE isoforms, which correlate closely with taxonomy, the SGD of the polC group was more pronounced than that of the non-polC group. Second, when we examined horizontal gene transfer together with gene functional conservation (essentiality) and expressivity (level of expression), we found that all three contributed to SGD. Third, we demonstrated a weaker G-dominance on the leading strand in the non-polC group but a strong purine dominance (both G and A) on the leading strand in the polC group. We propose that strand-biased nucleotide composition plays a decisive role in SGD, since polC-bearing genomes are not only AT-rich but also have pronounced purine-rich leading strands. We believe that two factors are both at work: a special mutation spectrum that leads to strong purine asymmetry and a strongly strand-biased nucleotide composition, and functional selection on genes and their functions.
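The strand asymmetries described in this abstract (G-dominance and purine dominance on the leading strand) can be illustrated with a minimal Python sketch. It assumes a single strand given as a plain string; the function name `strand_skews` and the two normalised measures are illustrative choices, not the authors' published procedure.

```python
from collections import Counter


def strand_skews(seq: str) -> dict:
    """Return GC skew and purine excess for one DNA strand.

    Hypothetical helper for illustration only:
      gc_skew       = (G - C) / (G + C)
      purine_excess = ((A + G) - (C + T)) / (A + C + G + T)
    Characters other than A, C, G, T are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    total = a + c + g + t
    purine_excess = ((a + g) - (c + t)) / total if total else 0.0
    return {"gc_skew": gc_skew, "purine_excess": purine_excess}
```

Positive values on a leading strand would indicate G-dominance and purine dominance, respectively; comparing the two measures between genome groups is one way to express the contrast the abstract reports.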
Funding: Supported by the National Natural Science Foundation of China (Nos. 31372509, 41328009) and the National Science Foundation for Young Scientists of China (No. 41106095).
Abstract: Copepods are among the most abundant and successful metazoans in the marine ecosystem. However, genomic resources related to fundamental cellular processes are still limited in this particular group of crustaceans. Ribosomal proteins are the building blocks of ribosomes, the primary site for protein synthesis. In this study, we characterized and analyzed the cDNAs of cytoplasmic ribosomal proteins (cRPs) of two calanoid copepods, Pseudodiaptomus poplesia and Acartia pacifica. We obtained 79 cRP cDNAs from P. poplesia and 67 from A. pacifica by cDNA library construction/sequencing and rapid amplification of cDNA ends. Analysis of the nucleic acid composition showed that the copepod cRP-encoding genes had higher GC content in the protein-coding regions (CDSs) than in the untranslated regions (UTRs), and single nucleotide repeats (>3 repeats) were common, with "A" repeats being the most frequent, especially in the CDSs. The 3′-UTRs of the cRP genes were significantly longer than the 5′-UTRs. Codon usage analysis showed that the third positions of the codons were dominated by C or G. The deduced amino acid sequences of the cRPs contained high proportions of positively charged residues and had high pI values. This is the first report of a complete set of cRP-encoding genes from copepods. Our results shed light on the characteristics of cRPs in copepods, and provide fundamental data for further studies of protein synthesis in copepods. The copepod cRP information revealed in this study indicates that additional comparisons and analysis should be performed on different taxonomic categories such as orders and families.
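The two composition measures mentioned in this abstract, overall GC content and the share of C or G at third codon positions, can be sketched as follows. This is a minimal illustration that assumes an in-frame coding sequence given as a plain string; the names `gc_content` and `gc3` are hypothetical and do not come from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    bases = [b for b in seq if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions.

    Assumes the CDS starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    if not third_positions:
        return 0.0
    return sum(b in "GC" for b in third_positions) / len(third_positions)
```

Applying `gc_content` separately to a CDS and to its UTRs gives the kind of comparison the abstract reports, while `gc3` summarises the third-position preference.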
Funding: This work was supported by grants from the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA19090116 to SS, Grant No. XDA19050302 to ZZ); the National Key R&D Program of China (Grant Nos. 2020YFC0848900 and 2017YFC0907502); the 13th Five-year Informatization Plan of Chinese Academy of Sciences (Grant No. XXH13505-05); the K. C. Wong Education Foundation to ZZ; the International Partnership Program of the Chinese Academy of Sciences (Grant No. 153F11KYSB20160008); the Youth Innovation Promotion Association of Chinese Academy of Science (Grant No. 2017141 to SS); the National Natural Science Foundation of China (Grant No. 31671350 to JY); and the Key Research Program of Frontier Sciences, Chinese Academy of Sciences (Grant No. QYZDY-SSW-SMC017 to JY).
Abstract: COVID-19 and its causative pathogen SARS-CoV-2 have rushed the world into a staggering pandemic in a few months, and a global fight against both has been intensifying. Here, we describe an analysis procedure in which genome composition and its variables are related, through the genetic code, to molecular mechanisms, based on an understanding of RNA replication and its feedback loop from mutation to viral proteome sequence fraternity, including effective sites on the replicase-transcriptase complex. Our analysis starts with primary sequence information, identity-based phylogeny based on 22,051 SARS-CoV-2 sequences, and evaluation of sequence variation patterns as mutation spectra and their 12 permutations among organized clades. All are tailored to two key mechanisms: strand-biased and function-associated mutations. Our findings are as follows: 1) The most dominant mutation is the C-to-U permutation, whose abundant second-codon-position counts alter amino acid composition toward higher molecular weight and lower hydrophobicity, albeit assumed to be mostly slightly deleterious. 2) The second abundance group includes three negative-strand mutations (U-to-C, A-to-G, and G-to-A) and a positive-strand mutation (G-to-U) due to DNA repair mechanisms after cellular abasic events. 3) A clade-associated biased mutation trend is found, attributable to an elevated level of negative-sense strand synthesis. 4) Within-clade permutation variation is very informative for associating non-synonymous mutations with viral proteome changes. These findings demand a platform where emerging mutations are mapped onto mostly subtle but fast-adjusting viral proteomes and transcriptomes, to provide biological and clinical information after logical convergence for effective pharmaceutical and diagnostic applications. Such actions are in desperate need, especially in the middle of the war against COVID-19.
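The "12 permutations" in this abstract are the twelve ordered base-to-base substitution types (for example C-to-U). A minimal sketch of tallying such a spectrum is given below. It assumes a reference and a variant sequence that are already aligned and of equal length; the function name `mutation_spectrum` and the pairwise comparison are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from itertools import permutations

BASES = "ACGU"


def mutation_spectrum(reference: str, variant: str) -> Counter:
    """Count each of the 12 ordered substitution types between two sequences.

    Both inputs must already be aligned and of equal length. T is read as U,
    and positions with gaps or ambiguous bases are skipped.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    ref = reference.upper().replace("T", "U")
    var = variant.upper().replace("T", "U")
    # Start every substitution type at zero so all 12 keys are reported.
    spectrum = Counter({f"{a}-to-{b}": 0 for a, b in permutations(BASES, 2)})
    for ref_base, var_base in zip(ref, var):
        if ref_base != var_base and ref_base in BASES and var_base in BASES:
            spectrum[f"{ref_base}-to-{var_base}"] += 1
    return spectrum
```

Summing such counters over the sequences of a clade would give a per-clade spectrum that can then be compared across clades.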
Funding: The authors appreciate the International Foundation for Science (IFS), Stockholm, Sweden, for funding this research through Grant Number I-2-A-6090-1, provided to Friday Elijah Osho to study the phenotypic and genetic characterization of Parachanna obscura from Nigeria's freshwater environments.
Abstract: The study investigated the genetic variation of Parachanna obscura from five rivers (Anambra, Ibbi, Imo, Katsina-Ala and Ogun) in Nigeria using the mitochondrial cytochrome oxidase 1 gene. DNA was extracted from 19, 22, 16, 18 and 21 fin clips per river population, respectively, and subjected to polymerase chain reaction. A total of 96 sequences, each of 671 bp, were obtained, with 38 (5.6%) polymorphic, 27 (3.8%) parsimoniously informative and 659 (98.2%) conserved sites. Mean nucleotide composition was C = 28.07%, T = 29.43%, A = 22.18%, G = 20.32%. A total of 40 haplotypes with 38 unique sequences, as well as 24 substitutions with 22 transversions and two transitions, were obtained. Nucleotide diversity among populations ranged from 0.00184 (Ibbi) to 0.00888 (Imo), while haplotype diversity ranged from 0.77056 (Ibbi) to 0.95000 (Imo). Analyses of molecular variance showed that the intra-population variation accounted for 50.05%. Topology from phylogenetic analyses revealed that P. obscura from Imo River was distinctly different from the rest.
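The two diversity statistics reported in this abstract can be sketched with their standard textbook definitions: haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and nucleotide diversity as the mean number of pairwise differences per site. The code below is a minimal illustration that assumes at least two aligned sequences of equal length; the function names are hypothetical, and the software actually used in the study is not stated in the abstract.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list[str]) -> float:
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)
```

Running both functions on the sequences of a single river population would yield per-population values of the kind quoted for Ibbi and Imo.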
Funding: Supported in part by grants from the National Natural Science Foundation of China (General Programs No. 30270331 and No. 30670469); the Director Fund of the State Key Laboratory of Oral Diseases (Sichuan University); the Science and Technology Fund for Distinguished Young Scholars of Sichuan Province (No. 06ZQ026-035); and the Key Technologies R&D Program of Sichuan Province (2006Z08-010).
Abstract: Horizontal gene transfer (HGT) has long been considered a principal force by which an organism gains novel genes during genome evolution. Homology search, phylogenetic analysis and nucleotide composition analysis are three major objective approaches for determining the occurrence and directionality of HGT. Here, 21 genes with the potential for horizontal transfer were identified from the whole genome of Magnaporthe grisea according to annotation, among which three candidate genes (corresponding protein accession numbers EAA55123, EAA47200 and EAA52136) were selected for further analysis. Based on the BLAST homology results, we conducted phylogenetic analysis of the three candidate HGT genes. Moreover, nucleotide composition analysis was conducted to further validate these HGTs. In addition, the functions of the three candidate genes were searched in the COG database. Consequently, we conclude that the gene encoding protein EAA55123 was transferred from Clostridium perfringens. Another HGT event is between EAA52136 and the corresponding gene of a certain metazoan, but the direction remains uncertain. EAA47200, however, is not a transferred gene.