Funding: We thank the RIKEN BRC in Japan for providing all full-length cDNAs used in this study. This project was supported by the National Natural Science Foundation of China (grant numbers 30530100 and 90408010), the State Key Program of Basic Research of China (grant numbers 2007CB947600 and 2007CB108800), and the Hi-Tech Research and Development Program of China (grant number 2006AA02Z313).
Abstract: The chloroplast is a typical plant cell organelle where photosynthesis takes place. In this study, a total of 1,808 chloroplast core proteins in Arabidopsis thaliana were reliably identified by combining the results of previously published studies with our own predictions. We then constructed a chloroplast protein interaction network based primarily on interactions among these core proteins. The network contained 22,925 protein interaction pairs involving 2,214 proteins. A total of 160 previously uncharacterized proteins were annotated in this network. The subunits of the photosynthetic complexes were modularized, and the functional relationships among photosystem Ⅰ (PSI), photosystem Ⅱ (PSII), the light harvesting complex of photosystem Ⅰ (LHC Ⅰ) and the light harvesting complex of photosystem Ⅱ (LHC Ⅱ) could be deduced from the predicted protein interactions in this network. We further confirmed an interaction between an unknown protein, AT1G52220, and a photosynthetic subunit, PSI-D2, by yeast two-hybrid analysis. Our chloroplast protein interaction network should be useful for functional mining of photosynthetic proteins and for investigating chloroplast-related functions at the systems biology level in Arabidopsis.
Funding: Project supported by the Creative Research Initiative of the Ministry of Science and Technology (MOST), Korea. BHAK Jonghwa is supported by the Biogreen21 Fund and MOST Funds, Korea.
Abstract: Pockets in proteins are known to be very important for life processes. Several previous studies have automatically extracted pockets from the structural information of known proteins. However, it is difficult to find a study that compares the extracted pockets against known pockets on the protein to assess their precision. In this paper, we propose an algorithm for extracting pockets from protein structure data and analyze the quality of the algorithm by comparing the extracted pockets with some known pockets. The results can be used to set the parameter values of the pocket extraction algorithm to obtain better results.
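The comparison of extracted pockets with known pockets can be illustrated with a small overlap measure. The following is a minimal sketch, not the paper's method: it assumes each pocket is represented as a set of residue identifiers, and the function name `pocket_overlap` and all residue numbers are hypothetical.

```python
def pocket_overlap(extracted, known):
    """Return (precision, recall) of an extracted pocket against a known pocket.

    Both pockets are given as collections of residue identifiers. This
    representation is an assumption made for the sketch; the abstract does
    not specify how pockets are encoded.
    """
    extracted, known = set(extracted), set(known)
    shared = extracted & known
    # Precision: fraction of extracted residues that belong to the known pocket.
    precision = len(shared) / len(extracted) if extracted else 0.0
    # Recall: fraction of known pocket residues that were recovered.
    recall = len(shared) / len(known) if known else 0.0
    return precision, recall


# Hypothetical residue numbers, for illustration only.
extracted_pocket = [12, 13, 14, 45, 46, 80]
known_pocket = [13, 14, 45, 46, 47]
precision, recall = pocket_overlap(extracted_pocket, known_pocket)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Scanning the algorithm's parameters and keeping the setting with the best overlap scores would be one plausible way to use such a measure for parameter tuning.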
Abstract: In motif discovery, especially the discovery of transcription factor DNA binding sites, an input sequence that is too long returns non-informative motifs rather than biologically functional ones. This paper presents theoretical analyses and computational experiments that suggest length limits for the input sequence. When the sequence length exceeds a certain critical point, the probability of discovering the motif decreases sharply. The work not only explains the unsatisfying results seen in existing motif discovery problems, namely that the input sequence may be too long and exceed this point, but also provides an estimate of the input sequence length that should be accepted to obtain more meaningful and reliable results in motif discovery.
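The intuition behind a length limit can be sketched with a simple background model. This is not the paper's analysis: it assumes a uniform, independent four-letter background and treats positions as independent (an approximation), and the function names are hypothetical. Under these assumptions, the number of purely random matches to a fixed motif grows with sequence length, so a real signal becomes harder to separate from noise.

```python
def expected_chance_hits(seq_len, motif_len, alphabet_size=4):
    """Expected number of exact chance matches of one fixed motif.

    Assumes a uniform i.i.d. background sequence (a simplifying assumption).
    """
    positions = max(seq_len - motif_len + 1, 0)
    return positions * alphabet_size ** (-motif_len)


def prob_at_least_one_hit(seq_len, motif_len, alphabet_size=4):
    """Approximate probability of at least one chance match.

    Treats start positions as independent, which ignores overlaps.
    """
    positions = max(seq_len - motif_len + 1, 0)
    p = alphabet_size ** (-motif_len)
    return 1.0 - (1.0 - p) ** positions


# Chance matches to one fixed 8-mer as the input sequence grows.
for seq_len in (100, 1_000, 10_000, 100_000):
    hits = expected_chance_hits(seq_len, 8)
    prob = prob_at_least_one_hit(seq_len, 8)
    print(f"L={seq_len:>7}  expected hits={hits:.4f}  P(at least one)={prob:.4f}")
```

The critical length reported in the paper comes from its own theoretical analysis and experiments; this sketch only shows why some such limit is expected.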
Funding: This work was supported by the Next-generation Genome-InfraNET for the advancement of genome research and service (Grant No. 2019M3C9A5069653) and the Construction of biological data station (Grant No. 2020M3A9I6A01036057) grants from the National Research Foundation of Korea.
Abstract: During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. The KoNA is available at https://www.kobic.re.kr/kona/.
Funding: This work was supported by the 863 Hi-Tech Program (Grant Nos. 2001AA231011, 2002AA231051, 2003AA231011 and 2004BA711A21), the National 973 Key Basic Research Program (Grant Nos. 2002CB713807, 2003CB715901 and 2004CB518606), and the National Natural Science Foundation of China (Grant No. 90408010).
Abstract: Protein-protein interactions play key roles in cells. Many experimental approaches and in silico methods have been developed to identify and predict large-scale protein-protein interactions. However, compared with results from traditional experiments, high-throughput protein-protein interaction data often contain a high proportion of false positives. To make full use of the large-scale data, it is necessary to develop bioinformatic methods that evaluate those data systematically, in order to improve data reliability and mine biological information. This review summarizes the methodologies for analyzing and applying high-throughput protein-protein interaction data, including the evaluation methods, the relationship between protein-protein interaction data and other biological information about proteins, and their applications in biological studies. In addition, this paper suggests some interesting topics in mining high-throughput protein-protein interaction data.
Abstract: The recent outbreak of Streptococcus suis in some districts of Sichuan Province, China, has caused over 30 deaths and over 200 infections in humans. To study the pathogenicity mechanism and to prevent the bacteria from spreading and infecting humans and swine, we have annotated and analyzed the genomes of two strains, Streptococcus suis P1/7 and 89-1591. The whole genome of P1/7 is 2.007 Mb long and has 1969 ORFs. In contrast, the partial genome sequence of 89-1591 is 1.98 Mb in length and exists in 177 contigs with 1918 ORFs. Analysis shows that the average CDS lengths in the two genomes are very close, and that the two strains share 1306 homologous ORFs. Most of the toxicity factors of the two strains are homologous, but there are still some significant differences between them. For example, among the 11 genes (cps2A to cps2K) encoding the capsule in P1/7, 4 (cps2A, 2B, 2I, 2J) are not detected in strain 89-1591. Likewise, the genes encoding EF and haemolysin in P1/7 are not found in strain 89-1591. In addition, the genes related to DNA replication, repair and recombination differ significantly between the two strains, and there are also certain differences among the surface proteins. These characteristics indicate that the two strains have evolved their own specific functions to adapt to different environments and that their pathogenesis differs. We have accumulated comprehensive genomics information for future systematic studies of S. suis. Our results are helpful for disease prevention, vaccine development, and drug design for S. suis.
Abstract: Leptospira interrogans serovar Lai is a pathogenic bacterium that causes a spirochetal zoonosis in humans and some animals. With its complete genome sequence available, it is possible to analyze protein-protein interactions from a whole-genome standpoint. Here we combine four recently developed computational approaches (the gene fusion method, gene neighbor method, phylogenetic profiles method, and operon method) to predict protein-protein interaction networks of Leptospira interrogans strain Lai. Through comprehensive analysis of interactions among proteins of the motility and chemotaxis system, signal transduction, lipopolysaccharide biosynthesis and a series of proteins related to adhesion and invasion, we provide information for further study of its pathogenic mechanism. In addition, we assigned possible functions to 203 previously uncharacterized proteins based on the known functions of their interacting partners. This work is helpful for further investigation of L. interrogans strain Lai.
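Of the four approaches named above, the phylogenetic profiles method is the easiest to sketch: proteins that are present or absent together across many genomes are predicted to be functionally linked. The example below is a minimal illustration under stated assumptions, not the authors' pipeline. The protein names, the profiles, the Jaccard similarity measure and the 0.8 threshold are all hypothetical choices.

```python
from itertools import combinations


def profile_similarity(a, b):
    """Jaccard similarity between two presence/absence profiles (tuples of 0/1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def predict_links(profiles, threshold=0.8):
    """Return (protein, protein, score) for every pair with similar profiles."""
    links = []
    for (p, prof_p), (q, prof_q) in combinations(sorted(profiles.items()), 2):
        score = profile_similarity(prof_p, prof_q)
        if score >= threshold:
            links.append((p, q, score))
    return links


# Hypothetical profiles: each position marks presence (1) or absence (0)
# of a homolog in one reference genome.
profiles = {
    "protA": (1, 1, 0, 1, 0, 1),
    "protB": (1, 1, 0, 1, 0, 1),
    "protC": (0, 1, 1, 0, 1, 0),
    "protD": (1, 1, 0, 1, 1, 1),
}
links = predict_links(profiles)
for p, q, score in links:
    print(f"{p} - {q}: {score:.2f}")
```

In practice, predictions from several such methods are combined, and an uncharacterized protein can then be given a tentative function from the known functions of its predicted partners, as the abstract describes.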
Abstract: Having recently emerged as a new field in biology, systems biology provides a brand new way to study biological activities in organisms. To decode the complexity of life systematically, systems biology integrates the "-omics" and uses high-throughput methods from transcriptomics, proteomics and metabonomics to detect dynamic activities in the cell. It then applies bioinformatics methods to integrate and analyze those data and to simulate biological processes based on models built from the integrated data. In this paper, the current state, research areas and methods of systems biology are introduced briefly, and several ideas about future development in this field are proposed.