Crystal structure prediction aims to predict stable materials that are easy to synthesize experimentally, which accelerates the discovery of new materials. The stability of a material is the basis for its high performance and reliable application, and thermodynamic and molecular dynamics stability are especially important. This paper therefore proposes a method to predict stable crystal structures using formation energy and the Lennard-Jones potential as evaluation indicators. Specifically, we use graph neural network models to predict the formation energy of crystals and employ empirical formulas to calculate the Lennard-Jones potential. We then apply Bayesian optimization to search for crystal structures with low formation energy and a Lennard-Jones potential approaching zero, in order to ensure the thermodynamic and dynamic stability of the materials. In addition, because the bonding between atoms in a crystal affects its structural stability, we use contact maps to analyze the atomic bonding in each crystal and screen out more stable materials. Experimental results show that the proposed method not only reduces the time required for crystal structure prediction but also ensures the stability of the predicted crystal materials.
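A minimal sketch of the two screening quantities mentioned above, assuming the standard 12-6 form V(r) = 4ε[(σ/r)^12 − (σ/r)^6] for the "empirical formula", reduced units (ε = σ = 1), no periodic images, and a plain distance cutoff for the contact map. The abstract does not give the actual parameters, cutoff, or treatment of periodicity, and the function names `lj_pair`, `total_lj`, and `contact_map` are hypothetical.

```python
import math

def lj_pair(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of one atom pair at separation r (assumed form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_lj(positions, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all unique atom pairs (no periodic images)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            energy += lj_pair(math.dist(positions[i], positions[j]), epsilon, sigma)
    return energy

def contact_map(positions, cutoff):
    """Boolean matrix: True where two distinct atoms lie within `cutoff`."""
    n = len(positions)
    return [[i != j and math.dist(positions[i], positions[j]) <= cutoff
             for j in range(n)] for i in range(n)]

r_min = 2 ** (1 / 6)  # separation of the LJ minimum when sigma = 1
print(round(total_lj([(0, 0, 0), (r_min, 0, 0)]), 6))  # -1.0
print(contact_map([(0, 0, 0), (1, 0, 0), (5, 0, 0)], cutoff=1.5))
```

In a real crystal the pair sum would also run over neighbouring periodic images; that is omitted here to keep the example short.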
[Objective] To examine a grammar model based on lexical substring extraction for RNA secondary structure prediction. [Method] By introducing a cloud model into a stochastic grammar model, a machine learning algorithm suited to the lexicalized stochastic grammar model was proposed. The word-grid mode was used to extract and segment RNA sequences into lexical substrings, and a cloud classifier was used to find the maximum probability of each lemma being labeled as a given secondary structure type. The lemma information was then introduced as prior information into the training of the stochastic grammar, enabling prediction of RNA secondary structure, and the method was tested experimentally. [Result] The prediction accuracy and search speed of the stochastic grammar cloud model were significantly improved compared with prediction using a simple stochastic grammar. [Conclusion] This study lays a foundation for the wider application of stochastic grammar models to RNA secondary structure prediction.
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because membrane protein structures are difficult to determine by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue–residue contacts, and the relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% for ATMH and 87.1% for AP, an N-score of 3.2 ± 3.0, and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% accuracy for top L/5 contact prediction on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 with a mean absolute error of 13.593. These results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
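The "top L/5" contact accuracy quoted for MemBrain-Contact is usually computed as the precision of the L/5 highest-scoring predicted contacts, where L is the sequence length. The sketch below assumes that definition and a minimum sequence separation of 6 residues (conventions vary, e.g. 6, 12, or 24); the function name and input format are hypothetical and are not MemBrain's interface.

```python
def top_l_over_k_precision(scores, true_contacts, length, k=5, min_sep=6):
    """Fraction of the top length//k predicted contacts that are true contacts.

    scores: dict mapping residue pairs (i, j) to a predicted contact probability.
    true_contacts: set of residue pairs in contact in the native structure.
    min_sep: minimum sequence separation |i - j| (assumed convention).
    """
    def norm(pair):
        return (min(pair), max(pair))

    truth = {norm(p) for p in true_contacts}
    # Rank candidate pairs by probability, skipping short-range pairs.
    ranked = sorted(
        ((prob, norm(pair)) for pair, prob in scores.items()
         if abs(pair[0] - pair[1]) >= min_sep),
        reverse=True,
    )
    top = [pair for _, pair in ranked[: max(1, length // k)]]
    if not top:
        return 0.0
    return sum(pair in truth for pair in top) / len(top)

scores = {(0, 7): 0.9, (8, 1): 0.8, (2, 9): 0.1, (0, 2): 0.95}
print(top_l_over_k_precision(scores, {(0, 7)}, length=10))  # 0.5
```

Here (0, 2) is excluded as too short-range, so the top two pairs are (0, 7) and (1, 8), of which one is a true contact.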
Tannases produced by filamentous fungi belong to a family of important gallotannin hydrolases and have broad industrial applications. However, the 3-D structures of fungal tannases had not been reported. The protein sequence deduced from the cDNA sequence obtained by RT-PCR amplification was identified as a tannase through sequence alignment and phylogenetic analysis. Structure models based on the tannase sequence were generated using I-TASSER, and the model that best matched the surface charge density-pH titration profile was selected as the final structure of the tannase from Aspergillus niger N5-5. This work provides an effective method for protein structure research, and the constructed structure should help in understanding the enzyme's bioactivity and in the further development of fungal tannases.
Many recent discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions, which are strongly coupled to RNA structures. To understand RNA function, a number of structure prediction models have been developed in recent years. In this review, we introduce progress in computational models for RNA structure prediction and discuss the distinguishing features of many outstanding algorithms, with an emphasis on three-dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability, and salt effects is also briefly introduced. Finally, we discuss the major challenges in RNA 3D structure modeling.
Algorithms based on combination (ensemble) learning are usually superior to a single classification algorithm for protein secondary structure prediction. However, the weights assigned to the base classifiers usually lack a principled basis. In this paper, we propose a protein secondary structure prediction method with a dynamic, self-adaptive combination strategy based on entropy, in which the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers: the higher the entropy, the lower the weight of that base classifier. The final structure prediction is determined by the weighted combination of the posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and effectively improves prediction performance.
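The abstract states only that a higher entropy gives a lower weight. The sketch below assumes one simple rule with that property, weights proportional to 1/(H + ε) and normalized to sum to 1, applied to the posteriors for a single residue; the paper's exact weighting function may differ, and the names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_weighted_vote(posteriors, labels, eps=1e-9):
    """Combine base-classifier posteriors for one residue.

    Weights are proportional to 1 / (entropy + eps), so a more uncertain
    (higher-entropy) classifier receives a lower weight.
    """
    raw = [1.0 / (entropy(p) + eps) for p in posteriors]
    total = sum(raw)
    weights = [w / total for w in raw]
    combined = [sum(w * p[c] for w, p in zip(weights, posteriors))
                for c in range(len(labels))]
    best = max(range(len(labels)), key=combined.__getitem__)
    return labels[best], combined, weights

label, combined, weights = entropy_weighted_vote(
    [[0.70, 0.20, 0.10],   # confident classifier
     [0.30, 0.40, 0.30]],  # uncertain classifier
    labels=["H", "E", "C"],
)
print(label)  # H
```

Because both weights and posteriors are normalized, the combined vector is still a probability distribution, so the final label is simply its argmax.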
Mineral apatite compounds have attracted significant interest because their chemical stability and adjustable hexagonal structure make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits their applications in catalysis and energy devices. In this research, we designed a method to narrow the band gap via tetrahedral substitution in apatite-based compounds. Density functional theory (DFT) calculations and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO₄]⁴⁻ tetrahedra (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this effect was observed when [MnO₄]⁴⁻ tetrahedra replace [PO₄]³⁻ tetrahedra: a Mn 3d-derived conduction band minimum (CBM) forms and interacts with other elements, leading to band broadening and an obvious reduction of the band gap. This approach provides a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
Protein structure prediction is one of the most essential objectives of theoretical chemistry and bioinformatics, as it is vitally important in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in predicting protein tertiary structure, as it bridges the gap between the primary sequence and the tertiary structure. Protein secondary structures are classified under two schemes: a 3-state scheme and an 8-state scheme. Predicting the 3 states and the 8 states from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has proven very challenging, which is why most previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to compare the performance of ensemble learning algorithms with that of individual ML algorithms. The ensemble members are well-known classifiers, namely SVM (Support Vector Machine), KNN (K-Nearest Neighbors), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), combined with two feature extraction techniques, LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in introducing ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models and indicated that features extracted by LDA are more effective than those extracted by PCA.
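To make the Q3/Q8 distinction concrete, the sketch below reduces 8-state DSSP labels to 3 states and computes per-residue accuracy. It assumes one common reduction (H, G, I → H; E, B → E; everything else → C); other reductions are used in the literature, and the abstract does not say which one the authors used.

```python
# One common 8-to-3 reduction of DSSP states (other reductions exist).
Q8_TO_Q3 = {"H": "H", "G": "H", "I": "H",
            "E": "E", "B": "E",
            "T": "C", "S": "C", "C": "C", "-": "C"}

def to_q3(q8_string):
    """Map an 8-state label string to the 3-state alphabet H/E/C."""
    return "".join(Q8_TO_Q3[s] for s in q8_string)

def per_residue_accuracy(pred, true):
    """Fraction of residues whose predicted state matches the true state."""
    if len(pred) != len(true) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)

true8, pred8 = "HHGGEEBTSC", "HHHHEEETTC"
print(per_residue_accuracy(pred8, true8))                # Q8 = 0.6
print(per_residue_accuracy(to_q3(pred8), to_q3(true8)))  # Q3 = 1.0
```

The same prediction scores perfectly under Q3 but only 60% under Q8, which illustrates why Q8 is the harder problem.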
The structure type of the crystal of 4,4'-bis(2-hydroxyethoxy)biphenyl (1) has been predicted using a previously developed interfacial model for small organic molecules. Based on the calculated ratio of hydrophobic to hydrophilic volume of 1, the model predicts a lamellar or bicontinuous crystal structure, which was confirmed by single-crystal X-ray structure analysis (C20H26O6, monoclinic, space group P2₁/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, Mo Kα radiation, λ = 0.71073 Å; 7121 reflections collected, 1852 unique reflections, 170 parameters; R = 0.0382 and wR = 0.0882 for I > 2σ(I)). As predicted, the hydrophobic and hydrophilic portions of 1 are arranged in lamellae. The same interfacial model is applied to other amphiphilic small organic molecule systems for structure type prediction.
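The reported cell parameters can be cross-checked with two standard crystallographic formulas: the monoclinic cell volume V = abc·sin β and the calculated density Dc = Z·M/(V·N_A). The sketch assumes standard atomic masses (C 12.011, H 1.008, O 15.999) and takes the formula C20H26O6 as reported; the helper names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23                              # mol^-1
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol

def monoclinic_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def calculated_density(formula, z, volume_a3):
    """Dc = Z * M / (V * N_A), converting A^3 to cm^3 (1 A^3 = 1e-24 cm^3)."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return z * molar_mass / (volume_a3 * 1e-24 * AVOGADRO)

v = monoclinic_volume(16.084, 6.0103, 9.6410, 103.014)
dc = calculated_density({"C": 20, "H": 26, "O": 6}, z=2, volume_a3=v)
print(f"V = {v:.1f} A^3, Dc = {dc:.4f} g/cm^3")
```

With these inputs V comes out at about 908.1 Å³ and Dc at about 1.33 g/cm³, consistent with the reported values to within rounding of the atomic masses.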
Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is increasingly needed as more noncoding RNAs are discovered. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that performs better than four popular existing multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
In this paper, we introduce applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures, review recent studies that use EAs to solve protein structure prediction problems, and analyze and discuss the challenges and prospects of applying EAs to protein structure modeling.
A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modified energy parameters for various loops are included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, are permitted where reasonable. We applied this approach to a number of RNA sequences, and the prediction accuracies obtained were higher than those reported in published papers.
This letter describes the architecture of a BioAccel (internal code name) chip for RNA secondary structure prediction. The system is based on a BioBus (internal code name), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems based on the AMBA AHB bus and the CoreConnect bus are introduced to evaluate the system's performance. The simulation results are attractive: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ant colony optimization (ACO) algorithm for protein structure prediction. In the algorithm, the "lone" method is applied to handle infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. Empirical results show that the method is feasible and effective for protein structure prediction and yields notable improvements in CPU time.
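The objective that an HP-model search minimizes is the HP energy: minus the number of lattice contacts between non-consecutive H residues. The sketch below assumes a 2D square lattice and absolute-direction move strings (the abstract states neither), and it only evaluates one conformation; it does not implement the ACO algorithm, the "lone" method, or the point-mutation local search.

```python
# Unit steps on the 2D square lattice (assumed encoding).
STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def hp_energy(sequence, moves):
    """Energy of an HP chain on a 2D square lattice.

    sequence: string of 'H'/'P' residues.
    moves: absolute directions ('U', 'D', 'L', 'R'), one per bond.
    Returns -(number of non-consecutive H-H lattice contacts), or None if
    the walk is not self-avoiding (an infeasible conformation).
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    coords = [(0, 0)]
    for m in moves:
        dx, dy = STEPS[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    index = {c: i for i, c in enumerate(coords)}
    if len(index) != len(coords):
        return None  # two residues occupy the same lattice site
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in STEPS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair once
            if j is not None and j > i + 1 and sequence[j] == "H":
                contacts += 1
    return -contacts

print(hp_energy("HPPH", "RUL"))  # -1: residues 0 and 3 touch
print(hp_energy("HPH", "RL"))    # None: the chain folds back onto itself
```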
A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is proposed. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape local minima. Experimental results show promising performance: for all chains with lengths 13 ≤ n ≤ 55, the algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, it finds new low-energy configurations different from those reported in the literature.
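The abstract does not give the energy function. A widely used 3D off-lattice two-species model is the AB model in the form usually attributed to Irbäck and co-workers, with κ1 = −1, κ2 = 0.5 and C(A,A) = 1, C(B,B) = 0.5, C(A,B) = −0.5 (A hydrophobic, B hydrophilic). The sketch below assumes that form and unit bond lengths; it only evaluates a configuration's energy and does not reproduce the quasi-physical elastic-ball search or the off-trap strategy.

```python
import math

# Assumed interaction coefficients C(s_i, s_j): A = hydrophobic, B = hydrophilic.
C = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): -0.5, ("B", "A"): -0.5}

def _unit(u, v):
    """Unit vector pointing from point u to point v."""
    d = [b - a for a, b in zip(u, v)]
    n = math.sqrt(sum(x * x for x in d))
    return [x / n for x in d]

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ab_energy(seq, coords, k1=-1.0, k2=0.5):
    """Energy of a 3D off-lattice AB chain (assumed Irbäck-style form)."""
    n = len(seq)
    bonds = [_unit(coords[i], coords[i + 1]) for i in range(n - 1)]
    e = 0.0
    for k in range(n - 2):          # adjacent bond pairs (bending term)
        e -= k1 * _dot(bonds[k], bonds[k + 1])
    for k in range(n - 3):          # bonds two apart (torsion-like term)
        e -= k2 * _dot(bonds[k], bonds[k + 2])
    for i in range(n - 2):          # species-dependent LJ-like term, j >= i + 2
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            e += 4.0 * C[(seq[i], seq[j])] * (r ** -12 - r ** -6)
    return e

# Straight three-residue chain with unit bonds along x.
print(ab_energy("AAA", [(0, 0, 0), (1, 0, 0), (2, 0, 0)]))  # 0.9384765625
```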
RNAs have important biological functions, which are generally coupled to their structures, especially their secondary structures. In this work, we comprehensively evaluate the performance of existing top RNA secondary structure prediction methods, including five deep-learning (DL)-based methods and five minimum free energy (MFE)-based methods. First, we briefly review these methods. We then build two rigorous test datasets of RNAs with non-redundant sequences and examine the methods' performance by classifying the RNAs into different length ranges and types. Our examination shows that DL-based methods generally perform better than MFE-based methods for long RNAs with complex structures, while MFE-based methods achieve good performance for small RNAs and some specialized MFE-based methods achieve good prediction accuracy for pseudoknots. Finally, we provide insights and perspectives on modeling RNA secondary structures.
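Benchmarks of this kind typically score a predicted secondary structure against the reference by base-pair precision, recall, and F1. The sketch below assumes structures in dot-bracket notation, with extra bracket types such as [] for pseudoknots; the study's exact scoring protocol (for example, whether shifted pairs count as correct) is not stated in the abstract, and the function names are hypothetical.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def parse_dot_bracket(structure):
    """Return the set of base pairs (i, j), i < j, encoded in dot-bracket text.

    Each bracket type has its own stack, so simple pseudoknots written with
    '[]' or '{}' alongside '()' are supported.
    """
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = set()
    for i, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            pairs.add((stack.pop(), i))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs

def base_pair_scores(predicted, reference):
    """Precision, recall and F1 computed over exact base-pair matches."""
    pred, ref = parse_dot_bracket(predicted), parse_dot_bracket(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

print(base_pair_scores("(....)", "((..))"))  # precision 1.0, recall 0.5, F1 about 0.667
```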
Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution of protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem, namely finding the structural conformation with the lowest energy or minimizing the difference between the predicted and native structures. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman: data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we survey efforts in protein structure prediction and compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories interpreting neural networks and knowledge of protein folding are still highly desired.
Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with system size. In this work, we propose two crossover-mutation schemes based on graph theory that accelerate evolutionary structure searching through automatic decomposition. These schemes detect molecules or clusters inside periodic networks using quotient graphs of crystals, and the decomposition dramatically reduces the search space. Test examples, including the high-pressure phases of methane, ammonia, MgAl₂O₄, and boron, show that the new evolution schemes significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
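The molecule-detection step rests on a standard property of quotient graphs: a connected component is a finite molecule or cluster exactly when every closed loop in it has zero net lattice translation. The sketch below assumes the bond list (with cell translations) is already known and only labels components; the crossover-mutation schemes themselves are not shown, and the function name is hypothetical.

```python
from collections import deque

def classify_components(n_atoms, edges):
    """Label each connected component of a crystal's quotient graph.

    n_atoms: number of atoms in the unit cell (vertices 0..n_atoms-1).
    edges: (i, j, t) triples meaning atom i in cell (0, 0, 0) is bonded to
           atom j in the cell translated by the integer vector t.
    """
    adj = [[] for _ in range(n_atoms)]
    for i, j, t in edges:
        adj[i].append((j, tuple(t)))
        adj[j].append((i, tuple(-x for x in t)))
    offset = [None] * n_atoms  # cell offset assigned to each atom during BFS
    components = []
    for root in range(n_atoms):
        if offset[root] is not None:
            continue
        offset[root] = (0, 0, 0)
        members, periodic = [root], False
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, t in adj[u]:
                expected = tuple(a + b for a, b in zip(offset[u], t))
                if offset[v] is None:
                    offset[v] = expected
                    members.append(v)
                    queue.append(v)
                elif offset[v] != expected:
                    periodic = True  # closed loop with nonzero translation
        components.append((sorted(members), "extended" if periodic else "molecule"))
    return components

# Atoms 0-1 form a dimer; atom 2 bonds to its own image one cell along x.
print(classify_components(3, [(0, 1, (0, 0, 0)), (2, 2, (1, 0, 0))]))
```

For the example this prints `[([0, 1], 'molecule'), ([2], 'extended')]`: the dimer can be treated as a rigid building block, while atom 2 belongs to an infinite chain.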
Building on structure prediction methods, machine learning (ML) can be used instead of density functional theory (DFT) to predict material properties, thereby accelerating the material search process. In this paper, we established a dataset of carbon materials by high-throughput calculation of carbon structures obtained from the Samara Carbon Allotrope Database. We then trained an ML model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that it predicts the elastic moduli of carbon allotropes more accurately than AFLOW-ML. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct band gap. Its structural stability, elastic moduli, and electronic properties were systematically studied, and the results demonstrate the feasibility of using ML methods to accelerate the material search process.
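For isotropic (polycrystalline-averaged) moduli, the three predicted quantities are not independent: E = 9BG/(3B + G), and Poisson's ratio is ν = (3B − 2G)/(2(3B + G)). A short consistency check on an ML model's outputs could therefore look like the sketch below; the input values are hypothetical, and the abstract does not say the authors used this relation.

```python
def youngs_modulus(bulk, shear):
    """Isotropic relation E = 9BG / (3B + G); units follow the inputs (e.g. GPa)."""
    return 9.0 * bulk * shear / (3.0 * bulk + shear)

def poisson_ratio(bulk, shear):
    """Isotropic relation nu = (3B - 2G) / (2(3B + G))."""
    return (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear))

# Hypothetical predicted moduli in GPa, used only to illustrate the check.
b_pred, g_pred = 400.0, 300.0
print(youngs_modulus(b_pred, g_pred))  # 720.0
print(poisson_ratio(b_pred, g_pred))   # 0.2
```

A large gap between a directly predicted Young's modulus and the value implied by the predicted B and G would flag an inconsistent prediction.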
RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help us understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although ML has a long history in protein-related fields, its use in predicting RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages, limitations, difficulties, and potential of these approaches in the field.
Funding (crystal structure prediction using formation energy and the Lennard-Jones potential): supported by the National Natural Science Foundation of China (Nos. 61671362 and 62071366).
Funding (stochastic grammar cloud model for RNA secondary structure prediction): supported by the Science Foundation of Hengyang Normal University of China (09A36).
Funding (MemBrain): supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, and 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, and 16ZR1448700).
Funding (structure of the tannase from Aspergillus niger N5-5): supported by the National Natural Science Foundation of China (No. 21374117) and the 100 Talents Program of the Chinese Academy of Sciences.
Funding (review of computational models for RNA structure prediction): supported by the National Natural Science Foundation of China (Grant Nos. 11074191, 11175132, and 11374234), the National Basic Research Program of China (Grant No. 2011CB933600), and the Program for New Century Excellent Talents of China (Grant No. NCET 08-0408).
Funding (band gap engineering of apatite-based compounds): financially supported by the National Natural Science Foundation of China (Nos. 41831288 and 51672257), the Fundamental Research Funds for the Central Universities (Nos. 2652018305 and 2652017335), the Guangdong Innovation Research Team for Higher Education (No. 2017KCXTD030), the High-Level Talents Project of Dongguan University of Technology (No. KCYKYQD2017017), the Engineering Research Center of Non-food Biomass Efficient Pyrolysis and Utilization Technology of Guangdong Higher Education Institutes (No. 2016GCZX009), and the Russian Science Foundation (No. 19-77-10013).
Funding: This work was supported by the National Science Foundation (Grant DMR-9812351).
Abstract: The structure type of the crystal of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl (1) has been predicted using the previously developed interfacial model for small organic molecules. Based on the calculated ratio of hydrophobic to hydrophilic volume of 1, this model predicts a lamellar or bicontinuous crystal structure, which has been confirmed by X-ray single-crystal structure analysis (C20H26O6, monoclinic, P2₁/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, Mo Kα radiation, λ = 0.71073 Å, R = 0.0382 and wR = 0.0882 for I > 2σ(I); 7121 reflections collected, 1852 unique reflections, and 170 parameters). As predicted, the hydrophobic and hydrophilic portions of 1 are arranged in lamellae. The same interfacial model is applied to other amphiphilic small-molecule organic systems for structure-type prediction.
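The crystallographic data can be cross-checked with two standard relations: the monoclinic cell volume V = abc·sin β and the calculated density D_c = Z·M/(V·N_A). The sketch below (our check, using standard atomic weights; function names are hypothetical) recomputes V, D_c and F(000), the number of electrons in the unit cell, from the formula C20H26O6, Z = 2 and the cell parameters reported for 1. All three agree with the reported values to within rounding.

```python
import math

AVOGADRO = 6.02214076e23                               # mol^-1
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}   # g/mol, standard atomic weights
ATOMIC_NUMBER = {"C": 6, "H": 1, "O": 8}


def monoclinic_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit-cell volume in cubic angstroms for a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def calculated_density(formula: dict[str, int], z: int, volume_a3: float) -> float:
    """Dc = Z*M / (V*N_A) in g/cm^3, using 1 cubic angstrom = 1e-24 cm^3."""
    molar_mass = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return z * molar_mass / (volume_a3 * 1e-24 * AVOGADRO)


def f000(formula: dict[str, int], z: int) -> int:
    """F(000) ignoring anomalous dispersion: total number of electrons in the cell."""
    return z * sum(ATOMIC_NUMBER[el] * n for el, n in formula.items())


formula = {"C": 20, "H": 26, "O": 6}
V = monoclinic_volume(16.084, 6.0103, 9.6410, 103.014)
Dc = calculated_density(formula, 2, V)
print(f"V = {V:.1f} A^3, Dc = {Dc:.4f} g/cm^3, F(000) = {f000(formula, 2)}")
```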
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Abstract: Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has reached a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that performs better than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
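The abstract gives no implementation details, so as a hedged illustration of the direct coupling analysis (DCA) step, here is a minimal mean-field DCA sketch in NumPy. It assumes an RNA alignment over the alphabet A, C, G, U and gap, a uniform pseudocount of 0.5, no sequence reweighting, and scores position pairs by the Frobenius norm of the zero-sum-gauge couplings with average product correction (APC). It is not the authors' code, and the remove-and-expand step is not reproduced.

```python
import numpy as np

ALPHABET = "ACGU-"  # four nucleotides plus gap (assumed alphabet)
Q = len(ALPHABET)


def one_hot(msa: list[str]) -> np.ndarray:
    """Encode aligned sequences as an (N, L, Q) one-hot array."""
    idx = np.array([[ALPHABET.index(ch) for ch in seq] for seq in msa])
    return np.eye(Q)[idx]


def mfdca_scores(msa: list[str], pseudocount: float = 0.5) -> np.ndarray:
    """Mean-field DCA pair scores (Frobenius norm + APC), without sequence reweighting."""
    x = one_hot(msa)
    n, length, q = x.shape
    fi = x.mean(axis=0)                                # single-site frequencies (L, q)
    fij = np.einsum("nia,njb->iajb", x, x) / n         # pair frequencies (L, q, L, q)
    # Pseudocount regularisation keeps the covariance matrix invertible.
    fi = (1 - pseudocount) * fi + pseudocount / q
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    for i in range(length):
        fij[i, :, i, :] = np.diag(fi[i])               # self-consistent diagonal blocks
    cov = fij - np.einsum("ia,jb->iajb", fi, fi)
    # Drop the last state (gauge choice), invert, and negate to get couplings.
    m = length * (q - 1)
    cov = cov[:, : q - 1, :, : q - 1].reshape(m, m)
    couplings = np.zeros((length, q, length, q))
    couplings[:, : q - 1, :, : q - 1] = -np.linalg.inv(cov).reshape(
        length, q - 1, length, q - 1)
    # Shift every q x q block to the zero-sum gauge before taking norms.
    couplings = (couplings
                 - couplings.mean(axis=1, keepdims=True)
                 - couplings.mean(axis=3, keepdims=True)
                 + couplings.mean(axis=(1, 3), keepdims=True))
    fn = np.sqrt((couplings ** 2).sum(axis=(1, 3)))    # (L, L) Frobenius norms
    np.fill_diagonal(fn, 0.0)
    # Average product correction removes background (e.g. phylogenetic) signal.
    row_mean = fn.sum(axis=1) / (length - 1)
    total_mean = fn.sum() / (length * (length - 1))
    apc = fn - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(apc, 0.0)
    return apc
```

High-scoring pairs (i, j) that are well separated in sequence would be the candidate base pairs handed to a post-processing step such as the paper's remove-and-expand algorithm.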
Funding: Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
Abstract: This paper introduces applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures, reviews recent studies that solve protein structure prediction problems with EAs, and analyzes and discusses the challenges and prospects of applying EAs to protein structure modeling.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were allowed to form where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those reported in published papers.
Funding: Supported by the National Natural Science Foundation of China (No. 60373044) and the Knowledge Innovative Project of CAS (No. KSCX2-SW-233).
Abstract: This letter describes the architecture of BioAccel (internal code name), a chip for RNA secondary structure prediction. The system is built around BioBus (internal code name), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems, based on the AMBA AHB bus and the CoreConnect bus, are introduced to evaluate the system's performance. The simulation results are promising: the chip's average communication bandwidth increases severalfold, and read and write latencies are reduced by about 40 percent.
Abstract: The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ant colony optimization (ACO) algorithm for protein structure prediction. In the algorithm, the "clone" method is applied to handle infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for solving the protein structure prediction problem, with notable improvements in CPU time.
Funding: The National Natural Science Foundation of China (No. 10471051) and the National Basic Research Program (973) of China (No. 2004CB318000).
Abstract: A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
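The abstract does not spell out its energy function. For illustration, the sketch below evaluates the widely used AB off-lattice energy of Stillinger et al., with A as the hydrophobic and B as the hydrophilic monomer: a bending term 1/4·Σ(1 − cos θ_i) plus a species-dependent Lennard-Jones term. That model was originally defined in two dimensions; evaluating it on 3D coordinates is our assumption, and the 3D model used in the paper may add or replace local terms. The quasi-physical search itself is not reproduced; this is only the kind of objective it would minimize.

```python
import numpy as np


def ab_energy(sequence: str, coords) -> float:
    """AB off-lattice energy in the Stillinger et al. form; 'A' = hydrophobic, 'B' = hydrophilic.

    E = 1/4 * sum_i (1 - cos theta_i) + 4 * sum_{j >= i+2} (r_ij^-12 - C_ij * r_ij^-6),
    with C_ij = (1 + xi_i + xi_j + 5 * xi_i * xi_j) / 8 and xi = +1 (A) or -1 (B),
    giving C = 1 (AA), 1/2 (BB), -1/2 (AB). Unit bond lengths are assumed.
    """
    x = np.asarray(coords, dtype=float)
    n = len(sequence)
    if x.shape != (n, 3):
        raise ValueError("need one 3D coordinate per monomer")
    xi = np.array([1.0 if s == "A" else -1.0 for s in sequence])
    bonds = np.diff(x, axis=0)
    bonds /= np.linalg.norm(bonds, axis=1, keepdims=True)
    # theta_i is the angle between successive bond vectors (0 for a straight chain).
    cos_theta = np.einsum("ij,ij->i", bonds[:-1], bonds[1:])
    bending = 0.25 * np.sum(1.0 - cos_theta)
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = np.linalg.norm(x[i] - x[j])
            c = (1 + xi[i] + xi[j] + 5 * xi[i] * xi[j]) / 8
            lj += r ** -12 - c * r ** -6
    return float(bending + 4 * lj)


# A right-angle bend: bending energy 0.25 plus the A-A pair at distance sqrt(2).
print(ab_energy("ABA", [(0, 0, 0), (1, 0, 0), (1, 1, 0)]))
```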
Funding: Supported by grants from the National Science Foundation of China (Grant Nos. 12375038 and 12075171 to ZJT, and 12205223 to YLT).
Abstract: RNAs have important biological functions, and these functions are generally coupled to their structures, especially their secondary structures. In this work, we comprehensively evaluated the performance of existing top RNA secondary structure prediction methods, including five deep-learning (DL)-based methods and five minimum free energy (MFE)-based methods. First, we briefly reviewed these RNA secondary structure prediction methods. Then, we built two rigorous test datasets consisting of RNAs with non-redundant sequences and comprehensively examined the performance of the methods by classifying the RNAs into different length ranges and different types. Our examination shows that the DL-based methods generally perform better than the MFE-based methods for long RNAs with complex structures, while the MFE-based methods can achieve good performance for small RNAs, and some specialized MFE-based methods can achieve good prediction accuracy for pseudoknots. Finally, we provide some insights and perspectives on modeling RNA secondary structures.
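The abstract does not name its accuracy metrics. A common choice, assumed here, is to compare predicted and reference base pairs and report precision, recall (sensitivity) and their harmonic mean F1. The sketch below parses dot-bracket strings, treating (), [], {} and <> as independent bracket types so that pseudoknots can be represented, and scores exact pair matches (some benchmarks also accept pairs shifted by one position). Function names are hypothetical.

```python
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return base pairs (i, j) with i < j from dot-bracket notation.

    Each bracket type is matched independently, so pseudoknots written as
    e.g. "((..[[..))..]]" are supported.
    """
    stacks: dict[str, list[int]] = {op: [] for op in OPENERS}
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def base_pair_scores(predicted: str, reference: str) -> tuple[float, float, float]:
    """Precision (PPV), recall (sensitivity) and F1 over exactly matching base pairs."""
    pred, ref = parse_dot_bracket(predicted), parse_dot_bracket(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


print(base_pair_scores("(.....)", "((...))"))  # one of two reference pairs recovered
```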
Funding: The National Key R&D Program of China (Grant No. 2020YFA0907000) and the National Natural Science Foundation of China (Grant Nos. 32271297, 62072435, 31770775, and 31671369) provided financial support for this study and the publication charges.
Abstract: Protein structure prediction is an interdisciplinary research topic that has attracted researchers from multiple fields, including biochemistry, medicine, physics, mathematics, and computer science. These researchers adopt various research paradigms to attack the same structure prediction problem: biochemists and physicists attempt to reveal the principles governing protein folding; mathematicians, especially statisticians, usually start by assuming a probability distribution over protein structures given a target sequence and then find the most likely structure; and computer scientists formulate protein structure prediction as an optimization problem, namely finding the structural conformation with the lowest energy or minimizing the difference between the predicted and native structures. These research paradigms fall into the two statistical modeling cultures proposed by Leo Breiman, namely data modeling and algorithmic modeling. Recently, we have also witnessed the great success of deep learning in protein structure prediction. In this review, we survey efforts in protein structure prediction. We compare the research paradigms adopted by researchers from different fields, with an emphasis on the shift of research paradigms in the era of deep learning. In short, algorithmic modeling techniques, especially deep neural networks, have considerably improved the accuracy of protein structure prediction; however, theories that interpret the neural networks, as well as knowledge of protein folding, are still highly desired.
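For the optimization paradigm that minimizes the difference between predicted and native structures, one standard measure of that difference is the root-mean-square deviation (RMSD) after optimal rigid superposition. The sketch below implements the Kabsch algorithm for this. It is our illustration, not a method from the review, and it assumes the two (N, 3) arrays list corresponding atoms (for example Cα atoms) in the same order.

```python
import numpy as np


def kabsch_rmsd(predicted: np.ndarray, native: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rotation and translation."""
    p = predicted - predicted.mean(axis=0)   # remove translation
    q = native - native.mean(axis=0)
    h = p.T @ q                              # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T                 # rotate each predicted point onto the native frame
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))
```

A rigidly rotated and shifted copy of a structure therefore has an RMSD of (numerically) zero against the original, while any internal deformation gives a positive value.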
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 11974162 and 11834006), the National Key R&D Program of China (Grant No. 2016YFA0300404), and the Fundamental Research Funds for the Central Universities.
Abstract: Crystal structure prediction algorithms have become powerful tools for materials discovery in recent years; however, they are usually limited to relatively small systems. The main challenge is that the number of local minima grows exponentially with system size. In this work, we propose two crossover-mutation schemes based on graph theory that accelerate evolutionary structure searches through automatic decomposition. These schemes use quotient graphs of crystals to detect molecules or clusters inside periodic networks, and the decomposition can dramatically reduce the search space. Test examples, including the high-pressure phases of methane, ammonia, MgAl₂O₄, and boron, show that the new evolution schemes significantly improve the success rate and search efficiency compared with the standard method in both isolated and extended systems.
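Molecule detection via a quotient graph can be sketched as follows, under our assumptions. The atoms of the unit cell are vertices, and each bond is an edge labelled with the integer lattice translation of its end atom (bond detection is assumed to have been done beforehand). Unfolding each connected component by breadth-first search, every edge that closes a cycle contributes its net translation. If all such translations are zero, the component is a finite molecule or cluster; otherwise their rank gives the dimensionality of the periodic network. The function name and edge format are hypothetical, and the abstract does not describe how the authors' crossover-mutation operators use the resulting units.

```python
from collections import defaultdict, deque

import numpy as np


def component_dimensionality(n_atoms, edges):
    """Split a crystal quotient graph into connected components and classify each.

    edges: (u, v, t) means atom u in the home cell bonds to atom v in the cell
    shifted by the integer lattice vector t. Returns (atoms, dim) per component:
    dim = 0 is a finite molecule/cluster; 1, 2, 3 are chains, layers, frameworks.
    """
    adj = defaultdict(list)
    for u, v, t in edges:
        t = np.asarray(t, dtype=int)
        adj[u].append((v, t))
        adj[v].append((u, -t))

    offset = {}  # lattice cell assigned to each atom while unfolding its component
    components = []
    for start in range(n_atoms):
        if start in offset:
            continue
        offset[start] = np.zeros(3, dtype=int)
        members, cycle_vectors = [start], []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, t in adj[u]:
                target = offset[u] + t
                if v not in offset:
                    offset[v] = target
                    members.append(v)
                    queue.append(v)
                elif (target != offset[v]).any():
                    # This edge reaches v in a different cell: the cycle is periodic.
                    cycle_vectors.append(target - offset[v])
        dim = int(np.linalg.matrix_rank(np.array(cycle_vectors))) if cycle_vectors else 0
        components.append((sorted(members), dim))
    return components


# A dimer split across the cell boundary plus an atom bonded to its own image along c.
print(component_dimensionality(3, [(0, 1, (1, 0, 0)), (2, 2, (0, 0, 1))]))
```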
Funding: This work was financially supported by the Fundamental Research Funds for the Central Universities, the National Natural Science Foundation of China (Grant Nos. 11965005 and 11964026), the 111 Project (No. B17035), and the Natural Science Basic Research Plan in Shaanxi Province of China (Grant Nos. 2020JM-186 and 2020JM-621).
Abstract: Building on structure prediction methods, a machine learning (ML) method is used instead of density functional theory (DFT) to predict material properties, thereby accelerating the materials search process. In this paper, we established a data set of carbon materials by high-throughput calculations on the carbon structures available in the Samara Carbon Allotrope Database. We then trained an ML model that specifically predicts the elastic moduli (bulk modulus, shear modulus, and Young's modulus) and confirmed that its accuracy in predicting the elastic moduli of carbon allotropes is better than that of AFLOW-ML. We further combined our ML model with the CALYPSO code to search for new carbon structures with a high Young's modulus. A new carbon allotrope not included in the Samara Carbon Allotrope Database, named Cmcm-C24, which exhibits a hardness greater than 80 GPa, was revealed for the first time. The Cmcm-C24 phase was identified as a semiconductor with a direct band gap. The structural stability, elastic moduli, and electronic properties of the new carbon allotrope were systematically studied, and the results demonstrate the feasibility of using ML methods to accelerate the materials search process.
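For an isotropic polycrystal, the three moduli the model predicts are linked by E = 9BG/(3B + G), and hardness is often estimated from B and G with empirical formulas. The abstract does not say which hardness model was used, so the sketch below assumes Chen et al.'s (2011) model, H_v = 2(k²G)^0.585 − 3 with k = G/B (all quantities in GPa), as one common choice. The function names are hypothetical.

```python
def youngs_modulus(bulk: float, shear: float) -> float:
    """Isotropic relation E = 9BG / (3B + G); inputs and output in GPa."""
    return 9 * bulk * shear / (3 * bulk + shear)


def poisson_ratio(bulk: float, shear: float) -> float:
    """Isotropic relation nu = (3B - 2G) / (2(3B + G)); dimensionless."""
    return (3 * bulk - 2 * shear) / (2 * (3 * bulk + shear))


def chen_vickers_hardness(bulk: float, shear: float) -> float:
    """Empirical Vickers hardness (GPa), assumed Chen et al. model: 2(k^2 G)^0.585 - 3, k = G/B."""
    k = shear / bulk
    return 2 * (k * k * shear) ** 0.585 - 3


b, g = 100.0, 100.0  # illustrative inputs, not values from the paper
print(youngs_modulus(b, g), poisson_ratio(b, g))  # 225.0 0.125
print(round(chen_vickers_hardness(b, g), 2))
```

Because the hardness model depends strongly on the Pugh ratio k = G/B, a structure can have a high Young's modulus yet a modest predicted hardness if its shear modulus is low relative to its bulk modulus.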
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11774158, 11974173, 11774157, and 11934008).
Abstract: RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help us understand RNA structures and their stabilizing factors, thus providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their use in protein-related fields has a long history, the use of ML methods to predict RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages, limitations, difficulties, and potential of these approaches in the field.