[Objective] To examine the grammar model based on lexical substring extraction for RNA secondary structure prediction. [Method] By introducing the cloud model into the stochastic grammar model, a machine learning algorithm suitable for the lexicalized stochastic grammar model was proposed. The word grid mode was used to extract and divide the RNA sequence into lexical substrings, and the cloud classifier was used to search for the maximum probability of each lemma being marked as a certain secondary structure type. The lemma information was then introduced into the stochastic grammar training process as prior information to predict the secondary structure of RNA, and the method was tested experimentally. [Result] The experimental results showed that the prediction accuracy and search speed of the stochastic grammar cloud model were significantly improved compared with prediction using the simple stochastic grammar. [Conclusion] This study lays the foundation for the wide application of stochastic grammar models in RNA secondary structure prediction.
Membrane proteins are an important class of proteins embedded in cell membranes and play crucial roles in living organisms, for example as ion channels, transporters, and receptors. Because it is difficult to determine the structure of a membrane protein by wet-lab experiments, accurate and fast computational methods based on the amino acid sequence are highly desired. In this paper, we report an online prediction tool called MemBrain, whose input is the amino acid sequence. MemBrain consists of specialized modules for predicting transmembrane helices, residue–residue contacts, and relative accessible surface area of α-helical membrane proteins. MemBrain achieves a prediction accuracy of 97.9% for ATMH and 87.1% for AP, with an N-score of 3.2 ± 3.0 and a C-score of 3.1 ± 2.8. MemBrain-Contact obtains 62% and 64.1% accuracy for top-L/5 contact prediction on the training and independent datasets, respectively. MemBrain-Rasa achieves a Pearson correlation coefficient of 0.733 and a mean absolute error of 13.593. These prediction results provide valuable hints for revealing the structure and function of membrane proteins. The MemBrain web server is free for academic use and available at www.csbio.sjtu.edu.cn/bioinf/MemBrain/.
Tannases produced by filamentous fungi belong to a family of important gallotannin hydrolases and have broad industrial applications. Until now, however, no 3-D structures of fungal tannases have been reported. The protein sequence deduced from the cDNA sequence obtained by RT-PCR amplification was identified as a tannase through sequence alignment and phylogenetic analysis. Structure models based on the tannase sequence were generated using I-TASSER, and the model that best matched the surface charge density-pH titration profile was selected as the final structure for the tannase from Aspergillus niger N5-5. This work provides an effective method for protein structure research. The structure constructed in this work should be important for understanding the enzyme's bioactivity and for the further development of fungal tannases.
Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions, which are strongly coupled to RNA structures. To understand the functions of RNAs, a number of structure prediction models have been developed in recent years. In this review, the progress in computational models for RNA structure prediction is introduced and the distinguishing features of many outstanding algorithms are discussed, with an emphasis on three-dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability, and salt effects is also introduced briefly. Finally, we discuss the major challenges in RNA 3D structure modeling.
Algorithms based on combination learning are usually superior to a single classification algorithm on the task of protein secondary structure prediction. However, the assignment of weights to the base classifiers usually lacks decision-making evidence. In this paper, we propose a protein secondary structure prediction method with a dynamic self-adaptive combination strategy based on entropy, where the weights are assigned according to the entropy of the posterior probabilities output by the base classifiers. A higher entropy value means a lower weight for the base classifier. The final structure prediction is decided by the weighted combination of posterior probabilities. Extensive experiments on the CB513 dataset demonstrate that the proposed method outperforms existing methods and can effectively improve prediction performance.
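The entropy-based weighting described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weight formula, so the rule used here (weight proportional to the gap between the maximum possible entropy and the classifier's entropy) is an assumption, as are the function names `entropy` and `combine` and the example posteriors over the three states H, E, and C.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one posterior distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def combine(posteriors):
    """Combine the posteriors of several base classifiers for one residue.

    Assumed weight rule (not taken from the paper): w_i is proportional to
    (H_max - H_i), so a classifier with higher entropy gets a lower weight.
    """
    n_classes = len(posteriors[0])
    h_max = math.log(n_classes)  # entropy of the uniform distribution
    raw = [max(0.0, h_max - entropy(p)) for p in posteriors]
    total = sum(raw)
    if total <= 1e-12:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0 / len(posteriors)] * len(posteriors)
    else:
        weights = [r / total for r in raw]
    combined = [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]
    return weights, combined

# Hypothetical posteriors over the states H (helix), E (strand), C (coil).
STATES = "HEC"
posteriors = [
    [0.90, 0.05, 0.05],  # confident base classifier (low entropy)
    [0.40, 0.35, 0.25],  # uncertain base classifier (high entropy)
]
weights, combined = combine(posteriors)
predicted = STATES[combined.index(max(combined))]
print(weights, combined, predicted)
```

Because the weights are recomputed from each residue's posteriors, a classifier that is confident at one position and uncertain at another contributes differently at each, which is one way to read the "dynamic self-adaptive" strategy.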
Mineral apatite compounds have attracted significant interest due to their chemical stability and adjustable hexagonal structure, which make them suitable as new photovoltaic functional materials. The band gap of natural apatite is ~5.45 eV, and such a large value limits its applications in catalysis and energy devices. In this research, we designed a method to narrow the band gap via the tetrahedral substitution effect in apatite-based compounds. Density functional theory (DFT) calculations and experimental investigation of the electronic and optical properties revealed that the continuous incorporation of [MO₄]⁴⁻ tetrahedrons (M = Si, Ge, Sn, and Mn) into the crystal lattice can significantly reduce the band gap. In particular, this phenomenon was observed when the [MnO₄]⁴⁻ tetrahedron replaces the [PO₄]⁴⁻ tetrahedron, because a Mn 3d-derived conduction band minimum (CBM) forms and interacts with other elements, leading to band broadening and an obvious reduction of the band gap. This approach allowed us to propose a novel scheme for band gap engineering of apatite-based compounds toward modification across the entire spectral range.
Protein structure prediction is one of the most essential objectives of theoretical chemistry and bioinformatics, as it is of vital importance in medicine, biotechnology, and other fields. Protein secondary structure prediction (PSSP) plays a significant role in the prediction of protein tertiary structure, as it bridges the gap between the protein primary sequence and tertiary structure prediction. Protein secondary structures are classified into two categories: the 3-state category and the 8-state category. Predicting the 3 states and the 8 states of secondary structure from protein sequences are called the Q3 and Q8 prediction problems, respectively. The 8 classes reveal more precise structural information for a variety of applications than the 3 classes; however, Q8 prediction has been found to be very challenging, which is why all previous work in PSSP has focused on Q3 prediction. In this paper, we develop an ensemble machine learning (ML) approach for Q8 PSSP to explore the performance of ensemble learning algorithms compared with that of individual ML algorithms. The ensemble members considered for constructing the ensemble models are well-known classifiers, namely SVM (Support Vector Machine), KNN (K-Nearest Neighbor), DT (Decision Tree), RF (Random Forest), and NB (Naïve Bayes), with two feature extraction techniques, namely LDA (Linear Discriminant Analysis) and PCA (Principal Component Analysis). Experiments were conducted to evaluate the performance of single models and ensemble models, with PCA and LDA, in Q8 PSSP. The novelty of this paper lies in the introduction of ensemble learning to the Q8 PSSP problem. The experimental results confirmed that ensemble ML models are more accurate than individual ML models. They also indicated that features extracted by LDA are more effective than those extracted by PCA.
The structure type of the crystal of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl (1) has been predicted using the previously developed interfacial model for small organic molecules. Based on the calculated hydrophobic-to-hydrophilic volume ratio of 1, this model predicts the crystal structure to be of lamellar or bicontinuous type, which has been confirmed by X-ray single-crystal structure analysis (C20H26O6, monoclinic, P21/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, MoKα radiation, λ = 0.71073 Å, R = 0.0382 and wR = 0.0882 with I > 2σ(I) for 7121 reflections collected, 1852 unique reflections and 170 parameters). As predicted, the hydrophobic and hydrophilic portions of 1 form lamellae. The same interfacial model is applied to other amphiphilic small-molecule organic systems for structure-type prediction.
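The cell volume and calculated density quoted in this abstract can be cross-checked from the reported cell parameters. The sketch below is only a consistency check, not part of the original work: it uses the standard monoclinic relation V = a·b·c·sin β and Dc = Z·M / (N_A·V), with rounded standard atomic masses as an assumption; the function and variable names are illustrative.

```python
import math

# Cell parameters quoted in the abstract (standard uncertainties dropped).
a, b, c = 16.084, 6.0103, 9.6410   # angstroms
beta_deg = 103.014                  # degrees
Z = 2                               # formula units per cell

def monoclinic_volume(a, b, c, beta_deg):
    """Unit-cell volume in cubic angstroms for a monoclinic cell."""
    return a * b * c * math.sin(math.radians(beta_deg))

# Molar mass of C20H26O6 from rounded standard atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}
FORMULA = {"C": 20, "H": 26, "O": 6}
molar_mass = sum(ATOMIC_MASS[el] * n for el, n in FORMULA.items())

AVOGADRO = 6.02214076e23            # 1/mol
volume = monoclinic_volume(a, b, c, beta_deg)
volume_cm3 = volume * 1e-24         # 1 cubic angstrom = 1e-24 cm^3
density = Z * molar_mass / (AVOGADRO * volume_cm3)  # g/cm^3

print(f"V  = {volume:.1f} cubic angstroms")
print(f"Dc = {density:.3f} g/cm^3")
```

Both values should agree with the reported V = 908.1(1) Å³ and Dc = 1.325 g/cm³ to within rounding.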
The secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has hit a bottleneck. Here we present a method for RNA secondary structure prediction that uses direct coupling analysis and a remove-and-expand algorithm and shows better performance than four popular existing multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
In this paper, the applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures are introduced, recent studies on solving protein structure prediction problems using evolutionary algorithms are reviewed, and the challenges and prospects of applying EAs to protein structure modeling are analyzed and discussed.
A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in this letter. The system is based on a BioBus (internal code), whose distinguishing features are two separate channels for control and data, and a slave-associated arbitration scheme. Two reference systems based on the AMBA AHB bus and the CoreConnect bus are introduced to evaluate the performance of the system. The simulation results are encouraging: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ACO algorithm for protein structure prediction. In the algorithm, the "lone" method is applied to deal with infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for solving the protein structure prediction problem, and notable improvements in CPU time are obtained.
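To make the HP lattice model concrete, here is a minimal sketch of how a candidate fold can be scored and how an infeasible (self-intersecting) structure can be detected. It assumes a 2D square lattice and the usual convention of -1 for each pair of H residues that are lattice neighbours but not chain neighbours; the abstract specifies neither, and the move encoding and the names `fold` and `hp_energy` are illustrative. The ACO search itself is not shown.

```python
# Absolute moves on a 2D square lattice (an assumed encoding).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold(moves):
    """Place a chain on the lattice from a string of moves.

    Returns the list of coordinates, or None if the walk revisits a
    site, i.e. the structure is infeasible.
    """
    coords = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in coords:
            return None
        coords.append(nxt)
    return coords

def hp_energy(sequence, coords):
    """Energy of a feasible fold: -1 per non-consecutive H-H lattice contact."""
    index = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy

# A 4-residue chain folded into a square brings its two H ends together.
square = fold("URD")
print(hp_energy("HPPH", square))        # one H-H contact
print(hp_energy("HPPH", fold("RRR")))   # straight chain, no contacts
print(fold("URDL"))                     # walk returns to the origin
```

A search procedure such as ACO would call a scoring function like this on every candidate and needs some policy for the `None` case, which is the situation the abstract's handling of infeasible structures addresses.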
A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by carefully simulating the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
RNAs have important biological functions, and these functions are generally coupled to their structures, especially their secondary structures. In this work, we have made a comprehensive evaluation of the performance of the top existing RNA secondary structure prediction methods, including five deep-learning (DL) based methods and five minimum free energy (MFE) based methods. First, we give a brief overview of these RNA secondary structure prediction methods. We then built two rigorous test datasets consisting of RNAs with non-redundant sequences and comprehensively examined the performance of the methods by classifying the RNAs into different length ranges and different types. Our examination shows that the DL-based methods generally perform better than the MFE-based methods for long RNAs with complex structures, while the MFE-based methods can achieve good performance for small RNAs, and some specialized MFE-based methods can achieve good prediction accuracy for pseudoknots. Finally, we provide some insights and perspectives on modeling RNA secondary structures.
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to the steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools on tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. In general, however, a modern deep learning method outperformed the other programs in predicting RNA secondary structures in the complex tasks, at least for the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
HER2 protein overexpression is associated with the degree of malignancy and poor prognosis of breast cancer. HER2 levels are elevated in 20% of breast tumors. Several covalent tyrosine kinase inhibitors have been found to reduce tumor cell survival and proliferation in vitro and to inhibit downstream HER2 signaling. In the field of protein structure prediction, AlphaFold2, which achieved excellent results in CASP14, can regularly predict protein structures with atomic precision in the absence of similar protein structures. In this study, AlphaFold2 was used to predict the monomeric structure of the HER2 protein. The predicted structure was compared with the conformation of HER2 in complex with a covalent inhibitor, allowing an examination of the conformational changes induced by the inhibitor. By combining the conformational changes of the HER2 protein with the docking results from the Protein-Ligand Interaction Profiler, other potential binding sites were identified, which may further inform drug discovery.
High-entropy alloys (HEAs) are novel materials obtained by introducing chemical disorder through the mixing of multiple principal components, and they exhibit attractive features and exceptional properties in comparison with traditional alloys. However, a trade-off between strength and ductility is still present in HEAs, which significantly limits their wide practical application. Moreover, preparing HEAs by trial and error is time-consuming and wasteful of resources, hindering their rapid, high-quality development. The primary objective of this work is to summarize the latest advances in HEAs, focusing on methods for predicting phase structures and on the factors influencing mechanical properties. Strengthening and toughening strategies for HEAs are also highlighted, with a view to maximizing their application potential. Finally, challenges and future research directions for HEAs are identified and proposed.
As an extreme physical condition, high pressure serves as a potent means to substantially modify the interatomic distances and bonding patterns within condensed matter, thereby enabling the macroscopic manipulation of material properties. We employed the CALYPSO method to predict the stable structures of RbB₂C₄ across the pressure range from 0 GPa to 100 GPa and investigated its physical properties through first-principles calculations. Specifically, we found four novel structures, namely P6₃/mcm-, Amm2-, P1-, and I4/mmm-RbB₂C₄. Electronic structure calculations reveal that all of them exhibit metallic characteristics under pressure. The calculated formation enthalpies show that the P6₃/mcm structure can be synthesized within the pressure range of 0–40 GPa, while the Amm2, P1, and I4/mmm structures can be synthesized above 4 GPa, 6 GPa, and 10 GPa, respectively. Moreover, the estimated Vickers hardness of I4/mmm-RbB₂C₄ is 47 GPa, suggesting that it is a superhard material. Interestingly, this study uncovers the continuous transformation of the crystal structure of RbB₂C₄ from a layered configuration to folded and tubular forms, ultimately attaining a stabilized cage-like structure, over the pressure span of 0–100 GPa. The application of pressure offers a strong impetus for advancement and innovation in condensed matter physics, facilitating the exploration of novel states and functions of matter.
Novel ordered intermetallic compounds have stimulated much interest. Ru–Al alloys are a prominent class of high-temperature structural materials, but the experimentally reported crystal structure of the intermetallic Ru₂Al₅ phase remains elusive and debated. To resolve this controversy, we extensively explored the crystal structures of Ru₂Al₅ using first-principles calculations combined with a crystal structure prediction technique. Among the calculated x-ray diffraction patterns and lattice parameters of five candidate Ru₂Al₅ structures, those of the orthorhombic Pmmn structure agreed best with recent experimental results. The structural stabilities of the five Ru₂Al₅ structures were confirmed through formation energy, elastic constant, and phonon spectrum calculations. We also comprehensively analyzed the mechanical and electronic properties of the five candidates. This work can guide the exploration of novel ordered intermetallic compounds in Ru–Al alloys.
Funding (stochastic grammar cloud model for RNA secondary structure prediction): Supported by the Science Foundation of Hengyang Normal University of China (09A36).
Funding (MemBrain membrane protein prediction server): Supported by the National Natural Science Foundation of China (Nos. 61671288, 91530321, 61603161) and the Science and Technology Commission of Shanghai Municipality (Nos. 16JC1404300, 17JC1403500, 16ZR1448700).
Funding (Aspergillus niger tannase structure model): Supported by the National Natural Science Foundation of China (No. 21374117) and the 100 Talents Program of the Chinese Academy of Sciences.
Funding (review of computational models for RNA structure prediction): Supported by the National Natural Science Foundation of China (Grant Nos. 11074191, 11175132, and 11374234), the National Basic Research Program of China (Grant No. 2011CB933600), and the Program for New Century Excellent Talents of China (Grant No. NCET 08-0408).
Funding (band gap engineering of apatite-based compounds): Financially supported by the National Natural Science Foundation of China (Nos. 41831288 and 51672257), the Fundamental Research Funds for the Central Universities (Nos. 2652018305 and 2652017335), the Guangdong Innovation Research Team for Higher Education (No. 2017KCXTD030), the High-Level Talents Project of Dongguan University of Technology (No. KCYKYQD2017017), the Engineering Research Center of None-food Biomass Efficient Pyrolysis and Utilization Technology of Guangdong Higher Education Institutes (No. 2016GCZX009), and the Russian Science Foundation (No. 19-77-10013).
Funding: This work was supported by the National Science Foundation (Grant DMR-9812351).
Abstract: The structure type of the crystal of 4,4'-bis-(2-hydroxy-ethoxyl)-biphenyl 1 has been predicted using the previously developed interfacial model for small organic molecules. Based on the calculated hydrophobic-to-hydrophilic volume ratio of 1, this model predicts the crystal structure to be of lamellar or bicontinuous type, which has been confirmed by X-ray single-crystal structure analysis (C20H26O6, monoclinic, P2₁/c, a = 16.084(1), b = 6.0103(4), c = 9.6410(7) Å, β = 103.014(2)°, V = 908.1(1) Å³, Z = 2, Dc = 1.325 g/cm³, F(000) = 388, μ = 0.097 mm⁻¹, MoKα radiation, λ = 0.71073 Å, R = 0.0382 and wR = 0.0882 with I > 2σ(I) for 7121 reflections collected, 1852 unique reflections and 170 parameters). As predicted, the hydrophobic and hydrophilic portions of 1 are arranged in lamellae. The same interfacial model is applied to other amphiphilic small-molecule organic systems for structure type prediction.
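As a consistency check on the crystal data quoted above, the cell volume and calculated density can be reproduced from the reported parameters. The molar mass M ≈ 362.42 g/mol is not stated in the abstract; it is computed here from the formula C20H26O6 with standard atomic masses, and the Avogadro constant is rounded.

```latex
\[
\begin{aligned}
V &= a\,b\,c\,\sin\beta
   = 16.084 \times 6.0103 \times 9.6410 \times \sin(103.014^\circ)\ \text{\AA}^3 \\
  &\approx 931.99 \times 0.97432\ \text{\AA}^3
   \approx 908.1\ \text{\AA}^3, \\[4pt]
D_c &= \frac{Z\,M}{N_A\,V}
   = \frac{2 \times 362.42\ \mathrm{g\,mol^{-1}}}
          {6.022 \times 10^{23}\ \mathrm{mol^{-1}} \times 908.1 \times 10^{-24}\ \mathrm{cm^{3}}}
   \approx 1.325\ \mathrm{g\,cm^{-3}}.
\end{aligned}
\]
```

Both values agree with the reported V = 908.1(1) Å³ and Dc = 1.325 g/cm³.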
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 31570722).
Abstract: Secondary structures of RNAs are the basis for understanding their tertiary structures and functions, so their prediction is widely needed given the increasing discovery of noncoding RNAs. In recent decades, many methods have been proposed to predict RNA secondary structures, but their accuracy has hit a bottleneck. Here we present a method for RNA secondary structure prediction using direct coupling analysis and a remove-and-expand algorithm that shows better performance than four existing popular multiple-sequence methods. We further show that the results can also be used to improve the prediction accuracy of single-sequence methods.
Funding: Supported by the National Natural Science Foundation of China (60133010, 70071042, 60073043).
Abstract: This paper introduces the applications of evolutionary algorithms (EAs) to the prediction of protein secondary and tertiary structures, reviews recent studies on solving protein structure prediction problems with EAs, and discusses the challenges and prospects of applying EAs to protein structure modeling.
Abstract: A simple stepwise folding process has been developed to simulate RNA secondary structure formation. Modifications to the energy parameters of various loops were included in the program. Five possible types of pseudoknots, including the well-known H-type pseudoknot, were permitted to occur where reasonable. We have applied this approach to a number of RNA sequences. The prediction accuracies we obtained were higher than those in published papers.
Funding: Supported by the National Natural Science Foundation of China (No. 60373044) and the Knowledge Innovative Project of CAS (No. KSCX2-SW-233).
Abstract: The architecture of a BioAccel (internal code) chip for RNA secondary structure prediction is described in this letter. The system is based on a BioBus (internal code), whose distinguishing features are two separate control and data channels and a slave-associated arbitration scheme. Two reference systems, based on the AMBA AHB bus and the CoreConnect bus, are introduced to evaluate the performance of the system. The simulation results are encouraging: the average communication bandwidth of the chip is increased severalfold, and the read and write latencies are reduced by about 40 percent.
Abstract: The hydrophobic-polar (HP) lattice model is an important simplified model for studying protein folding. In this paper, we present an improved ACO algorithm for protein structure prediction. In the algorithm, the "lone" method is applied to deal with infeasible structures, and the "point mutation and reconstruction" method is applied in the local search phase. The empirical results show that the presented method is feasible and effective for solving the protein structure prediction problem, and notable improvements in CPU time are obtained.
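For readers unfamiliar with the HP lattice model, the objective that such search algorithms minimize can be sketched as follows. This is a minimal illustration of the standard HP energy on a 2D square lattice, not the authors' ACO algorithm; the abstract does not state the lattice dimension, and the function name `hp_energy` and the coordinate representation are assumptions made for this example.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain folded on a 2D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) lattice positions, one per residue, in chain order.
    Returns minus the number of H-H pairs that are lattice neighbours
    but not consecutive along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("infeasible structure: the chain overlaps itself")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("infeasible structure: chain is not connected")

    index_at = {xy: i for i, xy in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts
```

For example, the four-residue chain `HPPH` folded around a unit square brings its two H residues into contact and has energy -1, while the same chain laid out in a straight line has energy 0. The two `ValueError` checks correspond to what the abstract calls infeasible structures.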
Funding: The National Natural Science Foundation of China (No. 10471051) and the National Basic Research Program (973) of China (No. 2004CB318000).
Abstract: A three-dimensional off-lattice protein model with two species of monomers, hydrophobic and hydrophilic, is studied. Inspired by the law of reciprocity among things in the physical world, a heuristic quasi-physical algorithm for the protein structure prediction problem is put forward. First, by simulating in detail the movement of smooth elastic balls in the physical world, the algorithm finds low-energy configurations for a given monomer chain. An "off-trap" strategy is then proposed to escape from local minima. Experimental results show promising performance. For all chains with lengths 13 ≤ n ≤ 55, the proposed algorithm finds states with lower energy than the putative ground states reported in the literature. Furthermore, for chain lengths n = 21, 34, and 55, the algorithm finds new low-energy configurations different from those given in the literature.
Funding: Supported by grants from the National Science Foundation of China (Grant Nos. 12375038 and 12075171 to ZJT, and 12205223 to YLT).
Abstract: RNAs have important biological functions, and the functions of RNAs are generally coupled to their structures, especially their secondary structures. In this work, we have made a comprehensive evaluation of the performance of existing top RNA secondary structure prediction methods, including five deep-learning (DL) based methods and five minimum free energy (MFE) based methods. First, we give a brief overview of these RNA secondary structure prediction methods. Afterwards, we build two rigorous test datasets consisting of RNAs with non-redundant sequences and comprehensively examine the performance of the methods by classifying the RNAs into different length ranges and different types. Our examination shows that the DL-based methods generally perform better than the MFE-based methods for long RNAs with complex structures, while the MFE-based methods can achieve good performance for small RNAs, and some specialized MFE-based methods can achieve good prediction accuracy for pseudoknots. Finally, we provide some insights and perspectives on modeling RNA secondary structures.
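Comparisons of this kind need a way to score a predicted secondary structure against a reference. The abstract does not say which metric was used; the sketch below assumes a common choice, the F1 score over base pairs, with structures written in plain dot-bracket notation. The function names `dot_bracket_pairs` and `pair_f1` are hypothetical.

```python
def dot_bracket_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def pair_f1(predicted, reference):
    """F1 score of predicted base pairs against reference base pairs."""
    predicted_pairs = dot_bracket_pairs(predicted)
    reference_pairs = dot_bracket_pairs(reference)
    true_positives = len(predicted_pairs & reference_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(reference_pairs)
    return 2 * precision * recall / (precision + recall)
```

Plain dot-bracket notation with a single bracket type cannot represent pseudoknots, so scoring the pseudoknotted structures discussed in the abstract would require an extended notation or an explicit list of pairs.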
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32000462 to Fei Qi, Grant No. 32170619 to Philipp Kapranov, and Grant No. 32201055 to Yue Chen); the Research Fund for International Senior Scientists from the National Natural Science Foundation of China (Grant No. 32150710525 to Philipp Kapranov); the Natural Science Foundation of Fujian Province, China (Grant No. 2020J02006 to Philipp Kapranov); and the Scientific Research Funds of Huaqiao University, China (Grant No. 22BS114 to Fei Qi, Grant No. 21BS127 to Yue Chen, and Grant No. 15BS101 to Philipp Kapranov).
Abstract: Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology, since proper folding is key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy, which significantly limits their broader application. Nonetheless, the advantages of these methods have led to the steady development of multiple in silico RNA secondary structure prediction approaches, including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools on tasks of varying complexity. We found that while many programs performed well on relatively simple tasks, their performance varied significantly on more complex RNA folding problems. In general, however, a modern deep learning method outperformed the other programs on the complex tasks of predicting RNA secondary structures, at least for the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Abstract: HER2 protein overexpression is associated with the degree of malignancy and poor prognosis of breast cancer. HER2 levels are elevated in 20% of breast tumors. Several covalent tyrosine kinase inhibitors have been found to reduce tumor cell survival and proliferation in vitro and to inhibit downstream HER2 signaling. In the field of protein structure prediction, AlphaFold2, which achieved excellent results in CASP14, can regularly predict protein structures with atomic precision even in the absence of similar known protein structures. In this study, AlphaFold2 was used to predict the monomeric structure of the HER2 protein. This predicted structure was compared with the conformation of HER2 in complex with a covalent inhibitor, allowing for an examination of the conformational changes induced by the inhibitor. By combining the conformational changes of the HER2 protein with the docking results of Protein-Ligand Interaction Profiler, other potential binding sites were identified, which could further reveal the mechanism of drug discovery.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52375451, 52005396); Shandong Provincial Natural Science Foundation, China (Nos. ZR2023YQ052, ZR2023ME087); Shandong Provincial Technological SME Innovation Capability Promotion Project, China (No. 2023TSGC0375); Young Taishan Scholars Program of Shandong Province, China (No. tsqn202306041); Guangdong Basic and Applied Basic Research Foundation, China (No. 2023A1515010044); Shandong Provincial Youth Innovation Team, China (No. 2022KJ038); Open Project of State Key Laboratory of Solid Lubrication, China (No. LSL-22-11); Young Talent Fund of University Association for Science and Technology in Shaanxi, China (No. 20210414); and Qilu Youth Scholar Project Funding of Shandong University, China.
Abstract: High-entropy alloys (HEAs) are novel materials obtained by introducing chemical disorder through the mixing of multiple principal components, and they exhibit attractive features and exceptional properties in comparison with traditional alloys. However, a trade-off between strength and ductility is still present in HEAs, significantly limiting their wide practical application. Moreover, preparing HEAs by trial and error is time-consuming and wasteful of resources, hindering the rapid, high-quality development of HEAs. The primary objective of this work is to summarize the latest advancements in HEAs, focusing on methods for predicting phase structures and the factors influencing mechanical properties. Additionally, strengthening and toughening strategies for HEAs are highlighted, with a view to maximizing their application potential. Challenges and future research directions for HEAs are also identified and proposed.
Funding: Project supported by the Jilin Provincial Science and Technology Development Joint Fund Project (Grant No. YDZJ202201ZYTS581) and the Scientific and Technological Research Project of Jilin Provincial Education Department (Grant No. JJKH20240077KJ).
Abstract: As an extreme physical condition, high pressure serves as a potent means to substantially modify the interatomic distances and bonding patterns within condensed matter, thereby enabling the macroscopic manipulation of material properties. We employed the CALYPSO method to predict the stable structures of RbB₂C₄ across the pressure range from 0 GPa to 100 GPa and investigated its physical properties through first-principles calculations. Specifically, we found four novel structures, namely P6₃/mcm-, Amm2-, P1-, and I4/mmm-RbB₂C₄. Electronic structure calculations reveal that all of them exhibit metallic characteristics under pressure. The calculated formation enthalpies show that the P6₃/mcm structure can be synthesized within the pressure range of 0–40 GPa, while the Amm2, P1, and I4/mmm structures can be synthesized above 4 GPa, 6 GPa, and 10 GPa, respectively. Moreover, the estimated Vickers hardness of the I4/mmm-RbB₂C₄ compound is 47 GPa, suggesting that it is a superhard material. Interestingly, this study uncovers the continuous transformation of the crystal structure of RbB₂C₄ from a layered configuration to folded and tubular forms, ultimately attaining a stabilized cage-like structure over the pressure span of 0–100 GPa. The application of pressure offers a strong impetus for advancement and innovation in condensed matter physics, facilitating the exploration of novel states and functions of matter.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 11965005 and 11964026); the Natural Science Basic Research Plan in Shaanxi Province, China (Grant Nos. 2023-JC-YB-021 and 2022JM-035); the Fundamental Research Funds for the Central Universities; and the 111 Project (Grant No. B17035).
Abstract: Novel ordered intermetallic compounds have stimulated much interest. Ru–Al alloys are a prominent class of high-temperature structural materials, but the experimentally reported crystal structure of the intermetallic Ru₂Al₅ phase remains elusive and debatable. To resolve this controversy, we extensively explored the crystal structures of Ru₂Al₅ using first-principles calculations combined with a crystal structure prediction technique. Among the calculated x-ray diffraction patterns and lattice parameters of five candidate Ru₂Al₅ structures, those of the orthorhombic Pmmn structure aligned best with recent experimental results. The structural stabilities of the five Ru₂Al₅ structures were confirmed through formation energy, elastic constant, and phonon spectrum calculations. We also comprehensively analyzed the mechanical and electronic properties of the five candidates. This work can guide the exploration of novel ordered intermetallic compounds in Ru–Al alloys.