Fenton oxidation is a promising water treatment method for degrading organic pollutants. In this study, 30 different organic compounds were selected and their reaction rate constants (k) were determined for the Fenton oxidation process. The Gaussian 09 and Materials Studio software packages were used to carry out calculations and obtain values of 10 different molecular descriptors for each studied compound. Ferric-oxyhydroxide coagulation experiments were conducted to determine the coagulation percentage. Based on adsorption capacity, all of the investigated organic compounds were divided into two groups (Group A and Group B). The percentage adsorption of organic compounds in Group A was less than 15% (wt./wt.), and that in Group B was higher than 15% (wt./wt.). For Group A, removal of the compounds by oxidation was the dominant process, while for Group B, removal by both oxidation and coagulation (as a synergistic process) took place. Results showed that the relationship between the rate constants (k values) and the molecular descriptors was more pronounced for Group A than for Group B compounds. For the oxidation-dominated process, E_HOMO and the Fukui indices (f(0)_x, f(-)_x, f(+)_x) were the most significant factors. The influence of bond order was more significant for the synergistic process of oxidation and coagulation than for the oxidation-dominated process. The influences of all other molecular descriptors on the synergistic process were weaker than on the oxidation-dominated process.
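The grouping-then-correlating logic described above can be sketched in Python. This is a minimal illustration, not the authors' workflow: the compound records, descriptor values, and rate constants are hypothetical, a compound at exactly 15% is placed in Group B by assumption, and a plain Pearson coefficient stands in for whatever statistics the study actually used.

```python
import math

# Hypothetical records: (name, adsorption % (wt./wt.), E_HOMO in eV, k in arbitrary units).
# Values are invented for illustration only.
COMPOUNDS = [
    ("A1", 3.0, -9.0, 1.0),
    ("A2", 5.5, -8.5, 2.0),
    ("A3", 8.0, -8.0, 3.0),
    ("A4", 12.0, -7.5, 4.0),
    ("B1", 20.0, -9.0, 2.0),
    ("B2", 35.0, -8.0, 1.5),
    ("B3", 18.0, -8.5, 3.0),
    ("B4", 50.0, -7.5, 2.5),
]

ADSORPTION_THRESHOLD = 15.0  # percent, from the abstract


def split_groups(records, threshold=ADSORPTION_THRESHOLD):
    """Group A: adsorption below the threshold; Group B: at or above it (assumed tie rule)."""
    group_a = [r for r in records if r[1] < threshold]
    group_b = [r for r in records if r[1] >= threshold]
    return group_a, group_b


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ehomo_vs_k(group):
    """Correlation between E_HOMO and k within one group."""
    return pearson_r([r[2] for r in group], [r[3] for r in group])


group_a, group_b = split_groups(COMPOUNDS)
print(f"Group A r = {ehomo_vs_k(group_a):.3f}, Group B r = {ehomo_vs_k(group_b):.3f}")
```

With these made-up numbers, Group A shows a perfect linear E_HOMO-k trend and Group B shows none, an exaggerated version of the qualitative contrast the abstract reports.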
A 3-Dimension-Quantitative Structure-Activity Relationship (3D-QSAR¹) approach is applied to predict the biological activity and toxicity of chemical products. Quantum chemical techniques allow the construction of the molecular descriptors. The molecular quantum descriptors are classified into five principal component factors. Various linear regression equations are obtained using statistical techniques. In this study, the researchers propose the three best regression equations based on the quantum molecular descriptors discussed earlier. Observed EC50 values are plotted against calculated EC50 values for the best-fitting equation based on the quantum descriptors.
The discovery of fluorescent materials with an inverted singlet-triplet (IST) energy gap, where the singlet excited state (S_(1)) lies below the triplet excited state (T_(1)), marks a transformative advancement in organic light-emitting diode (OLED) technology. However, designing potential IST emitters is highly challenging: their IST energy gap, which arises from double electron excitations, can only be accurately described by time-consuming post-Hartree-Fock (HF) methods, which limits the speed of large-scale high-throughput screening. Here, we develop a four-orbital model to elucidate in detail the roles of double excitations in IST formation, and establish two molecular descriptors (K_(S) and O_(D)) based on the exchange integral and molecular orbital energies. Using these descriptors, we rapidly identify 41 IST candidates out of 3,486 molecules. The descriptor-aided approach achieves a screening success rate of 90% and reduces computational costs by a factor of 13 compared to full post-HF calculations. Importantly, we predicted a series of excellent non-traditional near-infrared IST emitters from a dataset of 1,028 compounds, with emission wavelengths ranging from 852.2 to 1002.3 nm, which opens new avenues for designing highly efficient near-infrared OLED materials.
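A standard textbook result helps explain why double excitations matter here: in a two-orbital (HOMO to LUMO), single-excitation picture, the singlet lies above the triplet by twice the exchange integral K, which is non-negative, so that picture alone cannot produce an inverted gap. The sketch below encodes that result. The `double_excitation_stabilization_ev` parameter is a hypothetical stand-in for the extra lowering of S_(1) that mixing with double excitations can provide; it is not the authors' four-orbital model or their K_(S)/O_(D) descriptors.

```python
def st_gap_single_excitation(exchange_k_ev: float) -> float:
    """E(S1) - E(T1) in the two-orbital single-excitation picture: 2K, with K >= 0."""
    if exchange_k_ev < 0:
        raise ValueError("the exchange integral K is non-negative")
    return 2.0 * exchange_k_ev


def st_gap_with_correction(exchange_k_ev: float, double_excitation_stabilization_ev: float) -> float:
    """Hypothetical corrected gap: the 2K estimate minus an extra S1 stabilization
    attributed to double excitations (an illustrative parameter, not a derived quantity)."""
    return st_gap_single_excitation(exchange_k_ev) - double_excitation_stabilization_ev


def is_inverted(gap_ev: float) -> bool:
    """An inverted singlet-triplet gap means S1 lies below T1, i.e. a negative gap."""
    return gap_ev < 0.0


# Illustrative numbers only: a small exchange integral, as for weakly overlapping HOMO/LUMO.
print(st_gap_single_excitation(0.05))      # 2K only: never negative
print(st_gap_with_correction(0.05, 0.15))  # the sign can flip once S1 is stabilized further
```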
Computational approaches, encompassing both physics-based and machine learning (ML) methodologies, have gained substantial traction in drug repurposing efforts targeting specific therapeutic entities. The human dopamine (DA) transporter (hDAT) is the primary therapeutic target of numerous psychiatric medications. However, traditional hDAT-targeting drugs, which interact with the primary binding site, encounter significant limitations, including addictive potential and stimulant effects. In this study, we propose an integrated workflow combining virtual screening based on weighted holistic atom localization and entity shape (WHALES) descriptors with in vitro experimental validation to identify repurposable hDAT-targeting drugs. First, a WHALES-based similarity search was performed using as templates four benztropine-like atypical inhibitors known to bind hDAT's allosteric site. From a compound library of 4,921 marketed and clinically tested drugs, we identified 27 candidate atypical inhibitors. Subsequently, ADMETlab was employed to predict the pharmacokinetic and toxicological properties of these candidates, while induced-fit docking (IFD) was performed to estimate their binding affinities. Six compounds were selected for in vitro assessment of neurotransmitter reuptake inhibitory activity. Among these, three exhibited significant inhibitory potency, with half maximal inhibitory concentration (IC_(50)) values of 0.753 μM, 0.542 μM, and 1.210 μM, respectively. Finally, molecular dynamics (MD) simulations and end-point binding free energy analyses were conducted to elucidate and confirm the inhibitory mechanisms of the repurposed drugs against hDAT in its inward-open conformation. In conclusion, our study not only identifies promising active compounds as potential atypical inhibitors for novel therapeutic drug development targeting hDAT but also validates the effectiveness of our integrated computational and experimental workflow for drug repurposing.
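The multi-template similarity search can be illustrated as follows. This is a hedged sketch: it assumes each molecule has already been encoded as a fixed-length descriptor vector (computing real WHALES descriptors is not shown), uses Euclidean distance as the dissimilarity, and scores each library compound by its distance to the closest template. The vectors and compound names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_templates(library, templates, top_n):
    """Rank library compounds by distance to the nearest template (smaller = more similar)."""
    scored = []
    for name, vector in library.items():
        nearest = min(euclidean(vector, t) for t in templates)
        scored.append((nearest, name))
    scored.sort()
    return [(name, dist) for dist, name in scored[:top_n]]


# Hypothetical 3-component descriptor vectors; real WHALES vectors are longer.
templates = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]
library = {
    "drug_a": [0.1, 0.0, 0.0],
    "drug_b": [5.0, 5.0, 5.0],
    "drug_c": [9.8, 10.0, 10.0],
}

for name, dist in rank_by_templates(library, templates, top_n=2):
    print(f"{name}: {dist:.3f}")
```

Taking the minimum over templates means a compound only needs to resemble one known atypical inhibitor to rank highly; averaging over templates would be an alternative design choice.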
Structure-based virtual screening (molecular docking) is now one of the most pragmatic techniques for leveraging target structures in ligand discovery. Accurate binding pose prediction is critical to molecular docking. Here, we describe a general strategy to improve the accuracy of docking pose prediction by implementing structural descriptor-based filtering and KGS-penalty function-based conformational clustering in an unbiased manner. We assessed our method against 150 high-quality protein–ligand complex structures. Surprisingly, such simple components are sufficient to improve the accuracy of docking pose prediction: the success rate of predicting a near-native docking pose increased from 53% to 78% of the targets. We expect that our strategy may be generally useful for improving currently available molecular docking programs.
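The success-rate metric can be sketched as below. It assumes that predicted and reference ligand poses share the same atom ordering and coordinate frame (no symmetry correction or superposition, as is usual for docking RMSD), and it uses the common 2.0 Å cutoff for a "near-native" pose; the abstract does not state the cutoff, so treat that value as an assumption. Coordinates are hypothetical.

```python
import math


def pose_rmsd(pose, reference):
    """RMSD (same units as the coordinates) between matched atoms of two poses."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))


def success_rate(top_poses, references, cutoff=2.0):
    """Fraction of targets whose top-ranked pose lies within `cutoff` of the reference pose."""
    hits = sum(1 for pose, ref in zip(top_poses, references) if pose_rmsd(pose, ref) <= cutoff)
    return hits / len(references)


reference_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
reference_2 = [(2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
top_1 = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]  # 0.5 Å off: near-native
top_2 = [(6.0, 1.0, 0.0), (7.0, 1.0, 0.0)]  # 4 Å off: a miss

print(success_rate([top_1, top_2], [reference_1, reference_2]))
```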
Using density functional theory, noncovalent interactions and two mechanisms of covalent functionalization of the drug carmustine with functionalized carbon nanotubes (CNTs) have been investigated. Quantum molecular descriptors of the noncovalent configurations were studied. The binding of carmustine to functionalized CNTs was found to be thermodynamically favorable. NTCOOH and NTCOCl can bond to the NH group of carmustine through their OH (COOH mechanism) and Cl (COCl mechanism) groups, respectively. The activation energies, activation enthalpies, and activation Gibbs free energies of the two pathways were calculated and compared. The activation parameters of the COOH mechanism are higher than those of the COCl mechanism; therefore, the COCl mechanism is more favorable for covalent functionalization. COOH-functionalized CNT (NTCOOH) has a higher binding energy than COCl-functionalized CNT (NTCOCl) and can act as a favorable system for noncovalent carmustine drug delivery within biological and chemical systems. These results could be generalized to other similar drugs.
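Why a lower activation Gibbs free energy makes one pathway more favorable can be quantified with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). The sketch below assumes a transmission coefficient of 1 and uses hypothetical barrier values, not the values computed in the study.

```python
import math

R = 8.314462618     # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23  # Boltzmann constant, J K^-1
H = 6.62607015e-34  # Planck constant, J s


def eyring_rate(dg_act_j_mol, temperature_k=298.15):
    """Unimolecular rate constant (s^-1) from the Eyring equation, transmission coefficient = 1."""
    return (K_B * temperature_k / H) * math.exp(-dg_act_j_mol / (R * temperature_k))


def rate_ratio(dg_slow_j_mol, dg_fast_j_mol, temperature_k=298.15):
    """k_fast / k_slow; depends only on the barrier difference."""
    return math.exp((dg_slow_j_mol - dg_fast_j_mol) / (R * temperature_k))


# Hypothetical barriers (J/mol) for two competing pathways.
dg_pathway_high = 110_000.0
dg_pathway_low = 100_000.0

print(f"k_low / k_high = {rate_ratio(dg_pathway_high, dg_pathway_low):.1f}")
```

At room temperature, each additional ~5.7 kJ/mol of barrier (RT ln 10) slows the reaction by about a factor of 10.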
A new quantitative structure–retention relationship (QSRR) method was reported for predicting the gas chromatography (GC) relative retention times (RRTs) of chlorinated phenols (CPs) on a DB-5 column. Chemical descriptors were calculated from the molecular structures of the CPs and related to their gas chromatographic RRTs using multiple linear regression analysis. The proposed model had a squared multiple correlation coefficient R² = 0.970, standard error SE = 0.0472, and significance level P = 0.0000. The QSRR model also reveals that the gas chromatographic relative retention times of CPs are associated with physicochemical interactions with the stationary phase and are influenced by the number of chlorine and oxygen atoms in the CP molecules.
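A multiple linear regression of the kind described, together with the R² and standard error it reports, can be sketched in plain Python. The two descriptors (a chlorine count and a second, unnamed descriptor) and the RRT values are hypothetical, and the normal-equation solver is only suitable for small, well-conditioned problems.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(xi) for xi in X]
    p1 = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p1)] for i in range(p1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p1)]
    return solve_linear_system(xtx, xty)


def mlr_statistics(X, y, coef):
    """Return (R^2, standard error of estimate with n - p - 1 degrees of freedom)."""
    n, p = len(y), len(X[0])
    predicted = [coef[0] + sum(c * x for c, x in zip(coef[1:], xi)) for xi in X]
    y_mean = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, predicted))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - sse / sst, math.sqrt(sse / (n - p - 1))


# Hypothetical descriptors: (number of Cl atoms, second descriptor) and RRTs.
X = [(1, 0.3), (2, 0.1), (3, 0.7), (4, 0.2), (5, 0.9), (2, 0.5)]
noise = [0.01, -0.02, 0.015, -0.005, 0.0, 0.01]
y = [0.5 + 0.2 * n_cl + 0.1 * d + e for (n_cl, d), e in zip(X, noise)]

coef = fit_mlr(X, y)
r2, se = mlr_statistics(X, y, coef)
print(f"coefficients = {[round(c, 3) for c in coef]}, R^2 = {r2:.4f}, SE = {se:.4f}")
```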
Co-crystal formation can improve the physicochemical properties of a compound, thus enhancing its druggability. Therefore, artificial intelligence-based co-crystal virtual screening in the early stage of drug development has attracted extensive attention from researchers. However, the complexity of developing and applying such algorithms hinders their wide application. This study presents a data-driven co-crystal prediction method based on an XGBoost machine learning model (used through XGBoost's scikit-learn-compatible interface). The simplified molecular input line entry specification (SMILES) strings of two compounds are simply input to determine whether a co-crystal can be formed. The data set includes the co-crystal records in the Cambridge Structural Database (CSD) and records of no co-crystal formation from the extant literature and experiments. RDKit molecular descriptors are adopted as the features of a compound in the data set. The developed model shows excellent performance on the proposed co-crystal training and validation sets, with high accuracy, sensitivity, and F1 score. The prediction success rate of the model exceeds 90%. The model therefore provides a simple and feasible scheme for designing and screening co-crystal drugs efficiently and accurately.
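Two pieces of the workflow can be sketched without committing to any particular library: turning a pair of per-compound descriptor vectors into one order-independent feature vector, and computing the accuracy, sensitivity, and F1 score used to evaluate the classifier. The sum/absolute-difference pairing is an assumption for illustration (the abstract does not say how the two compounds' descriptors are combined), and the labels are hypothetical. In practice the descriptor vectors would come from RDKit and the predictions from the trained XGBoost model.

```python
def pair_features(desc_a, desc_b):
    """Order-independent pair encoding (assumed): element-wise sum followed by |difference|."""
    summed = [a + b for a, b in zip(desc_a, desc_b)]
    diff = [abs(a - b) for a, b in zip(desc_a, desc_b)]
    return summed + diff


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1), precision, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "precision": precision, "f1": f1}


# Hypothetical labels: 1 = co-crystal forms, 0 = no co-crystal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Encoding the pair symmetrically means the model gives the same answer whichever compound is listed first, which matches the question being asked (does this pair co-crystallize?).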
Leukotriene A4 (LTA4) plays an important role as the precursor of the slow-reacting substances LTC4, LTD4, and LTE4. The enzyme LTA4 hydrolase is an attractive target for molecular modeling and QSAR studies. Our effort is mainly focused on exploring the SAR of LTA4 hydrolase inhibitors through docking studies, pharmacophore modeling, and molecular descriptor studies. The binding of these small molecules to the LTA4 hydrolase enzyme was described by models developed on 2D molecular descriptors with good predictive power (39 compounds, 6 descriptors, r² = 0.98, SEE = 0.167, F-value = 268.53, q² = 0.90, r²adj = 0.97, P-value < 0.0001, SD of residuals = 0.15). Docking studies were employed to predict the probable binding conformations of these analogues and to explore the SAR of the compounds. The novel pharmacophore represents the ligand features involved in interactions with the target protein, as well as the space around the ligand occupied by the protein. These efforts aim to uncover the SAR of LTA4 hydrolase inhibitors through QSAR, docking, and pharmacophore techniques.
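The reported statistics are internally related. As a quick consistency check (a worked calculation, not part of the original study), adjusted r² and the regression F-statistic can be recomputed from n = 39 compounds, p = 6 descriptors, and r² = 0.98 using the standard formulas r²adj = 1 − (1 − r²)(n − 1)/(n − p − 1) and F = (r²/p)/((1 − r²)/(n − p − 1)).

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def regression_f(r2, n, p):
    """Overall F-statistic of a linear regression with an intercept."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


n_compounds, n_descriptors, r2 = 39, 6, 0.98
print(f"r2_adj = {adjusted_r2(r2, n_compounds, n_descriptors):.4f}")
print(f"F = {regression_f(r2, n_compounds, n_descriptors):.1f}")
```

The recomputed values (about 0.976 and 261) are close to the reported r²adj = 0.97 and F = 268.53; small gaps are expected because r² is given to only two decimals.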
Direct application of bio-oil from fast pyrolysis as a fuel has remained a challenge due to undesirable attributes such as low heating value, high viscosity, high corrosiveness, and storage instability. Solvent addition is a simple method for circumventing these disadvantages to allow further processing and storage. In this work, computer-aided molecular design tools were developed to design optimal solvents that upgrade bio-oil while having low environmental impact. Firstly, target solvent requirements were translated into measurable physical properties. As different property prediction models contain different levels of structural information, the molecular signature descriptor was used as a common platform to formulate the design problem. Because different property prediction models require different structural information, signatures of different heights were needed to formulate the design problem. Due to the combinatorial nature of higher-order signatures, the complexity of a computer-aided molecular design problem increases with the height of the signatures. Thus, a multi-stage framework was developed, with consistency rules that restrict the number of higher-order signatures. Finally, phase stability analysis was conducted to evaluate the stability of the solvent-oil blend. As a result, optimal solvents that improve the solvent-oil blend properties while displaying low environmental impact were identified.
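The idea of signatures of different heights can be illustrated with a simplified atomic-signature routine: the height-h signature of an atom is a canonical string describing the atoms within h bonds, built as a tree rooted at that atom, and a molecular signature counts the atomic signatures. This sketch ignores bond orders, hydrogens, and special ring-closure handling, so it is a simplification of the molecular signature descriptor rather than a full implementation; the ethanol graph is just an example.

```python
from collections import Counter


def atom_signature(graph, labels, atom, height, parent=None):
    """Canonical string for the tree of depth `height` rooted at `atom` (no step back to parent)."""
    if height == 0:
        return labels[atom]
    children = sorted(
        atom_signature(graph, labels, nb, height - 1, atom)
        for nb in graph[atom]
        if nb != parent
    )
    if not children:
        return labels[atom]
    return labels[atom] + "(" + ",".join(children) + ")"


def molecular_signature(graph, labels, height):
    """Multiset (Counter) of atomic signatures of the given height."""
    return Counter(atom_signature(graph, labels, atom, height) for atom in graph)


# Heavy-atom graph of ethanol, C0-C1-O2 (hydrogens suppressed).
graph = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "O"}

for h in range(3):
    print(h, dict(molecular_signature(graph, labels, h)))
```

Even for three atoms, each extra height produces longer and more specific strings; for larger molecules the number of distinct higher-order signatures grows combinatorially, which is the complexity the multi-stage framework is designed to contain.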
A molecular vector-type descriptor containing 6 variables is used to describe the structure of aromatic hydrocarbons (AHs) and is related to the normal boiling points (bp) of AHs. The correlation coefficient (R) between the estimated and experimental bp is 0.9988, and the root mean square error (RMS) is 7.907 °C for 66 AHs. The RMS obtained by cross-validation is 9.131 °C, which implies that the model has good predictive ability.
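A cross-validated RMS that is somewhat larger than the fitted RMS is what one expects when each compound is predicted by a model that never saw it. A minimal sketch, assuming leave-one-out cross-validation and a single hypothetical descriptor (the study's descriptor has 6 variables, and the abstract does not specify the cross-validation scheme):

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rms(errors):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fitted_rms(xs, ys):
    """RMS of residuals when the model is fitted on all points."""
    a, b = fit_line(xs, ys)
    return rms([y - (a + b * x) for x, y in zip(xs, ys)])


def loo_rms(xs, ys):
    """Leave-one-out: refit without point i, then predict point i."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (a + b * xs[i]))
    return rms(errors)


# Hypothetical descriptor values and boiling points (°C).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [80.5, 110.2, 140.9, 168.3, 200.1, 229.4]
print(f"fitted RMS = {fitted_rms(xs, ys):.3f} °C, LOO RMS = {loo_rms(xs, ys):.3f} °C")
```

For ordinary least squares, each leave-one-out residual equals the fitted residual divided by (1 − leverage), so the LOO RMS can never be smaller than the fitted RMS.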
Objective To develop a model based on a graph convolutional network (GCN) to achieve efficient classification of the cold and hot medicinal properties of Chinese herbal medicines (CHMs). Methods After screening the dataset provided in the published literature, this study included 495 CHMs and their 8,075 compounds. Three molecular descriptors were used to represent the compounds: the molecular access system (MACCS), the extended connectivity fingerprint (ECFP), and two-dimensional (2D) molecular descriptors computed by the RDKit open-source toolkit (RDKit_2D). A homogeneous graph with CHMs as nodes was constructed, and a classification model for the cold and hot medicinal properties of CHMs was developed based on a GCN, using the molecular descriptor information of the compounds as node features. Finally, using accuracy and F1 score to evaluate model performance, the GCN model was experimentally compared with traditional machine learning approaches, including decision tree (DT), random forest (RF), k-nearest neighbor (KNN), Naïve Bayes classifier (NBC), and support vector machine (SVM). MACCS, ECFP, and RDKit_2D molecular descriptors were also adopted as features for comparison. Results The experimental results show that the GCN achieved better performance than the traditional machine learning approaches when using MACCS as features, with the accuracy and F1 score reaching 0.8364 and 0.8453, respectively. The accuracy and F1 score have increased by 0.8690 and 0.8120, respectively, compared with the lowest-performing feature combination OMER (only the combination of MACCS, ECFP, and RDKit_2D). The accuracy and F1 score of DT, RF, KNN, NBC, and SVM are 0.5051 and 0.5018, 0.6162 and 0.6015, 0.6768 and 0.6243, 0.6162 and 0.6071, and 0.6364 and 0.6225, respectively. Conclusion In this study, by introducing molecular descriptors as features, it is verified that molecular descriptors and fingerprints play a key role in classifying the cold and hot medicinal properties of CHMs. Meanwhile, excellent classification performance was achieved using the GCN model, providing an important algorithmic basis for the in-depth study of the "structure-property" relationship of CHMs.
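The GCN at the core of the method propagates node features over the graph. Below is a minimal pure-Python sketch of one layer in the widely used form H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I. The mean-pooling of compound fingerprints into a CHM node feature, the tiny graph, and the weights are hypothetical; the abstract does not specify how compound descriptors are aggregated per herb or how edges are defined.

```python
import math


def mean_pool(vectors):
    """Average compound-level feature vectors into one node (herb) feature vector (assumed aggregation)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, with D the degree matrix of A + I."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer followed by ReLU."""
    z = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical: 3 herbs; each herb's feature = mean of its compounds' 2-bit fingerprints.
herb_compounds = [
    [[1, 0], [1, 1]],
    [[0, 1]],
    [[1, 1], [0, 1], [1, 0]],
]
features = [mean_pool(c) for c in herb_compounds]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
weights = [[0.5, -1.0], [1.0, 0.5]]
print(gcn_layer(adjacency, features, weights))
```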
Combining high mobility and high-efficiency luminescence in one material is challenging because of their contradictory design principles. Here, under the three-state exciton model, a molecular descriptor O = (|t_(h) + t_(e)| − |t_(h) − t_(e)|)/(2J) is proposed to quantitatively design materials with balanced luminescence and mobility in aggregated states, where a large O would promise a high crystalline photoluminescence quantum yield (PLQY) with small J (excitonic coupling), and significant t_(h) and t_(e) (hole and electron transfer integrals) would indicate high mobility. Through theoretical calculation and experimental validation, it is found that asymmetric anthracene derivatives are quite effective in simultaneously achieving high mobility and high PLQY. Following the asymmetric guideline, the newly developed compounds 2-phenyl vinyl anthracene (2-PhVA) and 6-(2-(anthracene-2-yl)vinyl)benzo[b]thiophene (6-BTVA) demonstrate high O values alongside excellent performance: 2-PhVA exhibits a PLQY of 81.5% and a maximum hole mobility of 10.0 cm^(2) V^(−1) s^(−1), and 6-BTVA shows a PLQY of 30.9% with a maximum mobility of 9.3 cm^(2) V^(−1) s^(−1). These results validate the descriptor and the asymmetric strategy for further developing high-mobility light-emitting aggregated materials.
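The descriptor is easy to evaluate and has a simple interpretation: for any real t_(h) and t_(e), (|t_(h) + t_(e)| − |t_(h) − t_(e)|)/2 equals min(|t_(h)|, |t_(e)|) when the two transfer integrals share a sign (and its negative otherwise), so O rewards large and balanced hole and electron transfer integrals relative to J. The sketch assumes J > 0 and uses hypothetical values in meV.

```python
def descriptor_o(t_h, t_e, j):
    """O = (|t_h + t_e| - |t_h - t_e|) / (2J); assumes J > 0 (excitonic coupling)."""
    if j <= 0:
        raise ValueError("this sketch assumes a positive excitonic coupling J")
    return (abs(t_h + t_e) - abs(t_h - t_e)) / (2.0 * j)


def descriptor_o_min_form(t_h, t_e, j):
    """Equivalent form: sign(t_h * t_e) * min(|t_h|, |t_e|) / J."""
    if t_h == 0 or t_e == 0:
        return 0.0
    sign = 1.0 if t_h * t_e > 0 else -1.0
    return sign * min(abs(t_h), abs(t_e)) / j


# Hypothetical transfer integrals and coupling, all in meV.
print(descriptor_o(30.0, 50.0, 10.0))  # sizeable transfer, weak coupling
print(descriptor_o(30.0, 50.0, 60.0))  # same transfer, stronger coupling: smaller O
```

The min form makes the design logic explicit: raising only one of t_(h) or t_(e) does not increase O, because the smaller of the two sets its value.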
The photovoltaic performance of organic solar cells (OSCs) is significantly determined by the electron donor and acceptor materials in the active layers. Traditional trial-and-error experiments for exploring high-performance materials suffer from long development cycles, high experimental costs, and low screening efficiency. Herein, the established database includes 547 donor-acceptor pairs, integrating photovoltaic parameters and molecular representations. The 30 molecular structure descriptors most closely related to power conversion efficiency (PCE) were extracted. Long short-term memory networks (LSTM), convolutional neural networks (CNN), and symbolic regression (SR) were trained to predict the PCE of OSCs. After hyperparameter optimization via a grid search algorithm, the metrics indicate that the trained models achieved high-precision PCE prediction, with the LSTM model outperforming the other models. Dual validation by SHapley Additive exPlanations (SHAP) interpretability analysis and SR formulas revealed that the number of structural units with two or more rings in acceptor molecules showed a significant correlation with PCE. Based on the dataset constructed using a molecular fragment recombination strategy, the developed LSTM generative model successfully generated 210,660 novel donor molecules and 878,268 acceptor molecules. After screening 185,015,936,880 donor-acceptor pairs with the LSTM prediction model, 5,753 donor-acceptor pairs with predicted PCE exceeding 18.50% were identified, among which the highest predicted PCE reached 18.66%. This approach provides theoretical guidance for the discovery of organic photovoltaic materials and may accelerate the development of high-performance OSCs; it can also be generalized to functional molecular design.
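Screening every donor-acceptor combination is a Cartesian product, which is why the pair count equals the product of the two library sizes (210,660 × 878,268 = 185,015,936,880). The sketch below streams through such a product with a threshold and keeps only the top-scoring pairs. `toy_predict_pce` is a hypothetical placeholder for the trained LSTM predictor, and the molecule identifiers are made up.

```python
import heapq
from itertools import product


def screen_pairs(donors, acceptors, predict_pce, threshold, top_k=5):
    """Count pairs whose predicted PCE exceeds `threshold` and keep the `top_k` best."""
    hits = 0
    best = []  # min-heap of (pce, donor, acceptor); the weakest kept pair sits at best[0]
    for donor, acceptor in product(donors, acceptors):
        pce = predict_pce(donor, acceptor)
        if pce > threshold:
            hits += 1
            if len(best) < top_k:
                heapq.heappush(best, (pce, donor, acceptor))
            else:
                heapq.heappushpop(best, (pce, donor, acceptor))
    return hits, sorted(best, reverse=True)


def toy_predict_pce(donor, acceptor):
    """Hypothetical stand-in for a trained model: PCE (%) from string lengths."""
    return 10.0 + 0.5 * len(donor) + 0.3 * len(acceptor)


hits, best = screen_pairs(["D1", "D22", "D333"], ["A1", "A22"], toy_predict_pce,
                          threshold=12.0, top_k=2)
print(hits, best)
print(210_660 * 878_268)  # number of pairs screened in the study
```

Because the heap never holds more than `top_k` entries, memory stays constant no matter how many pairs are scored; the run time, however, scales with the full product, so batching and parallelizing the predictor matter at this scale.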
With the growing emphasis on sustainable development, the demand for environmentally friendly solvents in green chemical processes and carbon dioxide capture is increasing. Ionic liquids (ILs), as promising green solvents, offer significant potential but face considerable challenges, particularly in solvent selection. To overcome the limitations of traditional screening methods, machine learning (ML) techniques have recently been applied, offering a more efficient, data-driven approach. This review provides an overview of key ML methods used in solvent screening and compares them with traditional experimental and theoretical techniques. It examines the role of descriptor selection in structure-property-based methods, such as quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR), which are critical for predicting IL properties. The review also explores the application of these methods to screening IL properties, including toxicity, viscosity, density, and CO_(2) solubility. Additionally, it discusses challenges in selecting appropriate models based on data scale and task complexity, integrating physical information for model interpretability, and achieving multi-objective optimization to balance key properties in IL design. Finally, it summarizes the achievements, limitations, and prospects of ML applications in IL research, offering insights into how these methods can advance the development of sustainable ILs.
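Multi-objective IL design is often framed as finding the Pareto front: the candidates for which no other candidate is at least as good on every property and strictly better on at least one. A minimal sketch, assuming lower toxicity, lower viscosity, and higher CO_(2) solubility are all desirable (solubility is negated so that every objective is minimized); the IL names and property values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector `a` is no worse than `b` everywhere and better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates not dominated by any other candidate."""
    return sorted(
        name for name, obj in candidates.items()
        if not any(dominates(other, obj) for o_name, other in candidates.items() if o_name != name)
    )


def objectives(toxicity, viscosity_mpa_s, co2_solubility):
    """All objectives to minimize: CO2 solubility is negated."""
    return (toxicity, viscosity_mpa_s, -co2_solubility)


candidates = {
    "IL-1": objectives(1.0, 50.0, 0.30),
    "IL-2": objectives(2.0, 60.0, 0.20),  # worse than IL-1 on every property
    "IL-3": objectives(0.5, 80.0, 0.40),  # trade-off: less toxic, more viscous
}
print(pareto_front(candidates))
```

The front returns trade-offs rather than a single winner; choosing among IL-1 and IL-3 still requires weighting the properties for a specific application.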
Ionic liquids (ILs) have garnered significant interest owing to their distinct physicochemical traits. Nonetheless, their extensive application is curtailed by ecotoxicity concerns. This study aimed to develop a quantitative structure-activity relationship (QSAR) model for predicting the toxicity of ILs to biological cells. Toxicity data of ILs on the leukemia rat cell line IPC-81, Escherichia coli (E. coli), and acetylcholinesterase (AChE) were collected from open-source databases, and two ensemble models, random forest (RF) and gradient boosted decision tree (GBDT), were trained on the data. The molecular structures of the ILs were represented by three different methods: molecular descriptors (MD), molecular fingerprints (MF), and molecular identifiers (MI). The Tanimoto similarity coefficients indicate that MD has a stronger ability to recognize structural similarity. Statistical metrics of model performance showed that the two models with MD as the input feature (MD-RF and MD-GBDT) performed better on all three datasets. The SHapley Additive exPlanations (SHAP) method was applied to explain the importance of different features. Specifically, reducing the carbon chain length and the number of fluorine atoms in IL structures can effectively reduce their toxic effects on biological cells.
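The Tanimoto coefficient used to compare representations has a binary form for fingerprints (shared on-bits over all distinct on-bits) and a continuous form that applies to real-valued descriptor vectors, T(a, b) = a·b / (|a|² + |b|² − a·b). A minimal sketch with hypothetical inputs; the abstract does not state which form was used for each representation, and returning 1.0 for two empty inputs is a convention chosen here.

```python
def tanimoto_bits(on_bits_a, on_bits_b):
    """Binary Tanimoto: |A ∩ B| / |A ∪ B| for sets of on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tanimoto_continuous(vec_a, vec_b):
    """Continuous Tanimoto for real-valued descriptor vectors."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    denom = sum(x * x for x in vec_a) + sum(y * y for y in vec_b) - dot
    return dot / denom if denom else 1.0


print(tanimoto_bits({1, 2, 3}, {2, 3, 4}))
print(tanimoto_continuous([1.0, 1.0], [1.0, 0.0]))
```

For 0/1 vectors the continuous form reduces exactly to the binary one, which is why it is a natural choice when comparing fingerprint and descriptor representations on a common scale.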
Bond dissociation energy (BDE), the enthalpy change for the homolysis of a specific covalent bond, is one of the basic thermodynamic properties of molecules. It is very important for understanding chemical reactivity, chemical properties, and chemical transformations. Here, a comprehensive machine learning-based BDE prediction model was established based on the iBonD experimental BDE dataset and the calculated BDE dataset of St. John et al. Differential Structural and PhysicOChemical (D-SPOC) descriptors, which reflect changes in molecules' structural and physicochemical features during bond homolysis, were designed as input features.
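The BDE definition translates directly into an enthalpy balance, BDE(R–X) = ΔfH(R•) + ΔfH(X•) − ΔfH(R–X), and a difference-type descriptor can be formed analogously by comparing the fragments' features with the parent's. The sketch is illustrative only: the enthalpy values and feature names are hypothetical, and `difference_descriptor` is an assumed simplification, not the published D-SPOC definition.

```python
def bond_dissociation_enthalpy(dfh_radical_1, dfh_radical_2, dfh_parent):
    """BDE of R-X from standard enthalpies of formation (same units, e.g. kJ/mol)."""
    return dfh_radical_1 + dfh_radical_2 - dfh_parent


def difference_descriptor(parent, fragment_1, fragment_2):
    """Assumed difference features: (fragment_1 + fragment_2) - parent, key by key."""
    return {
        key: fragment_1.get(key, 0.0) + fragment_2.get(key, 0.0) - parent[key]
        for key in parent
    }


# Hypothetical enthalpies of formation (kJ/mol).
print(bond_dissociation_enthalpy(120.0, 200.0, -80.0))

# Hypothetical per-species features for a parent molecule and its two radicals.
parent = {"heavy_atoms": 3.0, "polarizability": 5.0}
frag_1 = {"heavy_atoms": 2.0, "polarizability": 3.2}
frag_2 = {"heavy_atoms": 1.0, "polarizability": 2.1}
print(difference_descriptor(parent, frag_1, frag_2))
```

Conserved quantities such as the heavy-atom count cancel to zero, so only features that genuinely change on homolysis carry signal into the model.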
Chemical substances are essential in all aspects of human life, and understanding their properties is crucial for developing chemical systems. The properties of chemical species can be accurately obtained by experiments or ab initio calculations; however, these are time-consuming and costly. In this work, machine learning (ML) models for estimating the entropy, S, and constant-pressure heat capacity, Cp, at 298.15 K are developed for alkanes, alkenes, and alkynes. The training data for entropy and heat capacity are collected from the literature. Molecular descriptors generated using the alvaDesc software are used as input features for the ML models. Support vector regression (SVR), ν-support vector regression (ν-SVR), and random forest regression (RFR) algorithms were trained with K-fold cross-validation on two levels: the first level assessed the models' performance, and the second level generated the final models. Among the three ML models, SVR showed the best performance on the test dataset. The SVR model was then compared against Benson's traditional group additivity method to illustrate the advantages of the ML model. Finally, a sensitivity analysis was performed to identify the most critical descriptors for the property estimations.
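Two-level K-fold cross-validation is commonly organized as nested index splits: an outer loop holds out folds for assessing performance, and an inner loop splits each outer training set again (for example, for hyperparameter selection). This sketch follows that common nested reading of "two levels"; the abstract frames the levels somewhat differently (performance assessment versus final-model generation), and the fold counts and seed are arbitrary.

```python
import random


def kfold_indices(indices, k, seed=0):
    """Yield (train, test) index lists for k folds over a shuffled copy of `indices`."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def nested_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_train, outer_test, inner_splits), where inner_splits partition outer_train."""
    for outer_train, outer_test in kfold_indices(range(n_samples), outer_k, seed):
        inner_splits = list(kfold_indices(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner_splits


for outer_train, outer_test, inner in nested_splits(10, outer_k=5, inner_k=3):
    print(sorted(outer_test), [len(test) for _, test in inner])
```

Keeping the outer test fold out of every inner split is the point of the design: model selection never sees the data used to report performance.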
The quantitative structure-property relationship (QSPR) method is used to study correlation models between the structures of a diverse set of organic compounds and their log P. Molecular descriptors calculated from structure alone are used to describe the molecular structures. A subset of the calculated descriptors, selected using forward stepwise regression, is used to develop the QSPR models. Multiple linear regression (MLR) and radial basis function neural networks (RBFNNs) are used to construct the linear and nonlinear correlation models, respectively. The optimal QSPR model is based on a 7-17-1 RBFNN architecture using seven calculated molecular descriptors. The root mean square errors of prediction for the training, prediction, and overall data sets are 0.284, 0.327, and 0.291 log P units, respectively.
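A radial basis function network of the 7-17-1 form maps 7 descriptors through 17 Gaussian hidden units to one linear output. Below is a minimal forward-pass sketch, assuming Gaussian units exp(−‖x − c‖²/(2σ²)) and a linear output layer with a bias; the centers, widths, and weights are random or hypothetical rather than trained, so the outputs are not log P predictions.

```python
import math
import random


def rbf_predict(x, centers, widths, weights, bias):
    """Output = bias + sum_k w_k * exp(-||x - c_k||^2 / (2 * sigma_k^2))."""
    total = bias
    for center, sigma, w in zip(centers, widths, weights):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += w * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return total


# Build an untrained 7-17-1 network with random parameters (illustration only).
rng = random.Random(42)
n_inputs, n_hidden = 7, 17
centers = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
widths = [1.0] * n_hidden
weights = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
descriptors = [0.1, -0.3, 0.5, 0.0, 0.2, -0.1, 0.4]  # hypothetical scaled descriptors
print(rbf_predict(descriptors, centers, widths, weights, bias=1.0))
```

In a typical training scheme, the centers are chosen by clustering the training descriptors and the output weights are then fitted by linear least squares; the abstract does not say how the authors trained their network.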
Separating monomeric cycloalkanes from naphtha obtained by direct coal liquefaction not only facilitates the valuable utilization of naphtha but also has the potential to meet the demand of China's domestic chemical feedstock market for these compounds. In the extractive distillation of naphtha, relative volatility is a crucial parameter for extractant selection. However, determining relative volatility through conventional vapor-liquid equilibrium experiments for extractant selection is challenging because of the complex compound composition of naphtha. To address this challenge, a model for predicting the relative volatility of n-heptane/methylcyclohexane in various extractants has been developed using machine-learning quantitative structure-property relationship methods. The model enables rapid and cost-effective extractant selection. Statistical analysis of the model revealed favorable performance indicators, including a coefficient of determination of 0.88, a cross-validation coefficient of 0.94, and a root mean square error of 0.02. Factors such as α, E_HOMO, ρ, and log P(octanol/water) collectively influence relative volatility. Analysis of the standardized coefficients in the multivariate linear regression equation identified density as the primary factor affecting the relative volatility of n-heptane/methylcyclohexane in the different extractants. Extractants with higher densities and no branched chains exhibited higher relative volatility than their branched counterparts. Subsequently, the separation of cycloalkane monomers from direct coal liquefaction products via extractive distillation was optimized using Aspen Plus software, achieving purities exceeding 0.99 and yields exceeding 0.90 for the cyclohexane and methylcyclohexane monomers. Economic, energy consumption, and environmental assessments were conducted. Salicylic acid emerged as the most suitable extractant for purifying cycloalkanes in direct coal liquefaction naphtha owing to its superior separation effectiveness, cost efficiency, and environmental benefits. The tower parameters of the simulated separation unit provide valuable insights for the design of actual industrial equipment.
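Two quantities from this abstract can be made concrete. The relative volatility of component 1 (here n-heptane) to component 2 (methylcyclohexane) is α₁₂ = (y₁/x₁)/(y₂/x₂), computed from vapor (y) and liquid (x) mole fractions; this α₁₂ is not the descriptor α listed among the model factors. A standardized regression coefficient, β* = β·s_x/s_y, puts descriptors with different units on a common scale, which is how a dominant factor such as density can be identified. The equilibrium point and data below are hypothetical.

```python
from statistics import stdev


def relative_volatility(y1, x1, y2, x2):
    """alpha_12 = (y1 / x1) / (y2 / x2) from vapor (y) and liquid (x) mole fractions."""
    return (y1 / x1) / (y2 / x2)


def standardized_coefficient(beta, x_values, y_values):
    """beta* = beta * s_x / s_y, using sample standard deviations."""
    return beta * stdev(x_values) / stdev(y_values)


# Hypothetical equilibrium point for n-heptane (1) / methylcyclohexane (2) with an extractant.
print(relative_volatility(y1=0.60, x1=0.50, y2=0.40, x2=0.50))

# Hypothetical extractant densities (g/cm^3), relative volatilities, and fitted slope beta.
densities = [0.9, 1.0, 1.1, 1.2]
volatilities = [1.10, 1.18, 1.26, 1.34]
print(standardized_coefficient(0.8, densities, volatilities))
```

Because only ratios of mole fractions enter α₁₂, it is unchanged whether compositions are expressed on a solvent-free basis or including the extractant.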
Funding (Fenton oxidation study): supported by the National Natural Science Funds of China (No. NSFC21177083) and the Shanghai Municipal Commission of Economy and Informatization Project (No. CXY-2013-52).
Funding: support from the National Natural Science Foundation of China (Grant Nos. 22325305 and 22273105), the Strategic Priority Research Program of Sciences (XDB0520103), the National Key R&D Program of China (2024YFB3614300), and the Fundamental Research Funds for the Central Universities (Grant Nos. E2E40307X2 and E2ET0309X2). We gratefully acknowledge the WQ&UCAS Research Academy Intelligent Computing Center (WRA-ICC) for providing computation facilities.
Abstract: The discovery of fluorescent materials with an inverted singlet-triplet (IST) energy gap, in which the lowest singlet excited state (S1) lies below the lowest triplet excited state (T1), marks a transformative advance in organic light-emitting diode (OLED) technology. However, designing IST emitters is highly challenging: their IST energy gap arises from double electron excitations and can be described accurately only by time-consuming post-Hartree-Fock (HF) methods, which limits the speed of large-scale high-throughput screening. Here, we develop a four-orbital model that elucidates in detail the role of double excitations in IST formation, and we establish two molecular descriptors (K_S and O_D) based on exchange integrals and molecular orbital energies. Using these descriptors, we rapidly identify 41 IST candidates among 3,486 molecules. The descriptor-aided approach achieves a screening success rate of 90% and reduces computational cost 13-fold compared with full post-HF calculations. Importantly, we predict a series of excellent non-traditional near-infrared IST emitters from a dataset of 1,028 compounds, with emission wavelengths ranging from 852.2 to 1002.3 nm, which opens new avenues for designing highly efficient near-infrared OLED materials.
Funding: supported by the Natural Science Foundation of China (Grant No.: 21505009), the Natural Science Foundation of Chongqing, China (Grant No.: 2023NSCQ-MSX0140), and the Open Project of Central Nervous System Drug Key Laboratory of Sichuan Province, China (Grant No.: 230012-01SZ).
Abstract: Computational approaches, encompassing both physics-based and machine learning (ML) methodologies, have gained substantial traction in drug repurposing efforts targeting specific therapeutic entities. The human dopamine (DA) transporter (hDAT) is the primary therapeutic target of numerous psychiatric medications. However, traditional hDAT-targeting drugs, which interact with the primary binding site, face significant limitations, including addictive potential and stimulant effects. In this study, we propose an integrated workflow that combines virtual screening based on weighted holistic atom localization and entity shape (WHALES) descriptors with in vitro experimental validation to repurpose existing drugs as novel hDAT-targeting agents. First, WHALES descriptors were used for a similarity search, with four benztropine-like atypical inhibitors known to bind the allosteric site of hDAT as templates. From a compound library of 4,921 marketed and clinically tested drugs, we identified 27 candidate atypical inhibitors. ADMETlab was then used to predict the pharmacokinetic and toxicological properties of these candidates, and induced-fit docking (IFD) was performed to estimate their binding affinities. Six compounds were selected for in vitro assessment of neurotransmitter reuptake inhibitory activity. Of these, three exhibited significant inhibitory potency, with half-maximal inhibitory concentration (IC50) values of 0.753 μM, 0.542 μM, and 1.210 μM, respectively. Finally, molecular dynamics (MD) simulations and end-point binding free energy analyses were conducted to elucidate and confirm the inhibitory mechanisms of the repurposed drugs against hDAT in its inward-open conformation. In conclusion, our study not only identifies promising active compounds as potential atypical inhibitors for developing novel hDAT-targeting therapeutics but also validates the effectiveness of our integrated computational and experimental workflow for drug repurposing.
Abstract: Structure-based virtual screening (molecular docking) is now one of the most pragmatic techniques for leveraging target structures in ligand discovery, and accurate binding pose prediction is critical to molecular docking. Here, we describe a general strategy to improve the accuracy of docking pose prediction by combining structural descriptor-based filtering with KGS-penalty function-based conformational clustering in an unbiased manner. We assessed our method against 150 high-quality protein–ligand complex structures. Surprisingly, these simple components are sufficient to improve the accuracy of docking pose prediction: the success rate of predicting a near-native docking pose increased from 53% of the targets to 78%. We expect that our strategy may be broadly useful for improving currently available molecular docking programs.
Abstract: Using density functional theory, the noncovalent interactions and two mechanisms of covalent functionalization of the drug carmustine with functionalized carbon nanotubes (CNTs) were investigated. Quantum molecular descriptors of the noncovalent configurations were studied, and the results indicate that binding of carmustine to functionalized CNTs is thermodynamically favorable. NTCOOH and NTCOCl can bond to the NH group of carmustine through their OH (COOH mechanism) and Cl (COCl mechanism) groups, respectively. The activation energies, activation enthalpies and activation Gibbs free energies of the two pathways were calculated and compared. The activation parameters of the COOH mechanism are higher than those of the COCl mechanism; therefore, the COCl mechanism is more suitable for covalent functionalization. COOH-functionalized CNT (NTCOOH) has a higher binding energy than COCl-functionalized CNT (NTCOCl) and can act as a favorable noncovalent system for carmustine delivery within biological and chemical systems. These results could be generalized to other similar drugs.
Abstract: A new quantitative structure-retention relationship (QSRR) method is reported for predicting the gas chromatography (GC) relative retention times (RRTs) of chlorinated phenols (CPs) on a DB-5 column. Chemical descriptors were calculated from the molecular structures of the CPs and related to their GC RRTs by multiple linear regression analysis. The proposed model had a squared multiple correlation coefficient R² = 0.970, a standard error SE = 0.0472, and a significance level P = 0.0000. The QSRR model also reveals that the GC RRTs of CPs are associated with physicochemical interactions with the stationary phase and are influenced by the numbers of chlorine and oxygen atoms in the CP molecules.
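A multiple linear regression of this kind, together with the two statistics quoted above (R² and the standard error SE), can be computed as follows. This is a sketch assuming NumPy; the descriptor choice (numbers of chlorine and oxygen atoms) mirrors the abstract's conclusion, but the values are invented for illustration.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept.

    Returns (coefficients, R^2, SE), where SE = sqrt(SSE / (n - p - 1))
    is the standard error of the estimate for n samples and p descriptors.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sse = float(residuals @ residuals)
    sst = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - sse / sst, (sse / (n - p - 1)) ** 0.5


# Hypothetical descriptors: [number of Cl atoms, number of O atoms].
X = [[1, 1], [2, 1], [3, 1], [4, 1], [5, 1], [1, 2], [2, 2]]
# Made-up RRTs generated from an exact linear rule, so the fit is perfect.
y = [0.5 + 0.2 * n_cl + 0.1 * n_o for n_cl, n_o in X]
coef, r2, se = fit_mlr(X, y)
```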
Funding: The authors acknowledge the National Natural Science Foundation of China (No. 22278443), the CAMS Innovation Fund for Medical Sciences (No. 2022-I2M-1-015), the Key R&D Program of Shan Dong Province (No. 2019JZZY020909), and the Xinjiang Uygur Autonomous Region Innovation Environment Construction Special Fund and Technology Innovation Base Construction Key Laboratory Open Project (No. 2022D04016) for financial support.
Abstract: Co-crystal formation can improve the physicochemical properties of a compound and thus enhance its druggability. Therefore, artificial intelligence-based co-crystal virtual screening in the early stage of drug development has attracted extensive attention from researchers. However, the complexity of developing and applying such algorithms hinders their wide application. This study presents a data-driven co-crystal prediction method based on an XGBoost machine learning model used through its scikit-learn-compatible interface. The simplified molecular input line entry specification (SMILES) strings of two compounds are simply input to determine whether they can form a co-crystal. The data set includes co-crystal records from the Cambridge Structural Database (CSD) and records of no co-crystal formation from the extant literature and experiments. RDKit molecular descriptors are adopted as the features of each compound in the data set. The developed model shows excellent performance on the proposed co-crystal training and validation sets, with high accuracy, sensitivity, and F1 score, and its prediction success rate exceeds 90%. The model therefore provides a simple and feasible scheme for efficiently and accurately designing and screening co-crystal drugs.
Funding: Project supported by the Department of Science and Technology, Govt. of India, through the award of a Young Scientist Fellowship (SR/FT/LS-161/2008).
Abstract: Leukotriene A4 (LTA4) plays an important role as the precursor of the slow-reacting substances LTC4, LTD4, and LTE4, and the enzyme LTA4 hydrolase is an attractive target for molecular modeling and QSAR studies. Our effort is mainly focused on exploring the structure-activity relationship (SAR) of LTA4 hydrolase inhibitors through docking, pharmacophore modeling and molecular descriptor studies. The binding of these small molecules to LTA4 hydrolase was described by models developed on 2D molecular descriptors with good predictive power (39 compounds, 6 descriptors, r² = 0.98, SEE = 0.167, F = 268.53, q² = 0.90, r²_adj = 0.97, P < 0.0001, SD of residuals = 0.15). Docking studies were employed to predict the probable binding conformations of these analogues and to explore the SAR of the compounds. The novel pharmacophore represents the ligand features involved in interactions with the target protein, as well as the space around the ligand occupied by the protein. These efforts aim to elucidate the SAR of LTA4 hydrolase inhibitors through QSAR, docking and pharmacophore techniques.
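The adjusted r² and F-value follow from r², the number of compounds n and the number of descriptors p through standard regression formulas, which offers a quick consistency check on the statistics above. This is a sketch; it assumes the usual definitions r²_adj = 1 − (1 − r²)(n − 1)/(n − p − 1) and F = (r²/p)/((1 − r²)/(n − p − 1)), which the source does not state explicitly.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination for n samples, p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F-value of a multiple linear regression."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values quoted in the abstract: 39 compounds, 6 descriptors, r^2 = 0.98.
r2_adj = adjusted_r2(0.98, 39, 6)
f_value = f_statistic(0.98, 39, 6)
```

With r² rounded to 0.98 these give r²_adj ≈ 0.976 and F ≈ 261, close to the reported 0.97 and 268.53; the gap is plausibly due to r² having been rounded to two decimals.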
Funding: The authors would like to express sincere gratitude to the Ministry of Higher Education Malaysia for supporting this research project under Grant FRGS/1/2019/TK02/UNIM/02/1. However, only the authors are responsible for the opinions expressed in this paper and for any remaining errors.
Abstract: Direct application of bio-oil from fast pyrolysis as a fuel remains a challenge because of undesirable attributes such as low heating value, high viscosity, high corrosiveness and storage instability. Solvent addition is a simple way of circumventing these disadvantages to allow further processing and storage. In this work, computer-aided molecular design tools were developed to design optimal solvents that upgrade bio-oil while having low environmental impact. First, the target solvent requirements were translated into measurable physical properties. Because different property prediction models contain different levels of structural information, the molecular signature descriptor was used as a common platform for formulating the design problem, with signatures of different heights needed for different models. Owing to the combinatorial nature of higher-order signatures, the complexity of a computer-aided molecular design problem increases with signature height. A multi-stage framework was therefore developed, with consistency rules that restrict the number of higher-order signatures. Finally, phase stability analysis was conducted to evaluate the stability of the solvent-oil blend. As a result, optimal solvents that improve the properties of the solvent-oil blend while having low environmental impact were identified.
Abstract: A molecular vector-type descriptor containing 6 variables is used to describe the structures of aromatic hydrocarbons (AHs) and relate them to their normal boiling points (bp). For 66 AHs, the correlation coefficient (R) between the estimated and experimental bp is 0.9988 and the root mean square error (RMS) is 7.907 °C. The RMS obtained by cross-validation is 9.131 °C, indicating that the model has good predictive ability.
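The cross-validated RMS can be illustrated with leave-one-out cross-validation of a linear model on a six-component descriptor vector. The abstract does not specify the regression model or the cross-validation scheme, so both the linear least-squares fit and the leave-one-out choice here are assumptions, and the data are random placeholders.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test):
    """Fit y = b0 + X.b by least squares and predict for X_test."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def rms(y_true, y_pred):
    """Root mean square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def loo_rms(X, y):
    """RMS error when each compound is predicted by a model fitted without it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    return rms(y, preds)


# Placeholder data: 30 hydrocarbons x 6 descriptor components, bp in degrees C.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
bp = 200 + X @ np.array([30.0, -12.0, 8.0, 5.0, 0.0, 2.0]) + rng.normal(scale=5.0, size=30)
fitted_rms = rms(bp, fit_predict(X, bp, X))
cv_rms = loo_rms(X, bp)
```

As in the abstract (7.907 °C fitted vs 9.131 °C cross-validated), the cross-validated RMS exceeds the fitted RMS, because each prediction comes from a model that never saw that compound.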
Funding: Hunan Provincial Natural Science Foundation (2022JJ30438), Natural Science Foundation of Changsha (kq2202260), and Hunan Province Traditional Chinese Medicine Research Project (B2023039).
Abstract: Objective: To develop a model based on a graph convolutional network (GCN) for efficient classification of the cold and hot medicinal properties of Chinese herbal medicines (CHMs). Methods: After screening the dataset provided in the published literature, this study included 495 CHMs and their 8075 compounds. Three molecular descriptors were used to represent the compounds: the molecular access system (MACCS) keys, the extended connectivity fingerprint (ECFP), and two-dimensional (2D) molecular descriptors computed with the open-source RDKit toolkit (RDKit_2D). A homogeneous graph with CHMs as nodes was constructed, and a classification model for the cold and hot medicinal properties of CHMs was developed based on a GCN, using the molecular descriptor information of the compounds as node features. Finally, with accuracy and F1 score as performance metrics, the GCN model was experimentally compared with traditional machine learning approaches, including decision tree (DT), random forest (RF), k-nearest neighbor (KNN), naïve Bayes classifier (NBC), and support vector machine (SVM). MACCS, ECFP, and RDKit_2D molecular descriptors were also compared as features. Results: The GCN performed better than the traditional machine learning approaches when MACCS was used as the feature set, with accuracy and F1 score reaching 0.8364 and 0.8453, respectively. The accuracy and F1 score increased by 0.8690 and 0.8120, respectively, compared with the lowest-performing feature combination OMER (only the combination of MACCS, ECFP, and RDKit_2D). The accuracy and F1 score were 0.5051 and 0.5018 for DT, 0.6162 and 0.6015 for RF, 0.6768 and 0.6243 for KNN, 0.6162 and 0.6071 for NBC, and 0.6364 and 0.6225 for SVM. Conclusion: By introducing molecular descriptors as features, this study verifies that molecular descriptors and fingerprints play a key role in classifying the cold and hot medicinal properties of CHMs. Meanwhile, the GCN model achieved excellent classification performance, providing an important algorithmic basis for in-depth study of the “structure-property” relationship of CHMs.
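The abstract does not give the network's equations. A common choice is the graph convolution of Kipf and Welling, in which node features are averaged over a symmetrically normalized neighbourhood (with self-loops) before a linear transform. The NumPy sketch below shows a two-layer forward pass producing cold/hot class probabilities per herb; the adjacency matrix, feature size, weights, and the idea of using pooled compound descriptors as each herb's feature vector are all hypothetical, and training of the weights is not shown.

```python
import numpy as np


def normalize_adjacency(adjacency):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalized adjacency."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def gcn_forward(adjacency, features, w1, w2):
    """Two graph-convolution layers: ReLU hidden layer, softmax output."""
    a_norm = normalize_adjacency(adjacency)
    hidden = np.maximum(a_norm @ features @ w1, 0.0)
    return softmax(a_norm @ hidden @ w2)


# Hypothetical graph of 4 herbs (nodes) with undirected edges.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
features = rng.random((4, 8))      # e.g. pooled descriptors per herb (assumed)
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 2))      # two classes: cold, hot
probs = gcn_forward(A, features, w1, w2)
```

In practice `w1` and `w2` would be learned by gradient descent on herbs with known cold/hot labels.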
Funding: supported by the National Natural Science Foundation of China (T2225028, 22475219, 12204167, 22325305, T2350009, 52203210, and 22003030), the Chinese Academy of Sciences (Hundred Talents Plan, Youth Innovation Promotion Association, the Strategic Priority Research Program of Sciences [XDB0520200], and Young Scientists in Basic Research [YSBR-053]), and the Guangdong Provincial Natural Science Foundation (2024A1515011185).
Abstract: Combining high mobility and high-efficiency luminescence in one material is challenging because their design principles conflict. Here, under the three-state exciton model, a molecular descriptor $O = \frac{|t_h + t_e| - |t_h - t_e|}{2J}$ is proposed to quantitatively design materials with balanced luminescence and mobility in aggregated states: a large $O$, obtained with a small excitonic coupling $J$ and significant hole and electron transfer integrals ($t_h$ and $t_e$), promises both a high crystalline photoluminescence quantum yield (PLQY) (via small $J$) and high mobility (via large $t_h$ and $t_e$). Through theoretical calculation and experimental validation, asymmetric anthracene derivatives are found to be quite effective in simultaneously achieving high mobility and high PLQY. Following this asymmetric guideline, the newly developed compounds 2-phenyl vinyl anthracene (2-PhVA) and 6-(2-(anthracene-2-yl)vinyl)benzo[b]thiophene (6-BTVA) show high $O$ values together with excellent performance: 2-PhVA exhibits a PLQY of 81.5% and a maximum hole mobility of 10.0 cm² V⁻¹ s⁻¹, and 6-BTVA shows a PLQY of 30.9% with a maximum mobility of 9.3 cm² V⁻¹ s⁻¹. These results validate the descriptor and the asymmetric strategy for further development of high-mobility light-emitting aggregated materials.
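Taking the descriptor as O = (|t_h + t_e| − |t_h − t_e|)/(2J), a short algebraic check shows what it rewards: when t_h and t_e have the same sign, |t_h + t_e| − |t_h − t_e| = 2 min(|t_h|, |t_e|), so O = min(|t_h|, |t_e|)/J. A large O therefore requires both transfer integrals to be large and the excitonic coupling to be small. The Python sketch assumes that reading of the denominator and consistent energy units for all three quantities; J is used as given, since the abstract does not state its sign convention, and the numbers are illustrative.

```python
def descriptor_o(t_h, t_e, j):
    """O = (|t_h + t_e| - |t_h - t_e|) / (2 J).

    Equivalent to sign(t_h * t_e) * min(|t_h|, |t_e|) / J, so O is limited
    by the smaller of the two transfer integrals.
    """
    return (abs(t_h + t_e) - abs(t_h - t_e)) / (2 * j)


# Illustrative values in a common energy unit (e.g. meV).
balanced = descriptor_o(30.0, 50.0, 10.0)   # both carriers transfer well
unbalanced = descriptor_o(5.0, 80.0, 10.0)  # hole transfer is the bottleneck
```

Here `balanced` is 3.0 and `unbalanced` is 0.5: a very large t_e cannot compensate for a small t_h.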
Funding: supported by the National Natural Science Foundation of China (NNSFC) (Grant No. 12264025).
Abstract: The photovoltaic performance of organic solar cells (OSCs) is strongly determined by the electron donor and acceptor materials in the active layer. Traditional trial-and-error experiments for exploring high-performance materials suffer from long development cycles, high experimental costs, and low screening efficiency. Herein, a database of 547 donor-acceptor pairs was established that integrates photovoltaic parameters and molecular representations, and 30 molecular structure descriptors closely related to power conversion efficiency (PCE) were extracted. Long short-term memory (LSTM) networks, convolutional neural networks (CNN), and symbolic regression (SR) were trained to predict the PCE of OSCs. After hyperparameter optimization by grid search, the metrics indicate that the trained models achieve high precision in PCE prediction, with the LSTM model outperforming the others. Dual validation by SHapley Additive exPlanations (SHAP) interpretability analysis and SR formulas revealed that the number of structural units with two or more rings in acceptor molecules correlated significantly with PCE. Using a dataset constructed by a molecular fragment recombination strategy, the developed LSTM generative model generated 210,660 novel donor molecules and 878,268 acceptor molecules. After the LSTM prediction model screened 185,015,936,880 donor-acceptor pairs, 5753 pairs with predicted PCE exceeding 18.50% were identified, the highest predicted PCE being 18.66%. This approach provides theoretical guidance for the discovery of organic photovoltaic materials, may accelerate the development of high-performance OSCs, and can be generalized to functional molecular design.
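The size of the screening step follows directly from the counts above: every generated donor is paired with every generated acceptor. The sketch below checks that arithmetic and shows one way to filter predicted PCE values against the 18.50% cut-off without materializing all pairs; the `screen_pairs` helper and the example scores are hypothetical and stand in for the trained LSTM predictor.

```python
import heapq

N_DONORS = 210_660
N_ACCEPTORS = 878_268
N_PAIRS = N_DONORS * N_ACCEPTORS  # every donor combined with every acceptor


def screen_pairs(scored_pairs, threshold=18.50, top_k=None):
    """Keep (pair, predicted_pce) items above the threshold, best first.

    `scored_pairs` can be a generator, so the full cross product never has
    to be held in memory; only the hits are stored.
    """
    hits = ((pair, pce) for pair, pce in scored_pairs if pce > threshold)
    if top_k is None:
        return sorted(hits, key=lambda item: item[1], reverse=True)
    return heapq.nlargest(top_k, hits, key=lambda item: item[1])


# Hypothetical scores standing in for model predictions (PCE in %).
example = [(("D1", "A1"), 17.90), (("D2", "A1"), 18.66), (("D1", "A2"), 18.51)]
top = screen_pairs(example)
```

`N_PAIRS` evaluates to 185,015,936,880, matching the number of screened pairs reported above.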
Funding: supported by the “Carbon Upcycling Project for Platform Chemicals” (Grant Nos. 2022M3J3A1045999 and 2022M3J3A1039377) through the National Research Foundation (NRF) funded by the Ministry of Science and ICT, Republic of Korea; the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BZ2023051, BK20200694, and BK20240546); the Science and Technology Project of Changzhou, China (Grant No. CJ20241053); and the Jiangsu Specially-Appointed Professors Program, China.
Abstract: With the growing emphasis on sustainable development, the demand for environmentally friendly solvents in green chemical processes and carbon dioxide capture is increasing. Ionic liquids (ILs), as promising green solvents, offer significant potential but face considerable challenges, particularly in solvent selection. To overcome the limitations of traditional screening methods, machine learning (ML) techniques have recently been applied, offering a more efficient and data-driven approach. This review provides an overview of key ML methods used in solvent screening and compares them with traditional experimental and theoretical techniques. It examines the role of descriptor selection in structure-property-based methods, such as quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR), which are critical for predicting IL properties. The review also explores the application of these methods to screening IL properties, including toxicity, viscosity, density, and CO2 solubility. Additionally, it discusses the challenges of selecting appropriate models for a given data scale and task complexity, integrating physical information for model interpretability, and achieving multi-objective optimization to balance key properties in IL design. Finally, it summarizes the achievements, limitations, and prospects of ML applications in IL research, offering insights into how these methods can advance the development of sustainable ILs.
Funding: supported by the National Natural Science Foundation of China (No. 22078166) and the Qingdao University of Science and Technology Independent Research and Innovation Program (S2023KY002).
Abstract: Ionic liquids (ILs) have garnered significant interest owing to their distinct physicochemical traits. Nonetheless, their extensive application is curtailed by ecotoxicity concerns. This study aimed to develop a quantitative structure-activity relationship (QSAR) model for predicting the toxicity of ILs toward biological cells. Toxicity data of ILs for the leukemia rat cell line IPC-81, Escherichia coli (E. coli), and acetylcholinesterase (AChE) were collected from open-source databases, and two ensemble models, random forest (RF) and gradient boosted decision tree (GBDT), were trained on the data. The molecular structures of the ILs were represented by three methods: molecular descriptors (MD), molecular fingerprints (MF), and molecular identifiers (MI). The Tanimoto similarity coefficients indicate that MD has a stronger ability to recognize structural similarity. Statistical metrics of model performance showed that the two models using MD as input features (MD-RF and MD-GBDT) performed better on all three datasets. SHapley Additive exPlanations (SHAP) analysis was applied to explain the importance of different features. Specifically, reducing the carbon chain length and the number of fluorine atoms in IL structures can effectively reduce their toxic effects on biological cells.
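The Tanimoto coefficient mentioned above has two common forms: for binary fingerprints (sets of "on" bits) it is |A ∩ B| / |A ∪ B|, and for continuous descriptor vectors it is A·B / (|A|² + |B|² − A·B). The abstract does not say which form was applied to each representation, so the pure-Python sketch shows both; the inputs are toy values.

```python
def tanimoto_bits(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tanimoto_continuous(x, y):
    """Continuous Tanimoto similarity of two equal-length descriptor vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 1.0


# Toy examples (not real IL fingerprints or descriptors).
s_bits = tanimoto_bits({1, 2, 3}, {2, 3, 4})            # 2 shared / 4 total
s_cont = tanimoto_continuous([1.0, 2.0], [2.0, 1.0])    # 4 / (5 + 5 - 4)
```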
Funding: The authors thank the National Natural Science Foundation of China (22373056, 22031006, 22393891), the National Key R&D Program of China (2023YFA1506402), the National Science & Technology Fundamental Resource Investigation Program of China (2018FY201200), and the Haihe Laboratory of Sustainable Chemical Transformations for financial support. L.Z. is supported by the National Program of Top-notch Young Professionals.
Abstract: Bond dissociation energy (BDE), the enthalpy change for the homolysis of a specific covalent bond, is one of the basic thermodynamic properties of molecules and is very important for understanding chemical reactivity, chemical properties and chemical transformations. Here, a comprehensive machine learning-based BDE prediction model was established using the iBonD experimental BDE dataset and the calculated BDE dataset of St. John et al. Differential Structural and PhysicOChemical (D-SPOC) descriptors, which reflect changes in molecules' structural and physicochemical features during bond homolysis, were designed as input features.
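Following the definition above, a BDE is the enthalpy of the homolysis products minus that of the parent molecule: BDE(A-B) = H(A•) + H(B•) − H(A-B). The sketch computes this and, as a purely hypothetical illustration of a "differential" feature, subtracts a parent's descriptor vector from the summed descriptors of its two radicals. This is not the published D-SPOC definition, which the abstract does not give; all numbers are placeholders.

```python
def bond_dissociation_energy(h_parent, h_radical_1, h_radical_2):
    """BDE = H(radical 1) + H(radical 2) - H(parent), all in the same units
    (e.g. kcal/mol at 298.15 K)."""
    return h_radical_1 + h_radical_2 - h_parent


def difference_features(parent, radical_1, radical_2):
    """Hypothetical differential descriptor: change in each feature on homolysis."""
    return [r1 + r2 - p for p, r1, r2 in zip(parent, radical_1, radical_2)]


# Placeholder enthalpies and descriptor vectors (illustrative only).
bde = bond_dissociation_energy(-10.0, -5.0, -2.0)
delta = difference_features([1.0, 2.0], [0.5, 1.0], [0.7, 1.5])
```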
Funding: This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research under award number OSR-2019-CRG7-4077 and by the KAUST Clean Fuels Consortium (KCFC) and its member companies.
Abstract: Chemical substances are essential in all aspects of human life, and understanding their properties is key to developing chemical systems. The properties of chemical species can be obtained accurately by experiments or ab initio calculations; however, these are time-consuming and costly. In this work, machine learning (ML) models for estimating the entropy, S, and constant-pressure heat capacity, Cp, at 298.15 K are developed for alkanes, alkenes, and alkynes. The training data for entropy and heat capacity were collected from the literature. Molecular descriptors generated with the alvaDesc software are used as input features for the ML models. Support vector regression (SVR), ν-support vector regression (ν-SVR), and random forest regression (RFR) algorithms were trained with K-fold cross-validation at two levels: the first level assessed the models' performance, and the second level generated the final models. Of the three ML models, SVR performs best on the test dataset. The SVR model was then compared against Benson's traditional group additivity method to illustrate the advantages of the ML model. Finally, a sensitivity analysis is performed to identify the most critical descriptors for the property estimations.
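K-fold cross-validation, as used above to assess the models, can be sketched in pure Python. The `fit`/`predict` callables are hypothetical stand-ins for an SVR, ν-SVR or RFR model; a mean-value baseline is used here so the example runs on its own. Folds are taken in order for simplicity; in practice the samples would normally be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices); fold sizes differ by at most one."""
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_rmse(fit, predict, X, y, k=5):
    """Held-out RMSE for each fold, fitting a fresh model each time."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_errors = [(predict(model, X[i]) - y[i]) ** 2 for i in test]
        scores.append((sum(sq_errors) / len(sq_errors)) ** 0.5)
    return scores


# Baseline "model": always predict the training mean (placeholder for SVR).
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
folds = list(k_fold_indices(10, 3))
scores = cross_validated_rmse(fit_mean, predict_mean, X, y, k=5)
```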
Abstract: A quantitative structure-property relationship (QSPR) method is used to study correlation models between the structures of a diverse set of organic compounds and their log P. Molecular descriptors calculated from structure alone are used to describe the molecular structures. A subset of the calculated descriptors, selected by forward stepwise regression, is used to develop the QSPR models. Multiple linear regression (MLR) and radial basis function neural networks (RBFNNs) are used to construct the linear and nonlinear correlation models, respectively. The optimal QSPR model is based on a 7-17-1 RBFNN architecture using seven calculated molecular descriptors. The root mean square errors of prediction for the training, prediction and overall data sets are 0.284, 0.327 and 0.291 log P units, respectively.
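A 7-17-1 RBF network maps seven descriptors through seventeen Gaussian hidden units to a single linear output (log P). The NumPy sketch below picks hidden-unit centres from the training samples and solves the output weights by least squares, a simple common training scheme; the study's actual centre selection, widths and training procedure are not given in the abstract, and the data are random placeholders.

```python
import numpy as np


def gaussian_layer(X, centers, width):
    """Hidden-layer activations exp(-||x - c||^2 / (2 width^2)), shape (n, m)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def fit_rbf(X, y, n_hidden=17, width=2.0, seed=0):
    """Choose centres from the data, then fit linear output weights + bias."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_hidden, replace=False)]
    design = np.column_stack([gaussian_layer(X, centers, width), np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    return centers, width, weights


def predict_rbf(model, X):
    centers, width, weights = model
    design = np.column_stack([gaussian_layer(X, centers, width), np.ones(len(X))])
    return design @ weights


# Placeholder data: 60 compounds x 7 descriptors, with a synthetic log P.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 7))
log_p = X[:, 0] + 0.5 * np.sin(X[:, 1]) + rng.normal(scale=0.1, size=60)
model = fit_rbf(X[:45], log_p[:45])          # training set
pred = predict_rbf(model, X[45:])            # prediction set
rmse = float(np.sqrt(np.mean((pred - log_p[45:]) ** 2)))
```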
Funding: National Natural Science Foundation of China (Grant No. 22178243).
Abstract: Separating monomeric cycloalkanes from naphtha obtained by direct coal liquefaction not only enables the valuable use of naphtha but could also help meet the demand of China's domestic chemical feedstock market for these compounds. In extractive distillation of naphtha, relative volatility is a crucial parameter for extractant selection. However, determining relative volatility through conventional vapor-liquid equilibrium experiments is challenging because of the complex composition of naphtha. To address this challenge, a model for predicting the relative volatility of n-heptane/methylcyclohexane in various extractants was developed using machine-learning quantitative structure-property relationship methods, enabling rapid and cost-effective extractant selection. Statistical analysis showed favorable performance indicators, including a coefficient of determination of 0.88, a cross-validation coefficient of 0.94, and a root mean square error of 0.02. Factors such as α, E_HOMO, ρ, and log P_oct/water collectively influence relative volatility. Analysis of the standardized coefficients in the multivariate linear regression equation identified density as the primary factor affecting the relative volatility of n-heptane/methylcyclohexane in the different extractants: unbranched extractants with higher densities gave higher relative volatility than their branched counterparts. Subsequently, the separation of cycloalkane monomers from direct coal liquefaction products by extractive distillation was optimized in Aspen Plus, achieving purities above 0.99 and yields above 0.90 for the cyclohexane and methylcyclohexane monomers. Economic, energy consumption, and environmental assessments were conducted. Salicylic acid emerged as the most suitable extractant for purifying cycloalkanes in direct coal liquefaction naphtha owing to its superior separation effectiveness, cost efficiency, and environmental benefits. The column parameters of the simulated separation unit provide valuable insights for the design of actual industrial equipment.
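Relative volatility, the screening criterion above, is the ratio of the two components' vapor-liquid distribution coefficients: α = (y_1/x_1)/(y_2/x_2) for n-heptane (1) and methylcyclohexane (2), where y and x are vapor and liquid mole fractions. A minimal sketch with illustrative (not measured) compositions:

```python
def relative_volatility(y_1, x_1, y_2, x_2):
    """alpha_12 = (y_1 / x_1) / (y_2 / x_2): ratio of the K-values of
    components 1 and 2 from vapor (y) and liquid (x) mole fractions."""
    return (y_1 / x_1) / (y_2 / x_2)


# Illustrative equilibrium compositions for n-heptane (1) and
# methylcyclohexane (2) in the presence of an extractant (placeholder data).
alpha = relative_volatility(y_1=0.30, x_1=0.20, y_2=0.20, x_2=0.20)
```

An α of 1 means distillation cannot separate the pair; a good extractant moves α away from 1, which is why it serves as the selection criterion.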