To cope with the huge computational cost of direct numerical simulation, the traditional response surface method (RSM), a classical regression algorithm, is used in reliability design to approximate the functional relationship between the state variable and the basic variables. The algorithm has successfully handled some problems involving implicit performance functions in reliability analysis. However, its theoretical basis of empirical risk minimization narrows its range of applications for...
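The classical RSM step described above replaces an implicit performance function with a cheap polynomial fitted to a few expensive evaluations. It can be sketched as below. This is a minimal one-variable illustration with several assumptions: a quadratic surface, an ordinary least-squares fit, and a normally distributed basic variable. The function `g_true` is hypothetical and stands in for the costly simulation. This sketch is not the method that the truncated abstract goes on to propose.

```python
import random

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # moments of y
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # augmented matrix
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def g_true(x):
    # Hypothetical "expensive" performance function; failure when g(x) < 0
    return 3.0 - 0.5 * x - 0.2 * x ** 2

# Evaluate the expensive function only at a few design points
design = [-2.0, -1.0, 0.0, 1.0, 2.0]
a, b, c = fit_quadratic(design, [g_true(x) for x in design])

# Cheap Monte Carlo on the fitted surface, assuming X ~ N(0, 1.5^2)
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.5) for _ in range(20000)]
pf = sum(1 for x in samples if a + b * x + c * x * x < 0) / len(samples)
print(f"surface: {a:.3f} + {b:.3f}x + {c:.3f}x^2, failure probability ~ {pf:.4f}")
```

Because the surrogate is cheap, the Monte Carlo loop never calls `g_true`. Only the five design points pay the simulation cost.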
Miniature air quality sensors are widely used in urban grid-based monitoring due to their flexibility in deployment and low cost. However, the raw data collected by these devices often suffer from low accuracy caused by environmental interference and sensor drift, highlighting the need for effective calibration methods to improve data reliability. This study proposes a data correction method based on Bayesian Optimization Support Vector Regression (BO-SVR), which combines the nonlinear modeling capability of Support Vector Regression (SVR) with the efficient global hyperparameter search of Bayesian Optimization. By introducing cross-validation loss as the optimization objective and using Gaussian process modeling with an Expected Improvement acquisition strategy, the approach automatically determines optimal hyperparameters for accurate pollutant concentration prediction. Experiments on real-world micro-sensor datasets demonstrate that BO-SVR outperforms traditional SVR, grid search SVR, and random forest (RF) models across multiple pollutants, including PM2.5, PM10, CO, NO2, SO2, and O3. The proposed method achieves lower prediction residuals, higher fitting accuracy, and better generalization, offering an efficient and practical solution for enhancing the quality of micro-sensor air monitoring data.
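The Expected Improvement acquisition strategy described above scores each candidate hyperparameter setting by how much it is expected to lower the best cross-validation loss found so far. A minimal sketch follows. It assumes the Gaussian process has already produced a posterior mean and standard deviation for each candidate. The exploration margin `xi`, the candidate settings, and their predicted values are all hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected Improvement for *minimising* a loss.

    mu, sigma: Gaussian-process posterior mean and standard deviation of the
    cross-validation loss at one candidate hyperparameter setting.
    best: lowest cross-validation loss observed so far.
    xi: small exploration margin (an assumed default, not from the study).
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return improvement * cdf + sigma * pdf

# Hypothetical GP predictions (mean, std) of CV loss for three SVR settings
candidates = {
    "C=1, gamma=0.1": (0.30, 0.02),
    "C=10, gamma=0.01": (0.28, 0.05),
    "C=100, gamma=1": (0.40, 0.20),
}
best_loss = 0.29
scores = {k: expected_improvement(mu, sd, best_loss) for k, (mu, sd) in candidates.items()}
next_setting = max(scores, key=scores.get)
print(next_setting, scores)
```

In this toy case, the setting with the worst predicted mean but the largest uncertainty is chosen next. This shows how EI trades exploitation against exploration.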
The computational approaches of support vector machine (SVM), support vector regression (SVR) and molecular docking have been widely used to identify active compounds. In this work, to improve the accuracy and reliability of prediction, these three approaches were combined to predict potential cytochrome P450 1A2 (CYP1A2) inhibitors. The accuracy of the optimal SVM qualitative model was 99.432%, 97.727%, and 91.667% for the training set, internal test set and external test set, respectively. This shows that the model has high discrimination ability. The R² and mean square error of the optimal SVR quantitative model were 0.763 and 0.013 for the training set, and 0.753 and 0.056 for the test set, respectively. These values indicate that this SVR model has high predictive ability for the biological activities of compounds. According to the results of the SVM and SVR models, several types of descriptors were identified as essential to the bioactivity prediction of compounds, including connectivity indices, constitutional descriptors and functional group counts. Moreover, molecular docking studies were used to reveal the binding poses and binding affinity of potential inhibitors interacting with CYP1A2. The amino acids THR124 and ASP320 could form key hydrogen bond interactions with active compounds, and ALA317 and GLY316 could form strong hydrophobic interactions with them. The models obtained above were applied to discover potential CYP1A2 inhibitors from natural products. They could help predict CYP-mediated drug-drug interactions and provide useful guidance for rational drug combination therapy. A set of 20 potential CYP1A2 inhibitors was obtained. Some of the results were consistent with the literature, which further indicates the accuracy of these models and the reliability of this combinatorial computation strategy.
In this paper, we apply nonlinear time series analysis to small-time-scale traffic measurement data. A prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, a local support vector machine prediction method is used to predict the traffic measurement data. A BIC-based neighbouring point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method, with optimized neighbouring points, can effectively predict small-time-scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
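The phase-space reconstruction and neighbour-selection steps described above can be sketched as follows. This is a minimal sketch under several assumptions. The input is a scalar series, and the embedding dimension `m` and delay `tau` are already chosen; the prediction-based choice of `m` and the BIC-based choice of `k` are not reproduced. Distances are Euclidean. The local SVM itself is omitted: the returned neighbours and their one-step successors are the local training set it would be fitted on.

```python
import math

def delay_embed(series, m, tau=1):
    """Delay vectors [x_t, x_{t-tau}, ..., x_{t-(m-1)tau}] and their next values x_{t+1}."""
    start = (m - 1) * tau
    vectors, targets = [], []
    for t in range(start, len(series) - 1):
        vectors.append([series[t - j * tau] for j in range(m)])
        targets.append(series[t + 1])
    return vectors, targets

def nearest_neighbours(vectors, query, k):
    """Indices of the k delay vectors closest (Euclidean) to `query`."""
    dist = [math.dist(v, query) for v in vectors]
    return sorted(range(len(vectors)), key=dist.__getitem__)[:k]

# Hypothetical periodic "traffic" series
series = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]
X, y = delay_embed(series, m=2)
query = [series[-1], series[-2]]      # current state [x_t, x_{t-1}]
idx = nearest_neighbours(X, query, k=2)
local_X = [X[i] for i in idx]         # local training inputs
local_y = [y[i] for i in idx]         # their one-step-ahead successors
print(local_X, local_y)
```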
Firstly, a general regression neural network (GRNN) was used to select the key factors influencing residential load (RL) forecasting. Secondly, the key influencing factors chosen by GRNN were used as the inputs, with urban and rural RL as the outputs, for simulation and learning. In addition, the parameters of the final model were obtained by using evidence theory to combine the optimization results calculated with the PSO method and with Bayesian theory. The PSO-Bayes least squares support vector machine (PSO-Bayes-LS-SVM) model was then established. A case study was then provided for learning and testing. The empirical results show that the mean square errors of the urban and rural RL forecasts are 0.02% and 0.04%, respectively. Finally, taking the RL of a specific province in China as an example, RL forecasts for 2011 to 2015 were obtained.
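An LS-SVM replaces the SVM's inequality constraints with equalities, so training reduces to solving one linear system. The sketch below assumes an RBF kernel and fixed hyperparameters: `gamma` for regularization and `sigma` for kernel width. The PSO-Bayes search with evidence-theory fusion that the study uses to tune them is not reproduced. The toy data are hypothetical.

```python
import math

def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [val] for row, val in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Train LS-SVM regression by solving the KKT system
        [0   1^T          ] [b    ]   [0]
        [1   K + I/gamma  ] [alpha] = [y]
    and return the predictor f(x) = sum_i alpha_i K(x, x_i) + b."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]
    return lambda x: sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b

# Hypothetical toy data: one input feature, smooth target
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
y = [math.sin(x[0]) for x in X]
f = lssvm_fit(X, y, gamma=1e4, sigma=0.5)
print([round(f(x), 4) for x in X])
```

A large `gamma` makes the model nearly interpolate the training points. A tuning procedure such as the study's PSO-Bayes search would trade this off against smoothness.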
Support vector-based learning methods are an important part of Computational Intelligence techniques. Recent efforts have addressed the problem of learning from very large datasets. This paper reviews the most commonly used formulations of support vector machines for regression (SVRs), emphasizing their usability in large-scale applications. We review the general concept of support vector machines (SVMs), address the state of the art in training methods for SVMs, and explain the fundamental principle of SVRs. The most common learning methods for SVRs are introduced, and linear programming-based SVR formulations are explained, emphasizing their suitability for large-scale learning. Finally, this paper also discusses some open problems and current trends.
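The fundamental principle of SVR explained in the review is to fit a function that is as flat as possible while ignoring errors smaller than ε. This can be written as the primal objective of a linear SVR. The minimal sketch below only evaluates that objective for given weights. It assumes a linear model and does not implement any of the training methods the review surveys.

```python
def eps_insensitive(residual, eps):
    """Vapnik's epsilon-insensitive loss: residuals inside the eps-tube cost nothing."""
    return max(0.0, abs(residual) - eps)

def linear_svr_objective(w, b, X, y, C, eps):
    """Primal linear SVR objective: 0.5*||w||^2 + C * sum_i max(0, |y_i - f(x_i)| - eps)."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    empirical = sum(
        eps_insensitive(yi - (sum(wj * xj for wj, xj in zip(w, xi)) + b), eps)
        for xi, yi in zip(X, y)
    )
    return flatness + C * empirical

# Hypothetical data: the first point lies inside the tube, the second outside it
X = [[1.0], [2.0]]
y = [1.05, 2.5]
print(linear_svr_objective([1.0], 0.0, X, y, C=1.0, eps=0.1))
```

Only points outside the ε-tube contribute to the loss term. This is why the trained model depends on a subset of the data, the support vectors.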
Radiometric normalization, an essential step in multi-source and multi-temporal data processing, has received critical attention. The Relative Radiometric Normalization (RRN) method has been primarily used to eliminate radiometric inconsistency. The radiometric transforming relation between the subject image and the reference image is an essential aspect of RRN. To model this relation accurately, the learning-based nonlinear regression method Support Vector machine Regression (SVR) is used to fit the complicated radiometric transforming relation for coarse-resolution data-referenced RRN. To evaluate the effectiveness of the proposed method, a series of experiments is performed, including two synthetic data experiments and one real data experiment. The proposed method is compared with methods that use linear regression, an Artificial Neural Network (ANN) or Random Forest (RF) to model the radiometric transforming relation. The results show that the proposed method fits the radiometric transforming relation well and could enhance RRN performance.
In this paper, a new continuous variable called the core-ratio is defined to describe the probability that a residue lies in a binding site. It replaces the previous binary (0/1) description of interface residues. This allows the support vector machine regression method to be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites; with an averaging procedure applied, the new descriptors are more effective. Our tests show that much better prediction results can be obtained with the support vector regression (SVR) method than with the support vector classification method.
Accurate cost estimation at the early stage of a construction project is a key factor in the project's success. However, it is difficult to estimate construction costs quickly and accurately at the planning stage, when drawings, documentation and the like are still incomplete. Various techniques have therefore been applied to estimate construction costs at an early stage, when project information is limited. While these techniques have their pros and cons, little effort has been made to determine which performs best for cost estimation. This research compares the accuracy of three estimating techniques (regression analysis (RA), neural network (NN), and support vector machine (SVM)) by using them to estimate construction costs. Comparing their accuracy on historical cost data showed that the NN model gave more accurate estimates than the RA and SVM models. Consequently, the NN model was judged the most suitable for estimating the cost of school building projects.
Path loss prediction models are vital for accurately characterizing signal propagation in wireless channels. Empirical and deterministic models used for path loss prediction have not produced optimal results. In this paper, we apply machine learning algorithms to path loss prediction because they offer a flexible network architecture and can exploit extensive data. We introduce support vector regression (SVR) and radial basis function (RBF) models for path loss prediction in the investigated environments. The SVR model was able to process several input parameters without adding complexity to the network architecture, while the RBF model provides good function approximation. Hyperparameter tuning of the machine learning models was carried out to achieve optimal results. The performances of the SVR and RBF models were compared, and the results were validated using the root-mean-square error (RMSE). The two machine learning algorithms were also compared with the COST-231, SUI, Egli, free-space, and COST-231 W-I models. The analytical models overpredicted path loss. Overall, the machine learning models predicted path loss with greater accuracy than the empirical models. The SVR model performed best across all indices, with RMSE values of 1.378 dB, 1.4523 dB, and 2.1568 dB in rural, suburban, and urban settings, respectively. It should therefore be adopted for signal propagation modeling in the investigated environments and beyond.
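Two quantities used in this comparison are easy to state concretely: the free-space path loss, one of the analytical baselines, and the RMSE used for validation. The sketch below assumes the common form of the free-space formula, with distance in km and frequency in MHz. The "measured" values are hypothetical, and the SVR and RBF models are not reproduced.

```python
import math

def free_space_path_loss_db(d_km, f_mhz):
    """Free-space path loss in dB, with distance in km and frequency in MHz."""
    return 20.0 * math.log10(d_km) + 20.0 * math.log10(f_mhz) + 32.44

def rmse(predicted, measured):
    """Root-mean-square error between two equally long sequences (same units, dB here)."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))

# Hypothetical drive-test points at 1800 MHz: distances (km) and measured loss (dB)
distances = [0.2, 0.5, 1.0, 2.0]
measured = [110.2, 118.9, 124.1, 131.5]
fspl = [free_space_path_loss_db(d, 1800.0) for d in distances]
print(f"free-space RMSE: {rmse(fspl, measured):.2f} dB")
```

The same `rmse` function can score any model's predictions against the measurements. This includes an SVR, an RBF network, or an empirical model, so all of them are compared on one scale.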
The principle of the support vector regression machine (SVR) is first analysed. A new data-dependent kernel function is then constructed from an information geometry perspective. When the rotational frequency of the high-speed rotating arc sensor is in the range of 15 Hz to 30 Hz, the current waveforms change regularly with the horizontal offset. The welding current data are pretreated by wavelet filtering, mean filtering and normalization. The SVR model is constructed using these evolution laws. The decision function is obtained by training the SVR, and the seam offset can then be identified. The experimental results show that the precision of offset identification can be greatly improved by modifying the SVR and applying mean filtering in the longitudinal direction.
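The mean-filtering and normalization steps in the pretreatment can be sketched as below. The sketch assumes a centred moving average with an odd window and min-max scaling to [0, 1]. The wavelet filtering step and the information-geometry kernel are not reproduced, and the sample current values are hypothetical.

```python
def moving_average(signal, window):
    """Centred moving-average (mean) filter; assumes an odd window, edges use a shrunken window."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        seg = signal[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def min_max_normalize(signal):
    """Scale values linearly to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0] * len(signal)
    return [(s - lo) / (hi - lo) for s in signal]

current = [210.0, 214.0, 209.0, 230.0, 212.0, 211.0]  # hypothetical current samples (A)
smoothed = moving_average(current, window=3)
features = min_max_normalize(smoothed)
print(features)
```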
Predicting the magnitude (M) of reservoir-induced earthquakes is an important task in earthquake engineering. In this article, we employ a Support Vector Machine (SVM) and Gaussian Process Regression (GPR) to predict the M of reservoir-induced earthquakes from reservoir parameters. A comprehensive parameter (E) and the maximum reservoir depth (H) are used as inputs to the SVM and GPR. We give an equation for determining the M of reservoir-induced earthquakes. The developed SVM and GPR models have been compared with the Artificial Neural Network (ANN) method. The results show that the developed SVM and GPR are efficient tools for predicting the M of reservoir-induced earthquakes.
In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Machine Learning (ML) techniques, which offer advanced regression tools for complicated engineering problems, have therefore been developed and widely explored. This study investigates selected ML techniques to evaluate their suitability for describing the hot deformation behavior of metallic materials. The ML-based regression methods of Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe hot flow stress curve datasets acquired experimentally for a medium-carbon steel. The GPR method had not previously been used for such a regression task. Nevertheless, the results showed that its performance was the most favorable and practically unrivaled; neither the ANN method nor the other ML techniques studied provided such precise results in this regression analysis.
Permafrost is one of the key components of the cryosphere. Previous studies show that the extent of permafrost in Nepal has shifted to higher elevations. This research, however, has been hampered by inconsistent study periods. Proxies such as rock glaciers and climatic variables, such as multi-decadal mean annual air temperature, are used to infer the likely occurrence of permafrost. Here, a rock glacier inventory of Solukhumbu was prepared from Google Earth, and the rock glaciers were classified by activity (intact/relict). Talus-derived rock glaciers were observed more often than glacier-derived ones. The rock glaciers were highly correlated with Mean Annual Air Temperature, followed by potential incoming solar radiation and slope. Three machine learning models (Logistic Regression, Random Forest and Support Vector Machines) were trained to generate permafrost probability distribution maps by predicting the probability that a rock glacier is intact rather than relict. Logistic Regression and Support Vector Machines produced similar spatial distributions of permafrost, whereas Random Forest showed low precision in spatial variation. The permafrost distribution map suggests that permafrost likely occurs above 5000 m, indicating a potential for rock slides and landslides should it thaw in the future. Higher-resolution input data could improve the results. Even so, this approach remains promising for permafrost regions where information about the ice content of rock glaciers is very limited.
Determining the optimal ceramic content of ceramics-in-polymer composite electrolytes and the appropriate stack pressure can effectively improve the interfacial contact of solid-state batteries (SSBs). This paper uses a contact mechanics model built with the conjugate gradient method, continuous convolution, and the fast Fourier transform. With it, the paper analyzes and compares the interfacial contact responses of polymers commonly used in SSBs, providing the original training data for machine learning. A support vector regression model is established to predict the relationship between ceramic content and interfacial resistance. Bayesian optimization and K-fold cross-validation are introduced to find the optimal combination of hyperparameters, which accelerates training and improves the model's accuracy. We identified the relationship between ceramic content, stack pressure, and interfacial resistance. The results can serve as a reference for designing low-resistance composite electrolytes for solid-state batteries.
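The K-fold cross-validation score that serves as the objective of the hyperparameter search can be sketched as follows. This is a minimal version assuming contiguous, unshuffled folds and a caller-supplied `fit` function that returns a predictor. The mean-value model in the usage lines is a hypothetical stand-in for the SVR, and the Bayesian optimization loop is not shown.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds; yields (train_idx, val_idx) pairs."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size

def cv_loss(fit, loss, X, y, k):
    """Mean validation loss over k folds; `fit(X_train, y_train)` returns a predict function."""
    total = 0.0
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        total += sum(loss(model(X[i]), y[i]) for i in val) / len(val)
    return total / k

def mean_model(X_train, y_train):
    """Hypothetical placeholder learner: always predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return lambda x: m

def squared_error(pred, target):
    return (pred - target) ** 2

X = list(range(10))
y = [2.0 * x for x in X]
score = cv_loss(mean_model, squared_error, X, y, k=5)
print(score)
```

A Bayesian optimizer would call a function like `cv_loss` once per proposed hyperparameter set, with `fit` training an SVR under those settings.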
BACKGROUND Severe dengue in children with critical complications has been associated with high mortality rates, varying from approximately 1% to over 20%. To date, data on machine-learning-based algorithms for predicting the risk of in-hospital mortality in children with dengue shock syndrome (DSS) are lacking. AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS. METHODS This single-center retrospective study was conducted at the tertiary Children's Hospital No. 2 in Viet Nam between 2013 and 2022. The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit (PICU). Nine significant features were predetermined for further analysis using machine learning models. An oversampling method was used to enhance model performance. Supervised models, including logistic regression, Naïve Bayes, Random Forest (RF), K-nearest neighbors, Decision Tree and Extreme Gradient Boosting (XGBoost), were employed to develop predictive models. Shapley Additive Explanations (SHAP) were used to determine the degree of contribution of the features. RESULTS In total, 1278 PICU-admitted children with complete data were included in the analysis. The median patient age was 8.1 years (interquartile range: 5.4-10.7). Thirty-nine patients (3%) died. The RF and XGBoost models demonstrated the highest performance. The SHAP analysis revealed that the most important predictive features included younger age, female sex, presence of underlying diseases, severe transaminitis, and severe bleeding. They also included low platelet counts requiring platelet transfusion, elevated international normalized ratio, blood lactate and serum creatinine, a large volume of resuscitation fluid, and a high vasoactive inotropic score (>30). CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS. The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
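The abstract does not say which oversampling method was used. As one hedged illustration, plain random oversampling of the minority class (a simpler relative of SMOTE) can be sketched as follows. The class counts mirror the 39 deaths among 1278 children reported above, while the feature rows are hypothetical placeholders.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

X = [[i] for i in range(1278)]    # hypothetical one-feature rows
y = [1] * 39 + [0] * (1278 - 39)  # 1 = in-hospital death, 0 = survived
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, oversampling is applied to the training split only, so that duplicated minority cases do not leak into the evaluation data.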
Very few studies have exploited the synergy of visible, near-infrared, and shortwave infrared (VNIR-SWIR) spectra and terrain attributes in predicting Pb content in agricultural soils. To fill this gap, this study aimed to predict lead (Pb) contents in agricultural soils by combining machine learning algorithms (MLAs) with VNIR-SWIR spectra and/or terrain attributes under three distinct approaches. Six MLAs were tested: artificial neural network (ANN), partial least squares regression, support vector machine (SVM), Gaussian process regression (GPR), extreme gradient boosting (EGB), and Cubist. The VNIR-SWIR spectral data were preprocessed by discrete wavelet transformation, logarithmic transformation with Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction, first derivative (FiD), and second derivative. In approach 1, MLAs were combined with the preprocessed VNIR-SWIR spectral data. The Cubist-FiD combination was the most effective, achieving a coefficient of determination (R²) of 0.63, a concordance correlation coefficient (CCC) of 0.51, a mean absolute error (MAE) of 6.87 mg kg⁻¹, and a root mean square error (RMSE) of 8.66 mg kg⁻¹. In approach 2, MLAs were combined with both preprocessed VNIR-SWIR spectral data and terrain attributes, and the EGB-SNV combination yielded superior results, with R² of 0.75, CCC of 0.65, MAE of 5.48 mg kg⁻¹, and RMSE of 7.34 mg kg⁻¹. Approach 3 combined MLAs and terrain attributes, and Cubist gave the best prediction results, with R² of 0.75, CCC of 0.66, MAE of 6.18 mg kg⁻¹, and RMSE of 7.71 mg kg⁻¹. The cumulative assessment identified the fusion of terrain attributes, SNV-preprocessed VNIR-SWIR spectra, and EGB as the optimal method for estimating Pb content in agricultural soils, yielding the highest R² value and minimal error. Comparatively, the GPR, ANN, and SVM techniques achieved higher R² values in approaches 2 and 3 but also exhibited higher estimation errors. In conclusion, the study underscores the significance of using relevant auxiliary datasets and appropriate MLAs for accurate Pb content prediction with minimal error in agricultural soils. The findings contribute valuable insights for developing successful soil management strategies based on predictive modeling.
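Two of the preprocessing steps and one of the evaluation metrics named above can be sketched as follows. These are standard normal variate (SNV) scaling of a spectrum, a first derivative, and Lin's concordance correlation coefficient (CCC). This is a minimal sketch under several assumptions. SNV uses the sample standard deviation. The derivative is a plain finite difference, whereas the study may have used a Savitzky-Golay derivative. The spectrum values are hypothetical.

```python
import statistics

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]

def first_derivative(spectrum, step=1.0):
    """Forward finite-difference derivative; the output is one point shorter."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

def lins_ccc(observed, predicted):
    """Lin's CCC = 2*cov / (var_x + var_y + (mean_x - mean_y)^2), using population moments."""
    mx, my = statistics.fmean(observed), statistics.fmean(predicted)
    vx = statistics.pvariance(observed, mu=mx)
    vy = statistics.pvariance(predicted, mu=my)
    cov = sum((a - mx) * (b - my) for a, b in zip(observed, predicted)) / len(observed)
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

reflectance = [0.21, 0.24, 0.30, 0.33, 0.31, 0.28]  # hypothetical spectrum
print(snv(reflectance))
print(first_derivative(reflectance, step=10.0))      # e.g. 10 nm band spacing
print(lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))    # a constant bias lowers CCC
```

Unlike R², CCC penalizes systematic bias. In the last line, the prediction is perfectly correlated with the observations but offset by one unit, so CCC falls to 4/7 rather than 1.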
The distribution of data has a significant impact on classification results. When one class is very small compared with another, data imbalance occurs. Imbalance increases outliers and noise, so the speed and performance of classification can be greatly affected. To address these problems, this paper starts from the motivation and mathematical representation of classification. It then proposes a new classification method based on the relationships between different classification formulations. Combining the vector characteristics of the actual problem with the choice of matrix characteristics, we first analyze ordinal regression and introduce slack variables to handle the constraints on isolated points. Then, on the basis of the support vector machine, we introduce fuzzy factors to handle the gap between isolated points, and we introduce cost control to handle sample skew. Finally, based on the bi-boundary support vector machine, a two-step weight-setting twin classifier is constructed. It can identify multiple tasks with feature-selected patterns without additional optimizers. This addresses large-scale classification problems that existing methods cannot handle effectively when the gap between category distributions is very small.
Objective: Support Vector Machine (SVM) is a machine-learning method based on the principle of structural risk minimization, which performs well on data outside the training set. In this paper, SVM was applied to predict the 5-year survival status of patients with nasopharyngeal carcinoma (NPC) after treatment. The aim was to find a new approach to prognosis studies in cancer that can support the right clinical decision for each individual patient. Methods: Two modelling methods, an SVM network and standard parametric logistic regression, were used to model 5-year survival status. The two methods were compared via receiver operating characteristic (ROC) curve analysis on a prospective set of patients not used in model construction. Results: SVM1, trained with the 25 original input variables without screening, yielded a ROC area of 0.868, with a sensitivity to mortality of 79.2% and a specificity of 94.5%. SVM2 was trained with 9 input variables selected from the 25 original variables by logistic regression screening. It yielded a ROC area of 0.874, with a sensitivity to mortality of 79.2% and a specificity of 95.6%. Logistic regression yielded a ROC area of 0.751, with a sensitivity to mortality of 66.7% and a specificity of 83.5%. Conclusion: SVM found a strong pattern in the database predictive of 5-year survival status. Logistic regression produced somewhat similar but weaker results. These results show that SVM models have the potential to predict an individual patient's 5-year survival status after treatment and to assist clinicians in making sound clinical decisions.
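The ROC area, sensitivity and specificity used to compare the SVM and logistic regression models can be computed as below. This is a minimal sketch. It assumes each model outputs a score where higher means more likely to die within 5 years (label 1). The AUC uses the rank (Mann-Whitney) formulation with ties counted as one half. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC area as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, threshold):
    """Sensitivity (to mortality) and specificity when predicting death if score >= threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]  # hypothetical model outputs
labels = [1, 1, 0, 1, 0, 0]               # 1 = died within 5 years
print(roc_auc(scores, labels), sens_spec(scores, labels, 0.5))
```

The AUC summarizes ranking quality over all thresholds. Sensitivity and specificity, by contrast, describe a single operating point. This is why the abstract reports both.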
The martensitic transformation temperature is the basis for the application of shape memory alloys (SMAs), so the ability to predict the transformation temperature of SMAs quickly and accurately is of great practical significance. In this work, machine learning (ML) methods were used to accelerate the search for shape memory alloys with targeted properties (phase transition temperature). A group of compositions was selected from numerous unexplored data to design shape memory alloys using a reverse design method. Composition modeling and feature modeling were used to predict the phase transition temperature of the shape memory alloys. Experimental results for the alloys were obtained to verify the effectiveness of the support vector regression (SVR) model. The results show that the machine learning model can identify target materials more efficiently and purposefully. This enables accurate and rapid design of shape memory alloys with a specific target phase transition temperature. On this basis, the relationship between phase transition temperature and material descriptors is analyzed. The analysis shows that the key factor affecting the phase transition temperature of shape memory alloys is the strength of the interatomic bond energy. This work provides new ideas for the controllable design and performance optimization of Cu-based shape memory alloys.
Funding (response surface method reliability study above): National High-tech Research and Development Program (2006AA04Z405)
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60573065), the Natural Science Foundation of Shandong Province, China (Grant No. Y2007G33), and the Key Subject Research Foundation of Shandong Province, China (Grant No. XTD0708)
Abstract: In this paper we apply nonlinear time series analysis to small-time-scale traffic measurement data. A prediction-based method is used to determine the embedding dimension of the traffic data. Based on the reconstructed phase space, a local support vector machine prediction method is used to predict the traffic measurement data, and a BIC-based neighbouring point selection method is used to choose the number of nearest neighbouring points for the local support vector machine regression model. The experimental results show that the local support vector machine prediction method with optimized neighbouring points can effectively predict small-time-scale traffic measurement data and can reproduce the statistical features of real traffic measurements.
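Phase-space reconstruction and neighbour lookup, the first two steps of the local prediction scheme described above, can be sketched as follows. This assumes a standard delay embedding with dimension `dim` and delay `tau`; the BIC-based choice of the neighbour count and the local SVR fit are not shown, so `k` is fixed by hand and the traffic series is hypothetical.

```python
import math


def delay_embed(series, dim, tau):
    """Delay reconstruction: vectors (x[t], x[t+tau], ..., x[t+(dim-1)*tau])."""
    span = (dim - 1) * tau
    if span >= len(series):
        raise ValueError("series too short for this dimension and delay")
    return [tuple(series[t + j * tau] for j in range(dim)) for t in range(len(series) - span)]


def nearest_neighbours(vectors, query_index, k):
    """Indices of the k reconstructed states closest (Euclidean) to vectors[query_index]."""
    query = vectors[query_index]
    ranked = sorted(
        (math.dist(query, v), i) for i, v in enumerate(vectors) if i != query_index
    )
    return [i for _, i in ranked[:k]]


# Hypothetical traffic counts per small time bin.
traffic = [0, 1, 2, 3, 4, 5]
states = delay_embed(traffic, dim=3, tau=1)
# Neighbours of the most recent state would form the local training set.
neighbours = nearest_neighbours(states, query_index=len(states) - 1, k=2)
```

In a local predictor, the successors of these neighbouring states would serve as training targets for a small regression model that forecasts the next value.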
Funding: Project (07JA790092) supported by the Research Grants from the Humanities and Social Science Program of the Ministry of Education of China; Project (10MR44) supported by the Fundamental Research Funds for the Central Universities in China
Abstract: First, a general regression neural network (GRNN) was used to select the key influencing factors for residential load (RL) forecasting. Second, the key influencing factors chosen by the GRNN were used as inputs to the urban and rural RL models for simulation and learning. In addition, suitable parameters for the final model were obtained by applying evidence theory to combine the optimization results calculated with the particle swarm optimization (PSO) method and Bayesian theory. On this basis, a PSO-Bayes least squares support vector machine (PSO-Bayes-LS-SVM) model was established. A case study was then carried out for learning and testing. The empirical results show that the mean square errors of the urban and rural RL forecasts are 0.02% and 0.04%, respectively. Finally, taking the RL of a specific province in China as an example, RL forecasts for 2011 to 2015 were obtained.
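The abstract does not say how the PSO and Bayesian results are encoded as evidence. As a hedged illustration only, the classical Dempster rule of combination is shown below, with hypothetical mass functions expressing how strongly two pieces of evidence support the PSO-derived versus the Bayes-derived parameter set.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions given as {frozenset(hypotheses): mass}."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize the non-conflicting mass so the result sums to one.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical hypotheses: which candidate LS-SVM parameter set is best.
PSO = frozenset({"pso_params"})
BAYES = frozenset({"bayes_params"})
EITHER = PSO | BAYES  # ignorance: mass not committed to either set

evidence_1 = {PSO: 0.6, BAYES: 0.3, EITHER: 0.1}
evidence_2 = {PSO: 0.5, BAYES: 0.4, EITHER: 0.1}
fused = dempster_combine(evidence_1, evidence_2)
chosen = max(fused, key=fused.get)
```

Here the conflicting mass is 0.39, so the fused belief in the PSO set rises to 0.41 / 0.61 ≈ 0.672, higher than either source alone.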
Abstract: Support vector-based learning methods are an important part of Computational Intelligence techniques. Recent efforts have addressed the problem of learning from very large datasets. This paper reviews the most commonly used formulations of support vector machines for regression (SVRs), emphasizing their usability in large-scale applications. We review the general concept of support vector machines (SVMs), survey the state of the art in training methods for SVMs, and explain the fundamental principle of SVRs. The most common learning methods for SVRs are introduced, and linear programming-based SVR formulations are explained, emphasizing their suitability for large-scale learning. Finally, this paper discusses some open problems and current trends.
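The fundamental principle of SVR referred to above is the ε-insensitive loss combined with a norm penalty. The sketch below evaluates the standard soft-margin primal objective for a linear SVR at a given (w, b); it does not train anything, and the data and parameter values are hypothetical.

```python
def epsilon_insensitive(residual, epsilon):
    """Vapnik's loss: errors inside the epsilon-tube cost nothing."""
    return max(0.0, abs(residual) - epsilon)


def linear_svr_objective(w, b, xs, ys, C, epsilon):
    """Primal SVR objective 0.5*||w||^2 + C * sum_i L_eps(y_i - (w.x_i + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    data_term = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        data_term += epsilon_insensitive(y - prediction, epsilon)
    return regularizer + C * data_term


xs = [[1.0], [2.0], [3.0]]
ys = [2.05, 4.5, 6.0]
# Only the second point lies outside the 0.1-wide tube (residual 0.5 -> loss 0.4).
value = linear_svr_objective(w=[2.0], b=0.0, xs=xs, ys=ys, C=1.0, epsilon=0.1)
```

The linear programming formulations the review emphasizes typically replace the quadratic term with an ℓ1 penalty, which keeps the whole optimization problem linear.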
Funding: This research was funded by the National Natural Science Fund of China [grant number 41701415], the Science Fund Project of Wuhan Institute of Technology [grant number K201724], and the Science and Technology Development Funds Project of the Department of Transportation of Hubei Province [grant number 201900001].
Abstract: Radiometric normalization, an essential step in processing multi-source and multi-temporal data, has received considerable attention. Relative Radiometric Normalization (RRN) methods have primarily been used to eliminate radiometric inconsistency. The radiometric transforming relation between the subject image and the reference image is an essential aspect of RRN. To model this relation accurately, a learning-based nonlinear regression method, Support Vector Regression (SVR), is used to fit the complicated radiometric transforming relation for RRN referenced to coarse-resolution data. To evaluate the effectiveness of the proposed method, a series of experiments is performed, including two synthetic data experiments and one real data experiment. The proposed method is compared with methods that use linear regression, an Artificial Neural Network (ANN), or Random Forest (RF) to model the radiometric transforming relation. The results show that the proposed method fits the radiometric transforming relation well and can enhance RRN performance.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 10674172 and 10874229)
Abstract: In this paper a new continuous variable called the core-ratio is defined to describe the probability that a residue is in a binding site, replacing the previous binary description of interface residues as 0 or 1. This allows the support vector machine regression method to be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors, computed with an averaging procedure, that characterize binding sites more effectively. Our tests show that the support vector regression (SVR) method yields much better prediction results than the support vector classification method.
Abstract: Accurate cost estimation at the early stage of a construction project is a key factor in the project’s success. However, it is difficult to estimate construction costs quickly and accurately at the planning stage, when drawings, documentation, and the like are still incomplete. Various techniques have therefore been applied to estimate construction costs at an early stage, when project information is limited. While these techniques have their pros and cons, little effort has been made to determine which performs best at cost estimation. The objective of this research is to compare the accuracy of three estimating techniques (regression analysis (RA), neural network (NN), and support vector machine (SVM)) in estimating construction costs. Comparing the accuracy of these techniques on historical cost data showed that the NN model produced more accurate estimates than the RA and SVM models. Consequently, the NN model is judged most suitable for estimating the cost of school building projects.
Abstract: Path loss prediction models are vital for accurately characterizing signal propagation in wireless channels. Empirical and deterministic models used for path loss prediction have not produced optimal results. In this paper, we apply machine learning algorithms to path loss prediction because they offer a flexible network architecture and can exploit extensive data. We introduce support vector regression (SVR) and radial basis function (RBF) models for path loss prediction in the investigated environments. The SVR model was able to process several input parameters without adding complexity to the network architecture, while the RBF model provides good function approximation. The hyperparameters of the machine learning models were tuned to achieve optimal results. The performances of the SVR and RBF models were compared, and the results were validated using the root mean square error (RMSE). The two machine learning algorithms were also compared with the COST-231, SUI, Egli, free-space, and COST-231 Walfisch-Ikegami (W-I) models. The analytical models overpredicted path loss. Overall, the machine learning models predicted path loss more accurately than the empirical models. The SVR model performed best across all indices, with RMSE values of 1.378 dB, 1.4523 dB, and 2.1568 dB in rural, suburban, and urban settings, respectively, and should therefore be adopted for signal propagation prediction in the investigated environments and beyond.
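The RMSE used for validation, together with the free-space model as one of the analytical baselines, can be sketched as below. This assumes the usual free-space formula with distance in km and frequency in MHz (constant 32.44; some references round to 32.45); the drive-test distances and measured values are hypothetical.

```python
import math


def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB (d in km, f in MHz)."""
    return 20.0 * math.log10(distance_km) + 20.0 * math.log10(frequency_mhz) + 32.44


def rmse(measured, predicted):
    """Root mean square error between measured and predicted path loss (dB)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


# Hypothetical drive-test points: distances (km) and measured path loss (dB) at 900 MHz.
distances = [0.5, 1.0, 2.0]
measured_db = [112.0, 121.5, 130.0]
free_space_db = [free_space_path_loss_db(d, 900.0) for d in distances]
error_db = rmse(measured_db, free_space_db)
```

The same `rmse` function applies unchanged to SVR or RBF predictions, which is what makes it a common yardstick across empirical and learned models.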
Funding: Supported by the National Natural Science Foundation of China (No. 50705030).
Abstract: The principle of the support vector regression machine (SVR) is first analysed. A new data-dependent kernel function is then constructed from an information geometry perspective. When the rotational frequency of the high-speed rotational arc sensor is in the range 15 Hz to 30 Hz, the current waveforms change regularly with the horizontal offset. The welding current data are preprocessed by wavelet filtering, mean filtering, and normalization. The SVR model is constructed using these evolution laws; the decision function is obtained by training the SVR, and the seam offset can then be identified. The experimental results show that the precision of offset identification can be greatly improved by modifying the SVR and applying mean filtering in the longitudinal direction.
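The mean-filtering and normalization steps of the preprocessing described above might look like the sketch below. The wavelet filtering step is omitted, min-max scaling to [0, 1] is an assumption (the abstract does not name the normalization), and the current samples are hypothetical.

```python
def moving_average(signal, window):
    """Centered mean filter; near the edges only the available samples are averaged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        segment = signal[max(0, i - half):min(len(signal), i + half + 1)]
        out.append(sum(segment) / len(segment))
    return out


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical welding-current samples over one rotation.
current = [0, 3, 0, 3, 0]
smoothed = moving_average(current, window=3)
features = min_max_normalize(smoothed)
```

Normalizing after smoothing keeps the SVR inputs on a common scale regardless of the absolute welding current level.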
Abstract: The prediction of the magnitude (M) of reservoir-induced earthquakes is an important task in earthquake engineering. In this article, we employ a Support Vector Machine (SVM) and Gaussian Process Regression (GPR) to predict the M of reservoir-induced earthquakes from reservoir parameters. The comprehensive parameter (E) and maximum reservoir depth (H) are used as inputs to the SVM and GPR. We give an equation for determining the M of reservoir-induced earthquakes. The developed SVM and GPR models are compared with the Artificial Neural Network (ANN) method. The results show that the developed SVM and GPR are efficient tools for predicting the M of reservoir-induced earthquakes.
Funding: Supported by the SP2024/089 Project of the Faculty of Materials Science and Technology, VŠB-Technical University of Ostrava.
Abstract: In engineering practice, it is often necessary to determine functional relationships between dependent and independent variables. These relationships can be highly nonlinear, and classical regression approaches cannot always provide sufficiently reliable solutions. Machine Learning (ML) techniques, however, offer advanced regression tools for complicated engineering problems and have been developed and widely explored. This study investigates selected ML techniques to evaluate their suitability for describing the hot deformation behavior of metallic materials. The ML-based regression methods of Artificial Neural Networks (ANNs), Support Vector Machine (SVM), Decision Tree Regression (DTR), and Gaussian Process Regression (GPR) are applied to mathematically describe hot flow stress curve datasets acquired experimentally for a medium-carbon steel. Although the GPR method has not been used for such a regression task before, the results showed that its performance is the most favorable and practically unrivaled; neither the ANN method nor the other ML techniques studied provide such precise results for this regression analysis.
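A minimal Gaussian Process Regression sketch is given below to show what the method computes: a posterior mean that interpolates the data and a variance that grows away from it. It assumes a zero-mean GP, a single scalar input, an RBF kernel with fixed (not fitted) hyperparameters, and hypothetical normalized data; a real flow-stress model would need several inputs (e.g. strain, strain rate, and temperature) and fitted hyperparameters.

```python
import math


def rbf(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(L)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_solve_transpose(L, y):
    """Solve L^T x = y using the lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise_var=1e-6, length_scale=1.0):
    """Zero-mean GP posterior mean and variance at each test input."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length_scale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_solve_transpose(L, forward_solve(L, y_train))  # K^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, xt, length_scale) for xt in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_solve(L, k_star)  # L^{-1} k_*
        variances.append(rbf(xs, xs, length_scale) - sum(vi * vi for vi in v))
    return means, variances


# Hypothetical normalized (strain, flow stress) pairs from a single curve.
strain = [0.0, 1.0, 2.0]
stress = [0.0, 1.0, 0.5]
mean, var = gp_predict(strain, stress, [0.0, 1.0, 2.0, 10.0])
```

Far from the data (strain 10.0) the mean falls back to the prior mean of zero and the variance approaches the prior variance of one, which is the built-in uncertainty estimate that distinguishes GPR from the other regressors compared.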
Funding: Supported by the French National Research Institute for Sustainable Development through 2022 LMI WATER-HIMAL, and by the University Grant Commission Nepal through faculty research grant-76/77.
Abstract: Permafrost is one of the key components of the cryosphere. Previous studies show that the extent of permafrost in Nepal has shifted to higher elevations. These studies, however, have been hampered by inconsistent study periods. Proxies such as rock glaciers and climatic variables, such as multi-decadal annual air temperature, are used to infer the likely occurrence of permafrost. Here, a rock glacier inventory of Solukhumbu was prepared from Google Earth and classified by activity (intact/relict). Talus-based rock glaciers were observed more often than glacier-derived ones. The rock glaciers were most strongly correlated with Mean Annual Air Temperature, followed by potential incoming solar radiation and slope. Three machine learning models (Logistic Regression, Random Forest, and Support Vector Machines) were trained to generate permafrost probability distribution maps from their predicted probability that a rock glacier is intact rather than relict. Logistic Regression and Support Vector Machines produced similar spatial distributions of permafrost, whereas Random Forest captured spatial variation with low precision. The permafrost distribution map suggests that permafrost likely occurs above 5000 m, indicating a potential for rock- and landslides should it thaw in the future. While higher-resolution input data could improve the results, this approach remains promising for permafrost regions where information about the ice content of rock glaciers is very limited.
Funding: Supported by the National Natural Science Foundation of China (12102085), the Postdoctoral Science Foundation of China (2023M730504), and the Sichuan Province Regional Innovation and Cooperation Project (2024YFHZ0210).
Abstract: Determining the optimal ceramic content of ceramics-in-polymer composite electrolytes and the appropriate stack pressure can effectively improve the interfacial contact of solid-state batteries (SSBs). Using a contact mechanics model solved with the conjugate gradient method, continuous convolution, and the fast Fourier transform, this paper analyzes and compares the interfacial contact responses of polymers commonly used in SSBs, which provides the original training data for machine learning. A support vector regression model is established to predict the relationship between ceramic content and interfacial resistance. Bayesian optimization and K-fold cross-validation are introduced to find the optimal combination of hyperparameters, which accelerates training and improves the model’s accuracy. We identified the relationship among ceramic content, stack pressure, and interfacial resistance. The results can serve as a reference for designing low-resistance composite electrolytes for solid-state batteries.
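The K-fold cross-validation loss that a hyperparameter search like the one described above would minimize can be sketched as follows. The folds here are contiguous and unshuffled for clarity (in practice the data are usually shuffled first), and the mean predictor is a hypothetical stand-in for the SVR model, chosen only so the example stays self-contained.

```python
def k_fold_indices(n_samples, k):
    """Contiguous (unshuffled) K-fold split; fold sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    folds, start = [], 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def cross_validated_mse(xs, ys, k, fit, predict):
    """Average squared error over all held-out samples; this is the loss a
    hyperparameter search (e.g. Bayesian optimization) would minimize."""
    total, count = 0.0, 0
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            total += (predict(model, xs[i]) - ys[i]) ** 2
            count += 1
    return total / count


# Hypothetical stand-in model: predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


cv_loss = cross_validated_mse(list(range(6)), [0, 0, 0, 3, 3, 3], 2, fit_mean, predict_mean)
```

Each sample is scored exactly once by a model that never saw it during fitting, which is why the CV loss is a fairer objective for tuning than the training error.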
Abstract: BACKGROUND Severe dengue in children with critical complications has been associated with high mortality rates, varying from approximately 1% to over 20%. To date, there is a lack of data on machine-learning-based algorithms for predicting the risk of in-hospital mortality in children with dengue shock syndrome (DSS). AIM To develop machine-learning models to estimate the risk of death in hospitalized children with DSS. METHODS This single-center retrospective study was conducted at the tertiary Children’s Hospital No. 2 in Viet Nam between 2013 and 2022. The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit (PICU). Nine significant features were predetermined for further analysis using machine learning models. An oversampling method was used to enhance model performance. Supervised models, including logistic regression, Naïve Bayes, Random Forest (RF), K-nearest neighbors, Decision Tree, and Extreme Gradient Boosting (XGBoost), were employed to develop predictive models. Shapley Additive Explanations (SHAP) were used to determine the contribution of each feature. RESULTS In total, 1278 PICU-admitted children with complete data were included in the analysis. The median patient age was 8.1 years (interquartile range: 5.4-10.7). Thirty-nine patients (3%) died. The RF and XGBoost models demonstrated the highest performance. The SHAP analysis revealed that the most important predictive features included younger age, female sex, presence of underlying diseases, severe transaminitis, severe bleeding, low platelet counts requiring platelet transfusion, elevated international normalized ratio, blood lactate, and serum creatinine, a large volume of resuscitation fluid, and a high vasoactive inotropic score (>30). CONCLUSION We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS. The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
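The abstract does not name its oversampling method (SMOTE is another common choice). As a hedged sketch, plain random oversampling duplicates minority-class records, drawn with replacement, until the classes are balanced; the patient IDs and labels below are hypothetical, chosen only to mimic a rare outcome.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly drawn samples of each smaller class (with replacement)
    until every class has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalance: 3 deaths (label 1) among 100 patients.
patients = list(range(100))
died = [1 if i < 3 else 0 for i in range(100)]
balanced_x, balanced_y = random_oversample(patients, died, seed=42)
```

Oversampling should be applied only to the training split, after the train/test split is made, so that duplicated patients cannot leak into the evaluation data.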
Funding: Supported by an institutional Ph.D. grant (No. 21130/1312/3131) from the Faculty of Agrobiology, Food, and Natural Resources at the Czech University of Life Sciences Prague (CZU), Czech Republic.
Abstract: Very few studies have exploited the synergistic use of visible, near-infrared, and shortwave infrared (VNIR-SWIR) spectra and terrain attributes to predict Pb content in agricultural soils. To fill this gap, this study aimed to predict lead (Pb) content in agricultural soils by combining machine learning algorithms (MLAs) with VNIR-SWIR spectra and/or terrain attributes under three distinct approaches. Six MLAs were tested: artificial neural network (ANN), partial least squares regression, support vector machine (SVM), Gaussian process regression (GPR), extreme gradient boosting (EGB), and Cubist. The VNIR-SWIR spectral data were preprocessed by discrete wavelet transformation, logarithmic transformation with Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction, first derivative (FiD), and second derivative. In approach 1, MLAs were combined with the preprocessed VNIR-SWIR spectral data. The Cubist-FiD combination was the most effective, achieving a coefficient of determination (R²) of 0.63, a concordance correlation coefficient (CCC) of 0.51, a mean absolute error (MAE) of 6.87 mg kg⁻¹, and a root mean square error (RMSE) of 8.66 mg kg⁻¹. In approach 2, MLAs were combined with both preprocessed VNIR-SWIR spectral data and terrain attributes, and the EGB-SNV combination yielded superior results, with an R² of 0.75, CCC of 0.65, MAE of 5.48 mg kg⁻¹, and RMSE of 7.34 mg kg⁻¹. Approach 3 combined MLAs with terrain attributes, and Cubist gave the best predictions, with an R² of 0.75, CCC of 0.66, MAE of 6.18 mg kg⁻¹, and RMSE of 7.71 mg kg⁻¹. The cumulative assessment identified the fusion of terrain attributes, SNV-preprocessed VNIR-SWIR spectra, and EGB as the optimal method for estimating Pb content in agricultural soils, yielding the highest R² value and minimal error. By comparison, GPR, ANN, and SVM achieved higher R² values in approaches 2 and 3 but also exhibited higher estimation errors. In conclusion, the study underscores the importance of using relevant auxiliary datasets and appropriate MLAs for accurate, low-error prediction of Pb content in agricultural soils. The findings contribute valuable insights for developing soil management strategies based on predictive modeling.
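Two of the less familiar quantities above, SNV preprocessing and Lin's concordance correlation coefficient (CCC), can be sketched as follows together with MAE. The SNV version assumes the population standard deviation (some tools use n − 1), and the CCC uses population (1/n) moments as in Lin's original definition; all numbers are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center one spectrum on its mean and scale by its std.
    Uses the population standard deviation (an assumption; some tools use n - 1)."""
    mu = statistics.fmean(spectrum)
    sd = statistics.pstdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def lins_ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(y_true)
    mt, mp = statistics.fmean(y_true), statistics.fmean(y_pred)
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mt - mp) ** 2)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (e.g. mg kg^-1)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical Pb contents: predictions perfectly correlated but biased by +1.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
ccc = lins_ccc(measured, predicted)
error = mae(measured, predicted)
```

In this example the Pearson correlation is 1, yet the CCC is only 4/7 ≈ 0.571 because it also penalizes the constant offset, which is why it is useful to report CCC alongside R².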
Funding: Hebei Province Key Research and Development Project (No. 20313701D); Hebei Province Key Research and Development Project (No. 19210404D); Mobile computing and universal equipment for the Beijing Key Laboratory Open Project; The National Social Science Fund of China (17AJL014); Beijing University of Posts and Telecommunications Construction of World-Class Disciplines and Characteristic Development Guidance Special Fund “Cultural Inheritance and Innovation” Project (No. 505019221); National Natural Science Foundation of China (No. U1536112); National Natural Science Foundation of China (No. 81673697); National Natural Science Foundation of China (61872046); The National Social Science Fund Key Project of China (No. 17AJL014); “Blue Fire Project” (Huizhou) University of Technology Joint Innovation Project (CXZJHZ201729); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902218004); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201902024006); Industry-University Cooperation Cooperative Education Project of the Ministry of Education (No. 201901197007); Industry-University Cooperation Collaborative Education Project of the Ministry of Education (No. 201901199005); The Ministry of Education Industry-University Cooperation Collaborative Education Project (No. 201901197001); Shijiazhuang science and technology plan project (236240267A); Hebei Province key research and development plan project (20312701D).
Abstract: The distribution of data has a significant impact on classification results. When one class is much smaller than another, data imbalance occurs, which increases the influence of outliers and noise and can greatly affect the speed and performance of classification. To address these problems, this paper starts from the motivation and mathematical representation of classification and proposes a new classification method based on the relationships between different classification formulations. Combining the vector characteristics of the actual problem with the choice of matrix characteristics, we first analyze orderly regression and introduce slack variables to handle the constraint problem of lone points. We then introduce fuzzy factors on the basis of the support vector machine to address the gap between isolated points, and introduce cost control to address sample skew. Finally, based on the bi-boundary support vector machine, a twin classifier with two-step weight setting is constructed. This classifier can identify multiple tasks with feature-selected patterns without additional optimizers, addressing large-scale classification problems with a very low category distribution gap that existing methods cannot handle effectively.
Abstract: Objective: Support Vector Machine (SVM) is a machine-learning method based on the principle of structural risk minimization, which performs well when applied to data outside the training set. In this paper, SVM was applied to predict the 5-year survival status of patients with nasopharyngeal carcinoma (NPC) after treatment, with the aim of finding a new approach to cancer prognosis studies that can support the right clinical decision for each individual patient. Methods: Two modelling methods were used in the study: an SVM network and a standard parametric logistic regression were used to model 5-year survival status. The two methods were compared on a prospective set of patients not used in model construction via receiver operating characteristic (ROC) curve analysis. Results: SVM1, trained with the 25 original input variables without screening, yielded a ROC area of 0.868, with a sensitivity to mortality of 79.2% and a specificity of 94.5%. Similarly, SVM2, trained with 9 input variables selected from the 25 original variables by logistic regression screening, yielded a ROC area of 0.874, with a sensitivity to mortality of 79.2% and a specificity of 95.6%, while logistic regression yielded a ROC area of 0.751, with a sensitivity to mortality of 66.7% and a specificity of 83.5%. Conclusion: SVM found a strong pattern in the database predictive of 5-year survival status. Logistic regression produced somewhat similar, but poorer, results. These results show that SVM models have the potential to predict an individual patient's 5-year survival status after treatment and to assist clinicians in making sound clinical decisions.
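The ROC area, sensitivity, and specificity reported above can be computed as sketched below. The AUC uses the Mann-Whitney formulation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), and the risk scores, outcomes, and 0.5 threshold are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative
    (Mann-Whitney formulation; ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity (true-positive rate) and specificity (true-negative rate) at a cut-off."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical risk scores; label 1 = died within 5 years, 0 = survived.
risk = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
outcome = [1, 1, 1, 0, 0, 0]
auc = roc_auc(risk, outcome)
sens, spec = sensitivity_specificity(risk, outcome, threshold=0.5)
```

The AUC summarizes ranking quality over all thresholds, whereas sensitivity and specificity describe a single operating point, which is why the abstract reports both.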
Funding: Financially supported by the National Natural Science Foundation of China (No. 51974028).
Abstract: The martensitic transformation temperature is the basis for the application of shape memory alloys (SMAs), and the ability to predict the transformation temperature of SMAs quickly and accurately is of great practical significance. In this work, machine learning (ML) methods were used to accelerate the search for shape memory alloys with targeted properties (phase transition temperature). Using a reverse design method, a group of compositions was selected from numerous unexplored candidates to design shape memory alloys. Composition modeling and feature modeling were used to predict the phase transition temperature of the shape memory alloys. Experimental results for the shape memory alloys were obtained to verify the effectiveness of the support vector regression (SVR) model. The results show that the machine learning model can identify target materials more efficiently and in a more targeted way, enabling accurate and rapid design of shape memory alloys with a specific target phase transition temperature. On this basis, the relationship between phase transition temperature and material descriptors is analyzed, showing that the key factors affecting the phase transition temperature of shape memory alloys are related to the strength of the interatomic bond energy. This work provides new ideas for the controllable design and performance optimization of Cu-based shape memory alloys.