Polynomial-time randomized algorithms were constructed to approximately solve optimal robust performance controller design problems in a probabilistic sense, and a rigorous mathematical justification of the approach was given. The randomized algorithms here were based on a property from statistical learning theory known as (uniform) convergence of empirical means (UCEM). It is argued that, in order to assess the performance of a controller as the plant varies over a pre-specified family, it is better to use the average performance of the controller as the objective function to be optimized, rather than its worst-case performance. The efficiency of the approach is illustrated through an example.
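To make the average-performance idea concrete, the following minimal sketch (with a hypothetical scalar cost function, a uniform plant-uncertainty family, and an arbitrary sample size, none of which come from the paper) estimates each candidate controller's expected cost by an empirical mean over sampled plants and keeps the minimizer; the UCEM property is what licenses using such empirical means uniformly over the whole controller family.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(theta, delta):
    # Hypothetical closed-loop cost of controller gain `theta`
    # on a plant with uncertain parameter `delta`.
    return (1.0 + delta) * theta**2 - 2.0 * theta + 1.0 / (1.0 + delta)

controllers = np.linspace(0.1, 2.0, 50)     # candidate controller gains
plants = rng.uniform(-0.5, 0.5, size=2000)  # i.i.d. samples of plant uncertainty

# Empirical mean cost of each controller over the sampled plants.
emp_means = np.array([cost(th, plants).mean() for th in controllers])
best = controllers[np.argmin(emp_means)]
print(f"controller minimizing empirical average cost: {best:.3f}")
```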
The sampling problem for input-queued (IQ) randomized scheduling algorithms is analyzed. We observe that if the current scheduling decision is a maximum weighted matching (MWM), the MWM for the next slot mostly falls among those matchings whose weight is close to that of the current MWM. Using this heuristic, a novel randomized algorithm for IQ scheduling, named the genetic algorithm-like scheduling algorithm (GALSA), is proposed. An evolutionary strategy is used for choosing sampling points in GALSA. GALSA works with only O(N) samples, which means that GALSA has lower complexity than the well-known randomized scheduling algorithm APSARA. Simulation results show that the delay performance of GALSA is quite competitive with that of APSARA.
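As an illustration of the sampling heuristic, the toy sketch below (with hypothetical queue weights and a simple swap mutation, not the paper's exact GALSA operators) represents a matching of an N x N switch as a permutation, generates O(N) offspring of the current matching by random two-port swaps, and carries the heaviest candidate into the next slot.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
Q = rng.integers(0, 100, size=(N, N))  # queue lengths acting as edge weights

def weight(sigma):
    # Weight of the matching that sends input i to output sigma[i].
    return Q[np.arange(N), sigma].sum()

def mutate(sigma):
    child = sigma.copy()
    i, j = rng.choice(N, size=2, replace=False)
    child[i], child[j] = child[j], child[i]  # swap two output assignments
    return child

current = rng.permutation(N)  # matching used in the current slot
candidates = [current] + [mutate(current) for _ in range(N)]  # O(N) samples
nxt = max(candidates, key=weight)
print("current weight:", weight(current), "-> next weight:", weight(nxt))
```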
A switch from avian-type α-2,3 to human-type α-2,6 receptors is an essential element for the initiation of a pandemic from an avian influenza virus. Some H9N2 viruses exhibit a preference for binding to human-type α-2,6 receptors, which highlights their potential threat to public health. However, our understanding of the molecular basis for the switch of receptor preference is still limited. In this study, we employed the random forest algorithm to identify the potentially key amino acid sites within hemagglutinin (HA) that are associated with the receptor binding ability of the H9N2 avian influenza virus (AIV). Subsequently, these sites were further verified by receptor binding assays. A total of 12 substitutions in the HA protein (N158D, N158S, A160N, A160D, A160T, T163I, T163V, V190T, V190A, D193N, D193G, and N231D) were predicted to prefer binding to α-2,6 receptors. Except for the V190T substitution, the other substitutions were demonstrated by receptor binding assays to display a preferential binding affinity for α-2,6 receptors. Notably, the A160T substitution caused a significant upregulation of immune-response genes and an increased mortality rate in mice. Our findings provide novel insights into the genetic basis of the receptor preference of the H9N2 AIV.
Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods like decision trees and the Least Absolute Shrinkage and Selection Operator (LASSO) can select features during training. However, these embedded approaches can only be applied to a small subset of machine learning models. Wrapper-based methods can select features independently of machine learning models, but they often suffer from a high computational cost. To enhance their efficiency, many randomized algorithms have been designed. In this paper, we propose automatic breadth searching and attention searching adjustment approaches to further speed up randomized wrapper-based feature selection. We conduct theoretical computational complexity analysis and further explain our algorithms' generic parallelizability. We conduct experiments on both synthetic and real datasets with different machine learning base models. Results show that, compared with existing approaches, our proposed techniques can locate a more meaningful set of features with high efficiency.
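A bare-bones randomized wrapper loop of the kind being accelerated might look as follows (an illustrative baseline with synthetic data and a logistic-regression base model, not the proposed breadth/attention-searching algorithms): feature subsets are sampled at random, scored by cross-validation with the base model, and the best-scoring subset is retained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, random_state=2)

best_score, best_subset = -np.inf, None
for _ in range(40):                          # randomized search budget
    k = int(rng.integers(3, 10))             # random subset size
    subset = rng.choice(X.shape[1], size=k, replace=False)
    score = cross_val_score(LogisticRegression(max_iter=1000),
                            X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_score, best_subset = score, np.sort(subset)

print("best CV accuracy:", round(best_score, 3), "features:", best_subset)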
This paper presents an improved Randomized Circle Detection (RCD) algorithm that uses a circularity characteristic to detect circles in images with complex backgrounds and is not based on the Hough Transform. The experimental results demonstrate that this algorithm can locate the circular marks on Printed Circuit Boards (PCBs).
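A compact sketch of the randomized-detection idea follows (simplified to three-point candidates on synthetic edge data; the tolerance and evidence score are assumptions): repeatedly sample three edge points, fit the unique circle through them, and keep the candidate with the highest fraction of supporting edge points, a circularity-style evidence score.

```python
import numpy as np

rng = np.random.default_rng(3)

def circle_from_3pts(p1, p2, p3):
    # Solve for the center c with |c-p1| = |c-p2| = |c-p3| (linear system).
    A = 2.0 * np.array([p2 - p1, p3 - p1])
    b = np.array([p2 @ p2 - p1 @ p1, p3 @ p3 - p1 @ p1])
    c = np.linalg.solve(A, b)
    return c, np.linalg.norm(p1 - c)

# Synthetic edge points: a noisy circle plus background clutter.
t = rng.uniform(0, 2 * np.pi, 200)
circle_pts = np.c_[50 + 20 * np.cos(t), 60 + 20 * np.sin(t)] + rng.normal(0, 0.3, (200, 2))
clutter = rng.uniform(0, 100, (200, 2))
edges = np.vstack([circle_pts, clutter])

best = (0.0, None)
for _ in range(200):
    i = rng.choice(len(edges), size=3, replace=False)
    try:
        c, r = circle_from_3pts(*edges[i])
    except np.linalg.LinAlgError:            # collinear sample, resample
        continue
    support = np.abs(np.linalg.norm(edges - c, axis=1) - r) < 1.0
    score = support.mean()                   # evidence for this candidate
    if score > best[0]:
        best = (score, (c, r))

(c, r) = best[1]
print(f"detected circle: center=({c[0]:.1f}, {c[1]:.1f}), radius={r:.1f}")
```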
The quality of hot-rolled steel strip is directly affected by the strip crown. Traditional machine learning models have shown limitations in accurately predicting the strip crown, particularly when dealing with imbalanced data. This limitation results in poor production quality and efficiency, leading to increased production costs. Thus, a novel strip crown prediction model that uses the Boruta and extremely randomized trees (Boruta-ERT) algorithms was proposed to address this issue. To improve the accuracy of our model, we utilized the synthetic minority over-sampling technique to balance the imbalanced data sets. The Boruta-ERT prediction model was then used to select features and predict the strip crown. With the 2160 mm hot rolling production lines of a steel plant serving as the research object, the experimental results showed that 97.01% of the predictions have an absolute error of less than 8 μm. This level of accuracy meets the control requirements for strip crown and demonstrates significant benefits for improving the production quality of steel strip.
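The shadow-feature test at the heart of Boruta pairs naturally with extremely randomized trees. The following single-pass sketch (full Boruta iterates this comparison with statistical tests, and the data here are synthetic) keeps a feature only if its importance beats the strongest shuffled copy.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(4)
X, y = make_regression(n_samples=400, n_features=10,
                       n_informative=4, random_state=4)

shadows = rng.permuted(X, axis=0)            # each column shuffled independently
model = ExtraTreesRegressor(n_estimators=300, random_state=4)
model.fit(np.hstack([X, shadows]), y)

imp = model.feature_importances_
real, shadow = imp[:10], imp[10:]
kept = np.where(real > shadow.max())[0]      # must beat the strongest shadow
print("selected feature indices:", kept)
```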
Real-time intelligent lithology identification while drilling is vital to realizing downhole closed-loop drilling. The complex and changeable geological environment encountered while drilling makes lithology identification face many challenges. This paper studies the problems of difficult feature information extraction, low precision of thin-layer identification, and limited model applicability in intelligent lithology identification. The authors improve the comprehensive performance of the lithology identification model from three aspects: data feature extraction, class balance, and model design. A new real-time intelligent lithology identification model based on a dynamic felling strategy weighted random forest algorithm (DFW-RF) is proposed. According to the feature selection results, gamma ray and 2 MHz phase resistivity are the logging while drilling (LWD) parameters that most significantly influence lithology identification. The comprehensive performance of the DFW-RF lithology identification model has been verified in applications to three wells in different areas. Compared with the prediction results of five typical lithology identification algorithms, the DFW-RF model achieves a higher lithology identification accuracy rate and F1 score. This model improves the identification accuracy of thin-layer lithology and is effective and feasible in different geological environments. The DFW-RF model plays a truly efficient role in the real-time intelligent identification of lithologic information in closed-loop drilling, has greater applicability, and is worthy of wide use in logging interpretation.
Precise and timely prediction of crop yields is crucial for food security and the development of agricultural policies. However, crop yield is influenced by multiple factors within complex growth environments. Previous research has paid relatively little attention to the interference of environmental factors and drought with the growth of winter wheat. Therefore, there is an urgent need for more effective methods to explore the inherent relationship between these factors and crop yield. This study used four types of indicators (meteorological, crop growth status, environmental, and drought index) from October 2003 to June 2019 in Henan Province as the basic data for predicting winter wheat yield. Using the sparrow search algorithm combined with random forest (SSA-RF) under different input indicators, the accuracy of winter wheat yield estimation was calculated. The estimation accuracy of SSA-RF was compared with partial least squares regression (PLSR), extreme gradient boosting (XGBoost), and random forest (RF) models. Finally, the optimal yield estimation method was used to predict winter wheat yield in three typical years. The findings are as follows: 1) SSA-RF demonstrates superior performance in estimating winter wheat yield compared with the other algorithms; the best estimation is achieved by combining all four types of indicators with SSA-RF (R² = 0.805, RRMSE = 9.9%). 2) Crop growth status and environmental indicators play significant roles in wheat yield estimation, accounting for 46% and 22% of the yield importance among all indicators, respectively. 3) Selecting indicators from October to April of the following year yielded the highest accuracy in winter wheat yield estimation, with an R² of 0.826 and an RRMSE of 9.0%; yield estimates can thus be completed two months before the winter wheat harvest in June. 4) Prediction performance is slightly affected by severe drought: compared with a severe drought year (2011, R² = 0.680) and a normal year (2017, R² = 0.790), the SSA-RF model has higher prediction accuracy for a wet year (2018, R² = 0.820). This study provides an innovative approach for remote sensing estimation of winter wheat yield.
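As a simplified stand-in for the SSA-RF search (plain random search replaces the sparrow search algorithm here, and the data and parameter ranges are assumed), the sketch below samples random-forest hyperparameters, scores each candidate by cross-validated R², and keeps the best configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X, y = make_regression(n_samples=300, n_features=12, noise=10.0, random_state=7)

best = (-np.inf, None)
for _ in range(15):                          # search budget
    params = {"n_estimators": int(rng.integers(50, 300)),
              "max_depth": int(rng.integers(3, 15)),
              "min_samples_leaf": int(rng.integers(1, 8))}
    r2 = cross_val_score(RandomForestRegressor(random_state=7, **params),
                         X, y, cv=3, scoring="r2").mean()
    if r2 > best[0]:
        best = (r2, params)

print(f"best CV R^2 = {best[0]:.3f} with {best[1]}")
```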
Tikhonov regularization is a powerful tool for solving linear discrete ill-posed problems. However, effective methods for dealing with large-scale ill-posed problems are still lacking. The Kaczmarz method is an effective iterative projection algorithm for solving large linear systems owing to its simplicity. We propose a regularized randomized extended Kaczmarz (RREK) algorithm for solving large discrete ill-posed problems by combining Tikhonov regularization with the randomized Kaczmarz method. The convergence of the algorithm is proved. Numerical experiments illustrate that the proposed algorithm achieves higher accuracy and better image restoration quality than the existing randomized extended Kaczmarz (REK) method.
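A minimal sketch of the underlying construction, under the standard reformulation (assumed here) that the Tikhonov solution of min ||Ax - b||^2 + lam^2 ||x||^2 is the least-squares solution of the augmented system [A; lam*I] x ~ [b; 0]: plain randomized Kaczmarz row projections on the augmented system approach the Tikhonov solution, while the extended variant adds a randomized column step that removes the residual bias caused by inconsistency.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, lam = 500, 50, 0.1
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 0.01 * rng.normal(size=m)  # noisy right-hand side

# Tikhonov problem as an augmented least-squares system [A; lam*I] x ~ [b; 0].
Aa = np.vstack([A, lam * np.eye(n)])
ba = np.concatenate([b, np.zeros(n)])
probs = (Aa**2).sum(axis=1)
probs /= probs.sum()                         # sample rows with prob ~ ||a_i||^2

x = np.zeros(n)
for _ in range(20000):
    i = rng.choice(m + n, p=probs)
    ai = Aa[i]
    x += (ba[i] - ai @ x) / (ai @ ai) * ai   # project onto hyperplane a_i.x = b_i

x_tikh = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
print("relative distance to the Tikhonov solution:",
      np.linalg.norm(x - x_tikh) / np.linalg.norm(x_tikh))
```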
In the Internet, a group of replicated servers is commonly used to improve the scalability of network services. Anycast is a new network service that can improve network load distribution and simplify certain applications. In this paper, the authors describe a simple anycast service model for the Internet that does not significantly affect the routing and protocol processing infrastructure already in place, and propose an anycast QoS routing algorithm for this model. The algorithm uses a randomized method to balance network load and improve performance. Several new techniques are proposed in the algorithm. First, the minimum hop count for each node is used as the metric for computing the probability of each possible outgoing link. The metric is precomputed for each node in the network, which simplifies the network complexity and provides the routing process with useful information. Second, randomness is applied at the link level and depends dynamically on the routing configuration. This provides great flexibility for the routing process, prevents it from overusing certain fixed routing paths, and adequately balances the delay of the routing path. The authors assess the quality of the QoS algorithm in terms of the acceptance ratio of anycast QoS requests, and simulation results on a variety of network topologies and parameters show that the algorithm performs well and can balance network load effectively.
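The hop-metric randomization can be illustrated with a small sketch (toy topology and uniform choice among progressing links; the paper derives link probabilities from the precomputed metric itself): minimum hop counts to the destination are computed once by BFS, and each forwarding step randomizes over the neighbors that reduce the remaining hop count, spreading load across all shortest paths.

```python
import random
from collections import deque

graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 4], 3: [1, 5], 4: [1, 2, 5], 5: [3, 4]}
dst = 5

def min_hops(graph, dst):
    # BFS from the destination gives every node's minimum hop count to dst.
    dist = {dst: 0}
    q = deque([dst])
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

dist = min_hops(graph, dst)
random.seed(9)

def route(src):
    path, u = [src], src
    while u != dst:
        # Links that make progress according to the precomputed hop metric;
        # a random choice among them spreads traffic over all shortest paths.
        cand = [v for v in graph[u] if dist[v] < dist[u]]
        u = random.choice(cand)
        path.append(u)
    return path

for _ in range(3):
    print(route(0))   # different runs may take different feasible paths
```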
The generalized singular value decomposition (GSVD) of two matrices with the same number of columns is a very useful tool in many practical applications. However, the GSVD may suffer from heavy computational time and memory requirements when the scale of the matrices is quite large. In this paper, we use random projections to capture most of the action of the matrices and propose randomized algorithms for computing a low-rank approximation of the GSVD. Several error bounds for the approximation are also presented for the proposed randomized algorithms. Finally, experimental results show that the proposed randomized algorithms can achieve good accuracy with less computational cost and lower storage requirements.
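The random-projection step that such algorithms build on is the standard randomized range finder, sketched below (this is the generic building block, not the paper's full GSVD procedure): a Gaussian test matrix samples the range of A, and a QR factorization of the sample gives an orthonormal basis Q with A ~ Q (Q^T A).

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, rank, k = 400, 300, 10, 20
A = rng.normal(size=(m, rank)) @ rng.normal(size=(rank, n))  # low-rank test matrix

Omega = rng.normal(size=(n, k))          # Gaussian sketching matrix
Y = A @ Omega                            # sample the range of A
Q, _ = np.linalg.qr(Y)                   # orthonormal basis of the sample
err = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
print(f"relative approximation error: {err:.2e}")
```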
This paper presents a framework for constructing surrogate models for sensitivity analysis of structural dynamics behavior. Physical models involving deformation, such as collisions, vibrations, and penetration, are developed using the material point method. To reduce the computational cost of Monte Carlo simulations, response surface models are created as surrogate models for the material point system to approximate its dynamic behavior. An adaptive randomized greedy algorithm is employed to construct a sparse polynomial chaos expansion model with a fixed order, effectively balancing the accuracy and computational efficiency of the surrogate model. Based on the sparse polynomial chaos expansion, sensitivity analysis is conducted using the global finite difference and Sobol methods. Several examples of structural dynamics are provided to demonstrate the effectiveness of the proposed method in addressing structural dynamics problems.
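A condensed sketch of a randomized greedy fit for a sparse polynomial surrogate follows (illustrative assumptions throughout: a bivariate Legendre basis of fixed total order, random candidate subsets, and a correlation-based selection rule; the paper's adaptive criteria are not reproduced): at each step a random subset of unused basis terms is scored against the current residual, and the best one is added before refitting by least squares.

```python
import numpy as np
from numpy.polynomial import legendre
from itertools import product

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(200, 2))                  # input samples
y = X[:, 0]**3 - 2 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)

order = 4
multi_idx = [(i, j) for i, j in product(range(order + 1), repeat=2) if i + j <= order]

def basis_col(idx):
    # Tensor-product Legendre basis term of multi-degree idx, sampled at X.
    ci = np.zeros(idx[0] + 1); ci[-1] = 1.0
    cj = np.zeros(idx[1] + 1); cj[-1] = 1.0
    return legendre.legval(X[:, 0], ci) * legendre.legval(X[:, 1], cj)

cols = {m: basis_col(m) for m in multi_idx}
active, residual = [], y.copy()
for _ in range(5):                                     # target sparsity
    pool = rng.choice(len(multi_idx), size=6, replace=False)  # random subset
    cand = [multi_idx[k] for k in pool if multi_idx[k] not in active]
    best = max(cand, key=lambda m: abs(cols[m] @ residual) / np.linalg.norm(cols[m]))
    active.append(best)
    A = np.column_stack([cols[m] for m in active])     # refit by least squares
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef

print("selected multi-indices:", active)
```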
Every second, a large volume of useful data is created in social media about various kinds of online purchases and in other forms of reviews. In particular, review data on purchased products grows enormously in different database repositories every day. Most of the review data are useful to new customers for their further purchases, as well as to existing companies seeking customer feedback on various products. Data mining and machine learning techniques are commonly used to analyse such data in order to visualise and understand the potential of items purchased online. Customers convey the quality of products through their sentiments about the items purchased from different online companies. In this research work, the sentiments in headphone review data collected from online repositories are analysed. For the analysis of the headphone review data, machine learning techniques such as Support Vector Machines, Naive Bayes, Decision Trees, and Random Forest algorithms, along with a hybrid method, are applied to assess quality via the customers' sentiments. The accuracy and performance of these algorithms are also analysed based on three types of sentiment: positive, negative, and neutral.
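A condensed version of the described pipeline might look as follows (the tiny inline dataset and model settings are placeholders for the collected headphone reviews and the paper's hybrid method): TF-IDF features feed several classifiers whose accuracies are then compared.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier

# Placeholder reviews with three sentiment classes, repeated for volume.
reviews = ["great bass and clear sound", "broke after a week, terrible",
           "decent for the price", "awful fit, very uncomfortable",
           "amazing clarity, love them", "okay but nothing special"] * 20
labels = ["positive", "negative", "neutral",
          "negative", "positive", "neutral"] * 20

X = TfidfVectorizer().fit_transform(reviews)
Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=0.3, random_state=11)

for model in (LinearSVC(), MultinomialNB(), RandomForestClassifier(random_state=11)):
    acc = model.fit(Xtr, ytr).score(Xte, yte)
    print(type(model).__name__, round(acc, 3))
```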
Characterizing petrophysical properties holds significant importance in shale oil reservoirs. Two-dimensional (2-D) nuclear magnetic resonance (NMR), a nondestructive and noninvasive technique, has numerous applications in petrophysical characterization. However, the complex occurrence states of the fluids and the highly non-uniform distributions of minerals and organic matter pose challenges for NMR-based petrophysical characterization. A novel T1-T2 relaxation theory is introduced for the first time in this study. The transverse and longitudinal relaxivities of pore fluids are determined through numerical investigation and experimental analysis. Additionally, an improved random walk algorithm is proposed to simulate, on the basis of digital shale cores, the effects of the hydrogen index (HI) of the organic matter, echo spacing (TE), pyrite content, clay mineral type, and clay content on T1-T2 spectra at different NMR frequencies. Furthermore, frequency conversion cross-plots for various petrophysical parameters influenced by the above factors are established. This study provides new insights into NMR-based petrophysical characterization and the frequency conversion of petrophysical parameters measured by laboratory NMR instruments and NMR logging in shale oil reservoirs. It is of great significance for the efficient exploration and environmentally friendly production of shale oil.
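The random-walk idea can be illustrated with a toy one-dimensional sketch (all parameters assumed; real simulations run in 3-D digital cores with calibrated surface relaxivity): walkers diffuse in a slab pore and are killed with some probability at each wall collision, so the surviving magnetization decays roughly exponentially and yields an apparent T2 that reflects pore size and surface relaxivity.

```python
import numpy as np

rng = np.random.default_rng(5)
L, step, delta, n, T = 2.0, 0.1, 0.05, 5000, 2000   # assumed toy parameters
x = rng.uniform(0, L, n)                            # walker positions in the pore
alive = np.ones(n, dtype=bool)
mag = []
for _ in range(T):
    x[alive] += rng.choice([-step, step], size=alive.sum())  # random steps
    hit = alive & ((x <= 0) | (x >= L))                      # wall collisions
    killed = hit & (rng.random(n) < delta)                   # surface relaxation
    alive &= ~killed
    x = np.clip(x, 0, L)
    mag.append(alive.mean())                                 # magnetization

# Effective T2 from a log-linear fit of the surviving magnetization.
t = np.arange(1, T + 1)
T2 = -1.0 / np.polyfit(t, np.log(np.maximum(mag, 1e-12)), 1)[0]
print(f"apparent T2 ~ {T2:.0f} steps")
```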
The prediction of slope stability is a complex nonlinear problem. This paper proposes a new method based on the random forest (RF) algorithm to study the stability of rocky slopes. Taking Bukit Merah in Perak and Twin Peak in Kuala Lumpur as the study areas, the geometrical characteristics of the slopes are obtained from a multidisciplinary approach consisting of geological, geotechnical, and remote sensing analyses. Eighteen factors, including rock strength, rock quality designation (RQD), joint spacing, continuity, openness, roughness, filling, weathering, water seepage, temperature, vegetation index, water index, and orientation, are selected as model input variables, while the factor of safety (FOS) serves as the output. The area under the receiver operating characteristic (ROC) curve (AUC), together with precision and accuracy, is used to analyse the predictive ability of the model. With a large training set and the predicted parameters, an AUC of 0.95 is achieved. A precision score of 0.88 is obtained, indicating that the model has a low false positive rate and correctly identifies a substantial number of true positives. The findings emphasise the importance of using a variety of terrain characteristics and different approaches to characterise rock slopes.
This study investigated the impacts of random negative training datasets (NTDs) on the uncertainty of machine learning models for geologic hazard susceptibility assessment of the Loess Plateau, northern Shaanxi Province, China. Based on 40 randomly generated NTDs, the study developed models for geologic hazard susceptibility assessment using the random forest algorithm and evaluated their performance using the area under the receiver operating characteristic curve (AUC). Specifically, the means and standard deviations of the AUC values from all models were utilized to assess the overall spatial correlation between the conditioning factors and the susceptibility assessment, as well as the uncertainty introduced by the NTDs. A risk-and-return methodology was then employed to quantify and mitigate the uncertainty, with log odds ratios used to characterize the susceptibility assessment levels. The risk and return values were calculated from the standard deviations and means of the log odds ratios at various locations. After the mean log odds ratios were converted into probability values, the final susceptibility map was plotted, which accounts for the uncertainty induced by random NTDs. The results indicate that the AUC values of the models ranged from 0.810 to 0.963, with an average of 0.852 and a standard deviation of 0.035, indicating encouraging predictive performance together with a degree of uncertainty. The risk-and-return analysis reveals that low-risk, high-return areas correspond to lower standard deviations and higher means across multiple model-derived assessments. Overall, this study introduces a new framework for quantifying the uncertainty of multiple training and evaluation models, aimed at improving their robustness and reliability. Additionally, by identifying low-risk, high-return areas, resource allocation for geologic hazard prevention and control can be optimized, ensuring that limited resources are directed toward the most effective prevention and control measures.
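A skeletal version of the resampling scheme is sketched below (synthetic features stand in for the hazard inventory and conditioning factors; sizes and model settings are assumptions): each run pairs the fixed positive samples with a freshly drawn random negative training dataset, and the spread of the resulting AUC values exposes the uncertainty the NTDs introduce.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(12)
positives = rng.normal(1.0, 1.0, size=(200, 6))    # hazard locations (features)
background = rng.normal(0.0, 1.0, size=(5000, 6))  # candidate non-hazard pool

aucs = []
for seed in range(40):                             # 40 random NTDs
    neg = background[rng.choice(len(background), size=200, replace=False)]
    X = np.vstack([positives, neg])
    y = np.r_[np.ones(200), np.zeros(200)]
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                          random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(Xtr, ytr)
    aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))

print(f"AUC mean = {np.mean(aucs):.3f}, std = {np.std(aucs):.3f}")
```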
The aim of this study is to evaluate the ability of the random forest algorithm, combining data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen, to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen, and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. The positive predictive value was 86.72%, and the negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels, and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than any single diagnostic indicator on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk who warrant biopsy.
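For reference, all of the reported diagnostic metrics follow from the four confusion-matrix counts; the small helper below (with made-up counts, not the study's data) shows the definitions.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    # Standard diagnostic metrics from confusion-matrix counts.
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on cancer cases
        "specificity": tn / (tn + fp),   # recall on non-cancer cases
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
    }

print(diagnostic_metrics(tp=214, fp=20, tn=480, fn=112))  # illustrative counts
```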
This paper presents a new framework for object-based classification of high-resolution hyperspectral data. This multi-step framework is based on the multi-resolution segmentation (MRS) and Random Forest classifier (RFC) algorithms. The first step is to determine the weights of the input features when using the object-based approach with MRS to process such images. Given the high number of input features, an automatic method is needed to estimate this parameter. We used the Variable Importance (VI), one of the outputs of the RFC, to determine the importance of each image band. Then, based on this parameter and other required parameters, the image is segmented into homogeneous regions. Finally, the RFC is applied to the characteristics of the segments to convert them into meaningful objects. The proposed method, as well as the conventional pixel-based RFC and Support Vector Machine (SVM) methods, was applied to three hyperspectral datasets with various spectral and spatial characteristics, acquired by the HyMap, Airborne Prism Experiment (APEX), and Compact Airborne Spectrographic Imager (CASI) hyperspectral sensors. The experimental results show that the proposed method is more consistent for land cover mapping across various areas. The overall classification accuracy (OA) obtained by the proposed method was 95.48%, 86.57%, and 84.29% for the HyMap, APEX, and CASI datasets, respectively. Moreover, this method showed better efficiency than the spectral-based classifications, with OAs 5.67% and 3.75% higher than those of the conventional RFC and SVM classifiers, respectively.
Accurately estimating the volume growth of forest ecosystems is important for understanding carbon sequestration and achieving carbon neutrality goals. However, the key environmental factors affecting volume growth differ across scales and plant functional types. This study was therefore conducted to estimate the volume growth of Larix and Quercus forests based on national-scale forestry inventory data in China and to identify its influencing factors using random forest algorithms. The results showed that the model performance for volume growth in natural forests (R² = 0.65 for Larix and 0.66 for Quercus) was better than that in planted forests (R² = 0.44 for Larix and 0.40 for Quercus). In both natural and planted forests, stand age showed a strong relative importance for volume growth (8.6%–66.2%), while the edaphic and climatic variables had limited relative importance (<6.0%). The relationship between stand age and volume growth was unimodal in natural forests and linearly increasing in planted Quercus forests. The specific locations (i.e., altitude and aspect) of the sampling plots exhibited high relative importance for volume growth in planted forests (4.1%–18.2%). Altitude positively affected volume growth in planted Larix forests but negatively controlled volume growth in planted Quercus forests. Similarly, the effects of other environmental factors on volume growth also differed by stand origin (planted versus natural) and plant functional type (Larix versus Quercus). These results highlight that stand age was the most important predictor of volume growth and that the effects of environmental factors on volume growth varied among stand origins and plant functional types. Our findings provide a good framework for site-specific recommendations regarding the management practices necessary to maintain volume growth in China's forest ecosystems.
A secure storage yard is one of the core goals of container transportation; thus, making the necessary storage arrangements has become the most crucial part of container terminal management systems (CTMS). This paper investigates a random hybrid stacking algorithm (RHSA) for outbound containers that enter the yard randomly. In the first stage of RHSA, the distribution among blocks is analyzed with respect to the utilization ratio. In the second stage, the bay configuration is optimized using a hybrid genetic algorithm. Moreover, an experiment was performed to test the RHSA. The results show that the proposed algorithm is useful for increasing efficiency.