Consider the nonparametric regression model $Y_{ni} = g(x_{ni}) + e_{ni}$, $1 \le i \le n$, where $g$ is an unknown function to be estimated on [0,1], $x_{ni}$ ($1 \le i \le n$) are fixed design points in the interval [0,1], and $\{e_{ni}, 1 \le i \le n\}$ is a triangular array of row-wise iid random variables with median zero. The nearest neighbor median estimator $g_{n,h}(x_{ni}) = m(Y_{i(1)}^{(n)}, \ldots, Y_{i(h)}^{(n)})$ is taken as the estimator of the unknown function $g(x)$. The median cross-validation (MCV) criterion is employed to select the smoothing parameter $h$. Let $h_n^*$ be the smoothing parameter chosen by the MCV criterion. Under mild regularity conditions, upper and lower bounds for $h_n^*$, the rate of convergence, and the weak consistency of the median cross-validated estimate $g_{n,h_n^*}(x_{ni})$ are obtained.
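The selection of h described in this abstract can be sketched as follows. This is only an illustration under assumptions the abstract does not state: neighbors are ranked by absolute distance between design points, the fit at x_i leaves out Y_i, and the criterion is taken to be the median absolute leave-one-out residual. The function names are hypothetical.

```python
import statistics

def nn_median(xs, ys, x0, h, skip=None):
    """Median of the responses at the h design points nearest to x0.

    skip: optional index excluded from the neighbor search (leave-one-out).
    """
    idx = [j for j in range(len(xs)) if j != skip]
    idx.sort(key=lambda j: abs(xs[j] - x0))  # nearest design points first
    return statistics.median(ys[j] for j in idx[:h])

def mcv_score(xs, ys, h):
    """Assumed criterion: median absolute leave-one-out residual."""
    residuals = [abs(ys[i] - nn_median(xs, ys, xs[i], h, skip=i))
                 for i in range(len(xs))]
    return statistics.median(residuals)

def select_h(xs, ys, candidates):
    """Return the candidate h with the smallest assumed criterion value."""
    return min(candidates, key=lambda h: mcv_score(xs, ys, h))
```

Because both the local fit and the criterion use medians, a single gross outlier in the responses changes neither the fitted value at its neighbors nor the score by much, which is the usual motivation for median-based smoothing.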
In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R² of 0.50 and 0.71, we consider 4 scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage, and backward elimination (BE) with varying significance levels. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Besides MSE, we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and on the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was no worse than that of the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This motivates researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM) based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. The comparative approach comprises a vertical comparison with other cross-validation (CV) methods and a lateral comparison across different numbers of folds. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed: one with a different sampling and folding order, and the other with more CV. According to the assessment indices obtained with the different methods, the LGBM-HSCV model proposed here is effective at detecting incident causes. The proposed model for imbalanced data classification may serve as a point of reference for similar data processing, and the model's accurate identification of civil aviation incident causes can help improve civil aviation safety.
Background: Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture. Plaque progression prediction is of fundamental significance to cardiovascular research and to disease diagnosis, prevention, and treatment. The generalized linear mixed model (GLMM) is an extension of the linear model for categorical responses that accounts for the correlation among observations. Methods: Magnetic resonance image (MRI) data of carotid atherosclerotic plaques were acquired from 20 patients with consent obtained, and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction. Data for ten morphological and biomechanical risk factors, namely wall thickness (WT), lipid percent (LP), minimum cap thickness (MinCT), plaque area (PA), plaque burden (PB), lumen area (LA), maximum plaque wall stress (MPWS), maximum plaque wall strain (MPWSn), average plaque wall stress (APWS), and average plaque wall strain (APWSn), were extracted from all slices for analysis. Wall thickness increase (WTI), plaque burden increase (PBI), and plaque area increase (PAI) were chosen as three measures of plaque progression. GLMMs with a 5-fold cross-validation strategy were used to calculate the prediction accuracy of each predictor and to identify the optimal predictor, that is, the one with the highest prediction accuracy, defined as the sum of sensitivity and specificity. All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup. The training subgroups were used for model fitting, and the verification subgroup was used to evaluate the model. All combinations (1023 in total) of the 10 risk factors were fed to the GLMM, and the prediction accuracy of each predictor was taken from the point on the receiver operating characteristic (ROC) curve with the highest sum of specificity and sensitivity. Results: LA was the best single predictor for PBI, with the highest prediction accuracy (1.3601) and an area under the ROC curve (AUC) of 0.6540, followed by APWSn (1.3363) with AUC = 0.6342. The optimal predictor among all possible combinations for PBI was the combination of LA, PA, LP, WT, MPWS and MPWSn, with prediction accuracy = 1.4146 (AUC = 0.7158). LA was once again the best single predictor for PAI, with the highest prediction accuracy (1.1846) and AUC = 0.6064, followed by MPWSn (1.1832) with AUC = 0.6084. The combination of PA, PB, WT, MPWS, MPWSn and APWSn gave the best prediction accuracy (1.3025) for PAI, with AUC = 0.6657. PA was the best single predictor for WTI, with the highest prediction accuracy (1.2887) and AUC = 0.6415, followed by WT (1.2540) with AUC = 0.6097. The combination of PA, PB, WT, LP, MinCT, MPWS and MPWS was the best predictor for WTI, with prediction accuracy = 1.3140 and AUC = 0.6552. This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%, 4.01%, and 1.96% over the best single predictors for PAI, PBI, and WTI, respectively (AUC values improved by 9.78%, 9.45%, and 2.14%). Conclusions: The use of GLMM with a 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction. This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.
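Two pieces of the procedure above lend themselves to a short sketch: the count of 1023 predictor combinations (every non-empty subset of 10 factors, 2^10 - 1) and the accuracy measure, taken at the ROC point that maximizes sensitivity plus specificity. The sketch assumes that each fitted model yields one numeric score per slice and that a slice is classified as progressing when its score reaches the threshold. The GLMM fitting itself is not shown, and the function names are hypothetical.

```python
from itertools import combinations

# Risk factor abbreviations as listed in the abstract.
FACTORS = ["WT", "LP", "MinCT", "PA", "PB", "LA",
           "MPWS", "MPWSn", "APWS", "APWSn"]

def nonempty_subsets(items):
    """Yield every non-empty combination of the given items."""
    for k in range(1, len(items) + 1):
        yield from combinations(items, k)

def best_sens_plus_spec(scores, labels):
    """Largest sensitivity + specificity over all observed score thresholds.

    labels are 1 (progression) or 0 (no progression); both classes must be
    present. A case is called positive when score >= threshold (assumption).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        best = max(best, tp / pos + tn / neg)
    return best

print(len(list(nonempty_subsets(FACTORS))))  # 1023
```

On this scale a value of 1.0 corresponds to an uninformative predictor and 2.0 to perfect separation, which puts the reported values between 1.18 and 1.41 in context.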
For the nonparametric regression model $Y_{ni} = g(x_{ni}) + \epsilon_{ni}$, $i = 1, \ldots, n$, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of $g(x)$. When the threshold and truncation parameters are chosen by cross-validation on the average squared error, strong consistency for the case of dyadic sample size and moment consistency for arbitrary sample size are established under some regularity conditions.
Unlike the detection of marked on-street parking spaces, detecting unmarked spaces poses significant challenges due to the absence of clear physical demarcation and the uneven gaps caused by irregular parking. In cities with heavy traffic flow, these challenges can result in traffic disruptions, rear-end collisions, sideswipes, and congestion as drivers struggle to make decisions. We propose a real-time detection system for on-street parking spaces using YOLO models and recommend the most suitable space based on KD-tree search. Lightweight versions of YOLOv5, YOLOv7-tiny, and YOLOv8 with different architectures are trained. Among the models, YOLOv5s with SPPF at the backbone achieved an F1-score of 0.89 and was selected for validation using k-fold cross-validation on our dataset. The low variance and standard deviation recorded across folds indicate the model's generalizability, reliability, and stability. Inference with KD-tree using predictions from the YOLO models recorded FPS of 37.9 for YOLOv5, 67.2 for YOLOv7-tiny, and 67.0 for YOLOv8. The models successfully detect both marked and unmarked empty parking spaces on test data with varying inference speeds and FPS. These models can be efficiently deployed for real-time applications due to their high FPS, inference speed, and lightweight nature. In comparison with other state-of-the-art models, our models outperform them, further demonstrating their effectiveness.
A total of 91 measured values of the development height of the water-conducting fracture zone (WCFZ) in deep and thick coal seam mining faces under thick loose layer conditions were collected. Five key characteristic variables influencing the WCFZ height were identified. After removing outliers from the dataset, a Random Forest (RF) regression model optimized by the Sparrow Search Algorithm (SSA) was constructed. The hyperparameters of the RF model were iteratively optimized by minimizing the Out-of-Bag (OOB) error, resulting in the rapid determination of optimal parameters. Specifically, the SSA-RF model achieved an OOB error of 0.148, with 20 decision trees, a maximum depth of 8, a minimum split sample size of 2, and a minimum leaf node sample size of 1. Cross-validation experiments were performed using the trained optimal model and compared against other prediction methods. The results showed that the mining height had the most significant correlation with the development height of the WCFZ. The SSA-RF model outperformed all other models, with R² values exceeding 0.9 across the training, validation, and test datasets. Compared to other models, the SSA-RF model demonstrates a simpler structure, stronger fitting capacity, higher predictive accuracy, and superior stability and generalization ability. It also exhibits the smallest variation in relative error across datasets, indicating excellent adaptability to different data conditions. Furthermore, a numerical model was developed using the hydrogeological data from the 1305 working face at Wanfukou Coal Mine, Shandong Province, China, to simulate the dynamic development of the WCFZ during mining. The SSA-RF model predicted the WCFZ height to be 69.7 m, closely aligning with the PFC2D simulation result of 65 m, with an error of less than 5%. Compared to traditional methods and numerical simulations, the SSA-RF model provides more accurate predictions, showing only a 7.23% deviation from the PFC2D simulation, while traditional empirical formulas yield deviations as large as 19.97%. These results demonstrate the SSA-RF model's superior predictive capability, reinforcing its reliability and engineering applicability for real-world mining operations. This model holds significant potential for enhancing mining safety and optimizing planning processes, offering a more accurate and efficient approach for WCFZ height prediction.
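The 7.23% deviation quoted in this abstract matches the relative difference between the predicted height of 69.7 m and the simulated height of 65 m. A short check, assuming the deviation is measured relative to the simulation result (the helper name is hypothetical):

```python
def relative_deviation(predicted, reference):
    """Absolute difference as a fraction of the reference value."""
    return abs(predicted - reference) / abs(reference)

# SSA-RF prediction (69.7 m) against the PFC2D simulation (65 m).
print(f"{relative_deviation(69.7, 65.0):.2%}")  # 7.23%
```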
Spartina alterniflora is now listed among the world's 100 most dangerous invasive species, severely affecting the ecological balance of coastal wetlands. Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora, but they require large datasets and have poor interpretability. A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery. First, to capture the high canopy cover and dense community characteristics of Spartina alterniflora, multi-dimensional shallow features are extracted from the imagery. Second, to detect different objects from satellite imagery, index features are extracted, and the statistical features of the Gray-Level Co-occurrence Matrix (GLCM) are derived using principal component analysis. Then, ensemble learning methods, including random forest, extreme gradient boosting, and light gradient boosting machine models, are employed for image classification. Meanwhile, Recursive Feature Elimination with Cross-Validation (RFECV) is used to select the best feature subset. Finally, to enhance the interpretability of the models, the best features are utilized to classify multi-temporal images, and SHapley Additive exPlanations (SHAP) is combined with these classifications to explain the model prediction process. The method is validated using Sentinel-2 imagery and previous observations of Spartina alterniflora on Chongming Island. It is found that the model combining image texture features such as GLCM covariance can improve the detection accuracy of Spartina alterniflora by about 8% compared with the model without image texture features. Through multiple model comparisons and feature selection via RFECV, the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods, indicating that feature reduction can effectively enhance model generalization. Additionally, visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.
Identification and counting of rice light-trap pests are important for monitoring rice pest population dynamics and forecasting pests. Manual identification and counting of rice light-trap pests is time-consuming and leads to fatigue and an increase in the error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. This system uses two cameras to capture top and bottom images of each insect in order to obtain more image features. A method is proposed for removing the background using the color difference between two images with pests and non-pests. For each pest, 156 features, including color, shape, and texture features, are extracted and fed into a support vector machine (SVM) classifier with a radial basis kernel function. Seven-fold cross-validation is used to improve the accuracy of pest identification. Four species of Lepidoptera rice pests were tested, and an average accuracy of 97.5% was achieved.
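The seven-fold scheme mentioned in this abstract can be sketched as an index split: the samples are shuffled once, dealt into seven folds, and each fold serves once as the held-out set while the other six are used for training. The sample count of 70 and the fixed seed are illustrative assumptions, and the SVM training step is not shown.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]  # deal indices into k folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical data set of 70 insect images, split seven ways.
for train, test in k_fold_indices(70, 7):
    print(len(train), len(test))  # 60 10 on every iteration
```

Averaging the per-fold accuracy gives a single estimate in which every sample has been used for testing exactly once.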
Topomer CoMFA models have been used to optimize the potency of 15 biologically active acridone derivatives selected from the literature. Their 3D chemical structures were sliced into three acyclic R groups, to produce a fragment that is present in each training set. The analysis was successful, with 3 components giving the highest q² result: the cross-validated coefficient q² for this number of components is 0.56, with a standard error of estimate (q² stderr) of 0.37, and the conventional coefficient (r²) is 0.82, with a standard error of estimate of 0.24. These results provide a structure-activity relationship (SAR) among the compounds. The results of the Topomer CoMFA studies were used to design novel derivatives for future studies.
Based on the stability and inequality of texture features between coal and rock, this study used the digital image analysis technique to propose a coal–rock interface detection method. Using the gray level co-occurrence matrix, twenty-two texture features were extracted from the images of coal and rock. The dimension of the feature space was reduced to four by feature selection, according to a separability criterion based on inter-class mean difference and within-class scatter. The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm. In the optimized low-dimensional feature space, the coal–rock classifier was set up using the Fisher discriminant method. The performance of the classifier was evaluated using the 10-fold cross-validation technique, and an average recognition rate of 94.12% was obtained. The results of comparative experiments show that the identification performance of the proposed method was superior to that of the texture description methods based on the gray histogram and the gradient histogram.
This study integrates different machine learning (ML) methods and the 5-fold cross-validation (CV) method to estimate the ground maximal surface settlement (MSS) induced by tunneling. We further investigate the applicability of artificial intelligence (AI)-based prediction through a comparative study of two tunneling datasets with different sizes and features. Four different ML approaches, including support vector machine (SVM), random forest (RF), back-propagation neural network (BPNN), and deep neural network (DNN), are utilized. Two techniques, i.e., particle swarm optimization (PSO) and grid search (GS), are adopted for hyperparameter optimization. To assess the reliability and efficiency of the predictions, three performance evaluation indicators, including the mean absolute error (MAE), root mean square error (RMSE), and Pearson correlation coefficient (R), are calculated. Our results indicate that the proposed models can accurately and efficiently predict the settlement, while the RF model outperforms the other three methods on both datasets. The difference in model performance on the two datasets (Datasets A and B) reveals the importance of data quality and quantity. Sensitivity analysis indicates that Dataset A is more significantly affected by geological conditions, while geometric characteristics play a more dominant role in Dataset B.
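The three evaluation indicators named in this abstract have standard definitions, sketched below for paired lists of measured and predicted settlements. The function names and the sample values are illustrative and are not taken from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

measured = [1.0, 2.0, 3.0, 4.0]   # hypothetical settlements
predicted = [1.0, 2.0, 3.0, 6.0]  # hypothetical model output
print(mae(measured, predicted), rmse(measured, predicted))  # 0.5 1.0
```

MAE and RMSE are in the units of the settlement, with RMSE weighting large misses more heavily, whereas R only measures linear association and is unaffected by a constant bias in the predictions.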
Funding: Project supported by the National Natural Science Foundation of China and the Doctoral Foundation of Education of China.
Funding: Supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110) and Research on the Dual Prevention Mechanism and Intelligent Management Technology for Civil Aviation Safety Risks (YK23-03-05).
Funding: Supported in part by National Sciences Foundation of China grant (11672001), Jiangsu Province Science and Technology Agency grant (BE2016785), and Postgraduate Research & Practice Innovation Program of Jiangsu Province grant (KYCX18_0156).
Funding: Project Nos. NSTC-112-2221-E-324-003 MY3, NSTC-111-2622-E-324-002, and NSTC-112-2221-E-324-011-MY2.
Funding: Supported by the National Natural Science Foundation of China (51774199) and the project of the Educational Department of Liaoning Province (No. LJKMZ20220825).
Abstract: A total of 91 measured values of the development height of the water-conducting fracture zone (WCFZ) in deep, thick coal seam mining faces under thick loose layer conditions were collected. Five key characteristic variables influencing the WCFZ height were identified. After removing outliers from the dataset, a Random Forest (RF) regression model optimized by the Sparrow Search Algorithm (SSA) was constructed. The hyperparameters of the RF model were iteratively optimized by minimizing the Out-of-Bag (OOB) error, resulting in the rapid determination of optimal parameters. Specifically, the SSA-RF model achieved an OOB error of 0.148, with 20 decision trees, a maximum depth of 8, a minimum split sample size of 2, and a minimum leaf node sample size of 1. Cross-validation experiments were performed using the trained optimal model and compared against other prediction methods. The results showed that the mining height had the most significant correlation with the development height of the WCFZ. The SSA-RF model outperformed all other models, with R² values exceeding 0.9 across the training, validation, and test datasets. Compared to other models, the SSA-RF model demonstrates a simpler structure, stronger fitting capacity, higher predictive accuracy, and superior stability and generalization ability. It also exhibits the smallest variation in relative error across datasets, indicating excellent adaptability to different data conditions. Furthermore, a numerical model was developed using the hydrogeological data from the 1305 working face at Wanfukou Coal Mine, Shandong Province, China, to simulate the dynamic development of the WCFZ during mining. The SSA-RF model predicted the WCFZ height to be 69.7 m, closely aligning with the PFC2D simulation result of 65 m, with an error of less than 5%. Compared to traditional methods and numerical simulations, the SSA-RF model provides more accurate predictions, showing only a 7.23% deviation from the PFC2D simulation, while traditional empirical formulas yield deviations as large as 19.97%. These results demonstrate the SSA-RF model's superior predictive capability, reinforcing its reliability and engineering applicability for real-world mining operations. This model holds significant potential for enhancing mining safety and optimizing planning processes, offering a more accurate and efficient approach for WCFZ height prediction.
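The deviation quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the deviation is measured relative to the PFC2D simulation value; the function name `relative_deviation_percent` is illustrative and does not come from the paper.

```python
def relative_deviation_percent(predicted, reference):
    """Absolute deviation of `predicted` from `reference`, as a percentage of `reference`."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(predicted - reference) / abs(reference) * 100.0


# Values quoted in the abstract (metres): SSA-RF prediction and PFC2D simulation.
ssa_rf_height_m = 69.7
pfc2d_height_m = 65.0

deviation = relative_deviation_percent(ssa_rf_height_m, pfc2d_height_m)
print(f"{deviation:.2f}%")  # prints 7.23%
```

Under this assumption the result reproduces the 7.23% figure stated for the SSA-RF model.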
Funding: The National Key Research and Development Program of China under contract No. 2023YFC3008204 and the National Natural Science Foundation of China under contract Nos. 41977302 and 42476217.
Abstract: Spartina alterniflora is now listed among the world's 100 most dangerous invasive species, severely affecting the ecological balance of coastal wetlands. Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora, but they require large datasets and have poor interpretability. A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery. First, to capture the high canopy cover and dense community characteristics of Spartina alterniflora, multi-dimensional shallow features are extracted from the imagery. Second, to distinguish different objects in satellite imagery, index features are extracted, and the statistical features of the Gray-Level Co-occurrence Matrix (GLCM) are derived using principal component analysis. Then, ensemble learning methods, including random forest, extreme gradient boosting, and light gradient boosting machine models, are employed for image classification. Meanwhile, Recursive Feature Elimination with Cross-Validation (RFECV) is used to select the best feature subset. Finally, to enhance the interpretability of the models, the best features are used to classify multi-temporal images, and SHapley Additive exPlanations (SHAP) is combined with these classifications to explain the model prediction process. The method is validated using Sentinel-2 imagery and previous observations of Spartina alterniflora on Chongming Island. The model that incorporates image texture features such as GLCM covariance improves the detection accuracy of Spartina alterniflora by about 8% compared with the model without image texture features. Through multiple model comparisons and feature selection via RFECV, the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods, proving that feature reduction can effectively enhance model generalization. Additionally, visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.
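To make the GLCM texture features mentioned in the abstract concrete, here is a minimal pure-Python sketch of a co-occurrence matrix and a variance statistic computed from it. It assumes a non-symmetric GLCM for a single pixel offset and the textbook Haralick-style variance; the abstract does not state which offsets, gray-level quantization, or variance definition the authors used, and the function names are illustrative.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[value / total for value in row] for row in counts]


def glcm_variance(p):
    """Variance of the reference gray level under the GLCM distribution p."""
    n = len(p)
    mean = sum(i * p[i][j] for i in range(n) for j in range(n))
    return sum((i - mean) ** 2 * p[i][j] for i in range(n) for j in range(n))


# Tiny two-level example image (assumed data, for illustration only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
p = glcm(image, levels=2)
print(p)
print(glcm_variance(p))
```

In practice a statistic like this would be computed per window and per band before the principal component step described in the abstract.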
Funding: Supported by the National Natural Science Foundation of China (31071678), the Major Scientific and Technological Special of Zhejiang Province, China (2010C12026), the Ningbo Science and Technology Project, China (201002C1011001), and the Xiangshan Science and Technology Project, China (2010C0001).
Abstract: Identification and counting of rice light-trap pests are important for monitoring rice pest population dynamics and making pest forecasts. Manual identification and counting is time-consuming, and leads to fatigue and an increase in the error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. The system uses two cameras to capture the top and bottom images of each insect in order to obtain more image features. A method is proposed for removing the background using the color difference between two images with pests and non-pests. A total of 156 features of each pest, including color, shape and texture features, are extracted and fed into a support vector machine (SVM) classifier with a radial basis kernel function. Seven-fold cross-validation is used to improve the accuracy of pest identification. Four species of Lepidoptera rice pests are tested, achieving an average accuracy of 97.5%.
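The seven-fold cross-validation mentioned in the abstract partitions the samples into seven folds and holds each fold out once. The sketch below shows only the index splitting, in plain Python; the sample count of 21, the shuffling seed, and the function name `k_fold_splits` are assumptions for illustration, and the SVM training step is omitted.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 21 samples split into seven folds.
splits = list(k_fold_splits(n_samples=21, k=7))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy uses every sample for evaluation once.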
Abstract: Topomer CoMFA models have been used to optimize the potency of 15 biologically active acridone derivatives selected from the literature. Their 3D chemical structures were sliced into three acyclic R groups, to produce a fragment that is present in each training set. The analysis was successful with 3 as the number of components that provided the highest q² results: q² is 0.56, which is the cross-validated coefficient for the specified number of components, giving rise to a standard error of estimate (q² stderr) of 0.37, and a conventional coefficient (r²) of 0.82, whose standard error of estimate is 0.24. These results provide a structure-activity relationship (SAR) among the compounds. The result of the Topomer CoMFA studies was used to design novel derivatives for future studies.
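The cross-validated coefficient q² is conventionally defined as 1 minus the ratio of the predictive residual sum of squares (PRESS) to the total sum of squares about the mean. The sketch below assumes that conventional definition, which the abstract does not spell out, and uses made-up activity values; `q_squared` is an illustrative name.

```python
def q_squared(observed, cv_predicted):
    """q^2 = 1 - PRESS / SS, with predictions made while each sample was held out."""
    if len(observed) != len(cv_predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean = sum(observed) / len(observed)
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, cv_predicted))
    total = sum((y - mean) ** 2 for y in observed)
    if total == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - press / total


# Hypothetical activities and their cross-validated predictions.
observed = [1.0, 2.0, 3.0, 4.0]
cv_predicted = [1.5, 1.5, 3.5, 3.5]
print(q_squared(observed, cv_predicted))
```

The conventional r² has the same form but uses fitted rather than held-out predictions, which is why it is normally the larger of the two.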
Funding: The National Natural Science Foundation of China (No. 51134024/E0422) for the financial support.
Abstract: Based on the stability and inequality of texture features between coal and rock, this study used the digital image analysis technique to propose a coal–rock interface detection method. Using the gray level co-occurrence matrix, twenty-two texture features were extracted from the images of coal and rock. The dimension of the feature space was reduced to four by feature selection, according to a separability criterion based on inter-class mean difference and within-class scatter. The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm. In the optimized low-dimensional feature space, the coal–rock classifier was set up using the Fisher discriminant method. Using the 10-fold cross-validation technique, the performance of the classifier was evaluated, and an average recognition rate of 94.12% was obtained. The results of comparative experiments show that the identification performance of the proposed method was superior to the texture description methods based on gray histogram and gradient histogram.
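A separability criterion of the kind described, combining inter-class mean difference and within-class scatter, can be sketched per feature as a Fisher-style ratio. The exact formula used in the paper is not given in the abstract, so the form below (squared mean difference divided by the summed within-class scatter) is an assumption, as are the feature values and the function names.

```python
def scatter(values):
    """Sum of squared deviations from the class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fisher_score(class_a, class_b):
    """Squared inter-class mean difference over total within-class scatter."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    within = scatter(class_a) + scatter(class_b)
    if within == 0:
        raise ValueError("within-class scatter is zero")
    return (mean_a - mean_b) ** 2 / within


# Hypothetical values of two texture features for coal and rock samples.
features = {
    "contrast": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),
    "entropy": ([1.0, 5.0, 9.0], [2.0, 6.0, 10.0]),
}
scores = {name: fisher_score(coal, rock) for name, (coal, rock) in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

Keeping the top-ranked features is one simple way to reduce twenty-two candidates to a handful before fitting a discriminant.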
Funding: Supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (Grant No. RGPIN-2019-06471) and the McMaster University Engineering Life Event Fund.
Abstract: This study integrates different machine learning (ML) methods and the 5-fold cross-validation (CV) method to estimate the ground maximal surface settlement (MSS) induced by tunneling. We further investigate the applicability of artificial intelligence (AI)-based prediction through a comparative study of two tunneling datasets with different sizes and features. Four different ML approaches, including support vector machine (SVM), random forest (RF), back-propagation neural network (BPNN), and deep neural network (DNN), are utilized. Two techniques, i.e. particle swarm optimization (PSO) and grid search (GS) methods, are adopted for hyperparameter optimization. To assess the reliability and efficiency of the predictions, three performance evaluation indicators, including the mean absolute error (MAE), root mean square error (RMSE), and Pearson correlation coefficient (R), are calculated. Our results indicate that the proposed models can accurately and efficiently predict the settlement, while the RF model outperforms the other three methods on both datasets. The difference in model performance on the two datasets (Datasets A and B) reveals the importance of data quality and quantity. Sensitivity analysis indicates that Dataset A is more significantly affected by geological conditions, while geometric characteristics play a more dominant role on Dataset B.
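The three evaluation indicators named in the abstract have standard definitions, sketched below in plain Python. The settlement values are invented for illustration and the function names are not taken from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical measured and predicted settlements.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 4.0]
print(mean_absolute_error(measured, predicted))
print(root_mean_square_error(measured, predicted))
print(pearson_r(measured, predicted))
```

MAE and RMSE are in the units of the settlement itself, while R is dimensionless, so the three indicators complement one another when models are compared across datasets.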