In deriving a regression model, analysts often have to use variable selection, despite the problems introduced by data-dependent model building. Resampling approaches have been proposed to handle some of the critical issues. To assess and compare several strategies, we conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R² of 0.50 and 0.71, we consider four scenarios with varying amounts of information. We also consider two examples with 24 and 13 predictors, respectively. We discuss the value of cross-validation, shrinkage, and backward elimination (BE) with varying significance levels. We assess whether two-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models, and compare the results with models derived with the LASSO procedure. Besides MSE, we use model sparsity and further criteria for model assessment. The amount of information in the data influences the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse than that of the LASSO, and the models selected by BE were much sparser, an important advantage for interpretation and transportability. Compared with global shrinkage, PWSF performed better. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This motivates researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model using the light gradient boosting machine (LGBM), based on the Aviation Safety Reporting System (ASRS), has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. The comparative approach comprises a vertical comparison with other cross-validation (CV) methods and a lateral comparison with different numbers of folds. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed: one with a different sampling and folding order, and the other with more CV. According to the assessment indices for the different methods, the LGBM-HSCV model proposed here is effective at detecting incident causes. The proposed improved model for imbalanced data classification may serve as a point of reference for similar data processing, and the model's accurate identification of civil aviation incident causes can help improve civil aviation safety.
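The abstract does not spell out how the hybrid sampling is combined with the folds, so the following is only a minimal sketch of one possible reading: split into k folds first, then rebalance each training portion by undersampling the majority class and oversampling the minority class, leaving the held-out fold untouched. The function names (`k_fold_indices`, `hybrid_resample`, `hscv_splits`) and the "meet at the average class size" rule are assumptions for illustration, not the authors' procedure.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def hybrid_resample(indices, labels, rng):
    """Assumed hybrid rule: shrink large classes and grow small ones
    until every class present has the average class size."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = round(sum(len(v) for v in by_class.values()) / len(by_class))
    balanced = []
    for members in by_class.values():
        if len(members) >= target:
            # Undersample: keep a random subset without replacement.
            balanced += rng.sample(members, target)
        else:
            # Oversample: keep all members and add random duplicates.
            balanced += members + rng.choices(members, k=target - len(members))
    return balanced


def hscv_splits(labels, k=5, seed=0):
    """Yield (balanced_train_indices, untouched_test_indices) per fold."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(labels), k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield hybrid_resample(train, labels, rng), folds[held_out]
```

Resampling after the split keeps duplicated minority samples out of the held-out fold; swapping the two steps would give the "different sampling and folding order" variant that the abstract mentions.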
Background: Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture. Plaque progression prediction is of fundamental significance to cardiovascular research and disease diagnosis, prevention, and treatment. The generalized linear mixed model (GLMM) is an extension of the linear model for categorical responses that accounts for the correlation among observations. Methods: Magnetic resonance imaging (MRI) data of carotid atherosclerotic plaques were acquired from 20 patients with consent obtained, and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction. Data for ten morphological and biomechanical risk factors, namely wall thickness (WT), lipid percent (LP), minimum cap thickness (MinCT), plaque area (PA), plaque burden (PB), lumen area (LA), maximum plaque wall stress (MPWS), maximum plaque wall strain (MPWSn), average plaque wall stress (APWS), and average plaque wall strain (APWSn), were extracted from all slices for analysis. Wall thickness increase (WTI), plaque burden increase (PBI), and plaque area increase (PAI) were chosen as three measures of plaque progression. GLMMs with a 5-fold cross-validation strategy were used to calculate the prediction accuracy of each predictor and to identify the optimal predictor, i.e., the one with the highest prediction accuracy, defined as the sum of sensitivity and specificity. All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup. The training subgroups were used for model fitting, and the verification subgroup was used to evaluate the model. All combinations (1023 in total) of the 10 risk factors were fed to the GLMM, and the prediction accuracy of each predictor was taken from the point on the receiver operating characteristic (ROC) curve with the highest sum of specificity and sensitivity. Results: LA was the best single predictor for PBI, with the highest prediction accuracy (1.3601) and an area under the ROC curve (AUC) of 0.6540, followed by APWSn (1.3363) with AUC = 0.6342. The optimal predictor among all possible combinations for PBI was the combination of LA, PA, LP, WT, MPWS and MPWSn, with prediction accuracy = 1.4146 (AUC = 0.7158). LA was once again the best single predictor for PAI, with the highest prediction accuracy (1.1846) and AUC = 0.6064, followed by MPWSn (1.1832) with AUC = 0.6084. The combination of PA, PB, WT, MPWS, MPWSn and APWSn gave the best prediction accuracy (1.3025) for PAI, with AUC = 0.6657. PA was the best single predictor for WTI, with the highest prediction accuracy (1.2887) and AUC = 0.6415, followed by WT (1.2540) with AUC = 0.6097. The combination of PA, PB, WT, LP, MinCT, MPWS and MPWS was the best predictor for WTI, with prediction accuracy = 1.3140 and AUC = 0.6552. This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%, 4.01% and 1.96% over the best single predictors for PAI, PBI and WTI, respectively (AUC values improved by 9.78%, 9.45%, and 2.14%). Conclusions: The use of GLMM with a 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction. This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.
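Two quantities in the plaque progression abstract are easy to make concrete: the 1023 predictor combinations (every non-empty subset of the 10 risk factors, 2^10 - 1) and the "prediction accuracy" read off the ROC curve as the largest sum of sensitivity and specificity. The sketch below illustrates both in plain Python; the helper names and the rule "predict positive when score >= threshold" are assumptions, and the GLMM fitting that would produce the scores is not shown.

```python
from itertools import combinations

# Abbreviations of the ten risk factors as listed in the abstract.
RISK_FACTORS = ["WT", "LP", "MinCT", "PA", "PB", "LA",
                "MPWS", "MPWSn", "APWS", "APWSn"]


def all_predictor_combinations(factors):
    """Return every non-empty subset of the factor list."""
    return [c for r in range(1, len(factors) + 1)
            for c in combinations(factors, r)]


def best_sens_plus_spec(scores, labels):
    """Try each observed score as a threshold and return the largest
    sensitivity + specificity (assumes both classes are present)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for t in sorted(set(scores)):
        # Predict positive when the score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        best = max(best, tp / positives + tn / negatives)
    return best


print(len(all_predictor_combinations(RISK_FACTORS)))  # 1023
```

Under this definition the value ranges from 1.0 (no better than chance) to 2.0 (perfect separation), which is the scale on which figures such as 1.3601 or 1.4146 should be read.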
For the nonparametric regression model $Y_{ni} = g(x_{ni}) + \varepsilon_{ni}$, $i = 1, \ldots, n$, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of $g(x)$. When the threshold and truncation parameters are chosen by cross-validation on the average squared error, strong consistency for the case of dyadic sample size and moment consistency for arbitrary sample size are established under some regularity conditions.
Sustainable forecasting of home energy demand (SFHED) is crucial for promoting energy efficiency, minimizing environmental impact, and optimizing resource allocation. Machine learning (ML) supports SFHED by identifying patterns and forecasting demand. However, conventional hyperparameter tuning methods often rely solely on minimizing average prediction errors, typically through fixed k-fold cross-validation, which overlooks error variability and limits model robustness. To address this limitation, we propose the Optimized Robust Hyperparameter Tuning for Machine Learning with Enhanced Multi-fold Cross-Validation (ORHT-ML-EMCV) framework. This method integrates statistical analysis of k-fold validation errors by incorporating their mean and variance into the optimization objective, enhancing robustness and generalizability. A weighting factor is introduced to balance accuracy and robustness, and its impact is evaluated across a range of values. A novel Enhanced Multi-Fold Cross-Validation (EMCV) technique is employed to automatically evaluate model performance across varying fold configurations without requiring a predefined k value, thereby reducing sensitivity to data splits. Using three evolutionary algorithms, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Differential Evolution (DE), we optimize two ensemble models: XGBoost and LightGBM. The optimization process minimizes both mean error and variance, with robustness assessed through cumulative distribution function (CDF) analyses. Experiments on three real-world residential datasets show that the proposed method reduces worst-case Root Mean Square Error (RMSE) by up to 19.8% and narrows confidence intervals by up to 25%. Cross-household validations confirm strong generalization, achieving coefficients of determination (R²) of 0.946 and 0.972 on unseen homes. The framework offers a statistically grounded and efficient solution for robust energy forecasting.
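The idea of folding both the mean and the variance of the k-fold errors into one tuning objective can be sketched in a few lines. The abstract does not give the exact formula, so the convex combination below, the name `robust_objective`, and the helper `pick_config` are assumptions for illustration only; the fold errors would come from whatever model and fold configurations are being tuned.

```python
from statistics import mean, pvariance


def robust_objective(fold_errors, weight):
    """Assumed form: (1 - weight) * mean error + weight * error variance.
    weight = 0 recovers ordinary mean-error tuning."""
    return (1 - weight) * mean(fold_errors) + weight * pvariance(fold_errors)


def pick_config(candidates, weight):
    """Return the name of the candidate with the smallest objective.
    `candidates` maps a configuration name to its list of fold errors."""
    return min(candidates,
               key=lambda name: robust_objective(candidates[name], weight))


# Hypothetical fold errors for two hyperparameter settings.
candidates = {
    "steady": [0.9, 1.0, 1.1],    # slightly higher mean, low spread
    "erratic": [0.5, 1.0, 1.4],   # lower mean, high spread
}
print(pick_config(candidates, weight=0.0))  # erratic
print(pick_config(candidates, weight=0.9))  # steady
```

The two calls show the trade-off the weighting factor controls: with no weight on variance the lower-mean setting wins, while a large weight favours the setting whose errors vary least across folds. Since mean and variance are on different scales, a real implementation might prefer the standard deviation or a normalised term.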
A total of 91 measured values of the development height of the water-conducting fracture zone (WCFZ) in deep, thick coal seam mining faces under thick loose layer conditions were collected. Five key characteristic variables influencing the WCFZ height were identified. After removing outliers from the dataset, a Random Forest (RF) regression model optimized by the Sparrow Search Algorithm (SSA) was constructed. The hyperparameters of the RF model were iteratively optimized by minimizing the out-of-bag (OOB) error, resulting in the rapid determination of optimal parameters. Specifically, the SSA-RF model achieved an OOB error of 0.148, with 20 decision trees, a maximum depth of 8, a minimum split sample size of 2, and a minimum leaf node sample size of 1. Cross-validation experiments were performed using the trained optimal model and compared against other prediction methods. The results showed that the mining height had the most significant correlation with the development height of the WCFZ. The SSA-RF model outperformed all other models, with R² values exceeding 0.9 across the training, validation, and test datasets. Compared to other models, the SSA-RF model demonstrates a simpler structure, stronger fitting capacity, higher predictive accuracy, and superior stability and generalization ability. It also exhibits the smallest variation in relative error across datasets, indicating excellent adaptability to different data conditions. Furthermore, a numerical model was developed using the hydrogeological data from the 1305 working face at Wanfukou Coal Mine, Shandong Province, China, to simulate the dynamic development of the WCFZ during mining. The SSA-RF model predicted the WCFZ height to be 69.7 m, closely aligning with the PFC2D simulation result of 65 m, with an error of less than 5%. Compared to traditional methods and numerical simulations, the SSA-RF model provides more accurate predictions, showing only a 7.23% deviation from the PFC2D simulation, while traditional empirical formulas yield deviations as large as 19.97%. These results demonstrate the SSA-RF model's superior predictive capability, reinforcing its reliability and engineering applicability for real-world mining operations. This model holds significant potential for enhancing mining safety and optimizing planning processes, offering a more accurate and efficient approach for WCFZ height prediction.
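The 7.23% deviation quoted for the WCFZ prediction can be reproduced from the two heights in the abstract, assuming the deviation is taken relative to the PFC2D simulation value. The function name `relative_deviation` is introduced here for illustration.

```python
def relative_deviation(predicted, reference):
    """Absolute difference as a percentage of the reference value."""
    return abs(predicted - reference) / reference * 100.0


# SSA-RF prediction (69.7 m) against the PFC2D simulation result (65 m).
print(round(relative_deviation(69.7, 65.0), 2))  # 7.23
```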
Unlike the detection of marked on-street parking spaces, detecting unmarked spaces poses significant challenges due to the absence of clear physical demarcation and the uneven gaps caused by irregular parking. In cities with heavy traffic flow, these challenges can result in traffic disruptions, rear-end collisions, sideswipes, and congestion as drivers struggle to make decisions. We propose a real-time detection system for on-street parking spaces using YOLO models and recommend the most suitable space based on KD-tree search. Lightweight versions of YOLOv5, YOLOv7-tiny, and YOLOv8 with different architectures are trained. Among the models, YOLOv5s with SPPF at the backbone achieved an F1-score of 0.89 and was selected for validation using k-fold cross-validation on our dataset. The low variance and standard deviation recorded across folds indicate the model's generalizability, reliability, and stability. Inference with KD-tree using predictions from the YOLO models recorded FPS of 37.9 for YOLOv5, 67.2 for YOLOv7-tiny, and 67.0 for YOLOv8. The models successfully detect both marked and unmarked empty parking spaces on test data with varying inference speeds and FPS. These models can be efficiently deployed for real-time applications due to their high FPS, inference speed, and lightweight nature. Our models also outperform other state-of-the-art models, further demonstrating their effectiveness.
Spartina alterniflora is now listed among the world's 100 most dangerous invasive species, severely affecting the ecological balance of coastal wetlands. Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora, but they require large datasets and have poor interpretability. A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery. Firstly, to capture the high canopy cover and dense community characteristics of Spartina alterniflora, multi-dimensional shallow features are extracted from the imagery. Secondly, to detect different objects from satellite imagery, index features are extracted, and the statistical features of the Gray-Level Co-occurrence Matrix (GLCM) are derived using principal component analysis. Then, ensemble learning methods, including random forest, extreme gradient boosting, and light gradient boosting machine models, are employed for image classification. Meanwhile, Recursive Feature Elimination with Cross-Validation (RFECV) is used to select the best feature subset. Finally, to enhance the interpretability of the models, the best features are utilized to classify multi-temporal images, and SHapley Additive exPlanations (SHAP) is combined with these classifications to explain the model prediction process. The method is validated using Sentinel-2 imagery and previous observations of Spartina alterniflora on Chongming Island. It is found that the model combining image texture features such as GLCM covariance can significantly improve the detection accuracy of Spartina alterniflora, by about 8% compared with the model without image texture features. Through multiple model comparisons and feature selection via RFECV, the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods, proving that feature reduction can effectively enhance model generalization. Additionally, visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.
In this paper, a class of functional-coefficient regression models is proposed and an estimation procedure based on locally weighted least squares is suggested. This class of models, with the proposed estimation method, is a powerful means for exploratory data analysis.
Based on the stability and inequality of texture features between coal and rock, this study used digital image analysis to propose a coal–rock interface detection method. Using the gray-level co-occurrence matrix, twenty-two texture features were extracted from the images of coal and rock. The dimension of the feature space was reduced to four by feature selection according to a separability criterion based on inter-class mean difference and within-class scatter. The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm. In the optimized low-dimensional feature space, the coal–rock classifier was set up using the Fisher discriminant method. Using 10-fold cross-validation, the performance of the classifier was evaluated, and an average recognition rate of 94.12% was obtained. The results of comparative experiments show that the identification performance of the proposed method was superior to that of texture description methods based on the gray histogram and the gradient histogram.
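A separability criterion built from inter-class mean difference and within-class scatter can be illustrated with the classic per-feature Fisher score, (difference of class means)² divided by the sum of class variances. The abstract does not state the exact criterion used, so this is a minimal sketch under that assumption; `fisher_score`, `select_features`, and the toy feature vectors are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Squared mean difference over summed within-class variance
    for one feature (assumes the summed variance is non-zero)."""
    scatter = pvariance(class_a) + pvariance(class_b)
    return (mean(class_a) - mean(class_b)) ** 2 / scatter


def select_features(samples_a, samples_b, keep):
    """Rank features by Fisher score and return the indices of the top `keep`.
    Each sample is a list of feature values of equal length."""
    n_features = len(samples_a[0])
    scores = [
        fisher_score([s[j] for s in samples_a], [s[j] for s in samples_b])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data: feature 0 separates the classes, feature 1 does not.
coal = [[1.0, 5.0], [1.2, 9.0], [0.8, 7.0]]
rock = [[3.0, 6.0], [3.2, 8.0], [2.8, 7.5]]
print(select_features(coal, rock, keep=1))  # [0]
```

In the study's setting the same ranking would be applied to the twenty-two GLCM texture features with `keep=4`.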
Identification and counting of rice light-trap pests are important for monitoring rice pest population dynamics and making pest forecasts. Identifying and counting rice light-trap pests manually is time-consuming and leads to fatigue and an increase in the error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. This system captures the top and bottom images of each insect with two cameras to obtain more image features. A method is proposed for removing the background using the color difference between two images with pests and non-pests. A total of 156 features, including color, shape and texture features of each pest, are extracted and fed into a support vector machine (SVM) classifier with a radial basis kernel function. Seven-fold cross-validation is used to improve the accuracy of pest identification. Four species of Lepidoptera rice pests were tested, and an average accuracy of 97.5% was achieved.
Funding (aviation incident classification study): Supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110) and Research on the Dual Prevention Mechanism and Intelligent Management Technology for Civil Aviation Safety Risks (YK23-03-05).
Funding (carotid plaque progression study): Supported in part by National Sciences Foundation of China grant (11672001), Jiangsu Province Science and Technology Agency grant (BE2016785), and Postgraduate Research & Practice Innovation Program of Jiangsu Province grant (KYCX18_0156).
Funding (WCFZ height prediction study): Supported by the National Natural Science Foundation of China (51774199) and the project of the educational department of Liaoning Province (No. LJKMZ20220825).
Funding (on-street parking detection study): Project Nos. NSTC-112-2221-E-324-003 MY3, NSTC-111-2622-E-324-002 and NSTC-112-2221-E-324-011-MY2.
Funding: The National Key Research and Development Program of China under contract No. 2023YFC3008204; the National Natural Science Foundation of China under contract Nos. 41977302 and 42476217.
Abstract: Spartina alterniflora is now listed among the world's 100 most dangerous invasive species, severely affecting the ecological balance of coastal wetlands. Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora, but they require large datasets and have poor interpretability. A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery. First, to capture the high canopy cover and dense community characteristics of Spartina alterniflora, multi-dimensional shallow features are extracted from the imagery. Second, to detect different objects from satellite imagery, index features are extracted, and the statistical features of the Gray-Level Co-occurrence Matrix (GLCM) are derived using principal component analysis. Then, ensemble learning methods, including random forest, extreme gradient boosting, and light gradient boosting machine models, are employed for image classification. Meanwhile, Recursive Feature Elimination with Cross-Validation (RFECV) is used to select the best feature subset. Finally, to enhance the interpretability of the models, the best features are utilized to classify multi-temporal images, and SHapley Additive exPlanations (SHAP) is combined with these classifications to explain the model prediction process. The method is validated using Sentinel-2 imagery and previous observations of Spartina alterniflora on Chongming Island. The model combining image texture features such as GLCM covariance improves the detection accuracy of Spartina alterniflora by about 8% compared with the model without image texture features. Through multiple model comparisons and feature selection via RFECV, the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods, indicating that feature reduction can effectively enhance model generalization. Additionally, visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.
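To make the GLCM variance feature concrete, the following minimal sketch computes a normalised co-occurrence matrix for one pixel offset and a Haralick-style variance from it. The abstract does not give the offset, the number of grey levels, or the exact variance definition, so all of these are assumptions, and the tiny patches are made up for illustration.

```python
from collections import Counter
from typing import List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int,
         dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised grey-level co-occurrence matrix for the offset (dx, dy)."""
    counts: Counter = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("image is too small for the chosen offset")
    return [[counts[(i, j)] / total for j in range(levels)]
            for i in range(levels)]


def glcm_variance(p: Sequence[Sequence[float]]) -> float:
    """Variance of the reference grey level, weighted by co-occurrence probability."""
    levels = len(p)
    mean = sum(i * p[i][j] for i in range(levels) for j in range(levels))
    return sum((i - mean) ** 2 * p[i][j]
               for i in range(levels) for j in range(levels))


# Toy 2x2 patches with 4 grey levels: one flat, one strongly textured.
flat_patch = [[1, 1], [1, 1]]
textured_patch = [[0, 3], [3, 0]]

flat_variance = glcm_variance(glcm(flat_patch, levels=4))
textured_variance = glcm_variance(glcm(textured_patch, levels=4))
print(flat_variance, textured_variance)
```

A flat patch yields zero variance while the textured one does not, which is the kind of contrast a texture feature adds to spectral indices.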
Abstract: In this paper, a class of functional-coefficient regression models is proposed and an estimation procedure based on locally weighted least squares is suggested. This class of models, with the proposed estimation method, is a powerful means for exploratory data analysis.
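The abstract does not spell out the model, so the sketch below illustrates the general idea under simplifying assumptions: a single regressor with y = a(u)·x + noise, a local-constant fit, and a Gaussian kernel with a hand-picked bandwidth. It should not be read as the paper's estimator, and the data are synthetic.

```python
import math
from typing import Sequence


def gaussian_kernel(t: float) -> float:
    return math.exp(-0.5 * t * t)


def local_coefficient(u: Sequence[float], x: Sequence[float],
                      y: Sequence[float], u0: float, bandwidth: float) -> float:
    """Locally weighted least-squares estimate of a(u0) in y = a(u) * x.

    Minimises sum_i K((u_i - u0) / h) * (y_i - a * x_i)^2 over the scalar a.
    """
    num = 0.0
    den = 0.0
    for ui, xi, yi in zip(u, x, y):
        w = gaussian_kernel((ui - u0) / bandwidth)
        num += w * xi * yi
        den += w * xi * xi
    if den == 0.0:
        raise ValueError("no usable observations near u0")
    return num / den


# Synthetic, noise-free data with a(u) = 1 + u on an even grid over [0, 1].
u_values = [i / 20 for i in range(21)]
x_values = [1.0 if i % 2 == 0 else 2.0 for i in range(21)]
y_values = [(1.0 + ui) * xi for ui, xi in zip(u_values, x_values)]

a_hat = local_coefficient(u_values, x_values, y_values, u0=0.5, bandwidth=0.05)
print(round(a_hat, 6))  # close to the true value a(0.5) = 1.5
```

A local-linear fit with several regressors follows the same pattern but solves a small weighted least-squares system at each u0 instead of a scalar ratio.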
Funding: The National Natural Science Foundation of China (No. 51134024/E0422) provided financial support.
Abstract: Based on the stability of, and differences in, texture features between coal and rock, this study used digital image analysis to propose a coal–rock interface detection method. Using the gray level co-occurrence matrix, twenty-two texture features were extracted from images of coal and rock. The dimension of the feature space was reduced to four by feature selection according to a separability criterion based on inter-class mean difference and within-class scatter. The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm. In the optimized low-dimensional feature space, the coal–rock classifier was set up using the Fisher discriminant method. Using 10-fold cross-validation, the performance of the classifier was evaluated, and an average recognition rate of 94.12% was obtained. The results of comparative experiments show that the identification performance of the proposed method was superior to the texture description methods based on gray histogram and gradient histogram.
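A separability criterion of the kind described can be sketched as a per-feature Fisher-style score: the squared difference of class means divided by the sum of within-class variances. The abstract does not give the exact formula, so this particular ratio, the function names, and the toy feature values are assumptions made for illustration.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def fisher_score(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Squared inter-class mean difference over summed within-class variance."""
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        return math.inf if gap > 0 else 0.0
    return gap / spread


def select_features(coal: Sequence[Sequence[float]],
                    rock: Sequence[Sequence[float]],
                    keep: int) -> Tuple[List[int], List[float]]:
    """Rank feature columns by score and return the indices of the top `keep`."""
    n_features = len(coal[0])
    scores = [
        fisher_score([s[j] for s in coal], [s[j] for s in rock])
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Toy samples with three hypothetical texture features each.
coal_samples = [[0.10, 5.0, 1.0], [0.12, 5.2, 3.0], [0.11, 4.8, 2.0]]
rock_samples = [[0.30, 5.1, 2.5], [0.32, 4.9, 1.5], [0.31, 5.0, 2.0]]

selected, feature_scores = select_features(coal_samples, rock_samples, keep=1)
print(selected)  # feature 0 separates the two classes best in this toy data
```

In the study's setting the same ranking would be applied to the twenty-two GLCM features with `keep=4`.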
Funding: Support of the National Natural Science Foundation of China (31071678); the Major Scientific and Technological Special of Zhejiang Province, China (2010C12026); the Ningbo Science and Technology Project, China (201002C1011001); Xiangshan Science and Technology Project, China (2010C0001).
Abstract: Identification and counting of rice light-trap pests are important for monitoring rice pest population dynamics and making pest forecasts. Manual identification and counting is time-consuming and leads to fatigue and an increased error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. The system captures top and bottom images of each insect with two cameras to obtain more image features. A method is proposed for removing the background using the color difference between two images with and without pests. A total of 156 features, including color, shape and texture features of each pest, are extracted and fed into a support vector machine (SVM) classifier with a radial basis kernel function. Seven-fold cross-validation is used to improve the accuracy of pest identification. Four species of Lepidoptera rice pests are tested, and an average accuracy of 97.5% is achieved.
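The seven-fold cross-validation mentioned here partitions the samples into seven disjoint folds and holds each one out in turn. The sketch below shows only the index bookkeeping; the sample count, the shuffling seed, and the function name are assumptions, and the classifier training itself is omitted.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 70 insect images, split seven ways.
splits = list(k_fold_indices(n_samples=70, k=7))
for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

Averaging the per-fold accuracy over the seven held-out folds gives a figure comparable to the average rate reported in the abstract.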