Consider the nonparametric regression model Yni = g(xni) + eni, 1≤i≤n, where g is an unknown function to be estimated on [0,1], xni (1≤i≤n) are the fixed design points in the interval [0,1] and {eni,1≤i≤n} is a ...Consider the nonparametric regression model Yni = g(xni) + eni, 1≤i≤n, where g is an unknown function to be estimated on [0,1], xni (1≤i≤n) are the fixed design points in the interval [0,1] and {eni,1≤i≤n} is a triangular array of row iid random variables having median zero. The nearest neighbor median estimator gn. h (xni )=m (Yi(1)(n)…Yi(h)(n) is taken as the estimator of the unknown function g(x). Median cross validation (mcv) criterion is employed to select the smoothing parameter h . Let hn * be the smoothing parameter chosen by mcv criterion. Under mild regularity conditions, the upper and lower bounds of hn* , the rate of convergence and the weak consistency of the median cross-validated estimate gn,hn* (xni) are obtained.展开更多
To counteract small sample size,severe class imbalance and high feature redundancy in 90-day mRS prediction after stroke,this study proposes a four-stage pipeline-“ADASYN re-sampling→clinical+statistical feature scr...To counteract small sample size,severe class imbalance and high feature redundancy in 90-day mRS prediction after stroke,this study proposes a four-stage pipeline-“ADASYN re-sampling→clinical+statistical feature screening→dimensionality reduction→5-fold cross-validation”-and benchmark composite deep-learning architectures.ADASYN first balances the minority classes in the original feature space.Next,a tri-level filter(clinical domain knowledge,variance threshold,mutual information)removes clinically meaningless or redundant variables,after which PCA compresses the remaining features while preserving critical neurological signatures(e.g.,brain-herniation history).Four hybrid CNN-RNN models are trained and compared under strict 5-fold cross-validation;the optimal ensemble yields stable,clinically interpretable probabilities that can support individualized rehabilitation planning.展开更多
Heart disease remains a leading cause of mortality worldwide,emphasizing the urgent need for reliable and interpretable predictive models to support early diagnosis and timely intervention.However,existing Deep Learni...Heart disease remains a leading cause of mortality worldwide,emphasizing the urgent need for reliable and interpretable predictive models to support early diagnosis and timely intervention.However,existing Deep Learning(DL)approaches often face several limitations,including inefficient feature extraction,class imbalance,suboptimal classification performance,and limited interpretability,which collectively hinder their deployment in clinical settings.To address these challenges,we propose a novel DL framework for heart disease prediction that integrates a comprehensive preprocessing pipeline with an advanced classification architecture.The preprocessing stage involves label encoding and feature scaling.To address the issue of class imbalance inherent in the personal key indicators of the heart disease dataset,the localized random affine shadowsampling technique is employed,which enhances minority class representation while minimizing overfitting.At the core of the framework lies the Deep Residual Network(DeepResNet),which employs hierarchical residual transformations to facilitate efficient feature extraction and capture complex,non-linear relationships in the data.Experimental results demonstrate that the proposed model significantly outperforms existing techniques,achieving improvements of 3.26%in accuracy,3.16%in area under the receiver operating characteristics,1.09%in recall,and 1.07%in F1-score.Furthermore,robustness is validated using 10-fold crossvalidation,confirming the model’s generalizability across diverse data distributions.Moreover,model interpretability is ensured through the integration of Shapley additive explanations and local interpretable model-agnostic explanations,offering valuable insights into the contribution of individual features to model predictions.Overall,the proposed DL framework presents a robust,interpretable,and clinically applicable solution for heart disease prediction.展开更多
In deriving a regression model analysts often have to use variable selection, despite of problems introduced by data- dependent model building. Resampling approaches are proposed to handle some of the critical issues....In deriving a regression model analysts often have to use variable selection, despite of problems introduced by data- dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R2 of 0.50 and 0.71, we consider 4 scenarios with varying amount of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance level. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Beside of MSE we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse compared to the LASSO and BE models selected were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.展开更多
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced mach...Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced machine learning algorithm.To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM)based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. Vertical comparison with other cross-validation (CV) methods and lateral comparison with different fold times comprise the comparative approach. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed:one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBMHSCV model proposed here is effective at detecting incident causes. The improved model for imbalanced data categorization proposed may serve as a point of reference for similar data processing, and the model’s accurate identification of civil aviation incident causes can assist to improve civil aviation safety.展开更多
Background Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture.Plaque progression prediction is of fundamental significance to cardiovascular research and disease diagnosis,pre...Background Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture.Plaque progression prediction is of fundamental significance to cardiovascular research and disease diagnosis,prevention,and treatment.Generalized linear mixed models(GLMM)is an extension of linear model for categorical responses while considering the correlation among observations.Methods Magnetic resonance image(MRI)data of carotid atheroscleroticplaques were acquired from 20 patients with consent obtained and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction.Data for ten morphological and biomechanical risk factors included wall thickness(WT),lipid percent(LP),minimum cap thickness(MinCT),plaque area(PA),plaque burden(PB),lumen area(LA),maximum plaque wall stress(MPWS),maximum plaque wall strain(MPWSn),average plaque wall stress(APWS),and average plaque wall strain(APWSn)were extracted from all slices for analysis.Wall thickness increase(WTI),plaque burden increase(PBI)and plaque area increase(PAI) were chosen as three measures for plaque progression.Generalized linear mixed models(GLMM)with 5-fold cross-validation strategy were used to calculate prediction accuracy for each predictor and identify optimal predictor with the highest prediction accuracy defined as sum of sensitivity and specificity.All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup.The training subgroups were used for model fitting,and the verification subgroup was used to estimate the model.All combinations(total1023)of 10 risk factors were feed to GLMM and the prediction accuracy of each predictor were selected from the point on the ROC(receiver operating characteristic)curve with the highest sum of specificity and sensitivity.Results LA was the best single predictor for PBI with the highest prediction accuracy(1.360 1),and the area under of the ROC curve(AUC)is0.654 0,followed by APWSn(1.336 3)with AUC=0.6342.The optimal predictor among all possible combinations for PBI was the combination of LA,PA,LP,WT,MPWS and MPWSn with prediction accuracy=1.414 6(AUC=0.715 8).LA was once again the best single predictor for PAI with the highest prediction accuracy(1.184 6)with AUC=0.606 4,followed by MPWSn(1. 183 2)with AUC=0.6084.The combination of PA,PB,WT,MPWS,MPWSn and APWSn gave the best prediction accuracy(1.302 5)for PAI,and the AUC value is 0.6657.PA was the best single predictor for WTI with highest prediction accuracy(1.288 7)with AUC=0.641 5,followed by WT(1.254 0),with AUC=0.6097.The combination of PA,PB,WT,LP,MinCT,MPWS and MPWS was the best predictor for WTI with prediction accuracy as 1.314 0,with AUC=0.6552.This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%,4.01%and 1.96%over the best single predictors for PAI,PBI and WTI(AUC values improved by9.78%,9.45%,and 2.14%),respectively.Conclusions The use of GLMM with 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction.This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.展开更多
For the nonparametric regression model Y-ni = g(x(ni)) + epsilon(ni)i = 1, ..., n, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of g(x). When the threshold ...For the nonparametric regression model Y-ni = g(x(ni)) + epsilon(ni)i = 1, ..., n, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of g(x). When the threshold and truncation parameters are chosen by cross-validation on the everage squared error, strong consistency for the case of dyadic sample size and moment consistency for arbitrary sample size are established under some regular conditions.展开更多
Unlike the detection of marked on-street parking spaces,detecting unmarked spaces poses significant challenges due to the absence of clear physical demarcation and uneven gaps caused by irregular parking.In urban citi...Unlike the detection of marked on-street parking spaces,detecting unmarked spaces poses significant challenges due to the absence of clear physical demarcation and uneven gaps caused by irregular parking.In urban cities with heavy traffic flow,these challenges can result in traffic disruptions,rear-end collisions,sideswipes,and congestion as drivers struggle to make decisions.We propose a real-time detection system for on-street parking spaces using YOLO models and recommend the most suitable space based on KD-tree search.Lightweight versions of YOLOv5,YOLOv7-tiny,and YOLOv8 with different architectures are trained.Among the models,YOLOv5s with SPPF at the backbone achieved an F1-score of 0.89,which was selected for validation using k-fold cross-validation on our dataset.The Low variance and standard deviation recorded across folds indicate the model’s generalizability,reliability,and stability.Inference with KD-tree using predictions from the YOLO models recorded FPS of 37.9 for YOLOv5,67.2 for YOLOv7-tiny,and 67.0 for YOLOv8.The models successfully detect both marked and unmarked empty parking spaces on test data with varying inference speeds and FPS.These models can be efficiently deployed for real-time applications due to their high FPS,inference speed,and lightweight nature.In comparison with other state-of-the-art models,our models outperform them,further demonstrating their effectiveness.展开更多
Spartina alterniflora is now listed among the world’s 100 most dangerous invasive species,severely affecting the ecological balance of coastal wetlands.Remote sensing technologies based on deep learning enable large-...Spartina alterniflora is now listed among the world’s 100 most dangerous invasive species,severely affecting the ecological balance of coastal wetlands.Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora,but they require large datasets and have poor interpretability.A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery.Firstly,to get the high canopy cover and dense community characteristics of Spartina alterniflora,multi-dimensional shallow features are extracted from the imagery.Secondly,to detect different objects from satellite imagery,index features are extracted,and the statistical features of the Gray-Level Co-occurrence Matrix(GLCM)are derived using principal component analysis.Then,ensemble learning methods,including random forest,extreme gradient boosting,and light gradient boosting machine models,are employed for image classification.Meanwhile,Recursive Feature Elimination with Cross-Validation(RFECV)is used to select the best feature subset.Finally,to enhance the interpretability of the models,the best features are utilized to classify multi-temporal images and SHapley Additive exPlanations(SHAP)is combined with these classifications to explain the model prediction process.The method is validated by using Sentinel-2 imageries and previous observations of Spartina alterniflora in Chongming Island,it is found that the model combining image texture features such as GLCM covariance can significantly improve the detection accuracy of Spartina alterniflora by about 8%compared with the model without image texture features.Through multiple model comparisons and feature selection via RFECV,the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods,proving that feature reduction can effectively enhance model generalization.Additionally,visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.展开更多
Identification and counting of rice light-trap pests are important to monitor rice pest population dynamics and make pest forecast. Identification and counting of rice light-trap pests manually is time-consuming, and ...Identification and counting of rice light-trap pests are important to monitor rice pest population dynamics and make pest forecast. Identification and counting of rice light-trap pests manually is time-consuming, and leads to fatigue and an increase in the error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. This system can capture the top and bottom images of each insect by two cameras to obtain more image features. A method is proposed for removing the background by color difference of two images with pests and non-pests. 156 features including color, shape and texture features of each pest are extracted into an support vector machine (SVM) classifier with radial basis kernel function. The seven-fold cross-validation is used to improve the accurate rate of pest identification. Four species of Lepidoptera rice pests are tested and achieved 97.5% average accurate rate.展开更多
Topomer CoMFA models have been used to optimize the potency of 15 biologically active acridone derivatives se- lected from the literature. Their 3D chemical structures were sliced into three acyclic R groups, to produ...Topomer CoMFA models have been used to optimize the potency of 15 biologically active acridone derivatives se- lected from the literature. Their 3D chemical structures were sliced into three acyclic R groups, to produce a fragment that is present in each training set. The analysis was successful with 3 as the number of components that provided the highest q2 results: q2 is 0.56, which is the cross-validated coefficient for the specified number of components, giving rise to 0.37 standard error of estimate (q2 stderr), and a conventional coefficient (r2) of 0.82, whose standard error of estimate is 0.24. These results provide structure-activity relationship (sar) among the compounds. The result of the To-pomer CoMFA studies was used to design novel derivatives for future studies.展开更多
Based on the stability and inequality of texture features between coal and rock,this study used the digital image analysis technique to propose a coal–rock interface detection method.By using gray level co-occurrence...Based on the stability and inequality of texture features between coal and rock,this study used the digital image analysis technique to propose a coal–rock interface detection method.By using gray level co-occurrence matrix,twenty-two texture features were extracted from the images of coal and rock.Data dimension of the feature space reduced to four by feature selection,which was according to a separability criterion based on inter-class mean difference and within-class scatter.The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm.In the optimized low-dimensional feature space,the coal–rock classifer was set up using the fsher discriminant method.Using the 10-fold cross-validation technique,the performance of the classifer was evaluated,and an average recognition rate of 94.12%was obtained.The results of comparative experiments show that the identifcation performance of the proposed method was superior to the texture description method based on gray histogram and gradient histogram.展开更多
基金Project supported by the National Natural Science Foundation of China and the Doctoral Foundation of Education of China.
文摘Consider the nonparametric regression model Yni = g(xni) + eni, 1≤i≤n, where g is an unknown function to be estimated on [0,1], xni (1≤i≤n) are the fixed design points in the interval [0,1] and {eni,1≤i≤n} is a triangular array of row iid random variables having median zero. The nearest neighbor median estimator gn. h (xni )=m (Yi(1)(n)…Yi(h)(n) is taken as the estimator of the unknown function g(x). Median cross validation (mcv) criterion is employed to select the smoothing parameter h . Let hn * be the smoothing parameter chosen by mcv criterion. Under mild regularity conditions, the upper and lower bounds of hn* , the rate of convergence and the weak consistency of the median cross-validated estimate gn,hn* (xni) are obtained.
基金Shanghai University of Engineering Science Undergraduate Innovation Training Program(Project No.:cx2521005)。
文摘To counteract small sample size,severe class imbalance and high feature redundancy in 90-day mRS prediction after stroke,this study proposes a four-stage pipeline-“ADASYN re-sampling→clinical+statistical feature screening→dimensionality reduction→5-fold cross-validation”-and benchmark composite deep-learning architectures.ADASYN first balances the minority classes in the original feature space.Next,a tri-level filter(clinical domain knowledge,variance threshold,mutual information)removes clinically meaningless or redundant variables,after which PCA compresses the remaining features while preserving critical neurological signatures(e.g.,brain-herniation history).Four hybrid CNN-RNN models are trained and compared under strict 5-fold cross-validation;the optimal ensemble yields stable,clinically interpretable probabilities that can support individualized rehabilitation planning.
基金funded by Ongoing Research Funding Program for Project number(ORF-2025-648),King Saud University,Riyadh,Saudi Arabia.
文摘Heart disease remains a leading cause of mortality worldwide,emphasizing the urgent need for reliable and interpretable predictive models to support early diagnosis and timely intervention.However,existing Deep Learning(DL)approaches often face several limitations,including inefficient feature extraction,class imbalance,suboptimal classification performance,and limited interpretability,which collectively hinder their deployment in clinical settings.To address these challenges,we propose a novel DL framework for heart disease prediction that integrates a comprehensive preprocessing pipeline with an advanced classification architecture.The preprocessing stage involves label encoding and feature scaling.To address the issue of class imbalance inherent in the personal key indicators of the heart disease dataset,the localized random affine shadowsampling technique is employed,which enhances minority class representation while minimizing overfitting.At the core of the framework lies the Deep Residual Network(DeepResNet),which employs hierarchical residual transformations to facilitate efficient feature extraction and capture complex,non-linear relationships in the data.Experimental results demonstrate that the proposed model significantly outperforms existing techniques,achieving improvements of 3.26%in accuracy,3.16%in area under the receiver operating characteristics,1.09%in recall,and 1.07%in F1-score.Furthermore,robustness is validated using 10-fold crossvalidation,confirming the model’s generalizability across diverse data distributions.Moreover,model interpretability is ensured through the integration of Shapley additive explanations and local interpretable model-agnostic explanations,offering valuable insights into the contribution of individual features to model predictions.Overall,the proposed DL framework presents a robust,interpretable,and clinically applicable solution for heart disease prediction.
文摘In deriving a regression model analysts often have to use variable selection, despite of problems introduced by data- dependent model building. Resampling approaches are proposed to handle some of the critical issues. In order to assess and compare several strategies, we will conduct a simulation study with 15 predictors and a complex correlation structure in the linear regression model. Using sample sizes of 100 and 400 and estimates of the residual variance corresponding to R2 of 0.50 and 0.71, we consider 4 scenarios with varying amount of information. We also consider two examples with 24 and 13 predictors, respectively. We will discuss the value of cross-validation, shrinkage and backward elimination (BE) with varying significance level. We will assess whether 2-step approaches using global or parameterwise shrinkage (PWSF) can improve selected models and will compare results to models derived with the LASSO procedure. Beside of MSE we will use model sparsity and further criteria for model assessment. The amount of information in the data has an influence on the selected models and the comparison of the procedures. None of the approaches was best in all scenarios. The performance of backward elimination with a suitably chosen significance level was not worse compared to the LASSO and BE models selected were much sparser, an important advantage for interpretation and transportability. Compared to global shrinkage, PWSF had better performance. Provided that the amount of information is not too small, we conclude that BE followed by PWSF is a suitable approach when variable selection is a key part of data analysis.
基金supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110)Research on the Dual Prevention Mechanism and Intelligent Management Technology f or Civil Aviation Safety Risks (YK23-03-05)。
文摘Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced machine learning algorithm.To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM)based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. Vertical comparison with other cross-validation (CV) methods and lateral comparison with different fold times comprise the comparative approach. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed:one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBMHSCV model proposed here is effective at detecting incident causes. The improved model for imbalanced data categorization proposed may serve as a point of reference for similar data processing, and the model’s accurate identification of civil aviation incident causes can assist to improve civil aviation safety.
基金supported in part by National Sciences Foundation of China grant ( 11672001)Jiangsu Province Science and Technology Agency grant ( BE2016785)supported in part by Postgraduate Research & Practice Innovation Program of Jiangsu Province grant ( KYCX18_0156)
文摘Background Cardiovascular diseases are closely linked to atherosclerotic plaque development and rupture.Plaque progression prediction is of fundamental significance to cardiovascular research and disease diagnosis,prevention,and treatment.Generalized linear mixed models(GLMM)is an extension of linear model for categorical responses while considering the correlation among observations.Methods Magnetic resonance image(MRI)data of carotid atheroscleroticplaques were acquired from 20 patients with consent obtained and 3D thin-layer models were constructed to calculate plaque stress and strain for plaque progression prediction.Data for ten morphological and biomechanical risk factors included wall thickness(WT),lipid percent(LP),minimum cap thickness(MinCT),plaque area(PA),plaque burden(PB),lumen area(LA),maximum plaque wall stress(MPWS),maximum plaque wall strain(MPWSn),average plaque wall stress(APWS),and average plaque wall strain(APWSn)were extracted from all slices for analysis.Wall thickness increase(WTI),plaque burden increase(PBI)and plaque area increase(PAI) were chosen as three measures for plaque progression.Generalized linear mixed models(GLMM)with 5-fold cross-validation strategy were used to calculate prediction accuracy for each predictor and identify optimal predictor with the highest prediction accuracy defined as sum of sensitivity and specificity.All 201 MRI slices were randomly divided into 4 training subgroups and 1 verification subgroup.The training subgroups were used for model fitting,and the verification subgroup was used to estimate the model.All combinations(total1023)of 10 risk factors were feed to GLMM and the prediction accuracy of each predictor were selected from the point on the ROC(receiver operating characteristic)curve with the highest sum of specificity and sensitivity.Results LA was the best single predictor for PBI with the highest prediction accuracy(1.360 1),and the area under of the ROC curve(AUC)is0.654 0,followed by APWSn(1.336 3)with AUC=0.6342.The optimal predictor among all possible combinations for PBI was the combination of LA,PA,LP,WT,MPWS and MPWSn with prediction accuracy=1.414 6(AUC=0.715 8).LA was once again the best single predictor for PAI with the highest prediction accuracy(1.184 6)with AUC=0.606 4,followed by MPWSn(1. 183 2)with AUC=0.6084.The combination of PA,PB,WT,MPWS,MPWSn and APWSn gave the best prediction accuracy(1.302 5)for PAI,and the AUC value is 0.6657.PA was the best single predictor for WTI with highest prediction accuracy(1.288 7)with AUC=0.641 5,followed by WT(1.254 0),with AUC=0.6097.The combination of PA,PB,WT,LP,MinCT,MPWS and MPWS was the best predictor for WTI with prediction accuracy as 1.314 0,with AUC=0.6552.This indicated that PBI was a more predictable measure than WTI and PAI. The combinational predictors improved prediction accuracy by 9.95%,4.01%and 1.96%over the best single predictors for PAI,PBI and WTI(AUC values improved by9.78%,9.45%,and 2.14%),respectively.Conclusions The use of GLMM with 5-fold cross-validation strategy combining both morphological and biomechanical risk factors could potentially improve the accuracy of carotid plaque progression prediction.This study suggests that a linear combination of multiple predictors can provide potential improvement to existing plaque assessment schemes.
文摘For the nonparametric regression model Y-ni = g(x(ni)) + epsilon(ni)i = 1, ..., n, with regularly spaced nonrandom design, the authors study the behavior of the nonlinear wavelet estimator of g(x). When the threshold and truncation parameters are chosen by cross-validation on the everage squared error, strong consistency for the case of dyadic sample size and moment consistency for arbitrary sample size are established under some regular conditions.
基金supports this paper.Project Nos.NSTC-112-2221-E-324-003 MY3,NSTC-111-2622-E-324-002 and NSTC-112-2221-E-324-011-MY2.
文摘Unlike the detection of marked on-street parking spaces,detecting unmarked spaces poses significant challenges due to the absence of clear physical demarcation and uneven gaps caused by irregular parking.In urban cities with heavy traffic flow,these challenges can result in traffic disruptions,rear-end collisions,sideswipes,and congestion as drivers struggle to make decisions.We propose a real-time detection system for on-street parking spaces using YOLO models and recommend the most suitable space based on KD-tree search.Lightweight versions of YOLOv5,YOLOv7-tiny,and YOLOv8 with different architectures are trained.Among the models,YOLOv5s with SPPF at the backbone achieved an F1-score of 0.89,which was selected for validation using k-fold cross-validation on our dataset.The Low variance and standard deviation recorded across folds indicate the model’s generalizability,reliability,and stability.Inference with KD-tree using predictions from the YOLO models recorded FPS of 37.9 for YOLOv5,67.2 for YOLOv7-tiny,and 67.0 for YOLOv8.The models successfully detect both marked and unmarked empty parking spaces on test data with varying inference speeds and FPS.These models can be efficiently deployed for real-time applications due to their high FPS,inference speed,and lightweight nature.In comparison with other state-of-the-art models,our models outperform them,further demonstrating their effectiveness.
基金The National Key Research and Development Program of China under contract No.2023YFC3008204the National Natural Science Foundation of China under contract Nos 41977302 and 42476217.
文摘Spartina alterniflora is now listed among the world’s 100 most dangerous invasive species,severely affecting the ecological balance of coastal wetlands.Remote sensing technologies based on deep learning enable large-scale monitoring of Spartina alterniflora,but they require large datasets and have poor interpretability.A new method is proposed to detect Spartina alterniflora from Sentinel-2 imagery.Firstly,to get the high canopy cover and dense community characteristics of Spartina alterniflora,multi-dimensional shallow features are extracted from the imagery.Secondly,to detect different objects from satellite imagery,index features are extracted,and the statistical features of the Gray-Level Co-occurrence Matrix(GLCM)are derived using principal component analysis.Then,ensemble learning methods,including random forest,extreme gradient boosting,and light gradient boosting machine models,are employed for image classification.Meanwhile,Recursive Feature Elimination with Cross-Validation(RFECV)is used to select the best feature subset.Finally,to enhance the interpretability of the models,the best features are utilized to classify multi-temporal images and SHapley Additive exPlanations(SHAP)is combined with these classifications to explain the model prediction process.The method is validated by using Sentinel-2 imageries and previous observations of Spartina alterniflora in Chongming Island,it is found that the model combining image texture features such as GLCM covariance can significantly improve the detection accuracy of Spartina alterniflora by about 8%compared with the model without image texture features.Through multiple model comparisons and feature selection via RFECV,the selected model and eight features demonstrated good classification accuracy when applied to data from different time periods,proving that feature reduction can effectively enhance model generalization.Additionally,visualizing model decisions using SHAP revealed that the image texture feature component_1_GLCMVariance is particularly important for identifying each land cover type.
基金support of the National Natural Science Foundation of China (31071678)the Major Scientific and Technological Special of Zhejiang Province, China (2010C12026)+1 种基金the Ningbo Science and Technology Project, China (201002C1011001)Xiangshan Science and Technology Project, China(2010C0001)
文摘Identification and counting of rice light-trap pests are important to monitor rice pest population dynamics and make pest forecast. Identification and counting of rice light-trap pests manually is time-consuming, and leads to fatigue and an increase in the error rate. A rice light-trap insect imaging system is developed to automate rice pest identification. This system can capture the top and bottom images of each insect by two cameras to obtain more image features. A method is proposed for removing the background by color difference of two images with pests and non-pests. 156 features including color, shape and texture features of each pest are extracted into an support vector machine (SVM) classifier with radial basis kernel function. The seven-fold cross-validation is used to improve the accurate rate of pest identification. Four species of Lepidoptera rice pests are tested and achieved 97.5% average accurate rate.
文摘Topomer CoMFA models have been used to optimize the potency of 15 biologically active acridone derivatives se- lected from the literature. Their 3D chemical structures were sliced into three acyclic R groups, to produce a fragment that is present in each training set. The analysis was successful with 3 as the number of components that provided the highest q2 results: q2 is 0.56, which is the cross-validated coefficient for the specified number of components, giving rise to 0.37 standard error of estimate (q2 stderr), and a conventional coefficient (r2) of 0.82, whose standard error of estimate is 0.24. These results provide structure-activity relationship (sar) among the compounds. The result of the To-pomer CoMFA studies was used to design novel derivatives for future studies.
基金the National Natural Science Foundation of China(No.51134024/E0422)for the financial support
文摘Based on the stability and inequality of texture features between coal and rock,this study used the digital image analysis technique to propose a coal–rock interface detection method.By using gray level co-occurrence matrix,twenty-two texture features were extracted from the images of coal and rock.Data dimension of the feature space reduced to four by feature selection,which was according to a separability criterion based on inter-class mean difference and within-class scatter.The experimental results show that the optimized features were effective in improving the separability of the samples and reducing the time complexity of the algorithm.In the optimized low-dimensional feature space,the coal–rock classifer was set up using the fsher discriminant method.Using the 10-fold cross-validation technique,the performance of the classifer was evaluated,and an average recognition rate of 94.12%was obtained.The results of comparative experiments show that the identifcation performance of the proposed method was superior to the texture description method based on gray histogram and gradient histogram.