Funding: Supported by Development of asparagus price database based on agricultural big data (381724).
Abstract: Asparagus stem blight is a devastating crop disease, and early detection of its pathogenic spores is essential for effective disease control and prevention. However, spore detection is still hindered by complex backgrounds, small target sizes, and high annotation costs, which limit its practical application and widespread adoption. To address these issues, a semi-supervised spore detection framework is proposed for use under complex background conditions. First, a difficulty perception scoring function is designed to quantify the detection difficulty of each image region. A masking strategy is applied to regions with higher difficulty scores, while adversarial augmentation is applied to the remaining regions, encouraging the model to learn from challenging areas more effectively. Second, a Gaussian Mixture Model is employed to dynamically adjust the allocation threshold for pseudo-labels, thereby reducing the influence of unreliable supervision signals and enhancing the stability of semi-supervised learning. Finally, the Wasserstein distance is introduced for object localization refinement, offering a more robust positioning approach. Experimental results demonstrate that the proposed framework achieves 88.9% mAP50 and 60.7% mAP50-95, surpassing the baseline method by 4.2% and 4.6%, respectively, using only 10% of labeled data. Compared with other state-of-the-art semi-supervised detection models, the proposed method exhibits superior detection accuracy and robustness. In conclusion, the framework not only offers an efficient and reliable solution for plant pathogen spore detection but also provides strong algorithmic support for real-time spore detection and early disease warning systems, with significant engineering application potential.
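The abstract does not say how the Wasserstein distance enters the localization step. One common way to use it for small objects, assumed here purely for illustration, is to model each box (cx, cy, w, h) as a 2-D Gaussian with mean (cx, cy) and standard deviations (w/2, h/2); the 2-Wasserstein distance between two such Gaussians then has a closed form. The function names and the `scale` constant below are hypothetical and are not taken from the paper.

```python
import math

def box_to_gaussian(box):
    """Model a box (cx, cy, w, h) as a Gaussian: mean (cx, cy), std devs (w/2, h/2)."""
    cx, cy, w, h = box
    return (cx, cy, w / 2.0, h / 2.0)

def wasserstein2(box_a, box_b):
    """2-Wasserstein distance between the axis-aligned Gaussians of two boxes.

    For diagonal covariances this reduces to the Euclidean distance between
    the stacked (mean, std dev) vectors.
    """
    ga, gb = box_to_gaussian(box_a), box_to_gaussian(box_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ga, gb)))

def wasserstein_similarity(box_a, box_b, scale=12.0):
    """Map the distance into (0, 1]; `scale` is an assumed, dataset-dependent constant."""
    return math.exp(-wasserstein2(box_a, box_b) / scale)

# Two non-overlapping 2x2 boxes: IoU would be 0, but the distance stays informative.
d = wasserstein2((0.0, 0.0, 2.0, 2.0), (3.0, 4.0, 2.0, 2.0))
```

Unlike IoU, this measure varies smoothly even when two small boxes do not overlap at all, which is the usual argument for using it on tiny targets such as spores.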
Funding: Under the auspices of the National Natural Science Foundation of China (No. 40671133) and the Fundamental Research Funds for the Central Universities (No. GK200902015).
Abstract: This paper proposed a semi-supervised regression model with a co-training algorithm based on support vector machines, which was used for retrieving water quality variables from SPOT 5 remote sensing data. The model consisted of two support vector regressors (SVRs). The nonlinear relationship between water quality variables and the SPOT 5 spectrum was described by the two SVRs, and a semi-supervised co-training algorithm for the SVRs was established. The model was used for retrieving concentrations of four representative pollution indicators of the Weihe River in Shaanxi Province, China: permanganate index (CODmn), ammonia nitrogen (NH3-N), chemical oxygen demand (COD) and dissolved oxygen (DO). A spatial distribution map of those variables over a part of the Weihe River was also produced. SVR can readily implement any nonlinear mapping, and semi-supervised learning can make use of both labeled and unlabeled samples. By integrating the two SVRs and using semi-supervised learning, we provide an operational method for cases where paired samples are limited. The results show that the method performs much better than multiple statistical regression, can quickly provide a complete picture of water pollution conditions for management, and can be extended to hyperspectral remote sensing applications.
Funding: Supported by the National Key R&D Program of China under Grant Nos. 2022YFA1003800 and 2022YFA1003703; the National Natural Science Foundation of China under Grant Nos. 12531011, 12231011 and 12471255; the Natural Science Foundation of Shanghai under Grant No. 23ZR1419400; the Fundamental Research Funds for the Central Universities under Grant No. 63253110; the China Postdoctoral Science Foundation General Funding Program under Grant No. 2025M773079; and the 2025 Annual Planning Project of the Commerce Statistical Society of China under Grant No. 2025STY115.
Abstract: Model checking evaluates whether a statistical model faithfully captures the underlying data-generating process. Classical tests, such as local-smoothing and empirical-process methods, break down in high dimensions. More recent approaches use predictiveness comparisons with flexible machine-learning model fitting procedures to yield algorithm-agnostic tests, yet they require large labeled samples. The authors introduce a prediction-powered, semi-supervised framework that: 1) imputes responses for unlabeled data via a pretrained model; 2) corrects imputation bias with a rectifier calibrated on labeled data; 3) adaptively balances these components through a data-driven power-tuning parameter. Building on algorithm-agnostic out-of-sample predictiveness comparisons, the proposed method integrates unlabeled information to enhance power. Theoretical analyses and numerical results demonstrate that the proposed test controls Type I error and substantially improves power over fully supervised counterparts, even under imputation-model misspecification.
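The three ingredients listed in the abstract (imputation, rectifier, power-tuning parameter) can be illustrated on the simplest possible target, the mean of a response. The sketch below is not the authors' test statistic; it only assumes that predictions `f_lab` and `f_unlab` come from some pretrained model, and it uses the weight that minimizes the variance of this particular mean estimator when labeled and unlabeled samples are independent. All names, and the clipping of the weight to [0, 1], are assumptions made for the example.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def tuned_lambda(y_lab, f_lab, n_unlab):
    """Variance-minimizing weight: Cov(Y, f) / ((1 + n/N) * Var(f)), clipped to [0, 1]."""
    n = len(y_lab)
    var_f = covariance(f_lab, f_lab)
    if var_f == 0.0:
        return 0.0  # constant predictions carry no information
    lam = covariance(y_lab, f_lab) / ((1.0 + n / n_unlab) * var_f)
    return min(1.0, max(0.0, lam))

def prediction_powered_mean(y_lab, f_lab, f_unlab, lam=None):
    """Estimate E[Y] from labeled responses plus model predictions on unlabeled data."""
    if lam is None:
        lam = tuned_lambda(y_lab, f_lab, len(f_unlab))
    imputed = mean(f_unlab)                                         # step 1: imputation
    rectifier = mean([y - lam * f for y, f in zip(y_lab, f_lab)])   # step 2: bias correction
    return lam * imputed + rectifier                                # step 3: lam balances both

# Toy data: the model over-predicts every labeled response by 0.5.
y_lab = [1.0, 2.0, 3.0, 4.0]
f_lab = [1.5, 2.5, 3.5, 4.5]
f_unlab = [3.0, 4.0, 5.0, 6.0]
estimate = prediction_powered_mean(y_lab, f_lab, f_unlab)
```

With `lam=0` the estimator ignores the unlabeled data and reduces to the labeled-sample mean; with `lam=1` it fully trusts the imputed responses and subtracts the bias measured on the labeled set.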
Funding: Supported in part by the National Key R&D Program of China under Grant 2018YFA0701601, in part by the National Natural Science Foundation of China (Grant Nos. U22A2002, 61941104, 62201605), and in part by the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute.
Abstract: In the upcoming large-scale Internet of Things (IoT), it is increasingly challenging to defend against malicious traffic, due to the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach to detect malicious traffic at the access side. It overcomes the resource-bottleneck problem of traditional malicious traffic defenders, which are deployed at the victim side, and is also free of labeled traffic data in model training. Specifically, we design a coarse-grained behavior model of IoT devices by self-supervised learning with unlabeled traffic data. Then, we fine-tune this model to improve its accuracy in malicious traffic detection by adopting a transfer learning method using a small amount of labeled data. Experimental results show that our method can achieve an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data based on the CICDDoS2019 dataset. Moreover, our method outperforms the state-of-the-art supervised learning-based methods in terms of accuracy, precision, recall and F1-score with 1% of the training data.
Funding: Project supported by the National High-Technology Research and Development Program of China (Grant No. 863-2001AA602021).
Abstract: This paper proposes a new search strategy using the mutative scale chaos optimization algorithm (MSCO) for model selection of support vector machines (SVM). It searches the parameter space of the SVM with very high efficiency and finds the optimum parameter setting for a practical classification problem at very low time cost. To demonstrate the performance of the proposed method, it is applied to model selection of SVM in ultrasonic flaw classification and compared with grid search. Experimental results show that MSCO is a very powerful tool for model selection of SVM, and outperforms grid search in search speed and precision in ultrasonic flaw classification.
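The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea behind a mutative scale chaos search: candidate points are generated by a chaotic sequence (here the logistic map with r = 4), and after each stage the search range is narrowed around the best point found so far. The objective is a toy stand-in for a cross-validation error over (log2 C, log2 gamma); the seeds, stage counts, shrink factor, and all function names are assumptions, not values from the paper.

```python
def logistic(x):
    """One step of the logistic map in its chaotic regime (r = 4)."""
    return 4.0 * x * (1.0 - x)

def chaos_search(objective, bounds, stages=6, iters=500, shrink=0.5):
    """Minimize `objective` over the box `bounds`, shrinking the box around the incumbent."""
    # One chaotic variable per dimension, started from distinct (assumed) seeds.
    state = [(0.123 + 0.317 * i) % 1.0 for i in range(len(bounds))]
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    best_x, best_f = None, float("inf")
    for _ in range(stages):
        for _ in range(iters):
            state = [logistic(s) for s in state]
            # Map the chaotic values from (0, 1) onto the current search range.
            x = [l + s * (h - l) for s, l, h in zip(state, lo, hi)]
            f = objective(x)
            if f < best_f:
                best_x, best_f = x, f
        # Mutative scale: narrow each range around the best point, clipped to the original box.
        for i, (b_lo, b_hi) in enumerate(bounds):
            half = shrink * (hi[i] - lo[i]) / 2.0
            lo[i] = max(b_lo, best_x[i] - half)
            hi[i] = min(b_hi, best_x[i] + half)
    return best_x, best_f

def toy_cv_error(params):
    """Toy surrogate with its minimum at log2 C = 2, log2 gamma = -1 (not real SVM data)."""
    log2_c, log2_gamma = params
    return (log2_c - 2.0) ** 2 + (log2_gamma + 1.0) ** 2

search_box = [(-5.0, 15.0), (-15.0, 3.0)]
best_params, best_error = chaos_search(toy_cv_error, search_box)
```

In an actual model-selection run, `toy_cv_error` would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error, which is where nearly all of the run time would go.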
Funding: Supported by the Cultivation Fund of the Key Scientific and Technical Innovation Project from the Ministry of Education of China (No. 706024), the International Science Cooperation Foundation of Shanghai (No. 061307041), and the Excellent Youth Foundation of Shanghai (No. 07A212).
Abstract: Owing to the radical changes in the Chinese economy, it is essential to build an effective financial distress prediction model. In this paper, we present a genetic algorithm (GA) approach for optimizing the parameters of a support vector machine (SVM). We validate the proposed model on datasets from the Chinese high-tech manufacturing industry. Experimental results reveal that the proposed GA-SVM model is comparable to, and can even outperform, other existing classifiers. Compared with the grid-search algorithm, the proposed GA-based approach takes less time to optimize the SVM parameters without degrading the prediction accuracy of the SVM.
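As a rough illustration of GA-based parameter search, the sketch below evolves real-valued (log2 C, log2 gamma) pairs with elitism, tournament selection, blend crossover, and Gaussian mutation. The abstract does not describe the encoding or operators the authors used, so every design choice here is an assumption, and the fitness function is a toy surrogate instead of an SVM trained on financial data.

```python
import random

def ga_minimize(objective, bounds, pop_size=40, generations=60, seed=0):
    """Minimize `objective` over the box `bounds` with a small real-coded GA."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    def clip(x):
        return [min(hi, max(lo, v)) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=objective)
        nxt = ranked[:2]  # elitism: the two best individuals survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = min(rng.sample(ranked, 3), key=objective)
            p2 = min(rng.sample(ranked, 3), key=objective)
            # Blend crossover followed by Gaussian mutation scaled to each range.
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0.0, 0.05 * (hi - lo))
                     for g, (lo, hi) in zip(child, bounds)]
            nxt.append(clip(child))
        pop = nxt
    best = min(pop, key=objective)
    return best, objective(best)

def toy_validation_error(params):
    """Toy surrogate with its minimum at log2 C = 2, log2 gamma = -1 (not real SVM data)."""
    log2_c, log2_gamma = params
    return (log2_c - 2.0) ** 2 + (log2_gamma + 1.0) ** 2

param_box = [(-5.0, 15.0), (-15.0, 3.0)]
ga_best, ga_error = ga_minimize(toy_validation_error, param_box)
```

This sketch re-evaluates the objective freely for clarity; with a real SVM each evaluation means a training run, so fitness values would normally be cached per individual.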