Accurate soft sensing of the mechanical properties of hot-rolled strips is essential to ensure product quality, optimize production, and reduce costs. However, it is hampered by the limited number of labeled samples, for which co-training-based semi-supervised learning offers a potential solution. This paper therefore proposes a novel soft sensing method for mechanical properties based on improved co-training (ICO). Compared with the existing co-training framework, ICO introduces improvements in multiple view partition, confidence estimation, and pseudo-label assignment. Specifically, (i) in the multiple view partition stage, ICO integrates the metallurgical mechanisms of hot rolling processes with statistical mutual information to balance view sufficiency and independence, which improves model performance and interpretability; (ii) in the confidence estimation stage, ICO evaluates the confidence of unlabeled samples at the cluster level rather than at the level of a single sample, which facilitates the exploration of the sample distribution and the selection of representative samples; (iii) in the pseudo-label assignment stage, ICO adopts a safe pseudo-label algorithm (called SAFER by its author and originally applied to single samples) to assign pseudo-labels to the cluster of samples with the highest confidence determined in the previous stage, combining the benefit of handling unlabeled samples at the cluster level with the benefit of SAFER in enhancing pseudo-label quality. The proposed soft sensing method effectively predicts mechanical properties on a real hot rolling dataset, achieving an improvement of approximately 5% in R² compared to traditional supervised learning.
To address the steering instability and hysteresis of agricultural robots during movement, a fusion PID control method combining particle swarm optimization (PSO) and a genetic algorithm (GA) was proposed. The fusion algorithm uses the fast optimization ability of PSO to improve the population screening step of GA. Simulink simulation results showed that the fitness function of the fusion algorithm converged faster, the system response adjustment time was reduced, and the overshoot was almost zero. The algorithm was then applied to steering tests of an agricultural robot in various scenes. After the steering system of the agricultural robot was modeled, steering tests in the unloaded, suspended state showed that PID control based on the fusion algorithm reduced the rise time, response adjustment time, and overshoot of the system, and improved its response speed and stability, compared with manual trial-and-error PID control and GA-based PID control. Road steering tests showed that the response rise time of PID control based on the fusion algorithm was the shortest, about 4.43 s. When the target pulse number was set to 100, the actual mean value in the steady-state regulation stage was about 102.9, the closest to the target value among the three control methods, and the overshoot was reduced at the same time. Steering tests under various scene states showed that PID control based on the proposed fusion algorithm had good anti-interference ability, adapted to changes in environment and load, and improved the performance of the control system. The method was effective for the steering control of the agricultural robot and can provide a reference for the precise steering control of other robots.
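As a rough illustration of what a PSO or GA search would be tuning in the abstract above, the sketch below simulates a discrete PID loop and scores a gain set with the ITAE criterion. The first-order plant, the gain values, and the choice of ITAE as the fitness function are all assumptions made for illustration; the abstract does not state the plant model or the fitness function actually used.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (positional form)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and take a backward-difference derivative.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float = 100.0, tau: float = 0.5,
             dt: float = 0.01, steps: int = 2000) -> list:
    """Closed-loop step response on a hypothetical first-order plant
    tau * dy/dt = -y + u, integrated with forward Euler."""
    y = 0.0
    trace = []
    for _ in range(steps):
        u = pid.step(setpoint - y, dt)
        y += dt * (-y + u) / tau
        trace.append(y)
    return trace


def itae(trace: list, setpoint: float, dt: float) -> float:
    """Integral of time-weighted absolute error; lower is better.
    An optimizer would minimize this over (kp, ki, kd)."""
    return sum((k + 1) * dt * abs(setpoint - y) * dt for k, y in enumerate(trace))


# Hypothetical gain set, e.g. one candidate particle or chromosome.
candidate = PID(kp=2.0, ki=5.0, kd=0.01)
response = simulate(candidate)
fitness = itae(response, setpoint=100.0, dt=0.01)
```

A swarm or population of `(kp, ki, kd)` triples would each be scored this way, with the optimizer keeping the candidates whose fitness is lowest.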
Accurate prediction of flood events is important for flood control and risk management. Machine learning techniques have contributed greatly to advances in flood prediction, and existing studies have mainly focused on predicting flood resource variables using single or hybrid machine learning techniques. However, class-based flood prediction, which can aid in quickly diagnosing comprehensive flood characteristics and proposing targeted management strategies, has rarely been investigated. This study proposed an approach for predicting flood regime metrics and event classes that couples machine learning algorithms with clustering-deduced membership degrees. Five algorithms were adopted for this exploration. Results showed that the class membership degrees accurately determined event classes, with class hit rates up to 100% compared with the four classes clustered from nine regime metrics. The nonlinear algorithms (Multiple Linear Regression, Random Forest, and least squares-Support Vector Machine) outperformed the linear techniques (Multiple Linear Regression and Stepwise Regression) in predicting flood regime metrics. The proposed approach predicted flood event classes well, with average class hit rates of 66.0%-85.4% and 47.2%-76.0% in the calibration and validation periods, respectively, particularly for slow and late flood events. The predictive capability of the proposed approach for flood regime metrics and classes was considerably stronger than that of the hydrological modeling approach.
This paper proposes an equivalent modeling method for photovoltaic (PV) power stations based on a particle swarm optimization (PSO) K-means clustering (KMC) algorithm with passive filter parameter clustering, to address the complexity, simulation time cost, and convergence problems of detailed PV power station models. First, the amplitude-frequency curves of different filter parameters are analyzed. Based on the results, a grouping parameter set characterizing the external filter characteristics is established, and these parameters are then defined as the clustering parameters. A single PV inverter model is established as a prerequisite. The proposed equivalent method combines the global search capability of PSO with the rapid convergence of KMC, effectively overcoming the tendency of KMC to become trapped in local optima. This approach enhances both clustering accuracy and numerical stability when determining equivalence for PV inverter units. Using the proposed clustering method, both a detailed PV power station model and an equivalent model are developed and compared. Simulation and hardware-in-the-loop (HIL) results based on the equivalent model verify that the equivalent method accurately represents the dynamic characteristics of PV power stations and adapts well to different operating conditions. The proposed equivalent modeling method provides an effective analysis tool for future renewable energy integration research.
Existing feature selection methods for intrusion detection systems (IDS) in the Industrial Internet of Things (IIoT) often suffer from local optimality and high computational complexity. These challenges hinder traditional IDS from effectively extracting features while maintaining detection accuracy. This paper proposes an IIoT intrusion detection feature selection algorithm based on an improved whale optimization algorithm (GSLDWOA). The aim is to address the problems to which feature selection algorithms are prone on high-dimensional data, such as local optimality, long detection time, and reduced accuracy. First, the diversity of the initial population is increased using a Gaussian Mutation mechanism. Then, a Non-linear Shrinking Factor balances global exploration and local exploitation, avoiding premature convergence. Lastly, a Variable-step Levy Flight operator and a Dynamic Differential Evolution strategy are introduced to improve the search efficiency and convergence accuracy of the algorithm in high-dimensional feature space. Experiments on the NSL-KDD and WUSTL-IIoT-2021 datasets demonstrate that the feature subset selected by GSLDWOA significantly improves detection performance. Compared to the traditional WOA algorithm, the detection rate and F1-score increased by 3.68% and 4.12%, respectively. On the WUSTL-IIoT-2021 dataset, accuracy, recall, and F1-score all exceed 99.9%.
Impact craters are important for understanding the evolution of lunar geology and surface erosion rates, among other functions. However, the morphological characteristics of micro impact craters are not obvious and the craters are numerous, resulting in low detection accuracy by deep learning models. Therefore, we proposed a new multi-scale fusion crater detection algorithm (MSF-CDA) based on YOLO11 to improve the accuracy of lunar impact crater detection, especially for small craters with a diameter of <1 km. Using images taken by the LROC (Lunar Reconnaissance Orbiter Camera) at the Chang'e-4 (CE-4) landing area, we constructed three separate datasets for craters with diameters of 0-70 m, 70-140 m, and >140 m. We then trained three submodels separately on these three datasets. Additionally, we designed a slicing-amplifying-slicing strategy to enhance the ability to extract features from small craters. To handle redundant predictions, we proposed a new Non-Maximum Suppression with Area Filtering method to fuse the results for overlapping targets across the multi-scale submodels. Our new MSF-CDA method achieved high detection performance, with Precision, Recall, and F1 score values of 0.991, 0.987, and 0.989, respectively, effectively addressing the problems caused by the sparse features and sample imbalance of small craters. MSF-CDA can provide strong data support for more in-depth study of the geological evolution of the lunar surface and for finer geological age estimation. The strategy can also be used to detect other small objects with sparse features and sample imbalance problems. We detected approximately 500,000 impact craters in an area of approximately 214 km² around the CE-4 landing area. By statistically analyzing the new data, we updated the distribution function of the number and diameter of impact craters. Finally, we identified the most suitable lighting conditions for detecting impact crater targets by analyzing the effect of different lighting conditions on detection accuracy.
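The abstract names a "Non-Maximum Suppression with Area Filtering" step but does not describe how it works. The sketch below shows one plausible reading only: detections pooled from several submodels are first filtered by box area, then greedy IoU-based NMS removes overlapping duplicates. The function names, thresholds, and the form of the area filter are hypothetical and are not taken from the paper.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2, score) box; degenerate boxes give 0."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms_with_area_filter(detections, iou_thr=0.5,
                         min_area=0.0, max_area=float("inf")):
    """Greedy NMS preceded by an area filter (assumed form).

    detections: list of (x1, y1, x2, y2, score) pooled from all submodels.
    Boxes whose area falls outside [min_area, max_area] are dropped first;
    the rest are visited in descending score order and kept only if they
    overlap every already-kept box by less than iou_thr.
    """
    candidates = [d for d in detections if min_area <= box_area(d) <= max_area]
    kept = []
    for det in sorted(candidates, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) < iou_thr for k in kept):
            kept.append(det)
    return kept


# Hypothetical pooled predictions from the multi-scale submodels.
pooled = [
    (0, 0, 10, 10, 0.90),       # crater seen by one submodel
    (1, 1, 11, 11, 0.80),       # same crater seen by another submodel
    (50, 50, 52, 52, 0.95),     # tiny box, removed by the area filter
    (100, 100, 120, 120, 0.70), # separate crater
]
fused = nms_with_area_filter(pooled, iou_thr=0.5, min_area=10.0)
```

In a multi-scale setup the area bounds would typically be tied to the diameter range each submodel was trained on, which is the motivation for filtering by area before suppression in this sketch.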
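This paper makes full use of the hyperlink relationships and textual information of web page data and proposes an inductive semi-supervised learning algorithm for web page classification: the graph-based Co-training algorithm for web page classification, abbreviated GCo-training. The effectiveness of the algorithm is also proved theoretically. Within the Co-training framework, GCo-training iteratively learns a semi-supervised classifier based on a graph constructed from hyperlink information and a Bayes classifier based on text features. The graph-based semi-supervised classifier uses only a small amount of labeled data and, by mining the abundant relational information among the data, can reach relatively high prediction accuracy, so it can supply the Bayes classifier with a large amount of label information; conversely, the Bayes classifier, after learning from this label information, can supply useful information to the graph-based classifier. During the iterations the two classifiers help each other and steadily improve their respective performance, after which the Bayes classifier can be used to predict the classes of large amounts of unseen data. Experimental results on the Web→KB dataset show that GCo-training outperforms both the Co-training algorithm that uses text features and anchor-text features and the EM-based Bayes algorithm.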
Co-training is a semi-supervised learning method that employs two complementary learners to label the unlabeled data for each other and to predict the test sample together. Previous studies show that redundant information can help improve the ratio of prediction accuracy between semi-supervised learning methods and supervised learning methods. In practice, however, redundant information often hurts the performance of learning machines. This paper investigates which redundant features affect semi-supervised learning methods such as co-training, and how to remove the redundant features as well as the irrelevant features. FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training through feature selection. Experimental results on artificial and real-world data sets show that FESCOT helps to remove irrelevant and redundant features that hurt the performance of the co-training method.
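The generic co-training loop described in the first sentence above can be sketched as follows. This is a minimal illustration, not FESCOT: the two learners are simple nearest-centroid classifiers, confidence is taken to be the distance margin between the two closest centroids, and all function names and parameters are hypothetical.

```python
import math


def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(cents, x):
    """Return (label, margin): nearest centroid and its distance gap to the runner-up."""
    ranked = sorted((math.dist(x, m), c) for c, m in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def co_train(L1, L2, y, U1, U2, rounds=5, per_round=2):
    """L1/L2: labeled samples in view 1/2; U1/U2: the same unlabeled samples in each view.

    Each round, the learner on each view pseudo-labels the unlabeled samples it is
    most confident about; those samples join the shared labeled set, so each learner
    is retrained on labels supplied by the other.
    """
    L1, L2, y = list(L1), list(L2), list(y)
    pool = list(range(len(U1)))
    for _ in range(rounds):
        if not pool:
            break
        c1, c2 = centroids(L1, y), centroids(L2, y)
        picks = {}
        for cents, U in ((c1, U1), (c2, U2)):
            ranked = sorted(pool, key=lambda i: predict(cents, U[i])[1], reverse=True)
            for i in ranked[:per_round]:
                # If both learners pick the same sample, the view-1 label is kept.
                picks.setdefault(i, predict(cents, U[i])[0])
        for i, label in picks.items():
            L1.append(U1[i])
            L2.append(U2[i])
            y.append(label)
            pool.remove(i)
    return centroids(L1, y), centroids(L2, y)


def predict_jointly(c1, c2, x1, x2):
    """Combine both views by summing their centroid distances per class."""
    return min(c1, key=lambda c: math.dist(x1, c1[c]) + math.dist(x2, c2[c]))


# Toy data: two labeled samples, four unlabeled samples, one feature per view.
c1, c2 = co_train(L1=[[0.0], [10.0]], L2=[[0.0], [10.0]], y=[0, 1],
                  U1=[[1.0], [9.0], [2.0], [8.0]],
                  U2=[[0.5], [9.5], [1.5], [8.5]])
label = predict_jointly(c1, c2, [1.2], [0.8])
```

Redundant or irrelevant features would enter this loop through the distance computations in each view, which is where a feature selection step such as the one the paper proposes would act.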
The accuracy of laser-induced breakdown spectroscopy (LIBS) quantitative analysis depends greatly on the number of certified standard samples used for training. In practical applications, however, only a limited number of standard samples with labeled certified concentrations are available. A novel semi-supervised LIBS quantitative analysis method is proposed, based on a co-training regression model with selection of effective unlabeled samples. The main idea of the proposed method is to obtain better regression performance by adding effective unlabeled samples in semi-supervised learning. First, effective unlabeled samples are selected according to their Euclidean distance to the testing samples. Two initial regression models based on least squares support vector machines with different parameters are trained separately on the labeled samples, and the effective unlabeled samples predicted by the two models are then used to enlarge the training dataset based on labeling confidence estimation. The final predictions of the proposed method on the testing samples are determined by weighted combinations of the predictions of the two updated regression models. Chromium concentration analysis experiments were carried out on 23 certified standard high-alloy steel samples, of which 5 samples with labeled concentrations and 11 unlabeled samples were used to train the regression models and the remaining 7 samples were used for testing. As the number of effective unlabeled samples increased, the root mean square error of the proposed method fell from 1.80% to 0.84% and the relative prediction error was reduced from 9.15% to 4.04%.
Because of the small number of fault samples and the large data fluctuations in the blast furnace (BF) ironmaking process, several transfer learning-based fault diagnosis methods have been proposed. The vast majority of such methods perform distribution adaptation by reducing the distance between data distributions and applying a classifier to generate pseudo-labels for self-training. However, since the training data is dominated by labeled source domain data, such classifiers tend to be weak classifiers in the target domain. In addition, the features generated after domain adaptation are likely to lie at the decision boundary, resulting in a loss of classification performance. Hence, we propose a novel method called minimax entropy-based co-training (MMEC) that adversarially optimizes a transferable fault diagnosis model for the BF. The structure of MMEC includes a dual-view feature extractor followed by two classifiers that compute the cosine similarity between a feature and the representative vector of each class. Knowledge transfer is achieved by alternately increasing and decreasing the entropy of unlabeled target samples with the classifier and the feature extractor, respectively. Transfer BF fault diagnosis experiments show that our method improves accuracy by about 5% over state-of-the-art methods.
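The two quantities the abstract above relies on, a cosine-similarity classifier over class representative vectors and the entropy of its output, can be computed as in the sketch below. The temperature scaling and its value are assumptions for illustration, and the adversarial training loop itself is not shown; nothing here is taken from the paper's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_probs(feature, prototypes, temperature=0.05):
    """Softmax over cosine similarities to each class representative vector.
    The temperature is a hypothetical hyperparameter."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Two hypothetical class representative vectors.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
h_boundary = entropy(class_probs([1.0, 1.0], prototypes))    # feature on the boundary
h_confident = entropy(class_probs([1.0, 0.05], prototypes))  # feature near class 0
```

A feature on the decision boundary gives maximal entropy and a feature close to one representative vector gives entropy near zero, which is why alternately raising and lowering this quantity moves target features relative to the class vectors.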
Chinese organization name recognition is a difficult and important task in natural language processing. To reduce the need for tagged corpora and to make use of untagged corpora, we combined Co-training with support vector machines (SVM) and conditional random fields (CRF) to improve recognition results. Based on the principles of uncorrelatedness and compatibility, we constructed different classifiers from different views, both within SVM or CRF alone and with a combination of the two models. We also modified a heuristic untagged sample selection algorithm to reduce time complexity. Experimental results show that, with the same tagged data, Co-training achieves an F-measure 10% higher than SVM or CRF alone, and that, for the same F-measure, Co-training requires up to 70% less tagged data.
Precisely estimating the state of health (SOH) of lithium-ion batteries is essential for battery management systems (BMS), as it plays a key role in ensuring the safe and reliable operation of battery systems. However, current SOH estimation methods often overlook the valuable temperature information that can effectively characterize battery aging during capacity degradation. Additionally, the Elman neural network, which is commonly employed for SOH estimation, has several drawbacks, including slow training, a tendency to become trapped in local minima, and the initialization of weights and thresholds with pseudo-random numbers, which leads to unstable model performance. To address these issues, this study proposes a method for estimating the SOH of lithium-ion batteries based on differential thermal voltammetry (DTV) and an SSA-Elman neural network. First, two health features (HFs) that account for temperature and battery voltage are extracted from the differential thermal voltammetry curves and incremental capacity curves. Next, the Sparrow Search Algorithm (SSA) is employed to optimize the initial weights and thresholds of the Elman neural network, forming the SSA-Elman neural network model. To validate performance, various neural networks, including the proposed SSA-Elman network, are tested on the Oxford battery aging dataset. The experimental results demonstrate that the proposed method achieves superior accuracy and robustness, with a mean absolute error (MAE) of less than 0.9% and a root mean square error (RMSE) below 1.4%.
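For reference, the two error metrics reported above have the standard definitions below. The symbols are introduced here for illustration: y_i is the measured SOH of sample i, \hat{y}_i is the estimate, and n is the number of test samples. The abstract does not say whether the errors are computed on SOH expressed in percent, so the sketch leaves the scale unspecified.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}
```

RMSE is never smaller than MAE on the same data, and it penalizes occasional large errors more heavily, so the gap between the two reported bounds gives a rough sense of how uneven the estimation errors are.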
Funding (ICO soft sensing of hot-rolled strips): supported in part by the National Key Research & Development Program of China (2021YFB3301200) and in part by the National Natural Science Foundation of China (61933015).
Funding (flood regime metrics and event classes): National Key Research and Development Program of China, No. 2023YFC3006704; National Natural Science Foundation of China, No. 42171047; CAS-CSIRO Partnership Joint Project of 2024, No. 177GJHZ2023097MI.
Funding (PV power station equivalent modeling): supported by the Research Project of China Southern Power Grid (No. 056200KK52222031).
Funding (GSLDWOA feature selection for IIoT intrusion detection): supported by the Major Science and Technology Programs in Henan Province (No. 241100210100); the Henan Provincial Science and Technology Research Project (No. 252102211085, No. 252102211105); the Endogenous Security Cloud Network Convergence R&D Center (No. 602431011PQ1); the Special Project for Research and Development in Key Areas of Guangdong Province (No. 2021ZDZX1098); the Stabilization Support Program of the Science, Technology and Innovation Commission of Shenzhen Municipality (No. 20231128083944001); and the Key Scientific Research Projects of Henan Higher Education Institutions (No. 24A520042).
Funding (MSF-CDA lunar crater detection): supported by the National Key Research and Development Program of China (Grant No. 2022YFF0711400), which provided valuable financial support and resources for this research and made it possible to explore open questions in lunar geology in depth, and by the National Space Science Data Center Youth Open Project (Grant No. NSSDC2302001), which both facilitated the smooth progress of the research and provided a platform for communication and cooperation with experts in the field.
Funding (FESCOT feature selection for co-training): project supported by the National Natural Science Foundation of China (Grant No. 20503015).
Funding: Supported by the National Natural Science Foundation of China (No. 51674032).
Abstract: The accuracy of the laser-induced breakdown spectroscopy (LIBS) quantitative method depends greatly on the number of certified standard samples used for training. However, in practical applications, only limited standard samples with labeled certified concentrations are available. A novel semi-supervised LIBS quantitative analysis method is proposed, based on a co-training regression model with selection of effective unlabeled samples. The main idea of the proposed method is to obtain better regression performance by adding effective unlabeled samples in semi-supervised learning. First, effective unlabeled samples are selected according to the testing samples by the Euclidean metric. Two original regression models based on least squares support vector machines with different parameters are trained separately on the labeled samples, and then the effective unlabeled samples predicted by the two models are used to enlarge the training dataset based on labeling confidence estimation. The final predictions of the proposed method on the testing samples are determined by weighted combinations of the predictions of the two updated regression models. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples were carried out, in which 5 samples with labeled concentrations and 11 unlabeled samples were used to train the regression models and the remaining 7 samples were used for testing. As the number of effective unlabeled samples increased, the root mean square error of the proposed method went down from 1.80% to 0.84% and the relative prediction error was reduced from 9.15% to 4.04%.
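The first step, selecting effective unlabeled samples "according to the testing samples by the Euclidean metric", might look roughly like the sketch below. The abstract does not give the exact rule, so ranking each unlabeled sample by its distance to the nearest test sample and keeping the `k` closest is an assumption, as are the function and parameter names.

```python
import math


def select_effective_unlabeled(unlabeled, test, k):
    """Return indices of the k unlabeled samples nearest to the test set.

    unlabeled, test: lists of equal-length feature vectors (e.g. spectra).
    """
    def distance_to_test_set(u):
        # Assumed criterion: Euclidean distance to the closest test sample.
        return min(math.dist(u, t) for t in test)

    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: distance_to_test_set(unlabeled[i]))
    return ranked[:k]
```

The selected indices would then be the only unlabeled samples offered to the two regressors for pseudo-labeling.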
Funding: Supported in part by the National Natural Science Foundation of China (61933015) and in part by the Central University Basic Research Fund of China under Grant K20200002 (for NGICS Platform, Zhejiang University).
Abstract: Because fault samples are scarce and data fluctuations are large in the blast furnace (BF) ironmaking process, several transfer learning-based fault diagnosis methods have been proposed. The vast majority of such methods perform distribution adaptation by reducing the distance between data distributions and applying a classifier to generate pseudo-labels for self-training. However, since the training data is dominated by labeled source domain data, such classifiers tend to be weak classifiers in the target domain. In addition, the features generated after domain adaptation are likely to lie at the decision boundary, resulting in a loss of classification performance. Hence, we propose a novel method called minimax entropy-based co-training (MMEC) that adversarially optimizes a transferable fault diagnosis model for the BF. The structure of MMEC includes a dual-view feature extractor, followed by two classifiers that compute the cosine similarity between a feature and the representative vector of each class. Knowledge transfer is achieved by alternately increasing and decreasing the entropy of unlabeled target samples with the classifier and the feature extractor, respectively. Transfer BF fault diagnosis experiments show that our method improves accuracy by about 5% over state-of-the-art methods.
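The quantity that the classifier and feature extractor push in opposite directions can be illustrated with a small sketch: class probabilities obtained from the cosine similarity between a feature and each class's representative vector, and the entropy of those probabilities. The softmax form and the temperature value of 0.05 are assumptions (the abstract only mentions cosine similarity and entropy), and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_probabilities(feature, prototypes, temperature=0.05):
    """Softmax over cosine similarities to each class prototype."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A feature lying midway between two prototypes gives the maximum entropy `log(2)`, while a feature aligned with one prototype gives entropy close to zero. In a minimax scheme of the kind the abstract describes, the classifier step would raise this value on unlabeled target samples and the feature-extractor step would lower it.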
Funding: National Natural Science Foundation of China (No. 60873179, No. 60803078).
Abstract: Chinese organization name recognition is a hard and important task in natural language processing. To reduce the need for tagged corpora and make use of untagged corpora, we present a method that combines co-training with support vector machines (SVM) and conditional random fields (CRF) to improve recognition results. Based on the principles of uncorrelatedness and compatibility, we construct different classifiers from different views within SVM or CRF alone and from the combination of these two models. We also modify a heuristic untagged-sample selection algorithm to reduce time complexity. Experimental results show that, with the same tagged data, co-training achieves an F-measure 10% higher than using SVM or CRF alone; for the same F-measure, co-training saves up to 70% of the tagged data.
Funding: Supported by the National Natural Science Foundation of China (NSFC) under Grant No. 51677058.
Abstract: Precisely estimating the state of health (SOH) of lithium-ion batteries is essential for battery management systems (BMS), as it plays a key role in ensuring the safe and reliable operation of battery systems. However, current SOH estimation methods often overlook the valuable temperature information that can effectively characterize battery aging during capacity degradation. Additionally, the Elman neural network, which is commonly employed for SOH estimation, exhibits several drawbacks, including slow training speed, a tendency to become trapped in local minima, and the initialization of weights and thresholds using pseudo-random numbers, leading to unstable model performance. To address these issues, this study proposes a method for estimating the SOH of lithium-ion batteries based on differential thermal voltammetry (DTV) and an SSA-Elman neural network. First, two health features (HFs) that account for temperature and battery voltage are extracted from the differential thermal voltammetry curves and incremental capacity curves. Next, the Sparrow Search Algorithm (SSA) is employed to optimize the initial weights and thresholds of the Elman neural network, forming the SSA-Elman neural network model. To validate the performance, various neural networks, including the proposed SSA-Elman network, are tested using the Oxford battery aging dataset. The experimental results demonstrate that the method developed in this study achieves superior accuracy and robustness, with a mean absolute error (MAE) of less than 0.9% and a root mean square error (RMSE) below 1.4%.
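For reference, the two error metrics reported at the end of this abstract follow their standard definitions and can be computed as in the sketch below. The function names are hypothetical, and any SOH values passed in are the caller's own data, not results from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between true and estimated values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

If SOH is expressed as a fraction of nominal capacity, multiplying either result by 100 gives the percentage form used in the abstract. RMSE is never smaller than MAE on the same data, which is consistent with the two bounds quoted above.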