The accuracy of quantitative laser-induced breakdown spectroscopy (LIBS) analysis depends heavily on the number of certified standard samples used for training. In practical applications, however, only a limited number of standard samples with labeled certified concentrations are available. A novel semi-supervised LIBS quantitative analysis method is proposed, based on a co-training regression model with selection of effective unlabeled samples. The main idea is to obtain better regression performance by adding effective unlabeled samples during semi-supervised learning. First, effective unlabeled samples are selected by their Euclidean distance to the testing samples. Two initial regression models based on least squares support vector machines with different parameters are trained separately on the labeled samples; the effective unlabeled samples, labeled with the two models' predictions, are then used to enlarge the training dataset based on labeling confidence estimation. The final predictions on the testing samples are weighted combinations of the predictions of the two updated regression models. Chromium concentration analysis experiments were carried out on 23 certified standard high-alloy steel samples: 5 samples with labeled concentrations and 11 unlabeled samples were used to train the regression models, and the remaining 7 samples were used for testing. As the number of effective unlabeled samples increased, the root mean square error of the proposed method decreased from 1.80% to 0.84%, and the relative prediction error decreased from 9.15% to 4.04%.
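The selection and combination steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an unlabeled spectrum is "effective" when its distance to the nearest test spectrum is among the k smallest, and it takes the combination weights as given (for example, the inverse validation error of each model would be one hypothetical choice; the abstract does not say how the weights are set).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_effective_unlabeled(unlabeled, tests, k):
    """Indices of the k unlabeled spectra closest to the test set.

    Assumption: "closest" means the smallest Euclidean distance from an
    unlabeled spectrum to its nearest test spectrum.
    """
    scored = sorted((min(euclidean(u, t) for t in tests), i)
                    for i, u in enumerate(unlabeled))
    return [i for _, i in scored[:k]]


def weighted_combination(pred1, pred2, w1, w2):
    """Blend two regressors' predictions with weights normalised to sum to 1."""
    total = w1 + w2
    return [(w1 * a + w2 * b) / total for a, b in zip(pred1, pred2)]


def rmse(pred, truth):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))
```

For example, `select_effective_unlabeled([[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]], [[1.0, 0.9]], 2)` keeps the third and first spectra and discards the distant second one.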
This paper proposes a semi-supervised regression model based on the co-training algorithm and support vector machines for retrieving water quality variables from SPOT 5 remote sensing data. The model consists of two support vector regressors (SVRs), which describe the nonlinear relationship between the water quality variables and the SPOT 5 spectrum, and a semi-supervised co-training algorithm is established for the two SVRs. The model was used to retrieve the concentrations of four representative pollution indicators, namely the permanganate index (CODMn), ammonia nitrogen (NH3-N), chemical oxygen demand (COD), and dissolved oxygen (DO), in the Weihe River in Shaanxi Province, China, and a spatial distribution map of these variables was produced for part of the river. SVRs can readily implement nonlinear mappings, and semi-supervised learning can make use of both labeled and unlabeled samples. By integrating the two SVRs through semi-supervised learning, the model provides an operational method when paired samples are limited. The results show that it performs much better than the multiple statistical regression method, can quickly provide an overview of water pollution conditions for management, and can be extended to hyperspectral remote sensing applications.
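A co-training loop for two regressors can be sketched in the style of the COREG algorithm. To stay self-contained, this sketch swaps the paper's SVRs for two k-nearest-neighbour regressors that differ in their Minkowski distance order; the confidence rule (how much adding a pseudo-labeled point reduces squared error on its labeled neighbours) is COREG's, not necessarily the one used in the paper.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two feature tuples."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def knn_predict(train, x, k, p):
    """Mean target of the k nearest (features, target) pairs in `train`.

    Also returns the neighbours, which the confidence estimate needs.
    """
    nearest = sorted(train, key=lambda pair: minkowski(pair[0], x, p))[:k]
    return sum(y for _, y in nearest) / len(nearest), nearest


def co_train(labeled, unlabeled, views=((3, 2), (3, 5)), rounds=5):
    """COREG-style co-training of two kNN regressors.

    views: (k, p) per regressor; different Minkowski orders keep the two
    learners diverse. Each learner pseudo-labels the unlabeled point whose
    addition most reduces its own squared error on that point's labeled
    neighbours, and hands it to the OTHER learner.
    """
    sets = [list(labeled), list(labeled)]
    pool = list(unlabeled)
    for _ in range(rounds):
        added = False
        for j, (k, p) in enumerate(views):
            best = None
            for idx, x in enumerate(pool):
                y_hat, nbrs = knn_predict(sets[j], x, k, p)
                augmented = sets[j] + [(x, y_hat)]
                delta = 0.0
                for xi, yi in nbrs:
                    before = knn_predict(sets[j], xi, k, p)[0]
                    after = knn_predict(augmented, xi, k, p)[0]
                    delta += (yi - before) ** 2 - (yi - after) ** 2
                if delta > 0 and (best is None or delta > best[0]):
                    best = (delta, idx, x, y_hat)
            if best is not None:
                _, idx, x, y_hat = best
                sets[1 - j].append((x, y_hat))  # teach the other learner
                pool.pop(idx)
                added = True
        if not added or not pool:
            break
    return sets


def co_predict(sets, x, views=((3, 2), (3, 5))):
    """Final prediction: average of the two refined regressors."""
    return sum(knn_predict(sets[j], x, k, p)[0]
               for j, (k, p) in enumerate(views)) / len(views)
```

Only points whose pseudo-label measurably helps are added, so the loop stops early when no unlabeled sample improves either learner.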
Co-training is a semi-supervised learning method that employs two complementary learners, which label unlabeled data for each other and jointly predict test samples. Previous studies show that redundant information can help improve the ratio of prediction accuracy between semi-supervised and supervised learning methods. In practice, however, redundant information often hurts the performance of learning machines. This paper investigates which redundant features affect semi-supervised learning methods such as co-training, and how to remove redundant as well as irrelevant features. FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training through feature selection. Experimental results on artificial and real-world data sets show that FESCOT helps remove the irrelevant and redundant features that hurt the performance of co-training.
In the upcoming large-scale Internet of Things (IoT), defending against malicious traffic is increasingly challenging due to the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach that detects malicious traffic at the access side. It overcomes the resource bottleneck of traditional malicious traffic defenders deployed at the victim side, and it largely removes the need for labeled traffic data in model training. Specifically, we design a coarse-grained behavior model of IoT devices through self-supervised learning on unlabeled traffic data. We then fine-tune this model with a transfer learning method using a small amount of labeled data to improve its accuracy in malicious traffic detection. Experimental results on the CICDDoS2019 dataset show that our method achieves an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data. Moreover, with 1% of the training data, our method outperforms state-of-the-art supervised learning-based methods in terms of accuracy, precision, recall, and F1-score.
Ethylene glycol (EG) is a primary raw material in the polyester industry, and the syngas-to-EG route has become a significant production route. Gas-phase catalytic coupling of carbon monoxide (CO) to synthesize dimethyl oxalate (DMO) is a crucial step in this route: the composition of the reactor outlet influences the final quality of the EG product and the energy consumption of the subsequent separation process. However, measuring product quality in real time or establishing accurate dynamic mechanism models is challenging. To model the DMO synthesis process effectively, this study proposes a hybrid modeling strategy that integrates process mechanisms with data-driven approaches. The mechanism model of CO gas-phase catalytic coupling is developed from intrinsic kinetics and material balance, while a long short-term memory (LSTM) neural network predicts the macroscopic reaction rate by leveraging temporal relationships in archived measurements. To accommodate scenarios with limited labels, the model is trained in a semi-supervised manner on historical data. By integrating these predictions with the mechanism model, the hybrid approach provides reliable and interpretable forecasts of mass fractions. Empirical investigations validate the superiority of the proposed hybrid modeling approach over conventional data-driven models (DDMs) and other hybrid modeling techniques.
A large-scale radio frequency identification (RFID) indoor positioning system covers a large area, has little labeled data and much unlabeled data, and is easily affected by multipath and white noise. To address this problem, an RFID positioning algorithm based on semi-supervised actor-critic co-training (SACC) is proposed, in which positioning is treated as a Markov decision process. First, the actor-critic is combined with random actions, and the best unlabeled received signal strength indicator (RSSI) data are selected through semi-supervised co-training. Second, the actor and the critic are updated with the Kronecker-factored approximate curvature (K-FAC) natural gradient. Finally, the target position is obtained by co-locating with the labeled RSSI data and the selected unlabeled RSSI data. The proposed method significantly reduces the cost of indoor positioning by decreasing the amount of labeled data required. Meanwhile, as the number of positioning targets increases, the actor can quickly select unlabeled RSSI data and update the location model. Experiments show that, compared with other RFID indoor positioning algorithms, namely twin delayed deep deterministic policy gradient (TD3), deep deterministic policy gradient (DDPG), and actor-critic using Kronecker-factored trust region (ACKTR), the proposed method reduces the average positioning error by 50.226%, 41.916%, and 25.004%, respectively, and improves positioning stability by 23.430%, 28.518%, and 38.631%, respectively.
Semi-supervised classification (SSC), which uses both labeled and unlabeled data to determine classification boundaries in feature space, has great advantages in extracting classification information from massive data. In this paper, a novel SSC method based on the Gaussian mixture model (GMM) is proposed, in which the feature space of each class is described by one GMM. Experiments show that the proposed method achieves high classification accuracy with a small amount of labeled data, whereas supervised classification methods such as the support vector machine and object-oriented classification require much more labeled data to reach the same accuracy.
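The way unlabeled data sharpen class-conditional Gaussian models can be sketched with semi-supervised EM. As a simplifying assumption, each class here is a single 1-D Gaussian rather than a full multi-component GMM: labeled points keep fixed one-hot responsibilities, while unlabeled points contribute through their posterior class probabilities in every M-step.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def semi_supervised_em(labeled, unlabeled, n_classes, iters=50, min_var=1e-3):
    """EM for a class-conditional Gaussian model on labeled + unlabeled data.

    labeled: list of (x, class_index); unlabeled: list of x (1-D features).
    Returns a list of [weight, mean, variance], one per class.
    """
    params = []
    for c in range(n_classes):  # initialise from labeled points only
        xs = [x for x, y in labeled if y == c]
        mu = sum(xs) / len(xs)
        var = max(sum((x - mu) ** 2 for x in xs) / len(xs), min_var)
        params.append([1.0 / n_classes, mu, var])
    xs = [x for x, _ in labeled] + list(unlabeled)
    for _ in range(iters):
        # E-step: labeled points keep one-hot responsibilities;
        # unlabeled points get posterior class probabilities.
        resp = [[1.0 if y == c else 0.0 for c in range(n_classes)]
                for _, y in labeled]
        for x in unlabeled:
            w = [pi * gaussian_pdf(x, mu, var) for pi, mu, var in params]
            s = sum(w)
            resp.append([wi / s for wi in w] if s > 0
                        else [1.0 / n_classes] * n_classes)
        # M-step: re-estimate each class's weight, mean and variance.
        for c in range(n_classes):
            r = [row[c] for row in resp]
            n_c = sum(r)
            mu = sum(ri * x for ri, x in zip(r, xs)) / n_c
            var = max(sum(ri * (x - mu) ** 2 for ri, x in zip(r, xs)) / n_c, min_var)
            params[c] = [n_c / len(xs), mu, var]
    return params


def classify(params, x):
    """Assign x to the class with the largest weight * likelihood."""
    scores = [pi * gaussian_pdf(x, mu, var) for pi, mu, var in params]
    return scores.index(max(scores))
```

The variance floor `min_var` is an assumption added to keep a class from collapsing onto a single point.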
As an important method for knowledge graph (KG) completion, link prediction has become a hot research topic in recent years. In this paper, a performance enhancement scheme for link prediction models based on semi-supervised learning and model soup is proposed, which effectively improves the performance of several mainstream link prediction models with only small changes to their architectures. The scheme consists of two main parts: predicting potential fact triples in the graph with semi-supervised learning strategies, and combining semi-supervised learning with model soup to further improve final model performance without significant additional computational overhead. Experiments validate the effectiveness of the scheme for a variety of link prediction models, especially on datasets with dense relationships. For CompGCN, the model with the best overall performance among those tested, the enhancement scheme improves Hits@1 by 14.7% on the FB15K-237 dataset and by 7.8% on the WN18RR dataset. The semi-supervised learning strategy in the scheme also brings a significant improvement for multi-class link prediction models, whereas the gain from introducing model soup depends on the specific model: some models improve while others remain largely unaffected.
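Model soup combines several fine-tuned copies of one architecture by averaging their weights, so inference costs the same as a single model. The sketch below shows the uniform soup and the greedy variant; which variant the scheme above uses is not stated, so both are illustrative. Parameters are represented as plain lists of floats keyed by name (an assumption standing in for framework state dicts), and `evaluate` is a hypothetical held-out scoring callback.

```python
def uniform_soup(state_dicts):
    """Average parameters of models that share one architecture.

    Each state dict maps a parameter name to a flat list of floats.
    """
    n = len(state_dicts)
    soup = {}
    for name in state_dicts[0]:
        columns = zip(*(sd[name] for sd in state_dicts))
        soup[name] = [sum(col) / n for col in columns]
    return soup


def greedy_soup(state_dicts, evaluate):
    """Greedy soup: rank models by held-out score, then keep each one only
    if adding it to the running average does not lower that score."""
    ranked = sorted(state_dicts, key=evaluate, reverse=True)
    ingredients = [ranked[0]]
    best = evaluate(ranked[0])
    for sd in ranked[1:]:
        score = evaluate(uniform_soup(ingredients + [sd]))
        if score >= best:
            ingredients.append(sd)
            best = score
    return uniform_soup(ingredients)
```

The greedy rule explains why a soup can leave some models "largely unaffected": if no extra ingredient helps, the result is just the best single model.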
Segmentation of intracranial aneurysms (IA) from computed tomography angiography (CTA) images is important for quantitative assessment of IA and subsequent surgical treatment. Manual segmentation of IA is labor-intensive and time-consuming and suffers from inter- and intra-observer variability. Training deep neural networks usually requires a large amount of labeled data, yet annotating data for IA segmentation is very time-consuming. This paper presents a novel weight-perceptual self-ensembling model for semi-supervised IA segmentation, which exploits unlabeled data by encouraging predictions on perturbed versions of the same input to be consistent. Because consistency targets vary in quality, we introduce a sample weight perception module that quantifies the quality of each consistency target. This module evaluates the contribution of each unlabeled sample during training, forcing the network to focus on well-predicted samples. We conducted both horizontal and vertical comparisons on a clinical IA CTA image dataset. Experimental results show that the proposed method improves the Dice coefficient by at least 3% over the fully supervised baseline and by at least 1.7% over other state-of-the-art semi-supervised methods.
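The self-ensembling idea with per-sample target weighting can be sketched as a mean-teacher update plus a weighted consistency loss. The abstract does not describe how the weight perception module scores targets, so this sketch substitutes a hypothetical score (1 minus the normalised entropy of the teacher's prediction), and it works on per-sample class probabilities rather than full segmentation maps.

```python
import math


def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher, student)]


def target_quality(probs):
    """Hypothetical quality score of a consistency target (needs >= 2 classes):
    1 minus the normalised entropy of the teacher's class probabilities
    (1 = fully confident, 0 = uniform)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def weighted_consistency(student_probs, teacher_probs):
    """Quality-weighted mean squared consistency loss over samples."""
    total, norm = 0.0, 0.0
    for s, t in zip(student_probs, teacher_probs):
        w = target_quality(t)
        total += w * sum((a - b) ** 2 for a, b in zip(s, t))
        norm += w
    return total / norm if norm > 0 else 0.0
```

A maximally uncertain teacher target gets weight zero, so it no longer pulls the student toward an unreliable prediction.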
Purpose: With the development of intelligent technology, deep learning has made significant progress and has been widely used in various fields. Deep learning is data-driven, and its training process requires a large amount of data to improve model performance. However, labeled data are expensive and not readily available. Design/methodology/approach: To address this problem, researchers have integrated semi-supervised learning with deep learning, training models with a limited amount of labeled data and a large amount of unlabeled data. This paper takes generative adversarial networks (GANs) as its entry point. First, we discuss current research on GANs in image super-resolution, covering supervised, unsupervised, and semi-supervised learning approaches. Second, taking image classification as an example, we introduce different optimization methods based on semi-supervised learning. Finally, we experimentally compare and analyze existing GAN-based semi-supervised optimization methods. Findings: Based on the analysis of the selected studies, we summarize the problems encountered during research and propose future research directions. Originality/value: This paper reviews and analyzes research on generative adversarial networks for image super-resolution and classification under various learning approaches. The comparative analysis of experimental results on current semi-supervised GAN optimizations provides a reference for further research.
Many data mining applications have large amounts of data, but labeling data is usually difficult, expensive, or time-consuming because it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data during training. Co-Training is a popular semi-supervised learning algorithm that assumes each example is represented by multiple sets of features (views) and that these views are sufficient for learning and conditionally independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples, based on estimating the local accuracy of the committee members in each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
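The idea of a neighbourhood-based labeling confidence can be sketched as follows. The exact CoBC formula is not given in the abstract; this is one plausible reading, stated as an assumption: the committee votes on an unlabeled example, and the confidence is the average accuracy, on the example's k nearest labeled neighbours, of the members that voted for the winning label. Committee members are plain callables here.

```python
import math
from collections import Counter


def local_accuracy_confidence(x, labeled, committee, k=3):
    """Committee label and confidence for an unlabeled example x.

    labeled: list of (features, label); committee: callables mapping
    features -> label. Confidence = mean accuracy, on x's k nearest labeled
    neighbours, of the members that voted for the majority label.
    """
    votes = [h(x) for h in committee]
    label = Counter(votes).most_common(1)[0][0]
    nbrs = sorted(labeled, key=lambda pair: math.dist(pair[0], x))[:k]
    agreeing = [h for h, v in zip(committee, votes) if v == label]
    accuracies = [sum(h(f) == y for f, y in nbrs) / len(nbrs) for h in agreeing]
    return label, sum(accuracies) / len(accuracies)
```

Examples whose confidence exceeds a chosen threshold would then be added to the training set with the committee's label.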
This study addresses the challenge of accurately and reliably detecting tomatoes in dense planting environments, a critical prerequisite for automated robotic harvesting. The heavy reliance of deep learning models on extensive manually annotated training datasets still significantly limits their application in real-world agricultural production. To overcome these limitations, we combined a domain adaptive learning approach with the YOLOv5 model to develop a novel tomato detection model called TDA-YOLO (tomato detection domain adaptation). Normal-illumination scenes in dense planting environments were designated as the source domain, and various other illumination scenes as the target domain. To bridge the source and target domains, neural preset for color style transfer was introduced to generate a pseudo-dataset that addresses the domain discrepancy. Furthermore, the study incorporates semi-supervised learning so that the model extracts domain-invariant features more fully, and uses knowledge distillation to improve its ability to adapt to the target domain. In addition, to increase inference speed and reduce computational demand, the lightweight FasterNet network was integrated into the C3 module of YOLOv5, creating a modified C3_Faster module. The experimental results demonstrate that TDA-YOLO significantly outperforms the original YOLOv5s model, achieving a mean average precision (mAP) of 96.80% for tomato detection across diverse scenarios in dense planting environments, an increase of 7.19 percentage points; it is also 2.17 and 1.19 percentage points higher than the latest YOLOv8 and YOLOv9, respectively. The model's average detection time per image was 15 ms, with 13.8 G FLOPs (floating-point operations). After acceleration, TDA-YOLO achieves a detection accuracy of 90.95%, an mAP of 91.35%, and a detection time of 21 ms per image on the Jetson Xavier NX development board, which still meets the requirements of real-time tomato detection in dense planting environments. These results show that TDA-YOLO can detect tomatoes accurately and quickly in dense planting environments while avoiding the need for large amounts of annotated data, providing technical support for the development of automatic harvesting systems for tomatoes and other fruits.
Most heating, ventilation, and air-conditioning (HVAC) systems operate with one or more faults that increase energy consumption and could lead to system failure over time. Today, most building owners perform only reactive maintenance and may be less concerned with, or less able to assess, the health of the system until catastrophic failure occurs. This is mainly because building owners have not previously had good tools to detect and diagnose these faults, determine their impact, and act on the findings. Commercially available fault detection and diagnostics (FDD) tools have been developed to address this issue and have the potential to reduce equipment downtime, energy costs, and maintenance costs, and to improve occupant comfort and system reliability. However, many of these tools require in-depth knowledge of system behavior and thermodynamic principles to interpret the results. In this paper, supervised and semi-supervised machine learning (ML) approaches are applied to datasets collected from an operating system in the field to develop new FDD methods and to help building owners see the value proposition of proactive maintenance. The study data were collected from one packaged rooftop unit (RTU) HVAC system running under normal operating conditions at an industrial facility in Connecticut. This paper compares three approaches to fault classification for an RTU operating in real time using semi-supervised learning, achieving accuracies as high as 95.7% with few-shot learning.
X-ray inspection equipment is divided into small baggage inspection equipment and large cargo inspection equipment. X-ray scanning makes it possible to identify the contents of goods, unauthorized transport, or hidden goods in real time by passing cargo through X-rays without opening it. In this paper, we propose a system for detecting dangerous objects in X-ray images using the Cascade Region-based Convolutional Neural Network (Cascade R-CNN) model; the training data consist of dangerous goods, storage media, firearms, and knives. In addition, to minimize the overfitting caused by the lack of data for artificial intelligence (AI) training, data samples are augmented by applying the copy-paste (CP) algorithm to the existing data. The data labeling problem is addressed by mixing supervised and semi-supervised learning. The four models compared in this study are Faster Region-based Convolutional Neural Network with Res2Net-101 (Faster R-CNN_Res2Net-101) with supervised learning, Cascade R-CNN_Res2Net-101 with supervised learning, Cascade R-CNN with Composite Backbone Network V2 (CBNetV2) Network-101 (Cascade R-CNN_CBNetV2Net-101) with supervised learning, and Cascade R-CNN_CBNetV2-101 with semi-supervised learning. Comparing the four models, Cascade R-CNN_CBNetV2-101 with semi-supervised learning achieved an Average Precision (AP) at Intersection over Union (IoU) = 0.5 that was 0.7% higher, an AP at IoU = 0.75 that was 1.0% higher, and a recall that was 0.8% higher than with supervised learning.
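The AP figures above are reported at IoU thresholds of 0.5 and 0.75, meaning a predicted box only counts as a correct detection if its overlap with a ground-truth box reaches that threshold. A minimal sketch of the overlap test, assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes are disjoint
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A detection matches a ground-truth box if IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

A prediction covering half of the union with the ground truth (IoU = 0.5) therefore counts toward AP at IoU = 0.5 but not at IoU = 0.75, which is why the stricter metric is usually lower.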
Recent years have witnessed a surge of interest in graph-based semi-supervised learning (GBSSL). In this paper, we introduce a series of works by our group on this topic, including: 1) a method called linear neighborhood propagation (LNP), which can automatically construct the optimal graph; 2) a novel multilevel scheme that makes our algorithm scalable to large data sets; 3) a generalized point charge scheme for GBSSL; 4) a multilabel GBSSL method based on solving a Sylvester equation; 5) an information fusion framework for GBSSL; and 6) an application of GBSSL to fMRI image segmentation.
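Once a graph is built, graph-based methods such as LNP spread labels from labeled to unlabeled nodes by iterating F ← αWF + (1 − α)Y. The sketch below shows only this propagation step; it assumes the row-stochastic weight matrix W is already given, whereas LNP would derive it from linear reconstruction of each point by its neighbours.

```python
def propagate(W, Y, alpha=0.99, iters=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * Y and return a label per node.

    W: n x n row-stochastic neighbourhood weights (assumed given here).
    Y: n x c initial labels, one-hot rows for labeled nodes and all-zero
    rows for unlabeled ones.
    """
    n, c = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(W[i][j] * F[j][k] for j in range(n))
              + (1 - alpha) * Y[i][k]
              for k in range(c)]
             for i in range(n)]
    return [row.index(max(row)) for row in F]
```

On a four-node chain whose end nodes are labeled with different classes, each interior node takes the label of the end it is closer to.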
Automatic classification of blog entries is generally treated as a semi-supervised machine learning task, in which blog entries are automatically assigned to one of a set of predefined classes based on features extracted from their textual content. This paper attempts automatic classification of unstructured blog entries through pre-processing steps such as tokenization, stop-word elimination, and stemming; statistical techniques for feature set extraction; and feature set enhancement using semantic resources, followed by modeling with two alternative machine learning models: the naïve Bayesian model and the artificial neural network (ANN) model. Empirical evaluations indicate that this multi-step classification approach achieves good overall classification accuracy on unstructured blog text datasets with both models. However, the naïve Bayesian model clearly outperforms the ANN-based model when a smaller feature set is available, which is usually the case when a blog topic is recent and the available training data are limited.
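The naïve Bayesian branch of this pipeline can be sketched as a multinomial naïve Bayes classifier with Laplace smoothing. This is illustrative only: it uses a tiny hypothetical stop-word list and omits the stemming and semantic feature enhancement described above.

```python
import math
import re
from collections import Counter, defaultdict

STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "in"}  # illustrative list


def tokenize(text):
    """Lower-case, split on non-letters, drop stop words (no stemming here)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def train_nb(docs):
    """docs: list of (text, label). Returns class priors, word counts, vocab."""
    class_docs = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for text, label in docs:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, word_counts, vocab


def classify_nb(model, text):
    """Pick the class maximising log prior + sum of smoothed log likelihoods."""
    priors, word_counts, vocab = model
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best
```

Because it needs only per-class word counts, naïve Bayes has few parameters to estimate, which is consistent with it holding up better than an ANN when training data are scarce.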
Co-training is a well-known semi-supervised learning algorithm that can exploit unlabeled data to improve learning performance. It generally works in a two-view setting (the input examples naturally have two disjoint feature sets), under the assumption that each view is sufficient to predict the label. In real-world applications, however, feature corruption or feature noise can make both views insufficient, and co-training then suffers. In this paper, we propose a novel algorithm named Weighted Co-training to deal with this problem. It identifies newly labeled examples that are probably harmful to the other view and decreases their weights in the training set to avoid this risk. Experimental results show that Weighted Co-training performs better than state-of-the-art co-training algorithms on several benchmarks.
The detection of abnormal vehicle events is a research hotspot in the analysis of highway surveillance video. Complex factors, including varying weather, illumination, and noise conditions, make vehicle feature extraction and abnormality detection difficult. This paper proposes a Fast Constrained Delaunay Triangulation (FCDT) algorithm to replace complicated segmentation algorithms for multi-feature extraction. Based on the video frames segmented by FCDT, an improved algorithm is presented to estimate the background self-adaptively. After this estimation, a multi-feature eigenvector is generated by Principal Component Analysis (PCA) from the static and motion features extracted by locating and tracking each vehicle. For abnormality detection, adaptive detection modeling of vehicle events (ADMVE) is presented, in which a semi-supervised Mixture of Gaussian Hidden Markov Model (MGHMM) is trained with the multi-feature eigenvectors from each video segment. The normal model is developed in supervised mode with manual labeling and becomes more accurate through iterated adaptation. The abnormal models are trained in unsupervised mode through adapted Bayesian learning. The paper also presents experiments on real video sequences to verify the proposed method.
As a supplement to [Xu L. Front. Electr. Electron. Eng. China, 2010, 5(3): 281-328], this paper outlines the current status of efforts on Bayesian Ying-Yang (BYY) harmony learning, together with gene analysis applications. First, a bird's-eye view is provided via the Gaussian mixture, in comparison with typical learning algorithms and model selection criteria; in particular, semi-supervised learning is covered simply by choosing a scalar parameter. Then, essential topics and demanding issues in BYY system design and BYY harmony learning are systematically outlined, with a modern perspective on the Yin-Yang viewpoint discussed, another Yang factorization addressed, and coordination across and within Ying-Yang summarized. The BYY system acts as a unified framework that accommodates unsupervised, supervised, and semi-supervised learning in one formulation, while best harmony learning brings novelty and strength to automatic model selection. The mathematical formulation of the harmony functional is also addressed as a unified scheme for measuring the proximity considered in a BYY system, and it is used as the best choice among alternatives. Moreover, efforts are made on a number of learning tasks, including a mode-switching factor analysis proposed as a semi-blind learning framework for several types of independent factor analysis, a hidden Markov model (HMM) gated temporal factor analysis suggested for modeling piecewise stationary temporal dependence, a two-level hierarchical Gaussian mixture extended to cover semi-supervised learning, and a manifold learning method modified to facilitate automatic model selection. Finally, the studies are applied to gene analysis problems such as genome-wide association, exome sequencing analysis, and gene transcriptional regulation.
Funding for the LIBS co-training study: supported by the National Natural Science Foundation of China (No. 51674032).
Funding for the SPOT 5 water quality study: under the auspices of the National Natural Science Foundation of China (No. 40671133) and the Fundamental Research Funds for the Central Universities (No. GK200902015).
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 20503015).
Abstract: Co-training is a semi-supervised learning method that employs two complementary learners, which label unlabeled data for each other and jointly predict the test samples. Previous studies show that redundant information can help improve the ratio of prediction accuracy between semi-supervised and supervised learning methods. In practice, however, redundant information often hurts the performance of learning machines. This paper investigates which redundant features affect semi-supervised learning methods such as co-training, and how to remove redundant as well as irrelevant features. FESCOT (feature selection for co-training) is proposed to improve the generalization performance of co-training through feature selection. Experimental results on artificial and real-world data sets show that FESCOT helps remove the irrelevant and redundant features that hurt the performance of co-training.
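The abstract does not give FESCOT's selection criterion. As a generic illustration of removing irrelevant and redundant features before co-training, the sketch below uses a simple correlation filter. It is an assumed, swapped-in criterion, not FESCOT:

- Relevance is measured as the absolute Pearson correlation with the target.
- Redundancy is measured as the pairwise correlation with already-kept features.

The thresholds and function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either column is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(columns: List[Sequence[float]], target: Sequence[float],
                    min_relevance: float = 0.1,
                    max_redundancy: float = 0.95) -> List[int]:
    """Drop irrelevant features, then greedily drop redundant ones.

    Features are visited from most to least relevant; a feature is kept only
    if its |correlation| with every already-kept feature is below
    `max_redundancy`. Returns the kept column indices in ascending order.
    """
    relevance = [abs(pearson(col, target)) for col in columns]
    order = sorted((j for j in range(len(columns)) if relevance[j] >= min_relevance),
                   key=lambda j: -relevance[j])
    kept: List[int] = []
    for j in order:
        if all(abs(pearson(columns[j], columns[i])) < max_redundancy for i in kept):
            kept.append(j)
    return sorted(kept)
```

In this toy filter, a duplicated (perfectly correlated) feature and a constant feature would both be removed. The remaining features could then be split into the two co-training views.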
Funding: Supported in part by the National Key R&D Program of China under Grant 2018YFA0701601, in part by the National Natural Science Foundation of China (Grant Nos. U22A2002, 61941104, and 62201605), and in part by the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute.
Abstract: In the upcoming large-scale Internet of Things (IoT), defending against malicious traffic is increasingly challenging because of the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach to detect malicious traffic at the access side. It overcomes the resource bottleneck of traditional malicious traffic defenders deployed at the victim side, and it largely avoids the need for labeled traffic data in model training. Specifically, we design a coarse-grained behavior model of IoT devices through self-supervised learning on unlabeled traffic data. We then fine-tune this model with a transfer learning method that uses a small amount of labeled data, improving its accuracy in malicious traffic detection. Experimental results on the CICDDoS2019 dataset show that our method achieves an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data. Moreover, with 1% of the training data, our method outperforms state-of-the-art supervised learning-based methods in accuracy, precision, recall, and F1-score.
Funding: Supported in part by the National Key Research and Development Program of China (2022YFB3305300) and the National Natural Science Foundation of China (62173178).
Abstract: Ethylene glycol (EG) is a primary raw material in the polyester industry, and the syngas-to-EG route has become an important technical route for its production. Gas-phase catalytic coupling of carbon monoxide (CO) to synthesize dimethyl oxalate (DMO) is a crucial step in this route. The composition of the reactor outlet influences both the final quality of the EG product and the energy consumption of the subsequent separation process. However, measuring product quality in real time or building accurate dynamic mechanism models is challenging. To model the DMO synthesis process effectively, this study proposes a hybrid modeling strategy that integrates process mechanisms with data-driven approaches. The mechanism model of CO gas-phase catalytic coupling is developed from intrinsic kinetics and material balances. A long short-term memory (LSTM) neural network predicts the macroscopic reaction rate by exploiting temporal relationships in archived measurements. The model is trained in a semi-supervised manner on historical data to accommodate scenarios with limited labels. By integrating these predictions with the mechanism model, the hybrid approach provides reliable and interpretable forecasts of mass fractions. Empirical investigations validate the superiority of the proposed hybrid modeling approach over conventional data-driven models (DDMs) and other hybrid modeling techniques.
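The mechanism/data-driven split described above can be sketched as a material balance whose reaction rate comes from a learned model. This is a hypothetical illustration, not the authors' model. It assumes:

- the overall methyl nitrite coupling stoichiometry 2 CO + 2 CH3ONO → DMO + 2 NO, treated as a single lumped reaction;
- explicit Euler integration;
- a plain Python callable standing in for the LSTM rate predictor.

The molar masses are standard values rounded to three decimals.

```python
from typing import Callable, Dict

# Assumed lumped reaction: 2 CO + 2 CH3ONO -> (COOCH3)2 (DMO) + 2 NO
STOICH: Dict[str, int] = {"CO": -2, "CH3ONO": -2, "DMO": 1, "NO": 2}
MOLAR_MASS: Dict[str, float] = {"CO": 28.010, "CH3ONO": 61.040,
                                "DMO": 118.088, "NO": 30.006}  # g/mol

RateModel = Callable[[Dict[str, float]], float]


def hybrid_step(moles: Dict[str, float], rate_model: RateModel,
                dt: float) -> Dict[str, float]:
    """One explicit-Euler step of the material balance dn_i/dt = nu_i * r.

    The mechanistic part is the stoichiometry; the data-driven part is
    `rate_model`, a stand-in for a learned (e.g. LSTM) macroscopic rate.
    """
    r = max(0.0, rate_model(moles))  # a rate below zero is not physical here
    # Do not consume more of any reactant than is present in this step.
    r_max = min(moles[s] / (-nu * dt) for s, nu in STOICH.items() if nu < 0)
    r = min(r, r_max)
    return {s: moles[s] + nu * r * dt for s, nu in STOICH.items()}


def mass_fractions(moles: Dict[str, float]) -> Dict[str, float]:
    """Convert mole amounts to mass fractions, the quantity being forecast."""
    masses = {s: n * MOLAR_MASS[s] for s, n in moles.items()}
    total = sum(masses.values())
    return {s: m / total for s, m in masses.items()}
```

Because the stoichiometry is balanced, total mass is conserved no matter what rate the learned model outputs. This is one way a mechanism model keeps a data-driven component physically consistent.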
Funding: The National Natural Science Foundation of China (61761004) and the Natural Science Foundation of Guangxi Province, China (2019GXNSFAA245045).
Abstract: Large-scale radio frequency identification (RFID) indoor positioning systems cover a relatively large area, have little labeled data and much unlabeled data, and are easily affected by multipath and white noise. To address this problem, an RFID positioning algorithm based on semi-supervised actor-critic co-training (SACC) is proposed. In this research, positioning is treated as a Markov decision process. First, the actor-critic is combined with random actions, and the best unlabeled received signal strength indicator (RSSI) data are selected through semi-supervised co-training. Second, the actor and the critic are updated using the Kronecker-factored approximate curvature (K-FAC) natural gradient. Finally, the target position is obtained by co-locating with the labeled RSSI data and the selected unlabeled RSSI data. The proposed method significantly reduces the cost of indoor positioning by decreasing the amount of labeled data required. As the number of positioning targets increases, the actor can quickly select unlabeled RSSI data and update the location model. Experiments compare the method with other RFID indoor positioning algorithms: twin delayed deep deterministic policy gradient (TD3), deep deterministic policy gradient (DDPG), and actor-critic using Kronecker-factored trust region (ACKTR). Against these, the proposed method reduces the average positioning error by 50.226%, 41.916%, and 25.004%, respectively, and improves positioning stability by 23.430%, 28.518%, and 38.631%, respectively.
Funding: Supported by the State Key Laboratory of Remote Sensing Science and the Chinese Academy of Surveying & Mapping (Grant No. 20903).
Abstract: Semi-supervised classification (SSC), which uses both labeled and unlabeled data to determine classification boundaries in feature space, has great advantages for extracting classification information from massive data. In this paper, a novel SSC method based on the Gaussian mixture model (GMM) is proposed, in which the feature space of each class is described by one GMM. Experiments show that the proposed method can achieve high classification accuracy with a small amount of labeled data. Supervised classification methods such as support vector machines and object-oriented classification need much more labeled data to reach the same accuracy.
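The core idea of fitting class-conditional Gaussians to both labeled and unlabeled data with expectation-maximization (EM) can be sketched as follows. This sketch simplifies the paper's setup substantially. It uses one-dimensional features and a single Gaussian per class instead of a full GMM per class. Labeled points keep fixed class memberships, while unlabeled points get soft posteriors. All names are hypothetical, and every class is assumed to have at least one labeled point.

```python
import math
from typing import List, Sequence, Tuple


def gauss(x: float, mu: float, var: float) -> float:
    """1-D normal density."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def semi_supervised_em(labeled: Sequence[Tuple[float, int]], unlabeled: Sequence[float],
                       n_classes: int, iters: int = 50, var_floor: float = 1e-3):
    """EM in which labeled points keep one-hot responsibilities and unlabeled
    points receive soft posteriors. Returns (weights, means, variances)."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    resp: List[List[float]] = [[1.0 if c == y else 0.0 for c in range(n_classes)]
                               for _, y in labeled]
    resp += [[1.0 / n_classes] * n_classes for _ in unlabeled]
    w = [1.0 / n_classes] * n_classes
    mu = [0.0] * n_classes
    var = [1.0] * n_classes
    for _ in range(iters):
        # M-step: re-estimate each class Gaussian from (soft) memberships.
        for c in range(n_classes):
            nc = sum(r[c] for r in resp)
            w[c] = nc / len(xs)
            mu[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            var[c] = max(var_floor,
                         sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, xs)) / nc)
        # E-step: only the unlabeled responsibilities are updated.
        for i, x in enumerate(unlabeled, start=len(labeled)):
            p = [w[c] * gauss(x, mu[c], var[c]) for c in range(n_classes)]
            s = sum(p) or 1.0
            resp[i] = [pc / s for pc in p]
    return w, mu, var


def classify(x: float, w, mu, var) -> int:
    """Assign x to the class with the largest weighted density."""
    return max(range(len(w)), key=lambda c: w[c] * gauss(x, mu[c], var[c]))
```

The unlabeled points refine each class's mean and spread. This is how the approach can reach good boundaries with few labels.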
Abstract: As an important method for knowledge graph (KG) completion, link prediction has become a hot research topic in recent years. In this paper, a performance enhancement scheme for link prediction models based on semi-supervised learning and model soup is proposed. It effectively improves the performance of several mainstream link prediction models with small changes to their architecture. The scheme consists of two main parts. The first predicts potential fact triples in the graph with semi-supervised learning strategies. The second combines semi-supervised learning with model soup to further improve final model performance without adding significant computational overhead. Experiments validate the effectiveness of the scheme for a variety of link prediction models, especially on datasets with dense relationships. For CompGCN, the model with the best overall performance among those tested, the enhancement scheme improves the Hits@1 metric by 14.7% on the FB15K-237 dataset and by 7.8% on the WN18RR dataset. The semi-supervised learning strategy brings a significant improvement for multi-class link prediction models. The gain from introducing model soup depends on the specific model: some models improve while others remain largely unaffected.
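Model soup combines several models by averaging their weights rather than their predictions, so inference costs the same as a single model. This is consistent with the abstract's point about avoiding significant computational overhead. The sketch shows the simplest "uniform soup" over parameter dictionaries of plain lists. The abstract does not say whether the paper uses a uniform or a greedy soup. The list-based representation is also an assumption; in practice one would average framework tensors of models fine-tuned from the same initialization.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def uniform_soup(models: Sequence[Params]) -> Params:
    """Element-wise average of the parameters of several fine-tuned models."""
    if not models:
        raise ValueError("need at least one model")
    keys = models[0].keys()
    if any(m.keys() != keys for m in models):
        raise ValueError("all models must share the same parameter names")
    n = len(models)
    # zip(*...) groups the i-th value of parameter k across all models.
    return {k: [sum(vals) / n for vals in zip(*(m[k] for m in models))] for k in keys}
```

Averaging weights only makes sense when the models share an architecture and a common starting point. Otherwise their parameters are not aligned.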
Funding: Supported by the Shenzhen Fundamental Research Program of China under Grant Nos. JCYJ20200109110420626 and JCYJ20200109110208764, the National Natural Science Foundation of China under Grant Nos. U1813204 and 61802385, the Natural Science Foundation of Guangdong of China under Grant No. 2021A1515012604, and the Clinical Research Project of Shenzhen Municipal Health Commission under Grant No. SZLY2017011.
Abstract: Segmentation of intracranial aneurysms (IAs) from computed tomography angiography (CTA) images is important for the quantitative assessment of IAs and for subsequent surgical treatment. Manual segmentation of IAs is labor-intensive and time-consuming, and it suffers from inter- and intra-observer variability. Training deep neural networks usually requires a large amount of labeled data, yet annotating data for the IA segmentation task is very time-consuming. This paper presents a novel weight-perceptual self-ensembling model for semi-supervised IA segmentation, which exploits unlabeled data by encouraging predictions for perturbed versions of the same input to be consistent. Because consistency targets differ in quality, we introduce a novel sample weight perception module to quantify the quality of different consistency targets. The module evaluates the contribution of each unlabeled sample during training, forcing the network to focus on well-predicted samples. We conducted both horizontal and vertical comparisons on a clinical intracranial aneurysm CTA image dataset. Experimental results show that the proposed method improves the Dice coefficient by at least 3% over the fully supervised baseline and by at least 1.7% over other state-of-the-art semi-supervised methods.
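The self-ensembling idea can be sketched with a swapped-in mean-teacher-style setup. The teacher's weights are an exponential moving average (EMA) of the student's, and each unlabeled sample's consistency loss is scaled by a per-sample weight. The abstract does not describe how the sample weight perception module computes those weights, so here they are simply inputs. The function names and the EMA rate `alpha` are assumptions, and the weights are plain lists rather than network tensors.

```python
from typing import List, Sequence


def ema_update(teacher: List[float], student: Sequence[float],
               alpha: float = 0.99) -> List[float]:
    """Self-ensembling teacher: exponential moving average of student weights."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def weighted_consistency(student_probs: Sequence[Sequence[float]],
                         teacher_probs: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> float:
    """Per-sample MSE between student and teacher predictions, averaged with
    per-sample quality weights (e.g. produced by a weight-perception module)."""
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    loss = 0.0
    for s, t, w in zip(student_probs, teacher_probs, weights):
        mse = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        loss += w * mse
    return loss / total_w
```

Setting a sample's weight to zero removes its consistency target from the loss entirely. This is the mechanism by which poorly predicted targets can be down-weighted.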
Funding: Supported by the Natural Science Foundation of China (Grant No. 62376114), the Natural Science Foundation of Fujian Province (Grant Nos. 2021J011004 and 2021J011002), the Ministry of Education Industry-University-Research Innovation Program (Grant No. 2021LDA09003), and the Department of Education Foundation of Fujian Province (No. JAT210266).
Abstract: [Purpose] With the development of intelligent technology, deep learning has made significant progress and is widely used in various fields. Deep learning is data-driven, and its training requires a large amount of data to improve model performance. However, labeled data are expensive and not readily available. [Design/methodology/approach] To address this problem, researchers have combined semi-supervised and deep learning, training models with a limited amount of labeled data and a large amount of unlabeled data. This paper takes generative adversarial networks (GANs) as its entry point. First, we discuss current research on GANs for image super-resolution, covering supervised, unsupervised, and semi-supervised learning approaches. Second, taking image classification as an example, we introduce different optimization methods based on semi-supervised learning. Finally, we experimentally compare and analyze existing GAN-based semi-supervised optimization methods. [Findings] Based on the analysis of the selected studies, we summarize the problems encountered in the research process and propose future research directions. [Originality/value] This paper reviews and analyzes research on generative adversarial networks for image super-resolution and classification under various learning approaches. The comparative analysis of experimental results for current semi-supervised GAN optimizations provides a reference for further research.
Funding: Partially supported by the Transregional Collaborative Research Centre SFB/TRR 62 Companion-Technology for Cognitive Technical Systems, funded by the German Research Foundation (DFG), and by a scholarship from the German Academic Exchange Service (DAAD).
Abstract: Many data mining applications have large amounts of data, but labeling them is usually difficult, expensive, or time-consuming because it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that rests on two assumptions. First, each example is represented by multiple sets of features (views). Second, these views are sufficient for learning and conditionally independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members in each example's neighborhood. We then introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied to both C4.5 decision trees and 1-nearest-neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other, non-committee-based ones.
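One plausible reading of the local-accuracy confidence measure is sketched below. Each committee member's vote on an unlabeled example is weighted by how accurately that member classifies the example's nearest labeled neighbours. The confidence is then the winning label's share of the weighted vote. The exact formula in CoBC may differ; the weighting scheme, the function name, and `k` are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]


def local_confidence(x: Sequence[float],
                     labeled: List[Tuple[Sequence[float], int]],
                     committee: List[Classifier], k: int = 5) -> Tuple[int, float]:
    """Label x by a committee vote in which each member's vote is weighted by
    its accuracy on x's k nearest labeled neighbours.

    Returns (label, confidence), where confidence is the winning label's
    share of the total weighted vote.
    """
    neighbours = sorted(labeled, key=lambda s: math.dist(s[0], x))[:k]
    votes: Dict[int, float] = {}
    for h in committee:
        local_acc = sum(h(xi) == yi for xi, yi in neighbours) / len(neighbours)
        label = h(x)
        votes[label] = votes.get(label, 0.0) + local_acc
    total = sum(votes.values())
    best = max(votes, key=votes.get)
    return best, (votes[best] / total if total > 0 else 0.0)
```

Members that are locally unreliable around an example contribute little to its pseudo-label. Only high-confidence examples would then be added to the training set.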
Funding: The National Natural Science Foundation of China (32371993); the Natural Science Research Key Project of Anhui Provincial University (2022AH040125 & 2023AH040135); and the Key Research and Development Plan of Anhui Province (202204c06020022 & 2023n06020057).
Abstract: This study addresses the challenge of accurately and reliably detecting tomatoes in dense planting environments, a critical prerequisite for automated robotic harvesting. Deep learning models rely heavily on extensive manually annotated training datasets, which still significantly limits their application in real-world agricultural production. To overcome this limitation, we combined a domain-adaptive learning approach with the YOLOv5 model to develop a novel tomato detection model called TDA-YOLO (tomato detection domain adaptation). Normal-illumination scenes in dense planting environments were designated as the source domain, and various other illumination scenes as the target domain. To bridge the source and target domains, a neural preset for color style transfer was introduced to generate a pseudo-dataset that addresses the domain discrepancy. Furthermore, the study incorporates semi-supervised learning so that the model extracts domain-invariant features more fully, and it uses knowledge distillation to improve the model's adaptation to the target domain. Additionally, to increase inference speed and reduce computational demand, the lightweight FasterNet network was integrated into YOLOv5's C3 module, creating a modified C3_Faster module. The proposed TDA-YOLO model significantly outperformed the original YOLOv5s model, achieving a mAP (mean average precision) of 96.80% for tomato detection across diverse scenarios in dense planting environments, an increase of 7.19 percentage points. It also scored 2.17 and 1.19 percentage points higher than the latest YOLOv8 and YOLOv9, respectively. The model's average detection time per image was 15 milliseconds, with a FLOPs (floating-point operations) count of 13.8 G. After acceleration, TDA-YOLO on the Jetson Xavier NX development board achieved a detection accuracy of 90.95%, a mAP of 91.35%, and a detection time of 21 ms per image, which still meets the requirements of real-time tomato detection in dense planting environments. The results show that TDA-YOLO can detect tomatoes accurately and quickly in dense planting environments while avoiding the need for large amounts of annotated data. This provides technical support for the development of automatic harvesting systems for tomatoes and other fruits.
Funding: Supported in part by the US Department of Energy (No. DE-EE0008189) and the National Science Foundation (Nos. 1743418 and 1843025).
Abstract: Most heating, ventilation, and air-conditioning (HVAC) systems operate with one or more faults that increase energy consumption and could lead to system failure over time. Today, most building owners perform only reactive maintenance and may be less concerned with, or less able to assess, the health of the system until catastrophic failure occurs. This is mainly because building owners have not previously had good tools to detect and diagnose these faults, determine their impact, and act on the findings. Commercially available fault detection and diagnostics (FDD) tools have been developed to address this issue. They have the potential to reduce equipment downtime, energy costs, and maintenance costs while improving occupant comfort and system reliability. However, many of these tools require in-depth knowledge of system behavior and thermodynamic principles to interpret the results. In this paper, supervised and semi-supervised machine learning (ML) approaches are applied to datasets collected from an operating system in the field. The aims are to develop new FDD methods and to help building owners see the value proposition of proactive maintenance. The study data were collected from one packaged rooftop unit (RTU) HVAC system running under normal operating conditions at an industrial facility in Connecticut. This paper compares three different semi-supervised learning approaches to fault classification for a real-time operating RTU, achieving accuracies as high as 95.7% using few-shot learning.
Abstract: X-ray inspection equipment is divided into small baggage inspection equipment and large cargo inspection equipment. With X-ray scanning equipment, the contents of goods, unauthorized transport, or hidden goods can be identified in real time by passing cargo through the X-ray scanner without opening it. In this paper, we propose a system for detecting dangerous objects in X-ray images using the Cascade Region-based Convolutional Neural Network (Cascade R-CNN) model. The training data consist of dangerous goods, storage media, firearms, and knives. In addition, to minimize the overfitting caused by the lack of data for artificial intelligence (AI) training, the number of data samples is increased by applying the CP (copy-paste) algorithm to the existing data. The data labeling problem is also addressed by mixing supervised and semi-supervised learning. The four models compared in this study are:
- Faster Region-based Convolutional Neural Network with a Res2Net-101 backbone (Faster R-CNN_Res2Net-101), with supervised learning;
- Cascade R-CNN_Res2Net-101, with supervised learning;
- Cascade R-CNN with a Composite Backbone Network V2 (CBNetV2)-101 backbone (Cascade R-CNN_CBNetV2-101), with supervised learning;
- Cascade R-CNN_CBNetV2-101, with semi-supervised learning.
In the comparison, Cascade R-CNN_CBNetV2-101 with semi-supervised learning scored higher than with supervised learning by 0.7% in Average Precision (AP) at Intersection over Union (IoU) = 0.5, by 1.0% in AP at IoU = 0.75, and by 0.8% in recall.
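The copy-paste (CP) augmentation mentioned above can be sketched as cutting an annotated object out of one image and pasting it into another, adding a matching box label. This toy version is an assumption for illustration. It uses rectangular patches on grayscale images stored as nested lists and simply overwrites pixels. Real pipelines typically paste masked object shapes and may blend them, which this sketch does not attempt. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]           # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) with exclusive x1 / y1


def copy_paste(src: Image, box: Box, dst: Image, at: Tuple[int, int]) -> Tuple[Image, Box]:
    """Copy the object patch `box` from `src` into a copy of `dst` with its
    top-left corner at `at`; return the new image and the new box label."""
    x0, y0, x1, y1 = box
    ax, ay = at
    w, h = x1 - x0, y1 - y0
    if ax < 0 or ay < 0 or ay + h > len(dst) or ax + w > len(dst[0]):
        raise ValueError("pasted patch would fall outside the destination image")
    out = [row[:] for row in dst]  # never modify the original image
    for dy in range(h):
        for dx in range(w):
            out[ay + dy][ax + dx] = src[y0 + dy][x0 + dx]
    return out, (ax, ay, ax + w, ay + h)
```

Each call yields a new labeled training image from existing annotations. That is how CP-style augmentation enlarges a small dataset without extra labeling effort.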
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60835002 and 61075004).
Abstract: Recent years have witnessed a surge of interest in graph-based semi-supervised learning (GBSSL). In this paper, we introduce a series of works by our group on this topic, including:
1) a method called linear neighborhood propagation (LNP), which can automatically construct the optimal graph;
2) a novel multilevel scheme that makes our algorithm scalable to large data sets;
3) a generalized point charge scheme for GBSSL;
4) a multilabel GBSSL method based on solving a Sylvester equation;
5) an information fusion framework for GBSSL;
6) an application of GBSSL to fMRI image segmentation.
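The propagation stage that LNP builds on can be sketched as iterative label spreading over a row-normalized neighbourhood graph. To keep the example short, the graph weights come from a swapped-in inverse-distance rule. LNP itself fits reconstruction weights per point by a constrained least-squares problem (non-negative, summing to one). The functions, `alpha`, and the iteration count are assumptions.

```python
import math
from typing import Dict, List, Sequence


def knn_weights(points: Sequence[Sequence[float]], k: int) -> List[Dict[int, float]]:
    """Row-stochastic neighbourhood weights: inverse distance, normalized to 1.
    (LNP would instead solve a constrained reconstruction problem per point.)"""
    rows = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        raw = {j: 1.0 / (d + 1e-9) for d, j in dists}
        s = sum(raw.values())
        rows.append({j: v / s for j, v in raw.items()})
    return rows


def propagate(W: List[Dict[int, float]], seeds: Dict[int, int], n_classes: int,
              alpha: float = 0.99, iters: int = 200) -> List[int]:
    """Iterate F <- alpha * W F + (1 - alpha) * Y and return argmax labels."""
    n = len(W)
    Y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(w * F[j][c] for j, w in W[i].items()) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]
```

With two well-separated clusters and one labeled point in each, every point inherits the label of the seed in its own cluster. This is the behaviour graph-based SSL relies on.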
Abstract: Automatic classification of blog entries is generally treated as a semi-supervised machine learning task. Blog entries are automatically assigned to one of a set of predefined classes based on features extracted from their textual content. This paper attempts automatic classification of unstructured blog entries in several steps: pre-processing (tokenization, stop-word elimination, and stemming); statistical techniques for feature set extraction; and feature set enhancement using semantic resources. Modeling then uses two alternative machine learning models: the naïve Bayesian model and the artificial neural network (ANN) model. Empirical evaluations indicate that this multi-step approach achieves good overall classification accuracy on unstructured blog text datasets with both models. However, the naïve Bayesian model clearly outperforms the ANN-based model when a smaller feature set is available. This is usually the case when a blog topic is recent and the amount of available training data is restricted.
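For reference, the naïve Bayesian model compared above can be sketched as a multinomial naive Bayes classifier over pre-processed tokens. This is a generic textbook formulation, assumed rather than taken from the paper. It presumes tokenization, stop-word removal, and stemming have already been done, and it uses add-one (Laplace) smoothing. The class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.n_docs = len(docs)
        return self

    def predict(self, tokens: List[str]) -> str:
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            score = math.log(n_y / self.n_docs)  # log prior P(class)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[y][t] + 1) /
                                      (total + len(self.vocab)))
            if score > best_score:
                best, best_score = y, score
        return best
```

Naive Bayes has only per-class word counts to estimate. This is one reason it can hold up with small feature sets and little training data, as the abstract reports.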
Abstract: Co-training is a well-known semi-supervised learning algorithm that can exploit unlabeled data to improve learning performance. It generally works in a two-view setting, in which the input examples naturally have two disjoint feature sets, under the assumption that each view is sufficient to predict the label. In real-world applications, however, feature corruption or feature noise may make both views insufficient, and co-training then suffers. In this paper, we propose a novel algorithm named Weighted Co-training to deal with this problem. It identifies newly labeled examples that are probably harmful to the other view and decreases their weights in the training set to avoid the risk. Experimental results show that Weighted Co-training performs better than state-of-the-art co-training algorithms on several benchmarks.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60803120).
Abstract: Detecting abnormal vehicle events is a research hotspot in the analysis of highway surveillance video. Complex factors, including varying weather, illumination, and noise, make vehicle feature extraction and abnormality detection difficult. This paper proposes a Fast Constrained Delaunay Triangulation (FCDT) algorithm to replace complicated segmentation algorithms for multi-feature extraction. Based on the video frames segmented by FCDT, an improved algorithm is presented to estimate the background self-adaptively. After the estimation, principal component analysis (PCA) generates a multi-feature eigenvector from the static and motion features extracted by locating and tracking each vehicle. For abnormality detection, adaptive detection modeling of vehicle events (ADMVE) is presented. In ADMVE, a semi-supervised Mixture of Gaussian Hidden Markov Model (MGHMM) is trained with the multi-feature eigenvectors from each video segment. The normal model is developed in supervised mode with manual labeling and becomes more accurate through iterated adaptation. The abnormal models are trained in unsupervised mode through adapted Bayesian learning. The paper also presents experiments on real video sequences to verify the proposed method.
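The MGHMM scoring step can be illustrated with the HMM forward algorithm. A trained model assigns a log-likelihood to each feature sequence, and sequences that the normal model explains poorly are candidates for abnormal events. This sketch makes three simplifying assumptions:

- each hidden state emits a one-dimensional single Gaussian rather than a Gaussian mixture over PCA eigenvectors;
- the parameters are given rather than learned;
- the per-frame threshold rule in `is_abnormal` is a hypothetical decision rule.

```python
import math
from typing import Sequence


def log_gauss(x: float, mu: float, var: float) -> float:
    """Log of the 1-D normal density."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(vals: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in vals))


def hmm_log_likelihood(obs: Sequence[float], start: Sequence[float],
                       trans: Sequence[Sequence[float]],
                       means: Sequence[float], variances: Sequence[float]) -> float:
    """Forward algorithm in log space for an HMM with 1-D Gaussian emissions."""
    n = len(start)
    log = lambda p: math.log(p) if p > 0 else -math.inf
    log_a = [[log(p) for p in row] for row in trans]
    alpha = [log(start[s]) + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + log_a[r][s] for r in range(n)])
                 + log_gauss(x, means[s], variances[s]) for s in range(n)]
    return logsumexp(alpha)


def is_abnormal(obs: Sequence[float], model, threshold: float) -> bool:
    """Hypothetical rule: flag a sequence whose per-frame log-likelihood
    under the normal model falls below `threshold`."""
    return hmm_log_likelihood(obs, *model) / len(obs) < threshold
```

Normalizing by sequence length makes the threshold comparable across video segments of different durations.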
Abstract: As a supplement to [Xu L. Front. Electr. Electron. Eng. China, 2010, 5(3): 281-328], this paper outlines the current status of efforts on Bayesian Ying-Yang (BYY) harmony learning, together with gene analysis applications. First, a bird's-eye view is provided via Gaussian mixtures, in comparison with typical learning algorithms and model selection criteria. In particular, semi-supervised learning is covered simply by choosing a scalar parameter. Then, essential topics and demanding issues in BYY system design and BYY harmony learning are systematically outlined. A modern perspective on the Yin-Yang viewpoint is discussed, another Yang factorization is addressed, and coordination across and within Ying-Yang is summarized. The BYY system acts as a unified framework that accommodates unsupervised, supervised, and semi-supervised learning in one formulation, while best harmony learning provides novelty and strength for automatic model selection. The mathematical formulation of the harmony functional is also addressed as a unified scheme for measuring the proximity considered in a BYY system, and it is used as the best choice among alternatives. Moreover, efforts are made on a number of learning tasks:
- a mode-switching factor analysis, proposed as a semi-blind learning framework for several types of independent factor analysis;
- a hidden Markov model (HMM) gated temporal factor analysis, suggested for modeling piecewise stationary temporal dependence;
- a two-level hierarchical Gaussian mixture, extended to cover semi-supervised learning;
- a manifold learning method, modified to facilitate automatic model selection.
Finally, the studies are applied to gene analysis problems such as genome-wide association, exome sequencing analysis, and gene transcriptional regulation.