Tag recommendation systems can significantly improve the accuracy of information retrieval by recommending relevant tag sets that align with user preferences and resource characteristics. However, metric learning methods often suffer from high sensitivity, leading to unstable recommendation results when facing adversarial samples generated through malicious user behavior. Adversarial training is considered an effective method for improving the robustness of tag recommendation systems and addressing adversarial samples, but it still faces the challenge of overfitting. Although curriculum learning-based adversarial training somewhat mitigates this issue, challenges remain, such as the lack of a quantitative standard for attack intensity and catastrophic forgetting. To address these challenges, we propose a Self-Paced Adversarial Metric Learning (SPAML) method. First, we employ a metric learning model to capture the deep distance relationships between normal samples. Then, we incorporate a self-paced adversarial training model, which dynamically adjusts the weights of adversarial samples, allowing the model to progressively learn from simpler to more complex adversarial samples. Finally, we jointly optimize the metric learning loss and the self-paced adversarial training loss in an adversarial manner, enhancing the robustness and performance of tag recommendation tasks. Extensive experiments on the MovieLens and LastFm datasets demonstrate that SPAML achieves F1@3 and NDCG@3 scores of 22% and 32.7% on the MovieLens dataset, and 19.4% and 29% on the LastFm dataset, respectively, outperforming the most competitive baselines. Specifically, F1@3 improves by 4.7% and 6.8%, and NDCG@3 improves by 5.0% and 6.9%, respectively.
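The self-paced idea above ("simpler to more complex adversarial samples") can be sketched with a hard self-paced regularizer. This is an illustrative sketch, not the paper's exact formulation: the function name `self_paced_weights` and the 0/1 weighting with a growing age parameter `lam` are assumptions.

```python
import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced regularizer: weight 1 for samples whose current
    loss is below the age parameter lam, 0 otherwise. As lam grows,
    harder adversarial samples enter the training objective."""
    return (np.asarray(losses) < lam).astype(float)

# Toy curriculum: raise lam each stage so easy adversarial samples
# (low loss) are learned first and hard ones are admitted later.
losses = np.array([0.2, 0.9, 1.5, 3.0])   # per-sample adversarial losses
for lam in (1.0, 2.0, 4.0):
    w = self_paced_weights(losses, lam)
    stage_loss = float((w * losses).sum() / max(w.sum(), 1.0))
```

With `lam = 1.0` only the two easiest samples contribute; by `lam = 4.0` all four do, mimicking a curriculum without a hand-tuned attack-intensity schedule.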
Attacks on websites and network servers are among the most critical threats in network security. Network behavior identification is one of the most effective ways to identify malicious network intrusions. Analyzing abnormal network traffic patterns and classifying traffic based on labeled network traffic data are among the most effective approaches for network behavior identification. Traditional methods for network traffic classification use algorithms such as Naive Bayes, Decision Tree, and XGBoost. However, network traffic classification, which is required for network behavior identification, generally suffers from low accuracy even with recently proposed deep learning models. To improve network traffic classification accuracy, and thereby the network intrusion detection rate, this paper proposes a new network traffic classification model, called ArcMargin, which incorporates metric learning into a convolutional neural network (CNN) to make the CNN model more discriminative. ArcMargin maps network traffic samples from the same category close together, while samples from different categories are mapped as far apart as possible. The metric learning regularization feature, called additive angular margin loss, is embedded in the objective function of traditional CNN models. The proposed ArcMargin model is validated on three datasets and compared with several related algorithms. According to a set of classification indicators, the ArcMargin model is shown to perform better in both network traffic classification tasks and open-set tasks. Moreover, in open-set tasks, the ArcMargin model can cluster unknown data classes that do not exist in the previous training dataset.
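The additive angular margin loss named above has a well-known form: normalise features and class weights, add a margin to the angle of the true class, and rescale. A minimal NumPy sketch (function name and toy values are illustrative; the real model would feed these logits into softmax cross-entropy):

```python
import numpy as np

def arc_margin_logits(features, weights, labels, margin=0.5, scale=30.0):
    """Additive angular margin logits: cos(theta + m) for the true class,
    cos(theta) for the others, both scaled by s."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(f @ w, -1.0, 1.0)            # cosine similarity to each class
    theta = np.arccos(cos)
    theta[np.arange(len(labels)), labels] += margin   # penalise the true class
    return scale * np.cos(theta)
```

Because the true-class angle is enlarged by `margin` before the cosine is taken, the model must place same-class samples within a tighter angular cone, which is exactly the "same category closer, different categories farther" behaviour described above.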
The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide on control strategies quickly. Considering that there are many traffic scenes and it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to complete the classification of traffic scenes. We verify the effectiveness of ASVM2L on standard datasets, and then use it to measure and classify the traffic scenes in the historical air traffic dataset of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance under limited labeled samples.
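A common way to realise a "voting difference" active-sampling strategy is vote entropy over a committee of classifiers: the unlabeled sample whose committee votes disagree most is the most valuable to hand to a human labeller. This sketch assumes that reading; the function name and the entropy scoring are illustrative, not the paper's exact criterion.

```python
import numpy as np

def most_disagreed(committee_votes):
    """Return the index of the sample whose committee labels disagree
    the most, scored by vote entropy (higher entropy = more conflict)."""
    scores = []
    for row in committee_votes:                 # one row of votes per sample
        _, counts = np.unique(row, return_counts=True)
        p = counts / counts.sum()
        scores.append(float(-(p * np.log(p)).sum()))
    return int(np.argmax(scores))

votes = [[0, 0, 0], [0, 1, 1], [0, 1, 2]]       # 3 samples, 3 committee members
pick = most_disagreed(votes)                    # the all-different row wins
```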
Few‐shot image classification is the task of classifying novel classes using extremely limited labelled samples. To perform classification with these limited samples, one solution is to learn feature alignment (FA) information between the labelled and unlabelled sample features. Most FA methods use the feature mean as the class prototype and calculate the correlation between the prototype and unlabelled features to learn an alignment strategy. However, mean prototypes tend to degrade informative features, because spatial features at the same position may not be equally important for the final classification, leading to inaccurate correlation calculations. Therefore, the authors propose an effective intraclass FA strategy that aggregates semantically similar spatial features from an adaptive reference prototype in a low‐dimensional feature space to obtain an informative prototype feature map for precise correlation computation. Moreover, a dual correlation module that learns hard and soft correlations was developed by the authors. This module combines the correlation information between the prototype and unlabelled features in both the original and learnable feature spaces, aiming to produce a comprehensive cross‐correlation between the prototypes and unlabelled features. Using both the FA and cross‐attention modules, our model can maintain informative class features and capture important shared features for classification. Experimental results on three few‐shot classification benchmarks show that the proposed method outperformed related methods and yielded a 3% performance boost in the 1‐shot setting when the proposed module was inserted into the related methods.
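The mean-prototype baseline that FA methods start from is easy to state concretely: one prototype per class (the feature mean) and nearest-prototype classification. This is a sketch of that baseline only, not of the authors' adaptive-prototype improvement; names and toy data are illustrative.

```python
import numpy as np

def nearest_prototype(query, support, labels):
    """Classify a query feature by its nearest class prototype, where
    each prototype is the mean of that class's support features."""
    classes = sorted(set(labels))
    protos = np.stack([support[np.asarray(labels) == c].mean(axis=0)
                       for c in classes])
    return classes[int(np.argmin(np.linalg.norm(protos - query, axis=1)))]
```

The paper's critique applies here: averaging flattens spatially informative features, which the adaptive reference prototype is designed to avoid.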
In recent years, the crack fault has been one of the most common faults in rotor systems, and crack position diagnosis in the hollow shaft rotor system remains a challenge. In this paper, a method based on a Convolutional Neural Network and deep metric learning (CNN-C) is proposed to effectively identify the crack position in a hollow shaft rotor system. A center-loss function is used to enhance the performance of the neural network. The main contributions include the following. First, the dynamic response of the dual-disk hollow shaft rotor system is obtained. The analysis results show that the crack causes super-harmonic resonance, and its peak value is closely related to the position and depth of the crack. In addition, the amplitude near the non-resonant region is also related to the crack parameters. Second, we propose an effective crack position diagnosis method that achieves the highest recognition accuracy, 99.04%, compared with other algorithms. Then, the influence of the penalty factor on CNN-C performance is analyzed, which shows that an excessively high penalty factor degrades the performance of the neural network. Finally, the feature vectors are visualized via t-distributed Stochastic Neighbor Embedding (t-SNE). A Naive Bayes classifier (NB) and the K-Nearest Neighbor algorithm (KNN) are used to verify the validity of the feature vectors extracted by CNN-C. The results show that NB and KNN have more regular decision boundaries and higher recognition accuracy on the feature vector dataset extracted by CNN-C, indicating that the feature vectors extracted by CNN-C have great intra-class compactness and inter-class separability.
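The center loss mentioned above has a standard form: the mean squared distance between each feature and its class centre, added to the softmax loss with a penalty factor (the factor whose effect the paper analyses). A minimal sketch, with illustrative names:

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: 0.5 * mean squared distance between each feature
    vector and the centre of its class. In training this term is scaled
    by a penalty factor and added to the classification loss."""
    diffs = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diffs ** 2, axis=1)))
```

When every feature sits exactly on its class centre the loss is zero; a larger penalty factor pulls features toward the centres more aggressively, which, as the paper observes, can hurt accuracy if set too high.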
Deep metric learning (DML) has achieved great results on visual understanding tasks by seamlessly integrating conventional metric learning with deep neural networks. Existing deep metric learning methods focus on designing pair-based distance losses that decrease intra-class distance while increasing inter-class distance. However, these methods fail to preserve the geometric structure of the data in the embedding space, which leads to spatial structure shift across mini-batches and may slow down the convergence of embedding learning. To alleviate these issues, by assuming that the input data are embedded in a lower-dimensional sub-manifold, we propose a novel deep Riemannian metric learning (DRML) framework that exploits non-Euclidean geometric structural information. Considering that the curvature information of data measures how much the Riemannian (non-Euclidean) metric deviates from the Euclidean metric, we leverage geometry flow, known as a geometric evolution equation, to characterize the relation between the Riemannian metric and its curvature. Our DRML not only regularizes the local neighborhood connections of the embeddings at the hidden layer but also adapts the embeddings to preserve the geometric structure of the data. The proposed DRML outperforms all existing methods on several benchmark datasets, and these results demonstrate its effectiveness.
Inspired by the tremendous achievements of meta-learning in various fields, this paper proposes a local quadratic embedding learning (LQEL) algorithm for regression problems based on metric learning and neural networks (NNs). First, Mahalanobis metric learning is improved by optimizing the global consistency of the metrics between instances in the input and output spaces. We then prove that the improved metric learning problem is equivalent to a convex programming problem by relaxing the constraints. Based on the hypothesis of local quadratic interpolation, the algorithm introduces two lightweight NNs; one is used to learn the coefficient matrix in the local quadratic model, and the other assigns weights to the prediction results obtained from different local neighbors. Finally, the two sub-models are embedded in a unified regression framework, and the parameters are learned by means of a stochastic gradient descent (SGD) algorithm. The proposed algorithm makes full use of the information implied in the target labels to find more reliable reference instances. Moreover, it prevents the model degradation caused by sensor drift and unmeasurable variables by modeling variable differences with the LQEL algorithm. Simulation results on multiple benchmark datasets and two practical industrial applications show that the proposed method outperforms several popular regression methods.
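For readers unfamiliar with the Mahalanobis metric being learned here: it generalises Euclidean distance with a positive semi-definite matrix M, d_M(x, y)^2 = (x - y)^T M (x - y). A minimal sketch (the factorised parameterisation and the example matrix are illustrative, not taken from the paper):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = np.asarray(x) - np.asarray(y)
    return float(d @ M @ d)

# A standard trick keeps M positive semi-definite during learning:
# optimise a factor L and set M = L^T L (L below is an example value).
L = np.array([[2.0, 0.0], [0.0, 1.0]])
M = L.T @ L
```

With this M, distance along the first axis counts four times as much as along the second, which is exactly the kind of axis reweighting metric learning tunes from data.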
In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gaze from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective on gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performance are improved. The experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
A group activity recognition algorithm is proposed to improve recognition accuracy in video surveillance by using complex-wavelet-domain Cayley-Klein metric learning. The non-sampled dual-tree complex wavelet packet transform (NS-DTCWPT) is used to decompose the human images in videos into multiple scales and resolutions. An improved local binary pattern (ILBP) and an inner-distance shape context (IDSC) combined with a bag-of-words model are adopted to extract features from the decomposed high- and low-frequency coefficients. The extracted coefficient features of the training samples are used to optimize the Cayley-Klein metric matrix by solving a nonlinear optimization problem. The group activities in videos are then recognized by combining this feature extraction with Cayley-Klein metric learning. Experimental results on the BEHAVE video set, a group activity video set, and a self-built video set show that the proposed algorithm has higher recognition accuracy than existing algorithms.
Nowadays, Remote Sensing (RS) techniques are used for earth observation and for detecting soil types with high accuracy and reliability. The technique provides a perspective view at fine spatial resolution and aids in the instantaneous measurement of soil minerals and their characteristics. A few challenges are present in soil classification using image enhancement, such as locating and plotting soil boundaries, slopes, hazardous areas, drainage conditions, land use, and vegetation. Traditional approaches involve drawbacks such as manual involvement, which results in inaccuracy due to human interference, time consumption, and inconsistent prediction. To overcome these drawbacks and to improve the predictive analysis of soil characteristics, we propose a Hybrid Deep learning improved BAT optimization algorithm (HDIB) for soil classification using remote sensing hyperspectral features. In HDIB, we propose a spontaneous BAT optimization algorithm for extracting both spectral and spatial features by choosing pure pixels from the HyperSpectral (HS) image. A spectral-spatial vector used as training input is obtained by merging the spatial and spectral vectors by means of a priority stacking methodology. Then, a recurrent Deep Learning (DL) Neural Network (NN) is used to classify the HS images of the Pavia University, Salinas, and Tamil Nadu Hill Scene datasets, which in turn improves the reliability of classification. Finally, the performance of the proposed HDIB-based soil classifier is compared and analyzed with existing methodologies such as the Single Layer Perceptron (SLP), Convolutional Neural Networks (CNN), and Deep Metric Learning (DML), and it shows improved classification accuracies of 99.87%, 98.34%, and 99.9% for the Tamil Nadu Hills, Pavia University, and Salinas scene datasets, respectively.
Existing clothes retrieval methods mostly adopt binary supervision in metric learning. In each iteration, only the clothes belonging to the same instance are positive samples, and all other clothes are “indistinguishable” negative samples, which causes the following problems. The relevance between the query and the candidates is treated only as relevant or irrelevant, which makes it difficult for the model to learn the continuous semantic similarities between clothes. Clothes that do not belong to the same instance are considered completely irrelevant and are uniformly pushed away from the query by an equal margin in the embedding space, which is inconsistent with ideal retrieval results. Motivated by this, we propose a novel method called semantic-based clothes retrieval (SCR). In SCR, we measure the semantic similarities between clothes and design a new adaptive loss based on these similarities. The margin in the proposed adaptive loss varies with the semantic similarity between the anchor and negative samples. In this way, a more coherent embedding space can be learned, where candidates with higher semantic similarities are mapped closer to the query than those with lower ones. We use Recall@K and normalized Discounted Cumulative Gain (nDCG) as evaluation metrics in experiments on the DeepFashion dataset and achieve better performance.
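The adaptive-margin idea can be sketched as a triplet-style hinge whose margin shrinks as the anchor-negative semantic similarity grows. This is a hypothetical sketch of the mechanism, assuming a similarity in [0, 1] and a linear margin schedule; the paper's actual loss may differ in form.

```python
def adaptive_margin_triplet(d_ap, d_an, sim_an, base_margin=0.5):
    """Triplet hinge with a similarity-dependent margin: semantically
    similar negatives (sim_an near 1) are pushed away by a smaller
    margin, so they stay closer to the query than dissimilar ones."""
    margin = base_margin * (1.0 - sim_an)
    return max(0.0, d_ap - d_an + margin)
```

With `sim_an = 0` this reduces to an ordinary triplet loss with margin `base_margin`; with `sim_an = 1` the negative is barely pushed at all, producing the graded embedding space the abstract describes.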
With the development of new media technology, vehicle matching plays an increasingly significant role in video surveillance systems. Recent methods have explored vehicle matching based on feature extraction, and similarity metric learning has also achieved enormous progress in vehicle matching. However, most of these methods are less effective in realistic scenarios where vehicles are usually captured at different times. To address this cross-domain problem, we propose a cross-domain similarity metric learning method that utilizes a GAN to generate vehicle images in another domain and a two-channel Siamese network to learn a similarity metric from both domains (i.e., day pattern and night pattern) for vehicle matching. To exploit the properties of and relationships among vehicle datasets, we first apply the domain transformer to translate the domain of vehicle images, and then utilize the two-channel Siamese network to extract features from both domains for better feature similarity learning. Experimental results illustrate that our models achieve improvements over the state-of-the-art.
With the advancement of network communication technology, network traffic shows explosive growth, and consequently network attacks occur frequently. Network intrusion detection systems are still the primary means of detecting attacks. However, two challenges continue to stymie the development of a viable network intrusion detection system: imbalanced training data and new, undiscovered attacks. Therefore, this study proposes a unique deep learning-based intrusion detection method. We use two independent in-memory autoencoders, trained on regular network traffic and on attacks, to capture the dynamic relationship between traffic features in the presence of unbalanced training data. The original data is then fed into a triplet network for training, forming a triplet with the data reconstructed by the two encoders. Finally, the distance relationship within the triplet determines whether the traffic is an attack. In addition, to improve the accuracy of detecting unknown attacks, this research proposes an improved triplet loss function that pulls distances within the same class closer while pushing distances between different classes farther apart in the learned feature space. The proposed approach's effectiveness, stability, and significance are evaluated against advanced models on the Android Adware and General Malware Dataset (AAGM17), Knowledge Discovery and Data Mining Cup 1999 (KDDCUP99), the Canadian Institute for Cybersecurity's Intrusion Detection Evaluation Dataset (CICIDS2017), UNSW-NB15, and the Network Security Lab-Knowledge Discovery and Data Mining (NSL-KDD) datasets. The achieved results confirm the superiority of the proposed method for the task of network intrusion detection.
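One simplified reading of the decision rule above: a sample is labelled by which autoencoder reconstructs it better, since the normal-traffic encoder should reconstruct normal traffic with smaller error than the attack encoder, and vice versa. This sketch is that simplified reading only, with illustrative names; the paper's actual rule operates on triplet distances in the learned feature space.

```python
import numpy as np

def classify_traffic(x, recon_by_normal_ae, recon_by_attack_ae):
    """Label a traffic sample by the nearer of the two reconstructions:
    if the attack autoencoder reconstructs it better, call it an attack."""
    d_normal = np.linalg.norm(x - recon_by_normal_ae)
    d_attack = np.linalg.norm(x - recon_by_attack_ae)
    return "attack" if d_attack < d_normal else "normal"
```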
Gait recognition is a biometric technique that captures human walking patterns using gait silhouettes as input and can be used for long-term recognition. Recently proposed video-based methods achieve high performance. However, gait covariates, or walking conditions, i.e., bag carrying and clothing, make the recognition of intra-class gait samples hard. Advanced methods simply use triplet loss for metric learning, which does not take the gait covariates into account. To alleviate the adverse influence of gait covariates, we propose a cross-walking-condition constraint that considers the gait covariates explicitly. Specifically, this approach designs center-based and pair-wise loss functions to decrease the discrepancy of intra-class gait samples under different walking conditions and enlarge the distance of inter-class gait samples under the same walking condition. Besides, we also propose a video-based strong baseline model of high performance by applying simple yet effective tricks that have been validated in other individual recognition fields. With the proposed baseline model and loss functions, our method achieves state-of-the-art performance.
In this paper, we propose a Structure-Aware Fusion Network (SAFNet) for 3D scene understanding. Since 2D images present more detailed information while 3D point clouds convey more geometric information, fusing the two complementary kinds of data can improve the discriminative ability of the model. Fusion is a very challenging task, since 2D and 3D data are essentially different and come in different formats. Existing methods first extract 2D multi-view image features and then aggregate them into sparse 3D point clouds, achieving superior performance. However, they ignore the structural relations between pixels and point clouds and directly fuse the two modalities of data without adaptation. To address this, we propose a structural deep metric learning method on pixels and points to explore these relations and further utilize them to adaptively map the images and point clouds into a common canonical space for prediction. Extensive experiments on the widely used ScanNetV2 and S3DIS datasets verify the performance of the proposed SAFNet.
Deep metric learning is one of the recommended methods for the challenge of supporting few/zero-shot learning with deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) for learning a distance function that can map input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture deals with the existence of only a few training samples by deciding whether the samples share the same class identity or not. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights and trained with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error, time-consuming phase. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: (1) utilizing transfer learning, with the pre-trained AlexNet used as a feature extractor in the Siamese architecture, since fine-tuning a pre-trained network is typically faster and easier than building one from scratch; and (2) training the Siamese architecture with contrastive loss instead of binary cross-entropy, which helps the network learn a nonlinear mapping function that maps the extracted features into the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K datasets in two experiments, one testing it in few-shot learning and the other in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, a comparison between the performance of the proposed Siamese network and traditional Siamese CNNs is conducted. The comparison results show that the proposed network achieves higher recognition results in less time, reducing the training time from days to hours in both experiments.
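The contrastive loss the proposed network trains with has a standard form: same-class pairs are pulled together, different-class pairs are pushed beyond a margin. A minimal sketch on a single pairwise embedding distance `d` (function name and margin value are illustrative):

```python
def contrastive_loss(d, same_class, margin=1.0):
    """Contrastive loss on an embedding distance d: quadratic pull for
    same-class pairs, hinged quadratic push (up to margin) otherwise."""
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

Unlike binary cross-entropy on a similarity score, the hinge stops penalising different-class pairs once they are separated by the margin, which is one way to read the abstract's point about binary cross-entropy yielding poor margins.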
In existing remote sensing image retrieval (RSIR) datasets, the number of images varies dramatically among classes, which leads to a severe class imbalance problem. Some studies propose to train the model with a ranking‐based metric (e.g., average precision [AP]), because AP is robust to class imbalance. However, current AP‐based methods overlook an important issue: they optimise only the samples ranked before each positive sample, which is limited by the definition of AP and is prone to local optima. To achieve global optimisation of AP, a novel method, namely Optimising Samples after positive ones & AP loss (OSAP‐Loss), is proposed in this study. Specifically, a novel superior ranking function is designed to make the AP loss differentiable while providing a tighter upper bound. Then, a novel loss called Optimising Samples after Positive ones (OSP) loss is proposed to involve all positive and negative samples ranked after each positive one and to provide a more flexible optimisation strategy for each sample. Finally, a graphics processing unit memory‐free mechanism is developed to thoroughly address the non‐decomposability of AP optimisation. Extensive experimental results on RSIR as well as conventional image retrieval datasets show the superiority and competitive performance of OSAP‐Loss compared to the state‐of‐the‐art.
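For reference, the AP metric being optimised is defined on a ranked list of relevance flags: the mean of the precision at every rank where a relevant item appears. A minimal sketch (the differentiable surrogate in the paper replaces the hard ranking inside this definition):

```python
def average_precision(ranked_relevance):
    """AP over a ranked 0/1 relevance list: average the precision@rank
    at each rank holding a relevant item. Returns 0.0 for no relevants."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(len(precisions), 1)
```

Note how each term `hits / rank` depends only on items ranked at or before that positive, which is exactly the limitation the abstract points to: samples ranked after a positive never enter AP's terms for it.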
Clustering analysis is one of the main concerns in data mining. A common approach to the clustering process is to bring together points that are close to each other and separate points that are far from each other; therefore, measuring the distance between sample points is crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples on these features is a common supervised learning approach to reconstructing a distance metric. However, in many application scenarios it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high data dimensionality, a novel semi-supervised clustering algorithm is proposed. It designs an improved prototype network that attempts to reconstruct the distance metric in the sample space from a small amount of pairwise supervised information, such as Must-Link and Cannot-Link constraints, and then clusters the data in the new metric space. The core idea is to draw similar samples closer and push dissimilar ones further away through an embedding mapping. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm: average clustering metrics on various datasets improved by 8% compared to the comparison algorithms.
Target recognition based on deep learning relies on a large quantity of samples, but in some specific remote sensing scenes samples are very rare. Few-shot learning can currently obtain high-performance target classification models using only a few samples, but most research is based on natural scenes. Therefore, this paper proposes a metric-based few-shot classification technique for remote sensing. First, we constructed a dataset (RSD-FSC) for few-shot classification in remote sensing, which contains 21 classes of typical target sample slices from remote sensing images. Second, based on metric learning, a k-nearest neighbor classification network is proposed that finds multiple training samples similar to the testing target and then calculates the similarity between the testing target and these similar samples to classify it. Finally, 5-way 1-shot, 5-way 5-shot, and 5-way 10-shot experiments are conducted to improve the generalization of the model on few-shot classification tasks. The experimental results show that, for newly emerged classes with few-shot samples, when the number of training samples is 1, 5, and 10, the average accuracy of target recognition reaches 59.134%, 82.553%, and 87.796%, respectively. This demonstrates that our proposed method can handle few-shot classification in remote sensing images and performs better than other few-shot classification methods.
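The k-nearest-neighbor classification step can be sketched directly: embed the support samples with the learned metric network, then label a query by majority vote among its k nearest neighbors. This sketch uses plain Euclidean distance in place of the learned metric; the function name and toy labels are illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(query, support_feats, support_labels, k=3):
    """Label a query target by majority vote among its k nearest support
    samples in the embedding space (Euclidean distance stands in for the
    learned metric here)."""
    dists = np.linalg.norm(support_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(support_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```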
Person re-identification has emerged as a hotspot in computer vision research due to growing public safety demands and the quick development of intelligent surveillance networks. Person re-identification (Re-ID) in a video surveillance system can track and identify suspicious people and statistically analyze persons of interest. The purpose of person re-identification is to recognize the same person across different cameras. Deep learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity. The purpose of this paper is to help researchers better understand where person re-identification research currently stands and where it is headed. First, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.
Funding: Supported by the Key Research and Development Program of Zhejiang Province (No. 2024C01071) and the Natural Science Foundation of Zhejiang Province (No. LQ15F030006).
Abstract: Tag recommendation systems can significantly improve the accuracy of information retrieval by recommending relevant tag sets that align with user preferences and resource characteristics. However, metric learning methods often suffer from high sensitivity, leading to unstable recommendation results when facing adversarial samples generated through malicious user behavior. Adversarial training is considered an effective way to improve the robustness of tag recommendation systems against adversarial samples, but it still faces the challenge of overfitting. Although curriculum learning-based adversarial training mitigates this issue to some extent, challenges remain, such as the lack of a quantitative standard for attack intensity and catastrophic forgetting. To address these challenges, we propose a Self-Paced Adversarial Metric Learning (SPAML) method. First, we employ a metric learning model to capture the deep distance relationships between normal samples. Then, we incorporate a self-paced adversarial training model, which dynamically adjusts the weights of adversarial samples, allowing the model to progressively learn from simpler to more complex adversarial samples. Finally, we jointly optimize the metric learning loss and the self-paced adversarial training loss in an adversarial manner, enhancing the robustness and performance of tag recommendation tasks. Extensive experiments on the MovieLens and LastFm datasets demonstrate that SPAML achieves F1@3 and NDCG@3 scores of 22% and 32.7% on MovieLens and 19.4% and 29% on LastFm, respectively, outperforming the most competitive baselines. Specifically, F1@3 improves by 4.7% and 6.8%, and NDCG@3 improves by 5.0% and 6.9%, respectively.
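The abstract above does not spell out the self-paced weighting rule; a minimal sketch of the classic binary self-paced regularizer that such methods build on (function name and pace schedule are illustrative assumptions, not SPAML's exact formulation) could look like:

```python
import numpy as np

def self_paced_weights(adv_losses, pace):
    """Binary self-paced weighting: adversarial samples whose loss falls
    below the pace threshold count as "easy" and get weight 1; harder
    ones are excluded (weight 0) until the pace grows in later epochs."""
    adv_losses = np.asarray(adv_losses, dtype=float)
    return (adv_losses < pace).astype(float)

# Early in training only easy adversarial samples participate;
# raising the pace later admits progressively harder ones.
per_sample_loss = np.array([0.2, 0.9, 1.5, 0.4])
early = self_paced_weights(per_sample_loss, pace=0.5)
late = self_paced_weights(per_sample_loss, pace=2.0)
```

Growing the pace over epochs realizes the "simpler to more complex" curriculum the abstract describes.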
Funding: This work was supported by the National Natural Science Foundation of China (61871046).
Abstract: Attacks on websites and network servers are among the most critical threats in network security. Network behavior identification is one of the most effective ways to identify malicious network intrusions. Analyzing abnormal network traffic patterns and classifying traffic based on labeled network traffic data are among the most effective approaches for network behavior identification. Traditional methods for network traffic classification use algorithms such as Naive Bayes, Decision Tree, and XGBoost. However, network traffic classification, which is required for network behavior identification, generally suffers from low accuracy even with recently proposed deep learning models. To improve network traffic classification accuracy, and thus the network intrusion detection rate, this paper proposes a new network traffic classification model, called ArcMargin, which incorporates metric learning into a convolutional neural network (CNN) to make the CNN model more discriminative. ArcMargin maps network traffic samples from the same category close together, while samples from different categories are mapped as far apart as possible. The metric learning regularizer, called additive angular margin loss, is embedded in the objective function of traditional CNN models. The proposed ArcMargin model is validated on three datasets and compared with several related algorithms. According to a set of classification indicators, the ArcMargin model proves to have better performance in both network traffic classification tasks and open-set tasks. Moreover, in open-set tasks, the ArcMargin model can cluster unknown data classes that do not exist in the previous training dataset.
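The additive angular margin loss named above follows the ArcFace formulation: the angle between a sample and its own class direction is enlarged by a fixed margin before the scaled logits are formed. A hedged NumPy sketch (function and variable names are illustrative; the paper embeds this inside a CNN objective):

```python
import numpy as np

def arc_margin_logits(embeddings, class_weights, labels, s=30.0, m=0.5):
    """Additive angular margin: add margin m to the angle between each
    normalized embedding and its true-class direction, then scale by s.
    Penalizing the true-class angle forces tighter intra-class clusters."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cosine = np.clip(e @ w, -1.0, 1.0)          # cosine to every class center
    theta = np.arccos(cosine)
    theta[np.arange(len(labels)), labels] += m  # margin on the true class only
    return s * np.cos(theta)

x = np.array([[1.0, 0.0]])                # one sample, 2-D embedding
W = np.eye(2)                             # columns = two class directions
logits = arc_margin_logits(x, W, labels=np.array([0]))
```

Even for a perfectly aligned sample, the margin lowers the true-class logit below the unpenalized value s, which is what makes the decision boundary more discriminative.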
Funding: Supported by the National Natural Science Foundation of China (No. 61501229) and the Fundamental Research Funds for the Central Universities (Nos. 2019054, 2020045).
Abstract: The rapid growth of air traffic has continuously increased the workload of controllers, which has become an important factor restricting sector capacity. If similar traffic scenes can be identified, historical decision-making experience may be used to help controllers decide control strategies quickly. Considering that there are many traffic scenes and it is hard to label them all, in this paper we propose an active SVM metric learning (ASVM2L) algorithm to measure and identify similar traffic scenes. First, we obtain traffic scene samples correctly labeled by experienced air traffic controllers. We design an active sampling strategy based on voting difference to choose the most valuable unlabeled samples and label them. Then the metric matrix of all the labeled samples is learned and used to complete the classification of traffic scenes. We verify the effectiveness of ASVM2L on standard datasets and then use it to measure and classify traffic scenes on the historical air traffic dataset of the Central South Sector of China. The experimental results show that, compared with other existing methods, the proposed method uses the information in traffic scene samples more thoroughly and achieves better classification performance with limited labeled samples.
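The "voting difference" sampling strategy is in the spirit of query-by-committee active learning: label the unlabeled sample the committee disagrees on most. A minimal sketch (the entropy-based scoring rule and names here are assumptions, not the paper's exact criterion):

```python
import numpy as np

def most_disputed_index(votes):
    """Return the index of the unlabeled sample whose committee votes
    disagree most, scored by vote entropy: unanimous rows score 0,
    evenly split rows score highest."""
    entropies = []
    for row in votes:
        _, counts = np.unique(row, return_counts=True)
        p = counts / counts.sum()
        entropies.append(float(-(p * np.log(p)).sum()))
    return int(np.argmax(entropies))

# Rows are unlabeled scenes, columns are votes from committee members.
votes = np.array([[0, 0, 0],    # unanimous -> least valuable to label
                  [0, 1, 1],    # mild disagreement
                  [0, 1, 2]])   # full disagreement -> query this one
```

Labeling the most disputed samples first is what lets the method reach good accuracy with few controller-labeled scenes.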
Funding: Institute of Information & Communications Technology Planning & Evaluation, Grant/Award Number: 2022-0-00074.
Abstract: Few-shot image classification is the task of classifying novel classes using extremely limited labelled samples. To perform classification using the limited samples, one solution is to learn the feature alignment (FA) information between the labelled and unlabelled sample features. Most FA methods use the feature mean as the class prototype and calculate the correlation between the prototype and unlabelled features to learn an alignment strategy. However, mean prototypes tend to degrade informative features, because spatial features at the same position may not be equally important for the final classification, leading to inaccurate correlation calculations. Therefore, the authors propose an effective intra-class FA strategy that aggregates semantically similar spatial features from an adaptive reference prototype in a low-dimensional feature space to obtain an informative prototype feature map for precise correlation computation. Moreover, the authors develop a dual correlation module to learn hard and soft correlations. This module combines the correlation information between the prototype and unlabelled features in both the original and learnable feature spaces, aiming to produce a comprehensive cross-correlation between prototypes and unlabelled features. Using both the FA and cross-attention modules, the model can maintain informative class features and capture important shared features for classification. Experimental results on three few-shot classification benchmarks show that the proposed method outperforms related methods and yields a 3% performance boost in the 1-shot setting when the proposed module is inserted into the related methods.
Funding: Financial support from the National Natural Science Foundation of China (No. 11972129) and the National Major Science and Technology Projects of China (No. 2017-IV-0008-0045).
Abstract: In recent years, the crack fault has been one of the most common faults in rotor systems, and crack position diagnosis in the hollow shaft rotor system remains a challenge. In this paper, a method based on a Convolutional Neural Network and deep metric learning (CNN-C) is proposed to effectively identify the crack position in a hollow shaft rotor system. A center-loss function is used to enhance the performance of the neural network. The main contributions include the following. First, the dynamic response of the dual-disk hollow shaft rotor system is obtained. The analysis results show that the crack causes super-harmonic resonance, and its peak value is closely related to the position and depth of the crack; in addition, the amplitude near the non-resonant region is also related to the crack parameters. Second, we propose an effective crack position diagnosis method that achieves the highest recognition accuracy, 99.04%, compared with other algorithms. Then, the influence of the penalty factor on CNN-C performance is analyzed, which shows that too high a penalty factor leads to a decline in neural network performance. Finally, the feature vectors are visualized via t-distributed Stochastic Neighbor Embedding (t-SNE). A Naive Bayes classifier (NB) and the K-Nearest Neighbor algorithm (KNN) are used to verify the validity of the feature vectors extracted by CNN-C. The results show that NB and KNN have more regular decision boundaries and higher recognition accuracy on the feature-vector dataset extracted by CNN-C, indicating that the feature vectors extracted by CNN-C have great intra-class compactness and inter-class separability.
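The center loss used here is the standard formulation from deep metric learning: half the mean squared distance of each feature to its class center, added to the softmax objective to enforce intra-class compactness. A self-contained NumPy sketch (names are illustrative):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: 0.5 * mean squared distance of each feature vector to
    its class center. Minimizing it pulls same-class features together,
    complementing the inter-class separation given by softmax."""
    diffs = features - centers[labels]
    return 0.5 * float(np.mean(np.sum(diffs ** 2, axis=1)))

feats = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
```

The penalty factor the abstract analyzes corresponds to the weight this term receives relative to the classification loss.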
Funding: Supported in part by the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), the National Natural Science Foundation of China (61621003, 62101136), the Natural Science Foundation of Shanghai (21ZR1403600), the Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), ZJLab, and the Shanghai Municipal Science and Technology Project (20JC1419500).
Abstract: Deep metric learning (DML) has achieved great results on visual understanding tasks by seamlessly integrating conventional metric learning with deep neural networks. Existing deep metric learning methods focus on designing pair-based distance losses to decrease intra-class distance while increasing inter-class distance. However, these methods fail to preserve the geometric structure of data in the embedding space, which leads to spatial structure shift across mini-batches and may slow down the convergence of embedding learning. To alleviate these issues, by assuming that the input data is embedded in a lower-dimensional sub-manifold, we propose a novel deep Riemannian metric learning (DRML) framework that exploits non-Euclidean geometric structural information. Considering that the curvature information of data measures how much the Riemannian (non-Euclidean) metric deviates from the Euclidean metric, we leverage geometry flow, called a geometric evolution equation, to characterize the relation between the Riemannian metric and its curvature. Our DRML not only regularizes the local neighborhood connections of the embeddings at the hidden layer but also adapts the embeddings to preserve the geometric structure of the data. On several benchmark datasets, the proposed DRML outperforms all existing methods, and these results demonstrate its effectiveness.
Funding: Supported by the National Key Research and Development Program of China (2016YFB0303401), the International (Regional) Cooperation and Exchange Project (61720106008), the National Science Fund for Distinguished Young Scholars (61725301), and the Shanghai AI Lab.
Abstract: Inspired by the tremendous achievements of meta-learning in various fields, this paper proposes a local quadratic embedding learning (LQEL) algorithm for regression problems based on metric learning and neural networks (NNs). First, Mahalanobis metric learning is improved by optimizing the global consistency of the metrics between instances in the input and output space. Then, we further prove that the improved metric learning problem is equivalent to a convex programming problem by relaxing the constraints. Based on the hypothesis of local quadratic interpolation, the algorithm introduces two lightweight NNs: one learns the coefficient matrix in the local quadratic model, and the other assigns weights to the prediction results obtained from different local neighbors. Finally, the two sub-models are embedded in a unified regression framework, and the parameters are learned by means of a stochastic gradient descent (SGD) algorithm. The proposed algorithm can make full use of the information implied in target labels to find more reliable reference instances. Moreover, it prevents the model degradation caused by sensor drift and unmeasurable variables by modeling variable differences with the LQEL algorithm. Simulation results on multiple benchmark datasets and two practical industrial applications show that the proposed method outperforms several popular regression methods.
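The Mahalanobis metric that LQEL starts from is the standard learnable quadratic form; a minimal sketch (the paper's consistency-optimized refinement is not reproduced here):

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y). With M positive
    semidefinite this defines a valid metric; learning M reweights and
    correlates feature dimensions instead of treating them equally."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)

# With M = I the metric reduces to plain squared Euclidean distance;
# a diagonal M rescales each feature's contribution.
dist_euclid = mahalanobis_sq([1.0, 2.0], [0.0, 0.0], np.eye(2))
dist_weighted = mahalanobis_sq([1.0, 2.0], [0.0, 0.0], np.diag([2.0, 0.5]))
```

Metric learning methods like LQEL fit M so that distances agree with similarity in the output space, which is how more reliable reference neighbors are selected.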
Funding: The National Natural Science Foundation of China (No. 61932003) and the Fundamental Research Funds for the Central Universities.
Abstract: In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer interaction. Previous studies have made significant achievements in predicting 2D or 3D gaze from monocular face images. This study presents a deep neural network for 2D gaze estimation on mobile devices. It achieves state-of-the-art 2D gaze point regression error while significantly improving gaze classification error on quadrant divisions of the display. To this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression performance. Subsequently, through a unified perspective on gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional supervision. Consequently, both gaze point regression and quadrant classification performances are improved. The experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.
Funding: Supported by the National Natural Science Foundation of China (61672032, 61401001), the Natural Science Foundation of Anhui Province (1408085MF121), and the Opening Foundation of the Anhui Key Laboratory of Polarization Imaging Detection Technology (2016-KFKT-003).
Abstract: A group activity recognition algorithm is proposed to improve recognition accuracy in video surveillance by using complex-wavelet-domain Cayley-Klein metric learning. The non-sampled dual-tree complex wavelet packet transform (NS-DTCWPT) is used to decompose the human images in videos into multiple scales and resolutions. An improved local binary pattern (ILBP) and an inner-distance shape context (IDSC) combined with a bag-of-words model are adopted to extract features from the decomposed high- and low-frequency coefficients. The extracted coefficient features of the training samples are used to optimize the Cayley-Klein metric matrix by solving a nonlinear optimization problem. The group activities in videos are recognized by combining this feature extraction with Cayley-Klein metric learning. Experimental results on the BEHAVE video set, a group activity video set, and a self-built video set show that the proposed algorithm has higher recognition accuracy than existing algorithms.
Abstract: Nowadays, Remote Sensing (RS) techniques are used for earth observation and for detection of soil types with high accuracy and better reliability. This technique provides a perspective view of spatial resolution and aids in instantaneous measurement of soil minerals and their characteristics. A few challenges are present in soil classification using image enhancement, such as locating and plotting soil boundaries, slopes, hazardous areas, drainage conditions, land use, vegetation, etc. Some traditional approaches involve drawbacks such as manual involvement, which results in inaccuracy due to human interference, time consumption, inconsistent prediction, etc. To overcome these drawbacks and to improve the predictive analysis of soil characteristics, we propose a Hybrid Deep Learning improved BAT optimization algorithm (HDIB) for soil classification using remote sensing hyperspectral features. In HDIB, we propose a spontaneous BAT optimization algorithm for extracting both spectral and spatial features by choosing pure pixels from the Hyper Spectral (HS) image. Spectral-spatial vectors as training illustrations are attained by merging spatial and spectral vectors by means of a priority stacking methodology. Then, a recurrent Deep Learning (DL) Neural Network (NN) is used for classifying the HS images, considering the Pavia University, Salinas, and Tamil Nadu Hill Scene datasets, which in turn improves the reliability of classification. Finally, the performance of the proposed HDIB-based soil classifier is compared and analyzed with existing methodologies like the Single Layer Perceptron (SLP), Convolutional Neural Networks (CNN), and Deep Metric Learning (DML), and it shows improved classification accuracies of 99.87%, 98.34%, and 99.9% for the Tamil Nadu Hills, Pavia University, and Salinas scene datasets, respectively.
Abstract: Existing clothes retrieval methods mostly adopt binary supervision in metric learning. In each iteration, only the clothes belonging to the same instance are positive samples, and all other clothes are "indistinguishable" negative samples, which causes the following problem: the relevance between the query and candidates is treated only as relevant or irrelevant, which makes it difficult for the model to learn the continuous semantic similarities between clothes. Clothes that do not belong to the same instance are considered completely irrelevant and are uniformly pushed away from the query by an equal margin in the embedding space, which is not consistent with the ideal retrieval results. Motivated by this, we propose a novel method called semantic-based clothes retrieval (SCR). In SCR, we measure the semantic similarities between clothes and design a new adaptive loss based on these similarities. The margin in the proposed adaptive loss varies with the semantic similarity between the anchor and negative samples. In this way, a more coherent embedding space can be learned, where candidates with higher semantic similarities are mapped closer to the query than those with lower ones. We use Recall@K and normalized Discounted Cumulative Gain (nDCG) as evaluation metrics on the DeepFashion dataset and achieve better performance.
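The semantics-aware margin can be sketched as a triplet-style loss whose margin shrinks as the negative becomes more similar to the anchor. This is a hedged illustration of the idea, not SCR's exact loss; the linear margin schedule and names are assumptions:

```python
def adaptive_margin_loss(d_pos, d_neg, sem_sim, base_margin=1.0):
    """Triplet-style hinge with a semantics-aware margin: a negative with
    high semantic similarity to the anchor (sem_sim near 1) is pushed away
    by a small margin, an unrelated one (sem_sim near 0) by the full one."""
    margin = base_margin * (1.0 - sem_sim)
    return max(0.0, d_pos - d_neg + margin)

# Same distances, different semantics: the similar negative incurs no
# loss here, while the unrelated one still violates the full margin.
loss_similar = adaptive_margin_loss(0.5, 1.0, sem_sim=0.9)
loss_unrelated = adaptive_margin_loss(0.5, 1.0, sem_sim=0.0)
```

Varying the margin this way is what lets semantically close candidates settle nearer the query than unrelated ones, instead of all negatives being pushed out by an equal margin.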
Funding: Supported in part by the National Natural Science Foundation of China under Grant 61972205 and in part by the National Key R&D Program of China under Grant 2018YFB1003205.
Abstract: With the development of new media technology, vehicle matching plays an increasingly significant role in video surveillance systems. Recent methods have explored vehicle matching based on feature extraction, and similarity metric learning has also achieved enormous progress in vehicle matching. But most of these methods are less effective in realistic scenarios where vehicles are captured at different times. To address this cross-domain problem, we propose a cross-domain similarity metric learning method that utilizes a GAN to generate vehicle images in another domain and a two-channel Siamese network to learn a similarity metric from both domains (i.e., day pattern or night pattern) for vehicle matching. To exploit properties and relationships among vehicle datasets, we first apply the domain transformer to translate the domain of vehicle images, and then utilize the two-channel Siamese network to extract features from both domains for better feature-similarity learning. Experimental results illustrate that our models achieve improvements over the state-of-the-art.
Funding: Support of the National Natural Science Foundation of China (U1936213); the Yunnan Provincial Natural Science Foundation, "Robustness analysis method and coupling mechanism of complex coupled network system" (202101AT070167); and the Yunnan Provincial Major Science and Technology Program, "Construction and application demonstration of intelligent diagnosis and treatment system for childhood diseases based on intelligent medical platform" (202102AA100021).
Abstract: With the advancement of network communication technology, network traffic shows explosive growth, and consequently network attacks occur frequently. Network intrusion detection systems are still the primary means of detecting attacks. However, two challenges continue to stymie the development of a viable network intrusion detection system: imbalanced training data and new, undiscovered attacks. Therefore, this study proposes a unique deep learning-based intrusion detection method. We use two independent in-memory autoencoders, trained on regular network traffic and on attacks respectively, to capture the dynamic relationships between traffic features in the presence of unbalanced training data. The original data is then fed into a triplet network for training, forming a triplet with the data reconstructed by the two encoders. Finally, the distance relationship within the triplets determines whether the traffic is an attack. In addition, to improve the accuracy of detecting unknown attacks, this research proposes an improved triplet loss function that pulls the distances of the same class closer while pushing the distances of different classes farther apart in the learned feature space. The proposed approach's effectiveness, stability, and significance are evaluated against advanced models on the Android Adware and General Malware Dataset (AAGM17), Knowledge Discovery and Data Mining Cup 1999 (KDDCUP99), the Canadian Institute for Cybersecurity's Intrusion Detection Evaluation Dataset (CICIDS2017), UNSW-NB15, and the Network Security Lab-Knowledge Discovery and Data Mining (NSL-KDD) datasets. The achieved results confirm the superiority of the proposed method for the task of network intrusion detection.
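For reference, the baseline triplet loss that the improved variant builds on has this shape (a standard sketch; the paper's improved formulation is not reproduced here, and the vector inputs are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Plain triplet hinge loss on embedding vectors: require the anchor
    to sit at least `margin` closer to the positive than to the negative;
    well-separated triplets contribute zero loss."""
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return max(0.0, d_pos - d_neg + margin)

satisfied = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 0.0])  # margin met
violated = triplet_loss([0.0, 0.0], [1.0, 0.0], [1.5, 0.0])   # margin not met
```

In the detection setting described above, the anchor is the original traffic sample and the positive/negative are its reconstructions from the normal-traffic and attack autoencoders (or vice versa), so the triplet distances directly separate benign from malicious flows.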
Funding: This work was supported in part by the Natural Science Foundation of China under Grants 61972169 and U1536203, in part by the National Key Research and Development Program of China (2016QY01W0200), and in part by the Major Scientific and Technological Project of Hubei Province (2018AAA068 and 2019AAA051).
Abstract: Gait recognition is a biometric technique that captures human walking patterns using gait silhouettes as input and can be used for long-term recognition. Recently proposed video-based methods achieve high performance. However, gait covariates, or walking conditions (i.e., bag carrying and clothing), make the recognition of intra-class gait samples hard. Advanced methods simply use triplet loss for metric learning, which does not take the gait covariates into account. To alleviate the adverse influence of gait covariates, we propose a cross-walking-condition constraint that considers the gait covariates explicitly. Specifically, this approach designs center-based and pair-wise loss functions to decrease the discrepancy of intra-class gait samples under different walking conditions and enlarge the distance of inter-class gait samples under the same walking condition. Besides, we also propose a strong video-based baseline model of high performance by applying simple yet effective tricks that have been validated in other individual-recognition fields. With the proposed baseline model and loss functions, our method achieves state-of-the-art performance.
Funding: Supported by the National Natural Science Foundation of China (No. 61976023).
Abstract: In this paper, we propose a Structure-Aware Fusion Network (SAFNet) for 3D scene understanding. As 2D images present more detailed information while 3D point clouds convey more geometric information, fusing these two complementary kinds of data can improve the discriminative ability of the model. Fusion is a very challenging task since 2D and 3D data are essentially different and have different formats. Existing methods first extract 2D multi-view image features and then aggregate them into sparse 3D point clouds, achieving superior performance. However, they ignore the structural relations between pixels and point clouds and directly fuse the two modalities of data without adaptation. To address this, we propose a structural deep metric learning method on pixels and points to explore these relations, and we further utilize them to adaptively map the images and point clouds into a common canonical space for prediction. Extensive experiments on the widely used ScanNetV2 and S3DIS datasets verify the performance of the proposed SAFNet.
Abstract: Deep metric learning is one of the recommended methods for the challenge of supporting few/zero-shot learning with deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) that learns a distance function mapping input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture deals with the existence of only a few training samples by deciding whether two samples share the same class identity or not. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights and trained with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error and time-consuming phase. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: 1) utilizing transfer learning, with the pre-trained AlexNet as a feature extractor in the Siamese architecture, since fine-tuning a pre-trained network is typically faster and easier than building one from scratch; and 2) training the Siamese architecture with contrastive loss instead of binary cross-entropy, since contrastive loss helps the network learn a nonlinear mapping function that maps the extracted features into the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K datasets in two experiments, one testing the network in few-shot learning and the other in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, a comparison between the proposed Siamese network and traditional Siamese CNNs shows that the proposed network achieves higher recognition results in less time, reducing the training time from days to hours in both experiments.
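The contrastive loss mentioned above is the classic pairwise formulation: genuine pairs minimize their distance, impostor pairs are pushed apart until they clear a margin. A minimal sketch (names and the margin value are illustrative):

```python
import numpy as np

def contrastive_loss(f1, f2, same_class, margin=1.0):
    """Contrastive loss on a pair of Siamese feature vectors: same-class
    pairs are penalized by squared distance; different-class pairs are
    penalized only while their distance is below `margin`."""
    d = float(np.linalg.norm(np.asarray(f1) - np.asarray(f2)))
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

same_pair = contrastive_loss([1.0, 0.0], [1.0, 0.0], same_class=True)
far_impostors = contrastive_loss([0.0, 0.0], [2.0, 0.0], same_class=False)
near_impostors = contrastive_loss([0.0, 0.0], [0.5, 0.0], same_class=False)
```

Unlike binary cross-entropy over a similarity score, this objective shapes the geometry of the embedding space directly, which is the margin benefit the abstract refers to.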
Funding: Supported by the National Natural Science Foundation of China (Nos. U1803262, 62176191, 62171325), the Natural Science Foundation of Hubei Province (2022CFB018), and the Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology) (ZNXX2022001).
Abstract: In existing remote sensing image retrieval (RSIR) datasets, the number of images varies dramatically among classes, which leads to a severe class imbalance problem. Some studies propose to train the model with a ranking-based metric (e.g., average precision [AP]), because AP is robust to class imbalance. However, current AP-based methods overlook an important issue: they only optimise the samples ranking before each positive sample, which is limited by the definition of AP and is prone to local optima. To achieve global optimisation of AP, a novel method, namely Optimising Samples after positive ones & AP loss (OSAP-Loss), is proposed in this study. Specifically, a novel superior ranking function is designed to make the AP loss differentiable while providing a tighter upper bound. Then, a novel loss called Optimising Samples after Positive ones (OSP) loss is proposed to involve all positive and negative samples ranking after each positive one and to provide a more flexible optimisation strategy for each sample. Finally, a graphics processing unit memory-free mechanism is developed to thoroughly address the non-decomposability of AP optimisation. Extensive experimental results on RSIR as well as conventional image retrieval datasets show the superiority and competitive performance of OSAP-Loss compared to the state-of-the-art.
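For context, plain (non-differentiable) AP of a ranked retrieval list is computed as the mean of precision@k over the positions holding relevant items; OSAP-Loss replaces the hard ranking inside this with a differentiable surrogate, which is not reproduced here:

```python
import numpy as np

def average_precision(scores, relevant):
    """AP of one ranked list: sort by descending score, then average
    precision@k over positions k where a relevant item appears. Being
    rank-based, AP is robust to how many irrelevant items pad the list."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(relevant, dtype=float)[order]
    hits = np.cumsum(y)                         # relevant items seen so far
    prec_at_k = hits / (np.arange(len(y)) + 1)  # precision at each rank
    return float((prec_at_k * y).sum() / y.sum())

# Relevant items ranked 1st and 3rd: AP = (1/1 + 2/3) / 2 = 5/6.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1])
```

Note that each term only involves samples ranked at or before a positive; optimising what happens after each positive is exactly the gap the OSP loss above targets.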
Abstract: Clustering analysis is one of the main concerns in data mining. A common approach to the clustering process is to bring together points that are close to each other and separate points that are far away from each other. Therefore, measuring the distance between sample points is crucial to the effectiveness of clustering. Filtering features by label information and measuring the distance between samples by these features is a common supervised learning method for reconstructing the distance metric. However, in many application scenarios, it is very expensive to obtain a large number of labeled samples. In this paper, to solve the clustering problem in scenarios with few supervised samples and high data dimensionality, a novel semi-supervised clustering algorithm is proposed by designing an improved prototype network that attempts to reconstruct the distance metric in the sample space with a small amount of pairwise supervised information, such as Must-Link and Cannot-Link constraints, and then cluster the data in the new metric space. The core idea is to bring similar samples closer and push dissimilar samples further away through an embedding mapping. Extensive experiments on both real-world and synthetic datasets show the effectiveness of this algorithm: average clustering metrics on various datasets improved by 8% compared to the comparison algorithm.
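One common way to turn Must-Link/Cannot-Link pairs into an embedding objective is a pairwise hinge of the following shape; this is an illustrative form (the paper's prototype-network loss is not specified here), with names and the margin value assumed:

```python
def pairwise_constraint_loss(dist, must_link, margin=2.0):
    """One term of a pairwise-supervised embedding objective: Must-Link
    pairs are penalized by their squared embedding distance, Cannot-Link
    pairs only while they remain closer than `margin`."""
    if must_link:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

# A Must-Link pair still 0.5 apart is pulled together; a Cannot-Link
# pair already 3.0 apart incurs no penalty.
ml_term = pairwise_constraint_loss(0.5, must_link=True)
cl_term = pairwise_constraint_loss(3.0, must_link=False)
```

Summing such terms over the few supervised pairs and minimizing over the embedding parameters yields the "similar closer, dissimilar further" metric space in which the data is then clustered.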
Funding: This work was supported in part by the CETC Key Laboratory of Aerospace Information Applications under Grant No. SXX19629X060.
Abstract: Target recognition based on deep learning relies on a large quantity of samples, but in some specific remote sensing scenes the samples are very rare. Currently, few-shot learning can obtain high-performance target classification models using only a few samples, but most research is based on natural scenes. Therefore, this paper proposes a metric-based few-shot classification technology for remote sensing. First, we constructed a dataset (RSD-FSC) for few-shot classification in remote sensing, which contains 21 classes of typical target sample slices from remote sensing images. Second, based on metric learning, a k-nearest neighbor classification network is proposed to find multiple training samples similar to the testing target; the similarity between the testing target and these similar samples is then calculated to classify the testing target. Finally, 5-way 1-shot, 5-way 5-shot, and 5-way 10-shot experiments are conducted to assess the generalization of the model on few-shot classification tasks. The experimental results show that, for few-shot samples of newly emerged classes, when the number of training samples is 1, 5, and 10, the average accuracy of target recognition reaches 59.134%, 82.553%, and 87.796%, respectively. This demonstrates that our proposed method can resolve few-shot classification in remote sensing images and performs better than other few-shot classification methods.
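The classification step described above can be sketched as a k-nearest-neighbor vote over support samples. This is a hedged illustration using plain Euclidean distance; the paper's network learns the similarity rather than fixing it, and all names here are assumptions:

```python
import numpy as np

def knn_classify(query, support, support_labels, k=3):
    """Metric-based few-shot classification: rank the support (training)
    samples by distance to the query, take the k nearest, and return the
    majority-vote class among them."""
    dists = np.linalg.norm(support - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = np.bincount(np.asarray(support_labels)[nearest])
    return int(np.argmax(votes))

# Four support slices from two classes; the query lies near class 0.
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
labels = np.array([0, 0, 1, 1])
pred = knn_classify(np.array([0.1, 0.0]), support, labels, k=3)
```

In an N-way K-shot episode, `support` holds the N×K labeled slices and the same vote is applied to each testing target.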
Abstract: Person re-identification has emerged as a hotspot of computer vision research due to the growing demands of social public safety and the quick development of intelligent surveillance networks. Person re-identification (Re-ID) in video surveillance systems can track and identify suspicious people and statistically analyze persons. The purpose of person re-identification is to recognize the same person across different cameras. Deep learning-based person re-identification research has produced numerous remarkable outcomes as a result of deep learning's growing popularity. The purpose of this paper is to help researchers better understand where person re-identification research stands at the moment and where it is headed. First, this paper organizes the widely used datasets and assessment criteria in person re-identification and reviews the pertinent research on deep learning-based person re-identification techniques conducted in the last several years. Then, the commonly used techniques are discussed from four aspects: appearance features, metric learning, local features, and adversarial learning. Finally, future research directions in the field of person re-identification are outlined.