As image manipulation technology advances rapidly, the malicious use of image tampering has alarmingly escalated, posing a significant threat to social stability. In the realm of image tampering localization, accurately localizing tampered regions under limited samples, multiple tampering types, and various region sizes poses a multitude of challenges. These issues impede the model's universality and generalization capability and detrimentally affect its performance. To tackle these issues, we propose FL-MobileViT, an improved MobileViT model devised for image tampering localization. Our proposed model utilizes a dual-stream architecture that independently processes the RGB and noise domains and captures richer traces of tampering through dual-stream integration. Meanwhile, the model incorporates the Focused Linear Attention mechanism within the lightweight network (MobileViT). This substitution significantly diminishes computational complexity and resolves the homogeneity problems associated with traditional Transformer attention mechanisms, enhancing feature extraction diversity and improving the model's localization performance. To comprehensively fuse the results generated by both feature extractors, we introduce the ASPP architecture for multi-scale feature fusion, which facilitates more precise localization of tampered regions of various sizes. Furthermore, to bolster the model's generalization ability, we adopt a contrastive learning method and devise a joint optimization training strategy that leverages fused features and captures the disparities in feature distribution in tampered images. This strategy computes a contrastive loss at various stages of the feature extractor and employs it as an additional constraint in conjunction with the cross-entropy loss. As a result, overfitting is effectively alleviated, and the differentiation between tampered and untampered regions is enhanced. Experimental evaluations on five benchmark datasets (IMD-20, CASIA, NIST-16, Columbia, and Coverage) validate the effectiveness of our proposed model. The meticulously calibrated FL-MobileViT model consistently outperforms numerous existing general models in localization accuracy across diverse datasets, demonstrating superior adaptability.
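The joint optimization described above, where a contrastive loss acts as an additional constraint alongside cross-entropy, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the weight `lam`, the pairwise contrastive form, and the per-stage summation are assumptions.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Pixel-wise binary cross-entropy; probs[i] = P(pixel i is tampered)."""
    return -np.mean(labels * np.log(probs + eps)
                    + (1 - labels) * np.log(1 - probs + eps))

def contrastive_constraint(feats, labels, margin=1.0):
    """Pull same-class features together, push different-class features
    at least `margin` apart (a standard pairwise contrastive loss)."""
    total, count = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            d = np.linalg.norm(feats[i] - feats[j])
            if labels[i] == labels[j]:
                total += d ** 2                      # attract similar pairs
            else:
                total += max(0.0, margin - d) ** 2   # repel dissimilar pairs
            count += 1
    return total / count

def joint_loss(probs, feats_per_stage, labels, lam=0.1):
    """Cross-entropy plus a contrastive term from each extractor stage."""
    loss = cross_entropy(probs, labels)
    for feats in feats_per_stage:
        loss += lam * contrastive_constraint(feats, labels)
    return loss

rng = np.random.default_rng(0)
labels = np.array([1, 1, 0, 0])            # tampered vs. untampered samples
probs = np.array([0.9, 0.8, 0.2, 0.1])     # predicted tampering probabilities
stages = [rng.normal(size=(4, 8)) for _ in range(2)]  # features at two stages
loss = joint_loss(probs, stages, labels)
```

The contrastive terms from each stage act as regularizers on the feature space, which is how the strategy counteracts overfitting while sharpening the tampered/untampered separation.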
Deep metric learning is one of the recommended methods for the challenge of supporting few/zero-shot learning with deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) that learns a distance function mapping input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture deals with the scarcity of training samples by deciding whether two samples share the same class identity or not. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights and training them with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error, time-consuming phase. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: 1) utilizing transfer learning and using the pre-trained AlexNet as a feature extractor in the Siamese architecture; fine-tuning a pre-trained network is typically faster and easier than building one from scratch; 2) training the Siamese architecture with contrastive loss instead of binary cross-entropy; contrastive loss helps the network learn a nonlinear mapping function that maps the extracted features into the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K dataset in two experiments: one tests the proposed network in few-shot learning, and the other tests it in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, a comparison between the proposed Siamese network and traditional Siamese CNNs shows that the proposed network achieves higher recognition results in less time, reducing the training time from days to hours in both experiments.
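The contrastive loss used here in place of binary cross-entropy has a standard pairwise form: similar pairs are pulled together in embedding space, dissimilar pairs are pushed beyond a margin. A minimal sketch, with the margin value and toy embeddings as assumptions rather than the paper's settings:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=2.0):
    """Pairwise contrastive loss for a Siamese network: pairs with
    same_class=1 are attracted; pairs with same_class=0 are repelled
    until their distance exceeds `margin`."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)            # Euclidean distance
    pos = same_class * d ** 2                            # genuine pairs
    neg = (1 - same_class) * np.maximum(margin - d, 0.0) ** 2
    return np.mean(pos + neg)

# Embeddings produced by the two shared-weight branches.
a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[0.1, 0.0], [3.0, 3.0]])
y = np.array([1, 0])  # first pair: same character; second: different
loss = contrastive_loss(a, b, y)
```

Unlike binary cross-entropy on a similarity score, this objective directly shapes the geometry of the feature space, which is what gives the better margins the abstract refers to.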
This paper presents a trainable Generative Adversarial Network (GAN)-based end-to-end system for image dehazing, named DehazeGAN. DehazeGAN can be used for edge computing-based applications, such as roadside monitoring. It adopts two networks: a generator (G) and a discriminator (D). The G adopts the U-Net architecture, whose layers are specifically designed to incorporate the atmospheric scattering model of image dehazing. By using a reformulated atmospheric scattering model, the weights of the generator network are initialized with the coarse transmission map, and the biases are adaptively adjusted using the previous round's trained weights. Since details may be blurry after the fog is removed, a contrast loss is added to actively enhance visibility. Aside from the typical GAN adversarial loss, the pixel-wise Mean Square Error (MSE) loss, the contrast loss, and the dark channel loss are introduced into the generator loss function. Extensive experiments on benchmark images, compared against several state-of-the-art methods, demonstrate that the proposed DehazeGAN performs better and is more effective.
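The atmospheric scattering model that the generator's layers incorporate relates a hazy image I to the clear scene J via the transmission map t and atmospheric light A: I = J·t + A(1 − t). A minimal NumPy sketch of inverting this model once t and A are estimated; the clamping threshold `t_min` is an illustrative assumption, and the network-based estimation of t and A is omitted:

```python
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """Invert the scattering model I = J*t + A*(1 - t) to recover the
    clear scene J, clamping t to avoid amplifying noise where t -> 0."""
    t = np.maximum(t, t_min)
    return (I - A * (1.0 - t)) / t

# Synthetic check: haze a known scene with the forward model, then invert.
J_true = np.full((2, 2), 0.5)           # clear scene radiance
t = np.full((2, 2), 0.6)                # transmission map
A = 0.9                                 # atmospheric light
I_hazy = J_true * t + A * (1.0 - t)     # forward scattering model
J_rec = dehaze(I_hazy, t, A)
```

In DehazeGAN the transmission map is what the U-Net layers effectively learn, so baking this inversion into the architecture constrains the generator to physically plausible outputs.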
The long-tailed data distribution poses an enormous challenge for training neural networks in classification. A classification network can be decoupled into a feature extractor and a classifier. This paper takes a semi-discrete optimal transport (OT) perspective to analyze the long-tailed classification problem, where the feature space is viewed as a continuous source domain and the classifier weights are viewed as a discrete target domain. The classifier in effect finds a cell decomposition of the feature space, with each cell corresponding to one class. An imbalanced training set causes the more frequent classes to have larger-volume cells, which means that the classifier's decision boundary is biased towards the less frequent classes, resulting in reduced classification performance in the inference phase. Therefore, we propose a novel OT dynamic softmax loss, which dynamically adjusts the decision boundary in the training phase to avoid overfitting on the tail classes. In addition, our method incorporates a supervised contrastive loss so that the feature space can satisfy the uniform distribution condition. Extensive and comprehensive experiments demonstrate that our method achieves state-of-the-art performance on multiple long-tailed recognition benchmarks, including CIFAR-LT, ImageNet-LT, iNaturalist 2018, and Places-LT.
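The core idea of shifting the decision boundary during training can be illustrated with a much simpler frequency-based logit adjustment; this sketch is not the paper's OT dynamic softmax (which derives the adjustment from a semi-discrete OT solve) but shows the same mechanism: offsetting each class's logit by its prior so that ambiguous samples near the boundary cost more when they belong to tail classes. The temperature `tau` and class counts are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adjusted_softmax_loss(logits, label, class_counts, tau=1.0):
    """Shift each class logit by tau*log(prior) during training, moving
    the decision boundary away from the tail classes' cells."""
    prior = class_counts / class_counts.sum()
    p = softmax(logits + tau * np.log(prior))
    return -np.log(p[label])

counts = np.array([900.0, 90.0, 10.0])   # head, medium, tail class sizes
logits = np.array([1.0, 1.0, 1.0])       # a maximally ambiguous sample
loss_tail = adjusted_softmax_loss(logits, 2, counts)
loss_head = adjusted_softmax_loss(logits, 0, counts)
```

For the same ambiguous logits, the tail class incurs the larger training loss, so gradients enlarge the tail class's cell, countering the bias the OT analysis identifies.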
Federated learning (FL) has gained significant attention for enabling privacy preservation and knowledge sharing by transmitting model parameters from clients to a central server. However, with increasing network scale and limited bandwidth, uploading complete model parameters has become increasingly impractical. To address this challenge, we leverage the high informativeness of prototypes (feature centroids representing samples of the same class) and propose federated prototype momentum contrastive learning (FedPMC). At the communication level, FedPMC reduces communication overhead by using prototypes as carriers instead of full model parameters. At the local model update level, to mitigate overfitting, we construct an expanded batch sample space to incorporate richer visual information, design a supervised contrastive loss between global and real-time local prototypes, and adopt momentum contrast to gradually update the model. At the framework level, to fully exploit the samples' feature space, we employ three different pre-trained models for feature extraction and concatenate their outputs as input to the local model. FedPMC supports personalized local models and utilizes both global and local prototypes to address data heterogeneity among clients. We evaluate FedPMC alongside other state-of-the-art FL algorithms on the Digit-5 dataset within a unified lightweight framework to assess their comparative performance. The code is available at https://github.com/zhy665/fedPMC.
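The prototype computation at the heart of the communication saving is just a per-class centroid of local embeddings. A minimal sketch, with the feature dimension and class count as toy assumptions (the actual FedPMC pipeline, including the momentum update and the three concatenated extractors, is not reproduced here):

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Prototype = mean feature vector (centroid) of each class's samples."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))   # local embeddings, dimension 64
labels = np.arange(100) % 10            # 10 classes, 10 samples each
protos = class_prototypes(features, labels, 10)

# A client now uploads 10 x 64 floats per round instead of the full
# model parameters, which can run to millions of values.
uploaded = protos.size
```

Because a prototype summarizes an arbitrary number of samples with a single vector per class, the upload cost is fixed by the number of classes and the feature dimension, not by the model size.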
Oil spill monitoring in the remote sensing field has become a very popular technology for detecting the spatial distribution of polluted regions. However, previous studies mainly focus on supervised detection technologies, which require a large amount of high-quality training data. To solve this problem, we propose a self-supervised learning method that learns a deep neural network from unlabelled hyperspectral data for oil spill detection, consisting of three parts: data augmentation, unsupervised deep feature learning, and an oil spill detection network. First, the original image is augmented with spectral and spatial transformations to improve the robustness of the self-supervised model. Then, the deep neural networks are trained on the augmented data without label information to produce high-level semantic features. Finally, the pre-trained parameters are transferred to establish a neural network classifier that produces the detection result, where a contrastive loss is developed to fine-tune the learned parameters and improve the generalization ability of the proposed method. Experiments performed on ten oil spill datasets reveal that the proposed method obtains promising detection performance with respect to other state-of-the-art hyperspectral detection approaches.
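The first stage, augmenting a hyperspectral cube with spectral and spatial transformations to create views for self-supervised training, can be sketched as below. The specific transforms (per-band Gaussian noise and a random horizontal flip) and the noise scale are illustrative assumptions; the paper's exact augmentation set is not specified here.

```python
import numpy as np

def augment(cube, rng, noise_scale=0.01):
    """Spectral perturbation (per-band Gaussian noise) plus a spatial
    transform (random horizontal flip) of an (H, W, bands) cube."""
    noisy = cube + rng.normal(scale=noise_scale, size=cube.shape)  # spectral
    if rng.random() < 0.5:
        noisy = noisy[:, ::-1, :]                                  # spatial
    return noisy

rng = np.random.default_rng(42)
cube = rng.random((8, 8, 30))     # toy hyperspectral patch with 30 bands
view_a = augment(cube, rng)
view_b = augment(cube, rng)       # two views for contrastive pre-training
```

Feeding such pairs of views through the shared encoder and treating them as positives is what lets the contrastive objective learn without any oil spill labels.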
Funding: This study was funded by the Science and Technology Project in Xi'an (No. 22GXFW0123) and supported by the Special Fund Construction Project of Key Disciplines in Ordinary Colleges and Universities in Shaanxi Province. The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.
Funding: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (grant number NRF-2018R1D1A1B07043331).
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2021YFA1003003 and the National Natural Science Foundation of China under Grant Nos. 61936002 and T2225012.
Funding: Supported in part by the Innovation Team Project of Guangdong Province of China (2024KCXTD017) and the Guangdong-Hong Kong-Macao University Joint Laboratory (2023LSYS005).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61890962 and 61871179), the Scientific Research Project of Hunan Education Department (Grant No. 19B105), the Natural Science Foundation of Hunan Province (Grant Nos. 2019JJ50036 and 2020GK2038), the National Key Research and Development Project (Grant No. 2021YFA0715203), the Hunan Provincial Natural Science Foundation for Distinguished Young Scholars (Grant No. 2021JJ022), and the Huxiang Young Talents Science and Technology Innovation Program (Grant No. 2020RC3013).