Funding: Funded by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant from the Korea government (MSIT), grant number 2021-0-01341.
Abstract: Emotion recognition in uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture complementary and hierarchical information from both sources. A cross-modal attention module enables selective emphasis on relevant signals while suppressing modality-specific noise. Performance is validated on a modified version of the AFEW dataset, in which controlled noise is introduced to emulate realistic conditions. The framework achieves higher classification accuracy than comparative baselines, confirming increased robustness under conditions of cross-modal disruption. This result demonstrates the suitability of the proposed method for deployment in practical emotion-aware technologies operating outside controlled environments. The study also contributes a systematic approach to fusion design and supports further exploration of resilient multimodal emotion analysis frameworks. The source code is publicly available at https://github.com/asmoon002/AVER (accessed on 18 August 2025).
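The cross-modal attention idea mentioned in this abstract can be illustrated with plain scaled dot-product attention, in which each audio time step weights the visual time steps by similarity. This is a minimal sketch under assumptions, not the AVER implementation: the function names, the two-dimensional toy features, and the choice of audio as the query side are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(audio_queries, visual_keys, visual_values):
    """Let every audio step attend over all visual steps.

    Hypothetical helper: returns one attended visual vector and one
    weight vector per audio query (scaled dot-product attention).
    """
    dim = len(visual_keys[0])
    attended, weights = [], []
    for query in audio_queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visual_keys
        ]
        w = softmax(scores)
        # Weighted sum of the visual value vectors.
        out = [
            sum(wj * value[d] for wj, value in zip(w, visual_values))
            for d in range(len(visual_values[0]))
        ]
        attended.append(out)
        weights.append(w)
    return attended, weights


# Toy data: the audio query is aligned with the first visual frame.
audio = [[1.0, 0.0]]
keys = [[4.0, 0.0], [0.0, 4.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = cross_modal_attention(audio, keys, values)
```

In this toy case most of the attention mass falls on the first visual frame, which is the "selective emphasis" behaviour the abstract describes; a frame that does not match the audio cue receives a small weight.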
Funding: Supported by the Joint Funds of the National Natural Science Foundation of China under Grant No. U22B2025.
Abstract: In the big data era, the surge in network traffic volume poses challenges for network management and cybersecurity. Network Traffic Classification (NTC) employs deep learning to categorize traffic data, aiding security and analysis systems such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS). However, current NTC methods, based on isolated network simulations, usually fail to adapt to new protocols and applications and ignore the effects of network conditions and user behavior on traffic patterns. To improve network traffic management insights, federated learning frameworks have been proposed to aggregate diverse traffic data for collaborative model training. This approach faces challenges such as data integrity, label noise, packet loss, and skewed data distributions. While label noise can be mitigated through the use of sophisticated traffic labeling tools, other issues such as packet loss and skewed data distributions encountered in Network Packet Brokers (NPB) can severely impede the efficacy of federated learning algorithms. In this paper, we introduce the Robust Traffic Classifier with Federated Contrastive Learning (FC-RTC), which combines federated and contrastive learning methods. Using the Supcon-Loss function from contrastive learning, FC-RTC distinguishes between similar and dissimilar samples. Because it trains on sample pairs, FC-RTC still updates effectively when it receives traffic data corrupted by packet loss or packet disorder. In cases of sample imbalance, contrastive loss functions for similar samples reduce model bias towards the classes that make up a higher proportion of the data. By addressing uneven data distribution and packet loss, our system enhances its capability to adapt and perform accurately in real-world network traffic analysis, meeting the specific demands of this complex field.
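To make the pairwise training signal concrete, the following is a small sketch of a supervised contrastive loss in its commonly used form: for each anchor, same-label samples are pulled together and all other samples pushed apart. It is an illustration under assumptions rather than FC-RTC's code; the function names, the temperature value, and the toy embeddings are hypothetical, and embeddings are assumed to be non-zero.

```python
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors with positives."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner adds nothing
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp over every other sample, computed stably.
        peak = max(sims.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values())
        )
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings forming two tight clusters.
flows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = supcon_loss(flows, [0, 0, 1, 1])  # labels follow the clusters
loss_scrambled = supcon_loss(flows, [0, 1, 0, 1])   # labels cut across them
```

The loss is low when samples sharing a label sit close together and high when they do not, so gradients come from relationships between samples rather than from each sample in isolation.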
Abstract: Reinforcement Learning (RL) algorithms enhance the intelligence of air combat Autonomous Maneuver Decision (AMD) policies, but they may underperform in target combat environments with disturbances. To enhance the robustness of the AMD strategy learned by RL, this study proposes a Tube-based Robust RL (TRRL) method. First, this study introduces a tube to describe reachable trajectories under disturbances, formulates a method for calculating tubes based on sum-of-squares programming, and proposes the TRRL algorithm, which enhances robustness by using tube size as a quantitative indicator. Second, this study introduces offline techniques for regressing the tube size function and establishing a tube library before policy learning, aiming to eliminate complex online tube solving and reduce the computational burden during training. Furthermore, an analysis of the tube library demonstrates that the mitigated AMD strategy achieves greater robustness, as smaller tube sizes correspond to more cautious actions. This finding highlights that TRRL enhances robustness by promoting a conservative policy. To effectively balance aggressiveness and robustness, the proposed TRRL algorithm introduces a "laziness factor" as a weight on robustness. Finally, combat simulations in an environment with disturbances confirm that the AMD policy learned by the TRRL algorithm exhibits superior air combat performance compared to selected robust RL baselines.
Abstract: A method is introduced that uses machine learning algorithms to predict success rates for science fiction films. In the film industry, extensive research and accurate forecasting are vital to anticipating a movie's success prior to its debut. Our study aims to use available data to estimate a film's early success rate. The internet offers a wide range of movie-related information, including actors, directors, critic reviews, user reviews, ratings, writers, budgets, genres, Facebook likes, YouTube views for movie trailers, and Twitter followers. The first few weeks of a film's release are crucial in determining its fate, and online reviews and film evaluations strongly affect its opening-week earnings. Hence, our research employs supervised machine learning techniques to predict a film's success. The Internet Movie Database (IMDb) is a comprehensive data repository for nearly all movies. A robust predictive classification approach is developed by employing various machine learning algorithms, such as fine, medium, coarse, cosine, cubic, and weighted KNN. To determine the best model, the performance of each feature was evaluated based on composite metrics. Moreover, the significant influence of social media platforms, including Twitter, Instagram, and Facebook, on shaping individuals' opinions was recognized. A hybrid success rating prediction model is obtained by integrating the proposed prediction models with sentiment analysis from available platforms. The findings of this study demonstrate that the chosen algorithms offer more precise estimations, faster execution times, and higher accuracy rates when compared to previous research. By integrating the features of existing prediction models and social media sentiment analysis models, our proposed approach provides an accurate prediction of a movie's success. This can help movie producers and marketers anticipate a film's success before its release, allowing them to tailor their promotional activities accordingly. Furthermore, the research lays the foundation for developing even more accurate prediction models, considering the ever-increasing significance of social media platforms in shaping individuals' opinions. In conclusion, this study showcases the potential of machine learning algorithms in predicting the success rate of science fiction films, opening new avenues for the film industry.
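The KNN variants named in this abstract differ mainly in neighbourhood size, distance metric, and vote weighting. The sketch below shows those three knobs in one small classifier. It is an assumption-laden illustration: the abstract does not state which toolkit or settings were used, and the feature vectors, the "hit"/"flop" labels, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def minkowski(a, b, p):
    """Minkowski distance; p=2 is Euclidean, p=3 is the 'cubic' metric."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a, b):
    """One minus cosine similarity, for non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_x, train_y, query, k=1, metric="euclidean", weighted=False):
    """Classify `query` by a (possibly distance-weighted) neighbour vote."""
    if metric == "cosine":
        distance = cosine_distance
    elif metric == "cubic":
        distance = lambda a, b: minkowski(a, b, 3)
    else:
        distance = lambda a, b: minkowski(a, b, 2)
    neighbours = sorted((distance(x, query), y) for x, y in zip(train_x, train_y))[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        # Weighted voting uses inverse squared distance; otherwise one vote each.
        votes[label] += 1.0 / (dist * dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical two-feature film data.
films = [[1.0, 1.0], [1.0, 2.0], [5.0, 5.0], [6.0, 5.0], [6.0, 6.0]]
outcome = ["flop", "flop", "hit", "hit", "hit"]
new_film = [1.5, 1.5]
fine = knn_predict(films, outcome, new_film, k=1)
coarse = knn_predict(films, outcome, new_film, k=5)
weighted = knn_predict(films, outcome, new_film, k=5, weighted=True)
```

With a single neighbour the query follows its nearest films, with all five neighbours the majority class wins regardless of distance, and distance weighting restores the influence of the close neighbours. That sensitivity is why comparing several variants, as the abstract reports, can matter.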
Abstract: A novel adaptive support vector regression neural network (SVR-NN) is proposed, which combines the respective merits of support vector machines and neural networks. First, a support vector regression approach is applied to determine the initial structure and initial weights of the SVR-NN, so that the network architecture is easily determined and the hidden nodes can be constructed adaptively from the support vectors. Furthermore, an annealing robust learning algorithm is presented to adjust these hidden node parameters as well as the weights of the SVR-NN. To test the validity of the proposed method, it is demonstrated that the adaptive SVR-NN can be used effectively for the identification of nonlinear dynamic systems. Simulation results show that the identification schemes based on the SVR-NN give considerably better performance and learn faster than the previous neural network method.
Funding: Supported by the National Key R&D Program of China (2022ZD0114801), the National Natural Science Foundation of China (Grant No. 61906089), and the Jiangsu Province Basic Research Program (BK20190408).
Abstract: Recently, crowdsourcing has established itself as an efficient labeling solution by distributing tasks to crowd workers. As workers have diverse expertise and can make mistakes, one core learning task is to estimate each worker's expertise and aggregate over the workers to infer the latent true labels. In this paper, we show that methods which model worker expertise with a noise transition matrix, one of the major research directions, commonly overfit the annotation noise, due either to an oversimplified noise assumption or to inaccurate estimation. To solve this problem, we propose a knowledge distillation framework (KD-Crowd) that combines the complementary strengths of noise-model-free robust learning techniques and transition matrix based worker expertise modeling. The framework consists of two stages: in Stage 1, a noise-model-free robust student model is trained by treating the predictions of a transition matrix based crowdsourcing teacher model as noisy labels, aiming to correct the teacher's mistakes and obtain better true label predictions; in Stage 2, we switch their roles, retraining a better crowdsourcing model using the crowds' annotations supervised by the refined true label predictions given by Stage 1. Additionally, we propose an f-mutual information gain (MIG^(f)) based knowledge distillation loss, which finds the maximum information intersection between the student's and teacher's predictions. We show in experiments that MIG^(f) achieves clear improvements over the regular KL divergence knowledge distillation loss, which tends to force the student to memorize all information in the teacher's predictions, including its errors. We conduct extensive experiments showing that, as a universal framework, KD-Crowd substantially improves previous crowdsourcing methods on true label prediction and worker expertise estimation.
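The remark about the regular KL divergence distillation loss can be seen in a two-class toy case. The sketch below computes KL(teacher || student) for a student that copies a mistaken teacher and for a student that predicts the correct class. It illustrates only the KL baseline; the MIG^(f) loss is not reproduced here, and the probability values and function name are hypothetical.

```python
import math


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )


# Hypothetical item whose true label is class 0, with a teacher that is wrong.
teacher = [0.1, 0.9]
student_copying_teacher = [0.1, 0.9]
student_correct = [0.9, 0.1]

loss_when_copying = kl_divergence(teacher, student_copying_teacher)
loss_when_correct = kl_divergence(teacher, student_correct)
```

The copying student incurs zero loss while the correct student is penalized heavily, so minimizing this objective alone rewards reproducing the teacher's error. That is the behaviour the abstract says its alternative loss is meant to avoid.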
Funding: Supported by the National Key R&D Program of China (Grant Nos. 2022YFA1003703, 2022YFA1003800) and the National Natural Science Foundation of China (Grant Nos. 11925106, 12231011, 11931001, 12226007, 12326325); by the National Natural Science Foundation of China (Grant No. 12301380); and by the National Key R&D Program of China (Grant Nos. 2021YFA1000100, 2021YFA1000101, 2022YFA1003800) and the Natural Science Foundation of Shanghai (Grant No. 23ZR1419400).
Abstract: We consider the problem of multi-task regression with time-varying low-rank patterns, where the collected data may be contaminated by heavy-tailed distributions and/or outliers. Our approach is based on a piecewise robust multi-task learning formulation, in which a robust loss function (not necessarily convex, but with a bounded derivative) is used, and each piecewise low-rank pattern is induced by a nuclear norm regularization term. We propose using the composite gradient descent algorithm to obtain stationary points within a data segment and employing the dynamic programming algorithm to determine the optimal segmentation. The theoretical properties of the detected number and time points of pattern shifts are studied under mild conditions. Numerical results confirm the effectiveness of our method.
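The dynamic programming step for choosing a segmentation can be sketched on a one-dimensional series. The recursion below finds the split that minimizes the total within-segment cost plus a per-segment penalty. It is a simplified stand-in under assumptions: the squared-error cost replaces the paper's robust, nuclear-norm-regularized fit within each segment, and the function names and penalty value are hypothetical.

```python
def segment_cost(values, start, end):
    """Squared deviation from the mean on values[start:end] (stand-in cost)."""
    segment = values[start:end]
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)


def optimal_segmentation(values, penalty):
    """Return the change points of the minimum-cost segmentation.

    best[t] holds the lowest cost of segmenting values[:t]; each segment
    pays its fitting cost plus `penalty`, which discourages over-splitting.
    """
    n = len(values)
    best = [0.0] + [float("inf")] * n
    last_cut = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            candidate = best[start] + segment_cost(values, start, end) + penalty
            if candidate < best[end]:
                best[end] = candidate
                last_cut[end] = start
    # Walk back through the stored cuts to recover the change points.
    cuts = []
    position = n
    while position > 0:
        position = last_cut[position]
        if position > 0:
            cuts.append(position)
    return sorted(cuts)


series = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
change_points = optimal_segmentation(series, penalty=1.0)
```

For this series the recursion places a single change point at index 4, where the level shifts. The nested loops evaluate a quadratic number of candidate segments, so in a setting like the abstract's the per-segment fit would dominate the running time.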
Funding: Supported by the Foundation of the State Key Laboratory of Automotive Simulation and Control.
Abstract: Most real-world situations involve unavoidable measurement noise or perception errors, which can result in unsafe decision making or even casualties in autonomous driving. To address these issues and further improve safety, automated driving must be capable of handling perception uncertainties. This paper presents an observation-robust reinforcement learning approach against observational uncertainties to realize safe decision making for autonomous vehicles. Specifically, an adversarial agent is trained online to generate optimal adversarial attacks on observations, attempting to amplify the average variation distance on perturbed policies. In addition, an observation-robust actor-critic approach is developed to enable the agent to learn the optimal policies and to ensure that the changes in the policies perturbed by optimal adversarial attacks remain within a certain bound. Lastly, the safe decision making scheme is evaluated on a lane change task under complex highway traffic scenarios. The results show that the developed approach can ensure autonomous driving performance as well as policy robustness against adversarial attacks on observations.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61972025, 61802389, 61672092, U1811264, and 61966009) and the National Key R&D Program of China (Nos. 2020YFB1005604 and 2020YFB2103802).
Abstract: Reinforcement learning (RL), one of the three branches of machine learning, aims for autonomous learning and is now greatly driving the development of artificial intelligence, especially in autonomous distributed systems, such as cooperative Boston Dynamics robots. However, robust RL remains a challenging reliability problem because of the gap between laboratory simulation and the real world. Existing efforts to approach this problem include performing random environmental perturbations in the learning process. However, there is no guarantee that training uses beneficial perturbations, and harmful ones may cause RL to fail. In this work, we treat robust RL as a multi-task RL problem and propose a curricular robust RL approach. We first present a generative adversarial network (GAN) based task generation model that iteratively outputs new tasks at the appropriate level of difficulty for the current policy. Furthermore, with these progressive tasks, we can realize curricular learning and finally obtain a robust policy. Extensive experiments in multiple environments demonstrate that our method improves training stability and is robust to differences between training and test conditions.
Funding: Supported by the National Natural Science Foundation of China (No. 12172186).
Abstract: Cross-modal retrieval tries to achieve mutual retrieval between modalities by establishing consistent alignment for different modal data. Currently, many cross-modal retrieval methods have been proposed and have achieved excellent results; however, these are trained with clean cross-modal pairs, which are semantically matched but costly compared with easily available data with noise alignment (i.e., paired but mismatched in semantics). When these methods are trained with noise-aligned data, their performance degrades dramatically. Therefore, we propose a robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first conducts multi-task learning to slow down overfitting to the noise and make the data separable. Then, RCAR uses a two-component beta-mixture model to divide the data into clean and noise alignments and refurbishes the label according to the posterior probability of the noise-alignment component. In addition, we define partial and complete noises in the noise-alignment paradigm. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance with both types of noise.
Abstract: Transformers, originally designed for natural language processing, have been explored for computer vision in recent research. Various Vision Transformers (ViTs) play an increasingly important role in image-related fields such as computer vision, multimodal fusion, and multimedia analysis. However, to obtain promising performance, most existing ViTs rely on manually filtered high-quality images, which may still suffer from inherent noise risk. Moreover, such well-constructed images are not always available. To this end, we propose a Robust ViT (RViT) that focuses on relevant and robust representation learning for image classification tasks. Specifically, we first develop a novel Denoising VTUnet module, in which we conceptualize the nonrobust noise as the uncertainty under the variational conditions. Furthermore, we design a fusion transformer backbone with a tailored fusion attention mechanism to perform image classification effectively based on the extracted robust representations. To demonstrate the superiority of our model, comparative experiments are conducted on several popular datasets. Benefiting from the sequence regularity of the Transformer and the captured robust features, the proposed method outperforms the compared Transformer-based models on visual tasks.