This paper proposes a zero-shot based spatial recognition AI algorithm by fusing and developing multidimensional vision identification technology adapted to the situation in large indoor and underground spaces. With the expansion of large shopping malls and underground urban spaces (UUS), there is an increasing need for new technologies that can quickly identify complex indoor structures and changes such as relocation, remodeling, and construction for the safety and management of citizens through the provision of up-to-date indoor 3D site maps. The proposed algorithm utilizes data collected by an unmanned robot to create an up-to-date 3D site map of the indoor site and recognizes complex indoor spaces based on zero-shot learning. This research specifically addresses two major challenges: the difficulty of detecting walls and floors due to complex patterns, and the difficulty of spatial perception due to unknown obstacles. The proposed algorithm addresses the limitations of the existing foundation model, detects floors and obstacles without expensive sensors, and improves the accuracy of spatial recognition by combining floor detection, vanishing point detection, and fusion obstacle detection algorithms. The experimental results show that the algorithm effectively detects the floor and obstacles in various indoor environments, with F1 scores of 0.96 and 0.93 in the floor detection and obstacle detection experiments, respectively.
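The reported F1 scores combine precision and recall into a single value; as a quick reference, a minimal sketch of the metric (the example counts are illustrative, not the paper's experimental data):

```python
# Hypothetical illustration of the F1 metric used to score floor/obstacle
# detection: the harmonic mean of precision and recall.
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only (not from the paper's experiments).
f1 = f1_score(tp=960, fp=40, fn=40)
```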
With the availability of high-performance computing technology and the development of advanced numerical simulation methods, Computational Fluid Dynamics (CFD) is becoming increasingly practical and efficient in engineering. As one of the representative high-precision algorithms, the high-order Discontinuous Galerkin Method (DGM) has not only attracted widespread attention from scholars in the CFD research community but has also seen rapid development. However, when DGM is extended to high-speed aerodynamic flow field calculations, non-physical numerical Gibbs oscillations near shock waves often significantly degrade the numerical accuracy and can even cause calculation failure. Data-driven approaches based on machine learning techniques can be used to learn the characteristics of Gibbs noise, which motivates us to apply them in high-speed DG applications. To achieve this goal, labeled data need to be generated in order to train the machine learning models. This paper proposes a new method for denoising modeling of the Gibbs phenomenon using a machine learning technique, the zero-shot learning strategy, to eliminate the need to acquire large amounts of CFD data. The model adopts a graph convolutional network combined with a graph attention mechanism to learn the denoising paradigm from synthetic Gibbs noise data and generalize to DGM numerical simulation data. Numerical simulation results show that the Gibbs denoising model proposed in this paper can suppress the numerical oscillation near shock waves in the high-order DGM. Our work automates the extension of DGM to high-speed aerodynamic flow field calculations with higher generalization and lower cost.
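Synthetic Gibbs noise of the kind the model trains on can be generated cheaply; a minimal sketch (not the paper's actual data pipeline) uses a truncated Fourier series of a square wave, whose partial sums overshoot near the discontinuity:

```python
import numpy as np

# A truncated Fourier series of a unit square wave exhibits the Gibbs
# overshoot near the jump -- a cheap way to synthesize oscillation data
# of the kind described above (sketch under assumptions; the paper's
# data-generation procedure is not reproduced here).
def gibbs_square_wave(x: np.ndarray, n_terms: int) -> np.ndarray:
    """Partial Fourier sum of a unit square wave (odd harmonics only)."""
    y = np.zeros_like(x)
    for k in range(1, n_terms + 1):
        n = 2 * k - 1
        y += np.sin(n * x) / n
    return 4.0 / np.pi * y

x = np.linspace(0.01, np.pi - 0.01, 500)   # one half-period of the wave
approx = gibbs_square_wave(x, n_terms=50)
overshoot = float(approx.max()) - 1.0      # excess above the true value 1
```

The overshoot persists (roughly 9% of the jump size) no matter how many terms are added, which is exactly the artifact the denoiser must learn to suppress.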
The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges, particularly Coverage Path Planning. While this task has typically been tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. On the other hand, recent advances in Reinforcement Learning offer promising approaches, yet a significant gap in the literature remains when it comes to generalization over a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework comes from the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a unique reward function that yields value functions that are size-invariant. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm can perform zero-shot learning scenarios at a near-optimal level in environments that follow a similar distribution as during training, outperforming a greedy heuristic by sixfold. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.
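The core of an action-masking scheme can be sketched in a few lines: invalid actions (for instance, moves into obstacles) have their Q-values forced to negative infinity before greedy selection, so an unsafe action can never be chosen. Names and shapes below are illustrative, not the authors' implementation:

```python
import numpy as np

# Sketch of value-based action masking: the mask zeroes out unsafe
# actions by sending their Q-values to -inf before the argmax.
def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Return the argmax over Q-values restricted to valid actions."""
    masked = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked))

q = np.array([2.0, 5.0, 1.0, 3.0])          # raw Q-values for 4 actions
mask = np.array([True, False, True, True])  # action 1 would hit an obstacle
action = masked_greedy_action(q, mask)      # best *valid* action
```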
Large language models (LLMs) have demonstrated remarkable generalization abilities across multiple tasks in natural language processing (NLP). For multi-step reasoning tasks, chain-of-thought (CoT) prompting facilitates step-by-step thinking, leading to improved performance. However, despite significant advancements in LLMs, current CoT prompting performs suboptimally on smaller-scale models that have fewer parameters. Additionally, the common paradigm of few-shot CoT prompting relies on a set of manual demonstrations, with performance contingent on the quality of these annotations and varying with task-specific requirements. To address these limitations, we propose a select-and-answer prompting method (SAP) to enhance language model performance on reasoning tasks without the need for manual demonstrations. This method comprises two primary steps: first, guiding the model to conduct a preliminary analysis and generate several candidate answers based on the prompting; second, allowing the model to provide a final answer derived from these candidate answers. The proposed prompting strategy is evaluated across two language models of varying sizes and six datasets. On ChatGLM-6B, SAP consistently outperforms few-shot CoT across all datasets. For GPT-3.5, SAP achieves comparable performance to few-shot CoT and outperforms zero-shot CoT in most cases. These experimental results indicate that SAP can significantly improve the accuracy of language models in reasoning tasks.
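A minimal sketch of the two-step select-and-answer flow, with a stubbed model call standing in for a real LLM API (the prompt wording and helper names are assumptions, not the paper's exact prompts):

```python
# Stub standing in for a real LLM API call (an assumption for
# illustration; not the paper's code or prompts).
def ask_model(prompt: str) -> str:
    if prompt.startswith("Step 1"):
        return "11; 12; 11"          # several candidate answers
    return "11"                      # final answer chosen from the candidates

def select_and_answer(question: str) -> str:
    # Step 1: preliminary analysis, generate candidate answers.
    candidates = ask_model(f"Step 1: analyze and list candidate answers for: {question}")
    # Step 2: ask the model for a final answer derived from the candidates.
    return ask_model(f"Step 2: given candidates [{candidates}], answer: {question}")

answer = select_and_answer("If 3 pens cost 33 cents, how many cents does 1 pen cost?")
```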
Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task for two main reasons: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first feature learning phase, we finetune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is introduced into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second label inference phase, a semantic directed graph is constructed over attributes of fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms the state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when they are used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
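One standard form of label propagation over a normalized graph can be sketched as follows; the update rule below is a common choice and not necessarily the paper's exact algorithm:

```python
import numpy as np

# Sketch of label propagation over a row-normalized graph: scores
# diffuse along edges while part of the initial evidence is retained.
def propagate(adj: np.ndarray, init: np.ndarray,
              alpha: float = 0.5, iters: int = 50) -> np.ndarray:
    """Iterate F <- alpha * A_norm @ F + (1 - alpha) * init."""
    row_sums = adj.sum(axis=1, keepdims=True)
    a_norm = adj / np.maximum(row_sums, 1e-12)
    f = init.copy()
    for _ in range(iters):
        f = alpha * a_norm @ f + (1.0 - alpha) * init
    return f

# Tiny 3-node attribute graph; node 0 starts with all the evidence.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
scores = propagate(adj, init=np.array([1.0, 0.0, 0.0]))
```

After convergence the evidence has spread along the chain while node 0 keeps the highest score, which is the behavior label propagation relies on for inferring unseen-class labels.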
Zero-shot learning enables the recognition of new class samples by migrating models learned from semantic features and existing sample features to things that have never been seen before. The consistency of different types of features and the domain shift problem are two of the critical issues in zero-shot learning. To address both of these issues, this paper proposes a new modeling structure. The traditional approach mapped semantic features and visual features into the same feature space; building on this, a dual discriminator approach is used in the proposed model. This dual discriminator approach can further enhance the consistency between semantic and visual features. At the same time, this approach can also align unseen class semantic features and training set samples, providing a portion of information about the unseen classes. In addition, a new feature fusion method is proposed in the model. This method is equivalent to adding perturbation to the seen class features, which can reduce the degree to which the classification results of the model are biased towards the seen classes. At the same time, this feature fusion method can provide part of the information of the unseen classes, improving classification accuracy in generalized zero-shot learning and reducing domain bias. The proposed method is validated and compared with other methods on four datasets, and from the experimental results, it can be seen that the method proposed in this paper achieves promising results.
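The "perturbation" view of feature fusion suggests a simple sketch: blend each seen-class visual feature with an unseen-class semantic feature assumed to live in the same space. The mixing coefficient and setup below are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

# Sketch of perturbing seen-class features toward unseen-class semantics
# via a convex combination (alpha and the shared space are assumptions).
def fuse(seen_visual: np.ndarray, unseen_semantic: np.ndarray,
         alpha: float = 0.8) -> np.ndarray:
    """Mostly the seen feature, perturbed toward the unseen one."""
    return alpha * seen_visual + (1.0 - alpha) * unseen_semantic

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))      # 4 seen-class visual features
s = rng.normal(size=(4, 8))      # matched unseen-class semantic features
fused = fuse(v, s)
```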
Deep metric learning is one of the recommended methods for the challenge of supporting few/zero-shot learning by deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) for learning a distance function that can map input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture deals with the existence of a few training samples by deciding whether the samples share the same class identity or not. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights, trained with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error, time-consuming process. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: 1) Utilizing transfer learning and using the pre-trained AlexNet as a feature extractor in the Siamese architecture. Fine-tuning a pre-trained network is typically faster and easier than building from scratch. 2) Training the Siamese architecture with contrastive loss instead of binary cross-entropy. Contrastive loss helps the network learn a nonlinear mapping function that enables it to map the extracted features in the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K datasets through two experiments: one tests the proposed network in few-shot learning, while the other tests it in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, a comparison between the performance of the proposed Siamese network and the traditional Siamese CNNs is conducted. The comparison results show that the proposed network achieves higher recognition results in less time, reducing the training time from days to hours in both experiments.
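The contrastive loss mentioned above has a standard closed form: similar pairs are pulled together, dissimilar pairs are pushed apart up to a margin. A minimal numpy sketch (distances and labels are made up for illustration):

```python
import numpy as np

# Standard pairwise contrastive loss for Siamese training:
# y = 1 (same class) penalizes large distance d; y = 0 (different class)
# penalizes distances smaller than the margin m.
def contrastive_loss(d: np.ndarray, y: np.ndarray, margin: float = 1.0) -> float:
    """Mean of y*d^2 + (1-y)*max(0, margin - d)^2 over the batch."""
    pos = y * d ** 2
    neg = (1 - y) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos + neg))

d = np.array([0.1, 0.9, 1.5])   # Euclidean distances between pair embeddings
y = np.array([1, 0, 0])         # 1 = same class, 0 = different class
loss = contrastive_loss(d, y)
```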
The goal of zero-shot recognition is to classify samples from classes never seen before, which requires building a bridge between seen and unseen classes through a semantic embedding space. Therefore, semantic embedding space learning plays an important role in zero-shot recognition. Among existing works, the semantic embedding space is mainly taken to be user-defined attribute vectors. However, the discriminative information included in the user-defined attribute vector is limited. In this paper, we propose to learn an extra latent attribute space automatically to produce a more generalized and discriminative semantic embedding space. To prevent the bias problem, both the user-defined attribute vector and the latent attribute space are optimized by adversarial learning with auto-encoders. We also propose to reconstruct semantic patterns produced by explanatory graphs, which can make the semantic embedding space more sensitive to useful semantic information and less sensitive to useless information. The proposed method is evaluated on the AwA2 and CUB datasets. The results show that our proposed method achieves superior performance.
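The basic idea of augmenting user-defined attributes with a learned latent attribute space can be sketched as follows; the adversarial training that shapes the latent space is omitted, and all values are illustrative:

```python
import numpy as np

# Sketch: the final semantic embedding combines hand-defined attributes
# with learned latent attribute dimensions (values are made up).
def semantic_embedding(user_attrs: np.ndarray, latent_attrs: np.ndarray) -> np.ndarray:
    """Concatenate the user-defined and latent attribute spaces."""
    return np.concatenate([user_attrs, latent_attrs], axis=-1)

user = np.array([1.0, 0.0, 1.0])     # e.g. has-stripes, is-aquatic, has-tail
latent = np.array([0.3, -0.7])       # learned discriminative dimensions
emb = semantic_embedding(user, latent)
```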
Fluorescence microscopy image (FMI) denoising faces critical challenges because of the compound mixed Poisson-Gaussian noise with strong spatial correlation and the impracticality of acquiring paired noisy/clean data in dynamic biomedical scenarios. While supervised methods trained on synthetic noise (e.g., Gaussian/Poisson) suffer from out-of-distribution generalization issues, existing self-supervised approaches degrade under real FMI noise because they oversimplify noise assumptions and rely on computationally intensive deep architectures. In this work, we propose fluorescence micrograph to self (FM2S), a zero-shot denoiser that achieves efficient FMI denoising through three key innovations: 1) a noise injection module that ensures training data sufficiency through adaptive Poisson-Gaussian synthesis while preserving the spatial correlation and global statistics of FMI noise for robust model generalization; 2) a two-stage proactive learning strategy that first recovers structural priors via pre-denoised targets and then refines high-frequency details through noise distribution alignment; 3) an ultra-lightweight network (3.5 k parameters) enabling rapid convergence with 270× faster training and inference than the state-of-the-art (SOTA). Extensive experiments across FMI datasets demonstrate FM2S's superiority: it outperforms CVF-SID by 1.4 dB in peak signal-to-noise ratio (PSNR) on average while requiring 0.1% of the parameters of AP-BSN. Notably, FM2S maintains stable performance across varying noise levels, indicating its practicality for microscopy platforms with diverse sensor characteristics. The code and datasets can be found at https://github.com/Danielement321/FM2S.
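Mixed Poisson-Gaussian noise injection of the kind described can be sketched as follows; the scale parameters are illustrative, not FM2S's adaptive estimates:

```python
import numpy as np

# Sketch of mixed Poisson-Gaussian noise: signal-dependent shot noise
# (Poisson) plus additive read noise (Gaussian).  Parameters are
# illustrative assumptions, not the paper's adaptive synthesis.
def add_poisson_gaussian(img: np.ndarray, peak: float = 30.0,
                         sigma: float = 0.02, rng=None) -> np.ndarray:
    """img in [0, 1] -> noisy image in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng(0)
    shot = rng.poisson(img * peak) / peak          # Poisson component
    read = rng.normal(0.0, sigma, img.shape)       # Gaussian component
    return np.clip(shot + read, 0.0, 1.0)

clean = np.full((16, 16), 0.5)
noisy = add_poisson_gaussian(clean)
```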
In recent years, multi-label zero-shot learning (ML-ZSL) has garnered increasing attention because of its wide range of potential applications, such as image annotation, text classification, and bioinformatics. The central challenge in ML-ZSL lies in predicting multiple labels for unseen classes without requiring any labeled training data, which contrasts with conventional supervised learning paradigms. However, existing methods face several significant challenges. These include the substantial semantic gap between different modalities, which impedes effective knowledge transfer, and the intricate and typically complex relationships among multiple labels, which are difficult to model in a meaningful and accurate manner. To overcome these challenges, we propose a graph-augmented multimodal chain-of-thought (GMCoT) reasoning approach. The proposed method combines the strengths of multimodal large language models with graph-based structures, significantly enhancing the reasoning process involved in multi-label prediction. First, a novel multimodal chain-of-thought reasoning framework is presented, which imitates human-like step-by-step reasoning to produce multi-label predictions. Second, a technique is presented for integrating label graphs into the reasoning process. This technique enables the capture of complex semantic relationships among labels, thereby improving the accuracy and consistency of multi-label generation. Comprehensive experiments on benchmark datasets demonstrate that the proposed GMCoT approach outperforms state-of-the-art methods in ML-ZSL.
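One simple way to build a label graph for such reasoning is a co-occurrence graph with conditional-probability edge weights estimated from multi-label annotations; this is a common construction, not necessarily the paper's exact graph:

```python
import numpy as np

# Sketch of a label co-occurrence graph: edge weight adj[i, j]
# estimates P(label_j | label_i) from a binary annotation matrix.
def cooccurrence_graph(labels: np.ndarray) -> np.ndarray:
    """labels: (n_samples, n_labels) binary -> conditional-prob adjacency."""
    counts = labels.T @ labels                     # pairwise co-occurrence counts
    label_freq = np.diag(counts).astype(float)
    adj = counts / np.maximum(label_freq[:, None], 1.0)
    np.fill_diagonal(adj, 0.0)
    return adj

# 4 images annotated with 3 labels (e.g. person, dog, leash) -- made up.
y = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
adj = cooccurrence_graph(y)
```

Here every image containing label 1 also contains label 0, so the edge 1→0 gets full weight, the kind of dependency a graph-augmented reasoner can exploit.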
Zero-shot learning (ZSL) is an important and rapidly growing area of machine learning that aims to recognize new classes without prior training data. Despite its significance, ZSL has faced challenges with overfitting in embedding-based methods and limitations in traditional one-directional attention (ODA) based approaches. To bridge these gaps, this paper proposes the use of bi-directional attention (BDA) to integrate insights from both embedding- and attention-based approaches. The proposed BDA system consists of a bi-directional attention network (BDAN) and a synthesized visual embedding network (SVEN) that facilitates visual-semantic interaction for ZSL classification. More specifically, the BDAN employs region self-attention (RSA), semantic synthesis attention (SSA), and visual synthesis attention (VSA) to overcome the overfitting issue in embedding methods and enhance transferability, to associate visual features with semantic property information, and to learn locally improved visual features. Extensive testing on the CUB, SUN, and AWA2 datasets confirms the superiority of our proposed method over traditional approaches.
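A plausible primitive underlying such cross-modal attention modules is standard scaled dot-product attention between visual region features and semantic attribute embeddings; this is an assumption for illustration, and the paper's exact formulation may differ:

```python
import numpy as np

# Sketch of scaled dot-product attention: queries (visual regions)
# attend to keys/values (semantic attribute embeddings).
def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """softmax(Q K^T / sqrt(d)) V, returning outputs and weights."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
regions = rng.normal(size=(7, 16))      # 7 visual region features
attrs = rng.normal(size=(5, 16))        # 5 semantic attribute embeddings
out, w = attention(regions, attrs, attrs)
```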
Entity linking (EL) plays a crucial role in natural language processing (NLP) tasks by linking ambiguous entity mentions to relevant entities in a knowledge base. Due to the inconsistency in data distribution across diverse domains, it is difficult to accurately estimate the overall data distribution of the target domain, resulting in zero-shot scenarios with a significant decrease in generalization performance. Currently, existing works primarily focus on sampling and incorporating fine-grained information to deal with the above issue. Unfortunately, they may face either the significant computational cost of negative samples in the sampling strategy, or shortcomings in the interaction between coarse- and fine-grained information. To tackle these challenges, in this paper we propose a Multi-Task Framework with Anchor Point Sampling (MAPS). Specifically, in the anchor point sampling (APS) part, taking fine-grained information into account, we pre-bind mention-entity pairs based on prior conditions (e.g., entity type) to introduce challenging negative samples and modify the conditional distribution. In this way, an optimal trade-off between computational effectiveness and efficiency is reached. Moreover, we propose a novel multi-task framework that shares coarse-grained information at a lower level and utilizes multiple extractors to extract fine-grained information at a higher level. By combining the multi-task framework and various APS approaches, a comprehensive fusion of coarse- and fine-grained information is finally achieved. Experimental results on the benchmark dataset ZESHEL demonstrate that MAPS significantly outperforms competitive baselines.
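The type-constrained negative sampling idea can be sketched compactly: negatives for a mention are drawn only from entities sharing the gold entity's type, yielding harder negatives than uniform sampling. All data below is made up for illustration:

```python
# Sketch of type-constrained hard-negative selection for entity linking
# (the knowledge base and type labels here are invented examples).
def hard_negatives(gold: str, entities: dict[str, str], k: int = 2) -> list[str]:
    """entities: name -> type; return up to k same-type non-gold entities."""
    gold_type = entities[gold]
    same_type = [e for e, t in entities.items() if t == gold_type and e != gold]
    return same_type[:k]

kb = {
    "Paris (city)": "LOC",
    "Paris (mythology)": "PER",
    "Lyon": "LOC",
    "Marseille": "LOC",
    "Victor Hugo": "PER",
}
negs = hard_negatives("Paris (city)", kb)
```

Same-type candidates like "Lyon" are much harder to reject than, say, "Victor Hugo", which is what makes them useful training signal.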
Zero-Shot Object Detection (ZSD), one of the most challenging problems in the field of object detection, aims to accurately identify new categories that are not encountered during training. Recent advancements in deep learning and increased computational power have led to significant improvements in object detection systems, achieving high recognition accuracy on benchmark datasets. However, these systems remain limited in real-world applications due to the scarcity of labeled training samples, making it difficult to detect unseen classes. To address this, researchers have explored various approaches, yielding promising progress. This article provides a comprehensive review of the current state of ZSD, distinguishing four related method families (zero-shot, open-vocabulary, open-set, and open-world approaches) based on task objectives and data usage. We highlight representative methods, discuss the technical challenges within each framework, and summarize the commonly used evaluation metrics, benchmark datasets, and experimental results. Our review aims to offer readers a clear overview of the latest developments and performance trends in ZSD.
Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly perceived as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Existing works demonstrate that the performance of in-context learning strongly depends on the quality and order of the demonstrations in prompts. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy aims to identify effective demonstrations and determine their optimal order, thereby activating the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, the selection and arrangement of demonstrations are based on the semantic similarity and quality of the demonstrations for each question, without requiring additional annotations. Furthermore, a series of experiments are conducted on the A-OKVQA and OKVQA datasets. The results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
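Similarity-based demonstration selection can be sketched as follows: rank a pool of demonstration embeddings by cosine similarity to the question embedding and keep the top-k, most similar last. The ordering heuristic here is a common one; KGFLM's scoring also weighs demonstration quality:

```python
import numpy as np

# Sketch of demonstration selection by cosine similarity (illustrative;
# not the paper's full scoring, which also considers demo quality).
def select_demos(question: np.ndarray, pool: np.ndarray, k: int = 2) -> list:
    """Return indices of the k most similar demos, most similar last."""
    q = question / np.linalg.norm(question)
    p = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    sims = p @ q
    return np.argsort(sims)[-k:].tolist()   # ascending: most similar last

q_emb = np.array([1.0, 0.0])
demos = np.array([[0.9, 0.1],   # very similar to the question
                  [0.0, 1.0],   # orthogonal
                  [0.7, 0.7]])  # moderately similar
order = select_demos(q_emb, demos)
```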
Predicting the behavior of renewable energy systems requires models capable of generating accurate forecasts from limited historical data, a challenge that becomes especially pronounced when commissioning new facilities where operational records are scarce. This review aims to synthesize recent progress in data-efficient deep learning approaches for addressing such "cold-start" forecasting problems. It primarily covers three interrelated domains (solar photovoltaic (PV), wind power, and electrical load forecasting) where data scarcity and operational variability are most critical, while also including representative studies on hydropower and carbon emission prediction to provide a broader systems perspective. To this end, we examined trends from over 150 predominantly peer-reviewed studies published between 2019 and mid-2025, highlighting advances in zero-shot and few-shot meta-learning frameworks that enable rapid model adaptation with minimal labeled data. Moreover, transfer learning approaches combined with spatiotemporal graph neural networks have been employed to transfer knowledge from existing energy assets to new, data-sparse environments, effectively capturing hidden dependencies among geographic features, meteorological dynamics, and grid structures. Synthetic data generation has further proven valuable for expanding training samples and mitigating overfitting in cold-start scenarios. In addition, large language models and explainable artificial intelligence (XAI), notably conversational XAI systems, have been used to interpret and communicate complex model behaviors in accessible terms, fostering operator trust from the earliest deployment stages. By consolidating methodological advances, unresolved challenges, and open-source resources, this review provides a coherent overview of deep learning strategies that can shorten the data-sparse ramp-up period of new energy infrastructures and accelerate the transition toward resilient, low-carbon electricity grids.
Funding: Supported by Kyonggi University Research Grant 2024.
Funding: Co-supported by the Aeronautical Science Foundation of China (Nos. 2018ZA52002, 2019ZA052011).
Funding: Supported by project RELIABLE (PTDC/EEI-AUT/3522/2020), R&D Unit SYSTEC-Base (UIDB001472020) and Programmatic (UIDP001472020) funds, and Associate Laboratory Advanced Production and Intelligent Systems ARISE-LAP01122020, funded by national funds through the FCT/MCTES (PIDDAC).
Funding: National Natural Science Foundation of China (No. 62176052).
Funding: Supported by the National Basic Research Program of China (973 Program) (No. 2015CB352502), the National Natural Science Foundation of China (No. 61573026), and the Beijing Natural Science Foundation (No. L172037).
Abstract: Fine-grained image classification, which aims to distinguish images with subtle distinctions, is a challenging task for two main reasons: lack of sufficient training data for every class and difficulty in learning discriminative features for representation. In this paper, to address these two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification. In the first, feature-learning phase, we fine-tune deep convolutional neural networks using the hierarchical semantic structure among fine-grained classes to extract discriminative deep visual features. Meanwhile, a domain adaptation structure is introduced into the deep convolutional neural networks to avoid domain shift from training data to test data. In the second, label-inference phase, a semantic directed graph is constructed over the attributes of the fine-grained classes. Based on this graph, we develop a label propagation algorithm to infer the labels of images in the unseen classes. Experimental results on two benchmark datasets demonstrate that our model outperforms state-of-the-art zero-shot learning models. In addition, the features obtained by our feature learning model also yield significant gains when used by other zero-shot learning models, which shows the flexibility of our model in zero-shot fine-grained classification.
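The label-inference phase relies on propagating known labels through a graph. As a minimal illustration of the generic technique (not the paper's exact algorithm or graph construction), iterative label propagation repeatedly mixes neighbour scores with the initial labels until convergence:

```python
import numpy as np

def propagate_labels(W: np.ndarray, Y0: np.ndarray, alpha: float = 0.5,
                     iters: int = 50) -> np.ndarray:
    """Iterative label propagation over a semantic graph.

    W  : row-normalized adjacency matrix over graph nodes
    Y0 : initial label scores (rows = nodes, cols = classes); labeled
         nodes carry their known labels, unlabeled nodes start at 0.
    Each step mixes propagated neighbour scores (alpha) with the
    initial scores (1 - alpha), so labels diffuse along graph edges.
    """
    Y = Y0.copy()
    for _ in range(iters):
        Y = alpha * W @ Y + (1 - alpha) * Y0
    return Y

# Tiny 3-node chain: node 0 is labeled, its label diffuses to nodes 1 and 2,
# decaying with graph distance.
W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
Y0 = np.array([[1.0], [0.0], [0.0]])
scores = propagate_labels(W, Y0)
print(scores.ravel().round(3))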
Abstract: Zero-shot learning enables the recognition of new class samples by transferring models learned from semantic features and existing sample features to things that have never been seen before. Consistency between different types of features and the domain shift problem are two of the critical issues in zero-shot learning. To address both of these issues, this paper proposes a new modeling structure. The traditional approach maps semantic features and visual features into the same feature space; building on this, a dual-discriminator approach is used in the proposed model. This dual-discriminator approach can further enhance the consistency between semantic and visual features. At the same time, it can also align unseen-class semantic features with training set samples, providing a portion of the information about the unseen classes. In addition, a new feature fusion method is proposed in the model. This method is equivalent to adding a perturbation to the seen-class features, which can reduce the degree to which the model's classification results are biased towards the seen classes. At the same time, this feature fusion method can provide part of the information of the unseen classes, improving classification accuracy in generalized zero-shot learning and reducing domain bias. The proposed method is validated and compared with other methods on four datasets, and the experimental results show that the method proposed in this paper achieves promising results.
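The abstract does not give the fusion rule, but a perturbation-style fusion can be pictured as a convex combination of seen-class features with unseen-class semantic information. The function below is a heavily simplified, hypothetical sketch of that idea; the mixing rule and `alpha` are illustrative, not the paper's formulation.

```python
import numpy as np

def fuse_features(seen_feat: np.ndarray, unseen_sem: np.ndarray,
                  alpha: float = 0.8) -> np.ndarray:
    """Hypothetical convex-combination feature fusion.

    Mixing a small amount of (projected) unseen-class semantic
    information into seen-class features acts like a perturbation,
    which can reduce the classifier's bias toward seen classes in
    generalized zero-shot learning.
    """
    return alpha * seen_feat + (1.0 - alpha) * unseen_sem

# Pure seen feature perturbed 20% toward an unseen-class semantic vector.
fused = fuse_features(np.ones(4), np.zeros(4), alpha=0.8)
print(fused)  # -> [0.8 0.8 0.8 0.8]
```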
Abstract: Deep metric learning is one of the recommended methods for the challenge of supporting few/zero-shot learning with deep networks. It depends on building a Siamese architecture of two homogeneous Convolutional Neural Networks (CNNs) for learning a distance function that can map input data from the input space to the feature space. Instead of determining the class of each sample, the Siamese architecture deals with the existence of only a few training samples by deciding whether the samples share the same class identity or not. The traditional Siamese architecture was built by forming two CNNs from scratch with randomly initialized weights and training them with binary cross-entropy loss. Building two CNNs from scratch is a trial-and-error, time-consuming phase. In addition, training with binary cross-entropy loss sometimes leads to poor margins. In this paper, a novel Siamese network is proposed and applied to few/zero-shot Handwritten Character Recognition (HCR) tasks. The novelties of the proposed network are: 1) utilizing transfer learning and using the pre-trained AlexNet as a feature extractor in the Siamese architecture, since fine-tuning a pre-trained network is typically faster and easier than building from scratch; 2) training the Siamese architecture with contrastive loss instead of binary cross-entropy. Contrastive loss helps the network learn a nonlinear mapping function that enables it to map the extracted features into the vector space in an optimal way. The proposed network is evaluated on the challenging Chars74K datasets through two experiments: one tests the proposed network in few-shot learning, while the other tests it in zero-shot learning. The recognition accuracy of the proposed network reaches 85.6% and 82% in few- and zero-shot learning, respectively. In addition, a comparison between the performance of the proposed Siamese network and the traditional Siamese CNNs is conducted. The comparison results show that the proposed network achieves higher recognition results in less time, reducing the training time from days to hours in both experiments.
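The contrastive loss mentioned above has a standard form: similar pairs are pulled together, dissimilar pairs are pushed beyond a margin. A minimal NumPy sketch of that standard formulation (the paper's exact margin and weighting are not given in the abstract):

```python
import numpy as np

def contrastive_loss(d: np.ndarray, y: np.ndarray, margin: float = 1.0) -> float:
    """Contrastive loss over a batch of embedding pairs.

    d : Euclidean distances between each pair of embeddings
    y : 1 if the pair shares a class identity, 0 otherwise
    Similar pairs are penalized by d^2 (pulled together); dissimilar
    pairs by max(margin - d, 0)^2 (pushed apart until they exceed the
    margin), which directly enforces the margins that binary
    cross-entropy training can fail to provide.
    """
    pos = y * d ** 2
    neg = (1 - y) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(pos + neg))

# A well-separated batch: the similar pair sits at distance 0 and the
# dissimilar pair lies beyond the margin, so the loss vanishes.
print(contrastive_loss(np.array([0.0, 1.5]), np.array([1, 0])))  # -> 0.0
```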
Abstract: The goal of zero-shot recognition is to classify classes never seen before, which requires building a bridge between seen and unseen classes through a semantic embedding space. Therefore, semantic embedding space learning plays an important role in zero-shot recognition. In existing works, the semantic embedding space is mainly given by user-defined attribute vectors. However, the discriminative information included in user-defined attribute vectors is limited. In this paper, we propose to automatically learn an extra latent attribute space to produce a more generalized and discriminative semantic embedding space. To prevent the bias problem, both the user-defined attribute vector and the latent attribute space are optimized by adversarial learning with auto-encoders. We also propose to reconstruct semantic patterns produced by explanatory graphs, which can make the semantic embedding space more sensitive to useful semantic information and less sensitive to useless information. The proposed method is evaluated on the AwA2 and CUB datasets. The results show that our proposed method achieves superior performance.
Funding: Supported by the National Key R&D Program of China (No. 2022YFC3400400).
Abstract: Fluorescence microscopy image (FMI) denoising faces critical challenges because of the compound mixed Poisson-Gaussian noise with strong spatial correlation and the impracticality of acquiring paired noisy/clean data in dynamic biomedical scenarios. While supervised methods trained on synthetic noise (e.g., Gaussian/Poisson) suffer from out-of-distribution generalization issues, existing self-supervised approaches degrade under real FMI noise because they oversimplify noise assumptions and rely on computationally intensive deep architectures. In this work, we propose fluorescence micrograph to self (FM2S), a zero-shot denoiser that achieves efficient FMI denoising through three key innovations: 1) a noise injection module that ensures training data sufficiency through adaptive Poisson-Gaussian synthesis while preserving the spatial correlation and global statistics of FMI noise for robust model generalization; 2) a two-stage proactive learning strategy that first recovers structural priors via pre-denoised targets and then refines high-frequency details through noise distribution alignment; 3) an ultra-lightweight network (3.5k parameters) enabling rapid convergence, with 270× faster training and inference than the state-of-the-art (SOTA). Extensive experiments across FMI datasets demonstrate FM2S's superiority: it outperforms CVF-SID by 1.4 dB in peak signal-to-noise ratio (PSNR) on average while requiring 0.1% of the parameters of AP-BSN. Notably, FM2S maintains stable performance across varying noise levels, indicating its practicality for microscopy platforms with diverse sensor characteristics. The code and datasets can be found at https://github.com/Danielement321/FM2S.
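The mixed Poisson-Gaussian noise model underlying the noise injection module has a standard textbook form: signal-dependent shot noise from a Poisson draw plus additive Gaussian read noise. The sketch below shows that generic model; the parameter values are illustrative, and FM2S's actual adaptive synthesis (which also matches spatial correlation) is more involved.

```python
import numpy as np

def inject_poisson_gaussian(img: np.ndarray, photons: float = 30.0,
                            sigma: float = 0.02, seed: int = 0) -> np.ndarray:
    """Synthesize mixed Poisson-Gaussian noise on a clean [0, 1] image.

    Shot noise: Poisson draw at a given photon budget (signal-dependent).
    Read noise: additive Gaussian with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    shot = rng.poisson(img * photons) / photons   # Poisson shot noise
    read = rng.normal(0.0, sigma, img.shape)      # Gaussian read noise
    return np.clip(shot + read, 0.0, 1.0)

# Inject noise into a flat mid-gray test image.
noisy = inject_poisson_gaussian(np.full((64, 64), 0.5))
print(noisy.shape, float(noisy.min()) >= 0.0, float(noisy.max()) <= 1.0)
```

Re-noising a single noisy micrograph with such a model is what lets a zero-shot denoiser manufacture its own training pairs without any clean ground truth.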
Funding: Supported by the Key R&D Program of Zhejiang Province (No. 2024C01021), the National Regional Innovation and Development Joint Fund of China (No. U24A20254), and the Leading Talents of Technological Innovation Program of Zhejiang Province (No. 2023R5214).
Abstract: In recent years, multi-label zero-shot learning (ML-ZSL) has garnered increasing attention because of its wide range of potential applications, such as image annotation, text classification, and bioinformatics. The central challenge in ML-ZSL lies in predicting multiple labels for unseen classes without requiring any labeled training data, which contrasts with conventional supervised learning paradigms. However, existing methods face several significant challenges. These include the substantial semantic gap between different modalities, which impedes effective knowledge transfer, and the intricate and typically complex relationships among multiple labels, which are difficult to model in a meaningful and accurate manner. To overcome these challenges, we propose a graph-augmented multimodal chain-of-thought (GMCoT) reasoning approach. The proposed method combines the strengths of multimodal large language models with graph-based structures, significantly enhancing the reasoning process involved in multi-label prediction. First, a novel multimodal chain-of-thought reasoning framework is presented, which imitates human-like step-by-step reasoning to produce multi-label predictions. Second, a technique is presented for integrating label graphs into the reasoning process. This technique enables the capture of complex semantic relationships among labels, thereby improving the accuracy and consistency of multi-label generation. Comprehensive experiments on benchmark datasets demonstrate that the proposed GMCoT approach outperforms state-of-the-art methods in ML-ZSL.
Funding: Supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-2020-0-01789) and the Artificial Intelligence Convergence Innovation Human Resources Development program (IITP-2024-RS-2023-00254592), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation), and by the National Research Foundation, Singapore, and the Ministry of National Development, Singapore, under its Cities of Tomorrow R&D Programme (CoT Award NRF-CoTV4-2020-9).
Abstract: Zero-shot learning (ZSL) is an important and rapidly growing area of machine learning that aims to recognize new classes without prior training data. Despite its significance, ZSL has faced challenges with overfitting in embedding-based methods and limitations in traditional one-directional attention (ODA) based approaches. To bridge these gaps, this paper proposes the use of bi-directional attention (BDA) to integrate insights from both embedding- and attention-based approaches. The proposed BDA system consists of a bi-directional attention network (BDAN) and a synthesized visual embedding network (SVEN) that facilitate visual-semantic interaction for ZSL classification. More specifically, the BDAN employs region self-attention (RSA), semantic synthesis attention (SSA), and visual synthesis attention (VSA) to overcome the overfitting issue in embedding methods and enhance transferability, to associate visual features with semantic property information, and to learn locally improved visual features. Extensive testing on the CUB, SUN, and AWA2 datasets confirms the superiority of our proposed method over traditional approaches.
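The visual-semantic interaction described above is built on attention between two modalities. As a generic illustration (not the BDAN's actual RSA/SSA/VSA modules), scaled dot-product cross-attention lets visual region features attend to semantic attribute embeddings; running it in both directions, with the roles of queries and keys/values swapped, is the "bi-directional" idea. Shapes below are made up for the example.

```python
import numpy as np

def cross_attention(queries: np.ndarray, keys: np.ndarray,
                    values: np.ndarray) -> np.ndarray:
    """Minimal scaled dot-product cross-attention.

    One direction of a BDA-style setup: each query (e.g. a visual
    region feature) attends over the keys (e.g. semantic attribute
    embeddings) and returns a weighted mix of the values.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)                  # similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ values                                 # attended output

regions = np.random.default_rng(0).normal(size=(4, 8))  # 4 visual regions
attrs = np.random.default_rng(1).normal(size=(5, 8))    # 5 attribute vectors
out = cross_attention(regions, attrs, attrs)            # regions -> attributes
print(out.shape)  # -> (4, 8)
```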
Funding: Supported in part by grants from the National Natural Science Foundation of China (Nos. 62222213, U22B2059, and 62072423) and the USTC Research Funds of the Double First-Class Initiative (No. YD2150002009).
Abstract: Entity linking (EL) plays a crucial role in natural language processing (NLP) tasks by linking ambiguous entity mentions to relevant entities in a knowledge base. Due to the inconsistency of data distributions across diverse domains, it is difficult to accurately estimate the overall data distribution of the target domain, resulting in zero-shot scenarios with a significant decrease in generalization performance. Existing works primarily focus on sampling strategies and on incorporating fine-grained information to deal with this issue. Unfortunately, they face either the significant computational cost of negative samples in sampling strategies, or shortcomings in the interaction between coarse- and fine-grained information. To tackle these challenges, in this paper we propose a Multi-Task Framework with Anchor Point Sampling (MAPS). Specifically, in the anchor point sampling (APS) part, taking fine-grained information into account, we pre-bind mention-entity pairs based on prior conditions (e.g., entity type) to introduce challenging negative samples and modify the conditional distribution. In this way, an optimal trade-off between computational effectiveness and efficiency is reached. Moreover, we propose a novel multi-task framework that shares coarse-grained information at a lower level and utilizes multiple extractors to extract fine-grained information at a higher level. By combining the multi-task framework with the various APS approaches, comprehensive fusion of coarse- and fine-grained information is finally achieved. Experimental results on the benchmark dataset ZESHEL demonstrate that MAPS significantly outperforms competitive baselines.
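The idea of conditioning negative sampling on a prior such as entity type can be pictured with a small sketch. This is a hypothetical simplification of APS, not the paper's algorithm; the entity names and the type-filtering rule are illustrative only.

```python
import random

def sample_hard_negatives(mention_type: str, entities: list,
                          k: int = 2, seed: int = 0) -> list:
    """Hypothetical type-conditioned hard-negative sampling sketch.

    Restricting negatives to entities sharing the mention's type (a
    prior condition) yields harder negatives than uniform sampling,
    while keeping the candidate pool, and thus the compute cost, small.
    `entities` is a list of (name, type) pairs.
    """
    pool = [name for name, etype in entities if etype == mention_type]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))

entities = [("Paris", "city"), ("Lyon", "city"),
            ("Seine", "river"), ("Nice", "city")]
# For a "city" mention, negatives are drawn only from other cities,
# never from the easy-to-reject "river" entity.
negs = sample_hard_negatives("city", entities, k=2)
print(negs)
```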
Funding: Supported by the National Natural Science Foundation of China (Nos. 62106150 and 62272315), the Open Fund of the National Engineering Laboratory for Big Data System Computing Technology (No. SZU-BDSC-OF2024-22), the Open Research Fund of the Anhui Province Key Laboratory of Machine Vision Inspection (No. KLMVI-2023-HIT-01), and the Director Fund of the Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen) (No. 24420001).
Abstract: Zero-Shot Object Detection (ZSD), one of the most challenging problems in the field of object detection, aims to accurately identify new categories that are not encountered during training. Recent advancements in deep learning and increased computational power have led to significant improvements in object detection systems, which achieve high recognition accuracy on benchmark datasets. However, these systems remain limited in real-world applications due to the scarcity of labeled training samples, making it difficult to detect unseen classes. To address this, researchers have explored various approaches, yielding promising progress. This article provides a comprehensive review of the current state of ZSD, distinguishing four related lines of work (zero-shot, open-vocabulary, open-set, and open-world approaches) based on task objectives and data usage. We highlight representative methods, discuss the technical challenges within each framework, and summarize the commonly used evaluation metrics, benchmark datasets, and experimental results. Our review aims to offer readers a clear overview of the latest developments and performance trends in ZSD.
Funding: Supported by the National Natural Science Foundation of China (No. 62271125).
Abstract: Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly treated as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Existing works demonstrate that the performance of in-context learning strongly depends on the quality and order of the demonstrations in the prompt. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy aims to identify effective demonstrations and determine their optimal order, thereby activating the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, the selection and arrangement of demonstrations are based on the semantic similarity and quality of the demonstrations for each question, without requiring additional annotations. Furthermore, a series of experiments is conducted on the A-OKVQA and OKVQA datasets. The results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
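Similarity-based demonstration selection of the kind described above is commonly implemented by ranking candidate demonstrations by cosine similarity to the question embedding. The sketch below shows that generic heuristic, including the common convention of placing the most similar demonstration closest to the question in the prompt; it is a stand-in for, not a reproduction of, KGFLM's selection and ordering rule.

```python
import numpy as np

def select_demonstrations(q_emb: np.ndarray, demo_embs: np.ndarray,
                          k: int = 2) -> list:
    """Pick the k demonstrations most similar to the question embedding.

    Cosine similarity ranks candidates; returning indices in ascending
    similarity means the best demonstration ends up last, i.e. nearest
    the question when the prompt is assembled top to bottom.
    """
    q = q_emb / np.linalg.norm(q_emb)
    D = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    sims = D @ q                       # cosine similarity to the question
    top = np.argsort(sims)[-k:]        # top-k, ascending similarity
    return top.tolist()

# Three demo embeddings; the query is closest to demo 0, then demo 1.
demos = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
print(select_demonstrations(np.array([1.0, 0.1]), demos))  # -> [1, 0]
```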
Abstract: Predicting the behavior of renewable energy systems requires models capable of generating accurate forecasts from limited historical data, a challenge that becomes especially pronounced when commissioning new facilities where operational records are scarce. This review aims to synthesize recent progress in data-efficient deep learning approaches for addressing such "cold-start" forecasting problems. It primarily covers three interrelated domains, solar photovoltaic (PV), wind power, and electrical load forecasting, where data scarcity and operational variability are most critical, while also including representative studies on hydropower and carbon emission prediction to provide a broader systems perspective. To this end, we examined trends from over 150 predominantly peer-reviewed studies published between 2019 and mid-2025, highlighting advances in zero-shot and few-shot meta-learning frameworks that enable rapid model adaptation with minimal labeled data. Moreover, transfer learning approaches combined with spatiotemporal graph neural networks have been employed to transfer knowledge from existing energy assets to new, data-sparse environments, effectively capturing hidden dependencies among geographic features, meteorological dynamics, and grid structures. Synthetic data generation has further proven valuable for expanding training samples and mitigating overfitting in cold-start scenarios. In addition, large language models and explainable artificial intelligence (XAI), notably conversational XAI systems, have been used to interpret and communicate complex model behaviors in accessible terms, fostering operator trust from the earliest deployment stages. By consolidating methodological advances, unresolved challenges, and open-source resources, this review provides a coherent overview of deep learning strategies that can shorten the data-sparse ramp-up period of new energy infrastructures and accelerate the transition toward resilient, low-carbon electricity grids.