Addressing graph combinatorial optimization problems often poses significant challenges due to the difficulty and high cost of obtaining supervised labels. As a result, unsupervised algorithms have garnered increasing attention from researchers. In this paper, we propose a novel unsupervised framework that leverages contrastive learning to address these challenges. Drawing inspiration from traditional exact algorithms, we introduce a vertex-based degree-aware data augmentation method that enables the progressive learning of graph structure features. Furthermore, we incorporate optimal transport theory by using distance measures as the contrastive loss, thereby enhancing the model's ability to capture local graph structures. Extensive experiments demonstrate the superior performance of our approach in terms of both solution accuracy and inference speed on most graph combinatorial optimization problems, particularly in large-scale graph problems and scenarios where training samples are scarce.
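The abstract does not specify the exact loss formulation. As a loose illustration of the idea of using an optimal-transport distance as the contrastive measure, the sketch below is an assumption on my part (the function names and the margin-based form are not from the paper): it uses the closed-form 1-D Wasserstein-1 distance between equal-sized empirical samples inside a simple margin objective.

```python
# Illustrative sketch only; the paper's actual OT-based contrastive loss is
# not given in the abstract. For equal-sized 1-D samples, Wasserstein-1 has
# a closed form: the mean absolute difference of the sorted values.

def wasserstein_1d(xs, ys):
    """W1 distance between two equal-sized 1-D empirical distributions."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def contrastive_loss(anchor, positive, negative, margin=1.0):
    """Pull the positive (augmented) view close to the anchor; push the
    negative view at least `margin` away, both measured with W1."""
    d_pos = wasserstein_1d(anchor, positive)
    d_neg = wasserstein_1d(anchor, negative)
    return d_pos + max(0.0, margin - d_neg)
```

With an identical positive view and a distant negative view, the loss vanishes, which is the behavior a contrastive distance objective is meant to produce.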
Cancelable biometrics are a group of techniques that intentionally transform the input biometric into an irreversible feature, using a transformation function and usually a key, in order to provide security and privacy in biometric recognition systems. This transformation is repeatable, enabling subsequent biometric comparisons. This paper introduces a new idea to be exploited as a transformation function for cancelable biometrics, aimed at protecting templates against iterative optimization attacks. Our proposed scheme is based on time-varying keys (random biometrics in our case) and morphing transformations. An experimental implementation of the proposed scheme is given for face biometrics. The results confirm that the proposed approach is able to withstand leakage attacks while improving recognition performance.
With the growing awareness of data privacy, federated learning (FL) has gained increasing attention in recent years as a major paradigm for training models with privacy protection in mind, which allows building models in a collaborative but private way without exchanging data. However, most FL clients are currently unimodal. With the rise of edge computing, various types of sensors and wearable devices generate a large amount of data from different modalities, which has inspired research efforts in multimodal federated learning (MMFL). In this survey, we explore the area of MMFL to address the fundamental challenges of FL on multimodal data. First, we analyse the key motivations for MMFL. Second, the currently proposed MMFL methods are technically classified according to the modality distributions and modality annotations in MMFL. Then, we discuss the datasets and application scenarios of MMFL. Finally, we highlight the limitations and challenges of MMFL and provide insights and methods for future research.
In the era of deep learning, modeling for most natural language processing (NLP) tasks has converged into several mainstream paradigms. For example, we usually adopt the sequence labeling paradigm to solve a bundle of tasks such as POS-tagging, named entity recognition (NER), and chunking, and adopt the classification paradigm to solve tasks like sentiment analysis. With the rapid progress of pre-trained language models, recent years have witnessed a rising trend of paradigm shift, which is solving one NLP task in a new paradigm by reformulating the task. The paradigm shift has achieved great success on many tasks and is becoming a promising way to improve model performance. Moreover, some of these paradigms have shown great potential to unify a large number of NLP tasks, making it possible to build a single model to handle diverse tasks. In this paper, we review this phenomenon of paradigm shift in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.
Motion information is a crucial cue to build a robust tracker, especially in handling object occlusion and fast drift caused by cameras and objects. However, it has not been fully exploited. In this study, we attempt to exploit motion cues to guide visual trackers without bells and whistles. First, we decouple motion into two types: camera motion and object motion. Then, we predict them individually via the proposed camera motion modeling and object trajectory prediction. Each module contains a motion detector and a verifier. As for camera motion, we apply an off-the-shelf keypoint matching method to detect camera movement and propose a novel self-supervised camera motion verifier to validate its confidence. Given the previous object trajectory, object trajectory prediction aims to predict the future location of the target and select a reliable trajectory to handle fast object motion and occlusion. Numerous experiments on several mainstream tracking datasets, including OTB100, DTB70, TC128, UAV123, VOT2018 and GOT10k, demonstrate the effectiveness and generalization ability of our module, with real-time speed.
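The detector-plus-verifier pattern described above can be sketched in a heavily simplified form. The following is an assumption-laden toy (the paper's matcher and learned verifier are far richer): given already-matched keypoint pairs between two frames, it estimates a pure-translation camera motion robustly with a median, then "verifies" the estimate by its inlier ratio.

```python
# Hypothetical sketch: matched keypoint pairs are assumed given; the real
# system uses an off-the-shelf matcher and a self-supervised verifier.
import statistics

def estimate_translation(matches):
    """matches: list of ((x1, y1), (x2, y2)) keypoint pairs.
    Median displacement is robust to a minority of outlier matches."""
    dxs = [p2[0] - p1[0] for p1, p2 in matches]
    dys = [p2[1] - p1[1] for p1, p2 in matches]
    return statistics.median(dxs), statistics.median(dys)

def inlier_ratio(matches, motion, tol=2.0):
    """Fraction of matches consistent with the estimated translation;
    a low ratio would flag an unreliable camera-motion estimate."""
    dx, dy = motion
    ok = sum(1 for (x1, y1), (x2, y2) in matches
             if abs((x2 - x1) - dx) <= tol and abs((y2 - y1) - dy) <= tol)
    return ok / len(matches)
```

A single gross outlier match barely perturbs the median estimate, which is why robust statistics are a common first step before a learned verifier.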
Correction to: Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications. DOI: 10.1007/s11633-023-1385-0. Authors: Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng. The article Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications, written by Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu, Wenbo Li, Li Cheng, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication. Therefore, the copyright of the article has been changed to The Author(s) 2024, and the article is forthwith distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is consequently needed to dig out both inter-modality and intra-modality information. With the exploited rich information, it is becoming popular to combine multiple modality data to explore the structural and functional characteristics of the brain in both health and disease status. In this paper, we first review a wide spectrum of advanced machine learning methodologies for fusing multimodal brain imaging data, broadly categorized into unsupervised and supervised learning strategies. Following this, some representative applications are discussed, including how they help to understand brain arealization, how they improve the prediction of behavioral phenotypes and brain aging, and how they accelerate the biomarker exploration of brain diseases. Finally, we discuss some exciting emerging trends and important future directions. Collectively, we intend to offer a comprehensive overview of brain imaging fusion methods and their successful applications, along with the challenges imposed by multi-scale and big data, which raise an urgent demand for new models and platforms.
Correction to: The Life Cycle of Knowledge in Big Language Models: A Survey. DOI: 10.1007/s11633-023-1416-x. Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun. The article The Life Cycle of Knowledge in Big Language Models: A Survey, written by Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
This paper introduces the system of game-theoretic interactions, which connects both the explanation of knowledge encoded in a deep neural network (DNN) and the explanation of the representation power of a DNN. In this system, we define two game-theoretic interaction indexes, namely the multi-order interaction and the multivariate interaction. More crucially, we use these interaction indexes to explain feature representations encoded in a DNN from the following four aspects: (1) quantifying knowledge concepts encoded by a DNN; (2) exploring how a DNN encodes visual concepts, and extracting prototypical concepts encoded in the DNN; (3) learning optimal baseline values for the Shapley value, and providing a unified perspective to compare fourteen different attribution methods; (4) theoretically explaining the representation bottleneck of DNNs. Furthermore, we prove the relationship between the interactions encoded in a DNN and the representation power of a DNN (e.g., generalization power, adversarial transferability, and adversarial robustness). In this way, game-theoretic interactions successfully bridge the gap between "the explanation of knowledge concepts encoded in a DNN" and "the explanation of the representation capacity of a DNN" as a unified explanation.
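The interaction indexes above build on the Shapley value from cooperative game theory. As background only (this is the textbook definition, not the paper's learned-baseline variant), a minimal exact computation over a toy set function looks like this:

```python
# Background sketch: exact Shapley values by averaging each player's
# marginal contribution over all orderings. Exponential cost, so this is
# only feasible for tiny player sets; attribution methods approximate it.
from itertools import permutations
import math

def shapley_values(players, value_fn):
    """value_fn maps a frozenset of players to a real-valued payoff."""
    contrib = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value_fn(frozenset(coalition))
            coalition.add(p)
            contrib[p] += value_fn(frozenset(coalition)) - before
    n_orders = math.factorial(len(players))
    return {p: c / n_orders for p, c in contrib.items()}
```

For a two-player game where v({a}) = v({b}) = 1 but v({a, b}) = 3, each player's Shapley value is 1.5: the extra unit created jointly is the kind of "interaction" effect the paper's indexes quantify.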
The conversational machine comprehension (MC) task aims to answer questions in a multi-turn conversation about a single passage. However, recent approaches do not exploit information from historical conversations effectively, so some references and ellipses in the current question cannot be resolved. In addition, these methods do not consider the rich semantic relationships between words when reasoning about the passage text. In this paper, we propose a novel model, GraphFlow+, which constructs a context graph for each conversation turn and uses a unique recurrent graph neural network (GNN) to model the temporal dependencies between the context graphs of consecutive turns. Specifically, we exploit three different ways to construct text graphs: the dynamic graph, the static graph, and a hybrid graph that combines the two. Our experiments on CoQA, QuAC and DoQA show that the GraphFlow+ model outperforms state-of-the-art approaches.
Sparse rewards pose significant challenges in deep reinforcement learning, as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating "small goals" within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial position functions to determine the value of each episode for experience replay. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.
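The abstract leaves the "spatial position functions" unspecified, so the following is purely a guess at the flavor of prioritized episode sampling, not SPAHER itself: episodes whose achieved final position ends closer to the goal are given higher replay probability.

```python
# Assumed toy priority scheme (NOT the paper's actual spatial position
# functions): priority is the inverse distance from an episode's final
# achieved position to the goal, and replay sampling is weighted by it.
import math
import random

def episode_priority(final_pos, goal, eps=1e-6):
    """Closer final positions get higher priority; eps avoids div-by-zero."""
    return 1.0 / (math.dist(final_pos, goal) + eps)

def sample_episode(episodes, goal, rng=random):
    """Weighted replay sampling over a list of {'final_pos': ...} episodes."""
    weights = [episode_priority(ep["final_pos"], goal) for ep in episodes]
    return rng.choices(episodes, weights=weights, k=1)[0]
```

Compared with uniform HER replay, such weighting biases updates toward near-success episodes, which is the general intuition behind prioritizing hindsight experiences.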
Graph neural networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations on the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between these aspects that help achieve trustworthiness.
Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may lead to fatal errors when facing extremely dense occlusions, due to the lack of valid information available from the limited input occluded frames. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously with high temporal resolution. However, synthesizing images solely from event streams is ill-posed, since only the brightness changes are recorded in the event stream and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on the spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently. A comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our proposed method achieves state-of-the-art performance.
In the field of brain decoding research, reconstructing visual perception from neural recordings is a challenging but crucial task. With the use of superior algorithms, many methods have been dedicated to enhancing decoding performance. However, these models, which map neural activities onto a semantically entangled feature space, are difficult to interpret. It is hard to understand the connections between neural activities and these abstract features. In this paper, we propose an interpretable neural decoding model that projects neural activities onto a semantically disentangled feature space with each dimension representing distinct visual attributes, such as gender and facial pose. A two-stage algorithm is designed to achieve this goal. First, a deep generative model learns semantically disentangled image representations in an unsupervised way. Second, neural activities are linearly embedded into the semantic space, which the generator uses to reconstruct visual stimuli. Due to modality heterogeneity, it is challenging to learn such a neurally embedded high-level semantic representation. We induce pixel, feature, and semantic alignment to ensure reconstruction quality. Three experimental fMRI datasets containing handwritten digits, characters, and human face stimuli are used to evaluate the neural decoding performance of our model. We also demonstrate the model's interpretability through a reconstructed image editing application. The experimental results indicate that our model maintains competitive decoding performance while remaining interpretable.
A configurable U-Net architecture is trained to solve multi-scale elliptic partial differential equations. The motivation is to reduce the computational cost of the numerical solution of the Navier-Stokes equations, the governing equations of fluid dynamics. Building on the underlying concept of V-cycle multigrid methods, a neural network framework using the U-Net architecture is optimized to solve the Poisson and Helmholtz equations, the characteristic form of the discretized Navier-Stokes equations. The results demonstrate that the optimized U-Net captures the high-dimensional mathematical features of the elliptic operator, with better convergence than the multigrid method. The best trade-off between error and FLOPS is the (3, 2, 5) case: 3 stacked U-Nets, 2 initial features, 5 depth layers, and ELU activation. Further, by training the network with multi-scale synthetic data, the finer features of the physical system are captured.
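For context on the classical baseline being accelerated, here is a minimal Jacobi relaxation for the 1-D Poisson problem -u'' = f on [0, 1] with u(0) = u(1) = 0; this is the kind of smoother a V-cycle multigrid method combines across grid levels, and the problem the trained U-Net solves directly. The function name and 1-D restriction are mine, for illustration only.

```python
# Jacobi relaxation for the standard 3-point discretization of -u'' = f:
#   (-u[i-1] + 2*u[i] - u[i+1]) / h^2 = f[i]
# rearranged to the fixed-point update u[i] = (u[i-1] + u[i+1] + h^2*f[i]) / 2.
def jacobi_poisson(f, n, sweeps):
    """f: list of n interior right-hand-side values; returns u on n+2 nodes
    (the two boundary nodes are pinned to zero)."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for _ in range(sweeps):
        u = [0.0] + [0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
                     for i in range(1, n + 1)] + [0.0]
    return u
```

For f = 2 the exact solution is u(x) = x(1 - x) (so u(0.5) = 0.25), and the 3-point scheme reproduces quadratics exactly; Jacobi converges to it, but slowly on fine grids, which is precisely the cost multigrid-style (and learned) solvers attack.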
Correction to: GraphFM: Graph Factorization Machines for Feature Interaction Modelling. DOI: 10.1007/s11633-024-1505-5. Authors: Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang. The article GraphFM: Graph Factorization Machines for Feature Interaction Modelling, written by Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
We study the task of automated house design, which aims to automatically generate 3D houses from user requirements. This is non-trivial due to the intrinsic complexity of house design: 1) understanding user requirements, where users can hardly provide high-quality requirements without professional knowledge; 2) designing the house plan, which mainly concerns how to capture the effective information from user requirements. To address these issues, we propose an automatic house design framework called auto-3D-house design (A3HD). Unlike previous works that consider user requirements in an unstructured way (e.g., natural language), we carefully design a structured list that divides the requirements into three parts (layout, outline, and style), which focus on the attributes of rooms, the outline of the building, and the style of decoration, respectively. Following the practice of architects, we construct a bubble diagram (i.e., a graph) that covers the rooms' attributes and relations under the constraint of the outline. In addition, we represent each outline as a combination of points and orders, ensuring that it can represent outlines of arbitrary shape. Then, we propose a graph feature generation module (GFGM) to capture layout features from the bubble diagrams, and an outline feature generation module (OFGM) for outline features. Finally, we render 3D houses according to the given style requirements in a rule-based method. Experiments on two benchmark datasets (RPLAN and T3HM) demonstrate the effectiveness of our A3HD in terms of both quantitative and qualitative evaluation metrics.
Inspired by eagle eye mechanisms, the structure and information processing characteristics of the eagle's visual system are used for the target capture task of an unmanned aerial vehicle (UAV) with a mechanical arm. In this paper, a novel eagle-eye inspired multi-camera sensor and a saliency detection method are proposed. A combined camera system is built by simulating the double fovea structure of the eagle retina. A saliency target detection method based on the eagle midbrain inhibition mechanism is proposed by measuring static saliency information and dynamic features. Thus, salient targets can be accurately detected through collaborative work between the different cameras of the proposed multi-camera sensor. Experimental results show that the eagle-eye inspired visual system is able to continuously detect targets in outdoor scenes and that the proposed algorithm has a strong inhibitory effect on moving background interference.
Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: Existing methods often assume a strong semantic correlation between each text-image pair, and are thus difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: Many recent works adopt the single-tower architecture with heavy detectors, which is inefficient during the inference stage because the costly computation must be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space for the text and image modalities to be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that our CMCL can be readily generalized to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms the state of the art while being much more efficient.
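A cross-modal contrastive loss over global two-tower embeddings is commonly instantiated as an InfoNCE-style softmax over in-batch pairs. The sketch below is a generic illustration of that pattern, not CMCL's exact objective; the function names and the temperature value are assumptions.

```python
# Generic InfoNCE-style sketch over pre-computed global embeddings:
# matched pairs (img_embs[i], txt_embs[i]) are positives, and every other
# text in the batch serves as an in-batch negative (image-to-text direction).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(img_embs, txt_embs, temperature=0.1):
    """Mean negative log-softmax probability of each positive pair."""
    n = len(img_embs)
    loss = 0.0
    for i in range(n):
        logits = [cosine(img_embs[i], t) / temperature for t in txt_embs]
        log_norm = math.log(sum(math.exp(l) for l in logits))
        loss += log_norm - logits[i]
    return loss / n
```

Because the two towers are only compared through these precomputed global embeddings, all image and text features can be encoded once and reused, which is the efficiency argument for the two-tower design over a single-tower model.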
Machine intelligence is a fast-developing field, and the term is often used as a synonym for artificial intelligence, but with a stronger application orientation. Machine intelligence algorithms are not forms of alchemy, but rather useful tools for improving the world we live in. With easier access to data and more affordable computational power, discoveries in research labs are being transformed into real-world applications, creating enormous social and economic value.
Funding: This work was supported by PRIMA (No. H2020-MSCA-ITN-2019-860315), TRESPASS-ETN (No. H2020-MSCA-ITN-2019-860813), BBforTAI (No. PID2021-127641OB-I00 MICINN/FEDER), and INTER-ACTION (No. PID2021-126521OB-I00 MICINN/FEDER). M. Ghafourian was supported by PRIMA, and I. Serna was supported by an FPI fellowship from University Autonoma de Madrid, Spain.
Funding: Supported by the National Natural Science Foundation of China (No. 62036006), the Fundamental Research Funds for the Central Universities, China, and the Innovation Fund of Xidian University, China.
Funding: Supported by the National Natural Science Foundation of China (No. 62022027).
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62293542 and 62022021), in part by the Central Guidance on Local Science and Technology Development Fund of Liaoning Province, China (No. 2022JH6/100100026), in part by the National Defense Basic Scientific Research Program (No. WDZC20215250205), and in part by the Fundamental Research Funds for the Central Universities, China (No. DUT22QN228).
文摘Motion information is a crucial cue to build a robust tracker,especially in handling object occlusion and fast drift caused by cameras and objects.However,it has not been fully exploited.In this study,we attempt to exploit motion cues to guide visual trackers without bells and whistles.First,we decouple motion into two types:camera motion and object motion.Then,we predict them individually via the proposed camera motion modeling and object trajectory prediction.Each module contains a motion detector and a verifier.As for camera motion,we apply the off-the-shelf keypoint matching method to detect camera movement and propose a novel self-supervised camera motion verifier to validate its confidence.Given the previous object trajectory,object trajectory prediction aims to predict the future location of the target and select a reliable trajectory to handle fast object motion and occlusion.Numerous experiments on several mainstream tracking datasets,including OTB100,DTB70,TC128,UAV123,VOT2018 and GOT10k,demonstrate the effectiveness and generalization ability of our module,with real-time speed.
文摘Correction to: Segment Anything Is Not Always Perfect:An Investigation of SAM on Different Real-worldApplicationsDOI: 10.100 7/s11633-023-1385-0Authors: Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu,Wenbo Li, Li ChengThe article Segment Anything Is Not Always Perfect:An Investigation of SAM on Different Real-world Applications,written by Wei Ji, Jingjing Li, Qi Bi, Tingwei Liu,Wenbo Li, Li Cheng, was originally published withoutOpen Access. After publication, the authors decided toopt for Open Choice and to make the article an Open Accesspublication. Therefore, the copyright of the article has been changed to The Author(s) 2024 and thearticle is forthwith distributed under the terms of theCreative Commons Attribution 4.0 International License(http:/ /creativecommons.org/licenses/by/4.0/), which permitsuse, duplication, adaptation, distribution and reproductionin any medium or format, as long as you give appropriatecredit to the original author(s) and the source,provide a link to the Creative Commons license, and indicateif changes were made.
Abstract: Neuroimaging data typically include multiple modalities, such as structural or functional magnetic resonance imaging, diffusion tensor imaging, and positron emission tomography, which provide multiple views for observing and analyzing the brain. To leverage the complementary representations of different modalities, multimodal fusion is consequently needed to mine both inter-modality and intra-modality information. With this rich information exploited, it is becoming popular to combine multiple modality data to explore the structural and functional characteristics of the brain in both health and disease status. In this paper, we first review a wide spectrum of advanced machine learning methodologies for fusing multimodal brain imaging data, broadly categorized into unsupervised and supervised learning strategies. Following this, some representative applications are discussed, including how they help to understand brain arealization, how they improve the prediction of behavioral phenotypes and brain aging, and how they accelerate biomarker exploration for brain diseases. Finally, we discuss some exciting emerging trends and important future directions. Collectively, we intend to offer a comprehensive overview of brain imaging fusion methods and their successful applications, along with the challenges imposed by multi-scale and big data, which raise an urgent demand for new models and platforms.
Abstract: Correction to: The Life Cycle of Knowledge in Big Language Models: A Survey. DOI: 10.1007/s11633-023-1416-x. Authors: Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun. The article The Life Cycle of Knowledge in Big Language Models: A Survey, written by Boxi Cao, Hongyu Lin, Xianpei Han, Le Sun, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
Funding: Supported by the National Science and Technology Major Project (No. 2021ZD0111602), the National Natural Science Foundation of China (Nos. 62276165 and U19B2043), and the Shanghai Natural Science Foundation, China (Nos. 21JC1403800 and 21ZR1434600).
Abstract: This paper introduces the system of game-theoretic interactions, which connects the explanation of knowledge encoded in a deep neural network (DNN) with the explanation of the representation power of a DNN. In this system, we define two game-theoretic interaction indexes, namely the multi-order interaction and the multivariate interaction. More crucially, we use these interaction indexes to explain feature representations encoded in a DNN from the following four aspects: (1) quantifying knowledge concepts encoded by a DNN; (2) exploring how a DNN encodes visual concepts, and extracting prototypical concepts encoded in the DNN; (3) learning optimal baseline values for the Shapley value, and providing a unified perspective to compare fourteen different attribution methods; (4) theoretically explaining the representation bottleneck of DNNs. Furthermore, we prove the relationship between the interactions encoded in a DNN and the representation power of a DNN (e.g., generalization power, adversarial transferability, and adversarial robustness). In this way, game-theoretic interactions successfully bridge the gap between "the explanation of knowledge concepts encoded in a DNN" and "the explanation of the representation capacity of a DNN" as a unified explanation.
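The Shapley value underlying these interaction indexes can be computed exactly for small games by enumerating coalitions. A hedged sketch, using a hypothetical toy value function rather than any quantity from the paper:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions.
    v maps a set of players to a real-valued payoff."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                # Weight of coalition S when attributing to player p.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += w * (v(set(S) | {p}) - v(set(S)))
    return phi

# Toy value function (assumption, for illustration): v(S) = |S|^2,
# i.e., players interact super-additively.
v = lambda S: len(S) ** 2
phi = shapley_values([0, 1, 2], v)
```

By the efficiency axiom the values sum to v({0,1,2}) - v(∅) = 9, and by symmetry each player receives 3.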
Abstract: The conversational machine comprehension (MC) task aims to answer questions over a multi-turn conversation about a single passage. However, recent approaches do not exploit information from historical conversations effectively, so some references and ellipses in the current question cannot be resolved. In addition, these methods do not consider the rich semantic relationships between words when reasoning about the passage text. In this paper, we propose a novel model, GraphFlow+, which constructs a context graph for each conversation turn and uses a recurrent graph neural network (GNN) to model the temporal dependencies between the context graphs of consecutive turns. Specifically, we explore three different ways to construct text graphs: the dynamic graph, the static graph, and a hybrid graph that combines the two. Our experiments on CoQA, QuAC and DoQA show that the GraphFlow+ model outperforms state-of-the-art approaches.
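One message-passing step of a GNN over a small context graph can be sketched in plain numpy. This is a generic mean-aggregation layer for illustration only, not the GraphFlow+ recurrent architecture; the adjacency, features, and weights are all synthetic:

```python
import numpy as np

def gnn_step(A, H, W):
    """One message-passing step.
    A: (n, n) adjacency, H: (n, d) node features, W: (d, d) weights."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    msg = (A @ H) / deg            # mean over each node's neighbours
    return np.tanh((H + msg) @ W)  # combine with own state, transform

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], float)   # tiny 3-node context graph
H = rng.normal(size=(3, 4))        # initial node features
W = rng.normal(size=(4, 4)) * 0.1  # randomly initialised weights
H1 = gnn_step(A, H, W)
```

Stacking such steps, and feeding each turn's output graph state into the next turn, is the general shape of a recurrent GNN over per-turn context graphs.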
Funding: Supported by the Natural Science Foundation of Shaanxi Province, China (No. 2022JQ-661), the Project of Science and Technology Development Plan in Hangzhou, China (No. 202202B38), and the Xidian-FIAS International Joint Research Center, China.
Abstract: Sparse rewards pose significant challenges in deep reinforcement learning, as agents struggle to learn from experiences with limited reward signals. Hindsight experience replay (HER) addresses this problem by creating "small goals" within a hierarchical decision model. However, HER does not consider the value of different episodes for agent learning. In this paper, we propose SPAHER, a framework for prioritizing hindsight experiences based on spatial position attention. SPAHER allows the agent to prioritize more valuable experiences in a manipulation task. It achieves this by calculating transition and trajectory spatial position functions to determine the value of each episode for experience replay. We evaluate SPAHER on eight robot manipulation tasks in the Fetch and Hand environments provided by OpenAI Gym. Simulation results show that our method improves the final mean success rate by an average of 3.63% compared to HER, especially in the challenging Hand environments. Notably, these improvements are achieved without any increase in computation time.
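The priority-based replay idea can be sketched as follows: each stored episode receives a scalar score (SPAHER derives it from spatial-position functions; the scores below are stand-ins), and episodes are sampled with probability proportional to that score. A minimal numpy sketch under these assumptions:

```python
import numpy as np

def replay_probabilities(scores, alpha=1.0):
    """Turn per-episode scores into a sampling distribution.
    alpha controls how strongly priorities are emphasised."""
    s = np.asarray(scores, float) ** alpha
    return s / s.sum()

def sample_episode(scores, rng, alpha=1.0):
    """Draw one episode index, favouring higher-scored episodes."""
    p = replay_probabilities(scores, alpha)
    return rng.choice(len(scores), p=p)

scores = [0.1, 0.5, 2.0, 0.4]   # stand-in episode values
p = replay_probabilities(scores)
rng = np.random.default_rng(0)
counts = np.bincount([sample_episode(scores, rng) for _ in range(2000)],
                     minlength=4)
```

Over many draws, the highest-scored episode (index 2) is replayed most often, which is the behaviour any prioritized replay scheme relies on.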
Funding: National Science Foundation (NSF), USA (No. IIS-1909702); Army Research Office (ARO), USA (No. W911NF21-1-0198); Department of Homeland Security (DHS) CINA, USA (No. E205949D).
Abstract: Graph neural networks (GNNs) have developed rapidly in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool a GNN into giving the outcome they desire with unnoticeable perturbations of the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between these aspects that help achieve trustworthiness.
Funding: Supported by the National Natural Science Foundation of China (Nos. 62088102 and 62021002) and the Beijing Natural Science Foundation, China (No. 4222025).
Abstract: Seeing through dense occlusions and reconstructing scene images is an important but challenging task. Traditional frame-based image de-occlusion methods may lead to fatal errors when facing extremely dense occlusions, due to the lack of valid information available from the limited occluded input frames. Event cameras are bio-inspired vision sensors that record the brightness changes at each pixel asynchronously with high temporal resolution. However, synthesizing images solely from event streams is ill-posed, since only brightness changes are recorded in the event stream and the initial brightness is unknown. In this paper, we propose an event-enhanced multi-modal fusion hybrid network for image de-occlusion, which uses event streams to provide complete scene information and frames to provide color and texture information. An event stream encoder based on a spiking neural network (SNN) is proposed to encode and denoise the event stream efficiently. A comparison loss is proposed to generate clearer results. Experimental results on a large-scale event-based and frame-based image de-occlusion dataset demonstrate that our proposed method achieves state-of-the-art performance.
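SNN encoders are built from spiking neurons. A toy leaky integrate-and-fire (LIF) neuron, with illustrative parameters rather than the paper's, shows the basic encoding step: membrane potential accumulates input, leaks over time, and emits a spike whenever it crosses a threshold.

```python
def lif_encode(inputs, leak=0.9, threshold=1.0):
    """Toy LIF neuron. inputs: per-timestep input currents.
    Returns a binary spike train of the same length."""
    v, spikes = 0.0, []
    for x in inputs:
        v = leak * v + x          # leaky integration
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # reset after spiking
        else:
            spikes.append(0)
    return spikes

spikes = lif_encode([0.6, 0.6, 0.0, 0.0, 1.2, 0.1])
```

Tracing the potential: 0.6, then 0.9*0.6 + 0.6 = 1.14 (spike, reset), then 0, 0, 1.2 (spike, reset), 0.1, so the train is [0, 1, 0, 0, 1, 0]. Sub-threshold noise is naturally filtered out, which is one reason spiking encoders suit noisy event streams.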
Funding: Supported in part by the National Key R&D Program of China (No. 2022ZD0116500) and in part by the National Natural Science Foundation of China (No. 62206284).
Abstract: In the field of brain decoding research, reconstructing visual perception from neural recordings is a challenging but crucial task. Using superior algorithms, many methods have been dedicated to enhancing decoding performance. However, these models, which map neural activities onto a semantically entangled feature space, are difficult to interpret: it is hard to understand the connections between neural activities and the abstract features. In this paper, we propose an interpretable neural decoding model that projects neural activities onto a semantically disentangled feature space in which each dimension represents a distinct visual attribute, such as gender or facial pose. A two-stage algorithm is designed to achieve this goal. First, a deep generative model learns semantically disentangled image representations in an unsupervised way. Second, neural activities are linearly embedded into the semantic space, which the generator uses to reconstruct visual stimuli. Due to modality heterogeneity, it is challenging to learn such a neurally embedded high-level semantic representation. We enforce pixel, feature, and semantic alignment to ensure reconstruction quality. Three experimental fMRI datasets containing handwritten digits, characters, and human face stimuli are used to evaluate the neural decoding performance of our model. We also demonstrate the model's interpretability through a reconstructed-image editing application. The experimental results indicate that our model maintains competitive decoding performance while remaining interpretable.
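A linear embedding of neural activity into a latent space can be sketched with closed-form ridge regression on synthetic data. This is a generic stand-in for the second stage above; the actual model additionally enforces the pixel, feature, and semantic alignment described in the abstract, which this sketch omits.

```python
import numpy as np

def ridge_fit(X, Z, lam=1e-2):
    """Closed-form ridge regression: W = (X^T X + lam I)^{-1} X^T Z.
    X: (n, d) neural responses, Z: (n, k) latent semantic codes."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

# Synthetic data: a hypothetical ground-truth linear map plus small noise.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(20, 4))                 # 20 "voxels" -> 4 latents
X = rng.normal(size=(200, 20))                    # simulated voxel responses
Z = X @ W_true + 0.01 * rng.normal(size=(200, 4)) # noisy latent codes
W = ridge_fit(X, Z)
err = float(np.abs(W - W_true).max())
```

With 200 samples and low noise, the fitted map recovers the ground truth closely; in practice the recovered codes would then be fed to the generator to reconstruct the stimulus.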
Abstract: A configurable U-Net architecture is trained to solve multi-scale elliptic partial differential equations. The motivation is to reduce the computational cost of the numerical solution of the Navier-Stokes equations, the governing equations of fluid dynamics. Building on the underlying concept of V-cycle multigrid methods, a neural network framework using the U-Net architecture is optimized to solve the Poisson and Helmholtz equations, the characteristic forms of the discretized Navier-Stokes equations. The results demonstrate that the optimized U-Net captures the high-dimensional mathematical features of the elliptic operator, with better convergence than the multigrid method. The optimal trade-off between error and FLOPS is the (3, 2, 5) configuration: 3 stacked U-Nets, 2 initial features, 5 depth layers, and ELU activation. Further, by training the network with multi-scale synthetic data, the finer features of the physical system are captured.
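For context, the classical building block that V-cycle multigrid accelerates is a simple relaxation sweep. A weighted-Jacobi smoother for the 1D Poisson problem -u'' = f, on a toy case with exact solution sin(pi x) (a stand-in baseline, not the paper's U-Net solver):

```python
import numpy as np

def jacobi_poisson(f, u, h, sweeps, omega=2 / 3):
    """Weighted-Jacobi sweeps for -u'' = f on a uniform 1D grid.
    Discrete update: u_i <- (1-w) u_i + w * (u_{i-1} + u_{i+1} + h^2 f_i)/2.
    Dirichlet boundaries u[0], u[-1] are held fixed."""
    for _ in range(sweeps):
        u_new = u.copy()
        u_new[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (
            u[:-2] + u[2:] + h * h * f[1:-1])
        u = u_new
    return u

n = 32
x = np.linspace(0.0, 1.0, n + 1)
h = x[1] - x[0]
f = np.pi ** 2 * np.sin(np.pi * x)      # forcing for u = sin(pi x)
u = np.zeros_like(x)                    # zero initial guess, u(0)=u(1)=0
err0 = float(np.abs(np.sin(np.pi * x) - u).max())
u = jacobi_poisson(f, u, h, sweeps=2000)
err = float(np.abs(np.sin(np.pi * x) - u).max())
```

Jacobi damps the smooth (low-frequency) error components very slowly, needing thousands of sweeps here; multigrid fixes exactly this by relaxing on coarser grids, and the U-Net's encoder-decoder downsampling path mirrors that coarse-fine hierarchy.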
Abstract: Correction to: GraphFM: Graph Factorization Machines for Feature Interaction Modelling. DOI: 10.1007/s11633-024-1505-5. Authors: Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang. The article GraphFM: Graph Factorization Machines for Feature Interaction Modelling, written by Shu Wu, Zekun Li, Yunyue Su, Zeyu Cui, Xiaoyu Zhang, Liang Wang, was originally published without Open Access. After publication, the authors decided to opt for Open Choice and to make the article an Open Access publication.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 62072190) and the TCL Science and Technology Innovation Fund, China.
Abstract: We study the task of automated house design, which aims to automatically generate 3D houses from user requirements. This is non-trivial due to the intrinsic complexity of house design: 1) understanding user requirements, since users can hardly provide high-quality requirements without professional knowledge; 2) designing the house plan, which mainly concerns how to capture the effective information in user requirements. To address these issues, we propose an automatic house design framework called auto-3D-house design (A3HD). Unlike previous works that consider user requirements in an unstructured way (e.g., natural language), we carefully design a structured list that divides the requirements into three parts (i.e., layout, outline, and style), which focus on the attributes of rooms, the outline of the building, and the style of decoration, respectively. Following the practice of architects, we construct a bubble diagram (i.e., a graph) that covers the rooms' attributes and relations under the constraint of the outline. In addition, we represent each outline as a combination of points and orders, ensuring that it can describe outlines of arbitrary shape. Then, we propose a graph feature generation module (GFGM) to capture layout features from the bubble diagrams and an outline feature generation module (OFGM) for outline features. Finally, we render 3D houses according to the given style requirements with a rule-based method. Experiments on two benchmark datasets (i.e., RPLAN and T3HM) demonstrate the effectiveness of A3HD in terms of both quantitative and qualitative evaluation metrics.
Funding: Supported by the National Natural Science Foundation of China (Nos. T2121003, U1913602 and U19B2033) and the Science and Technology Innovation 2030 Key Project of "New Generation Artificial Intelligence", China (No. 2018AAA0100803).
Abstract: Inspired by eagle-eye mechanisms, the structure and information processing characteristics of the eagle's visual system are used for the target capture task of an unmanned aerial vehicle (UAV) with a mechanical arm. In this paper, a novel eagle-eye inspired multi-camera sensor and a saliency detection method are proposed. A combined camera system is built by simulating the double-fovea structure of the eagle retina. A salient target detection method based on the eagle midbrain inhibition mechanism is proposed by measuring static saliency information and dynamic features. Thus, salient targets can be accurately detected through collaborative work between the different cameras of the proposed multi-camera sensor. Experimental results show that the eagle-eye inspired visual system is able to continuously detect targets in outdoor scenes and that the proposed algorithm has a strong inhibitory effect on moving background interference.
Abstract: Cross-modal image-text retrieval is a fundamental task in bridging vision and language. It faces two main challenges that are typically not well addressed in previous works. 1) Generalizability: existing methods often assume a strong semantic correlation between each text-image pair, making them difficult to generalize to real-world scenarios where weak correlation dominates. 2) Efficiency: many recent works adopt a single-tower architecture with heavy detectors, which is inefficient during inference because the costly computation must be repeated for each text-image pair. In this work, to overcome these two challenges, we propose a two-tower cross-modal contrastive learning (CMCL) framework. Specifically, we first devise a two-tower architecture, which enables a unified feature space in which the text and image modalities can be directly compared with each other, alleviating the heavy computation during inference. We further introduce a simple yet effective module named multi-grid split (MGS) to learn fine-grained image features without using detectors. Last but not least, we deploy a cross-modal contrastive loss on the global image/text features to learn their weak correlation and thus achieve high generalizability. To validate that CMCL readily generalizes to real-world scenarios, we construct a large multi-source image-text dataset called the weak semantic correlation dataset (WSCD). Extensive experiments show that our CMCL outperforms state-of-the-art methods while being much more efficient.
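A cross-modal contrastive loss on global features is commonly a symmetric InfoNCE objective: matched image-text pairs sit on the diagonal of a batch similarity matrix and are pulled together, while all other pairs act as negatives. A hedged numpy sketch under that assumption (the paper may differ in details such as temperature and normalization):

```python
import numpy as np

def cmcl_loss(img, txt, tau=0.1):
    """Symmetric InfoNCE on global features.
    img, txt: (B, d) batches of image and text embeddings;
    row i of img matches row i of txt."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / tau                    # (B, B) cosine similarities
    # Image-to-text: softmax over each row, take the diagonal (matched pair).
    log_p_i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Text-to-image: softmax over each column.
    log_p_t2i = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -0.5 * (np.diag(log_p_i2t).mean() + np.diag(log_p_t2i).mean())

rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 16))
aligned = cmcl_loss(txt + 0.01 * rng.normal(size=(8, 16)), txt)  # near-matched
random_ = cmcl_loss(rng.normal(size=(8, 16)), txt)               # unrelated
```

Well-aligned embeddings yield a near-zero loss, while unrelated embeddings are penalized, which is exactly the gradient signal that pulls the two towers into a shared space.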
Abstract: Machine intelligence is a fast-developing field, often used as a synonym for artificial intelligence but with a stronger application orientation. Machine intelligence algorithms are not forms of alchemy, but rather useful tools for improving the world we live in. With easier access to data and more affordable computational power, discoveries in research labs are being transformed into real-world applications, thereby creating enormous social and economic value.