Funding: Supported by the National Natural Science Foundation of China (No. 61210007).
Abstract: Learning disentangled representations of data is a key problem in deep learning. In particular, disentangling 2D facial landmarks into separate factors (e.g., identity and expression) is widely used in applications such as face reconstruction, face reenactment, and talking-head generation. However, due to the sparsity of landmarks and the lack of accurate labels for the factors, it is hard to learn a disentangled representation of landmarks. To address these problems, we propose a simple and effective model named FLD-VAE, based on the Variational Autoencoder framework, that disentangles arbitrary facial landmarks into identity and expression latent representations. In addition, we propose three invariance loss functions, at both the latent and data levels, to constrain the invariance of the representations during training. Moreover, we implement an identity-preservation loss to further enhance the representational ability of the identity factor. To the best of our knowledge, this is the first work to disentangle identity and expression factors simultaneously, end to end, from a single set of facial landmarks.
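The latent-level invariance constraint described in this abstract can be illustrated with a minimal sketch. The latent layout, function names, and loss form below are illustrative assumptions, not the paper's actual FLD-VAE implementation: the idea is that two landmark samples of the same person, under different expressions, should agree on the identity half of their latent codes.

```python
import numpy as np

def split_latent(z, id_dim):
    """Split a latent code into identity and expression parts (assumed layout)."""
    return z[:id_dim], z[id_dim:]

def identity_invariance_loss(z_a, z_b, id_dim):
    """Hypothetical latent-level invariance loss: penalize differences in the
    identity latents of two samples known to come from the SAME person."""
    id_a, _ = split_latent(z_a, id_dim)
    id_b, _ = split_latent(z_b, id_dim)
    return float(np.mean((id_a - id_b) ** 2))

# Toy latents: identical identity parts, different expression parts.
rng = np.random.default_rng(0)
z1 = np.concatenate([np.ones(4), rng.standard_normal(4)])
z2 = np.concatenate([np.ones(4), rng.standard_normal(4)])
print(identity_invariance_loss(z1, z2, id_dim=4))  # → 0.0
```

Because the loss only touches the identity slice, the expression latents remain free to vary, which is the point of the disentanglement.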
Funding: Supported in part by the National Key R&D Program of China (No. 2022ZD0116500) and in part by the National Natural Science Foundation of China (No. 62206284).
Abstract: In brain decoding research, reconstructing visual perception from neural recordings is a challenging but crucial task. Many methods, using increasingly capable algorithms, have been dedicated to enhancing decoding performance. However, models that map neural activities onto a semantically entangled feature space are difficult to interpret: it is hard to understand the connections between neural activities and such abstract features. In this paper, we propose an interpretable neural decoding model that projects neural activities onto a semantically disentangled feature space in which each dimension represents a distinct visual attribute, such as gender or facial pose. A two-stage algorithm is designed to achieve this goal. First, a deep generative model learns semantically disentangled image representations in an unsupervised way. Second, neural activities are linearly embedded into the semantic space, which the generator uses to reconstruct the visual stimuli. Due to modality heterogeneity, learning such a neurally embedded high-level semantic representation is challenging; we therefore induce pixel, feature, and semantic alignment to ensure reconstruction quality. Three experimental fMRI datasets containing handwritten digits, characters, and human face stimuli are used to evaluate the neural decoding performance of our model. We also demonstrate the model's interpretability through a reconstructed-image editing application. The experimental results indicate that our model maintains competitive decoding performance while remaining interpretable.
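The second stage, linearly embedding neural activities into the semantic space, amounts to fitting a linear map from voxel responses to semantic codes. A minimal ridge-regularized least-squares sketch follows; the variable names, dimensions, and use of plain ridge regression are assumptions for illustration, not the paper's exact fitting procedure:

```python
import numpy as np

def fit_linear_embedding(X, S, ridge=1e-3):
    """Fit W so that X @ W approximates S.

    X: (n_samples, n_voxels) neural activity.
    S: (n_samples, sem_dim) semantic codes from the generative model.
    The ridge term keeps the solve stable when voxels are correlated
    or outnumber samples.
    """
    n_vox = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_vox), X.T @ S)

# Synthetic check: recover a known linear map from noiseless data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))      # 200 trials, 50 voxels
W_true = rng.standard_normal((50, 8))   # 8 semantic dimensions
S = X @ W_true
W = fit_linear_embedding(X, S)
print(np.allclose(X @ W, S, atol=1e-2))  # → True
```

At decoding time, a new activity pattern `x` maps to a semantic code `x @ W`, which the generator then turns back into an image.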
Funding: Supported by the National Natural Science Foundation of China (Nos. 62071427 and U21B2004).
Abstract: Subdivision is a widely used technique for mesh refinement. Classic methods rely on fixed, manually defined weighting rules and struggle to generate finer meshes with appropriate details, while neural subdivision methods achieve data-driven nonlinear subdivision but lack robustness, suffering from limited subdivision levels and artifacts on novel shapes. To address these issues, this paper introduces a neural mesh refinement (NMR) method that uses geometric structural priors learned from fine meshes to adaptively refine coarse meshes through subdivision, demonstrating robust generalization. Our key insight is that the network must be disentangled from non-structural information such as scale, rotation, and translation, so that it can focus on learning and applying the structural priors of local patches for adaptive refinement. For this purpose, we introduce an intrinsic structure descriptor and a locally adaptive neural filter. The intrinsic structure descriptor excludes non-structural information to align local patches, thereby stabilizing the input feature space and enabling the network to robustly extract structural priors. The proposed neural filter, using a graph attention mechanism, extracts local structural features and adapts the learned priors to local patches. Additionally, we observe that the Charbonnier loss alleviates over-smoothing compared with the L2 loss. By combining these design choices, our method gains robust geometric learning and locally adaptive capabilities, enhancing generalization to situations such as unseen shapes and arbitrary refinement levels. We evaluate our method on a diverse set of complex three-dimensional (3D) shapes, and experimental results show that it outperforms existing subdivision methods in terms of geometry quality. See https://zhuzhiwei99.github.io/NeuralMeshRefinement for the project page.
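The remark about the Charbonnier loss can be made concrete. Charbonnier is the smooth L1 surrogate sqrt(r² + ε²): it grows linearly rather than quadratically in the residual, so large residuals dominate the objective less, which plausibly reduces the pull toward over-smoothed averages. A small sketch (the ε value is an illustrative choice, not taken from the paper):

```python
import numpy as np

def l2_loss(residual):
    """Mean squared error: large residuals dominate quadratically."""
    return float(np.mean(residual ** 2))

def charbonnier_loss(residual, eps=1e-3):
    """Charbonnier loss sqrt(r^2 + eps^2): a differentiable-at-zero
    approximation of L1, growing only linearly in |r|."""
    return float(np.mean(np.sqrt(residual ** 2 + eps ** 2)))

r = np.array([0.1, 0.1, 2.0])  # two small residuals plus one outlier
print(l2_loss(r))          # ≈ 1.34; the outlier contributes 4.0 of the 4.02 sum
print(charbonnier_loss(r)) # ≈ 0.73; the outlier contributes only ~2.0 of ~2.2
```

Under L2, a predictor lowers its loss most by averaging away the outlier, smoothing detail; under Charbonnier that incentive is much weaker.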
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0160703), the National Natural Science Foundation of China (Nos. 62202422 and 62071330), the Natural Science Foundation of Shandong Province (No. ZR2021MH227), and the Shanghai Artificial Intelligence Laboratory.
Abstract: Natural Language Inference (NLI) seeks to deduce the relation between two texts: a premise and a hypothesis. The two texts may share similar or different basic contexts, while three distinct reasoning factors emerge in the inference from premise to hypothesis: entailment, neutrality, and contradiction. However, the entanglement of the reasoning factor with the basic context in the learned representation space often complicates the task of NLI models, hindering accurate classification based on the reasoning factors. In this study, drawing inspiration from the successful application of disentangled variational autoencoders in other areas, we separate and extract the reasoning factor from the basic context of NLI data through latent variational inference. In addition, we employ mutual information estimation to further optimize the Variational AutoEncoder (VAE)-disentangled reasoning factors. Leveraging this disentanglement optimization, our proposed Directed NLI (DNLI) model outperforms state-of-the-art baseline models in experiments on three widely used datasets: Stanford Natural Language Inference (SNLI), Multi-Genre Natural Language Inference (MNLI), and Adversarial Natural Language Inference (ANLI). In particular, it achieves the best average validation scores, with significant improvements over the second-best models. Notably, our approach effectively addresses the interpretability challenges commonly encountered in NLI methods.
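The role of mutual information estimation here is to push the reasoning-factor latent and the basic-context latent toward statistical independence. A toy plug-in estimator over discretized latents illustrates the quantity being driven down; the paper's actual estimator for continuous latents would differ (e.g., a variational bound), so treat this purely as a sketch of the objective:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in MI estimate (in nats) for two discrete-valued arrays:
    I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) ).
    Driving this toward zero encourages the two factors to be independent,
    i.e., disentangled."""
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
a = rng.integers(0, 4, 10_000)
print(mutual_information(a, a))                          # ≈ log 4 ≈ 1.39: fully dependent
print(mutual_information(a, rng.integers(0, 4, 10_000))) # ≈ 0: independent factors
```

In training, a differentiable estimate of this term would be added to the VAE objective as a penalty, tightening the separation between reasoning and context.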
Funding: Supported in part by the National Key Research and Development Program of China (No. 2022YFB4501704) and in part by the Shanghai Science and Technology Innovation Action Plan (No. 22511100700).
Abstract: As an important subject within natural language generation, Controllable Text Generation (CTG) focuses on integrating additional constraints and controls while generating text, and has attracted considerable attention. Existing CTG approaches mainly capture the statistical associations implied within training texts, so the generated texts lack consideration of causality. This paper reviews recent CTG approaches from a causal perspective. First, a survey of the basic types of CTG models shows that their essence is to capture association; four kinds of challenges caused by the absence of causality are then introduced. Next, the paper reviews improvements that address these challenges from four aspects: representation disentanglement, causal inference, knowledge enhancement, and multi-aspect CTG. Additionally, it examines existing evaluations of CTG, especially evaluations of the causality of CTG. Finally, the review discusses future research directions for improving the causality of CTG and draws conclusions.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61825103 and 62202349), the Natural Science Foundation of Hubei Province (Nos. 2022CFB352 and 2020CFA001), and the Key Research & Development Program of Hubei Province (No. 2020BIB006).
Abstract: Disentangled representation learning aims to learn a representation that can identify and isolate the different latent variables hidden in high-dimensional observations. It can capture information about a single factor of variation and control it through the corresponding latent subspace, providing a robust representation for complex changes in the data. In this paper, we first introduce and analyze the current status of research on disentangled representation and its causal mechanisms, and summarize three crucial properties of disentangled representations. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both mathematical description and applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are classified. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.