Funding: Supported by the National Natural Science Foundation of China (Nos. 61825103, 62202349), the Natural Science Foundation of Hubei Province (Nos. 2022CFB352, 2020CFA001), and the Key Research & Development of Hubei Province (No. 2020BIB006).
Abstract: ...representation that can identify and isolate different potential variables hidden in the high-dimensional observations. Disentangled representation learning can capture information about a single change factor and control it by the corresponding potential subspace, providing a robust representation for complex changes in the data. In this paper, we first introduce and analyze the current status of research on disentangled representation and its causal mechanisms and summarize three crucial properties of disentangled representation. Then, disentangled representation learning algorithms are classified into four categories and outlined in terms of both mathematical description and applicability. Subsequently, the loss functions and objective evaluation metrics commonly used in existing work on disentangled representation are classified. Finally, the paper summarizes representative applications of disentangled representation learning in the field of remote sensing and discusses its future development.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 62071427 and U21B2004).
Abstract: Subdivision is a widely used technique for mesh refinement. Classic methods rely on fixed, manually defined weighting rules and struggle to generate a finer mesh with appropriate details, while advanced neural subdivision methods achieve data-driven nonlinear subdivision but lack robustness, suffering from limited subdivision levels and artifacts on novel shapes. To address these issues, this paper introduces a neural mesh refinement (NMR) method that uses the geometric structural priors learned from fine meshes to adaptively refine coarse meshes through subdivision, demonstrating robust generalization. Our key insight is that it is necessary to disentangle the network from non-structural information such as scale, rotation, and translation, enabling the network to focus on learning and applying the structural priors of local patches for adaptive refinement. For this purpose, we introduce an intrinsic structure descriptor and a locally adaptive neural filter. The intrinsic structure descriptor excludes the non-structural information to align local patches, thereby stabilizing the input feature space and enabling the network to robustly extract structural priors. The proposed neural filter, using a graph attention mechanism, extracts local structural features and adapts learned priors to local patches. Additionally, we observe that Charbonnier loss can alleviate over-smoothing compared to L2 loss. By combining these design choices, our method gains robust geometric learning and locally adaptive capabilities, enhancing generalization to various situations such as unseen shapes and arbitrary refinement levels. We evaluate our method on a diverse set of complex three-dimensional (3D) shapes, and experimental results show that it outperforms existing subdivision methods in terms of geometry quality. See https://zhuzhiwei99.github.io/NeuralMeshRefinement for the project page.
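The abstract above contrasts Charbonnier loss with L2 loss. As a minimal sketch, assuming the standard definition sqrt(r^2 + eps^2) averaged over residuals r, the two penalties can be compared as follows. The function names, the default `eps`, and the use of plain Python lists are illustrative assumptions; the abstract does not state how the authors apply the loss to mesh vertices.

```python
import math

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(r^2 + eps^2) over paired values.

    eps is a small smoothing constant (the default here is an assumed value).
    """
    residuals = [p - t for p, t in zip(pred, target)]
    return sum(math.sqrt(r * r + eps * eps) for r in residuals) / len(residuals)

def l2_loss(pred, target):
    """Mean squared error over paired values, for comparison."""
    residuals = [p - t for p, t in zip(pred, target)]
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical residuals: three small errors and one large one.
pred = [0.0, 0.0, 0.0, 0.0]
target = [0.1, -0.1, 0.1, 2.0]
print("Charbonnier:", charbonnier_loss(pred, target))
print("L2:", l2_loss(pred, target))
```

For large residuals the Charbonnier penalty grows roughly linearly rather than quadratically, so single large deviations dominate the average less than under L2. This is one common explanation for why it tends to preserve sharper detail.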
Funding: Supported in part by the National Key R&D Program of China (No. 2022ZD0116500) and in part by the National Natural Science Foundation of China (No. 62206284).
Abstract: In the field of brain decoding research, reconstructing visual perception from neural recordings is a challenging but crucial task. With the use of superior algorithms, many methods have been dedicated to enhancing decoding performance. However, these models, which map neural activities onto a semantically entangled feature space, are difficult to interpret. It is hard to understand the connections between neural activities and these abstract features. In this paper, we propose an interpretable neural decoding model that projects neural activities onto a semantically disentangled feature space with each dimension representing distinct visual attributes, such as gender and facial pose. A two-stage algorithm is designed to achieve this goal. First, a deep generative model learns semantically disentangled image representations in an unsupervised way. Second, neural activities are linearly embedded into the semantic space, which the generator uses to reconstruct visual stimuli. Due to modality heterogeneity, it is challenging to learn such a neural-embedded high-level semantic representation. We induce pixel, feature, and semantic alignment to ensure reconstruction quality. Three experimental fMRI datasets containing handwritten digits, characters, and human face stimuli are used to evaluate the neural decoding performance of our model. We also demonstrate the model interpretability through a reconstructed image editing application. The experimental results indicate that our model maintains a competitive decoding performance while remaining interpretable.
Funding: Funded by the National Natural Science Foundation of China (Nos. 62172090 and 62202209), the Key Laboratory of Computer Network and Information Integration of Ministry of Education of China (No. 93K-9-2024-04), and the Jiangsu Province Higher Education Basic Science (Natural Science) Foundation (No. 24KJB520014).
Abstract: The problem of Point-Of-Interest (POI) recommendation, based on the user’s historical check-in records, determines whether a user checks in at a specific POI. However, the user-POI data have a long-tail distribution phenomenon. To mitigate the sparsity of check-in data, it is a good idea to exploit the sufficient attributes of POIs and recommend POIs both geography-wise and category-wise. Generally, this problem can be treated as two specific tasks with feature combination, ignoring cross-task dependencies and feature disentanglement. To address the aforementioned problems, this paper proposes a novel joint framework named InteractPOI, enabling two-stage interaction between geography-wise and category-wise POI recommendations. Specifically, this paper comprehensively considers the sequence effect and the neighbor effect, both geography-wise and category-wise. For the first-stage interaction, we design a disentangled graph embedding model to distinguish different influencing factors, geography-wise and category-wise. For the second-stage interaction, we integrate a gating mechanism for feature fusion with a complementary algorithm for interactive optimization. Extensive experiments on two datasets demonstrate the superiority of the proposed model.
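The abstract above mentions a gating mechanism for feature fusion but gives no formula. The sketch below shows one generic form of gated fusion, in which a sigmoid gate mixes a geography-wise feature vector with a category-wise one, dimension by dimension. The function names, the per-dimension weights, and the gate's inputs are all hypothetical; this is not InteractPOI's actual design.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(geo, cat, w_geo, w_cat, bias):
    """Fuse two equal-length feature vectors with a per-dimension gate.

    gate_i  = sigmoid(w_geo[i] * geo[i] + w_cat[i] * cat[i] + bias[i])
    fused_i = gate_i * geo[i] + (1 - gate_i) * cat[i]
    The weights and bias would normally be learned; here they are given.
    """
    fused = []
    for g, c, wg, wc, b in zip(geo, cat, w_geo, w_cat, bias):
        gate = sigmoid(wg * g + wc * c + b)
        fused.append(gate * g + (1.0 - gate) * c)
    return fused

# With all-zero weights and bias every gate is 0.5, so the result is the
# plain average of the two inputs.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

A gate of this kind lets the model decide, per dimension, how much to rely on each source of evidence instead of using a fixed sum or concatenation.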
Funding: Supported by the National Key R&D Program of China (No. 2022ZD0160703), the National Natural Science Foundation of China (Nos. 62202422 and 62071330), the Natural Science Foundation of Shandong Province (No. ZR2021MH227), and the Shanghai Artificial Intelligence Laboratory.
Abstract: Natural Language Inference (NLI) seeks to deduce the relation between two texts: a premise and a hypothesis. These two texts may share similar or different basic contexts, while three distinct reasoning factors emerge in the inference from premise to hypothesis: entailment, neutrality, and contradiction. However, the entanglement of the reasoning factor with the basic context in the learned representation space often complicates the task of NLI models, hindering accurate classification and determination based on the reasoning factors. In this study, drawing inspiration from the successful application of disentangled variational autoencoders in other areas, we separate and extract the reasoning factor from the basic context of NLI data through latent variational inference. Meanwhile, we employ mutual information estimation when further optimizing the Variational AutoEncoder (VAE)-disentangled reasoning factors. Leveraging disentanglement optimization in NLI, our proposed Directed NLI (DNLI) model demonstrates excellent performance compared to state-of-the-art baseline models in experiments on three widely used datasets: Stanford Natural Language Inference (SNLI), Multi-genre Natural Language Inference (MNLI), and Adversarial Natural Language Inference (ANLI). It particularly achieves the best average validation scores, showing significant improvements over the second-best models. Notably, our approach effectively addresses the interpretability challenges commonly encountered in NLI methods.
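The abstract above builds on variational autoencoders. A term common to most VAE objectives is the KL divergence between a diagonal Gaussian posterior and a standard normal prior, which has the closed form 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1). The sketch below computes only that generic term. The function name and the list-based inputs are illustrative assumptions, and DNLI's full objective (including its mutual information estimate) is not given in the abstract and is not reproduced here.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    mu and log_var are equal-length lists holding the posterior mean and
    log-variance for each latent dimension.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(mu, log_var)
    )

# A posterior that already equals the prior incurs no penalty.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
# Shifting one latent mean away from zero is penalized.
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))
```

Disentangling VAE variants typically reweight or decompose this term to encourage independent latent factors, which is the general family the abstract refers to.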