Funding: Partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); partially supported by the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); partially supported by the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); partially supported by the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multilevel feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that, compared to existing state-of-the-art methods, MPSEA achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset. These results confirm the effectiveness of the proposed model.
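The abstract reports results in Hits@1 and MRR without defining them. As a minimal sketch of how these ranking metrics are commonly computed (the function names and the toy ranks are illustrative assumptions, not taken from the paper): each test entity gets the 1-based rank of its true counterpart among all candidates, Hits@k is the fraction of ranks at or below k, and MRR is the mean of the reciprocal ranks.

```python
def rank_of_target(scores, target_index):
    """1-based rank of the true counterpart: 1 + number of candidates scoring strictly higher."""
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)


def hits_at_k(ranks, k=1):
    """Fraction of test entities whose true counterpart is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: ranks of the correct counterpart for four test entities.
example_ranks = [1, 2, 1, 4]
example_hits1 = hits_at_k(example_ranks, k=1)
example_mrr = mean_reciprocal_rank(example_ranks)
```

Both metrics lie in [0, 1] and higher is better; MRR also rewards near misses, which Hits@1 ignores.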
Funding: Funded by the Fanying Special Program of the National Natural Science Foundation of China (grant number 62341307), the Scientific Research Project of Jiangxi Provincial Department of Education (grant number GJJ200839), and the Doctoral Startup Fund of Jiangxi University of Technology (grant number 205200100402).
Abstract: In multi-modal emotion recognition, excessive reliance on historical context often impedes the detection of emotional shifts, while modality heterogeneity and unimodal noise limit recognition performance. Existing methods struggle to dynamically adjust cross-modal complementary strength to optimize fusion quality and lack effective mechanisms to model the dynamic evolution of emotions. To address these issues, we propose a multi-level dynamic gating and emotion transfer framework for multi-modal emotion recognition. A dynamic gating mechanism is applied across unimodal encoding, cross-modal alignment, and emotion transfer modeling, substantially improving noise robustness and feature alignment. First, we construct a unimodal encoder based on gated recurrent units and feature-selection gating to suppress intra-modal noise and enhance contextual representation. Second, we design a gated-attention cross-modal encoder that dynamically calibrates the complementary contributions of visual and audio modalities to the dominant textual features and eliminates redundant information. Finally, we introduce a gated enhanced emotion transfer module that explicitly models the temporal dependence of emotional evolution in dialogues via transfer gating and optimizes continuity modeling with a comparative learning loss. Experimental results demonstrate that the proposed method outperforms state-of-the-art models on the public MELD and IEMOCAP datasets.
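The idea of a gate that calibrates how much an auxiliary modality contributes to the dominant textual features can be illustrated with a generic sigmoid gate. This is only a sketch of the general technique under assumed names (`gated_fuse`, `w_text`, `w_aux`, `bias` are hypothetical); the abstract does not specify the paper's actual gating formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(text, aux, w_text, w_aux, bias):
    """Add an auxiliary-modality vector to the text vector, scaled by a learned scalar gate.

    The gate is a sigmoid of a linear score over both inputs, so it lies in (0, 1):
    near 0 the auxiliary signal is suppressed, near 1 it is passed through in full.
    """
    score = (
        sum(w * t for w, t in zip(w_text, text))
        + sum(w * a for w, a in zip(w_aux, aux))
        + bias
    )
    gate = sigmoid(score)
    fused = [t + gate * a for t, a in zip(text, aux)]
    return fused, gate


# With zero weights and zero bias the gate is exactly 0.5.
fused_example, gate_example = gated_fuse(
    text=[1.0, 2.0], aux=[0.5, -1.0], w_text=[0.0, 0.0], w_aux=[0.0, 0.0], bias=0.0
)
```

In a trained model the weights would be learned so that noisy audio or visual inputs drive the gate toward zero.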
Funding: Funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal Named Entity Recognition (MNER) aims to better identify meaningful textual entities by integrating information from images. Previous work has focused on extracting visual semantics at a fine-grained level, or on obtaining entity-related external knowledge from knowledge bases or Large Language Models (LLMs). However, these approaches ignore the poor semantic correlation between visual and textual modalities in MNER datasets and do not explore different multi-modal fusion approaches. In this paper, we present MMAVK, a multi-modal named entity recognition model with auxiliary visual knowledge and word-level fusion, which aims to leverage the Multi-modal Large Language Model (MLLM) as an implicit knowledge base. It also extracts vision-based auxiliary knowledge from the image for more accurate and effective recognition. Specifically, we propose vision-based auxiliary knowledge generation, which uses target-specific prompts to guide the MLLM to extract external knowledge derived exclusively from images to aid entity recognition, thus avoiding the redundant recognition and cognitive confusion caused by processing image-text pairs simultaneously. Furthermore, we employ a word-level multi-modal fusion mechanism to fuse the extracted external knowledge with each word embedding from the transformer-based encoder. Extensive experimental results demonstrate that MMAVK outperforms or equals the state-of-the-art methods on the two classical MNER datasets, even when the large models employed have significantly fewer parameters than other baselines.
Funding: Supported by the National Key R&D Program of China (2018AAA0101502) and the Science and Technology Project of SGCC (State Grid Corporation of China): Fundamental Theory of Human-in-the-Loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.
Abstract: Knowledge graphs (KGs) have been widely accepted as powerful tools for modeling the complex relationships between concepts and developing knowledge-based services. In recent years, researchers in the field of power systems have explored KGs to develop intelligent dispatching systems for increasingly large power grids. With multiple power grid dispatching knowledge graphs (PDKGs) constructed by different agencies, the knowledge fusion of different PDKGs is useful for providing more accurate decision support. To achieve this, entity alignment, which aims at connecting different KGs by identifying equivalent entities, is a critical step. Existing entity alignment methods cannot integrate useful structural, attribute, and relational information while calculating entities' similarities and are prone to making many-to-one alignments, and thus can hardly achieve the best performance. To address these issues, this paper proposes a collective entity alignment model that integrates three kinds of available information and makes collective counterpart assignments. The model introduces a novel knowledge graph attention network (KGAT) to learn the embeddings of entities and relations explicitly and calculates entities' similarities by adaptively incorporating the structural, attribute, and relational similarities. Then, we formulate the counterpart assignment task as an integer programming (IP) problem to obtain one-to-one alignments. We not only conduct experiments on a pair of PDKGs but also evaluate our model on three commonly used cross-lingual KGs. Experimental comparisons indicate that our model outperforms other methods and provides an effective tool for the knowledge fusion of PDKGs.
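To make the many-to-one problem and the one-to-one constraint concrete, the sketch below contrasts independent greedy matching with a collective assignment that maximizes total similarity. It uses brute-force search over permutations as a stand-in for the integer programming solver mentioned in the abstract, so it is only practical for tiny inputs; the function names and the similarity matrix are illustrative assumptions.

```python
from itertools import permutations


def greedy_alignment(sim):
    """Each source entity independently picks its most similar target (may be many-to-one)."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def one_to_one_alignment(sim):
    """Pick the permutation of targets with the highest total similarity.

    Assumes a square matrix. Brute force costs O(n!) and is for illustration only;
    an IP or assignment solver would be used at realistic scale.
    """
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Rows are source entities, columns are target entities (made-up scores).
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.30, 0.20],
    [0.20, 0.40, 0.70],
]
greedy_result = greedy_alignment(sim)          # sources 0 and 1 both claim target 0
collective_result = one_to_one_alignment(sim)  # every target is used exactly once
```

Here the greedy result maps two sources to the same target, while the collective result gives up the single best score for source 0 in exchange for a higher overall total.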
Funding: Supported by the Natural Science Foundation of Liaoning Province (Grant No. 2023-MSBA-070) and the National Natural Science Foundation of China (Grant No. 62302086).
Abstract: Multi-modal fusion technology has gradually become a fundamental task in many fields, such as autonomous driving, smart healthcare, sentiment analysis, and human-computer interaction. It is rapidly becoming a dominant research topic due to its powerful perception and judgment capabilities. In complex scenes, multi-modal fusion technology utilizes the complementary characteristics of multiple data streams to fuse different data types and achieve more accurate predictions. However, achieving outstanding performance is challenging because of equipment performance limitations, missing information, and data noise. This paper comprehensively reviews existing methods based on multi-modal fusion techniques and provides a detailed, in-depth analysis. According to the data fusion stage, multi-modal fusion has four primary methods: early fusion, deep fusion, late fusion, and hybrid fusion. The paper surveys the three major multi-modal fusion technologies that can significantly enhance the effect of data fusion and further explores the applications of multi-modal fusion technology in various fields. Finally, it discusses the challenges and explores potential research opportunities. Multi-modal tasks still need intensive study because of data heterogeneity and quality. Preserving complementary information and eliminating redundant information between modalities is critical in multi-modal technology. Invalid data fusion methods may introduce extra noise and lead to worse results. This paper provides a comprehensive and detailed summary in response to these challenges.
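The difference between fusing at the input stage and fusing at the decision stage can be shown in a few lines. The sketch below uses the common textbook reading of early fusion (concatenate per-modality features before a single model) and late fusion (combine per-modality prediction scores); the survey's exact definitions may differ, and the function names are hypothetical.

```python
def early_fusion(features_by_modality):
    """Concatenate the feature vectors of all modalities into one input vector."""
    fused = []
    for features in features_by_modality:
        fused.extend(features)
    return fused


def late_fusion(scores_by_modality, weights=None):
    """Combine per-modality class scores with a weighted average (uniform by default)."""
    n = len(scores_by_modality)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(scores_by_modality[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, scores_by_modality))
        for c in range(num_classes)
    ]


# Early fusion: image features and text features become one vector.
joint_input = early_fusion([[1, 2], [3]])
# Late fusion: two modality-specific classifiers vote on two classes.
joint_scores = late_fusion([[0.5, 0.5], [1.0, 0.0]])
```

Deep and hybrid fusion sit between these two extremes by mixing intermediate representations, which a short example cannot capture faithfully.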
Funding: Supported by the National Natural Science Foundation of China (No. 62276026).
Abstract: Entity alignment (EA) is crucial for knowledge fusion and integration, as it aims to match equivalent entities across different KGs. Recently, many neural-based EA methods have been proposed, focusing on developing various graph representation learning models to match entities in vector spaces. However, most real-world KGs are large-scale and contain rich structural and attribute information about entities, presenting challenges for current approaches designed primarily for small- and medium-sized KGs. To address the challenges of large-scale EA, this paper introduces a simple, effective, and scalable method based on language models. Our approach first leverages the capabilities of language models to encode entities' multi-view information into low-dimensional embeddings, identifying potential aligned entity pairs with high similarity. These candidates are then re-ranked using a global matching algorithm to produce the final alignments. Experimental results show that our method achieves state-of-the-art performance on real-world large-scale EA datasets, with superior accuracy and efficiency compared to existing methods.
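The first stage described here, retrieving candidate pairs by embedding similarity, might look like the following sketch. Cosine similarity and the top-k cutoff are assumptions made for illustration; the abstract does not name the similarity measure or the global matching algorithm, so the re-ranking stage is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_candidates(source, targets, k=2):
    """Indices of the k target embeddings most similar to the source embedding.

    Ties are broken by the lower index so the output is deterministic.
    """
    scored = sorted(
        ((cosine(source, t), j) for j, t in enumerate(targets)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [j for _, j in scored[:k]]


# Toy 2-dimensional embeddings (real entity embeddings would be much larger).
source_embedding = [1.0, 0.0]
target_embeddings = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
candidates = top_k_candidates(source_embedding, target_embeddings, k=2)
```

At large scale the exhaustive scan would typically be replaced by an approximate nearest-neighbor index, with the shortlisted candidates passed on to the global matching step.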
Funding: Supported by the National Natural Science Foundation of China (No. 62276026).
Abstract: Entity Alignment (EA) aims to identify equivalent entities across different Knowledge Graphs (KGs), enabling knowledge fusion and integration. In recent years, Graph Neural Networks (GNNs) have emerged as a powerful paradigm for EA by leveraging structural information in KGs. However, most existing studies emphasize novel message passing mechanisms while overlooking other crucial GNN design components. This paper presents a comprehensive and systematic evaluation of GNN-based EA methods, focusing on three key aspects: message passing strategies, the number of GNN layers, and the construction of final entity representations. We benchmark a diverse set of GNN models originally developed for tasks such as node classification and knowledge graph completion, and we assess their adaptability to the EA task. Additionally, we explore the effectiveness of skip connection techniques, activation functions, and relational information integration. Our experiments, conducted on standard EA benchmarks including DBP15K and SRPRS, reveal several counterintuitive findings: (1) message passing is indispensable for EA; (2) many node classification GNNs are highly competitive for EA; (3) one or two GNN layers generally achieve optimal performance; and (4) activation functions have minimal impact, while skip connections significantly enhance results. This study provides a principled framework and empirical foundation for designing more effective GNN-based EA models. Code and datasets are publicly available at https://github.com/kg-bnu/GNN-EA.
Funding: Supported by the National Natural Science Foundation of China (No. 62276026).
Abstract: Entity Alignment (EA) seeks to identify and match corresponding entities across different Knowledge Graphs (KGs), playing a crucial role in knowledge fusion and integration. Embedding-based EA has recently gained considerable attention, resulting in the emergence of many innovative approaches. Initially, these approaches concentrated on learning entity embeddings based on the structural features of KGs as defined by relation triples. Subsequent methods have integrated entities' names and attributes as supplementary information to improve the embeddings used for EA. However, existing methods lack a deep semantic understanding of entity attributes and relations. In this paper, we propose a Large Language Model (LLM) based Entity Alignment method, LLM-Align, which explores the instruction-following and zero-shot capabilities of LLMs to infer alignments of entities. LLM-Align uses heuristic methods to select important attributes and relations of entities, and then feeds the selected triples of entities to an LLM to infer the alignment results. To guarantee the quality of alignment results, we design a multi-round voting mechanism to mitigate the hallucination and positional bias issues that occur with LLMs. Experiments on three EA datasets demonstrate that our approach achieves state-of-the-art performance compared to existing EA methods.
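One plausible reading of a multi-round voting mechanism is sketched below: the candidate list is shuffled on every round to average out positional bias, answers outside the candidate list are discarded as hallucinations, and the majority answer wins. This is an assumption about the general idea, not LLM-Align's actual procedure; `ask_model` is a hypothetical callable standing in for an LLM call.

```python
import random
from collections import Counter


def vote_alignment(candidates, ask_model, rounds=5, seed=0):
    """Query the model several times with shuffled candidates and return the majority pick.

    ask_model takes the ordered candidate list and returns one chosen entity name.
    Returns None if no round produced an answer from the candidate list.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(rounds):
        order = list(candidates)
        rng.shuffle(order)  # vary positions so no candidate always appears first
        choice = ask_model(order)
        if choice in candidates:  # drop answers that are not real candidates
            votes[choice] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Stub models used in place of a real LLM for demonstration.
def consistent_model(order):
    return "Paris_(city)"


def hallucinating_model(order):
    return "Atlantis"


options = ["Paris_(city)", "Paris_Hilton", "Paris,_Texas"]
stable_answer = vote_alignment(options, consistent_model)
rejected_answer = vote_alignment(options, hallucinating_model)
```

A model that always picks whichever candidate is listed first would spread its votes across candidates under shuffling, which is how this scheme exposes positional bias.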