Funding: funded by Research Project, grant number BHQ090003000X03.
Abstract: Multi-modal knowledge graph completion (MMKGC) aims to complete missing entities or relations in multi-modal knowledge graphs, thereby discovering more previously unknown triples. Due to the continuous growth of data and knowledge and the limitations of data sources, the visual knowledge within knowledge graphs is generally of low quality, and some entities suffer from missing visual modality. Nevertheless, previous studies of MMKGC have primarily focused on facilitating modality interaction and fusion while neglecting the problems of low modality quality and missing modality. In this case, mainstream MMKGC models only use pre-trained visual encoders to extract features and transfer the semantic information to the joint embeddings through modal fusion, which inevitably suffers from problems such as error propagation and increased uncertainty. To address these problems, we propose a Multi-modal knowledge graph Completion model based on Super-resolution and Detailed Description Generation (MMCSD). Specifically, we leverage a pre-trained residual network to enhance the resolution and improve the quality of the visual modality. Moreover, we design multi-level visual semantic extraction and entity description generation, thereby further extracting entity semantics from structural triples and visual images. Meanwhile, we train a variational multi-modal auto-encoder and utilize a pre-trained multi-modal language model to complete the missing visual features. We conducted experiments on FB15K-237 and DB13K, and the results show that MMCSD effectively performs MMKGC and achieves state-of-the-art performance.
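To make the imputation idea above concrete, here is a minimal PyTorch sketch of a variational auto-encoder that learns to map an entity's structural embedding to its visual feature, so that a decoded sample can stand in for a missing image feature. The class name, dimensions, and training setup are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the MMCSD code): a VAE that imputes a missing visual
# feature from an entity's structural embedding. Dimensions are assumed.
import torch
import torch.nn as nn

STRUCT_DIM, VIS_DIM, LATENT_DIM = 200, 512, 64

class StructToVisualVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(STRUCT_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, LATENT_DIM)       # posterior mean
        self.logvar = nn.Linear(128, LATENT_DIM)   # posterior log-variance
        self.dec = nn.Sequential(                  # latent -> visual feature
            nn.Linear(LATENT_DIM, 256), nn.ReLU(), nn.Linear(256, VIS_DIM))

    def forward(self, s):
        h = self.enc(s)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(v_hat, v, mu, logvar):
    recon = ((v_hat - v) ** 2).sum(dim=-1).mean()          # reconstruction
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp())   # KL to N(0, I)
          ).sum(dim=-1).mean()
    return recon + kl

# Training pairs come from entities that do have images; at inference, the
# decoder output serves as a proxy feature for entities missing an image.
model = StructToVisualVAE()
s = torch.randn(8, STRUCT_DIM)   # toy structural embeddings
v = torch.randn(8, VIS_DIM)      # toy visual features
v_hat, mu, logvar = model(s)
print(vae_loss(v_hat, v, mu, logvar).item())
```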
Funding: National College Students' Training Programs of Innovation and Entrepreneurship, Grant/Award Number: S202210022060; the CACMS Innovation Fund, Grant/Award Number: CI2021A00512; the National Natural Science Foundation of China, Grant/Award Number: 62206021.
Abstract: Media convergence works by processing information from different modalities and applying it to different domains. It is difficult for a conventional knowledge graph to utilise multi-media features because introducing a large amount of information from other modalities reduces the effectiveness of representation learning and makes knowledge graph inference less effective. To address this issue, an inference method based on the Media Convergence and Rule-guided Joint Inference model (MCRJI) is proposed. The authors not only converge the multi-media features of entities but also introduce logic rules to improve the accuracy and interpretability of link prediction. First, a multi-headed self-attention approach is used to obtain the attention of different media features of entities during semantic synthesis. Second, logic rules of different lengths are mined from the knowledge graph to learn new entity representations. Finally, knowledge graph inference is performed based on the entity representations that converge multi-media features. Extensive experimental results show that MCRJI outperforms other advanced baselines in using multi-media features for knowledge graph inference, demonstrating that MCRJI provides an excellent approach for knowledge graph inference with converged multi-media features.
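The first MCRJI step, attending across an entity's media features, can be illustrated with a short sketch: the per-modality feature vectors are stacked as a short sequence and passed through standard multi-head self-attention so each modality can attend to the others during semantic synthesis. The dimensions and pooling choice below are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch: multi-head self-attention over an entity's per-modality
# features, treated as a length-3 sequence. Dimensions are illustrative.
import torch
import torch.nn as nn

D_MODEL, N_HEADS, N_MODALITIES = 128, 4, 3  # e.g. structure, image, text

attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)

# One entity batch: (batch, modalities, feature dim); each row is assumed
# already projected into the shared D_MODEL space by a per-modality encoder.
feats = torch.randn(8, N_MODALITIES, D_MODEL)
fused, weights = attn(feats, feats, feats)  # self-attention across modalities

# Pool the attended modality features into one converged representation.
entity_repr = fused.mean(dim=1)
print(entity_repr.shape, weights.shape)     # (8, 128) (8, 3, 3)
```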
Funding: This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201714), the Weihai Science and Technology Development Program (2016DXGJMS15), the Weihai Scientific Research and Innovation Fund (2020), and the Key Research and Development Program in Shandong Province (2017GGX90103).
Abstract: Knowledge graphs with abundant relational information have been widely used as the basic data support for retrieval platforms. Image and text descriptions added to a knowledge graph enrich the node information, which accounts for the advantage of the multi-modal knowledge graph. In the field of cross-modal retrieval platforms, multi-modal knowledge graphs can help to improve retrieval accuracy and efficiency because of the abundant relational information they provide. The representation learning method is significant to the application of multi-modal knowledge graphs. This paper proposes a distributed collaborative vector retrieval platform (DCRL-KG) using the multi-modal knowledge graph VisualSem as the foundation to achieve efficient and high-precision multi-modal data retrieval. Firstly, distributed technology is used to classify and store the data in the knowledge graph to improve retrieval efficiency. Secondly, BabelNet is used to expand the knowledge graph through multiple filtering processes and increase the diversity of information. Finally, a variety of retrieval models are built, and their results are fused through linear combination methods to achieve high-precision language retrieval and image retrieval. Sentence retrieval and image retrieval experiments show that the platform can optimize the storage structure of the multi-modal knowledge graph and performs well in the multi-modal space.
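The score-fusion step described above can be sketched in a few lines: several retrieval models each score the same candidates, and a fixed linear combination of the normalized scores produces the final ranking. The weights and scores below are toy values, not numbers from the paper.

```python
# Hedged sketch of linear-combination score fusion across retrieval models.
import numpy as np

def min_max(x):
    """Normalize scores to [0, 1] so differently scaled models are comparable."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def fuse(score_lists, weights):
    """Weighted linear combination of per-model candidate scores."""
    fused = sum(w * min_max(s) for w, s in zip(weights, score_lists))
    return np.argsort(-fused)  # candidate indices, best first

text_scores  = [0.2, 0.9, 0.4, 0.1]   # e.g. a sentence-retrieval model
image_scores = [0.7, 0.3, 0.8, 0.2]   # e.g. an image-retrieval model
graph_scores = [0.5, 0.6, 0.1, 0.9]   # e.g. a KG-embedding model

ranking = fuse([text_scores, image_scores, graph_scores],
               weights=[0.5, 0.3, 0.2])
print(ranking)  # fused candidate order
```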
Funding: partially supported by the National Natural Science Foundation of China under Grants 62471493 and 62402257 (for conceptualization and investigation); partially supported by the Natural Science Foundation of Shandong Province, China under Grants ZR2023LZH017, ZR2024MF066, and 2023QF025 (for formal analysis and validation); partially supported by the Open Foundation of Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences) under Grant 2023ZD010 (for methodology and model design); partially supported by the Russian Science Foundation (RSF) Project under Grant 22-71-10095-P (for validation and results verification).
Abstract: To address the challenge of missing modal information in entity alignment and to mitigate the information loss or bias arising from modal heterogeneity during fusion, while also capturing shared information across modalities, this paper proposes a Multi-modal Pre-synergistic Entity Alignment model based on Cross-modal Mutual Information Strategy Optimization (MPSEA). The model first employs independent encoders to process multi-modal features, including text, images, and numerical values. Next, a multi-modal pre-synergistic fusion mechanism integrates graph-structural and visual modal features into the textual modality as preparatory information. This pre-fusion strategy enables unified perception of heterogeneous modalities at the model's initial stage, reducing discrepancies during the fusion process. Finally, using cross-modal deep perception reinforcement learning, the model achieves adaptive multi-level feature fusion between modalities, supporting the learning of more effective alignment strategies. Extensive experiments on multiple public datasets show that MPSEA achieves gains of up to 7% in Hits@1 and 8.2% in MRR on the FBDB15K dataset, and up to 9.1% in Hits@1 and 7.7% in MRR on the FBYG15K dataset, compared with existing state-of-the-art methods. These results confirm the effectiveness of the proposed model.
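The pre-synergistic fusion idea, injecting structural and visual features into the textual representation before the main fusion stage, can be sketched as below. The gated residual design is an assumption made for illustration; the paper's exact mechanism may differ.

```python
# Hedged sketch of pre-synergistic fusion: project auxiliary modalities into
# the text space and gate how much of them is mixed in. Dimensions assumed.
import torch
import torch.nn as nn

TXT_D, IMG_D, GRAPH_D = 128, 512, 64

class PreSynergisticFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_D, TXT_D)      # visual -> text space
        self.graph_proj = nn.Linear(GRAPH_D, TXT_D)  # structure -> text space
        self.gate = nn.Sequential(nn.Linear(3 * TXT_D, TXT_D), nn.Sigmoid())

    def forward(self, txt, img, graph):
        img_t, graph_t = self.img_proj(img), self.graph_proj(graph)
        g = self.gate(torch.cat([txt, img_t, graph_t], dim=-1))
        # Text keeps its own signal; the gate decides how much auxiliary
        # modality information to mix in before the main fusion stage.
        return txt + g * (img_t + graph_t)

fusion = PreSynergisticFusion()
out = fusion(torch.randn(8, TXT_D), torch.randn(8, IMG_D),
             torch.randn(8, GRAPH_D))
print(out.shape)  # (8, 128)
```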
Funding: supported by the National Natural Science Foundation of China (Grant No. 42301536), the National Key Research and Development Program of China (Grant No. 2022YFF0711602), and the GDAS' Project of Science and Technology Development (Grant Nos. 2022GDASZH-2022010202, 2022GDASZH-2022020402-01, and 2022GDASZH-2022010111).
Abstract: Mangroves are crucial to the ecological security of the Earth and to human well-being. Their management, conservation, and restoration are of great importance and necessitate the support of spatio-temporal information and multidisciplinary knowledge from fields such as biology and ecology. Traditional knowledge services such as plant atlases provide illustrated textual knowledge of mangroves. However, this kind of service is oriented to information retrieval and is incapable of effectively mining and utilizing fragmented knowledge from multi-source heterogeneous data, facing the problem of "massive data, rare knowledge". Knowledge graphs are capable of extracting, organizing, and fusing the knowledge contained in massive data into semantic networks that can be understood and computed by computers. They provide a solution for realizing intelligent knowledge services. Focusing on the urgent need for mangrove knowledge acquisition, formal representation, and intelligent services, this paper proposes a research prospect on mangrove knowledge graphs and knowledge services. We first analyze the similarities and differences between various domain-specific concepts of Tupu. On this basis, we define the mangrove knowledge graph as a large-scale knowledge base that integrates multi-disciplinary knowledge and spatio-temporal information with mangrove ecosystems at its core. Then, we propose a research framework for mangrove knowledge services that can realize the transformation from multi-modal data to intelligent knowledge services, covering multiple research levels such as ubiquitous data sensing and aggregation, knowledge organization and graph construction, and intelligent mangrove knowledge services. Subsequently, the methods and workflow for constructing mangrove knowledge graphs are introduced. Finally, we discuss the challenges and possible future directions of mangrove knowledge services in the smart era, including the construction of a mangrove knowledge system that integrates the domain-specific characteristics and spatio-temporal features of mangroves, the exploration of knowledge extraction and fusion methods supported by large language models, and the development of intelligent knowledge applications for typical scenarios.
Abstract: Existing visual scene understanding methods mainly focus on identifying coarse-grained concepts about visual objects and their relationships, largely neglecting fine-grained scene understanding. In fact, many data-driven applications on the Web (e.g., news reading and e-shopping) require accurately recognizing much finer-grained concepts as entities and properly linking them to a knowledge graph (KG), which can take their performance to the next level. In light of this, this paper identifies a new research task: visual entity linking for fine-grained scene understanding. To accomplish the task, we first extract features of candidate entities from different modalities, i.e., visual features, textual features, and KG features. Then, we design a deep modal-attention neural-network-based learning-to-rank method that aggregates all features and maps visual objects to entities in the KG. Extensive experimental results on a newly constructed dataset show that the proposed method is effective, significantly improving accuracy from 66.46% to 83.16% compared with baselines.
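A modal-attention scorer of the kind described above can be sketched as follows: per-modality features of a (visual object, candidate entity) pair are weighted by a learned attention distribution and mapped to a single relevance score. The architecture below is an illustrative assumption, not the paper's exact network; in a learning-to-rank setup such a scorer would typically be trained with a pairwise margin loss between the gold entity and negative candidates.

```python
# Hedged sketch: attention over modality features, then a relevance score.
import torch
import torch.nn as nn

D = 64  # per-modality feature dim (visual / textual / KG), illustrative

class ModalAttentionRanker(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(D, 1)    # scores each modality's feature
        self.score = nn.Linear(D, 1)   # maps the fused feature to relevance

    def forward(self, feats):          # feats: (batch, modalities, D)
        a = torch.softmax(self.attn(feats), dim=1)  # modality attention
        fused = (a * feats).sum(dim=1)              # attention-weighted sum
        return self.score(fused).squeeze(-1)        # relevance per candidate

ranker = ModalAttentionRanker()
candidates = torch.randn(10, 3, D)  # 10 candidate entities, 3 modalities each
scores = ranker(candidates)
best = scores.argmax().item()       # candidate chosen for linking
print(scores.shape, best)
```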
Abstract: Multi-modal entity linking plays a crucial role in a wide range of knowledge-based modal-fusion tasks, e.g., multi-modal retrieval and multi-modal event extraction. We introduce the new ZEro-shot Multi-modal Entity Linking (ZEMEL) task, whose format is similar to multi-modal entity linking except that multi-modal mentions are linked to unseen entities in the knowledge graph; the purpose of the zero-shot setting is to realize robust linking in highly specialized domains. At the same time, the inference efficiency of existing models is low when there are many candidate entities. On this account, we propose a novel model that leverages visual-linguistic representation through a co-attentional mechanism to deal with the ZEMEL task, considering the trade-off between model performance and efficiency. We also build a dataset named ZEMELD for the new task, which contains multi-modal data resources collected from Wikipedia, with entities annotated as ground truth. Extensive experimental results on the dataset show that the proposed model is effective, significantly improving precision from 68.93% to 82.62% compared with baselines on the ZEMEL task.
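A co-attentional visual-linguistic encoder in the spirit described above can be sketched with two cross-attention blocks: text tokens attend to image regions and vice versa, and the two pooled streams are concatenated into one representation used to score mention-entity pairs. Shapes and pooling are illustrative assumptions.

```python
# Hedged sketch of co-attention between mention text and image regions.
import torch
import torch.nn as nn

D, HEADS = 128, 4

class CoAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(D, HEADS, batch_first=True)
        self.img2txt = nn.MultiheadAttention(D, HEADS, batch_first=True)

    def forward(self, txt, img):            # txt: (B, T, D), img: (B, R, D)
        t, _ = self.txt2img(txt, img, img)  # text queries image regions
        v, _ = self.img2txt(img, txt, txt)  # regions query text tokens
        # Mean-pool each attended stream and concatenate.
        return torch.cat([t.mean(dim=1), v.mean(dim=1)], dim=-1)  # (B, 2D)

coattn = CoAttention()
tokens = torch.randn(8, 12, D)   # 12 mention-context token features
regions = torch.randn(8, 36, D)  # 36 image region features
joint = coattn(tokens, regions)
print(joint.shape)               # (8, 256)
```

Because the two attention blocks are independent of candidate entities, such a joint representation can be computed once per mention and reused across candidates, which is one way to address the inference-efficiency concern raised in the abstract.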