Funding: Funded by the Fundamental Research Funds for the Central Universities, No. CUC23ZDTJ005.
Abstract: News media profiling helps prevent the spread of fake news at the source and maintain a healthy media and news ecosystem. Most previous work extracts features and evaluates media along a single dimension independently, ignoring the interconnections between different aspects. This paper proposes a novel news media bias and factuality profiling framework assisted by correlated features. The framework models the relationship and interaction between media bias and factuality and uses this relationship to assist in predicting profiling results. Our approach extracts features independently while aligning and fusing them through recursive convolution and attention mechanisms, thus harnessing multi-scale interactive information across different dimensions and levels. This method improves the effectiveness of news media evaluation. Experimental results indicate that the proposed framework significantly outperforms existing methods, achieving the best Accuracy and F1 score, with an improvement of at least 1% over other methods. The paper further analyzes and discusses the experimental results.
Funding: Supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 62006167 and 62276177) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: Evidential Document-level Event Factuality Identification (EvDEFI) aims to predict the factual nature of an event and to extract evidential sentences from the document precisely. Previous work was usually limited to predicting the factuality of an event with respect to a document and neglected the interpretability of the task. As a more fine-grained and interpretable task, EvDEFI is still at an early stage. The existing model uses only shallow similarity calculation to extract evidence and employs simple attention without lexical features, which is quite coarse-grained. Therefore, we propose a novel EvDEFI model named Heterogeneous and Extractive Graph Attention Network (HEGAT), which updates representations of events and sentences through multi-view graph attention based on tokens and various lexical features at both local and global levels. Experiments on the EB-DEF-v2 corpus demonstrate that the HEGAT model is superior to several competitive baselines and can validate the interpretability of the task.
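Abstract: In recent years, large language models (LLMs) have made remarkable progress in natural language processing (NLP) and related fields, demonstrating strong language understanding and generation capabilities. In practical applications, however, LLMs still face many challenges. Among these, the hallucination problem has attracted wide attention from academia and industry. Detecting LLM hallucinations effectively has become a key challenge for ensuring reliable, safe, and trustworthy use in downstream tasks such as text generation. This study surveys hallucination detection methods for LLMs. First, it introduces the concept of LLMs, clarifies the definition and classification of hallucinations, systematically reviews the characteristics of each stage of the LLM life cycle from construction to deployment, and analyzes in depth the mechanisms and causes of hallucination. Second, based on practical application needs and considering factors such as differences in model transparency across task scenarios, it divides hallucination detection methods into two categories, those for white-box models and those for black-box models, and reviews and compares them in depth. Next, it analyzes and summarizes the current mainstream hallucination detection benchmarks, laying a foundation for subsequent hallucination detection work. Finally, it points out potential research methods and new challenges for LLM hallucination detection.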
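Abstract: To address the semantic deviation, loss of contextual information, and knowledge security risks that standard retrieval-augmented generation (RAG) models exhibit when processing highly authoritative, structured texts such as national defense white papers, this paper proposes a retrieval-augmented generation method that integrates structure awareness and cross-verification (structure-aware and verification-enhanced RAG, SV-RAG). First, in the knowledge base construction stage, a "structured chunking" method is adopted, which preserves and uses the document's original hierarchical structure metadata while splitting the text. Second, in the knowledge retrieval stage, a dual-track hybrid retrieval mechanism is designed: for the authoritative knowledge base, a query-aware adaptive weighted re-ranking algorithm fuses text similarity and metadata matching to capture contextual meaning precisely; for externally retrieved information, a "dual-stream cross-check" process verifies and integrates the external information against the authoritative knowledge as the reference. Finally, in the answer generation stage, a "quote the source text, then answer precisely" generation paradigm is designed, in which prompt engineering forces the model to quote the core source text before elaborating, so as to safeguard answer faithfulness. To validate the method, a Chinese national defense and security knowledge question-answering dataset containing 494 high-quality question-answer pairs was constructed. Comprehensive experiments on this dataset show that the proposed SV-RAG architecture performs strongly on core metrics such as faithfulness and contextual accuracy, improving over the traditional RAG baseline by 28.0% and 28.2%, respectively.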
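The re-ranking step described in this abstract, which fuses text similarity with metadata matching under a query-dependent weight, can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the `Chunk` type, the token-overlap (Jaccard) similarity, and the rule that sets the metadata weight `alpha` from how many query terms appear in the candidates' headings are all hypothetical and are not the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A text chunk that keeps the headings it was found under (hypothetical type)."""
    text: str
    headings: list[str] = field(default_factory=list)


def tokens(s: str) -> set[str]:
    # Whitespace tokenization is a stand-in for a real tokenizer or embedding model.
    return set(s.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Token-set overlap, used here as a toy similarity measure.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank(query: str, chunks: list[Chunk]) -> list[tuple[float, Chunk]]:
    """Score chunks by a weighted mix of text similarity and heading match."""
    q = tokens(query)
    heading_vocab: set[str] = set()
    for c in chunks:
        heading_vocab |= tokens(" ".join(c.headings))
    # Assumed "query-aware" rule: the more query terms occur in any heading,
    # the more weight metadata gets, capped at 0.5.
    alpha = 0.5 * len(q & heading_vocab) / len(q) if q else 0.0
    scored = []
    for c in chunks:
        text_sim = jaccard(q, tokens(c.text))
        meta_sim = jaccard(q, tokens(" ".join(c.headings)))
        scored.append(((1 - alpha) * text_sim + alpha * meta_sim, c))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


a = Chunk("the army conducts training exercises", ["national defense policy"])
b = Chunk("defense policy is described here", ["military training"])
ranked = rerank("national defense policy", [a, b])
```

In this toy input the query matches the headings of chunk `a` exactly, so `a` ranks above `b` even though the body text of `b` shares more words with the query. A query with no heading overlap falls back to pure text similarity because `alpha` becomes zero.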
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62006167, 62276177, 62376181, and 62376178, the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 24KJB520036, and the Project Funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
Abstract: This paper focuses on document-level event factuality identification (DEFI), which predicts the factual nature of an event from the view of a document. As the document-level sub-task of event factuality identification (EFI), DEFI is a challenging and fundamental task in natural language processing (NLP). Currently, most existing studies focus on sentence-level event factuality identification (SEFI), whereas DEFI is still at an early stage and related studies are quite limited. Previous work depends heavily on various NLP tools and annotated information, e.g., dependency trees, event triggers, and speculative and negative cues, and does not consider filtering irrelevant and noisy texts that can lead to wrong results. To address these issues, this paper proposes a reinforced multi-granularity hierarchical network model, the Reinforced Semantic Learning Network (RSLN), which learns semantics from sentences and tokens at various levels of granularity and hierarchy. Because it is integrated with hierarchical reinforcement learning (HRL), the RSLN model is able to select relevant and meaningful sentences and tokens. RSLN then encodes the event and document according to these selected texts. To evaluate our model, we annotate the ExDLEF corpus, based on the DLEF (Document-Level Event Factuality) corpus, as the benchmark dataset. Experimental results show that the RSLN model outperforms several state-of-the-art baselines.