Abstract
This study investigates the task of clickbait detection in social media posts. Such posts often use deceptive headlines or thumbnails to mislead readers into clicking on unrelated, low-quality content, enabling wide dissemination and commercial gains such as increased click-through traffic. To evade detection, malicious creators frequently disguise clickbait posts as legitimate ones, for example by adding irrelevant but plausible content to confuse the detector. Detecting such posts requires careful scrutiny of details and even multi-step reasoning with external commonsense knowledge. However, existing methods typically treat a post as a bag of tokens and simply feed it into a neural network for classification, neglecting analysis of the false details hidden within, which leads to missed and false detections. Moreover, such black-box models lack explainability. To address these problems, a new question-guided detector is proposed that analyzes details one by one through a doubt-then-verify process to uncover potential inconsistencies and falsehoods. Specifically, a multi-modal retrieval-augmented technique first extracts detailed clues from the post, and each clue is then challenged by generated questions. To fully verify facts and their complex relationships, the questions cover both simple shallow-matching queries and higher-order queries requiring deep commonsense reasoning. Each question has a literal answer that can be found in the post, but that answer may be fabricated or inaccurate. An open-domain question-answering model is therefore used to cross-verify it, leveraging external knowledge sources to derive a more reliable answer. When the two answers differ, the post is likely to contain false content. This inconsistency serves as an effective feature and is combined with other multi-modal semantic features to improve the discriminative power and robustness of the detection model. In addition, decomposing the complex detection task into a series of question-answering steps makes it easy to locate the inconsistent details that explain why a post is clickbait. Extensive experiments on three popular datasets demonstrate the effectiveness of the proposed approach.
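The doubt-then-verify idea in the abstract can be illustrated with a minimal sketch: for each generated question, compare the literal answer read from the post with an answer cross-verified against external knowledge, and turn the mismatches into an inconsistency feature. All function names and the toy QA models below are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple

def inconsistency_features(
    post: str,
    questions: List[str],
    read_answer: Callable[[str, str], str],  # literal answer taken from the post
    verify_answer: Callable[[str], str],     # open-domain QA over external knowledge
) -> Tuple[List[int], float]:
    """Flag each question whose post-derived answer disagrees with the
    externally verified answer (1 = mismatch), and return the mismatch rate,
    which can serve as a feature alongside other multi-modal features."""
    flags = [
        int(read_answer(post, q).strip().lower() != verify_answer(q).strip().lower())
        for q in questions
    ]
    score = sum(flags) / len(flags) if flags else 0.0
    return flags, score

# Toy stand-ins for the two QA models (hypothetical data for illustration).
post = "NASA confirms the Moon is made of cheese, photos inside!"
reader = lambda p, q: {"What is the Moon made of?": "cheese"}.get(q, "unknown")
external = lambda q: {"What is the Moon made of?": "rock"}.get(q, "unknown")

flags, score = inconsistency_features(post, ["What is the Moon made of?"], reader, external)
# The mismatch ("cheese" vs. "rock") marks a suspicious detail in the post.
```

In the actual method, the reader and verifier would be neural reading-comprehension and retrieval-augmented QA models; the per-question flags also indicate which details are inconsistent, supporting the explainability claimed in the abstract.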
Authors
YU Jian-Xing, WANG Shi-Qi, CHEN Qi, LAI Han-Jiang, RAO Yang-Hui, SU Qin-Liang, YIN Jian
(School of Artificial Intelligence, Sun Yat-sen University, Zhuhai 519082, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; Key Laboratory of Intelligent Assessment Technology for Sustainable Tourism, Ministry of Culture and Tourism (Sun Yat-sen University), Zhuhai 510006, China; Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), Guangzhou 510330, China)
Source
Journal of Software (《软件学报》), a Peking University Core Journal, 2025, No. 12, pp. 5720-5738 (19 pages)
Funding
National Natural Science Foundation of China (62276279, 62372483, 62276280, U2001211, U22B2060); Guangdong Basic and Applied Basic Research Foundation (2024B1515020032); Guangzhou Science and Technology Program (2023B01J0001, 2024B01W0004)
Keywords
clickbait detection; commonsense reasoning; question generation