People read many things online these days. It's an easy way to get a lot of information fast. They look at news, see posts and watch videos. But how much of the information is true? Some things online are fake. So it's important to check the facts before you believe or share anything. You can ask people or look at other sources first. Check newspapers or official websites. Always think carefully before you believe something online.
Claim verification is the task of determining whether a claim is supported or refuted by evidence. Self-improvement methods, where reasoning chains are generated and those leading to correct results are selected for training, have succeeded in tasks such as mathematical problem solving. However, in claim verification, this approach struggles. Low-quality reasoning chains may falsely match binary truth labels, introducing faulty reasoning into the self-improvement process and ultimately degrading performance. To address this, we propose STRIVE: Structured reasoning for self-improved verification. Our method introduces a structured reasoning design with claim decomposition, entity analysis, and evidence grounding verification. These components improve reasoning quality, reduce errors, and provide additional supervision signals for self-improvement. STRIVE begins with a warm-up phase, where the base model is fine-tuned on a small number of annotated examples to learn the structured reasoning design. It is then applied to generate reasoning chains for all training examples, selecting only those that are correct and structurally sound for subsequent self-improvement training. We demonstrate that STRIVE achieves significant improvements over the baseline models, with a 21.9% performance gain over the normal self-improvement method on the HOVER datasets, highlighting its effectiveness.
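The selection step described in the abstract (keep only reasoning chains that are both correct and structurally sound) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Chain` type, the `REQUIRED_SECTIONS` markers, and the substring-based soundness check are hypothetical and are not taken from the STRIVE paper, which does not specify its chain format or its soundness criteria in the abstract.

```python
from dataclasses import dataclass

# Hypothetical section markers. The abstract names these three components,
# but not how they appear in a generated chain.
REQUIRED_SECTIONS = ("claim decomposition", "entity analysis", "evidence grounding")


@dataclass
class Chain:
    claim_id: str
    text: str        # the generated reasoning chain
    predicted: bool  # final verdict: True = supported, False = refuted


def is_structurally_sound(chain: Chain) -> bool:
    """Assumed check: every required section is mentioned in the chain."""
    lowered = chain.text.lower()
    return all(section in lowered for section in REQUIRED_SECTIONS)


def select_for_training(chains, gold_labels):
    """Keep chains whose verdict matches the gold label AND whose structure is sound."""
    return [
        c for c in chains
        if c.predicted == gold_labels[c.claim_id] and is_structurally_sound(c)
    ]


full = "Claim decomposition: ... Entity analysis: ... Evidence grounding: ..."
chains = [
    Chain("c1", full, True),               # correct and sound: kept
    Chain("c2", "It is true.", True),      # correct label, no structure: dropped
    Chain("c3", full, False),              # sound structure, wrong label: dropped
]
gold = {"c1": True, "c2": True, "c3": True}
selected = select_for_training(chains, gold)
```

The second chain shows the failure mode the abstract points to: with binary labels, a low-quality chain can match the gold label by chance, so filtering on label agreement alone would admit it into the training set.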
Funding: Supported by the National Natural Science Foundation of China (Nos. 62372454, 62141608 and 62576339).