Abstract
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge, leading to significant improvements in both factual accuracy and task performance. However, existing dense retrievers face considerable challenges when handling numerical constraints, particularly in queries requiring precise filtering conditions. To systematically explore these issues, we introduce Numerical Constraint Question (NumConQ), a comprehensive multi-domain benchmark dataset that contains more than 6,500 queries covering healthcare, finance, education, sports, and movies. Empirical analysis reveals that state-of-the-art dense retrievers achieve only 16.3% accuracy in numerical constraint satisfaction, significantly underperforming relative to their semantic matching capabilities. To address these limitations, we propose the Numerical Constraint-aware Retriever (NC-Retriever), which features: (1) a two-phase contrastive learning framework that combines in-batch negative sampling with progressively introduced hard negatives, and (2) a hybrid numerical representation scheme for consistent tokenization. Extensive experiments show that NC-Retriever achieves a relative improvement of 65.84% in recall@10 and a 78.28% increase in precision@10 compared to current state-of-the-art methods. The code and benchmark dataset are available at https://github.com/Tongji-KGLLM/NumConQ.
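The two-phase training described above can be illustrated with a minimal InfoNCE-style contrastive loss sketch. This is an assumption about the general technique (in-batch negatives first, per-query hard negatives added later), not the authors' actual implementation; the function name, tensor shapes, and temperature value are illustrative only.

```python
import numpy as np

def info_nce_loss(q, p, hard_negs=None, temperature=0.05):
    """Contrastive loss over a batch of query / positive-passage embeddings.

    Phase 1: only in-batch negatives (every other positive in the batch).
    Phase 2: additionally score per-query hard negatives of shape (B, K, D),
    e.g. passages that match semantically but violate the numerical constraint.
    """
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    p = p / np.linalg.norm(p, axis=-1, keepdims=True)
    logits = (q @ p.T) / temperature                              # (B, B)
    if hard_negs is not None:
        hn = hard_negs / np.linalg.norm(hard_negs, axis=-1, keepdims=True)
        hn_logits = np.einsum("bd,bkd->bk", q, hn) / temperature  # (B, K)
        logits = np.concatenate([logits, hn_logits], axis=1)      # (B, B+K)
    # cross-entropy with each query's own positive (the diagonal) as gold
    log_z = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_z - np.diag(logits[:, : len(q)])))
```

Because the hard negatives only enlarge the softmax denominator, the phase-two loss for a fixed batch is strictly larger than the phase-one loss, which is what makes progressively introducing them a curriculum from easy to hard.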
Funding
supported by the National Natural Science Foundation of China (Nos. 62276063, U23B2057, and 62176185)
the Natural Science Foundation of Jiangsu Province (No. BK20221457)
the Natural Science Foundation of Beijing Municipality (No. L247008)
the Tongji University Innovative Design and Intelligent Manufacturing Discipline Group Project
the Tongji University Construction Project of the National Artificial Intelligence Industry-Academia Collaborative Innovation Platform
the Tongji University 2023 Interdisciplinary Collaborative Research Project.