
Natural Language Retrieval of BCC Corpus Based on Large Language Models
Abstract: Corpora play a vital role in linguistics and natural language processing. The BCC corpus developed by Beijing Language and Culture University is highly regarded for its rich resources and efficient retrieval. However, the complexity of its search query language limits its wider adoption. To address this, the paper proposes the TextToBCC model, which enables retrieval over the BCC corpus in natural language. First, a balanced dataset of BCC search queries is constructed, and a large language model is used to generate a natural language description for each query. Next, a large language model is fine-tuned to convert natural language into BCC search queries. Experimental results demonstrate the strong performance of TextToBCC. This work lowers the barrier to using the BCC corpus, supports its dissemination and application in a wider range of fields, and benefits both linguistic research and natural language processing practice.
Authors: Tingchao Liu (刘廷超), Luming Lu (鲁鹿鸣), Endong Xun (荀恩东), Zeying Jin (靳泽莹), Zhaoyong Yang (杨兆勇)
Affiliation: Beijing Language and Culture University
Source: Corpus Linguistics (《语料库语言学》), 2025, No. 1, pp. 1-16 (16 pages)
Keywords: corpus; search query; large language model; fine-tuning
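The abstract describes a two-step data pipeline: build a balanced set of BCC queries, have a large language model describe each query in natural language, and then fine-tune a model on the reversed direction (description as input, query as target). The sketch below illustrates only that data flow under stated assumptions. It is not the authors' implementation. The query-type labels, the placeholder query strings (not real BCC syntax), the `describe` stub that stands in for the LLM call, and the instruction/input/output record layout are all hypothetical. "Balanced" is interpreted here as equal sampling per query type.

```python
import json
import random
from collections import defaultdict


def balance_by_type(queries, per_type, seed=0):
    """Sample up to `per_type` queries from each query type.

    `queries` is a list of (query_type, query_string) pairs. Types with
    fewer than `per_type` items are kept whole. (Hypothetical notion of
    "balanced": equal counts per type.)
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for qtype, query in queries:
        buckets[qtype].append(query)
    balanced = []
    for qtype in sorted(buckets):  # sorted for deterministic ordering
        items = buckets[qtype]
        k = min(per_type, len(items))
        balanced.extend((qtype, q) for q in rng.sample(items, k))
    return balanced


def describe(query):
    """Hypothetical stub standing in for the LLM that writes a
    natural language description of a BCC query."""
    return f"Find corpus sentences matching the pattern {query}"


def to_sft_records(balanced):
    """Reverse the direction for fine-tuning: the generated description
    is the model input and the original query is the target output."""
    return [
        {
            "instruction": "Convert the request into a BCC search query.",
            "input": describe(query),
            "output": query,
        }
        for _, query in balanced
    ]


# Hypothetical query types and placeholder queries (not real BCC syntax).
raw = [
    ("pos_sequence", "<query-1>"),
    ("pos_sequence", "<query-2>"),
    ("pos_sequence", "<query-3>"),
    ("wildcard", "<query-4>"),
]
records = to_sft_records(balance_by_type(raw, per_type=2))
# One JSON object per line, keeping Chinese text readable.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
print(len(records))  # 3: two sampled pos_sequence queries plus one wildcard
```

Capping each type at the same count keeps frequent query patterns from dominating the training pairs. A real pipeline would replace `describe` with an actual model call and check the generated descriptions for quality.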