Abstract
[Objective] This study applies large language model technology to the automatic summarization of legal texts. It addresses weaknesses of traditional methods, such as poor handling of long texts and weak logical coherence in the generated summaries. [Methods] We propose an automatic legal text summarization method based on fine-tuning large language models. First, we construct an instruction dataset for legal text summarization. Second, we explore two data augmentation strategies: instruction augmentation and result augmentation. Finally, we perform domain-specific fine-tuning on a pre-trained model and evaluate the results along multiple dimensions. [Results] On the CAIL2020 Judicial Summary Dataset, our method improves the ROUGE-1, ROUGE-2, and ROUGE-L F1 scores by 13.8, 21.3, and 7.4 percentage points, respectively, over the best baseline. Both human evaluation and automated evaluation further confirm the effectiveness of our approach across all evaluated dimensions. [Limitations] For legal texts that are dense with technical terminology or have complex logical structures, the generated summaries still fall short in detail and in the accuracy of the legal provisions they cite. [Conclusions] Fine-tuning large language models can effectively improve the quality of legal text summarization.
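The [Results] statement reports gains in ROUGE-1, ROUGE-2, and ROUGE-L F1. The following is a minimal Python sketch of how these F1 scores are typically computed, assuming character-level tokenization (common for Chinese text, but not stated in the abstract) and plain F1 (β = 1). All function names are hypothetical; this is not the authors' evaluation script or an official ROUGE toolkit, whose preprocessing and averaging may differ. A "percentage point" gain is the absolute difference between two such F1 values multiplied by 100.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if cand_total == 0 or ref_total == 0 or overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1 from clipped n-gram overlap (Counter intersection takes min counts)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: LCS length used as the overlap count."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Hypothetical example; character-level tokens (an assumption for Chinese text).
    candidate = list("被告人张某犯盗窃罪")
    reference = list("被告人张某犯抢劫罪")
    print(round(rouge_n_f1(candidate, reference, 1), 4))  # 0.7778
    print(round(rouge_n_f1(candidate, reference, 2), 4))  # 0.625
    print(round(rouge_l_f1(candidate, reference), 4))     # 0.7778
```

ROUGE-2 drops more sharply than ROUGE-1 here because a single substituted word breaks every bigram that spans it, which is one reason the three metrics can move by quite different amounts.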
Authors
Zhu Danhao (朱丹浩); Huang Xiaoyu (黄肖宇); Li Yaolin (李堯霖); Wang Dongbo (王东波)
Affiliations: Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031, China; Department of Computer Information and Network Security, Jiangsu Police Institute, Nanjing 210031, China; Department of Information Management and Information Systems, Nanjing University of Science and Technology, Nanjing 210094, China; School of Information Management, Nanjing Agricultural University, Nanjing 210095, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
Peking University Core Journal (北大核心)
2025, Issue 6, pp. 35-46 (12 pages)
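Funding
National Social Science Fund of China (Grant No. 21&ZD331)
One of the research outcomes of the Qing Lan Project of Jiangsu universities (江苏高校“青蓝工程”)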
Keywords
Legal Texts
Automatic Summarization Techniques
Large Language Models
Instruction Dataset
Domain-Specific Fine-Tuning