Abstract
Application Programming Interface (API) usage constraints are conditions or restrictions that developers must follow when invoking APIs to ensure correct usage and prevent misuse. API documentation is an important source for extracting these constraints. Existing Natural Language Processing (NLP)-based methods for extracting API usage constraints typically rely on syntactic patterns, but they handle complex coordinated sentences poorly and impose strict requirements on syntactic structure. To address these issues, this paper proposes AUCK, an API usage constraint knowledge extraction method based on a Large Language Model (LLM). AUCK first preprocesses Java API documentation and extracts the sentences that contain API usage constraints. It then summarizes the syntactic patterns of coordinated sentences and designs corresponding examples to guide the LLM in decomposing coordinated sentences into simple sentences. Finally, it summarizes the triplet syntactic patterns of simple sentences and designs examples to guide the LLM in extracting API usage constraint triplets. Experimental results on Java API documentation show that AUCK achieves an accuracy of 92.23% and a recall of 93.14%, significantly outperforming the existing method DRONE (accuracy: 80.61%, recall: 86.81%), the mainstream triplet extraction tool OpenIE (accuracy: 76.92%, recall: 52.63%), and the large language model ChatGPT-3.5 (accuracy: 82.23%, recall: 67.71%). In addition, experiments applying AUCK to Android and Python API documentation confirm that it transfers well.
Authors
LIU Genhao (刘根壕)
ZHANG Neng (张能)
ZHENG Zibin (郑子彬)
Affiliations: School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, Guangdong, China; School of Computer Science, Central China Normal University, Wuhan 430079, Hubei, China
Source
Computer Engineering (《计算机工程》)
PKU Core Journal (北大核心)
2025, Issue 8, pp. 74-85 (12 pages)
Funding
National Natural Science Foundation of China (62302536, 62032025)
Guangdong Basic and Applied Basic Research Foundation (2023A1515012292)
Keywords
Java API documentation
API usage constraint
Large Language Model(LLM)
parallel sentence decomposition
triplet extraction
knowledge extraction