Analyzing Research and Development (R&D) trends is important because it can influence future decisions regarding R&D direction. In typical trend analysis, topic or technology taxonomies are employed to compute the popularity of topics or codes over time. Although this approach is simple and effective, the taxonomies are difficult to maintain because new technologies are introduced rapidly. Therefore, recent studies exploit deep learning to extract pre-defined targets such as problems and solutions. Based on recent advances in question answering (QA) using deep learning, we adopt a multi-turn QA model to extract problems and solutions from Korean R&D reports. Building on previous research, we use the reports directly and analyze the difficulties of handling them with the QA-style Information Extraction (IE) developed for sentence-level benchmark datasets. After investigating the characteristics of Korean R&D reports, we propose a model that deals with multiple and repeated appearances of targets in the reports. The model includes an algorithm with two novel modules and a prompt. The newly proposed methodology focuses on reformulating a question without a static template or pre-defined knowledge. We show the effectiveness of the proposed model using a Korean R&D report dataset that we constructed, and we present an in-depth analysis of the benefits of the multi-turn QA model.
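This paper proposes an efficient method for evaluating the instruction-following and multi-turn dialogue capabilities of Chinese large language models (LLMs), and constructs the Chinese Multiturn Instruction Following Benchmark (CMIF). The study designs an atomic instruction dataset specific to Chinese, covering characteristics such as linguistic structure, pinyin, and tones, and combines rules with an LLM to re-check the validity of multi-turn questions, ensuring the accuracy of the evaluation results. In the experiments, 14 open-source and closed-source models, including GPT4o and Qwen2.5-72B-Instruct, were evaluated. The results show that mainstream models follow instructions well in single-turn dialogue, but their multi-turn performance still leaves considerable room for improvement. Claude-3.5-Sonnet, which has the highest single-turn instruction-level accuracy, sees its accuracy drop from 73.8% to 40.0% in the multi-turn setting. In addition, these models show a clear performance decline when handling Chinese atomic instructions: the highest overall accuracy on the Chinese tasks is only 51.0%, significantly below the 79.0% average overall accuracy on the other four task categories.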
Funding: National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1G1A1003312) and the Ministry of Education (NRF-2021R1I1A3052815).