Journal Articles
2 articles found
1. CMGBench: Benchmarking Chinese Metaphor Generation for Large Language Models
Authors: Yan Liu, Renren Jin, Tianhao Shen, Deyi Xiong. Data Intelligence, 2025, Issue 4, pp. 1270-1290, 21 pages.
Abstract: Can large language models (LLMs) generate metaphor expressions as humans do? To address this question, we present CMGBench, a Chinese Metaphor Generation Benchmark specifically designed to evaluate the ability of LLMs to generate metaphors. CMGBench offers a high-quality dataset comprising 810 examples, 3,354 annotations, and two types of metaphor expressions: direct metaphor expressions and implicit metaphor expressions, the latter of which has received limited attention in previous research. To assess the quality of metaphors generated by LLMs, we introduce three evaluation criteria. The first criterion measures the disparity between the vehicles in the LLM-generated metaphors and those used by humans. The second criterion evaluates whether an LLM-generated metaphor contains unconventional semantic collocations. The third criterion calculates the proportion of implicit metaphor expressions within the LLM-generated metaphors. We conducted extensive experiments on both proprietary and open-source LLMs. The results demonstrate that, compared to human-generated metaphors, LLM-generated metaphors display a lack of variety in using the attributes of a vehicle, show limited innovation in semantic collocation, and tend to use direct expressions. Even top-performing LLMs such as GPT-4, when not explicitly prompted to generate implicit metaphors, produced them at a rate of only 23.9%. This highlights a significant gap between human and LLM capabilities in metaphor generation.
Keywords: Large language model; Benchmark for large language models; Metaphor generation; Metaphor datasets; Metaphor generation evaluation
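The third CMGBench criterion above is a simple ratio over the generated outputs. A minimal sketch of how such a proportion could be computed, assuming each generated metaphor has already been annotated as "implicit" or "direct" (hypothetical labels; the paper's own annotation scheme is not reproduced here):

```python
def implicit_metaphor_rate(labels):
    """Return the fraction of generated metaphors labeled implicit.

    `labels` is a list of strings, each either "implicit" or "direct"
    (hypothetical label names; CMGBench's actual annotation scheme
    may differ).
    """
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == "implicit") / len(labels)


# Illustrative sample matching the abstract's GPT-4 figure of 23.9%
sample = ["implicit"] * 239 + ["direct"] * 761
print(f"{implicit_metaphor_rate(sample):.1%}")  # → 23.9%
```

The metric itself is just a labeled-count ratio; the substantive work in the benchmark lies in the human annotation that produces the implicit/direct labels.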
2. Transforming Healthcare with State-of-the-Art Medical-LLMs: A Comprehensive Evaluation of Current Advances Using Benchmarking Framework
Authors: Himadri Nath Saha, Dipanwita Chakraborty Bhattacharya, Sancharita Dutta, Arnab Bera, Srutorshi Basuray, Satyasaran Changdar, Saptarshi Banerjee, Jon Turdiev. Computers, Materials & Continua, 2026, Issue 2, pp. 234-289, 56 pages.
Abstract: The emergence of Medical Large Language Models (Med-LLMs) has significantly transformed healthcare. Med-LLMs serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets. It provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
Keywords: Medical large language models (Med-LLM); AI in healthcare; natural language processing (NLP) in medicine; fine-tuning medical LLMs; retrieval-augmented generation (RAG) in medicine; multi-modal learning in healthcare; explainability and transparency in medical AI; FDA regulations for AI in medicine; evaluation and benchmarking of medical large language models