Journal articles
2,969 articles found
Large language models for robotics: Opportunities, challenges, and perspectives (cited by: 3)
1
Authors: Jiaqi Wang, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Bao Ge, Shu Zhang. Journal of Automation and Intelligence, 2025, Issue 1, pp. 52-64 (13 pages)
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights towards bridging the gap in Human-Robot-Environment interaction.
Keywords: large language models; robotics; generative AI; embodied intelligence
Evaluating research quality with Large Language Models: An analysis of ChatGPT’s effectiveness with different settings and inputs (cited by: 1)
2
Author: Mike Thelwall. Journal of Data and Information Science, 2025, Issue 1, pp. 7-25 (19 pages)
Purpose: Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. Design/methodology/approach: This article assesses which ChatGPT inputs (full text without tables, figures, and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. Findings: The optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66). Research limitations: The data is a convenience sample of the work of a single author, it only includes one field, and the scores are self-evaluations. Practical implications: The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into the human scale scores, which is 31% more accurate than guessing. Originality/value: This is the first systematic comparison of the impact of different prompts, parameters and inputs for ChatGPT research quality evaluations.
Keywords: ChatGPT; large language models; LLMs; scientometrics; research assessment
On large language models safety, security, and privacy: A survey (cited by: 1)
3
Authors: Ran Zhang, Hong-Wei Li, Xin-Yuan Qian, Wen-Bo Jiang, Han-Xiao Chen. Journal of Electronic Science and Technology, 2025, Issue 1, pp. 1-21 (21 pages)
The integration of artificial intelligence (AI) technology, particularly large language models (LLMs), has become essential across various sectors due to their advanced language comprehension and generation capabilities. Despite their transformative impact in fields such as machine translation and intelligent dialogue systems, LLMs face significant challenges. These challenges include safety, security, and privacy concerns that undermine their trustworthiness and effectiveness, such as hallucinations, backdoor attacks, and privacy leakage. Previous works often conflated safety issues with security concerns. In contrast, our study provides clearer and more reasonable definitions for safety, security, and privacy within the context of LLMs. Building on these definitions, we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety, security, and privacy in LLMs. Additionally, we explore the unique research challenges posed by LLMs and suggest potential avenues for future research, aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.
Keywords: large language models; privacy issues; safety issues; security issues
When Software Security Meets Large Language Models: A Survey (cited by: 1)
4
Authors: Xiaogang Zhu, Wei Zhou, Qing-Long Han, Wanlun Ma, Sheng Wen, Yang Xiang. IEEE/CAA Journal of Automatica Sinica, 2025, Issue 2, pp. 317-334 (18 pages)
Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, software testing and analysis are two of the critical methods, which significantly benefit from the advancements in deep learning technologies. Due to the successful use of deep learning in software security, researchers have recently explored the potential of using large language models (LLMs) in this area. In this paper, we systematically review the results focusing on LLMs in software security. We analyze the topics of fuzzing, unit test, program repair, bug reproduction, data-driven bug detection, and bug triage. We deconstruct these techniques into several stages and analyze how LLMs can be used in the stages. We also discuss the future directions of using LLMs in software security, including the future directions for the existing use of LLMs and extensions from conventional deep learning research.
Keywords: large language models (LLMs); software analysis; software security; software testing
Evaluating large language models as patient education tools for inflammatory bowel disease: A comparative study (cited by: 1)
5
Authors: Yan Zhang, Xiao-Han Wan, Qing-Zhou Kong, Han Liu, Jun Liu, Jing Guo, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li. World Journal of Gastroenterology, 2025, Issue 6, pp. 34-43 (10 pages)
BACKGROUND: Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated. AIM: To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD. METHODS: In this comparative study, two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy, comprehensibility, and correlation. Simultaneously, three patients were invited to evaluate the comprehensibility of their answers. Finally, a readability assessment was performed. RESULTS: Overall, each of the LLMs achieved satisfactory levels of accuracy, comprehensibility, and completeness when answering IBD-related questions, although their performance varied. All of the investigated models demonstrated strengths in providing basic disease information such as the definition of IBD as well as its common symptoms and diagnostic methods. Nevertheless, when dealing with more complex medical advice, such as medication side effects, dietary adjustments, and complication risks, the quality of answers was inconsistent between the LLMs. Notably, Claude-3-Opus generated answers with better readability than the other two models. CONCLUSION: LLMs have potential as educational tools for patients with IBD; however, there are discrepancies between the models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
Keywords: inflammatory bowel disease; large language models; patient education; medical information accuracy; readability assessment
The Security of Using Large Language Models: A Survey With Emphasis on ChatGPT (cited by: 1)
6
Authors: Wei Zhou, Xiaogang Zhu, Qing-Long Han, Lin Li, Xiao Chen, Sheng Wen, Yang Xiang. IEEE/CAA Journal of Automatica Sinica, 2025, Issue 1, pp. 1-26 (26 pages)
ChatGPT is a powerful artificial intelligence (AI) language model that has demonstrated significant improvements in various natural language processing (NLP) tasks. However, like any technology, it presents potential security risks that need to be carefully evaluated and addressed. In this survey, we provide an overview of the current state of research on security of using ChatGPT, with aspects of bias, disinformation, ethics, misuse, attacks and privacy. We review and discuss the literature on these topics and highlight open research questions and future directions. Through this survey, we aim to contribute to the academic discourse on AI security, enriching the understanding of potential risks and mitigations. We anticipate that this survey will be valuable for various stakeholders involved in AI development and usage, including AI researchers, developers, policy makers, and end-users.
Keywords: artificial intelligence (AI); ChatGPT; large language models (LLMs); security
Rethinking Chart Understanding Using Multimodal Large Language Models
7
Authors: Andreea-Maria Tanasa, Simona-Vasilica Oprea. Computers, Materials & Continua, 2025, Issue 8, pp. 2905-2933 (29 pages)
Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, and the correlation between different parts of the documents, as well as domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving to be essential to analyze content and answer questions accurately. This paper aims to evaluate the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art visual language models, is used to identify the relevant page containing the appropriate chart for each question. Google’s Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visual document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organisation and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected due to their multimodal nature, combining dense textual analysis with varied visual representations, thus presenting realistic challenges for vision-language models. This comparison is aimed at assessing how advanced models perform with different input formats and at determining whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
Keywords: chart understanding; large language models; multimodal models; PDF extraction
A Critical Review of Methods and Challenges in Large Language Models
8
Authors: Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari. Computers, Materials & Continua, 2025, Issue 2, pp. 1681-1698 (18 pages)
This critical review provides an in-depth analysis of Large Language Models (LLMs), encompassing their foundational principles, diverse applications, and advanced training methodologies. We critically examine the evolution from Recurrent Neural Networks (RNNs) to Transformer models, highlighting the significant advancements and innovations in LLM architectures. The review explores state-of-the-art techniques such as in-context learning and various fine-tuning approaches, with an emphasis on optimizing parameter efficiency. We also discuss methods for aligning LLMs with human preferences, including reinforcement learning frameworks and human feedback mechanisms. The emerging technique of retrieval-augmented generation, which integrates external knowledge into LLMs, is also evaluated. Additionally, we address the ethical considerations of deploying LLMs, stressing the importance of responsible and mindful application. By identifying current gaps and suggesting future research directions, this review provides a comprehensive and critical overview of the present state and potential advancements in LLMs. This work serves as an insightful guide for researchers and practitioners in artificial intelligence, offering a unified perspective on the strengths, limitations, and future prospects of LLMs.
Keywords: large language models; artificial intelligence; natural language processing; machine learning; generative artificial intelligence
Adapting High-Level Language Programming(C Language)Education in the Era of Large Language Models
9
Authors: Baokai Zu, Hongyuan Wang, Hongli Chen, Yafang Li. Journal of Contemporary Educational Research, 2025, Issue 5, pp. 264-269 (6 pages)
With the widespread application of large language models (LLMs) in natural language processing and code generation, traditional High-Level Language Programming courses are facing unprecedented challenges and opportunities. As a core programming language for computer science majors, C language remains irreplaceable due to its foundational nature and engineering adaptability. This paper, based on the rapid development of large model technologies, proposes a systematic reform design for C language teaching, focusing on teaching objectives, content structure, teaching methods, and evaluation systems. The article suggests a teaching framework centered on “human-computer collaborative programming,” integrating prompt training, AI-assisted debugging, and code generation analysis, aiming to enhance students’ problem modeling ability, programming expression skills, and AI collaboration literacy.
Keywords: large language models (LLMs); high-level language programming; C language; human-computer collaborative programming
Large language models in ophthalmology: a bibliometric analysis
10
Authors: Ruyue Shen, Eunice See Heng Lee, Xiaoyan Hu, Clement C. Tham, Carol Y. Cheung. Eye Science, 2025, Issue 3, pp. 222-237 (16 pages)
Background: With the rapid development of artificial intelligence (AI), large language models (LLMs) have emerged as a potent tool for invigorating ophthalmology across clinical, educational, and research fields. Their accuracy and reliability have been tested. This bibliometric analysis aims to provide an overview of research on LLMs in ophthalmology from both thematic and geographical perspectives. Methods: All existing and highly cited LLM-related ophthalmology research papers published in English up to 24th April 2025 were sourced from Scopus, PubMed, and Web of Science. The characteristics of these publications, including publication output, authors, journals, countries, institutions, citations, and research domains, were analyzed using Biblioshiny and VOSviewer software. Results: A total of 277 articles from 1,459 authors and 89 journals were included in this study. Although relevant publications began to appear in 2019, there was a significant increase starting from 2023. He M and Shi D are the most prolific authors, while Investigative Ophthalmology & Visual Science stands out as the most prominent journal. Most of the top-publishing countries are high-income economies, with the USA taking the lead, and the University of California is the leading institution. VOSviewer identified 5 clusters in the keyword co-occurrence analysis, indicating that current research focuses on the clinical applications of LLMs, particularly in diagnosis and patient education. Conclusions: While LLMs have demonstrated effectiveness in retaining knowledge, their accuracy in image-based diagnosis remains limited. Therefore, future research should investigate fine-tuning strategies and domain-specific adaptations to close this gap. Although research on the applications of LLMs in ophthalmology is still in its early stages, it holds significant potential for advancing the field.
Keywords: artificial intelligence; large language models; ophthalmology; bibliometric analysis
Potential role of large language models and personalized medicine to innovate cardiac rehabilitation
11
Authors: Rishith Mishra, Hersh Patel, Aleena Jamal, Som Singh. World Journal of Clinical Cases, 2025, Issue 19, pp. 1-4 (4 pages)
Cardiac rehabilitation is a crucial multidisciplinary approach to improve patient outcomes. There is a growing body of evidence that suggests that these programs contribute towards reducing cardiovascular mortality and recurrence. Despite this, cardiac rehabilitation is underutilized and adherence to these programs has been a demonstrated barrier in achieving these outcomes. As a result, there is a growing focus on innovating these programs, especially from the standpoint of digital health and personalized medicine. This editorial discusses the possible roles of large language models, such as their role in ChatGPT, in further personalizing cardiac rehabilitation programs through simplifying medical jargon and employing motivational interviewing techniques, thus boosting patient engagement and adherence. However, these possibilities must be further investigated in the clinical literature. Likewise, the integration of large language models in cardiac rehabilitation will be challenging in its nascent stages to ensure accurate and ethical information delivery.
Keywords: cardiac rehabilitation; large language models; patient education; motivational interviewing; artificial intelligence
Quantitative Assessment of Generative Large Language Models on Design Pattern Application
12
Author: Dae-Kyoo Kim. Computers, Materials & Continua, 2025, Issue 3, pp. 3843-3872 (30 pages)
Design patterns offer reusable solutions for common software issues, enhancing quality. The advent of generative large language models (LLMs) marks progress in software development, but their efficacy in applying design patterns is not fully assessed. The recent introduction of generative large language models (LLMs) like ChatGPT and CoPilot has demonstrated significant promise in software development. They assist with a variety of tasks including code generation, modeling, bug fixing, and testing, leading to enhanced efficiency and productivity. Although initial uses of these LLMs have had a positive effect on software development, their potential influence on the application of design patterns remains unexplored. This study introduces a method to quantify LLMs’ ability to implement design patterns, using Role-Based Metamodeling Language (RBML) for a rigorous specification of the pattern’s problem, solution, and transformation rules. The method evaluates the pattern applicability of a software application using the pattern’s problem specification. If deemed applicable, the application is input to the LLM for pattern application. The resulting application is assessed for conformance to the pattern’s solution specification and for completeness against the pattern’s transformation rules. Evaluating the method with ChatGPT 4 across three applications reveals ChatGPT’s high proficiency, achieving averages of 98% in conformance and 87% in completeness, thereby demonstrating the effectiveness of the method. Using RBML, this study confirms that LLMs, specifically ChatGPT 4, have great potential in effective and efficient application of design patterns with high conformance and completeness. This opens avenues for further integrating LLMs into complex software engineering processes.
Keywords: design patterns; large language models; pattern application; pattern-based refactoring; quantitative assessment
Large Language Models in Software Engineering Education: A Preliminary Study on Software Requirements Engineering Courses
13
Authors: Feng Chen, Shaomin Zhu, Xin Liu, Ying Qian. Computer Education (《计算机教育》), 2025, Issue 3, pp. 24-33 (10 pages)
The advent of large language models (LLMs) has made knowledge acquisition and content creation increasingly easier and cheaper, which in turn redefines learning and urges transformation in software engineering education. To do so, there is a need to understand the impact of LLMs on software engineering education. In this paper, we conducted a preliminary case study on three software requirements engineering classes where students are allowed to use LLMs to assist in their projects. Based on the students’ experience, performance, and feedback from a survey conducted at the end of the courses, we characterized the challenges and benefits of applying LLMs in software engineering education. This research contributes to the ongoing discourse on the integration of LLMs in education, emphasizing both their prominent potential and the need for balanced, mindful usage.
Keywords: large language models; software engineering; software requirements engineering; education
Exploring the Empowerment of International Chinese Language Teachers through Large Language Models
14
Authors: Dongqing Liu, Xiaodie Liu. Journal of Contemporary Educational Research, 2025, Issue 7, pp. 176-182 (7 pages)
Generative artificial intelligence, represented by large language models, holds vast application scenarios and significant development potential in the field of language teaching. This study employs large language models such as ChatGPT-4o, ERNIE Bot, and Spark Cognition to explore how they empower teachers in international Chinese language teaching through practical cases. It focuses on various aspects of international Chinese language teaching and language skills training, examining the application effects of large language models in generating tailored teaching content and converting textual content into multimodal teaching materials. Finally, the study proposes that teachers should rationally recognize the opportunities and challenges that large language models bring to the teaching ecosystem. While acknowledging the models’ efficiency in empowering teachers’ instruction, it is crucial to fully recognize their essential nature as tools, uphold teachers’ subjectivity, and pay close attention to the boundaries of their development and application.
Keywords: large language models; empowerment; teacher instruction
Assessing the proficiency of large language models on funduscopic disease knowledge
15
Authors: Jun-Yi Wu, Yan-Mei Zeng, Xian-Zhe Qian, Qi Hong, Jin-Yu Hu, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Yi Shao. International Journal of Ophthalmology (English edition), 2025, Issue 7, pp. 1205-1213 (9 pages)
AIM: To assess the performance of five distinct large language models (LLMs; ChatGPT-3.5, ChatGPT-4, PaLM2, Claude 2, and SenseNova) in comparison to two human cohorts (a group of funduscopic disease experts and a group of ophthalmologists) on the specialized subject of funduscopic disease. METHODS: Five distinct LLMs and two distinct human groups independently completed a 100-item funduscopic disease test. The performance of these entities was assessed by comparing their average scores, response stability, and answer confidence, thereby establishing a basis for evaluation. RESULTS: Among all the LLMs, ChatGPT-4 and PaLM2 exhibited the most substantial average correlation. Additionally, ChatGPT-4 achieved the highest average score and demonstrated the utmost confidence during the exam. In comparison to human cohorts, ChatGPT-4 exhibited comparable performance to ophthalmologists, albeit falling short of the expertise demonstrated by funduscopic disease specialists. CONCLUSION: The study provides evidence of the exceptional performance of ChatGPT-4 in the domain of funduscopic disease. With continued enhancements, validated LLMs have the potential to yield unforeseen advantages in enhancing healthcare for both patients and physicians.
Keywords: large language models; ChatGPT; funduscopic disease
Large language models’ performances regarding common patient questions about osteoarthritis: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Perplexity
16
Authors: Mingde Cao, Qianwen Wang, Xueyou Zhang, Zuru Liang, Jihong Qiu, Patrick Shu-Hang Yung, Michael Tim-Yun Ong. Journal of Sport and Health Science, 2025, Issue 4, pp. 3-10 (8 pages)
Background: Large Language Models (LLMs) have gained much attention and, in part, have replaced common search engines as a popular channel for obtaining information due to their contextually relevant responses. Osteoarthritis (OA) is a common topic in skeletal muscle disorders, and patients often seek information about it online. Our study evaluated the ability of 3 LLMs (ChatGPT-3.5, ChatGPT-4.0, and Perplexity) to accurately answer common OA-related queries. Methods: We defined 6 themes (pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis) based on a generalization of 25 frequently asked questions about OA. Three consultant-level orthopedic specialists independently rated the LLMs’ replies on a 4-point accuracy scale. The final ratings for each response were determined using a majority consensus approach. Responses classified as “satisfactory” were evaluated for comprehensiveness on a 5-point scale. Results: ChatGPT-4.0 demonstrated superior accuracy, with 64% of responses rated as “excellent”, compared to 40% for ChatGPT-3.5 and 28% for Perplexity (Pearson’s χ² test with Fisher’s exact test, all p < 0.001). All 3 LLM-chatbots had high mean comprehensiveness ratings (Perplexity = 3.88; ChatGPT-4.0 = 4.56; ChatGPT-3.5 = 3.96, out of a maximum score of 5). The LLM-chatbots performed reliably across domains, except for “treatment and prevention.” However, ChatGPT-4.0 still outperformed ChatGPT-3.5 and Perplexity, garnering 53.8% “excellent” ratings (Pearson’s χ² test with Fisher’s exact test, all p < 0.001). Conclusion: Our findings underscore the potential of LLMs, specifically ChatGPT-4.0 and Perplexity, to deliver accurate and thorough responses to OA-related queries. Targeted correction of specific misconceptions to improve the accuracy of LLMs remains crucial.
Keywords: large language models; osteoarthritis; primary care
Assessing the possibility of using large language models in ocular surface diseases
17
Authors: Qian Ling, Zi-Song Xu, Yan-Mei Zeng, Qi Hong, Xian-Zhe Qian, Jin-Yu Hu, Chong-Gang Pei, Hong Wei, Jie Zou, Cheng Chen, Xiao-Yu Wang, Xu Chen, Zhen-Kai Wu, Yi Shao. International Journal of Ophthalmology (English edition), 2025, Issue 1, pp. 1-8 (8 pages)
AIM: To assess the possibility of using different large language models (LLMs) in ocular surface diseases by selecting five different LLMs to test their accuracy in answering specialized questions related to ocular surface diseases: ChatGPT-4, ChatGPT-3.5, Claude 2, PaLM2, and SenseNova. METHODS: A group of experienced ophthalmology professors were asked to develop a 100-question single-choice exam on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions. The exam includes questions on the following topics: keratitis disease (20 questions); keratoconus, keratomalacia, corneal dystrophy, corneal degeneration, erosive corneal ulcers, and corneal lesions associated with systemic diseases (20 questions); conjunctivitis disease (20 questions); trachoma, pterygium and conjunctival tumor diseases (20 questions); and dry eye disease (20 questions). The total score of each LLM was then calculated, and their mean scores, mean correlations, variances, and confidence were compared. RESULTS: GPT-4 exhibited the highest performance among the LLMs. A comparison of the average scores of the LLMs with those of the four human groups (chief physicians, attending physicians, regular trainees, and graduate students) showed that, except for ChatGPT-4, the total scores of the LLMs were lower than that of the graduate student group, which had the lowest score among the human groups. Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers, with very little chance of an incorrect answer. ChatGPT-4 showed higher credibility when answering questions, with a success rate of 59%, but gave the wrong answer 28% of the time. CONCLUSION: The GPT-4 model exhibits excellent performance in both answer relevance and confidence. PaLM2 shows a positive correlation (up to 0.8) in terms of answer accuracy during the exam. In terms of answer confidence, PaLM2 is second only to GPT-4 and surpasses Claude 2, SenseNova, and GPT-3.5. Although ocular surface disease is a highly specialized discipline, GPT-4 still exhibits superior performance, suggesting considerable potential for application in this field, perhaps as a valuable resource for medical students and clinicians in the future.
Keywords: ChatGPT-4.0; ChatGPT-3.5; large language models; ocular surface diseases
Large language models in clinical psychiatry:Applications and optimization strategies
18
Authors: Yi-Fan Wang, Ming-Da Li, Su-Hong Wang, Yin Fang, Jie Sun, Lin Lu, Wei Yan. World Journal of Psychiatry, 2025, No. 11, pp. 90-100 (11 pages)
Psychiatric disorders constitute a complex health issue, primarily manifesting as significant disturbances in cognition, emotional regulation, and behavior. However, due to limited resources within health care systems, only a minority of patients can access effective treatment and care services, highlighting an urgent need for improvement. Large language models (LLMs), with their natural language understanding and generation capabilities, are gradually penetrating the entire process of psychiatric diagnosis and treatment, including outpatient reception, diagnosis and therapy, clinical nursing, medication safety, and prognosis follow-up. They hold promise for easing the current severe shortage of health system resources and promoting equal access to mental health care. This article reviews the application scenarios and research progress of LLMs and explores optimization methods for LLMs in psychiatry. Based on the research findings, we propose a clinical LLM for mental health that uses the Mixture of Experts framework to improve the accuracy of psychiatric diagnosis and therapeutic interventions.
Keywords: large language models; clinical psychiatry; mixture of experts; mental health; research progress
Measurement Problem of Enterprise Digital Transformation:New Methods and Findings Based on Large Language Models
19
Authors: Jin Xingye, Zuo Congjiang, Fang Mingyue, Li Tao, Nie Huihua. China Economist, 2025, No. 2, pp. 70-95 (26 pages)
Despite broad consensus on the importance of enterprise digital transformation, significant discrepancies persist regarding its actual effects. This divergence stems primarily from two key measurement challenges: (1) a lack of clear and consistent definitions of enterprise digital transformation, and (2) a lack of rigorous and accurate measurement methodologies. These shortcomings lead to research findings that are incomparable, difficult to replicate, and often conflicting. To address these challenges, this paper employs machine learning and large language models (LLMs) to construct a novel set of indicators for enterprise digital transformation. The work begins by manually annotating sentences from the annual reports of listed companies in China from 2006 to 2020. These labeled sentences are then used to train and fine-tune several machine learning models, including LLMs. The ERNIE model, which shows the best classification performance among the models tested, is selected as the sentence classifier to predict sentence labels across the full text of the annual reports, from which the enterprise digital transformation metrics are constructed. Both theoretical analysis and multiple data cross-validations demonstrate that the metrics developed in this paper are more accurate than existing approaches. Based on these metrics, the paper empirically examines the impact of enterprise digital transformation on financial performance. Our findings reveal three key points: (1) enterprise digital transformation significantly enhances financial performance, with big data, AI, mobile internet, cloud computing, and the Internet of Things (IoT) all playing a significant role, whereas blockchain technology does not show a significant effect; (2) the significant positive effect of digital transformation on financial performance is primarily observed in firms with weaker initial financial performance; and (3) enterprise digital transformation improves financial performance mainly by enhancing efficiency and reducing costs. This research has practical implications for promoting enterprise digital transformation and fostering high-quality economic development.
Keywords: enterprise digital transformation; digital economy; digital technology; AI; large language models (LLMs)
A Semantic Evaluation Framework for Medical Report Generation Using Large Language Models
20
Authors: Haider Ali, Rashadul Islam Sumon, Abdul Rehman Khalid, Kounen Fathima, Hee Cheol Kim. Computers, Materials & Continua, 2025, No. 9, pp. 5445-5462 (18 pages)
Artificial intelligence is reshaping radiology by enabling automated report generation, yet evaluating the clinical accuracy and relevance of these reports is challenging, as traditional natural language generation metrics like BLEU and ROUGE prioritize lexical overlap over clinical relevance. To address this gap, we propose a novel semantic assessment framework for evaluating the accuracy of artificial intelligence-generated radiology reports against ground truth references. We trained the R2GenRL model on 5229 image–report pairs from the Indiana University chest X-ray dataset and generated a benchmark dataset from test data in the Indiana University chest X-ray and MIMIC-CXR datasets. These datasets were selected for their public availability, large scale, and comprehensive coverage of diverse clinical cases in chest radiography, enabling robust evaluation and comparison with prior work. Results demonstrate that the Mistral model, particularly with task-oriented prompting, achieves superior performance (up to 91.9% accuracy), surpassing other models and closely aligning with established metrics like BERTScore-F1 (88.1%) and CLIP-Score (88.7%). Statistical analyses, including paired t-tests (p<0.01) and analysis of variance (p<0.05), confirm significant improvements driven by structured prompting. Failure case analysis reveals limitations, such as over-reliance on lexical similarity, underscoring the need for domain-specific fine-tuning. This framework advances the evaluation of AI-driven radiology report generation, offering a robust, clinically relevant metric for assessing semantic accuracy and paving the way for more reliable automated systems in medical imaging.
Keywords: semantic assessment; AI-generated radiology reports; large language models; prompt engineering; semantic score evaluation