期刊文献+
共找到641篇文章
< 1 2 33 >
每页显示 20 50 100
CIT-Rec:Enhancing Sequential Recommendation System with Large Language Models
1
作者 Ziyu Li Zhen Chen +2 位作者 Xuejing Fu Tong Mo Weiping Li 《Computers, Materials & Continua》 2026年第3期2328-2343,共16页
Recommendation systems are key to boosting user engagement,satisfaction,and retention,particularly on media platforms where personalized content is vital.Sequential recommendation systems learn from user-item interact... Recommendation systems are key to boosting user engagement,satisfaction,and retention,particularly on media platforms where personalized content is vital.Sequential recommendation systems learn from user-item interactions to predict future items of interest.However,many current methods rely on unique user and item IDs,limiting their ability to represent users and items effectively,especially in zero-shot learning scenarios where training data is scarce.With the rapid development of Large Language Models(LLMs),researchers are exploring their potential to enhance recommendation systems.However,there is a semantic gap between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems,where items are typically indexed by IDs.Moreover,most research focuses on item representations,neglecting personalized user modeling.To address these issues,we propose a sequential recommendation framework using LLMs,called CIT-Rec,a model that integrates Collaborative semantics for user representation and Image and Text information for item representation to enhance Recommendations.Specifically,by aligning intuitive image information with text containing semantic features,we can more accurately represent items,improving item representation quality.We focus not only on item representations but also on user representations.To more precisely capture users’personalized preferences,we use traditional sequential recommendation models to train on users’historical interaction data,effectively capturing behavioral patterns.Finally,by combining LLMs and traditional sequential recommendation models,we allow the LLM to understand linguistic semantics while capturing collaborative semantics.Extensive evaluations on real-world datasets show that our model outperforms baseline methods,effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations. 展开更多
关键词 Large language models vision language models sequential recommendation instruction tuning
在线阅读 下载PDF
Detection of Maliciously Disseminated Hate Speech in Spanish Using Fine-Tuning and In-Context Learning Techniques with Large Language Models
2
作者 Tomás Bernal-Beltrán RonghaoPan +3 位作者 JoséAntonio García-Díaz María del Pilar Salas-Zárate Mario Andrés Paredes-Valverde Rafael Valencia-García 《Computers, Materials & Continua》 2026年第4期353-390,共38页
The malicious dissemination of hate speech via compromised accounts,automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern.Automatically detecting such content in S... The malicious dissemination of hate speech via compromised accounts,automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern.Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources.In this paper,we compare two predominant AI-based approaches for the forensic detection of malicious hate speech:(1)finetuning encoder-only models that have been trained in Spanish and(2)In-Context Learning techniques(Zero-and Few-Shot Learning)with large-scale language models.Our approach goes beyond binary classification,proposing a comprehensive,multidimensional evaluation that labels each text by:(1)type of speech,(2)recipient,(3)level of intensity(ordinal)and(4)targeted group(multi-label).Performance is evaluated using an annotated Spanish corpus,standard metrics such as precision,recall and F1-score and stability-oriented metrics to evaluate the stability of the transition from zero-shot to few-shot prompting(Zero-to-Few Shot Retention and Zero-to-Few Shot Gain)are applied.The results indicate that fine-tuned encoder-only models(notably MarIA and BETO variants)consistently deliver the strongest and most reliable performance:in our experiments their macro F1-scores lie roughly in the range of approximately 46%–66%depending on the task.Zero-shot approaches are much less stable and typically yield substantially lower performance(observed F1-scores range approximately 0%–39%),often producing invalid outputs in practice.Few-shot prompting(e.g.,Qwen 38B,Mistral 7B)generally improves stability and recall relative to pure zero-shot,bringing F1-scores into a moderate range of approximately 20%–51%but still falling short of fully fine-tuned models.These findings highlight the importance of supervised adaptation and discuss the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns. 展开更多
关键词 Hate speech detection malicious communication campaigns AI-driven cybersecurity social media analytics large language models prompt-tuning fine-tuning in-context learning natural language processing
在线阅读 下载PDF
When Large Language Models and Machine Learning Meet Multi-Criteria Decision Making: Fully Integrated Approach for Social Media Moderation
3
作者 Noreen Fuentes Janeth Ugang +4 位作者 Narcisan Galamiton Suzette Bacus Samantha Shane Evangelista Fatima Maturan Lanndon Ocampo 《Computers, Materials & Continua》 2026年第1期2137-2162,共26页
This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to use... This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities. 展开更多
关键词 Self-moderation user-generated content k-means clustering TODIM large language models
在线阅读 下载PDF
Task-Structured Curriculum Learning for Multi-Task Distillation:Enhancing Step-by-Step Knowledge Transfer in Language Models
4
作者 Ahmet Ezgi Aytug Onan 《Computers, Materials & Continua》 2026年第3期1647-1673,共27页
Knowledge distillation has become a standard technique for compressing large language models into efficient student models,but existing methods often struggle to balance prediction accuracy with explanation quality.Re... Knowledge distillation has become a standard technique for compressing large language models into efficient student models,but existing methods often struggle to balance prediction accuracy with explanation quality.Recent approaches such as Distilling Step-by-Step(DSbS)introduce explanation supervision,yet they apply it in a uniform manner that may not fully exploit the different learning dynamics of prediction and explanation.In this work,we propose a task-structured curriculum learning(TSCL)framework that structures training into three sequential phases:(i)prediction-only,to establish stable feature representations;(ii)joint prediction-explanation,to align task outputs with rationale generation;and(iii)explanation-only,to refine the quality of rationales.This design provides a simple but effective modification to DSbS,requiring no architectural changes and adding negligible training cost.We justify the phase scheduling with ablation studies and convergence analysis,showing that an initial prediction-heavy stage followed by a balanced joint phase improves both stability and explanation alignment.Extensive experiments on five datasets(e-SNLI,ANLI,CommonsenseQA,SVAMP,and MedNLI)demonstrate that TSCL consistently outperforms strong baselines,achieving gains of+1.7-2.6 points in accuracy and 0.8-1.2 in ROUGE-L,corresponding to relative error reductions of up to 21%.Beyond lexical metrics,human evaluation and ERASERstyle faithfulness diagnostics confirm that TSCL produces more faithful and informative explanations.Comparative training curves further reveal faster convergence and lower variance across seeds.Efficiency analysis shows less than 3%overhead in wall-clock training time and no additional inference cost,making the approach practical for realworld deployment.This study demonstrates that a simple task-structured curriculum can significantly improve the effectiveness of knowledge distillation.By separating and sequencing objectives,TSCL achieves a better balance between accuracy,stability,and explanation quality.The framework generalizes across domains,including medical NLI,and offers a principled recipe for future applications in multimodal reasoning and reinforcement learning. 展开更多
关键词 Knowledge distillation curriculum learning language models multi-task learning step-by-step learning
在线阅读 下载PDF
Command-agent:Reconstructing warfare simulation and command decision-making using large language models
5
作者 Mengwei Zhang Minchi Kuang +3 位作者 Heng Shi Jihong Zhu Jingyu Zhu Xiao Jiang 《Defence Technology(防务技术)》 2026年第2期294-313,共20页
War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient an... War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient and inflexible,with particularly pronounced limitations in command and decision-making.The overwhelming volume of information and high decision complexity hinder the realization of autonomous and agile command and control.To address this challenge,an intelligent warfare simulation framework named Command-Agent is proposed,which deeply integrates large language models(LLMs)with digital twin battlefields.By constructing a highly realistic battlefield environment through real-time simulation and multi-source data fusion,the natural language interaction capabilities of LLMs are leveraged to lower the command threshold and to enable autonomous command through the Observe-Orient-Decide-Act(OODA)feedback loop.Within the Command-Agent framework,a multimodel collaborative architecture is further adopted to decouple the decision-generation and command-execution functions of LLMs.By combining specialized models such as Deep Seek-R1 and MCTool,the limitations of single-model capabilities are overcome.MCTool is a lightweight execution model fine-tuned for military Function Calling tasks.The framework also introduces a Vector Knowledge Base to mitigate hallucinations commonly exhibited by LLMs.Experimental results demonstrate that Command-Agent not only enables natural language-driven simulation and control but also deeply understands commander intent.Leveraging the multi-model collaborative architecture,during red-blue UAV confrontations involving 2 to 8 UAVs,the integrated score is improved by an average of 41.8%compared to the single-agent system(MCTool),accompanied by a 161.8%optimization in the battle loss ratio.Furthermore,when compared with multi-agent systems lacking the knowledge base,the inclusion of the Vector Knowledge Base further improves overall performance by 16.8%.In comparison with the general model(Qwen2.5-7B),the fine-tuned MCTool leads by 5%in execution efficiency.Therefore,the proposed Command-Agent introduces a novel perspective to the military command system and offers a feasible solution for intelligent battlefield decision-making. 展开更多
关键词 Digital twin battlefield Large language models Multi-agent system Military command
在线阅读 下载PDF
Prompt Injection Attacks on Large Language Models:A Survey of Attack Methods,Root Causes,and Defense Strategies
6
作者 Tongcheng Geng Zhiyuan Xu +1 位作者 Yubin Qu W.Eric Wong 《Computers, Materials & Continua》 2026年第4期134-185,共52页
Large language models(LLMs)have revolutionized AI applications across diverse domains.However,their widespread deployment has introduced critical security vulnerabilities,particularly prompt injection attacks that man... Large language models(LLMs)have revolutionized AI applications across diverse domains.However,their widespread deployment has introduced critical security vulnerabilities,particularly prompt injection attacks that manipulate model behavior through malicious instructions.Following Kitchenham’s guidelines,this systematic review synthesizes 128 peer-reviewed studies from 2022 to 2025 to provide a unified understanding of this rapidly evolving threat landscape.Our findings reveal a swift progression from simple direct injections to sophisticated multimodal attacks,achieving over 90%success rates against unprotected systems.In response,defense mechanisms show varying effectiveness:input preprocessing achieves 60%–80%detection rates and advanced architectural defenses demonstrate up to 95%protection against known patterns,though significant gaps persist against novel attack vectors.We identified 37 distinct defense approaches across three categories,but standardized evaluation frameworks remain limited.Our analysis attributes these vulnerabilities to fundamental LLM architectural limitations,such as the inability to distinguish instructions from data and attention mechanism vulnerabilities.This highlights critical research directions such as formal verification methods,standardized evaluation protocols,and architectural innovations for inherently secure LLM designs. 展开更多
关键词 Prompt injection attacks large language models defense mechanisms security evaluation
在线阅读 下载PDF
OPOR-Bench:Evaluating Large Language Models on Online Public Opinion Report Generation
7
作者 Jinzheng Yu Yang Xu +4 位作者 Haozhen Li Junqi Li Ligu Zhu Hao Shen Lei Shi 《Computers, Materials & Continua》 2026年第4期1403-1427,共25页
Online Public Opinion Reports consolidate news and social media for timely crisis management by governments and enterprises.While large language models(LLMs)enable automated report generation,this specific domain lack... Online Public Opinion Reports consolidate news and social media for timely crisis management by governments and enterprises.While large language models(LLMs)enable automated report generation,this specific domain lacks formal task definitions and corresponding benchmarks.To bridge this gap,we define the Automated Online Public Opinion Report Generation(OPOR-Gen)task and construct OPOR-Bench,an event-centric dataset with 463 crisis events across 108 countries(comprising 8.8 K news articles and 185 K tweets).To evaluate report quality,we propose OPOR-Eval,a novel agent-based framework that simulates human expert evaluation.Validation experiments show OPOR-Eval achieves a high Spearman’s correlation(ρ=0.70)with human judgments,though challenges in temporal reasoning persist.This work establishes an initial foundation for advancing automated public opinion reporting research. 展开更多
关键词 Online public opinion reports crisis management large language models agent-based evaluation
在线阅读 下载PDF
LLMKB:Large Language Models with Knowledge Base Augmentation for Conversational Recommendation
8
作者 FANG Xiu QIU Sijia +1 位作者 SUN Guohao LU Jinhu 《Journal of Donghua University(English Edition)》 2026年第1期91-103,共13页
Conversational recommender systems(CRSs)focus on refining preferences and providing personalized recommendations through natural language interactions and dialogue history.Large language models(LLMs)have shown outstan... Conversational recommender systems(CRSs)focus on refining preferences and providing personalized recommendations through natural language interactions and dialogue history.Large language models(LLMs)have shown outstanding performance across various domains,thereby prompting researchers to investigate their applicability in recommendation systems.However,due to the lack of task-specific knowledge and an inefficient feature extraction process,LLMs still have suboptimal performance in recommendation tasks.Therefore,external knowledge sources,such as knowledge graphs(KGs)and knowledge bases(KBs),are often introduced to address the issue of data sparsity.Compared to KGs,KBs possess higher retrieval efficiency,making them more suitable for scenarios where LLMs serve as recommenders.To this end,we introduce a novel framework integrating LLMs with KBs for enhanced retrieval generation,namely LLMKB.LLMKB initially leverages structured knowledge to create mapping dictionaries,extracting entity-relation information from heterogeneous knowledge to construct KBs.Then,LLMKB achieves the embedding calibration between user information representations and documents in KBs through retrieval model fine-tuning.Finally,LLMKB employs retrievalaugmented generation to produce recommendations based on fused text inputs,followed by post-processing.Experiment results on two public CRS datasets demonstrate the effectiveness of our framework.Our code is publicly available at the link:https://anonymous.4open.science/r/LLMKB-6FD0. 展开更多
关键词 recommender system large language model(LLM) knowledge base(KB)
在线阅读 下载PDF
Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond
9
作者 Xiansheng Cai Sihan Hu +4 位作者 Tao Wang Yuan Huang Pan Zhang Youjin Deng Kun Chen 《Chinese Physics Letters》 2025年第12期7-23,共17页
Fundamental physics often confronts complex symbolic problems with few guiding exemplars or established principles.While artificial intelligence(AI)offers promise,its typical need for vast datasets to learn from hinde... Fundamental physics often confronts complex symbolic problems with few guiding exemplars or established principles.While artificial intelligence(AI)offers promise,its typical need for vast datasets to learn from hinders its use in these information-scarce frontiers.We introduce learning at criticality(LaC),a reinforcement learning scheme that tunes large language models(LLMs)to a sharp learning transition,addressing this information scarcity.At this transition,LLMs achieve peak generalization from minimal data,exemplified by 7-digit base-7 addition-a test of nontrivial arithmetic reasoning.To elucidate this peak,we analyze a minimal concept-network model designed to capture the essence of how LLMs might link tokens.Trained on a single exemplar,this model also undergoes a sharp learning transition.This transition exhibits hallmarks of a second-order phase transition,notably power-law distributed solution path lengths.At this critical point,the system maximizes a“critical thinking pattern”crucial for generalization,enabled by the underlying scale-free exploration.This suggests LLMs reach peak performance by operating at criticality,where such explorative dynamics enable the extraction of underlying operational rules.We demonstrate LaC in quantum field theory:an 8B-parameter LLM,tuned to its critical point by LaC using a few exemplars of symbolic Matsubara sums,solves unseen,higher-order problems,significantly outperforming far larger models.LaC thus leverages critical phenomena,a physical principle,to empower AI for complex,data-sparse challenges in fundamental physics. 展开更多
关键词 artificial intelligence ai offers learning criticality lac symbolic problems large language models llms reinforcement learning large language models fundamental physics minimal dat
原文传递
Multilingual Text Summarization in Healthcare Using Pre-Trained Transformer-Based Language Models
10
作者 Josua Käser Thomas Nagy +1 位作者 Patrick Stirnemann Thomas Hanne 《Computers, Materials & Continua》 2025年第4期201-217,共17页
We analyze the suitability of existing pre-trained transformer-based language models(PLMs)for abstractive text summarization on German technical healthcare texts.The study focuses on the multilingual capabilities of t... We analyze the suitability of existing pre-trained transformer-based language models(PLMs)for abstractive text summarization on German technical healthcare texts.The study focuses on the multilingual capabilities of these models and their ability to perform the task of abstractive text summarization in the healthcare field.The research hypothesis was that large language models could perform high-quality abstractive text summarization on German technical healthcare texts,even if the model is not specifically trained in that language.Through experiments,the research questions explore the performance of transformer language models in dealing with complex syntax constructs,the difference in performance between models trained in English and German,and the impact of translating the source text to English before conducting the summarization.We conducted an evaluation of four PLMs(GPT-3,a translation-based approach also utilizing GPT-3,a German language Model,and a domain-specific bio-medical model approach).The evaluation considered the informativeness using 3 types of metrics based on Recall-Oriented Understudy for Gisting Evaluation(ROUGE)and the quality of results which is manually evaluated considering 5 aspects.The results show that text summarization models could be used in the German healthcare domain and that domain-independent language models achieved the best results.The study proves that text summarization models can simplify the search for pre-existing German knowledge in various domains. 展开更多
关键词 Text summarization pre-trained transformer-based language models large language models technical healthcare texts natural language processing
在线阅读 下载PDF
Large language models for robotics:Opportunities,challenges,and perspectives 被引量:4
11
作者 Jiaqi Wang Enze Shi +7 位作者 Huawen Hu Chong Ma Yiheng Liu Xuhui Wang Yincheng Yao Xuan Liu Bao Ge Shu Zhang 《Journal of Automation and Intelligence》 2025年第1期52-64,共13页
Large language models(LLMs)have undergone significant expansion and have been increasingly integrated across various domains.Notably,in the realm of robot task planning,LLMs harness their advanced reasoning and langua... Large language models(LLMs)have undergone significant expansion and have been increasingly integrated across various domains.Notably,in the realm of robot task planning,LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions.However,for embodied tasks,where robots interact with complex environments,textonly LLMs often face challenges due to a lack of compatibility with robotic visual perception.This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks.Additionally,we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions.Our results,based on diverse datasets,indicate that GPT-4V effectively enhances robot performance in embodied tasks.This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights towards bridging the gap in Human-Robot-Environment interaction. 展开更多
关键词 Large language models ROBOTICS Generative AI Embodied intelligence
在线阅读 下载PDF
When Software Security Meets Large Language Models:A Survey 被引量:4
12
作者 Xiaogang Zhu Wei Zhou +3 位作者 Qing-Long Han Wanlun Ma Sheng Wen Yang Xiang 《IEEE/CAA Journal of Automatica Sinica》 2025年第2期317-334,共18页
Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, ... Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, software testing and analysis are two of the critical methods, which significantly benefit from the advancements in deep learning technologies. Due to the successful use of deep learning in software security, recently,researchers have explored the potential of using large language models(LLMs) in this area. In this paper, we systematically review the results focusing on LLMs in software security. We analyze the topics of fuzzing, unit test, program repair, bug reproduction, data-driven bug detection, and bug triage. We deconstruct these techniques into several stages and analyze how LLMs can be used in the stages. We also discuss the future directions of using LLMs in software security, including the future directions for the existing use of LLMs and extensions from conventional deep learning research. 展开更多
关键词 Large language models(LLMs) software analysis software security software testing
在线阅读 下载PDF
The Security of Using Large Language Models:A Survey With Emphasis on ChatGPT 被引量:4
13
作者 Wei Zhou Xiaogang Zhu +4 位作者 Qing-Long Han Lin Li Xiao Chen Sheng Wen Yang Xiang 《IEEE/CAA Journal of Automatica Sinica》 2025年第1期1-26,共26页
ChatGPT is a powerful artificial intelligence(AI)language model that has demonstrated significant improvements in various natural language processing(NLP) tasks. However, like any technology, it presents potential sec... ChatGPT is a powerful artificial intelligence(AI)language model that has demonstrated significant improvements in various natural language processing(NLP) tasks. However, like any technology, it presents potential security risks that need to be carefully evaluated and addressed. In this survey, we provide an overview of the current state of research on security of using ChatGPT, with aspects of bias, disinformation, ethics, misuse,attacks and privacy. We review and discuss the literature on these topics and highlight open research questions and future directions.Through this survey, we aim to contribute to the academic discourse on AI security, enriching the understanding of potential risks and mitigations. We anticipate that this survey will be valuable for various stakeholders involved in AI development and usage, including AI researchers, developers, policy makers, and end-users. 展开更多
关键词 Artificial intelligence(AI) ChatGPT large language models(LLMs) SECURITY
在线阅读 下载PDF
On large language models safety,security,and privacy:A survey 被引量:3
14
作者 Ran Zhang Hong-Wei Li +2 位作者 Xin-Yuan Qian Wen-Bo Jiang Han-Xiao Chen 《Journal of Electronic Science and Technology》 2025年第1期1-21,共21页
The integration of artificial intelligence(AI)technology,particularly large language models(LLMs),has become essential across various sectors due to their advanced language comprehension and generation capabilities.De... The integration of artificial intelligence(AI)technology,particularly large language models(LLMs),has become essential across various sectors due to their advanced language comprehension and generation capabilities.Despite their transformative impact in fields such as machine translation and intelligent dialogue systems,LLMs face significant challenges.These challenges include safety,security,and privacy concerns that undermine their trustworthiness and effectiveness,such as hallucinations,backdoor attacks,and privacy leakage.Previous works often conflated safety issues with security concerns.In contrast,our study provides clearer and more reasonable definitions for safety,security,and privacy within the context of LLMs.Building on these definitions,we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety,security,and privacy in LLMs.Additionally,we explore the unique research challenges posed by LLMs and suggest potential avenues for future research,aiming to enhance the robustness and reliability of LLMs in the face of emerging threats. 展开更多
关键词 Large language models Privacy issues Safety issues Security issues
在线阅读 下载PDF
Evaluating research quality with Large Language Models:An analysis of ChatGPT’s effectiveness with different settings and inputs 被引量:1
15
作者 Mike Thelwall 《Journal of Data and Information Science》 2025年第1期7-25,共19页
Purpose:Evaluating the quality of academic journal articles is a time consuming but critical task for national research evaluation exercises,appointments and promotion.It is therefore important to investigate whether ... Purpose:Evaluating the quality of academic journal articles is a time consuming but critical task for national research evaluation exercises,appointments and promotion.It is therefore important to investigate whether Large Language Models(LLMs)can play a role in this process.Design/methodology/approach:This article assesses which ChatGPT inputs(full text without tables,figures,and references;title and abstract;title only)produce better quality score estimates,and the extent to which scores are affected by ChatGPT models and system prompts.Findings:The optimal input is the article title and abstract,with average ChatGPT scores based on these(30 iterations on a dataset of 51 papers)correlating at 0.67 with human scores,the highest ever reported.ChatGPT 4o is slightly better than 3.5-turbo(0.66),and 4o-mini(0.66).Research limitations:The data is a convenience sample of the work of a single author,it only includes one field,and the scores are self-evaluations.Practical implications:The results suggest that article full texts might confuse LLM research quality evaluations,even though complex system instructions for the task are more effective than simple ones.Thus,whilst abstracts contain insufficient information for a thorough assessment of rigour,they may contain strong pointers about originality and significance.Finally,linear regression can be used to convert the model scores into the human scale scores,which is 31%more accurate than guessing.Originality/value:This is the first systematic comparison of the impact of different prompts,parameters and inputs for ChatGPT research quality evaluations. 展开更多
关键词 ChatGPT Large language models LLMs SCIENTOMETRICS Research Assessment
在线阅读 下载PDF
Evaluating large language models as patient education tools for inflammatory bowel disease:A comparative study 被引量:1
16
作者 Yan Zhang Xiao-Han Wan +6 位作者 Qing-Zhou Kong Han Liu Jun Liu Jing Guo Xiao-Yun Yang Xiu-Li Zuo Yan-Qing Li 《World Journal of Gastroenterology》 2025年第6期34-43,共10页
BACKGROUND Inflammatory bowel disease(IBD)is a global health burden that affects millions of individuals worldwide,necessitating extensive patient education.Large language models(LLMs)hold promise for addressing patie... BACKGROUND Inflammatory bowel disease(IBD)is a global health burden that affects millions of individuals worldwide,necessitating extensive patient education.Large language models(LLMs)hold promise for addressing patient information needs.However,LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated.AIM To assess the utility of three LLMs(ChatGPT-4.0,Claude-3-Opus,and Gemini-1.5-Pro)as a reference point for patients with IBD.METHODS In this comparative study,two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns.These questions were used to evaluate the performance of the three LLMs.The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy,comprehensibility,and correlation.Simultaneously,three patients were invited to evaluate the comprehensibility of their answers.Finally,a readability assessment was performed.RESULTS Overall,each of the LLMs achieved satisfactory levels of accuracy,comprehensibility,and completeness when answering IBD-related questions,although their performance varies.All of the investigated models demonstrated strengths in providing basic disease information such as IBD definition as well as its common symptoms and diagnostic methods.Nevertheless,when dealing with more complex medical advice,such as medication side effects,dietary adjustments,and complication risks,the quality of answers was inconsistent between the LLMs.Notably,Claude-3-Opus generated answers with better readability than the other two models.CONCLUSION LLMs have the potential as educational tools for patients with IBD;however,there are discrepancies between the models.Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided. 展开更多
关键词 Inflammatory bowel disease Large language models Patient education Medical information accuracy Readability assessment
暂未订购
Assessing the possibility of using large language models in ocular surface diseases 被引量:1
17
作者 Qian Ling Zi-Song Xu +11 位作者 Yan-Mei Zeng Qi Hong Xian-Zhe Qian Jin-Yu Hu Chong-Gang Pei Hong Wei Jie Zou Cheng Chen Xiao-Yu Wang Xu Chen Zhen-Kai Wu Yi Shao 《International Journal of Ophthalmology(English edition)》 2025年第1期1-8,共8页
AIM:To assess the possibility of using different large language models(LLMs)in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surfa... AIM:To assess the possibility of using different large language models(LLMs)in ocular surface diseases by selecting five different LLMS to test their accuracy in answering specialized questions related to ocular surface diseases:ChatGPT-4,ChatGPT-3.5,Claude 2,PaLM2,and SenseNova.METHODS:A group of experienced ophthalmology professors were asked to develop a 100-question singlechoice question on ocular surface diseases designed to assess the performance of LLMs and human participants in answering ophthalmology specialty exam questions.The exam includes questions on the following topics:keratitis disease(20 questions),keratoconus,keratomalaciac,corneal dystrophy,corneal degeneration,erosive corneal ulcers,and corneal lesions associated with systemic diseases(20 questions),conjunctivitis disease(20 questions),trachoma,pterygoid and conjunctival tumor diseases(20 questions),and dry eye disease(20 questions).Then the total score of each LLMs and compared their mean score,mean correlation,variance,and confidence were calculated.RESULTS:GPT-4 exhibited the highest performance in terms of LLMs.Comparing the average scores of the LLMs group with the four human groups,chief physician,attending physician,regular trainee,and graduate student,it was found that except for ChatGPT-4,the total score of the rest of the LLMs is lower than that of the graduate student group,which had the lowest score in the human group.Both ChatGPT-4 and PaLM2 were more likely to give exact and correct answers,giving very little chance of an incorrect answer.ChatGPT-4 showed higher credibility when answering questions,with a success rate of 59%,but gave the wrong answer to the question 28% of the time.CONCLUSION:GPT-4 model exhibits excellent performance in both answer relevance and confidence.PaLM2 shows a positive correlation(up to 0.8)in terms of answer accuracy during the exam.In terms of answer confidence,PaLM2 is second only to GPT4 and surpasses Claude 2,SenseNova,and GPT-3.5.Despite the fact that ocular surface disease is a highly specialized discipline,GPT-4 still exhibits superior performance,suggesting that its potential and ability to be applied in this field is enormous,perhaps with the potential to be a valuable resource for medical students and clinicians in the future. 展开更多
关键词 ChatGPT-4.0 ChatGPT-3.5 large language models ocular surface diseases
原文传递
Anime Generation through Diffusion and Language Models:A Comprehensive Survey of Techniques and Trends
18
作者 Yujie Wu Xing Deng +4 位作者 Haijian Shao Ke Cheng Ming Zhang Yingtao Jiang Fei Wang 《Computer Modeling in Engineering & Sciences》 2025年第9期2709-2778,共70页
The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation... The application of generative artificial intelligence(AI)is bringing about notable changes in anime creation.This paper surveys recent advancements and applications of diffusion and language models in anime generation,focusing on their demonstrated potential to enhance production efficiency through automation and personalization.Despite these benefits,it is crucial to acknowledge the substantial initial computational investments required for training and deploying these models.We conduct an in-depth survey of cutting-edge generative AI technologies,encompassing models such as Stable Diffusion and GPT,and appraise pivotal large-scale datasets alongside quantifiable evaluation metrics.Review of the surveyed literature indicates the achievement of considerable maturity in the capacity of AI models to synthesize high-quality,aesthetically compelling anime visual images from textual prompts,alongside discernible progress in the generation of coherent narratives.However,achieving perfect long-form consistency,mitigating artifacts like flickering in video sequences,and enabling fine-grained artistic control remain critical ongoing challenges.Building upon these advancements,research efforts have increasingly pivoted towards the synthesis of higher-dimensional content,such as video and three-dimensional assets,with recent studies demonstrating significant progress in this burgeoning field.Nevertheless,formidable challenges endure amidst these advancements.Foremost among these are the substantial computational exigencies requisite for training and deploying these sophisticated models,particularly pronounced in the realm of high-dimensional generation such as video synthesis.Additional persistent hurdles include maintaining spatial-temporal consistency across complex scenes and mitigating ethical considerations surrounding bias and the preservation of human creative autonomy.This research underscores the transformative potential and inherent complexities of AI-driven synergy within the creative industries.We posit that future research should be dedicated to the synergistic fusion of diffusion and autoregressive models,the integration of multimodal inputs,and the balanced consideration of ethical implications,particularly regarding bias and the preservation of human creative autonomy,thereby establishing a robust foundation for the advancement of anime creation and the broader landscape of AI-driven content generation. 展开更多
关键词 Diffusion models language models anime generation image synthesis video generation stable diffusion AIGC
在线阅读 下载PDF
Cost vs clinical utility on application of large language models in clinical practice:A double-edged sword
19
作者 Sunny Chi Lik Au 《World Journal of Gastrointestinal Oncology》 2025年第12期7-11,共5页
As large language models increasingly permeate medical workflows,a recent study evaluating ChatGPT 4.0’s performance in addressing patient queries about endoscopic submucosal dissection and endoscopic mucosal resecti... As large language models increasingly permeate medical workflows,a recent study evaluating ChatGPT 4.0’s performance in addressing patient queries about endoscopic submucosal dissection and endoscopic mucosal resection offers critical insights into three domains:Performance parity,cost democratization,and clinical readiness.The findings highlight ChatGPT’s high accuracy,completeness,and comprehensibility,suggesting potential cost efficiency in patient education.Yet,cost-effectiveness alone does not ensure clinical utility.Notably,the study relied exclusively on text-based prompts,omitting multimodal data such as photographs or endoscopic scans.This is a significant limitation in a visually driven field like endoscopy,where large language model performance may drop precipitously without image context.Without multimodal integration,artificial intelligence tools risk failing to capture key diagnostic signals,underscoring the need for cautious adoption and robust validation in clinical practice. 展开更多
关键词 Large language models Artificial intelligence ChatGPT COST Patient education Endoscopic submucosal dissection Endoscopic mucosal resection
在线阅读 下载PDF
上一页 1 2 33 下一页 到第
使用帮助 返回顶部