In the era of AI,especially large models,the importance of open source has become increasingly prominent.First,open source allows innovation to avoid starting from scratch.Through iterative innovation,it promotes tech...In the era of AI,especially large models,the importance of open source has become increasingly prominent.First,open source allows innovation to avoid starting from scratch.Through iterative innovation,it promotes technical exchanges and learning globally.Second,resources required for large model R&D are difficult for a single institution to obtain.The evaluation of general large models also requires the participation of experts from various industries.Third,without open source collaboration,it is difficult to form a unified upper-layer software ecosystem.Therefore,open source has become an important cooperation mechanism to promote the development of AI and large models.There are two cases to illustrate how open source and international standards interact with each other.展开更多
The integration of large-scale foundation models(e.g.,GPT series and AlphaFold)into oncology is fundamentally transforming both research methodologies and clinical practices,driven by unprecedented advancements in com...The integration of large-scale foundation models(e.g.,GPT series and AlphaFold)into oncology is fundamentally transforming both research methodologies and clinical practices,driven by unprecedented advancements in computational power.This review synthesizes recent progress in the application of large language models to core oncological tasks,including medical imaging analysis,genomic interpretation,and personalized treatment planning.Underpinned by advanced computational infrastructures,such as graphics processing unit/tensor processing unit clusters,heterogeneous computing,and cloud platforms,these models enable superior representation learning and generalization across multimodal data sources.This review examines how these infrastructures overcome key bottlenecks in intelligent oncology through scalable optimization strategies,including mixed-precision training,memory optimization,and heterogeneous computing.Alongside these technical advancements,the review explores pressing challenges,such as data heterogeneity,limited model interpretability,regulatory uncertainties,and the environmental impact of artificial intelligence(AI)systems.Special emphasis is placed on emerging solutions,encompassing green AI and edge computing,which offer promising approaches for low-resource deployment scenarios.Additionally,the review highlights the critical role of interdisciplinary collaboration among oncology,computer science,ethics,and policy to ensure that AI systems are not only powerful but also transparent,safe,and clinically relevant.Finally,the review outlines potential avenues for future research aimed at developing robust,scalable,and human-centered frameworks for intelligent oncology.展开更多
This study examines the advent of agent interaction(AIx)as a transformative paradigm in humancomputer interaction(HCI),signifying a notable evolution beyond traditional graphical interfaces and touchscreen interaction...This study examines the advent of agent interaction(AIx)as a transformative paradigm in humancomputer interaction(HCI),signifying a notable evolution beyond traditional graphical interfaces and touchscreen interactions.Within the context of large models,AIx is characterized by its innovative interaction patterns and a plethora of application scenarios that hold great potential.The paper highlights the pivotal role of AIx in shaping the future landscape of the large model industry,emphasizing its adoption and necessity from a user's perspective.This study underscores the pivotal role of AIx in dictating the future trajectory of a large model industry by emphasizing the importance of its adoption and necessity from a user-centric perspective.The fundamental drivers of AIx include the introduction of novel capabilities,replication of capabilities(both anthropomorphic and superhuman),migration of capabilities,aggregation of intelligence,and multiplication of capabilities.These elements are essential for propelling innovation,expanding the frontiers of capability,and realizing the exponential superposition of capabilities,thereby mitigating labor redundancy and addressing a spectrum of human needs.Furthermore,this study provides an in-depth analysis of the structural components and operational mechanisms of agents supported by large models.Such advancements significantly enhance the capacity of agents to tackle complex problems and provide intelligent services,thereby facilitating a more intuitive,adaptive,and personalized engagement between humans and machines.The study further delineates four principal categories of interaction patterns that encompass eight distinct modalities of interaction,corresponding to twenty-one specific scenarios,including applications in smart home systems,health assistance,and elderly care.This emphasizes the significance of this new paradigm in advancing HCI,fostering technological advancements,and redefining user experiences.However,it also acknowledges the challenges and ethical considerations that accompany this paradigm shift,recognizing the need for a balanced approach to harness the full potential of AIx in modern society.展开更多
Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved percepti...Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved perception,generation,and decision-making in various fields.However,their vast scale and complexity bring about new security challenges.Issues such as backdoor vulnerabilities during training,jailbreaking in multimodal rea⁃soning,and data provenance and copyright auditing have made security a critical focus for both academia and industry.展开更多
Following the groundbreaking introduction of the Transformer architecture in 2017,the development of Large Language Models(LLMs)formally commenced.In May 2020,Chat GPT-3,with over one hundred billion parameters,entere...Following the groundbreaking introduction of the Transformer architecture in 2017,the development of Large Language Models(LLMs)formally commenced.In May 2020,Chat GPT-3,with over one hundred billion parameters,entered the public eye,marking a significant milestone in LLM advancement.展开更多
The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These con...The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These concerns have spurred a growing demand for dataset copyright auditing techniques,which aim to detect and verify potential infringements in the training data of commercial AI systems.This paper presents a survey of existing auditing solutions,categorizing them across key dimensions:data modality,model training stage,data overlap scenarios,and model access levels.We highlight major trends,including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training.Through an in-depth analysis of 12 representative works,we extract four key observations that reveal the limitations of current methods.Furthermore,we identify three open challenges and propose future directions for robust,multimodal,and scalable auditing solutions.Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.展开更多
The application of visual-language large models in the field of medical health has gradually become a research focus.The models combine the capability for image understanding and natural language processing,and can si...The application of visual-language large models in the field of medical health has gradually become a research focus.The models combine the capability for image understanding and natural language processing,and can simultaneously process multi-modality data such as medical images and medical reports.These models can not only recognize images,but also understand the semantic relationship between images and texts,effectively realize the integration of medical information,and provide strong support for clinical decision-making and disease diagnosis.The visual-language large model has good performance for specific medical tasks,and also shows strong potential and high intelligence in the general task models.This paper provides a comprehensive review of the visual-language large model in the field of medical health.Specifically,this paper first introduces the basic theoretical basis and technical principles.Then,this paper introduces the specific application scenarios in the field of medical health,including modality fusion,semi-supervised learning,weakly supervised learning,unsupervised learning,cross-domain model and general models.Finally,the challenges including insufficient data,interpretability,and practical deployment are discussed.According to the existing challenges,four potential future development directions are given.展开更多
The rapid advancement of deep learning and the emergence of largescale neural models,such as bidirectional encoder representations from transformers(BERT),generative pre-trained transformer(GPT),and large language mod...The rapid advancement of deep learning and the emergence of largescale neural models,such as bidirectional encoder representations from transformers(BERT),generative pre-trained transformer(GPT),and large language model Meta AI(LLaMa),have brought significant computational and energy challenges.Neuromorphic computing presents a biologically inspired approach to addressing these issues,leveraging event-driven processing and in-memory computation for enhanced energy efficiency.This survey explores the intersection of neuromorphic computing and large-scale deep learning models,focusing on neuromorphic models,learning methods,and hardware.We highlight transferable techniques from deep learning to neuromorphic computing and examine the memoryrelated scalability limitations of current neuromorphic systems.Furthermore,we identify potential directions to enable neuromorphic systems to meet the growing demands of modern AI workloads.展开更多
This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large mode...This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.展开更多
Deep learning has become a hot field of artificial intelligence,and the deep learning large model framework has become a bridgehead for the active layout of Chinese and foreign technology companies.Large models play a...Deep learning has become a hot field of artificial intelligence,and the deep learning large model framework has become a bridgehead for the active layout of Chinese and foreign technology companies.Large models play a significant role in the application field,greatly improving the efficiency of training and optimization,and contributing to the landing of many innovative artificial intelligence tools.Based on the Chinese PaddlePaddle large model framework,an application system is designed in combination with the intelligent classroom teaching scenario,which uses machine vision algorithms to distinguish and present teachers’and students’behaviors,that is,the digitization and multi-classification scheme of class character states.After having digital data,data analysis can be carried out to evaluate the class status of teachers and students,and the traditional subjective judgment such as peacetime grades and teaching ability can be upgraded to the objective judgment of artificial intelligence.展开更多
This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing...This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing patient queries about disease and lifestyle behaviors.The models selected were ChatGPT-4o,Gemini 2.0 Pro,Claude 3.5 Sonnet,and DeepSeek V3,with 12 questions chosen by two HCV experts from the domains of prevention,diagnosis,and treatment.展开更多
It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problemat...It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problematic,since there is a need to differentiate between these two scenarios.Until recently,the proper−semantic−causality of the relationship could have been determined only by human experts from the area of expertise of the studied data.This has changed with the advance of large language models,which are often utilized as surrogates for such human experts,making the process automated and readily available to all data analysts.This motivates the main objective of this work,which is to introduce the design and implementation of a large language model-based semantic causality evaluator based on correlation analysis,together with its visual analysis model called Causal heatmap.After the implementation itself,the model is evaluated from the point of view of the quality of the visual model,from the point of view of the quality of causal evaluation based on large language models,and from the point of view of comparative analysis,while the results reached in the study highlight the usability of large language models in the task and the potential of the proposed approach in the analysis of unknown datasets.The results of the experimental evaluation demonstrate the usefulness of the Causal heatmap method,supported by the evident highlighting of interesting relationships,while suppressing irrelevant ones.展开更多
This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to use...This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.展开更多
Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Ja...Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Jabbour et al.2017,Wong et al.2018,and Wu et al.2021)were replicated using ChatGPTv3.5 and Bard.Outputs(author,title,publication year,and journal)were compared to the original references and cross-referenced with medical databases for authenticity and recall.Results:Several themes emerged when comparing Bard and ChatGPT across the three reviews.Bard generated more outputs and had greater recall in Wong et al.'s review,with a broader date range in Jabbour et al.'s review.In Wu et al.'s review,ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.Conclusion:Large language models(LLMs)failed to fully replicate peer-reviewed methodologies,producing outputs with inaccuracies but identifying relevant,especially recent,articles missed by the references.While human-led PRISMA-based reviews remain the gold standard,refining LLMs for literature reviews shows potential.展开更多
The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streaml...The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streamlined in different domains.The offered study introduces an immediate optimization outline,named PROMPTx-PE,that is going to yield a greater level of precision and strength when it comes to the assignments that are premised on LLM.The proposed systemfeatures a timely selection schemewhich is informed by reinforcement learning,a contextual layer and a dynamic weighting module which is regulated by Lyapunov-based stability guidelines.The PROMPTx-PE dynamically varies the exploration and exploitation of the prompt space,depending on real-time feedback and multi-objective reward development.Extensive testing on both benchmark(GLUE,SuperGLUE)and domain-specific data(Healthcare-QA and Industrial-NER)demonstrates a large best performance to be 89.4%and a strong robustness disconnect with under 3%computation expense.The results confirm the effectiveness,consistency,and scalability of PROMPTx-PE as a platform of adaptive prompt engineering based on recent uses of LLMs.展开更多
Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a fr...Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases.Methods:We curated 50 challenging lung cancer cases(25 local and 25 published)classified as complex,rare,or refractory.Blinded three-dimensional,five-point Likert evaluations(1–5 for comprehensiveness,specificity,and readability)compared standalone LLMs(DeepSeek R1,Claude 3.5,Gemini 1.5,and GPT-4o),physicians by experience level(junior,intermediate,and senior),and AI-assisted juniors;intergroup differences and augmentation effects were analyzed statistically.Results:Of 50 challenging cases(18 complex,17 rare,and 15 refractory)rated by three experts,DeepSeek R1 achieved scores of 3.95±0.33,3.71±0.53,and 4.26±0.18 for comprehensiveness,specificity,and readability,respectively,positioning it between intermediate(3.68,3.68,3.75)and senior(4.50,4.64,4.53)physicians.GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness(3.76±0.39,3.60±0.39)but junior-to-intermediate physician–level specificity(3.39±0.39,3.39±0.49).All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity.AIassisted junior physicians showed marked gains in rare cases,with comprehensiveness rising from 2.32 to 4.29(84.8%),specificity from 2.24 to 4.26(90.8%),and readability from 2.76 to 4.59(66.0%),while specificity declined by 3.2%(3.17 to 3.07)in refractory cases.Error analysis showed complementary strengths,with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management.Conclusions:LLMs performed variably in clinical decision-making tasks depending on case type,performing better in rare cases and worse in refractory cases requiring longitudinal reasoning.Complementary strengths between LLMs and physicians support case-and task-tailored human–AI collaboration.展开更多
LargeLanguageModels(LLMs)are increasingly appliedinthe fieldof code translation.However,existing evaluation methodologies suffer from two major limitations:(1)the high overlap between test data and pretraining corpora...LargeLanguageModels(LLMs)are increasingly appliedinthe fieldof code translation.However,existing evaluation methodologies suffer from two major limitations:(1)the high overlap between test data and pretraining corpora,which introduces significant bias in performance evaluation;and(2)mainstream metrics focus primarily on surface-level accuracy,failing to uncover the underlying factors that constrain model capabilities.To address these issues,this paper presents TCode(Translation-Oriented Code Evaluation benchmark)—a complexity-controllable,contamination-free benchmark dataset for code translation—alongside a dedicated static feature sensitivity evaluation framework.The dataset is carefully designed to control complexity along multiple dimensions—including syntactic nesting and expression intricacy—enabling both broad coverage and fine-grained differentiation of sample difficulty.This design supports precise evaluation of model capabilities across a wide spectrum of translation challenges.The proposed evaluation framework introduces a correlation-driven analysis mechanism based on static program features,enabling predictive modeling of translation success from two perspectives:Code Form Complexity(e.g.,code length and character density)and Semantic Modeling Complexity(e.g.,syntactic depth,control-flow nesting,and type system complexity).Empirical evaluations across representative LLMs—including Qwen2.5-72B and Llama3.3-70B—demonstrate that even state-of-the-art models achieve over 80% compilation success on simple samples,but their accuracy drops sharply below 40% on complex cases.Further correlation analysis indicates that Semantic Modeling Complexity alone is correlated with up to 60% of the variance in translation success,with static program features exhibiting nonlinear threshold effects that highlight clear capability boundaries.This study departs fromthe traditional accuracy-centric evaluation paradigm and,for the first time,systematically characterizes the capabilities of large languagemodels in translation tasks through the lens of programstatic features.The findings provide actionable insights for model refinement and training strategy development.展开更多
The malicious dissemination of hate speech via compromised accounts,automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern.Automatically detecting such content in S...The malicious dissemination of hate speech via compromised accounts,automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern.Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources.In this paper,we compare two predominant AI-based approaches for the forensic detection of malicious hate speech:(1)finetuning encoder-only models that have been trained in Spanish and(2)In-Context Learning techniques(Zero-and Few-Shot Learning)with large-scale language models.Our approach goes beyond binary classification,proposing a comprehensive,multidimensional evaluation that labels each text by:(1)type of speech,(2)recipient,(3)level of intensity(ordinal)and(4)targeted group(multi-label).Performance is evaluated using an annotated Spanish corpus,standard metrics such as precision,recall and F1-score and stability-oriented metrics to evaluate the stability of the transition from zero-shot to few-shot prompting(Zero-to-Few Shot Retention and Zero-to-Few Shot Gain)are applied.The results indicate that fine-tuned encoder-only models(notably MarIA and BETO variants)consistently deliver the strongest and most reliable performance:in our experiments their macro F1-scores lie roughly in the range of approximately 46%–66%depending on the task.Zero-shot approaches are much less stable and typically yield substantially lower performance(observed F1-scores range approximately 0%–39%),often producing invalid outputs in practice.Few-shot prompting(e.g.,Qwen 38B,Mistral 7B)generally improves stability and recall relative to pure zero-shot,bringing F1-scores into a moderate range of approximately 20%–51%but still falling short of fully fine-tuned models.These findings highlight the importance of supervised adaptation and discuss the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns.展开更多
War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient an...War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient and inflexible,with particularly pronounced limitations in command and decision-making.The overwhelming volume of information and high decision complexity hinder the realization of autonomous and agile command and control.To address this challenge,an intelligent warfare simulation framework named Command-Agent is proposed,which deeply integrates large language models(LLMs)with digital twin battlefields.By constructing a highly realistic battlefield environment through real-time simulation and multi-source data fusion,the natural language interaction capabilities of LLMs are leveraged to lower the command threshold and to enable autonomous command through the Observe-Orient-Decide-Act(OODA)feedback loop.Within the Command-Agent framework,a multimodel collaborative architecture is further adopted to decouple the decision-generation and command-execution functions of LLMs.By combining specialized models such as Deep Seek-R1 and MCTool,the limitations of single-model capabilities are overcome.MCTool is a lightweight execution model fine-tuned for military Function Calling tasks.The framework also introduces a Vector Knowledge Base to mitigate hallucinations commonly exhibited by LLMs.Experimental results demonstrate that Command-Agent not only enables natural language-driven simulation and control but also deeply understands commander intent.Leveraging the multi-model collaborative architecture,during red-blue UAV confrontations involving 2 to 8 UAVs,the integrated score is improved by an average of 41.8%compared to the single-agent system(MCTool),accompanied by a 161.8%optimization in the battle loss ratio.Furthermore,when compared with multi-agent systems lacking the knowledge base,the inclusion of the Vector Knowledge Base further improves overall performance by 16.8%.In comparison with the general model(Qwen2.5-7B),the fine-tuned MCTool leads by 5%in execution efficiency.Therefore,the proposed Command-Agent introduces a novel perspective to the military command system and offers a feasible solution for intelligent battlefield decision-making.展开更多
Large language models(LLMs)have revolutionized AI applications across diverse domains.However,their widespread deployment has introduced critical security vulnerabilities,particularly prompt injection attacks that man...Large language models(LLMs)have revolutionized AI applications across diverse domains.However,their widespread deployment has introduced critical security vulnerabilities,particularly prompt injection attacks that manipulate model behavior through malicious instructions.Following Kitchenham’s guidelines,this systematic review synthesizes 128 peer-reviewed studies from 2022 to 2025 to provide a unified understanding of this rapidly evolving threat landscape.Our findings reveal a swift progression from simple direct injections to sophisticated multimodal attacks,achieving over 90%success rates against unprotected systems.In response,defense mechanisms show varying effectiveness:input preprocessing achieves 60%–80%detection rates and advanced architectural defenses demonstrate up to 95%protection against known patterns,though significant gaps persist against novel attack vectors.We identified 37 distinct defense approaches across three categories,but standardized evaluation frameworks remain limited.Our analysis attributes these vulnerabilities to fundamental LLM architectural limitations,such as the inability to distinguish instructions from data and attention mechanism vulnerabilities.This highlights critical research directions such as formal verification methods,standardized evaluation protocols,and architectural innovations for inherently secure LLM designs.展开更多
文摘In the era of AI,especially large models,the importance of open source has become increasingly prominent.First,open source allows innovation to avoid starting from scratch.Through iterative innovation,it promotes technical exchanges and learning globally.Second,resources required for large model R&D are difficult for a single institution to obtain.The evaluation of general large models also requires the participation of experts from various industries.Third,without open source collaboration,it is difficult to form a unified upper-layer software ecosystem.Therefore,open source has become an important cooperation mechanism to promote the development of AI and large models.There are two cases to illustrate how open source and international standards interact with each other.
文摘The integration of large-scale foundation models(e.g.,GPT series and AlphaFold)into oncology is fundamentally transforming both research methodologies and clinical practices,driven by unprecedented advancements in computational power.This review synthesizes recent progress in the application of large language models to core oncological tasks,including medical imaging analysis,genomic interpretation,and personalized treatment planning.Underpinned by advanced computational infrastructures,such as graphics processing unit/tensor processing unit clusters,heterogeneous computing,and cloud platforms,these models enable superior representation learning and generalization across multimodal data sources.This review examines how these infrastructures overcome key bottlenecks in intelligent oncology through scalable optimization strategies,including mixed-precision training,memory optimization,and heterogeneous computing.Alongside these technical advancements,the review explores pressing challenges,such as data heterogeneity,limited model interpretability,regulatory uncertainties,and the environmental impact of artificial intelligence(AI)systems.Special emphasis is placed on emerging solutions,encompassing green AI and edge computing,which offer promising approaches for low-resource deployment scenarios.Additionally,the review highlights the critical role of interdisciplinary collaboration among oncology,computer science,ethics,and policy to ensure that AI systems are not only powerful but also transparent,safe,and clinically relevant.Finally,the review outlines potential avenues for future research aimed at developing robust,scalable,and human-centered frameworks for intelligent oncology.
文摘This study examines the advent of agent interaction(AIx)as a transformative paradigm in humancomputer interaction(HCI),signifying a notable evolution beyond traditional graphical interfaces and touchscreen interactions.Within the context of large models,AIx is characterized by its innovative interaction patterns and a plethora of application scenarios that hold great potential.The paper highlights the pivotal role of AIx in shaping the future landscape of the large model industry,emphasizing its adoption and necessity from a user's perspective.This study underscores the pivotal role of AIx in dictating the future trajectory of a large model industry by emphasizing the importance of its adoption and necessity from a user-centric perspective.The fundamental drivers of AIx include the introduction of novel capabilities,replication of capabilities(both anthropomorphic and superhuman),migration of capabilities,aggregation of intelligence,and multiplication of capabilities.These elements are essential for propelling innovation,expanding the frontiers of capability,and realizing the exponential superposition of capabilities,thereby mitigating labor redundancy and addressing a spectrum of human needs.Furthermore,this study provides an in-depth analysis of the structural components and operational mechanisms of agents supported by large models.Such advancements significantly enhance the capacity of agents to tackle complex problems and provide intelligent services,thereby facilitating a more intuitive,adaptive,and personalized engagement between humans and machines.The study further delineates four principal categories of interaction patterns that encompass eight distinct modalities of interaction,corresponding to twenty-one specific scenarios,including applications in smart home systems,health assistance,and elderly care.This emphasizes the significance of this new paradigm in advancing HCI,fostering technological advancements,and redefining user experiences.However,it also acknowledges the challenges and ethical considerations that accompany this paradigm shift,recognizing the need for a balanced approach to harness the full potential of AIx in modern society.
文摘Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved perception,generation,and decision-making in various fields.However,their vast scale and complexity bring about new security challenges.Issues such as backdoor vulnerabilities during training,jailbreaking in multimodal rea⁃soning,and data provenance and copyright auditing have made security a critical focus for both academia and industry.
文摘Following the groundbreaking introduction of the Transformer architecture in 2017,the development of Large Language Models(LLMs)formally commenced.In May 2020,Chat GPT-3,with over one hundred billion parameters,entered the public eye,marking a significant milestone in LLM advancement.
基金supported in part by NSFC under Grant Nos.62402379,U22A2029 and U24A20237.
文摘The unprecedented scale of large models,such as large language models(LLMs)and text-to-image diffusion models,has raised critical concerns about the unauthorized use of copyrighted data during model training.These concerns have spurred a growing demand for dataset copyright auditing techniques,which aim to detect and verify potential infringements in the training data of commercial AI systems.This paper presents a survey of existing auditing solutions,categorizing them across key dimensions:data modality,model training stage,data overlap scenarios,and model access levels.We highlight major trends,including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training.Through an in-depth analysis of 12 representative works,we extract four key observations that reveal the limitations of current methods.Furthermore,we identify three open challenges and propose future directions for robust,multimodal,and scalable auditing solutions.Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.
基金The Natural Science Foundation of Hebei Province(F2024501044).
文摘The application of visual-language large models in the field of medical health has gradually become a research focus.The models combine the capability for image understanding and natural language processing,and can simultaneously process multi-modality data such as medical images and medical reports.These models can not only recognize images,but also understand the semantic relationship between images and texts,effectively realize the integration of medical information,and provide strong support for clinical decision-making and disease diagnosis.The visual-language large model has good performance for specific medical tasks,and also shows strong potential and high intelligence in the general task models.This paper provides a comprehensive review of the visual-language large model in the field of medical health.Specifically,this paper first introduces the basic theoretical basis and technical principles.Then,this paper introduces the specific application scenarios in the field of medical health,including modality fusion,semi-supervised learning,weakly supervised learning,unsupervised learning,cross-domain model and general models.Finally,the challenges including insufficient data,interpretability,and practical deployment are discussed.According to the existing challenges,four potential future development directions are given.
文摘The rapid advancement of deep learning and the emergence of largescale neural models,such as bidirectional encoder representations from transformers(BERT),generative pre-trained transformer(GPT),and large language model Meta AI(LLaMa),have brought significant computational and energy challenges.Neuromorphic computing presents a biologically inspired approach to addressing these issues,leveraging event-driven processing and in-memory computation for enhanced energy efficiency.This survey explores the intersection of neuromorphic computing and large-scale deep learning models,focusing on neuromorphic models,learning methods,and hardware.We highlight transferable techniques from deep learning to neuromorphic computing and examine the memoryrelated scalability limitations of current neuromorphic systems.Furthermore,we identify potential directions to enable neuromorphic systems to meet the growing demands of modern AI workloads.
基金Supported by the National Natural Science Foundation of China(72088101,42372175)PetroChina Science and Technology Innovation Fund Program(2021DQ02-0904)。
文摘This article elucidates the concept of large model technology,summarizes the research status of large model technology both domestically and internationally,provides an overview of the application status of large models in vertical industries,outlines the challenges and issues confronted in applying large models in the oil and gas sector,and offers prospects for the application of large models in the oil and gas industry.The existing large models can be briefly divided into three categories:large language models,visual large models,and multimodal large models.The application of large models in the oil and gas industry is still in its infancy.Based on open-source large language models,some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval augmented generation.Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models.A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation,as well as core analysis.The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being difficult to support the training of large models,high research and development costs,and poor algorithm autonomy and control.The application of large models should be guided by the needs of oil and gas business,taking the application of large models as an opportunity to improve data lifecycle management,enhance data governance capabilities,promote the construction of computing power,strengthen the construction of“artificial intelligence+energy”composite teams,and boost the autonomy and control of large model technology.
基金Education Department of Hainan Provincial(Hnky2024-43)Sanya University’s Industry-Education Integration Project(USY-CJRH2313)Financial Innovation and Multi-Asset Intelligent Trading Laboratory of the Key Laboratory of Philosophy and Social Sciences in Hainan Province of University of Sanya.
文摘Deep learning has become a hot field of artificial intelligence,and the deep learning large model framework has become a bridgehead for the active layout of Chinese and foreign technology companies.Large models play a significant role in the application field,greatly improving the efficiency of training and optimization,and contributing to the landing of many innovative artificial intelligence tools.Based on the Chinese PaddlePaddle large model framework,an application system is designed in combination with the intelligent classroom teaching scenario,which uses machine vision algorithms to distinguish and present teachers’and students’behaviors,that is,the digitization and multi-classification scheme of class character states.After having digital data,data analysis can be carried out to evaluate the class status of teachers and students,and the traditional subjective judgment such as peacetime grades and teaching ability can be upgraded to the objective judgment of artificial intelligence.
基金funded by the National Key Research and Development Program of China(No.2021YFA1100500)the National Natural Science Foundation of China(No.82370662)the Key Research&Development Plan of Zhejiang Province(No.2024C03051).
文摘This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing patient queries about disease and lifestyle behaviors.The models selected were ChatGPT-4o,Gemini 2.0 Pro,Claude 3.5 Sonnet,and DeepSeek V3,with 12 questions chosen by two HCV experts from the domains of prevention,diagnosis,and treatment.
基金supported by University Grant Agency of Matej Bel University in Banská Bystrica project number UGA-14-PDS-2025.
文摘It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problematic,since there is a need to differentiate between these two scenarios.Until recently,the proper−semantic−causality of the relationship could have been determined only by human experts from the area of expertise of the studied data.This has changed with the advance of large language models,which are often utilized as surrogates for such human experts,making the process automated and readily available to all data analysts.This motivates the main objective of this work,which is to introduce the design and implementation of a large language model-based semantic causality evaluator based on correlation analysis,together with its visual analysis model called Causal heatmap.After the implementation itself,the model is evaluated from the point of view of the quality of the visual model,from the point of view of the quality of causal evaluation based on large language models,and from the point of view of comparative analysis,while the results reached in the study highlight the usability of large language models in the task and the potential of the proposed approach in the analysis of unknown datasets.The results of the experimental evaluation demonstrate the usefulness of the Causal heatmap method,supported by the evident highlighting of interesting relationships,while suppressing irrelevant ones.
基金funded by the Office of the Vice-President for Research and Development of Cebu Technological University.
文摘This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.
文摘Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Jabbour et al.2017,Wong et al.2018,and Wu et al.2021)were replicated using ChatGPTv3.5 and Bard.Outputs(author,title,publication year,and journal)were compared to the original references and cross-referenced with medical databases for authenticity and recall.Results:Several themes emerged when comparing Bard and ChatGPT across the three reviews.Bard generated more outputs and had greater recall in Wong et al.'s review,with a broader date range in Jabbour et al.'s review.In Wu et al.'s review,ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.Conclusion:Large language models(LLMs)failed to fully replicate peer-reviewed methodologies,producing outputs with inaccuracies but identifying relevant,especially recent,articles missed by the references.While human-led PRISMA-based reviews remain the gold standard,refining LLMs for literature reviews shows potential.
基金supported by the National Science and Technology Council(NSTC),Taiwan,under grant number 114-2221-E-182-041-MY3by Chang Gung University and Chang Gung Memorial Hospital under project number NERPD4Q0021.
文摘The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streamlined in different domains.The offered study introduces an immediate optimization outline,named PROMPTx-PE,that is going to yield a greater level of precision and strength when it comes to the assignments that are premised on LLM.The proposed systemfeatures a timely selection schemewhich is informed by reinforcement learning,a contextual layer and a dynamic weighting module which is regulated by Lyapunov-based stability guidelines.The PROMPTx-PE dynamically varies the exploration and exploitation of the prompt space,depending on real-time feedback and multi-objective reward development.Extensive testing on both benchmark(GLUE,SuperGLUE)and domain-specific data(Healthcare-QA and Industrial-NER)demonstrates a large best performance to be 89.4%and a strong robustness disconnect with under 3%computation expense.The results confirm the effectiveness,consistency,and scalability of PROMPTx-PE as a platform of adaptive prompt engineering based on recent uses of LLMs.
文摘Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases.Methods:We curated 50 challenging lung cancer cases(25 local and 25 published)classified as complex,rare,or refractory.Blinded three-dimensional,five-point Likert evaluations(1–5 for comprehensiveness,specificity,and readability)compared standalone LLMs(DeepSeek R1,Claude 3.5,Gemini 1.5,and GPT-4o),physicians by experience level(junior,intermediate,and senior),and AI-assisted juniors;intergroup differences and augmentation effects were analyzed statistically.Results:Of 50 challenging cases(18 complex,17 rare,and 15 refractory)rated by three experts,DeepSeek R1 achieved scores of 3.95±0.33,3.71±0.53,and 4.26±0.18 for comprehensiveness,specificity,and readability,respectively,positioning it between intermediate(3.68,3.68,3.75)and senior(4.50,4.64,4.53)physicians.GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness(3.76±0.39,3.60±0.39)but junior-to-intermediate physician–level specificity(3.39±0.39,3.39±0.49).All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity.AIassisted junior physicians showed marked gains in rare cases,with comprehensiveness rising from 2.32 to 4.29(84.8%),specificity from 2.24 to 4.26(90.8%),and readability from 2.76 to 4.59(66.0%),while specificity declined by 3.2%(3.17 to 3.07)in refractory cases.Error analysis showed complementary strengths,with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management.Conclusions:LLMs performed variably in clinical decision-making tasks depending on case type,performing better in rare cases and worse in refractory cases requiring longitudinal reasoning.Complementary strengths between LLMs and physicians support case-and task-tailored human–AI collaboration.
文摘LargeLanguageModels(LLMs)are increasingly appliedinthe fieldof code translation.However,existing evaluation methodologies suffer from two major limitations:(1)the high overlap between test data and pretraining corpora,which introduces significant bias in performance evaluation;and(2)mainstream metrics focus primarily on surface-level accuracy,failing to uncover the underlying factors that constrain model capabilities.To address these issues,this paper presents TCode(Translation-Oriented Code Evaluation benchmark)—a complexity-controllable,contamination-free benchmark dataset for code translation—alongside a dedicated static feature sensitivity evaluation framework.The dataset is carefully designed to control complexity along multiple dimensions—including syntactic nesting and expression intricacy—enabling both broad coverage and fine-grained differentiation of sample difficulty.This design supports precise evaluation of model capabilities across a wide spectrum of translation challenges.The proposed evaluation framework introduces a correlation-driven analysis mechanism based on static program features,enabling predictive modeling of translation success from two perspectives:Code Form Complexity(e.g.,code length and character density)and Semantic Modeling Complexity(e.g.,syntactic depth,control-flow nesting,and type system complexity).Empirical evaluations across representative LLMs—including Qwen2.5-72B and Llama3.3-70B—demonstrate that even state-of-the-art models achieve over 80% compilation success on simple samples,but their accuracy drops sharply below 40% on complex cases.Further correlation analysis indicates that Semantic Modeling Complexity alone is correlated with up to 60% of the variance in translation success,with static program features exhibiting nonlinear threshold effects that highlight clear capability boundaries.This study departs fromthe traditional accuracy-centric evaluation paradigm and,for the first time,systematically characterizes the capabilities of large languagemodels in translation tasks through the lens of programstatic features.The findings provide actionable insights for model refinement and training strategy development.
基金the research project LaTe4PoliticES(PID2022-138099OB-I00)funded by MCIN/AEI/10.13039/501100011033 and the European Fund for Regional Development(ERDF)-a way to make Europe.Tomás Bernal-Beltrán is supported by University of Murcia through the predoctoral programme.
文摘The malicious dissemination of hate speech via compromised accounts,automated bot networks and malware-driven social media campaigns has become a growing cybersecurity concern.Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources.In this paper,we compare two predominant AI-based approaches for the forensic detection of malicious hate speech:(1)finetuning encoder-only models that have been trained in Spanish and(2)In-Context Learning techniques(Zero-and Few-Shot Learning)with large-scale language models.Our approach goes beyond binary classification,proposing a comprehensive,multidimensional evaluation that labels each text by:(1)type of speech,(2)recipient,(3)level of intensity(ordinal)and(4)targeted group(multi-label).Performance is evaluated using an annotated Spanish corpus,standard metrics such as precision,recall and F1-score and stability-oriented metrics to evaluate the stability of the transition from zero-shot to few-shot prompting(Zero-to-Few Shot Retention and Zero-to-Few Shot Gain)are applied.The results indicate that fine-tuned encoder-only models(notably MarIA and BETO variants)consistently deliver the strongest and most reliable performance:in our experiments their macro F1-scores lie roughly in the range of approximately 46%–66%depending on the task.Zero-shot approaches are much less stable and typically yield substantially lower performance(observed F1-scores range approximately 0%–39%),often producing invalid outputs in practice.Few-shot prompting(e.g.,Qwen 38B,Mistral 7B)generally improves stability and recall relative to pure zero-shot,bringing F1-scores into a moderate range of approximately 20%–51%but still falling short of fully fine-tuned models.These findings highlight the importance of supervised adaptation and discuss the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns.
文摘War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient and inflexible,with particularly pronounced limitations in command and decision-making.The overwhelming volume of information and high decision complexity hinder the realization of autonomous and agile command and control.To address this challenge,an intelligent warfare simulation framework named Command-Agent is proposed,which deeply integrates large language models(LLMs)with digital twin battlefields.By constructing a highly realistic battlefield environment through real-time simulation and multi-source data fusion,the natural language interaction capabilities of LLMs are leveraged to lower the command threshold and to enable autonomous command through the Observe-Orient-Decide-Act(OODA)feedback loop.Within the Command-Agent framework,a multimodel collaborative architecture is further adopted to decouple the decision-generation and command-execution functions of LLMs.By combining specialized models such as Deep Seek-R1 and MCTool,the limitations of single-model capabilities are overcome.MCTool is a lightweight execution model fine-tuned for military Function Calling tasks.The framework also introduces a Vector Knowledge Base to mitigate hallucinations commonly exhibited by LLMs.Experimental results demonstrate that Command-Agent not only enables natural language-driven simulation and control but also deeply understands commander intent.Leveraging the multi-model collaborative architecture,during red-blue UAV confrontations involving 2 to 8 UAVs,the integrated score is improved by an average of 41.8%compared to the single-agent system(MCTool),accompanied by a 161.8%optimization in the battle loss ratio.Furthermore,when compared with multi-agent systems lacking the knowledge base,the inclusion of the Vector Knowledge Base further improves overall performance by 16.8%.In comparison with the general model(Qwen2.5-7B),the fine-tuned MCTool leads by 5%in execution efficiency.Therefore,the proposed Command-Agent introduces a novel perspective to the military command system and offers a feasible solution for intelligent battlefield decision-making.
基金supported by 2023 Higher Education Scientific Research Planning Project of China Society of Higher Education(No.23PG0408)2023 Philosophy and Social Science Research Programs in Jiangsu Province(No.2023SJSZ0993)+2 种基金Nantong Science and Technology Project(No.JC2023070)Key Project of Jiangsu Province Education Science 14th Five-Year Plan(Grant No.B-b/2024/02/41)the Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province(Grant No.SKLACSS-202407).
文摘Large language models(LLMs)have revolutionized AI applications across diverse domains.However,their widespread deployment has introduced critical security vulnerabilities,particularly prompt injection attacks that manipulate model behavior through malicious instructions.Following Kitchenham’s guidelines,this systematic review synthesizes 128 peer-reviewed studies from 2022 to 2025 to provide a unified understanding of this rapidly evolving threat landscape.Our findings reveal a swift progression from simple direct injections to sophisticated multimodal attacks,achieving over 90%success rates against unprotected systems.In response,defense mechanisms show varying effectiveness:input preprocessing achieves 60%–80%detection rates and advanced architectural defenses demonstrate up to 95%protection against known patterns,though significant gaps persist against novel attack vectors.We identified 37 distinct defense approaches across three categories,but standardized evaluation frameworks remain limited.Our analysis attributes these vulnerabilities to fundamental LLM architectural limitations,such as the inability to distinguish instructions from data and attention mechanism vulnerabilities.This highlights critical research directions such as formal verification methods,standardized evaluation protocols,and architectural innovations for inherently secure LLM designs.