Abstract: In the era of AI, especially large models, the importance of open source has become increasingly prominent. First, open source allows innovation to avoid starting from scratch; through iterative innovation, it promotes technical exchange and learning globally. Second, the resources required for large model R&D are difficult for a single institution to obtain, and the evaluation of general large models also requires the participation of experts from various industries. Third, without open source collaboration, it is difficult to form a unified upper-layer software ecosystem. Open source has therefore become an important cooperation mechanism for promoting the development of AI and large models. Two cases illustrate how open source and international standards interact with each other.
Abstract: This study examines the advent of agent interaction (AIx) as a transformative paradigm in human-computer interaction (HCI), signifying a notable evolution beyond traditional graphical interfaces and touchscreen interactions. Within the context of large models, AIx is characterized by innovative interaction patterns and a wealth of application scenarios with great potential. The study underscores the pivotal role of AIx in shaping the future trajectory of the large model industry, emphasizing its adoption and necessity from a user-centric perspective. The fundamental drivers of AIx include the introduction of novel capabilities, replication of capabilities (both anthropomorphic and superhuman), migration of capabilities, aggregation of intelligence, and multiplication of capabilities. These elements are essential for propelling innovation, expanding the frontiers of capability, and realizing the exponential superposition of capabilities, thereby mitigating labor redundancy and addressing a spectrum of human needs. Furthermore, this study provides an in-depth analysis of the structural components and operational mechanisms of agents supported by large models. Such advancements significantly enhance the capacity of agents to tackle complex problems and provide intelligent services, thereby facilitating more intuitive, adaptive, and personalized engagement between humans and machines. The study further delineates four principal categories of interaction patterns encompassing eight distinct modalities of interaction and twenty-one specific scenarios, including applications in smart home systems, health assistance, and elderly care. This underscores the significance of the new paradigm in advancing HCI, fostering technological progress, and redefining user experiences. However, the study also acknowledges the challenges and ethical considerations that accompany this paradigm shift, recognizing the need for a balanced approach to harness the full potential of AIx in modern society.
Abstract: Large models, such as large language models (LLMs), vision-language models (VLMs), and multimodal agents, have become key elements in artificial intelligence (AI) systems. Their rapid development has greatly improved perception, generation, and decision-making in various fields. However, their vast scale and complexity bring about new security challenges. Issues such as backdoor vulnerabilities during training, jailbreaking in multimodal reasoning, and data provenance and copyright auditing have made security a critical focus for both academia and industry.
Abstract: Following the groundbreaking introduction of the Transformer architecture in 2017, the development of Large Language Models (LLMs) formally commenced. In May 2020, GPT-3, with over one hundred billion parameters, entered the public eye, marking a significant milestone in LLM advancement.
Funding: Supported in part by NSFC under Grant Nos. 62402379, U22A2029 and U24A20237.
Abstract: The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper presents a survey of existing auditing solutions, categorizing them across key dimensions: data modality, model training stage, data overlap scenarios, and model access levels. We highlight major trends, including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.
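To make the black-box setting concrete, here is a minimal sketch of one common auditing idea: comparing the audited model's loss on suspected training samples against its loss on comparable held-out samples. The model_nll function, the toy texts, and the decision threshold are hypothetical placeholders, not a method taken from any of the surveyed works.

```python
import numpy as np

# Sketch of loss-based black-box auditing: a model that trained on the
# candidate texts tends to assign them lower loss than comparable held-out
# texts. model_nll is a hypothetical stand-in for a per-text negative
# log-likelihood obtained from the audited model.
def model_nll(text: str) -> float:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return float(rng.normal(loc=3.0, scale=0.3))

def audit(candidates, references, gap_threshold=0.5):
    cand = np.array([model_nll(t) for t in candidates])
    ref = np.array([model_nll(t) for t in references])
    gap = ref.mean() - cand.mean()  # positive gap: candidates are "easier"
    return {
        "candidate_mean_nll": round(float(cand.mean()), 3),
        "reference_mean_nll": round(float(ref.mean()), 3),
        "suspected_membership": bool(gap > gap_threshold),
    }

print(audit(["first chapter of the disputed novel ..."],
            ["a held-out text of similar style and length ..."]))
```

In practice the gap would be calibrated on shadow models rather than a fixed threshold, and the reference set must match the candidates in style and length for the comparison to be meaningful.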
Funding: The Natural Science Foundation of Hebei Province (F2024501044).
Abstract: The application of visual-language large models in the field of medical health has gradually become a research focus. These models combine image understanding with natural language processing and can simultaneously process multi-modality data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively integrating medical information and providing strong support for clinical decision-making and disease diagnosis. Visual-language large models perform well on specific medical tasks and also show strong potential and high intelligence as general task models. This paper provides a comprehensive review of visual-language large models in the field of medical health. Specifically, it first introduces the theoretical foundations and technical principles. It then introduces specific application scenarios in medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed, and four potential future development directions are given in light of these challenges.
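As a concrete illustration of the modality-fusion scenario mentioned above, the following sketch aligns and fuses an image embedding with a report embedding. The encoders are replaced by random vectors and the projection is untrained; this shows the data flow only, not any model reviewed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs; a real system would use, e.g., a vision
# transformer for the medical image and a text encoder for the report.
image_embedding = rng.standard_normal(512)  # hypothetical image encoder output
text_embedding = rng.standard_normal(512)   # hypothetical text encoder output

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Contrastive-style alignment score (CLIP-like): cosine similarity between
# the normalized image and text embeddings.
alignment = float(l2_normalize(image_embedding) @ l2_normalize(text_embedding))

# Simple late fusion for a downstream diagnostic head: concatenate and project.
W = rng.standard_normal((256, 1024)) * 0.02  # untrained projection, shapes only
fused = W @ np.concatenate([image_embedding, text_embedding])

print(f"image-text alignment: {alignment:.3f}, fused feature dim: {fused.shape[0]}")
```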
Abstract: The rapid advancement of deep learning and the emergence of large-scale neural models, such as bidirectional encoder representations from transformers (BERT), generative pre-trained transformer (GPT), and large language model Meta AI (LLaMa), have brought significant computational and energy challenges. Neuromorphic computing presents a biologically inspired approach to addressing these issues, leveraging event-driven processing and in-memory computation for enhanced energy efficiency. This survey explores the intersection of neuromorphic computing and large-scale deep learning models, focusing on neuromorphic models, learning methods, and hardware. We highlight transferable techniques from deep learning to neuromorphic computing and examine the memory-related scalability limitations of current neuromorphic systems. Furthermore, we identify potential directions to enable neuromorphic systems to meet the growing demands of modern AI workloads.
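For readers unfamiliar with the event-driven models this survey covers, here is a self-contained sketch of a leaky integrate-and-fire (LIF) neuron, the basic unit of many spiking networks run on neuromorphic hardware. The parameter values are illustrative, not drawn from the survey.

```python
import numpy as np

def lif_simulate(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential leaks toward
    rest, integrates input, and emits a discrete spike event on crossing the
    threshold (the event-driven behavior exploited by neuromorphic chips)."""
    v = v_rest
    trace, spikes = [], []
    for t, i_in in enumerate(input_current):
        # Leak toward rest while integrating the input current.
        v += (-(v - v_rest) + i_in) * (dt / tau)
        if v >= v_threshold:
            spikes.append(t)  # emit an event (a spike) ...
            v = v_reset       # ... and reset the membrane potential
        trace.append(v)
    return np.array(trace), spikes

# Constant supra-threshold input produces a regular spike train.
trace, spikes = lif_simulate(np.full(200, 1.2))
print(f"{len(spikes)} spikes at time steps {spikes}")
```

Because the neuron only communicates at spike times, downstream computation can stay idle between events, which is the source of the energy savings the survey discusses.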
Funding: Supported by the National Natural Science Foundation of China (72088101, 42372175) and the PetroChina Science and Technology Innovation Fund Program (2021DQ02-0904).
Abstract: This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations using visual/multimodal foundation models, and a few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. Applying large models in the oil and gas industry faces several challenges: current data quantity and quality are insufficient to support large model training, research and development costs are high, and algorithmic autonomy and controllability are weak. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the building of "artificial intelligence + energy" composite teams, and boost the autonomy and controllability of large model technology.
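Since the article singles out retrieval-augmented generation (RAG) as the method behind several of these products, a minimal sketch of the pattern follows. The embed and llm functions, the toy documents, and the well and block names are all placeholders invented for illustration, not artifacts of any enterprise system mentioned above.

```python
import numpy as np

# Hypothetical stand-ins: embed() maps text to a unit vector; llm() completes
# a prompt. A real system would call an embedding model and an LLM here.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def llm(prompt: str) -> str:
    return f"[model answer grounded in the retrieved context]\n{prompt[:80]}..."

documents = [
    "Well A-12 logging report: porosity 14%, water saturation 35%.",
    "Seismic survey notes for block B: fault zone along the northern edge.",
    "Core analysis of sample C-7: permeability 120 mD, sandstone.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def rag_answer(question: str, k: int = 2) -> str:
    scores = doc_vectors @ embed(question)  # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]      # k most relevant documents
    context = "\n".join(documents[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm(prompt)

print(rag_answer("What is the porosity of well A-12?"))
```

The appeal for domain deployment is that the proprietary documents stay in the retrieval store, so the base model needs no retraining when reports are added or revised.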
Funding: Education Department of Hainan Province (Hnky2024-43), Sanya University's Industry-Education Integration Project (USY-CJRH2313), and the Financial Innovation and Multi-Asset Intelligent Trading Laboratory of the Key Laboratory of Philosophy and Social Sciences in Hainan Province, University of Sanya.
Abstract: Deep learning has become a hot field of artificial intelligence, and deep learning large model frameworks have become a strategic foothold for technology companies in China and abroad. Large models play a significant role in applications, greatly improving the efficiency of training and optimization and enabling the deployment of many innovative artificial intelligence tools. Based on the Chinese PaddlePaddle large model framework, an application system is designed for the intelligent classroom teaching scenario. It uses machine vision algorithms to distinguish and present teachers' and students' behaviors, that is, a digitization and multi-class classification scheme for in-class behavioral states. Once the data are digitized, they can be analyzed to evaluate the classroom status of teachers and students, upgrading traditionally subjective judgments, such as routine grades and assessments of teaching ability, to objective, artificial intelligence-based evaluation.
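The step from per-frame behavior labels to an objective class-state evaluation can be illustrated in a few lines. The label taxonomy and the engagement metric below are hypothetical stand-ins for whatever categories the deployed vision model actually outputs; the classifier itself is out of scope here.

```python
from collections import Counter

# Hypothetical frame-level behavior labels produced by a vision model.
student_labels = ["listening", "writing", "listening", "distracted",
                  "listening", "hand_raised", "writing", "distracted"]

ENGAGED = {"listening", "writing", "hand_raised"}  # assumed taxonomy

counts = Counter(student_labels)
engagement = sum(counts[b] for b in ENGAGED) / len(student_labels)

print(f"behavior distribution: {dict(counts)}")
print(f"engagement rate: {engagement:.0%}")  # objective class-state metric
```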
Abstract: 1 Background and motivation. Recent advances in foundation models have ushered in a paradigm shift across the field of artificial intelligence (AI), with profound implications for financial technology (FinTech). Foundation models refer to large-scale neural networks trained on vast and heterogeneous corpora using self-supervised or instruction-driven objectives, which endow them with strong generalization and transfer capabilities across downstream tasks. Representative classes of such models, including large language models (LLMs), multimodal foundation models, and time-series foundation models, exhibit emergent abilities in semantic understanding, reasoning, and multimodal representation learning.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 62272236) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20201136).
Abstract: The rapid advancement of artificial intelligence technology is driving transformative changes in medical diagnosis, treatment, and management systems through large-scale deep learning models, a process that brings both groundbreaking opportunities and multifaceted challenges. This study focuses on the medical and healthcare applications of large-scale deep learning architectures, conducting a comprehensive survey to categorize and analyze their diverse uses. The survey reveals that current applications of large models in healthcare encompass medical data management, healthcare services, medical devices, and preventive medicine, among others. Large models also demonstrate significant advantages in the medical domain, especially in high-precision diagnosis and prediction, data analysis and knowledge discovery, and enhanced operational efficiency. Nevertheless, we identify several challenges that need urgent attention, including improving the interpretability of large models, strengthening privacy protection, and handling incomplete data. This research systematically elucidates the deep collaborative mechanisms between artificial intelligence and the healthcare field, providing theoretical references and practical guidance for both academia and industry.
Funding: Supported by NSFC No. 62372430 and the Youth Innovation Promotion Association CAS No. 2023112.
Abstract: Intelligent spatial-temporal data analysis, leveraging data such as multivariate time series and geographic information, provides researchers with powerful tools to uncover multiscale patterns and enhance decision-making processes. As artificial intelligence advances, intelligent spatial-temporal algorithms have found extensive applications across various disciplines, such as geosciences, biology, and public health. Compared to traditional methods, these algorithms are data driven, making them well suited for addressing the complexities of modeling real-world systems. However, their reliance on substantial domain-specific expertise limits their broader applicability. Recently, significant advancements have been made in spatial-temporal large models. Trained on large-scale data, these models exhibit a vast parameter scale, superior generalization capabilities, and multitasking advantages over previous methods. Their high versatility and scalability position them as promising super hubs for multidisciplinary research, integrating knowledge, intelligent algorithms, and research communities from different fields. Nevertheless, achieving this vision will require overcoming numerous critical challenges, offering an expansive and profound space for future exploration.
Funding: Supported by the National Natural Science Foundation of China (62376219 and 62006194), the Foundational Research Project in Specialized Discipline (Grant No. G2024WD0146), and the Faculty Construction Project (Grant No. 24GH0201148).
Abstract: Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights towards bridging the gap in Human-Robot-Environment interaction.
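The general shape of such a framework (pairing an instruction with the robot's camera view and asking a vision-language model for a plan) can be sketched as follows. This is not the authors' implementation: call_vlm, the message schema, and the returned plan are placeholders invented for illustration.

```python
import base64
from pathlib import Path

# Placeholder for a call to a GPT-4V-class multimodal API; a real system
# would send the messages to an actual vision-language model endpoint.
def call_vlm(messages) -> str:
    return "1. locate the cup\n2. grasp the cup\n3. place it on the tray"

def plan_embodied_task(instruction: str, camera_image: Path) -> list[str]:
    """Combine a natural-language instruction with the robot's camera frame
    and request a step-by-step plan grounded in the observed scene."""
    image_b64 = base64.b64encode(camera_image.read_bytes()).decode()
    messages = [
        {"role": "system",
         "content": "You are a robot task planner. Output numbered steps "
                    "that are feasible given the scene in the image."},
        {"role": "user",
         "content": [
             {"type": "text", "text": instruction},
             {"type": "image", "data": image_b64},  # schema is illustrative
         ]},
    ]
    return call_vlm(messages).splitlines()

# Example usage (requires a real image file):
# for step in plan_embodied_task("Put the red cup on the tray", Path("scene.jpg")):
#     print(step)
```

Grounding the plan in the camera frame is what lets the model reject steps that a text-only planner would happily emit for objects that are not actually present.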
Abstract: Purpose: Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments, and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. Design/methodology/approach: This article assesses which ChatGPT inputs (full text without tables, figures, and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. Findings: The optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66). Research limitations: The data is a convenience sample of the work of a single author, it only includes one field, and the scores are self-evaluations. Practical implications: The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into human-scale scores, which is 31% more accurate than guessing. Originality/value: This is the first systematic comparison of the impact of different prompts, parameters, and inputs for ChatGPT research quality evaluations.
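The score-averaging and regression-calibration steps described in the findings can be reproduced in miniature. The data below are synthetic, generated only to show the mechanics; the printed correlation and error values are not the paper's results.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the paper's setup: 51 articles, human quality scores,
# and 30 ChatGPT scoring iterations per article.
n_articles, n_iterations = 51, 30
human = rng.integers(1, 5, size=n_articles).astype(float)
chatgpt_runs = human[:, None] + rng.normal(0, 1.0, size=(n_articles, n_iterations))

# Averaging over iterations reduces per-run noise in the model's scores.
chatgpt_mean = chatgpt_runs.mean(axis=1)

r = np.corrcoef(chatgpt_mean, human)[0, 1]
print(f"correlation of mean model scores with human scores: {r:.2f}")

# Linear regression maps model scores onto the human scale, as the paper
# suggests, so predictions can be read directly as human-scale scores.
slope, intercept = np.polyfit(chatgpt_mean, human, deg=1)
predicted = slope * chatgpt_mean + intercept
print(f"calibrated prediction MAE: {np.abs(predicted - human).mean():.2f}")
```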
Funding: Supported by the China Health Promotion Foundation Young Doctors' Research Foundation for Inflammatory Bowel Disease; the Taishan Scholars Program of Shandong Province, China (No. tsqn202306343); and the National Natural Science Foundation of China (No. 82270578).
Abstract: BACKGROUND: Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, the use of LLMs to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated. AIM: To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD. METHODS: In this comparative study, two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy, comprehensibility, and correlation. Simultaneously, three patients were invited to evaluate the comprehensibility of the answers. Finally, a readability assessment was performed. RESULTS: Overall, each of the LLMs achieved satisfactory levels of accuracy, comprehensibility, and completeness when answering IBD-related questions, although their performance varied. All of the investigated models demonstrated strengths in providing basic disease information, such as the definition of IBD and its common symptoms and diagnostic methods. Nevertheless, when dealing with more complex medical advice, such as medication side effects, dietary adjustments, and complication risks, the quality of answers was inconsistent between the LLMs. Notably, Claude-3-Opus generated answers with better readability than the other two models. CONCLUSION: LLMs have potential as educational tools for patients with IBD; however, there are discrepancies between the models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
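The abstract does not say which readability metric was used; the Flesch Reading Ease formula is one common choice for patient-facing text, sketched below with a deliberately crude syllable counter.

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group syllable counter; adequate for a rough readability score."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).
    Higher scores mean easier text; patient-education material often targets 60+."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

answer = ("Inflammatory bowel disease is a long-term condition. "
          "It causes swelling in the gut. Medicines can control the symptoms.")
print(f"Flesch Reading Ease: {flesch_reading_ease(answer):.1f}")
```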
Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFB3103500, the National Natural Science Foundation of China under Grants No. 62402087 and No. 62020106013, the Sichuan Science and Technology Program under Grant No. 2023ZYD0142, the Chengdu Science and Technology Program under Grant No. 2023-XT00-00002-GX, the Fundamental Research Funds for Chinese Central Universities under Grants No. ZYGX2020ZB027 and No. Y030232063003002, and the Postdoctoral Innovation Talents Support Program under Grant No. BX20230060.
Abstract: The integration of artificial intelligence (AI) technology, particularly large language models (LLMs), has become essential across various sectors due to their advanced language comprehension and generation capabilities. Despite their transformative impact in fields such as machine translation and intelligent dialogue systems, LLMs face significant challenges. These challenges include safety, security, and privacy concerns that undermine their trustworthiness and effectiveness, such as hallucinations, backdoor attacks, and privacy leakage. Previous works often conflated safety issues with security concerns. In contrast, our study provides clearer and more reasonable definitions for safety, security, and privacy within the context of LLMs. Building on these definitions, we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety, security, and privacy in LLMs. Additionally, we explore the unique research challenges posed by LLMs and suggest potential avenues for future research, aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.
Abstract: Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, software testing and analysis are two of the critical methods, and both benefit significantly from advancements in deep learning technologies. Due to the successful use of deep learning in software security, researchers have recently explored the potential of using large language models (LLMs) in this area. In this paper, we systematically review the results focusing on LLMs in software security. We analyze the topics of fuzzing, unit testing, program repair, bug reproduction, data-driven bug detection, and bug triage. We deconstruct these techniques into several stages and analyze how LLMs can be used in each stage. We also discuss the future directions of using LLMs in software security, including future directions for existing uses of LLMs and extensions from conventional deep learning research.
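As one example of the stage-wise use of LLMs the review describes, the sketch below drafts unit tests for a target function by prompting a model with its source code. llm_complete is a placeholder returning a canned test; a real pipeline would call an actual LLM, then execute and filter the generated tests.

```python
import inspect

# Placeholder for an LLM call; a real tool would query a model here.
def llm_complete(prompt: str) -> str:
    return ("def test_parse_port_rejects_negative():\n"
            "    assert parse_port('-1') is None\n")

def parse_port(value: str):
    """Target under test: parse a TCP port, returning None when invalid."""
    try:
        port = int(value)
    except ValueError:
        return None
    return port if 0 < port < 65536 else None

# Stage 1: build a prompt from the function's source and a test objective.
prompt = (
    "Write pytest unit tests for this function, covering valid input, "
    "malformed input, and boundary values:\n\n"
    + inspect.getsource(parse_port)
)

# Stage 2: generate candidate tests (a later stage would run and keep
# only the tests that compile and pass against the reference behavior).
generated_tests = llm_complete(prompt)
print(generated_tests)
```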
Abstract: ChatGPT is a powerful artificial intelligence (AI) language model that has demonstrated significant improvements in various natural language processing (NLP) tasks. However, like any technology, it presents potential security risks that need to be carefully evaluated and addressed. In this survey, we provide an overview of the current state of research on the security of using ChatGPT, covering bias, disinformation, ethics, misuse, attacks, and privacy. We review and discuss the literature on these topics and highlight open research questions and future directions. Through this survey, we aim to contribute to the academic discourse on AI security, enriching the understanding of potential risks and mitigations. We anticipate that this survey will be valuable for various stakeholders involved in AI development and usage, including AI researchers, developers, policy makers, and end-users.
基金This work was supported by the National Key R&D Program of China(2023YFC2415200)National Natural Science Foundation of China(82361168664,82372053,82441018,U24A20759,62222609,62076236,32350010,82302407,82302296)+3 种基金Beijing Natural Science Foundation(JQ24048,7232346)Beijing Nova Program(20240484528)Science and Technology Development Fund of Macao Special Administrative Region(0006/2023/AFJ)China Postdoctoral Science Foundation(2022M720357).
Abstract: Recent advances in large models demonstrate significant prospects for transforming the field of medical imaging. These models, including large language models, large visual models, and multimodal large models, offer unprecedented capabilities in processing and interpreting complex medical data across various imaging modalities. By leveraging self-supervised pretraining on vast unlabeled datasets, cross-modal representation learning, and domain-specific medical knowledge adaptation through fine-tuning, large models can achieve higher diagnostic accuracy and more efficient workflows for key clinical tasks. This review summarizes the concepts, methods, and progress of large models in medical imaging, highlighting their potential in precision medicine. The article first outlines the integration of multimodal data under large model technologies, approaches for training large models with medical datasets, and the need for robust evaluation metrics. It then explores how large models can revolutionize applications in critical tasks such as image segmentation, disease diagnosis, personalized treatment strategies, and real-time interactive systems, thus pushing the boundaries of traditional imaging analysis. Despite their potential, the practical implementation of large models in medical imaging faces notable challenges, including the scarcity of high-quality medical data, the need for optimized perception of imaging phenotypes, safety considerations, and seamless integration with existing clinical workflows and equipment. As research progresses, the development of more efficient, interpretable, and generalizable models will be critical to ensuring their reliable deployment across diverse clinical environments. This review aims to characterize the current state of the field and suggest directions for future research to facilitate the broader adoption of large models in clinical practice.
Abstract: AIM: To investigate the capabilities of large language models (LLMs) for providing information and diagnoses in the field of neuro-ophthalmology by comparing the performances of ChatGPT-3.5 and -4.0, Bard, and Bing. METHODS: Each chatbot was evaluated on four criteria: diagnostic success rate for the described case, answer quality, response speed, and critical keywords for diagnosis. The selected topics included optic neuritis, nonarteritic anterior ischemic optic neuropathy, and Leber hereditary optic neuropathy. RESULTS: In terms of diagnostic success rate for the described cases, Bard was unable to provide a diagnosis. The success rates increased in the order of Bing, ChatGPT-3.5, and ChatGPT-4.0. Further, ChatGPT-4.0 and -3.5 provided the most satisfactory answer quality as judged by neuro-ophthalmologists, with their sets of answers resembling the sample set most closely. Bard was only able to provide ten differential diagnoses in three trials, and Bing scored the lowest against the satisfaction standard. A Mann-Whitney test indicated that Bard was significantly faster than ChatGPT-4.0 (Z=-3.576, P=0.000), ChatGPT-3.5 (Z=-3.576, P=0.000), and Bing (Z=-2.517, P=0.011). ChatGPT-3.5 and -4.0 far exceeded the other two interfaces at providing diagnoses and were thus used to find the critical keywords for diagnosis. CONCLUSION: ChatGPT-3.5 and -4.0 are better than Bard and Bing in terms of answer success rate, answer quality, and critical keywords for diagnosis in ophthalmology. This study has broad implications for the field of ophthalmology, providing further evidence that artificial intelligence LLMs can aid clinical decision-making through free-text explanations.
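The response-speed comparison reported above uses a Mann-Whitney test; a minimal reproduction of that analysis looks like the following. The timing values are invented for illustration and are not the study's measurements.

```python
from scipy.stats import mannwhitneyu

# Hypothetical response times in seconds for two chatbots across repeated
# trials, standing in for the study's speed measurements.
bard_times = [3.1, 2.8, 3.4, 2.9, 3.0, 3.2, 2.7, 3.3]
gpt4_times = [9.5, 8.7, 10.2, 9.9, 9.1, 10.5, 8.9, 9.7]

# Two-sided Mann-Whitney U test: a nonparametric comparison of the two
# response-time distributions, appropriate when normality is not assumed.
stat, p = mannwhitneyu(bard_times, gpt4_times, alternative="two-sided")
print(f"U = {stat:.1f}, P = {p:.4f}")  # small P: the distributions differ
```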