Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights towards bridging the gap in Human-Robot-Environment interaction.
As embodied intelligence (EI), large language models (LLMs), and cloud computing continue to advance, Industry 5.0 facilitates the development of industrial artificial intelligence (Ind AI) through cyber-physical-social systems (CPSSs) with a human-centric focus. These technologies are organized by the system-wide approach of Industry 5.0 in order to empower the manufacturing industry to achieve broader societal goals of job creation, economic growth, and green production. This survey first provides a general framework of smart manufacturing in the context of Industry 5.0. Within this framework, embodied agents such as robots, sensors, and actuators are the carriers for Ind AI, facilitating the development of self-learning intelligence in individual entities, collaborative intelligence in production lines and factories (smart systems), and swarm intelligence within industrial clusters (systems of smart systems). Through the framework of CPSSs, the key technologies and their possible applications for supporting single-agent, multi-agent, and swarm-agent embodied Ind AI are reviewed, such as embodied perception, interaction, scheduling, multi-mode large language models, and collaborative training. Finally, to stimulate future research in this area, the open challenges and opportunities of applying Industry 5.0 to smart manufacturing are identified and discussed. The perspective of an Industry 5.0-driven manufacturing industry aims to enhance operational productivity and efficiency by seamlessly integrating the virtual and physical worlds in a human-centered manner, thereby fostering an intelligent, sustainable, and resilient industrial landscape.
In recent years, Volunteered Geographic Information (VGI) has emerged as a crucial source of mapping data, contributed by users through crowdsourcing platforms such as OpenStreetMap. This paper presents a novel approach that integrates Large Language Models (LLMs) into a fully automated mapping workflow, utilizing VGI data. The process leverages Prompt Engineering, which involves designing and optimizing input instructions to ensure the LLM produces desired mapping outputs. By constructing precise and detailed prompts, LLM agents are able to accurately interpret mapping requirements, and autonomously extract, analyze, and process VGI geospatial data. They dynamically interact with mapping tools to automate the entire mapping process, from data acquisition to map generation. This approach significantly streamlines the creation of high-quality mapping outputs, reducing the time and resources typically required for such tasks. Moreover, the system lowers the barrier for non-expert users, enabling them to generate accurate maps without extensive technical expertise. Through various case studies, we demonstrate the LLM application across different mapping scenarios, highlighting its potential to enhance the efficiency, accuracy, and accessibility of map production. The results suggest that LLM-powered mapping systems can not only optimize VGI data processing but also expand the usability of ubiquitous mapping across diverse fields, including urban planning and infrastructure development.
Along with the proliferating research interest in semantic communication (SemCom), joint source channel coding (JSCC) has dominated the attention due to the widely assumed existence in efficiently delivering information semantics. Nevertheless, this paper challenges the conventional JSCC paradigm and advocates for adopting separate source channel coding (SSCC) to enjoy a more underlying degree of freedom for optimization. We demonstrate that SSCC, after leveraging the strengths of the Large Language Model (LLM) for source coding and Error Correction Code Transformer (ECCT) complemented for channel coding, offers superior performance over JSCC. Our proposed framework also effectively highlights the compatibility challenges between SemCom approaches and digital communication systems, particularly concerning the resource costs associated with the transmission of high-precision floating point numbers. Through comprehensive evaluations, we establish that, assisted by LLM-based compression and ECCT-enhanced error correction, SSCC remains a viable and effective solution for modern communication systems. In other words, separate source channel coding is still what we need.
Purpose: Evaluating the quality of academic journal articles is a time consuming but critical task for national research evaluation exercises, appointments and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. Design/methodology/approach: This article assesses which ChatGPT inputs (full text without tables, figures, and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. Findings: The optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66), and 4o-mini (0.66). Research limitations: The data is a convenience sample of the work of a single author, it only includes one field, and the scores are self-evaluations. Practical implications: The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into the human scale scores, which is 31% more accurate than guessing. Originality/value: This is the first systematic comparison of the impact of different prompts, parameters and inputs for ChatGPT research quality evaluations.
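As a rough illustration of the last finding in the abstract above (converting averaged model scores to the human scale with linear regression), the following Python sketch fits a one-variable least-squares line with NumPy. The score arrays are made-up placeholders, not data from the study; the helper names `fit_score_mapping` and `to_human_scale` are hypothetical; and comparing against a "guess the mean" baseline is only one assumed reading of what "guessing" means.

```python
import numpy as np

def fit_score_mapping(model_scores, human_scores):
    """Least-squares fit of: human ~ slope * model + intercept."""
    slope, intercept = np.polyfit(model_scores, human_scores, deg=1)
    return float(slope), float(intercept)

def to_human_scale(model_scores, slope, intercept):
    """Apply the fitted line to (averaged) model scores."""
    return slope * np.asarray(model_scores, dtype=float) + intercept

# Hypothetical example data: per-article model scores averaged over
# repeated runs, and the corresponding human quality scores.
model_avg = np.array([2.1, 2.8, 3.0, 3.4, 3.9])
human = np.array([1.0, 2.0, 2.0, 3.0, 4.0])

slope, intercept = fit_score_mapping(model_avg, human)
predicted = to_human_scale(model_avg, slope, intercept)

# Mean absolute error of the fitted mapping versus always guessing
# the mean human score (an assumed baseline, see the note above).
mae_fit = float(np.mean(np.abs(predicted - human)))
mae_baseline = float(np.mean(np.abs(human.mean() - human)))
print(f"MAE with regression: {mae_fit:.3f}, mean-guess baseline: {mae_baseline:.3f}")
```

In practice the fit would be estimated on one set of articles and evaluated on held-out ones; the in-sample comparison here is only meant to show the mechanics.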
Recently, diffusion models have emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geometries, with limited research on molecular sequence diffusion models. The International Union of Pure and Applied Chemistry (IUPAC) names are more akin to chemical natural language than the simplified molecular input line entry system (SMILES) for organic compounds. In this work, we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language (SMILES) and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results demonstrate that our model outperforms existing methods and successfully captures the semantic rules of both chemical languages. Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints. Additionally, to illustrate the model’s applicability in drug design, we conducted case studies in functional group editing, analogue design and linker design.
Transformer models have emerged as pivotal tools within the realm of drug discovery, distinguished by their unique architectural features and exceptional performance in managing intricate data landscapes. Leveraging the innate capabilities of transformer architectures to comprehend intricate hierarchical dependencies inherent in sequential data, these models showcase remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models renders them indispensable assets for driving data-centric advancements in drug discovery, chemistry, and biology, furnishing a robust framework that expedites innovation and discovery within these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential, seamlessly combining biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research endeavors but also fosters synergistic collaborations and exchange of ideas among disparate fields. In our review, we elucidate the myriad applications of transformers in drug discovery, as well as chemistry and biology, spanning from protein design and protein engineering, to molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single cell data. Finally, we conclude the survey by deliberating on promising trends in transformer models within the context of drug discovery and other sciences.
The integration of artificial intelligence (AI) technology, particularly large language models (LLMs), has become essential across various sectors due to their advanced language comprehension and generation capabilities. Despite their transformative impact in fields such as machine translation and intelligent dialogue systems, LLMs face significant challenges. These challenges include safety, security, and privacy concerns that undermine their trustworthiness and effectiveness, such as hallucinations, backdoor attacks, and privacy leakage. Previous works often conflated safety issues with security concerns. In contrast, our study provides clearer and more reasonable definitions for safety, security, and privacy within the context of LLMs. Building on these definitions, we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety, security, and privacy in LLMs. Additionally, we explore the unique research challenges posed by LLMs and suggest potential avenues for future research, aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.
Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we focus on reviewing existing literature from two primary aspects, (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the “why” by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of “how”, we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area.
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, software testing and analysis are two of the critical methods, which significantly benefit from the advancements in deep learning technologies. Due to the successful use of deep learning in software security, recently, researchers have explored the potential of using large language models (LLMs) in this area. In this paper, we systematically review the results focusing on LLMs in software security. We analyze the topics of fuzzing, unit test, program repair, bug reproduction, data-driven bug detection, and bug triage. We deconstruct these techniques into several stages and analyze how LLMs can be used in the stages. We also discuss the future directions of using LLMs in software security, including the future directions for the existing use of LLMs and extensions from conventional deep learning research.
ChatGPT is a powerful artificial intelligence (AI) language model that has demonstrated significant improvements in various natural language processing (NLP) tasks. However, like any technology, it presents potential security risks that need to be carefully evaluated and addressed. In this survey, we provide an overview of the current state of research on security of using ChatGPT, with aspects of bias, disinformation, ethics, misuse, attacks and privacy. We review and discuss the literature on these topics and highlight open research questions and future directions. Through this survey, we aim to contribute to the academic discourse on AI security, enriching the understanding of potential risks and mitigations. We anticipate that this survey will be valuable for various stakeholders involved in AI development and usage, including AI researchers, developers, policy makers, and end-users.
BACKGROUND: Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated. AIM: To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD. METHODS: In this comparative study, two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy, comprehensibility, and correlation. Simultaneously, three patients were invited to evaluate the comprehensibility of their answers. Finally, a readability assessment was performed. RESULTS: Overall, each of the LLMs achieved satisfactory levels of accuracy, comprehensibility, and completeness when answering IBD-related questions, although their performance varied. All of the investigated models demonstrated strengths in providing basic disease information such as the definition of IBD as well as its common symptoms and diagnostic methods. Nevertheless, when dealing with more complex medical advice, such as medication side effects, dietary adjustments, and complication risks, the quality of answers was inconsistent between the LLMs. Notably, Claude-3-Opus generated answers with better readability than the other two models. CONCLUSION: LLMs have potential as educational tools for patients with IBD; however, there are discrepancies between the models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
Large language models (LLMs) have emerged as powerful tools for addressing a wide range of problems, including those in scientific computing, particularly in solving partial differential equations (PDEs). However, different models exhibit distinct strengths and preferences, resulting in varying levels of performance. In this paper, we compare the capabilities of the most advanced LLMs (DeepSeek, ChatGPT, and Claude), along with their reasoning-optimized versions, in addressing computational challenges. Specifically, we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems. We designed all our experiments so that a nontrivial decision is required, e.g., defining the proper space of input functions for neural operator learning. Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems, with ChatGPT o3-mini-high generally offering the fastest reasoning speed.
CLIL, which stands for Content and Language Integrated Learning, is an instructional approach that gives ample curricular and pedagogical attention to content and language outcomes in multilingual educational settings. Increasingly, it is heralded as a way to responsibly enact top-down English-Medium-of-Instruction (EMI) policies at the university level, where teachers and students are tasked with developing their English proficiency while remaining competitive in the international job market. However, teachers and teacher educators hoping to implement this approach in their science, technology, engineering and mathematics (STEM) content courses face significant challenges. This article serves as an introduction to a guest-edited special issue that reports on several aspects related to a project of international collaboration called Project SCILLA, an acronym for “STEM Content Integrated with Language-Learning Activities”. We first provide a brief overview of the project, which was developed and carried out in collaboration between Michigan State University and a consortium of 10 rural universities in Kazakhstan as a way to support STEM educators who wish to adapt their teaching practices to Kazakhstan’s Ministry of Education. We then offer an overview of the six articles that comprise the special issue, and call for deliberate and dialogic international collaboration as a way to support teachers responding to language policy demands.
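Second language (L2) writing is an important component of second language acquisition research. Using CiteSpace, this study presents a visual analysis of 231 empirical research articles published in the Journal of Second Language Writing over the past decade. The findings show that L2 writing research has followed a fluctuating upward trend overall, with a relatively stable volume of research and gradually increasing attention; that no clear collaboration network of core authors and institutions has yet formed in the field; and that research themes focus mainly on the diversification of L2 writing teaching methods, the multiple foci of L2 writing feedback, the increasingly scientific assessment and testing of L2 writing, and the multidimensional influence of individual learner differences. On this basis, the study proposes that future development of the field should strengthen collaboration among scholars and institutions; attend to the cognitive characteristics and affective factors of individual learners' writing processes, with particular emphasis on research into adolescents' L2 learning processes; and expand the scale of longitudinal L2 writing research to promote deeper inquiry.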
As new-generation intelligent technologies rapidly evolve, enhancing artificial intelligence (AI) education has become a global consensus, and improving AI literacy is a key focus in higher education. To address the lack of relevant knowledge among non-computer science students and the complexity of the material, which lead to low interest and high difficulty in learning, this paper proposes a three-pronged teaching design model: “BOPPPS model + large language models (LLMs) + mind maps with 3w2h”. This model aims to assist teachers in designing practical teaching cases and engaging, interactive activities, and provides examples of its application to help teachers better teach AI and improve the AI literacy of non-computer science students.
Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
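To make the link between adapter rank and trainable parameters in the abstract above concrete, here is a minimal Python sketch that counts the parameters a single LoRA adapter adds to one linear layer. It relies only on the standard LoRA factorization (a rank-r pair of matrices, A of shape r x d_in and B of shape d_out x r); the hidden size of 4096, the 4x MLP expansion, and the function name `lora_trainable_params` are illustrative assumptions and are not taken from the study.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter on a (d_out x d_in) weight.

    A has shape (rank, d_in) and B has shape (d_out, rank), so the
    adapter adds rank * (d_in + d_out) trainable parameters.
    """
    return rank * (d_in + d_out)

# Hypothetical layer sizes, chosen only for illustration.
hidden = 4096
mlp_hidden = 4 * hidden

for rank in (64, 16, 4):
    attn = lora_trainable_params(hidden, hidden, rank)      # e.g. a self-attention projection
    mlp = lora_trainable_params(hidden, mlp_hidden, rank)   # e.g. an MLP up-projection
    print(f"rank={rank:>2}: attention adapter={attn:,}  MLP adapter={mlp:,}")
```

Because the count is linear in the rank, halving the rank of an adapter on a larger layer removes more trainable parameters than halving it on a smaller one, which is the kind of trade-off the abstract's layer-type findings concern.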
The rapid advancement of Artificial Intelligence (AI) and Large Language Models (LLMs) has led to their increasing integration into various domains, from text generation and translation to question-answering. However, a critical question remains: do these sophisticated models, much like humans, exhibit susceptibility to cognitive biases? Understanding the presence and nature of such biases in AI is paramount for assessing their reliability, enhancing their performance, and predicting their societal impact. This research specifically investigates the susceptibility of Google’s Gemini 1.5 Pro and DeepSeek, two prominent LLMs, to framing effects and confirmation bias. The study designed a series of experimental trials, systematically manipulating information proportions and presentation orders to evaluate these biases. In the framing effect experiment, a genetic testing decision-making scenario was constructed. The proportion of positive and negative information (e.g., 20%, 50%, or 80% positive) and their presentation order were varied. The models’ inclination towards undergoing genetic testing was recorded. For the confirmation bias experiment, two reports, one positive and one negative, about “RoboTaxi” autonomous vehicles were provided. The proportion of erroneous information within these reports (10%, 30%, and 50%) and their presentation order were systematically altered, and the models’ support for each report was assessed. The findings demonstrate that both Gemini 1.5 Pro and DeepSeek are susceptible to framing effects. In the genetic testing scenario, their decision-making was primarily influenced by the proportion of positive and negative information presented. When the proportion of positive information was higher, both models showed a greater inclination to recommend or proceed with genetic testing. Conversely, a higher proportion of negative information led to greater caution or a tendency not to recommend the testing. Importantly, the order in which this information was presented did not significantly influence their decisions in the framing effect scenarios. Regarding confirmation bias, the two models exhibited distinct behaviors. Gemini 1.5 Pro did not show an overall preference for either positive or negative reports. However, its judgments were significantly influenced by the order of information presentation, demonstrating a “recency effect,” meaning it tended to support the report presented later. The proportion of erroneous information within the reports had no significant impact on Gemini 1.5 Pro’s decisions. In contrast, DeepSeek exhibited an overall confirmation bias, showing a clear preference for positive reports. Similar to Gemini 1.5 Pro, DeepSeek’s decisions were also significantly affected by the order of information presentation, while the proportion of misinformation had no significant effect. These results reveal human-like cognitive vulnerabilities in advanced LLMs, highlighting critical challenges to their reliability and objectivity in decision-making processes. Gemini 1.5 Pro’s sensitivity to presentation order and DeepSeek’s general preference for positive information, coupled with its sensitivity to order, underscore the need for careful evaluation of potential cognitive biases during the development and application of AI. The study suggests that effective measures are necessary to mitigate these biases and prevent potential negative societal impacts. Future research should include a broader range of models for comparative analysis and explore more complex interactive scenarios to further understand and address these phenomena. The findings contribute to understanding the limitations and capabilities of current AI systems, guiding their responsible development, and anticipating their potential societal implications.
Funding: Supported by the National Natural Science Foundation of China (62376219 and 62006194), the Foundational Research Project in Specialized Discipline (Grant No. G2024WD0146), and the Faculty Construction Project (Grant No. 24GH0201148).
Funding: Supported by the National Key Research and Development Program of China (2021YFB1714300); the National Natural Science Foundation of China (62233005, U2441245, 62173141); the CNPC Innovation Fund (2024DQ02-0507); Shanghai Natural Science (24ZR1416400); the Shanghai Baiyu Lan Talent Program Pujiang Project (24PJD020); and the Programme of Introducing Talents of Discipline to Universities (the 111 Project) (B17017).
Abstract: As embodied intelligence (EI), large language models (LLMs), and cloud computing continue to advance, Industry 5.0 facilitates the development of industrial artificial intelligence (Ind AI) through cyber-physical-social systems (CPSSs) with a human-centric focus. These technologies are organized by the system-wide approach of Industry 5.0 in order to empower the manufacturing industry to achieve broader societal goals of job creation, economic growth, and green production. This survey first provides a general framework of smart manufacturing in the context of Industry 5.0. Within this framework, embodied agents such as robots, sensors, and actuators are the carriers for Ind AI, facilitating the development of self-learning intelligence in individual entities, collaborative intelligence in production lines and factories (smart systems), and swarm intelligence within industrial clusters (systems of smart systems). Through the framework of CPSSs, the key technologies and their possible applications for supporting single-agent, multi-agent, and swarm-agent embodied Ind AI are reviewed, such as embodied perception, interaction, scheduling, multimodal large language models, and collaborative training. Finally, to stimulate future research in this area, the open challenges and opportunities of applying Industry 5.0 to smart manufacturing are identified and discussed. The perspective of an Industry 5.0-driven manufacturing industry aims to enhance operational productivity and efficiency by seamlessly integrating the virtual and physical worlds in a human-centered manner, thereby fostering an intelligent, sustainable, and resilient industrial landscape.
Funding: National Natural Science Foundation of China (No. 42371446); Natural Science Foundation of Hubei Province (No. 2024AFD412); Fundamental Research Funds for National Universities, China University of Geosciences (Wuhan) (No. 2024XLA17).
Abstract: In recent years, Volunteered Geographic Information (VGI) has emerged as a crucial source of mapping data, contributed by users through crowdsourcing platforms such as OpenStreetMap. This paper presents a novel approach that integrates Large Language Models (LLMs) into a fully automated mapping workflow utilizing VGI data. The process leverages Prompt Engineering, which involves designing and optimizing input instructions to ensure the LLM produces the desired mapping outputs. By constructing precise and detailed prompts, LLM agents are able to accurately interpret mapping requirements and autonomously extract, analyze, and process VGI geospatial data. They dynamically interact with mapping tools to automate the entire mapping process, from data acquisition to map generation. This approach significantly streamlines the creation of high-quality mapping outputs, reducing the time and resources typically required for such tasks. Moreover, the system lowers the barrier for non-expert users, enabling them to generate accurate maps without extensive technical expertise. Through various case studies, we demonstrate the LLM application across different mapping scenarios, highlighting its potential to enhance the efficiency, accuracy, and accessibility of map production. The results suggest that LLM-powered mapping systems can not only optimize VGI data processing but also expand the usability of ubiquitous mapping across diverse fields, including urban planning and infrastructure development.
Funding: Supported in part by the National Key Research and Development Program of China under Grant No. 2024YFE0200600, the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR23F010005, and the Huawei Cooperation Project under Grant No. TC20240829036.
Abstract: Along with the proliferating research interest in semantic communication (SemCom), joint source channel coding (JSCC) has dominated attention because it is widely assumed to deliver information semantics efficiently. Nevertheless, this paper challenges the conventional JSCC paradigm and advocates adopting separate source channel coding (SSCC) to gain more underlying degrees of freedom for optimization. We demonstrate that SSCC, after leveraging the strengths of the Large Language Model (LLM) for source coding and the Error Correction Code Transformer (ECCT) for channel coding, offers superior performance over JSCC. Our proposed framework also effectively highlights the compatibility challenges between SemCom approaches and digital communication systems, particularly concerning the resource costs associated with the transmission of high-precision floating point numbers. Through comprehensive evaluations, we establish that, assisted by LLM-based compression and ECCT-enhanced error correction, SSCC remains a viable and effective solution for modern communication systems. In other words, separate source channel coding is still what we need.
Abstract: Purpose: Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments, and promotion. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process. Design/methodology/approach: This article assesses which ChatGPT inputs (full text without tables, figures, and references; title and abstract; title only) produce better quality score estimates, and the extent to which scores are affected by ChatGPT models and system prompts. Findings: The optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest ever reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66). Research limitations: The data is a convenience sample of the work of a single author, it only includes one field, and the scores are self-evaluations. Practical implications: The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into the human scale scores, which is 31% more accurate than guessing. Originality/value: This is the first systematic comparison of the impact of different prompts, parameters, and inputs for ChatGPT research quality evaluations.
Funding: Supported by the Yonsei University Graduate School, Department of Integrative Biotechnology.
Abstract: Recently, diffusion models have emerged as a promising paradigm for molecular design and optimization. However, most diffusion-based molecular generative models focus on modeling 2D graphs or 3D geometries, with limited research on molecular sequence diffusion models. International Union of Pure and Applied Chemistry (IUPAC) names are more akin to chemical natural language than the simplified molecular input line entry system (SMILES) for organic compounds. In this work, we apply an IUPAC-guided conditional diffusion model to facilitate molecular editing from chemical natural language to chemical language (SMILES) and explore whether the pre-trained generative performance of diffusion models can be transferred to chemical natural language. We propose DiffIUPAC, a controllable molecular editing diffusion model that converts IUPAC names to SMILES strings. Evaluation results demonstrate that our model outperforms existing methods and successfully captures the semantic rules of both chemical languages. Chemical space and scaffold analysis show that the model can generate similar compounds with diverse scaffolds within the specified constraints. Additionally, to illustrate the model’s applicability in drug design, we conducted case studies in functional group editing, analogue design, and linker design.
Funding: Supported in part by the National Institutes of Health (NIH), USA (Grant Nos.: R01GM126189, R01AI164266, and R35GM148196); the National Science Foundation, USA (Grant Nos.: DMS2052983, DMS-1761320, and IIS-1900473); the National Aeronautics and Space Administration (NASA), USA (Grant No.: 80NSSC21M0023); the Michigan State University (MSU) Foundation, USA; Bristol-Myers Squibb, USA (Grant No.: 65109); and Pfizer, USA. Also supported by the National Natural Science Foundation of China (Grant Nos.: 11971367, 12271416, and 11972266).
Abstract: Transformer models have emerged as pivotal tools within the realm of drug discovery, distinguished by their unique architectural features and exceptional performance in managing intricate data landscapes. Leveraging the innate capabilities of transformer architectures to comprehend intricate hierarchical dependencies inherent in sequential data, these models showcase remarkable efficacy across various tasks, including new drug design and drug target identification. The adaptability of pre-trained transformer-based models renders them indispensable assets for driving data-centric advancements in drug discovery, chemistry, and biology, furnishing a robust framework that expedites innovation and discovery within these domains. Beyond their technical prowess, the success of transformer-based models in drug discovery, chemistry, and biology extends to their interdisciplinary potential, seamlessly combining biological, physical, chemical, and pharmacological insights to bridge gaps across diverse disciplines. This integrative approach not only enhances the depth and breadth of research endeavors but also fosters synergistic collaborations and exchange of ideas among disparate fields. In our review, we elucidate the myriad applications of transformers in drug discovery, as well as chemistry and biology, spanning from protein design and protein engineering to molecular dynamics (MD), drug target identification, transformer-enabled drug virtual screening (VS), drug lead optimization, drug addiction, small data set challenges, chemical and biological image analysis, chemical language understanding, and single cell data. Finally, we conclude the survey by deliberating on promising trends in transformer models within the context of drug discovery and other sciences.
Funding: Supported by the National Key R&D Program of China under Grant No. 2022YFB3103500; the National Natural Science Foundation of China under Grants No. 62402087 and No. 62020106013; the Sichuan Science and Technology Program under Grant No. 2023ZYD0142; the Chengdu Science and Technology Program under Grant No. 2023-XT00-00002-GX; the Fundamental Research Funds for Chinese Central Universities under Grants No. ZYGX2020ZB027 and No. Y030232063003002; and the Postdoctoral Innovation Talents Support Program under Grant No. BX20230060.
Abstract: The integration of artificial intelligence (AI) technology, particularly large language models (LLMs), has become essential across various sectors due to their advanced language comprehension and generation capabilities. Despite their transformative impact in fields such as machine translation and intelligent dialogue systems, LLMs face significant challenges. These challenges include safety, security, and privacy concerns that undermine their trustworthiness and effectiveness, such as hallucinations, backdoor attacks, and privacy leakage. Previous works often conflated safety issues with security concerns. In contrast, our study provides clearer and more reasonable definitions for safety, security, and privacy within the context of LLMs. Building on these definitions, we provide a comprehensive overview of the vulnerabilities and defense mechanisms related to safety, security, and privacy in LLMs. Additionally, we explore the unique research challenges posed by LLMs and suggest potential avenues for future research, aiming to enhance the robustness and reliability of LLMs in the face of emerging threats.
Funding: Funded by the National Key R&D Program of China (2023YFA1008704); the National Natural Science Foundation of China (Grant No. 62377044); the Beijing Key Laboratory of Big Data Management and Analysis Methods; the Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative; funds for building world-class universities (disciplines) of Renmin University of China; and PCC@RUC. The authors would like to extend their sincere gratitude to Yankai Lin for his constructive feedback throughout the development of this work.
Abstract: Recently, tool learning with large language models (LLMs) has emerged as a promising paradigm for augmenting the capabilities of LLMs to tackle highly complex problems. Despite growing attention and rapid advancements in this field, the existing literature remains fragmented and lacks systematic organization, posing barriers to entry for newcomers. This gap motivates us to conduct a comprehensive survey of existing works on tool learning with LLMs. In this survey, we focus on reviewing existing literature from two primary aspects: (1) why tool learning is beneficial and (2) how tool learning is implemented, enabling a comprehensive understanding of tool learning with LLMs. We first explore the “why” by reviewing both the benefits of tool integration and the inherent benefits of the tool learning paradigm from six specific aspects. In terms of “how”, we systematically review the literature according to a taxonomy of four key stages in the tool learning workflow: task planning, tool selection, tool calling, and response generation. Additionally, we provide a detailed summary of existing benchmarks and evaluation methods, categorizing them according to their relevance to different stages. Finally, we discuss current challenges and outline potential future directions, aiming to inspire both researchers and industrial developers to further explore this emerging and promising area.
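The four-stage workflow named in this abstract (task planning, tool selection, tool calling, response generation) can be illustrated with a toy pipeline. This is a minimal sketch, not the survey's method: simple string rules stand in for the LLM at every stage, and the tool registry `TOOLS` and all function names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry; a real system would expose APIs, search engines, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(term) for term in expr.split("+"))),
    "echo": lambda text: text,
}


def plan_task(query: str) -> List[str]:
    """Stage 1, task planning: split the query into sub-tasks (here, on ';')."""
    return [part.strip() for part in query.split(";") if part.strip()]


def select_tool(subtask: str) -> str:
    """Stage 2, tool selection: a keyword rule stands in for the LLM's choice."""
    return "calculator" if "+" in subtask else "echo"


def call_tool(tool_name: str, subtask: str) -> str:
    """Stage 3, tool calling: invoke the chosen tool with the sub-task as input."""
    return TOOLS[tool_name](subtask)


def generate_response(results: List[str]) -> str:
    """Stage 4, response generation: merge the tool outputs into one answer."""
    return " | ".join(results)


def run(query: str) -> str:
    """Chain the four stages for a single user query."""
    results = [call_tool(select_tool(task), task) for task in plan_task(query)]
    return generate_response(results)
```

For example, `run("2+3; hello")` plans two sub-tasks, routes the first to the calculator and the second to the echo tool, and returns the merged string `5 | hello`.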
Funding: The National Natural Science Foundation of China (62136008, 62293541), the Beijing Natural Science Foundation (4232056), and the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Abstract: Software security poses substantial risks to our society because software has become part of our life. Numerous techniques have been proposed to resolve or mitigate the impact of software security issues. Among them, software testing and analysis are two of the critical methods, which significantly benefit from the advancements in deep learning technologies. Due to the successful use of deep learning in software security, researchers have recently explored the potential of using large language models (LLMs) in this area. In this paper, we systematically review the results focusing on LLMs in software security. We analyze the topics of fuzzing, unit testing, program repair, bug reproduction, data-driven bug detection, and bug triage. We deconstruct these techniques into several stages and analyze how LLMs can be used in the stages. We also discuss the future directions of using LLMs in software security, including the future directions for the existing use of LLMs and extensions from conventional deep learning research.
Abstract: ChatGPT is a powerful artificial intelligence (AI) language model that has demonstrated significant improvements in various natural language processing (NLP) tasks. However, like any technology, it presents potential security risks that need to be carefully evaluated and addressed. In this survey, we provide an overview of the current state of research on the security of using ChatGPT, covering bias, disinformation, ethics, misuse, attacks, and privacy. We review and discuss the literature on these topics and highlight open research questions and future directions. Through this survey, we aim to contribute to the academic discourse on AI security, enriching the understanding of potential risks and mitigations. We anticipate that this survey will be valuable for various stakeholders involved in AI development and usage, including AI researchers, developers, policy makers, and end-users.
Funding: Supported by the China Health Promotion Foundation Young Doctors' Research Foundation for Inflammatory Bowel Disease; the Taishan Scholars Program of Shandong Province, China, No. tsqn202306343; and the National Natural Science Foundation of China, No. 82270578.
Abstract: BACKGROUND: Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, LLM use to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated. AIM: To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD. METHODS: In this comparative study, two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy, comprehensibility, and correlation. Simultaneously, three patients were invited to evaluate the comprehensibility of the answers. Finally, a readability assessment was performed. RESULTS: Overall, each of the LLMs achieved satisfactory levels of accuracy, comprehensibility, and completeness when answering IBD-related questions, although their performance varied. All of the investigated models demonstrated strengths in providing basic disease information, such as the definition of IBD and its common symptoms and diagnostic methods. Nevertheless, when dealing with more complex medical advice, such as medication side effects, dietary adjustments, and complication risks, the quality of answers was inconsistent between the LLMs. Notably, Claude-3-Opus generated answers with better readability than the other two models. CONCLUSION: LLMs have potential as educational tools for patients with IBD; however, there are discrepancies between the models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.
Funding: Supported by the ONR Vannevar Bush Faculty Fellowship (Grant No. N00014-22-1-2795).
Abstract: Large language models (LLMs) have emerged as powerful tools for addressing a wide range of problems, including those in scientific computing, particularly in solving partial differential equations (PDEs). However, different models exhibit distinct strengths and preferences, resulting in varying levels of performance. In this paper, we compare the capabilities of the most advanced LLMs (DeepSeek, ChatGPT, and Claude), along with their reasoning-optimized versions, in addressing computational challenges. Specifically, we evaluate their proficiency in solving traditional numerical problems in scientific computing as well as leveraging scientific machine learning techniques for PDE-based problems. We designed all our experiments so that a nontrivial decision is required, e.g., defining the proper space of input functions for neural operator learning. Our findings show that reasoning and hybrid-reasoning models consistently and significantly outperform non-reasoning ones in solving challenging problems, with ChatGPT o3-mini-high generally offering the fastest reasoning speed.
Funding: Funding from the U.S.-Kazakhstan University Partnerships program, funded by the U.S. Mission to Kazakhstan and administered by American Councils [Award number SKZ100-19-CA-0149].
Abstract: CLIL, which stands for Content and Language Integrated Learning, is an instructional approach that gives ample curricular and pedagogical attention to content and language outcomes in multilingual educational settings. Increasingly, it is heralded as a way to responsibly enact top-down English-Medium-of-Instruction (EMI) policies at the university level, where teachers and students are tasked with developing their English proficiency while remaining competitive in the international job market. However, teachers and teacher educators hoping to implement this approach in their science, technology, engineering, and mathematics (STEM) content courses face significant challenges. This article serves as an introduction to a guest-edited special issue that reports on several aspects related to a project of international collaboration called Project SCILLA, an acronym for “STEM Content Integrated with Language-Learning Activities”. We first provide a brief overview of the project, which was developed and carried out in collaboration between Michigan State University and a consortium of 10 rural universities in Kazakhstan as a way to support STEM educators who wish to adapt their teaching practices to Kazakhstan’s Ministry of Education. We then offer an overview of the six articles that comprise the special issue, and call for deliberate and dialogic international collaboration as a way to support teachers responding to language policy demands.
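Abstract: Second language (L2) writing is an important component of second language acquisition research. Using CiteSpace, this study conducts a visual analysis of 231 empirical research articles published in the Journal of Second Language Writing over the past decade. The findings are as follows: L2 writing research shows an overall fluctuating upward trend, with a relatively stable research scale and gradually increasing attention; the field has not yet formed a clear collaboration network of core authors and institutions; and research topics focus mainly on the diversification of L2 writing teaching methods, the multiple foci of L2 writing feedback, the scientific rigor of L2 writing assessment and testing, and the multidimensional influence of individual learner differences. On this basis, the study proposes that future development of the field should strengthen collaboration among scholars and institutions; attend to the cognitive characteristics and affective factors in individual learners' writing processes, with particular emphasis on research into adolescents' L2 learning processes; and expand the scale of longitudinal L2 writing research to promote deeper development of the field.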
Abstract: As new-generation intelligent technologies rapidly evolve, enhancing artificial intelligence (AI) education has become a global consensus, and improving AI literacy is a key focus in higher education. To address non-computer science students' lack of relevant knowledge and the complexity of the material, which lead to low interest and high difficulty in learning, this paper proposes a three-pronged teaching design model: “BOPPPS model + large language models (LLMs) + mind maps with 3w2h”. This model aims to assist teachers in designing practical teaching cases and engaging, interactive activities, and the paper provides examples of its application to help teachers better teach AI and improve the AI literacy of non-computer science students.
Funding: Supported by the National Key R&D Program of China (No. 2021YFB0301200) and the National Natural Science Foundation of China (No. 62025208).
Abstract: Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning paradigm. While this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader deployment. Parameter-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational efficiency. Among these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA variants. Despite these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains underexplored. This study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM architectures. Our investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values than on layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do so. These insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized LLMs. Moreover, for the same reduction in trainable parameters, reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller one. This study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM fine-tuning in resource-constrained environments.
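To make the relationship between adapter rank, layer size, and trainable parameters concrete, the arithmetic can be sketched as below. In standard LoRA, a weight matrix of shape d_out x d_in receives two low-rank factors, A (rank x d_in) and B (d_out x rank), so the adapter holds rank * (d_in + d_out) trainable parameters. The layer dimensions used here are illustrative assumptions and are not taken from the study; the sketch shows only the parameter counts, not the accuracy effects the abstract reports.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative layer shapes (assumed for this example, not from the study).
HIDDEN = 4096      # width of a self-attention projection
MLP_INNER = 11008  # inner width of an MLP projection

# Same rank on a smaller (attention) and a larger (MLP) layer.
attn_proj = lora_trainable_params(HIDDEN, HIDDEN, rank=16)
mlp_proj = lora_trainable_params(HIDDEN, MLP_INNER, rank=16)

# Halving the rank on the larger layer removes more parameters in absolute terms.
mlp_proj_half_rank = lora_trainable_params(HIDDEN, MLP_INNER, rank=8)
saved_on_mlp = mlp_proj - mlp_proj_half_rank
```

Under these assumed shapes, halving the rank of the MLP adapter frees 120,832 parameters, close to the full size of the rank-16 attention adapter (131,072), which is the kind of trade-off the abstract's layer-size comparison concerns.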
Abstract: The rapid advancement of Artificial Intelligence (AI) and Large Language Models (LLMs) has led to their increasing integration into various domains, from text generation and translation to question-answering. However, a critical question remains: do these sophisticated models, much like humans, exhibit susceptibility to cognitive biases? Understanding the presence and nature of such biases in AI is paramount for assessing their reliability, enhancing their performance, and predicting their societal impact. This research specifically investigates the susceptibility of Google’s Gemini 1.5 Pro and DeepSeek, two prominent LLMs, to framing effects and confirmation bias. The study designed a series of experimental trials, systematically manipulating information proportions and presentation orders to evaluate these biases. In the framing effect experiment, a genetic testing decision-making scenario was constructed. The proportion of positive and negative information (e.g., 20%, 50%, or 80% positive) and their presentation order were varied. The models’ inclination towards undergoing genetic testing was recorded. For the confirmation bias experiment, two reports, one positive and one negative, about “RoboTaxi” autonomous vehicles were provided. The proportion of erroneous information within these reports (10%, 30%, and 50%) and their presentation order were systematically altered, and the models’ support for each report was assessed. The findings demonstrate that both Gemini 1.5 Pro and DeepSeek are susceptible to framing effects. In the genetic testing scenario, their decision-making was primarily influenced by the proportion of positive and negative information presented. When the proportion of positive information was higher, both models showed a greater inclination to recommend or proceed with genetic testing. Conversely, a higher proportion of negative information led to greater caution or a tendency not to recommend the testing. Importantly, the order in which this information was presented did not significantly influence their decisions in the framing effect scenarios. Regarding confirmation bias, the two models exhibited distinct behaviors. Gemini 1.5 Pro did not show an overall preference for either positive or negative reports. However, its judgments were significantly influenced by the order of information presentation, demonstrating a “recency effect,” meaning it tended to support the report presented later. The proportion of erroneous information within the reports had no significant impact on Gemini 1.5 Pro’s decisions. In contrast, DeepSeek exhibited an overall confirmation bias, showing a clear preference for positive reports. Similar to Gemini 1.5 Pro, DeepSeek’s decisions were also significantly affected by the order of information presentation, while the proportion of misinformation had no significant effect. These results reveal human-like cognitive vulnerabilities in advanced LLMs, highlighting critical challenges to their reliability and objectivity in decision-making processes. Gemini 1.5 Pro’s sensitivity to presentation order and DeepSeek’s general preference for positive information, coupled with its sensitivity to order, underscore the need for careful evaluation of potential cognitive biases during the development and application of AI. The study suggests that effective measures are necessary to mitigate these biases and prevent potential negative societal impacts. Future research should include a broader range of models for comparative analysis and explore more complex interactive scenarios to further understand and address these phenomena. The findings contribute significantly to understanding the limitations and capabilities of current AI systems, guiding their responsible development, and anticipating their potential societal implications.
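The factorial design of the framing experiment (proportion of positive information crossed with presentation order) might be laid out as in the sketch below. The three proportions come from the abstract; everything else is an assumption made for illustration, including the number of statements per prompt (`N_STATEMENTS`), the two order labels, and the placeholder statement strings. The study's actual materials are not given in the abstract.

```python
from itertools import product

POSITIVE_SHARES = (0.2, 0.5, 0.8)              # proportions named in the abstract
ORDERS = ("positive_first", "negative_first")  # assumed order conditions
N_STATEMENTS = 10                              # assumed; not stated in the abstract


def build_condition(share: float, order: str) -> list[str]:
    """Return placeholder statements for one condition, arranged in the given order."""
    n_pos = round(share * N_STATEMENTS)
    positives = [f"POS_{i}" for i in range(n_pos)]
    negatives = [f"NEG_{i}" for i in range(N_STATEMENTS - n_pos)]
    if order == "positive_first":
        return positives + negatives
    return negatives + positives


# One entry per cell of the 3 x 2 design.
conditions = {
    (share, order): build_condition(share, order)
    for share, order in product(POSITIVE_SHARES, ORDERS)
}
```

Crossing the two factors this way is what allows the effect of proportion to be separated from the effect of order, which is the distinction the reported findings rely on.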