Journal Articles
34 articles found
1. Evolution and Prospects of Foundation Models: From Large Language Models to Large Multimodal Models (Cited by 3)
Authors: Zheyi Chen, Liuchang Xu, Hongting Zheng, Luyao Chen, Amr Tolba, Liang Zhao, Keping Yu, Hailin Feng. Computers, Materials & Continua (SCIE, EI), 2024(8): 1753-1808.
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Keywords: Artificial intelligence; large language models; large multimodal models; foundation models
2. Large multimodal models assist in psychiatry disorders prevention and diagnosis of students
Authors: Xin-Qiao Liu, Xin Wang, Hui-Rui Zhang. World Journal of Psychiatry (SCIE), 2024(10): 1415-1421.
Students are considered one of the groups most affected by psychological problems. Given the highly dangerous nature of mental illnesses and the increasingly serious state of global mental health, it is imperative to explore new methods and approaches for the prevention and treatment of mental illnesses. Large multimodal models (LMMs), as the most advanced artificial intelligence models (e.g., ChatGPT-4), have brought new hope to the accurate prevention, diagnosis, and treatment of psychiatric disorders. The assistance of these models in the promotion of mental health is critical, as the latter necessitates a strong foundation of medical knowledge and professional skills, emotional support, stigma mitigation, the encouragement of more honest patient self-disclosure, reduced health care costs, improved medical efficiency, and greater mental health service coverage. However, these models must simultaneously address challenges related to health, safety, hallucinations, and ethics. In the future, we should address these challenges by developing relevant usage manuals, accountability rules, and legal regulations; implementing a human-centered approach; and intelligently upgrading LMMs through the deep optimization of such models, their algorithms, and other means. This effort will thus substantially contribute not only to the maintenance of students' health but also to the achievement of global sustainable development goals.
Keywords: Large multimodal models; ChatGPT; Psychiatric disorders; Mental health; Students
3. Applications of Large Multimodal Models (LMMs) in STEM Education: From Visual Explanations to Virtual Experiments
Authors: Changkui LI. Artificial Intelligence Education Studies, 2025(2): 1-18.
Generative Artificial Intelligence (GAI) refers to a class of AI systems capable of creating novel, coherent, and contextually relevant content (such as text, images, audio, and video) based on patterns learned from extensive training datasets. The public release and rapid refinement of large language models (LLMs) like ChatGPT have accelerated the adoption of GAI across various medical specialties, offering new tools for education, clinical simulation, and research. Dermatology training, which relies heavily on visual pattern recognition and requires extensive exposure to diverse morphological presentations, faces persistent challenges such as the uneven distribution of educational resources, limited patient exposure for rare conditions, and variability in teaching quality. Exploring the integration of GAI into pedagogical frameworks offers innovative approaches to address these challenges, potentially enhancing the quality, standardization, scalability, and accessibility of dermatology education. This comprehensive review examines the core concepts and technical foundations of GAI, highlights its specific applications within dermatology teaching and learning, including simulated case generation, personalized learning pathways, and academic support, and discusses the current limitations, practical challenges, and ethical considerations surrounding its use. The aim is to provide a balanced perspective on the significant potential of GAI for transforming dermatology education and to offer evidence-based insights to guide future exploration, implementation, and policy development.
Keywords: Large multimodal models (LMMs); STEM Education; Visual Explanations; Virtual Laboratories/Virtual Experiments; Critical AI Literacy
4. Rethinking Chart Understanding Using Multimodal Large Language Models
Authors: Andreea-Maria Tanasa, Simona-Vasilica Oprea. Computers, Materials & Continua, 2025(8): 2905-2933.
Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, the correlation between different parts of the documents, and domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving essential to analyze content and answer questions accurately. This paper evaluates the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art vision language models, is used to identify the relevant page containing the appropriate chart for each question. Google's Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visual document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organisation and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected for their multimodal nature, combining dense textual analysis with varied visual representations, thus presenting realistic challenges for vision-language models. This comparison assesses how advanced models perform with different input formats and whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
Keywords: Chart understanding; large language models; multimodal models; PDF extraction
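To make the abstract's two-input-mode comparison concrete, here is a minimal sketch of chart question answering with image versus parsed-PDF input. It assumes the public google-generativeai and pypdf packages; the model name, API key, file names, and the question are placeholders, and the ColPali retrieval stage is reduced to a pre-selected page image.

```python
# Sketch: answering the same chart question from an image vs. raw PDF text.
import google.generativeai as genai
from PIL import Image
from pypdf import PdfReader

genai.configure(api_key="YOUR_API_KEY")          # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed Gemini variant

question = "What was the global life expectancy trend between 2000 and 2020?"

# Mode 1: the chart as an image (page located by a retriever such as ColPali).
chart_page = Image.open("who_stats_page_042.png")   # hypothetical retrieved page
image_answer = model.generate_content([question, chart_page]).text

# Mode 2: text parsed directly from the same PDF.
reader = PdfReader("who_statistics_2024.pdf")       # hypothetical source file
pdf_text = "\n".join(page.extract_text() or "" for page in reader.pages)
text_answer = model.generate_content([question, pdf_text[:30000]]).text

print(image_answer, text_answer, sep="\n---\n")
```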
5. The Synergy of Seeing and Saying: Revolutionary Advances in Multi-modality Medical Vision-Language Large Models
Authors: Xiang LI, Yu SUN, Jia LIN, Like LI, Ting FENG, Shen YIN. Artificial Intelligence Science and Engineering, 2025(2): 79-97.
The application of vision-language large models in the field of medical health has gradually become a research focus. These models combine image understanding with natural language processing and can simultaneously process multimodal data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively realizing the integration of medical information and providing strong support for clinical decision-making and disease diagnosis. Vision-language large models perform well on specific medical tasks and also show strong potential and high intelligence as general task models. This paper provides a comprehensive review of vision-language large models in the field of medical health. Specifically, it first introduces the theoretical foundations and technical principles. It then introduces specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed, and four potential future development directions are given in light of these challenges.
Keywords: large language models; vision-language models; medical health; multimodality models
6. Railway-CLIP: A multimodal model for abnormal object detection in high-speed railway
Authors: Jiayu Zhang, Qingji Guan, Junbo Liu, Yaping Huang, Jianyong Guo. High-Speed Railway, 2025(3): 194-204.
Automated detection of suspended anomalous objects on high-speed railway catenary systems using computer vision-based technology is a critical task for ensuring railway transportation safety. Despite the critical importance of this task, conventional vision-based foreign object detection methodologies have predominantly concentrated on image data, neglecting the exploration and integration of textual information. The currently popular multimodal model Contrastive Language-Image Pre-training (CLIP) employs contrastive learning to enable simultaneous understanding of both visual and textual modalities. Drawing inspiration from CLIP's capabilities, this paper introduces a novel CLIP-based multimodal foreign object detection model tailored for railway applications, referred to as Railway-CLIP. This model leverages CLIP's robust generalization capabilities to enhance performance in catenary foreign object detection. Railway-CLIP is primarily composed of an image encoder and a text encoder. Initially, the Segment Anything Model (SAM) is employed to preprocess raw images, identifying candidate bounding boxes that may contain foreign objects. Both the original images and the detected candidate bounding boxes are subsequently fed into the image encoder to extract their respective visual features. In parallel, distinct prompt templates are crafted for both the original images and the candidate bounding boxes to serve as textual inputs. These prompts are then processed by the text encoder to derive textual features. The image and text encoders collaboratively project the multimodal features into a shared semantic space, facilitating the computation of similarity scores between visual and textual representations. The final detection results are determined based on these similarity scores, ensuring robust and accurate identification of anomalous objects. Extensive experiments on our collected Railway Anomaly Dataset (RAD) demonstrate that the proposed Railway-CLIP outperforms previous state-of-the-art methods, achieving 97.25% AUROC and a 92.66% F1-score, validating the effectiveness and superiority of the proposed approach in real-world high-speed railway anomaly detection scenarios.
Keywords: High-speed railway catenary systems; anomalous object detection; multimodal model; Railway-CLIP
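A minimal sketch of the CLIP similarity-scoring step the abstract describes, assuming the public openai/clip-vit-base-patch32 checkpoint from Hugging Face transformers; the prompt wording, file names, and two-class setup are illustrative guesses rather than the paper's actual templates, and SAM is assumed to have already produced the candidate crops.

```python
# Sketch: score SAM-proposed crops against normal/anomaly text prompts with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate crops would come from SAM in the paper's pipeline (placeholder files).
crops = [Image.open("candidate_box_0.png"), Image.open("candidate_box_1.png")]
prompts = ["a photo of a normal catenary structure",            # assumed template
           "a photo of a foreign object hanging on a catenary"]  # assumed template

inputs = processor(text=prompts, images=crops, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image: (num_crops, num_prompts) similarities in the shared space.
probs = out.logits_per_image.softmax(dim=-1)
for i, p in enumerate(probs):
    print(f"crop {i}: anomaly probability {p[1].item():.3f}")
```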
7. PKME-MLM: A Novel Multimodal Large Model for Sarcasm Detection
Authors: Jian Luo, Yaling Li, Xueyu Li, Xuliang Hu. Computers, Materials & Continua, 2025(4): 877-896.
Sarcasm detection in Natural Language Processing (NLP) has become increasingly important, particularly with the rise of social media and non-textual emotional expressions such as images. Existing methods often rely on separate image and text modalities, which may not fully utilize the information available from both sources. To address this limitation, we propose a novel multimodal large model, the PKME-MLM (Prior Knowledge and Multi-label Emotion analysis based Multimodal Large Model for sarcasm detection). The PKME-MLM aims to enhance sarcasm detection by integrating prior knowledge to extract useful textual information from images, which is then combined with text data for deeper analysis. This method improves the integration of image and text data, addressing the limitation of previous models that process these modalities separately. Additionally, we incorporate multi-label sentiment analysis, refining sentiment labels to improve sarcasm recognition accuracy. This design overcomes the limitations of prior models that treated sentiment classification as a single-label problem, thereby improving sarcasm recognition by distinguishing subtle emotional cues in the text. Experimental results demonstrate that our approach achieves significant performance improvements in multimodal sarcasm detection tasks, with an accuracy (Acc.) of 94.35% and Macro-Average Precision and Recall reaching 93.92% and 94.21%, respectively. These results highlight the potential of multimodal models for improving sarcasm detection and suggest that further integration of modalities could advance future research. This work also paves the way for incorporating multimodal sentiment analysis into sarcasm detection.
Keywords: Sarcasm detection; multimodal large model; prior knowledge; multi-label fusion
8. Foundation models: Insights and implications for gastrointestinal cancer
Authors: Lei Shi, Rui Huang, Li-Ling Zhao, An-Jie Guo. World Journal of Gastroenterology, 2025(47): 7-34.
Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, the extraction of insights from medical records, and the enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance the research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
Keywords: Foundation models; Gastrointestinal cancers; Large language models; Vision foundation models; Multimodal large language models
9. DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment
Authors: Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li. World Journal of Gastroenterology, 2025(31): 92-100.
BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks. AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases. METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians. RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times. CONCLUSION: The framework successfully integrates multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration and opening new horizons for the clinical application of artificial intelligence-assisted technology.
Keywords: Gastrointestinal diseases; Artificial intelligence-assisted diagnosis and treatment; Multimodal large language model; Multiple large language model collaboration; DeepGut
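The four-tier collaboration the METHODS section outlines can be sketched as a simple sequential pipeline. Everything below is hypothetical scaffolding, since this listing does not disclose DeepGut's actual models or prompts; call_llm and the tier model names are invented placeholders.

```python
# Sketch: four LLMs chained into extraction -> chain -> plan -> risk analysis.
from dataclasses import dataclass

@dataclass
class CaseRecord:
    history: str
    labs: str
    imaging_report: str

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to the named model."""
    raise NotImplementedError  # wire to a real LLM API in practice

def diagnostic_pipeline(case: CaseRecord) -> dict:
    # Tier 1: multimodal information extraction.
    findings = call_llm("extractor-llm",
                        f"Extract key findings:\n{case.history}\n{case.labs}\n{case.imaging_report}")
    # Tier 2: construct an explicit logical "chain" from findings to candidates.
    chain = call_llm("reasoner-llm", f"Link these findings into a diagnostic chain:\n{findings}")
    # Tier 3: generate diagnostic and treatment suggestions from the chain.
    plan = call_llm("planner-llm", f"Propose diagnosis and treatment:\n{chain}")
    # Tier 4: independent risk analysis of the proposed plan.
    risks = call_llm("auditor-llm", f"List clinical and legal risks of this plan:\n{plan}")
    return {"findings": findings, "chain": chain, "plan": plan, "risks": risks}
```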
10. Large language models and large concept models in radiology: Present challenges, future directions, and critical perspectives
Authors: Suleman A Merchant, Neesha Merchant, Shaju L Varghese, Mohd Javed S Shaikh. World Journal of Radiology, 2025(11): 1-38.
Large language models (LLMs) have emerged as transformative tools in radiology artificial intelligence (AI), offering significant capabilities in areas such as image report generation, clinical decision support, and workflow optimization. The first part of this manuscript presents a comprehensive overview of the current state of LLM applications in radiology, including their historical evolution, technical foundations, and practical uses. Despite notable advances, inherent architectural constraints, such as token-level sequential processing, limit their ability to perform deep abstract reasoning and holistic contextual understanding, which are critical for fine-grained diagnostic interpretation. We provide a critical perspective on current LLMs and discuss key challenges, including model reliability, bias, and explainability, highlighting the pressing need for novel approaches to advance radiology AI. Large concept models (LCMs) represent a nascent and promising paradigm in radiology AI, designed to transcend the limitations of token-level processing by utilizing higher-order conceptual representations and multimodal data integration. The second part of this manuscript introduces the foundational principles and theoretical framework of LCMs, highlighting their potential to facilitate enhanced semantic reasoning, long-range context synthesis, and improved clinical decision-making. Critically, the core of this section is the proposal of a novel theoretical framework for LCMs, formalized and extended from our group's foundational concept-based models, the world's earliest articulation of this paradigm for medical AI. This conceptual shift has since been externally validated and propelled by the recent publication of the LCM architectural proposal by Meta AI, providing a large-scale engineering blueprint for the future development of this technology. We also outline future research directions and the transformative implications of this emerging AI paradigm for radiologic practice, aiming to provide a blueprint for advancing toward human-like conceptual understanding in AI. While challenges persist, we are at the very beginning of a new era, and it is not unreasonable to hope that future advancements will overcome these hurdles, pushing the boundaries of AI in radiology far beyond even the most advanced models of today.
Keywords: Radiology artificial intelligence; Large language models; Large concept models; Medical imaging artificial intelligence; Artificial intelligence in healthcare; Multimodal artificial intelligence models; Explainable artificial intelligence; Artificial intelligence model limitations and challenges; Natural language processing in radiology; Conceptual reasoning in artificial intelligence
11. Efficient User Identity Linkage Based on Aligned Multimodal Features and Temporal Correlation
Authors: Jiaqi Gao, Kangfeng Zheng, Xiujuan Wang, Chunhua Wu, Bin Wu. Computers, Materials & Continua (SCIE, EI), 2024(10): 251-270.
User identity linkage (UIL) refers to identifying user accounts belonging to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully explore the rich image resources generated by users; existing attempts that touch on the multimodal domain still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated contents (UGCs). We introduce the efficient user identity linkage via aligned multimodal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. Accordingly, we construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, which demonstrates its excellence in handling multimodal data and provides strong algorithmic support for UIL.
Keywords: User identity linkage; multimodal models; attention mechanism; temporal correlation
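A minimal sketch of the caption-then-align stage the abstract describes, assuming public BLIP and CLIP checkpoints via Hugging Face transformers; mean pooling stands in for the paper's adapter modules and temporal weight assignment, which are not specified in this listing.

```python
# Sketch: BLIP captions fill missing text, CLIP maps text+image into one space.
import torch
from PIL import Image
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

def user_embedding(image_paths, post_texts):
    """Fuse caption, post text, and image features into one user vector."""
    feats = []
    for path, text in zip(image_paths, post_texts):
        img = Image.open(path)
        # Caption the image to recover textual context missing from the post.
        cap_ids = blip.generate(**blip_proc(img, return_tensors="pt"), max_new_tokens=30)
        caption = blip_proc.decode(cap_ids[0], skip_special_tokens=True)
        # CLIP embeds the combined text and the image in an aligned space.
        inp = clip_proc(text=[f"{text} {caption}"], images=[img],
                        return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            out = clip(**inp)
        feats.append(torch.cat([out.text_embeds[0], out.image_embeds[0]]))
    # Mean fusion; the paper's temporal weights would replace this uniform average.
    return torch.stack(feats).mean(dim=0)
```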
12. CAPGen: An MLLM-Based Framework Integrated with Iterative Optimization Mechanism for Cultural Artifacts Poster Generation
Authors: Qianqian Hu, Chuhan Li, Mohan Zhang, Fang Liu. Computers, Materials & Continua, 2026(1): 494-510.
Due to the ongoing digital transformation of cultural institutions and the substantial influence of social media platforms, the demand for visual communication to promote traditional cultural artifacts online keeps increasing. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language Model (MLLM) with integrated iterative optimization. During our research, we collaborated with designers to define principles of graphic design for cultural artifact posters, which guide the MLLM in generating layout parameters. These parameters are then rendered into posters. Finally, we refine the posters using an MLLM integrated with a multi-round iterative optimization mechanism. Qualitative results show that CAPGen consistently outperforms baseline methods in both visual quality and aesthetic performance. Furthermore, ablation studies indicate that the prompt, the iterative optimization mechanism, and the design principles each significantly enhance the effectiveness of poster generation.
Keywords: Aesthetic poster generation; prompt engineering; multimodal large language models; iterative optimization; design principles
13. Research status and application of artificial intelligence large models in the oil and gas industry (Cited by 3)
Authors: LIU He, REN Yili, LI Xin, DENG Yue, WANG Yongtao, CAO Qianwen, DU Jinyang, LIN Zhiwei, WANG Wenjie. Petroleum Exploration and Development (SCIE), 2024(4): 1049-1065.
This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be briefly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces several challenges: current data quantity and quality are insufficient to support the training of large models, research and development costs are high, and algorithm autonomy and control are poor. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the building of "artificial intelligence + energy" composite teams, and boost the autonomy and control of large model technology.
Keywords: foundation model; large language model; visual large model; multimodal large model; large model of the oil and gas industry; pre-training; fine-tuning
14. New era towards autonomous additive manufacturing: a review of recent trends and future perspectives
Authors: Haolin Fan, Chenshu Liu, Shijie Bian, Changyu Ma, Junlin Huang, Xuan Liu, Marshall Doyle, Thomas Lu, Edward Chow, Lianyi Chen, Jerry Ying Hsi Fuh, Wen Feng Lu, Bingbing Li. International Journal of Extreme Manufacturing, 2025(3): 183-230.
The additive manufacturing (AM) landscape has significantly transformed in alignment with Industry 4.0 principles, primarily driven by the integration of artificial intelligence (AI) and digital twins (DT). However, current intelligent AM (IAM) systems face limitations such as fragmented AI tool usage and suboptimal human-machine interaction. This paper reviews existing IAM solutions, emphasizing control, monitoring, process autonomy, and end-to-end integration, and identifies key limitations, such as the absence of a high-level controller for global decision-making. To address these gaps, we propose a transition from IAM to autonomous AM, featuring a hierarchical framework with four integrated layers: knowledge, generative solution, operational, and cognitive. In the cognitive layer, AI agents notably enable machines to independently observe, analyze, plan, and execute operations that traditionally require human intervention. These capabilities streamline production processes and expand the possibilities for innovation, particularly in sectors like in-space manufacturing. Additionally, this paper discusses the role of AI in self-optimization and lifelong learning, positing that the future of AM will be characterized by a symbiotic relationship between human expertise and advanced autonomy, fostering a more adaptive, resilient manufacturing ecosystem.
Keywords: future manufacturing; autonomous additive manufacturing; artificial intelligence agent; large multimodal models; knowledge graphs
15. Exploring DeepSeek: A Survey on Advances, Applications, Challenges and Future Directions (Cited by 6)
Authors: Zehang Deng, Wanlun Ma, Qing-Long Han, Wei Zhou, Xiaogang Zhu, Sheng Wen, Yang Xiang. IEEE/CAA Journal of Automatica Sinica, 2025(5): 872-893.
The rapid advancement of large models has led to the development of increasingly sophisticated models capable of generating diverse, personalized, and high-quality content. Among these, DeepSeek has emerged as a pivotal open-source initiative, demonstrating high performance at significantly lower computation costs compared to closed-source counterparts. This survey provides a comprehensive overview of the DeepSeek family of models, including DeepSeek-V3 and DeepSeek-R1, covering their core innovations in architecture, system pipeline, algorithms, and infrastructure. We explore their practical applications across various domains, such as healthcare, finance, and education, highlighting their impact on both industry and society. Furthermore, we examine potential security, privacy, and ethical concerns arising from the widespread deployment of these models, emphasizing the need for responsible AI development. Finally, we outline future research directions to enhance the performance, safety, and scalability of DeepSeek models, aiming to foster further advancements in the open-source large model community.
Keywords: DeepSeek; large language model; large multimodal model
16. Advancing general robotic manipulation with multimodal foundation models: An embodied AI paradigm (Cited by 1)
Authors: Shifeng HUANG, He WANG, Xing ZHOU, Wenkai CHEN, Haibin YANG, Jianwei ZHANG. Science China Technological Sciences, 2025(5): 290-292.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Keywords: multimodal foundation models; autonomous task execution; robotic manipulation; general robotic manipulation; robotic programming language; embodied AI; operational barriers; robotic technologies
17. Medical multimodal large language models: A systematic review
Authors: Yuan Hu, Chenhan Xu, Bo Lin, Weibin Yang, Yuan Yan Tang. Intelligent Oncology, 2025(4): 308-325.
The rapid advancement of artificial intelligence (AI) has ushered in a new era of medical multimodal large language models (MLLMs), which integrate diverse data modalities such as text, imaging, physiological signals, and genomics to enhance clinical decision-making. This systematic review explores the core methodologies and applied research frontiers of medical MLLMs, focusing on their architecture, training methods, evaluation techniques, and applications. We highlight the transformative potential of MLLMs in achieving cross-modal semantic alignment, medical knowledge integration, and robust clinical reasoning. Despite their promise, challenges such as data heterogeneity, hallucination, and computational efficiency persist. By reviewing state-of-the-art solutions and future directions, this paper provides a comprehensive technical guide for developing reliable and interpretable medical MLLMs, ultimately aiming to bridge the gap between AI and clinical practice.
Keywords: Multimodal large language model; Hallucination; Medical multimodal dataset; Clinical evaluation
18. Federated Services: A Smart Service Ecology With Federated Security for Aligned Data Supply and Scenario-Oriented Demands
Authors: Xiaofeng Jia, Juanjuan Li, Shouwen Wang, Hongwei Qi, Fei-Yue Wang, Rui Qin, Min Zhang, Xiaolong Liang. IEEE/CAA Journal of Automatica Sinica, 2025(5): 925-936.
This paper introduces federated services as a smart service ecology with federated security, aligning distributed data supply with diversified service demands spanning digital and societal contexts. It presents comprehensive research on the theoretical foundation and technical system of federated services, aiming to advance our understanding and implementation of this novel service paradigm. First, a thorough examination of the characteristics of federated security within federated services is conducted. Then, a five-layer technical framework is formulated under a decentralized intelligent architecture, ensuring secure, agile, and adaptable service provision. On this basis, the operational mechanisms underlying data federation and service confederation are analyzed, with emphasis on the smart supply-demand matching model. Furthermore, a scenario-oriented taxonomy of federated services accompanied by illustrative examples is proposed. Our work offers actionable insights and a roadmap for realizing and advancing federated services, contributing to the refinement and wider adoption of this transformative service paradigm in the digital era.
Keywords: Decentralized autonomous organizations and operations; decentralized physical infrastructure networks; federated security; federated services; multimodal large language models; smart contracts
19. Exploring the Taxonomy of Survey Papers on Large Language Models Using Classical Machine Learning
Authors: Maqsudur Rahman, Md. Shahjahan. Journal of Intelligent Learning Systems and Applications, 2025(2): 68-76.
The rapid advancements in large language models (LLMs) have led to an exponential increase in survey papers, making it challenging to systematically track and analyze their evolving taxonomy. This study employs graph representation learning combined with classical machine learning techniques to model and interpret the structural evolution of LLM-related survey papers. By constructing attributed graphs that capture topic distributions and interconnections, we provide a data-driven framework to explore research trends in this domain. A dataset of 241 survey papers published between July 2021 and January 2024 is analyzed to identify thematic developments and interdisciplinary relationships. The results highlight key areas of specialization, including the emergence of prompting science, multimodal models, and domain-specific applications in finance, education, and law. Co-occurrence analysis of survey topics reveals strong interconnections between core LLM research and fields such as software engineering, hardware architecture, and evaluation methodologies. These findings demonstrate the increasing specialization of LLM research and its growing integration across multiple disciplines. By leveraging graph-based methodologies, this study offers a structured approach to understanding the LLM survey landscape, facilitating efficient navigation of the existing literature and identification of emerging research directions. The insights presented contribute to a more comprehensive understanding of the field's trajectory, assisting researchers and practitioners in engaging with the latest developments in LLM research.
Keywords: Large Language Models (LLMs); Taxonomy; Multimodal models
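The topic co-occurrence analysis mentioned above follows a standard weighted-graph pattern; here is a minimal sketch with networkx, using an invented toy set of topic tags in place of the paper's 241-survey dataset.

```python
# Sketch: build a weighted co-occurrence graph of survey topics.
import itertools
import networkx as nx

# Each survey paper is reduced to its set of topic tags (toy data).
papers = [
    {"prompting", "evaluation"},
    {"multimodal models", "evaluation"},
    {"prompting", "multimodal models", "education"},
    {"finance", "evaluation"},
]

G = nx.Graph()
for topics in papers:
    for a, b in itertools.combinations(sorted(topics), 2):
        # Edge weight counts how often two topics appear in the same survey.
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# The strongest interconnections indicate merging research threads.
for a, b, d in sorted(G.edges(data=True), key=lambda e: -e[2]["weight"]):
    print(f"{a} -- {b}: {d['weight']}")
```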
20. Power allocation and mode selection methods for cooperative communication in the rectangular tunnel (Cited by 2)
Authors: Zhai Wenyan, Sun Yanjing, Xu Zhao, Li Song. International Journal of Mining Science and Technology (SCIE, EI, CSCD), 2015(2): 253-260.
To combat the multipath fading of electromagnetic waves in wireless communication in confined areas, a cooperative communication system for the rectangular tunnel was established based on the multimode channel model, and its channel capacity formula was derived. Under the criterion of maximizing channel capacity, power allocation methods for both amplify-and-forward (AF) and decode-and-forward (DF) cooperative communication systems were proposed subject to a total power constraint. Mode selection methods between the single-input single-output (SISO) and single-input multiple-output (SIMO) models in the rectangular tunnel, through which higher channel capacity can be obtained, were put forward as well. Theoretical analysis and simulation comparison show that the channel capacity of the wireless communication system in the rectangular tunnel can be effectively enhanced through cooperative technology; that the channel capacity of the rectangular tunnel under complicated conditions is maximized through the proposed power allocation methods; and that the optimal cooperative mode for channel capacity can be chosen according to the given mode selection methods.
Keywords: Rectangular tunnel; Multimode channel model; Channel capacity; Cooperative communication; Power allocation; Mode selection
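For context, the capacity objective that such power-allocation schemes maximize can be written down for a textbook half-duplex amplify-and-forward link (a standard form from the cooperative-communication literature, not the paper's tunnel-specific multimode derivation):

```latex
% Half-duplex AF relay capacity; \gamma_{sd}, \gamma_{sr}, \gamma_{rd} denote the
% source-destination, source-relay, and relay-destination SNRs; the factor 1/2
% accounts for the two transmission time slots.
C_{\mathrm{AF}} = \frac{1}{2}\log_2\!\left(1 + \gamma_{sd}
    + \frac{\gamma_{sr}\,\gamma_{rd}}{\gamma_{sr} + \gamma_{rd} + 1}\right)
```

Under a total power budget P_s + P_r <= P, the AF allocation picks the source and relay powers that maximize C_AF, since each SNR above scales with the transmit power on its hop; the DF counterpart instead maximizes a minimum over the source-relay and combined destination links.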