Journal Articles
26 articles found
1. Railway-CLIP: A multimodal model for abnormal object detection in high-speed railway
Authors: Jiayu Zhang, Qingji Guan, Junbo Liu, Yaping Huang, Jianyong Guo. High-Speed Railway, 2025, Issue 3, pp. 194-204.
Automated detection of suspended anomalous objects on high-speed railway catenary systems using computer vision-based technology is a critical task for ensuring railway transportation safety. Despite the critical importance of this task, conventional vision-based foreign object detection methodologies have predominantly concentrated on image data, neglecting the exploration and integration of textual information. The currently popular multimodal model Contrastive Language-Image Pre-training (CLIP) employs contrastive learning to enable simultaneous understanding of both visual and textual modalities. Drawing inspiration from CLIP’s capabilities, this paper introduces a novel CLIP-based multimodal foreign object detection model tailored for railway applications, referred to as Railway-CLIP. This model leverages CLIP’s robust generalization capabilities to enhance performance in the context of catenary foreign object detection. The Railway-CLIP model is primarily composed of an image encoder and a text encoder. Initially, the Segment Anything Model (SAM) is employed to preprocess raw images, identifying candidate bounding boxes that may contain foreign objects. Both the original images and the detected candidate bounding boxes are subsequently fed into the image encoder to extract their respective visual features. In parallel, distinct prompt templates are crafted for both the original images and the candidate bounding boxes to serve as textual inputs. These prompts are then processed by the text encoder to derive textual features. The image and text encoders collaboratively project the multimodal features into a shared semantic space, facilitating the computation of similarity scores between visual and textual representations. The final detection results are determined based on these similarity scores, ensuring a robust and accurate identification of anomalous objects. Extensive experiments on our collected Railway Anomaly Dataset (RAD) demonstrate that the proposed Railway-CLIP outperforms previous state-of-the-art methods, achieving 97.25% AUROC and 92.66% F1-score, thereby validating the effectiveness and superiority of the proposed approach in real-world high-speed railway anomaly detection scenarios.
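The similarity-scoring step described above lends itself to a compact sketch. The following is a minimal illustration, not the authors' implementation: candidate crops, assumed to come from SAM, are scored against normal/anomalous prompt templates with an off-the-shelf CLIP checkpoint; the prompt wording and checkpoint name are assumptions for illustration only.

```python
# Hedged sketch of CLIP-based scoring of region proposals; prompts, checkpoint,
# and the binary prompt pair are illustrative assumptions, not the Railway-CLIP design.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical prompt templates for normal vs. anomalous catenary regions.
PROMPTS = ["a photo of normal railway catenary equipment",
           "a photo of a foreign object hanging on railway catenary"]

def score_proposals(image: Image.Image, boxes: list) -> torch.Tensor:
    """Return P(anomalous) per candidate box via image-text similarity."""
    crops = [image.crop(b) for b in boxes]  # boxes as (left, upper, right, lower)
    inputs = processor(text=PROMPTS, images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (num_crops, num_prompts)
    return logits.softmax(dim=-1)[:, 1]             # similarity mass on the anomaly prompt
```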
Keywords: High-speed railway catenary systems; Anomalous object detection; Multimodal model; Railway-CLIP
2. Evolution and Prospects of Foundation Models: From Large Language Models to Large Multimodal Models (Cited by 3)
Authors: Zheyi Chen, Liuchang Xu, Hongting Zheng, Luyao Chen, Amr Tolba, Liang Zhao, Keping Yu, Hailin Feng. Computers, Materials & Continua (SCIE, EI), 2024, Issue 8, pp. 1753-1808.
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content, paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Keywords: Artificial intelligence; Large language models; Large multimodal models; Foundation models
3. Large multimodal models assist in psychiatry disorders prevention and diagnosis of students
Authors: Xin-Qiao Liu, Xin Wang, Hui-Rui Zhang. World Journal of Psychiatry (SCIE), 2024, Issue 10, pp. 1415-1421.
Students are considered one of the groups most affected by psychological problems. Given the highly dangerous nature of mental illnesses and the increasingly serious state of global mental health, it is imperative for us to explore new methods and approaches concerning the prevention and treatment of mental illnesses. Large multimodal models (LMMs), as the most advanced artificial intelligence models (e.g., ChatGPT-4), have brought new hope to the accurate prevention, diagnosis, and treatment of psychiatric disorders. The assistance of these models in the promotion of mental health is critical, as the latter necessitates a strong foundation of medical knowledge and professional skills, emotional support, stigma mitigation, the encouragement of more honest patient self-disclosure, reduced health care costs, improved medical efficiency, and greater mental health service coverage. However, these models must address challenges related to health, safety, hallucinations, and ethics simultaneously. In the future, we should address these challenges by developing relevant usage manuals, accountability rules, and legal regulations; implementing a human-centered approach; and intelligently upgrading LMMs through the deep optimization of such models, their algorithms, and other means. This effort will thus substantially contribute not only to the maintenance of students’ health but also to the achievement of global sustainable development goals.
Keywords: Large multimodal models; ChatGPT; Psychiatric disorders; Mental health; Student
4. Move to See More: Approaching Object With Partial Occlusion Using Large Multimodal Model and Active Object Detection
Authors: Aoqi Wang, Guohui Tian, Yuhao Wang, Zhongyang Li. IET Cyber-Systems and Robotics, 2025, Issue 1, pp. 43-55.
Active object detection (AOD) is a crucial task in the field of robotics. A key challenge in household environments for AOD is that the target object is often undetectable due to partial occlusion, which leads to the failure of traditional methods. To address the occlusion problem, this paper first proposes a novel occlusion handling method based on the large multimodal model (LMM). The method utilises an LMM to detect and analyse input RGB images and generates adjustment actions to progressively eliminate occlusion. After the occlusion is handled, an improved AOD method based on a deep Q-learning network (DQN) is used to complete the task. We introduce an attention mechanism to process image features, enabling the model to focus on critical regions of the input images. Additionally, a new reward function is proposed that comprehensively considers the bounding box of the target object and the robot's distance to the object, along with the actions performed by the robot. Experiments on the dataset and in real-world scenarios validate the effectiveness of the proposed method in performing AOD tasks under partial occlusion.
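As a concrete illustration of the kind of composite reward described, the sketch below combines detection quality, the robot-object distance, and a per-step action cost; every coefficient and threshold is an assumption for illustration, since the paper's exact formulation is not reproduced here.

```python
# Hedged sketch of a composite AOD reward; all weights are illustrative, not the paper's.
def aod_reward(box_conf: float, box_area_ratio: float,
               distance_m: float, target_distance_m: float = 1.0,
               step_cost: float = 0.05) -> float:
    detection_term = box_conf * box_area_ratio               # confident, well-framed detection
    distance_term = -abs(distance_m - target_distance_m)     # penalize being too far or too close
    return detection_term + 0.5 * distance_term - step_cost  # step cost discourages long episodes

# Example: a confident detection near the target standoff distance scores well.
print(aod_reward(box_conf=0.9, box_area_ratio=0.3, distance_m=1.1))
```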
Keywords: Active object detection; Large multimodal model; Reinforcement learning; Robots
5. Rethinking Chart Understanding Using Multimodal Large Language Models
Authors: Andreea-Maria Tanasa, Simona-Vasilica Oprea. Computers, Materials & Continua, 2025, Issue 8, pp. 2905-2933.
Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, the correlation between different parts of the documents, and domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving to be essential to analyze content and answer questions accurately. This paper aims to evaluate the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art vision-language models, is used to identify the relevant page containing the appropriate chart for each question. Google’s Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visually rich document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organization and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected for their multimodal nature, combining dense textual analysis with varied visual representations, and thus presenting realistic challenges for vision-language models. This comparison assesses how advanced models perform with different input formats and whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
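The two input modalities compared in the paper can be sketched as follows. This assumes the google-generativeai SDK, an illustrative model name, and placeholder file paths; it is not the authors' evaluation harness.

```python
# Hedged sketch: ask the same chart question with an image input vs. parsed PDF text.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")  # illustrative model choice

question = "What is the highest value shown in the chart, and for which year?"

# Approach 1: the relevant page rendered as an image (e.g., selected by ColPali).
image_answer = model.generate_content([question, Image.open("page.png")]).text

# Approach 2: the same page's content parsed directly from the PDF as plain text.
pdf_text = open("page.txt", encoding="utf-8").read()
text_answer = model.generate_content(f"{question}\n\nDocument text:\n{pdf_text}").text

print(image_answer, text_answer, sep="\n---\n")
```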
Keywords: Chart understanding; Large language models; Multimodal models; PDF extraction
6. Applications of Large Multimodal Models (LMMs) in STEM Education: From Visual Explanations to Virtual Experiments
Authors: Changkui LI. Artificial Intelligence Education Studies, 2025, Issue 2, pp. 1-18.
Generative Artificial Intelligence (GAI) refers to a class of AI systems capable of creating novel, coherent, and contextually relevant content, such as text, images, audio, and video, based on patterns learned from extensive training datasets. The public release and rapid refinement of large language models (LLMs) like ChatGPT have accelerated the adoption of GAI across various medical specialties, offering new tools for education, clinical simulation, and research. Dermatology training, which heavily relies on visual pattern recognition and requires extensive exposure to diverse morphological presentations, faces persistent challenges such as uneven distribution of educational resources, limited patient exposure for rare conditions, and variability in teaching quality. Exploring the integration of GAI into pedagogical frameworks offers innovative approaches to address these challenges, potentially enhancing the quality, standardization, scalability, and accessibility of dermatology education. This comprehensive review examines the core concepts and technical foundations of GAI, highlights its specific applications within dermatology teaching and learning, including simulated case generation, personalized learning pathways, and academic support, and discusses the current limitations, practical challenges, and ethical considerations surrounding its use. The aim is to provide a balanced perspective on the significant potential of GAI for transforming dermatology education and to offer evidence-based insights to guide future exploration, implementation, and policy development.
Keywords: Large multimodal models (LMMs); STEM education; Visual explanations; Virtual laboratories/virtual experiments; Critical AI literacy
7. PKME-MLM: A Novel Multimodal Large Model for Sarcasm Detection
Authors: Jian Luo, Yaling Li, Xueyu Li, Xuliang Hu. Computers, Materials & Continua, 2025, Issue 4, pp. 877-896.
Sarcasm detection in Natural Language Processing (NLP) has become increasingly important, particularly with the rise of social media and non-textual emotional expressions, such as images. Existing methods often rely on separate image and text modalities, which may not fully utilize the information available from both sources. To address this limitation, we propose a novel multimodal large model, i.e., the PKME-MLM (Prior Knowledge and Multi-label Emotion analysis based Multimodal Large Model for sarcasm detection). The PKME-MLM aims to enhance sarcasm detection by integrating prior knowledge to extract useful textual information from images, which is then combined with text data for deeper analysis. This method improves the integration of image and text data, addressing the limitation of previous models that process these modalities separately. Additionally, we incorporate multi-label sentiment analysis, refining sentiment labels to improve sarcasm recognition accuracy. This design overcomes the limitations of prior models that treated sentiment classification as a single-label problem, thereby improving sarcasm recognition by distinguishing subtle emotional cues from the text. Experimental results demonstrate that our approach achieves significant performance improvements in multimodal sarcasm detection tasks, with an accuracy (Acc.) of 94.35%, and Macro-Average Precision and Recall reaching 93.92% and 94.21%, respectively. These results highlight the potential of multimodal models in improving sarcasm detection and suggest that further integration of modalities could advance future research. This work also paves the way for incorporating multimodal sentiment analysis into sarcasm detection.
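The multi-label sentiment component can be illustrated with a small sketch: instead of a single softmax label, each emotion receives an independent sigmoid score. The label set and dimensions below are assumptions, not the PKME-MLM configuration.

```python
# Hedged sketch of a multi-label emotion head over fused image-text features.
import torch
import torch.nn as nn

EMOTIONS = ["joy", "anger", "surprise", "contempt"]  # hypothetical label set

class MultiLabelEmotionHead(nn.Module):
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.classifier = nn.Linear(feat_dim, len(EMOTIONS))

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # Sigmoid per label: a sarcastic post can carry several emotions at once.
        return torch.sigmoid(self.classifier(fused))

head = MultiLabelEmotionHead()
print(head(torch.randn(2, 768)).shape)  # (batch, num_emotions), each score in [0, 1]
```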
Keywords: Sarcasm detection; Multimodal large model; Prior knowledge; Multi-label fusion
8. DeepGut: A collaborative multimodal large language model framework for digestive disease assisted diagnosis and treatment
Authors: Xiao-Han Wan, Mei-Xia Liu, Yan Zhang, Guan-Jun Kou, Lei-Qi Xu, Han Liu, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li. World Journal of Gastroenterology, 2025, Issue 31, pp. 92-100.
BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks. AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases. METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical “chain” construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians. RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times. CONCLUSION: The framework achieves successful integration of multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration, which opens new horizons for the clinical application of artificial intelligence-assisted technology.
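The four-tiered collaboration pattern can be sketched as chained model calls, one per tier. The stage names, prompts, and the `call_llm` stand-in below are illustrative assumptions, not DeepGut's actual interfaces.

```python
# Hedged sketch of a four-tier LLM pipeline: extraction -> reasoning chain ->
# recommendation -> risk analysis. `call_llm` is a stand-in, not a real API.
def call_llm(role: str, prompt: str) -> str:
    return f"[{role} output for: {prompt[:40]}...]"  # replace with a concrete client

def tiered_diagnosis(history: str, labs: str, imaging: str) -> dict:
    facts = call_llm("extractor", f"Extract key findings:\n{history}\n{labs}\n{imaging}")
    chain = call_llm("reasoner", f"Construct a diagnostic logic chain from:\n{facts}")
    plan = call_llm("planner", f"Suggest diagnosis and treatment given:\n{chain}")
    risks = call_llm("auditor", f"Analyze clinical and legal risks of:\n{plan}")
    return {"findings": facts, "chain": chain, "plan": plan, "risks": risks}

print(tiered_diagnosis("3 days epigastric pain", "elevated lipase", "CT: pancreatic edema"))
```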
Keywords: Gastrointestinal diseases; Artificial intelligence-assisted diagnosis and treatment; Multimodal large language model; Multiple large language model collaboration; DeepGut
9. The Synergy of Seeing and Saying: Revolutionary Advances in Multi-modality Medical Vision-Language Large Models
Authors: Xiang LI, Yu SUN, Jia LIN, Like LI, Ting FENG, Shen YIN. Artificial Intelligence Science and Engineering, 2025, Issue 2, pp. 79-97.
The application of vision-language large models in the field of medical health has gradually become a research focus. These models combine image understanding with natural language processing, and can simultaneously process multimodal data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively realizing the integration of medical information and providing strong support for clinical decision-making and disease diagnosis. Vision-language large models perform well on specific medical tasks and also show strong potential and high intelligence as general task models. This paper provides a comprehensive review of vision-language large models in the field of medical health. Specifically, it first introduces the basic theoretical foundations and technical principles. It then introduces specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed. Based on these existing challenges, four potential future development directions are given.
Keywords: Large language models; Vision-language models; Medical health; Multimodality models
10. Efficient User Identity Linkage Based on Aligned Multimodal Features and Temporal Correlation
Authors: Jiaqi Gao, Kangfeng Zheng, Xiujuan Wang, Chunhua Wu, Bin Wu. Computers, Materials & Continua (SCIE, EI), 2024, Issue 10, pp. 251-270.
User identity linkage (UIL) refers to identifying user accounts belonging to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully explore the rich image resources generated by users; existing attempts that touch on the multimodal domain still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated contents (UGCs). We introduce the efficient user identity linkage via aligned multimodal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. Accordingly, we construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, which demonstrates its excellence in handling multimodal data and provides strong algorithmic support for UIL.
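The captioning-plus-alignment step can be sketched with off-the-shelf checkpoints. The model names are assumptions, and the adapter and temporal-weighting stages are omitted, so this illustrates the idea rather than the EUIL implementation.

```python
# Hedged sketch: BLIP captions an image, then CLIP embeds caption and image
# into one aligned space, narrowing the text-image semantic gap.
import torch
from PIL import Image
from transformers import (BlipForConditionalGeneration, BlipProcessor,
                          CLIPModel, CLIPProcessor)

blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("user_post.jpg")  # placeholder path
ids = blip.generate(**blip_proc(images=image, return_tensors="pt"), max_new_tokens=30)
caption = blip_proc.decode(ids[0], skip_special_tokens=True)  # fills in missing text

with torch.no_grad():
    img_feat = clip.get_image_features(**clip_proc(images=image, return_tensors="pt"))
    txt_feat = clip.get_text_features(
        **clip_proc(text=[caption], return_tensors="pt", padding=True))
# img_feat and txt_feat now live in CLIP's shared space, ready for adapter fusion.
```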
Keywords: User identity linkage; Multimodal models; Attention mechanism; Temporal correlation
11. Research status and application of artificial intelligence large models in the oil and gas industry (Cited by 2)
Authors: LIU He, REN Yili, LI Xin, DENG Yue, WANG Yongtao, CAO Qianwen, DU Jinyang, LIN Zhiwei, WANG Wenjie. Petroleum Exploration and Development (SCIE), 2024, Issue 4, pp. 1049-1065.
This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. The existing large models can be briefly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods like fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges such as current data quantity and quality being insufficient to support the training of large models, high research and development costs, and poor algorithm autonomy and control. The application of large models should be guided by the needs of the oil and gas business, taking the application of large models as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the construction of “artificial intelligence + energy” composite teams, and boost the autonomy and control of large model technology.
Keywords: Foundation model; Large language model; Visual large model; Multimodal large model; Large model of oil and gas industry; Pre-training; Fine-tuning
12. New era towards autonomous additive manufacturing: a review of recent trends and future perspectives
Authors: Haolin Fan, Chenshu Liu, Shijie Bian, Changyu Ma, Junlin Huang, Xuan Liu, Marshall Doyle, Thomas Lu, Edward Chow, Lianyi Chen, Jerry Ying Hsi Fuh, Wen Feng Lu, Bingbing Li. International Journal of Extreme Manufacturing, 2025, Issue 3, pp. 183-230.
The additive manufacturing (AM) landscape has significantly transformed in alignment with Industry 4.0 principles, primarily driven by the integration of artificial intelligence (AI) and digital twins (DT). However, current intelligent AM (IAM) systems face limitations such as fragmented AI tool usage and suboptimal human-machine interaction. This paper reviews existing IAM solutions, emphasizing control, monitoring, process autonomy, and end-to-end integration, and identifies key limitations, such as the absence of a high-level controller for global decision-making. To address these gaps, we propose a transition from IAM to autonomous AM, featuring a hierarchical framework with four integrated layers: knowledge, generative solution, operational, and cognitive. In the cognitive layer, AI agents notably enable machines to independently observe, analyze, plan, and execute operations that traditionally require human intervention. These capabilities streamline production processes and expand the possibilities for innovation, particularly in sectors like in-space manufacturing. Additionally, this paper discusses the role of AI in self-optimization and lifelong learning, positing that the future of AM will be characterized by a symbiotic relationship between human expertise and advanced autonomy, fostering a more adaptive, resilient manufacturing ecosystem.
Keywords: Future manufacturing; Autonomous additive manufacturing; Artificial intelligence agent; Large multimodal models; Knowledge graphs
13. Shared-weight multimodal translation model for recognizing Chinese variant characters
Authors: Yuankang SUN, Bing LI, Lexiang LI, Peng YANG, Dongmei YANG. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 7, pp. 1066-1082.
The task of recognizing Chinese variant characters aims to address the challenges of semantic ambiguity and confusion, which potentially cause risks to the security of Web content and complicate the governance of sensitive words. Most existing approaches predominantly prioritize the acquisition of contextual knowledge from Chinese corpora and vocabularies during pretraining, often overlooking the inherent phonological and morphological characteristics of the Chinese language. To address these issues, we propose a shared-weight multimodal translation model (SMTM) based on multimodal information of Chinese characters, which integrates the phonology of Pinyin and the morphology of fonts into each Chinese character token to learn the deeper semantics of variant text. Specifically, we encode the Pinyin features of Chinese characters using the embedding layer, and the font features of Chinese characters are extracted directly with convolutional neural networks. Considering the multimodal similarity between the source and target sentences of the Chinese variant-character-recognition task, we design a shared-weight embedding mechanism to generate target sentences using heuristic information from the source sentences during training. The simulation results show that our proposed SMTM achieves remarkable performance of 89.550% and 79.480% on bilingual evaluation understudy (BLEU) and F1 metrics respectively, a significant improvement over state-of-the-art baseline models.
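A compressed sketch of the multimodal token construction follows: a Pinyin embedding supplies phonology and a small CNN over a glyph bitmap supplies morphology, summed into one character representation. Vocabulary size, dimensions, and the fusion-by-addition choice are assumptions, not SMTM's specification.

```python
# Hedged sketch of fusing per-character Pinyin and glyph features; all sizes illustrative.
import torch
import torch.nn as nn

class CharMultimodalEncoder(nn.Module):
    def __init__(self, pinyin_vocab: int = 1500, dim: int = 128):
        super().__init__()
        self.pinyin_emb = nn.Embedding(pinyin_vocab, dim)  # phonology
        self.glyph_cnn = nn.Sequential(                    # morphology from a 32x32 glyph bitmap
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, dim))

    def forward(self, pinyin_id: torch.Tensor, glyph: torch.Tensor) -> torch.Tensor:
        # Sum the two modality vectors into a shared token representation.
        return self.pinyin_emb(pinyin_id) + self.glyph_cnn(glyph)

enc = CharMultimodalEncoder()
token = enc(torch.tensor([42]), torch.randn(1, 1, 32, 32))  # shape (1, 128)
```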
Keywords: Chinese variant characters; Multimodal model; Translation model; Phonology and morphology
14. Advancing general robotic manipulation with multimodal foundation models: An embodied AI paradigm (Cited by 1)
Authors: Shifeng HUANG, He WANG, Xing ZHOU, Wenkai CHEN, Haibin YANG, Jianwei ZHANG. Science China (Technological Sciences), 2025, Issue 5, pp. 290-292.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Keywords: Multimodal foundation models; Autonomous task execution; Robotic manipulation; General robotic manipulation; Robotic programming language; Embodied AI; Operational barriers; Robotic technologies
15. Exploring the Taxonomy of Survey Papers on Large Language Models Using Classical Machine Learning
Authors: Maqsudur Rahman, Md. Shahjahan. Journal of Intelligent Learning Systems and Applications, 2025, Issue 2, pp. 68-76.
The rapid advancements in large language models (LLMs) have led to an exponential increase in survey papers, making it challenging to systematically track and analyze their evolving taxonomy. This study employs graph representation learning combined with classical machine learning techniques to model and interpret the structural evolution of LLM-related survey papers. By constructing attributed graphs that capture topic distributions and interconnections, we provide a data-driven framework to explore research trends in this domain. A dataset of 241 survey papers published between July 2021 and January 2024 is analyzed to identify thematic developments and interdisciplinary relationships. The results highlight key areas of specialization, including the emergence of prompting science, multimodal models, and domain-specific applications in finance, education, and law. Co-occurrence analysis of survey topics reveals strong interconnections between core LLM research and fields such as software engineering, hardware architecture, and evaluation methodologies. These findings demonstrate the increasing specialization of LLM research and its growing integration across multiple disciplines. By leveraging graph-based methodologies, this study offers a structured approach to understanding the LLM survey landscape, facilitating efficient navigation of existing literature and identification of emerging research directions. The insights presented contribute to a more comprehensive understanding of the field’s trajectory, assisting researchers and practitioners in engaging with the latest developments in LLM research.
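The co-occurrence construction can be illustrated in a few lines: topics become nodes, and edge weights count how often two topics are tagged on the same survey. The topic records below are invented for illustration, not the study's dataset.

```python
# Hedged sketch of a topic co-occurrence graph; data is made up for illustration.
import itertools
import networkx as nx

papers = [  # hypothetical (paper_id, topics) records
    ("p1", ["prompting", "evaluation"]),
    ("p2", ["multimodal", "evaluation", "software engineering"]),
    ("p3", ["multimodal", "prompting"]),
]

G = nx.Graph()
for pid, topics in papers:
    for a, b in itertools.combinations(sorted(set(topics)), 2):
        w = G.get_edge_data(a, b, {}).get("weight", 0)
        G.add_edge(a, b, weight=w + 1)  # accumulate co-occurrence counts

# Strongest topic links first, as in the co-occurrence analysis described.
print(sorted(G.edges(data="weight"), key=lambda e: -e[2])[:3])
```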
Keywords: Large language models (LLMs); Taxonomy; Multimodal models
16. Current Trends and Future Prospects of Large-Scale Foundation Model in K-12 Education (Cited by 1)
Authors: Qiannan Zhu, Mei Wang, Ting Zhang, Hua Huang. Frontiers of Digital Education, 2025, Issue 2, pp. 9-31.
The rapid advancement of artificial intelligence has significantly impacted education, with large-scale foundation models (LFMs) emerging as transformative tools. While LFMs have demonstrated exceptional performance across diverse domains, their integration into K-12 education remains in its early stages, requiring alignment with pedagogical principles, cognitive development, and curriculum standards. This paper provides a comprehensive technological review of LFM applications in K-12 education, examining current workflows, challenges, and future opportunities. We explore how LFMs facilitate personalized learning, teacher-student collaboration, and automated assessment, while highlighting critical issues such as motivation, engagement, and age-appropriate instructional strategies. By analyzing global developments, this study offers valuable insights for educators seeking to optimize AI-driven teaching methods and for students leveraging AI for self-directed learning. Our findings aim to inform future research and drive innovation in educational AI, ensuring the effective and ethical integration of LFMs into the evolving K-12 educational landscape.
Keywords: Large-scale foundation models (LFMs); Multimodal large language model; Large language model; K-12 education
17. CLIP4Video-Sampling: Global Semantics-Guided Multi-Granularity Frame Sampling for Video-Text Retrieval
Authors: Tao Zhang, Yu Zhang. Journal of Computer and Communications, 2024, Issue 11, pp. 26-36.
Video-text retrieval (VTR) is an essential task in multimodal learning, aiming to bridge the semantic gap between visual and textual data. Effective video frame sampling plays a crucial role in improving retrieval performance, as it determines the quality of the visual content representation. Traditional sampling methods, such as uniform sampling and optical flow-based techniques, often fail to capture the full semantic range of videos, leading to redundancy and inefficiencies. In this work, we propose CLIP4Video-Sampling, a global semantics-guided multi-granularity frame sampling strategy designed to optimize both computational efficiency and retrieval accuracy. By integrating multi-scale global and local temporal sampling and leveraging the powerful feature extraction capabilities of the CLIP (Contrastive Language-Image Pre-training) model, our method significantly outperforms existing approaches in both zero-shot and fine-tuned video-text retrieval tasks on popular datasets. CLIP4Video-Sampling reduces redundancy, ensures keyframe coverage, and serves as an adaptable pre-processing module for multimodal models.
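The guiding idea, scoring candidate frames against the query with CLIP and keeping the most relevant ones instead of sampling uniformly, can be sketched as follows. The checkpoint and the single-granularity top-k selection are simplifying assumptions, not the paper's multi-scale scheme.

```python
# Hedged sketch of semantics-guided frame selection with an off-the-shelf CLIP.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_frames(frames, query: str, k: int = 8):
    """frames: list of PIL images (e.g., a coarse uniform pre-sample of the video)."""
    inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image.squeeze(-1)  # (num_frames,)
    keep = torch.topk(sims, k=min(k, len(frames))).indices.sort().values
    return [frames[int(i)] for i in keep]  # temporally ordered, semantically relevant subset
```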
Keywords: Video sampling; Multimodal large language model; Text-video retrieval; CLIP model
18. Large investment model
Authors: Jian GUO, Heung-Yeung SHUM. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 10, pp. 1771-1792.
Traditional quantitative investment research is encountering diminishing returns alongside rising labor and time costs. To overcome these challenges, we introduce the large investment model (LIM), a novel research paradigm designed to enhance both performance and efficiency at scale. LIM employs end-to-end learning and universal modeling to create an upstream foundation model capable of autonomously learning comprehensive signal patterns from diverse financial data spanning multiple exchanges, instruments, and frequencies. These “global patterns” are subsequently transferred to downstream strategy modeling, optimizing performance for specific tasks. We detail the system architecture design of LIM, address the technical challenges inherent in this approach, and outline potential directions for future research.
Keywords: Artificial general intelligence; End-to-end; Large investment model; Quantitative investment; Foundation model; Multimodal large language model
19. Power allocation and mode selection methods for cooperative communication in the rectangular tunnel (Cited by 2)
Authors: Zhai Wenyan, Sun Yanjing, Xu Zhao, Li Song. International Journal of Mining Science and Technology (SCIE, EI, CSCD), 2015, Issue 2, pp. 253-260.
To mitigate the multipath fading of electromagnetic waves in confined-area wireless communication, a cooperative communication system for the rectangular tunnel was established based on the multimode channel model, and the channel capacity formula was derived. Under the criterion of maximizing channel capacity, power allocation methods for both amplify-and-forward (AF) and decode-and-forward (DF) cooperative communication systems were proposed subject to a total power constraint. Mode selection methods between single-input single-output (SISO) and single-input multiple-output (SIMO) models in the rectangular tunnel, through which higher channel capacity can be obtained, were put forward as well. Theoretical analysis and simulation comparison show that the channel capacity of the wireless communication system in the rectangular tunnel can be effectively enhanced through cooperative technology; the channel capacity of the rectangular tunnel under complicated conditions is maximized through the proposed power allocation methods, and the cooperative mode with the optimal channel capacity can be chosen according to the mode selection methods given in the paper.
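A numeric sketch of the AF power-allocation idea is given below using the textbook amplify-and-forward end-to-end SNR; the channel gains stand in for the tunnel's multimode channel, and all values are illustrative assumptions rather than the paper's derivation.

```python
# Hedged sketch: split total power P between source and relay to maximize AF capacity.
import numpy as np

def af_capacity(ps, pr, g_sd=0.6, g_sr=1.2, g_rd=0.9, noise=1.0):
    snr_direct = ps * g_sd / noise
    snr_sr, snr_rd = ps * g_sr / noise, pr * g_rd / noise
    snr_relay = snr_sr * snr_rd / (snr_sr + snr_rd + 1)  # standard AF end-to-end SNR
    return 0.5 * np.log2(1 + snr_direct + snr_relay)     # factor 1/2: two time slots

P = 10.0
splits = np.linspace(0.01, 0.99, 99)                     # source's share of total power
caps = [af_capacity(a * P, (1 - a) * P) for a in splits]
best = splits[int(np.argmax(caps))]
print(f"best source share = {best:.2f}, capacity = {max(caps):.3f} bit/s/Hz")
```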
Keywords: Rectangular tunnel; Multimode channel model; Channel capacity; Cooperative communication; Power allocation; Mode selection
20. An aligned mixture probabilistic principal component analysis for fault detection of multimode chemical processes (Cited by 5)
Authors: Yang Yawei, Ma Yuxin, Song Bing, Shi Hongbo. Chinese Journal of Chemical Engineering (SCIE, EI, CAS, CSCD), 2015, Issue 8, pp. 1357-1363.
A novel approach named aligned mixture probabilistic principal component analysis (AMPPCA) is proposed in this study for fault detection of multimode chemical processes. In order to exploit within-mode correlations, the AMPPCA algorithm first estimates a statistical description for each operating mode by applying mixture probabilistic principal component analysis (MPPCA). As a comparison, the combined MPPCA is employed, where monitoring results are softly integrated according to posterior probabilities of the test sample in each local model. For exploiting the cross-mode correlations, which may be useful but are inadvertently neglected due to separately held monitoring approaches, a global monitoring model is constructed by aligning all local models together. In this way, both within-mode and cross-mode correlations are preserved in this integrated space. Finally, the utility and feasibility of AMPPCA are demonstrated through a non-isothermal continuous stirred tank reactor and the TE benchmark process.
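A rough proxy for the mode-wise modeling (not AMPPCA itself, which uses mixture PPCA and aligns local models into a global space) can be built from a Gaussian mixture plus per-mode PCA, flagging samples by reconstruction error; the data below is synthetic.

```python
# Hedged proxy sketch: mode assignment via GMM, per-mode PCA, SPE-style monitoring.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 5)), rng.normal(5, 1, (200, 5))])  # two modes

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
modes = gmm.predict(X)
pcas = {m: PCA(n_components=2).fit(X[modes == m]) for m in (0, 1)}

def spe(x: np.ndarray) -> float:
    m = int(gmm.predict(x.reshape(1, -1))[0])  # assign the sample to its likeliest mode
    recon = pcas[m].inverse_transform(pcas[m].transform(x.reshape(1, -1)))
    return float(((x - recon.ravel()) ** 2).sum())  # squared prediction error (SPE)

# An in-mode sample scores low; a between-mode sample scores high and gets flagged.
print(spe(X[0]), spe(np.full(5, 2.5)))
```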
Keywords: Multimode process monitoring; Mixture probabilistic principal component analysis; Model alignment; Fault detection