With the emergence of general foundational models, such as Chat Generative Pre-trained Transformer (ChatGPT), researchers have shown considerable interest in the potential applications of foundation models in the process industry. This paper provides a comprehensive overview of the challenges and opportunities presented by the use of foundation models in the process industry, including the frameworks, core applications, and future prospects. First, this paper proposes a framework for foundation models for the process industry. Second, it summarizes the key capabilities of industrial foundation models and their practical applications. Finally, it highlights future research directions and identifies unresolved open issues related to the use of foundation models in the process industry.
Predictive maintenance often involves imbalanced multivariate time series datasets with scarce failure events, posing challenges for model training due to the high dimensionality of the data and the need for domain-specific preprocessing, which frequently leads to the development of large and complex models. Inspired by the success of Large Language Models (LLMs), transformer-based foundation models have been developed for time series (TSFM). These models have been shown to reconstruct time series in a zero-shot manner, capturing different patterns that effectively characterize time series. This paper proposes the use of a TSFM to generate embeddings of the input data space, making them more interpretable for machine learning models. To evaluate the effectiveness of our approach, we trained three classical machine learning algorithms and one neural network on the embeddings generated by the TSFM called Moment to predict the remaining useful life of aircraft engines. We test the models trained with both the full training dataset and only 10% of the training samples. Our results show that training simple models, such as support vector regressors or neural networks, with embeddings generated by Moment not only accelerates the training process but also enhances performance in few-shot learning scenarios, where data is scarce. This suggests a promising alternative to complex deep learning architectures, particularly in industrial contexts with limited labeled data.
Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we proposed a workflow based on foundation models and zero training to solve the tasks of photoacoustic (PA) image processing. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we have concluded that FMs can be directly applied in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating the mastery of the technique through the provision of code and sample datasets.
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
For future sixth-generation (6G) mobile communication, several communication models have been proposed to meet increasingly demanding tasks. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm called task-oriented semantic communication system with foundation models. First, we segment the image by using task prompts based on the segment anything model (SAM) and contrastive language-image pretraining (CLIP). Meanwhile, we adopt a Bézier curve to enhance the mask and improve the segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse different semantic information based on the conditional diffusion model to generate high-quality images that satisfy the users' specific task requirements. Finally, the experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Artificial intelligence (AI), particularly deep learning, has demonstrated remarkable performance in medical imaging across a variety of modalities, including X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), and pathological imaging. However, most existing state-of-the-art AI techniques are task-specific and focus on a limited range of imaging modalities. Compared to these task-specific models, emerging foundation models represent a significant milestone in AI development. These models can learn generalized representations of medical images and apply them to downstream tasks through zero-shot or few-shot fine-tuning. Foundation models have the potential to address the comprehensive and multifactorial challenges encountered in clinical practice. This article reviews the clinical applications of both task-specific and foundation models, highlighting their differences, complementarities, and clinical relevance. We also examine their future research directions and potential challenges. Unlike the replacement relationship seen between deep learning and traditional machine learning, task-specific and foundation models are complementary, despite inherent differences. While foundation models primarily focus on segmentation and classification, task-specific models are integrated into nearly all medical image analyses. However, with further advancements, foundation models could be applied to other clinical scenarios. In conclusion, all indications suggest that task-specific and foundation models, especially the latter, have the potential to drive breakthroughs in medical imaging, from image processing to clinical workflows.
Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI) designed to automate or augment decision processes. Modern IDM paradigms integrate advanced frameworks, such as AI agents and high-level reinforcement learning, to enable intelligent agents to make effective and adaptive choices and decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities (such as vision, language, and sensory data) into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, life sciences, and education. This survey examines IDM's evolution, advanced paradigms with FMs, and their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
1 Introduction: With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data and can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to their powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by the foundation models.
INTRODUCTION: In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, the high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 [1] has sparked extensive discussions. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO: The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data. There is an urgent need for advanced processing techniques to unleash their application value [1]. Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources [2]. Through developing remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography) [3]. However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. Those foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the resulting value of the forecast in downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast values into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai as a state-of-the-art foundation model, which offers robust and generalized results with few-shot parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art, classically prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in Average Daily Total Costs.
Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into plant cellular diversity by enabling high-resolution analyses of gene expression at the single-cell level. However, the complexity of scRNA-seq data, including challenges in batch integration, cell type annotation, and gene regulatory network (GRN) inference, demands advanced computational approaches. To address these challenges, we developed scPlantLLM, a Transformer model trained on millions of plant single-cell data points. Using a sequential pretraining strategy incorporating masked language modeling and cell type annotation tasks, scPlantLLM generates robust and interpretable single-cell data embeddings. When applied to Arabidopsis thaliana datasets, scPlantLLM excels in clustering, cell type annotation, and batch integration, achieving an accuracy of up to 0.91 in zero-shot learning scenarios. Furthermore, the model demonstrates an ability to identify biologically meaningful GRNs and subtle cellular subtypes, showcasing its potential to advance plant biology research. Compared to traditional methods, scPlantLLM performs better on key metrics such as adjusted Rand index (ARI), normalized mutual information (NMI), and silhouette score (SIL), highlighting its superior clustering accuracy and biological relevance. scPlantLLM represents a foundation model for exploring plant single-cell expression atlases, offering unprecedented capabilities to resolve cellular heterogeneity and regulatory dynamics across diverse plant systems. The code used in this study is available at https://github.com/compbioNJU/scPlantLLM.
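As an illustration of one clustering metric named in the scPlantLLM abstract, the adjusted Rand index can be computed from two label assignments with the standard pair-counting formula. The sketch below is a generic, standard-library implementation and is not taken from the scPlantLLM code; the function name and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two clusterings of the same items.

    Generic pair-counting formula; hypothetical helper, not scPlantLLM code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the clusterings trivially agree

    # Contingency counts: cell (i, j) = items in true cluster i and predicted cluster j.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings put everything together
    return (sum_cells - expected) / (max_index - expected)


# Toy example: the same partition under renamed cluster ids scores 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The score is invariant to how clusters are numbered, which is why it suits comparing predicted clusters against annotated cell types.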
Lightning is a significant natural hazard that poses considerable risks to both human safety and industrial operations. Accurate, fine-scale lightning forecasting is crucial for effective disaster prevention. Traditional forecasting methods primarily rely on numerical weather prediction (NWP), which demands substantial computational resources to solve complex atmospheric evolution equations. Recently, deep learning-based weather prediction models, particularly weather foundation models (WFMs), have demonstrated promising results, achieving performance comparable to NWP while requiring substantially fewer computational resources. However, existing WFMs are unable to directly generate lightning forecasts and struggle to satisfy the high spatial resolution required for fine-scale prediction. To address these limitations, this paper investigates a fine-scale lightning forecasting approach based on WFMs and proposes a dual-source data-driven forecasting framework that integrates the strengths of both WFMs and recent lightning observations to enhance predictive performance. Furthermore, a gated spatiotemporal fusion network (gSTFNet) is designed to address the challenges of cross-temporal and cross-modal fusion inherent in dual-source data integration. gSTFNet employs a dual-encoding structure to separately encode features from WFMs and lightning observations, effectively narrowing the modal gap in the latent feature space. A gated spatiotemporal fusion module is then introduced to model the spatiotemporal correlations between the two types of features, facilitating seamless cross-temporal fusion. The fused features are subsequently processed by a deconvolutional network to generate accurate lightning forecasts. We evaluate the proposed gSTFNet using real-world lightning observation data collected in Guangdong from 2018 to 2022. Experimental results demonstrate that: (1) in terms of the ETS score, the dual-source framework achieves a 50% improvement over models trained solely on WFMs, and a 300% improvement over the HRES lightning forecasting product released by the European Centre for Medium-Range Weather Forecasts (ECMWF); (2) gSTFNet outperforms several state-of-the-art deep learning baselines that utilize dual-source inputs, clearly demonstrating superior forecasting accuracy.
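The ETS (equitable threat score) cited in the lightning-forecasting abstract is a standard verification metric for binary event forecasts. A minimal sketch of the usual contingency-table definition follows; it assumes forecasts and observations have already been thresholded to 0/1 per grid cell, and the function name and toy grids are illustrative rather than drawn from the paper.

```python
def equitable_threat_score(forecast, observed):
    """ETS for binary forecasts: hits are corrected for those expected by chance.

    `forecast` and `observed` are equal-length sequences of 0/1 flags
    (hypothetical inputs, e.g. flattened lightning / no-lightning grids).
    """
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    total = len(forecast)
    if total == 0:
        return 0.0

    hits = sum(1 for f, o in zip(forecast, observed) if f and o)
    false_alarms = sum(1 for f, o in zip(forecast, observed) if f and not o)
    misses = sum(1 for f, o in zip(forecast, observed) if not f and o)

    # Hits a random forecast with the same event frequency would score.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return 0.0  # no events forecast or observed: the score is undefined, report 0
    return (hits - hits_random) / denominator


# Toy grid of 10 cells: one hit, one false alarm, one miss.
forecast = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
observed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(round(equitable_threat_score(forecast, observed), 4))
```

A perfect forecast scores 1 and a forecast no better than chance scores about 0, so relative improvements in ETS are easier to interpret than raw hit counts for rare events such as lightning.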
Although plant disease recognition is highly important in agricultural production, traditional methods face challenges due to the high costs associated with data collection and the scarcity of samples. Few-shot plant disease identification tasks, which are based on transfer learning, can learn feature representations from a small amount of data; however, most of these methods require pretraining within the relevant domain. Recently, foundation models have demonstrated excellent performance in zero-shot and few-shot learning scenarios. In this study, we explore the potential of foundation models in plant disease recognition by proposing an efficient few-shot plant disease recognition model (PlantCaFo) based on foundation models. This model operates on an end-to-end network structure, integrating prior knowledge from multiple pretraining models. Specifically, we design a lightweight dilated contextual adapter (DCon-Adapter) to learn new knowledge from training data and use a weight decomposition matrix (WDM) to update the text weights. We test the proposed model on a public dataset, PlantVillage, and show that the model achieves an accuracy of 93.53% in a "38-way 16-shot" setting. In addition, we conduct experiments on images collected from natural environments (Cassava dataset), achieving an accuracy improvement of 6.80% over the baseline. To validate the model's generalization performance, we prepare an out-of-distribution dataset with 21 categories, and our model notably increases the accuracy on this dataset. Extensive experiments demonstrate that our model exhibits superior performance over other models in few-shot plant disease identification.
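To make the "38-way 16-shot" setting in the PlantCaFo abstract concrete: an N-way K-shot training set contains N classes with K labeled examples each. The sketch below draws such a support set from a labeled pool; it is a generic illustration, not the authors' data pipeline, and the function name, file names, and class names are hypothetical.

```python
import random
from collections import defaultdict


def sample_support_set(samples, n_way, k_shot, seed=0):
    """Draw K examples for each of N classes from (item, label) pairs.

    Hypothetical helper illustrating an N-way K-shot split.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    # Only classes with at least K examples can contribute a full shot set.
    eligible = sorted(label for label, items in by_label.items() if len(items) >= k_shot)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot examples")

    chosen = rng.sample(eligible, n_way)
    return {label: rng.sample(by_label[label], k_shot) for label in chosen}


# Toy pool: 38 made-up classes with 20 placeholder image names each.
pool = [(f"img_{c}_{i}.jpg", f"class_{c:02d}") for c in range(38) for i in range(20)]
support = sample_support_set(pool, n_way=38, k_shot=16)
print(len(support), sum(len(v) for v in support.values()))
```

With 38 classes and 16 shots the support set holds 608 labeled images in total, which conveys how little supervision the few-shot setting provides.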
Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.
Optical multilayer thin film structures have been widely used in numerous photonic applications. However, existing inverse design methods have many drawbacks because they either fail to quickly adapt to different design targets or are difficult to adapt to different types of structures, e.g., designing for different materials at each layer. These methods also cannot accommodate versatile design situations under different angles and polarizations. In addition, how to benefit practical fabrication and manufacturing has not yet been extensively considered. In this work, we introduce OptoGPT (Opto Generative Pretrained Transformer), a decoder-only transformer, to address all of these drawbacks and issues simultaneously.
Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, there has been a growth in the adoption of transfer learning approaches in biological fields, where models have been pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike in natural language, where humans are best suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of being in a setting where there are a plethora of unknowns while at the same time needing to abide by real-world physical constraints. This perspective provides a discussion of some key points we should consider as a field in designing benchmarks for foundation models in network biology.
Transformer-based foundation models such as ChatGPT have revolutionized our daily life and affected many fields, including bioinformatics. In this perspective, we first discuss the direct application of textual foundation models to bioinformatics tasks, focusing on how to make the most of canonical large language models and mitigate their inherent flaws. Meanwhile, we go through the transformer-based, bioinformatics-tailored foundation models for both sequence and non-sequence data. In particular, we envision further development directions as well as challenges for bioinformatics foundation models.
This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundational models integrated with a Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external vector database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study investigates optimized prompt templates for efficient extraction of scientific literature. Furthermore, we present a targeted scaling study to quantify the algorithmic performance of the framework as the number of prompt tokens increases. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future improvements. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.
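The retrieval step of a RAG pipeline like the one described in the combustion abstract can be sketched as: split sources into chunks, score each chunk against the query, and place the best matches into the prompt. The toy version below substitutes bag-of-words cosine similarity for a learned embedding model and vector database, so it illustrates the idea only; all function names, the prompt wording, and the sample chunks are assumptions, not the study's framework.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Assemble a context-grounded prompt (hypothetical template)."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Made-up text chunks standing in for segmented source documents.
chunks = [
    "Laminar flame speed of methane-air mixtures peaks near stoichiometric conditions.",
    "Soot formation increases in fuel-rich diffusion flames.",
    "The library catalogue is closed on Sundays.",
]
print(build_prompt("What affects laminar flame speed of methane?", chunks, k=1))
```

Restricting the prompt to retrieved context is one way such a pipeline can limit unsupported answers, since the model is asked to respond from the supplied passages rather than from memory alone.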
Funding (overview of foundation models in the process industry): Supported by the National Natural Science Foundation of China (62225302, 623B2014, and 62173023).
Funding (time series foundation model embeddings for predictive maintenance): Funded by the Spanish Government and FEDER funds (AEI/FEDER, UE) under grant PID2021-124502OB-C42 (PRESECREL) and by the predoctoral program "Concepción Arenal del Programa de Personal Investigador en formación Predoctoral", funded by Universidad de Cantabria and Cantabria's Government (BOC 18-10-2021).
Funding: Support from the Strategic Project of Precision Surgery, Tsinghua University; the Initiative Scientific Research Program, Institute for Intelligent Healthcare, Tsinghua University; the Tsinghua-Foshan Institute of Advanced Manufacturing; the National Natural Science Foundation of China (61735016); the Beijing Nova Program (20230484308); the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); the Youth Elite Program of Beijing Friendship Hospital (YYQCJH2022-9); and the Science and Technology Program of Beijing Tongzhou District (KJ2023CX012).
Abstract: Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we proposed a workflow based on foundation models and zero training to solve the tasks of photoacoustic (PA) image processing. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we have concluded that FMs can be directly applied in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating the mastery of the technique through the provision of code and sample datasets.
Funding: We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLM) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLM) to Large Multimodal Models (LMM). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62001246, 62231017, 62201277, and 62071255; the Natural Science Foundation of Jiangsu Province under Grant BK20220390; the Key R and D Program of Jiangsu Province Key project and topics under Grants BE2021095 and BE2023035; the Natural Science Research Startup Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications (Grant No. NY221011); the National Science Foundation of Xiamen, China (No. 3502Z202372013); and the Open Project of the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology (Xiamen University) of the Ministry of Education, China (No. UAC202304).
Abstract: In the development of sixth-generation (6G) mobile communication, several communication models have been proposed to face the growing challenges of the task. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm called task-oriented semantic communication system with foundation models. First, we segment the image by using task prompts based on the segment anything model (SAM) and contrastive language-image pretraining (CLIP). Meanwhile, we adopt a Bezier curve to enhance the mask and improve the segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse different semantic information based on the conditional diffusion model to generate high-quality images that satisfy the users' specific task requirements. Finally, the experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.
Funding: Supported by the Guangdong Provincial Science and Technology Program (Grant No. 2023A0505030003).
Abstract: Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Funding: Supported by grants from the Beijing Hospitals Authority’s Ascent Plan (No. DFL20220303) and the Beijing Municipal Science & Technology Commission (No. Z221100003522008).
Abstract: Artificial intelligence (AI), particularly deep learning, has demonstrated remarkable performance in medical imaging across a variety of modalities, including X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), and pathological imaging. However, most existing state-of-the-art AI techniques are task-specific and focus on a limited range of imaging modalities. Compared to these task-specific models, emerging foundation models represent a significant milestone in AI development. These models can learn generalized representations of medical images and apply them to downstream tasks through zero-shot or few-shot fine-tuning. Foundation models have the potential to address the comprehensive and multifactorial challenges encountered in clinical practice. This article reviews the clinical applications of both task-specific and foundation models, highlighting their differences, complementarities, and clinical relevance. We also examine their future research directions and potential challenges. Unlike the replacement relationship seen between deep learning and traditional machine learning, task-specific and foundation models are complementary, despite inherent differences. While foundation models primarily focus on segmentation and classification, task-specific models are integrated into nearly all medical image analyses. However, with further advancements, foundation models could be applied to other clinical scenarios. In conclusion, all indications suggest that task-specific and foundation models, especially the latter, have the potential to drive breakthroughs in medical imaging, from image processing to clinical workflows.
Funding: Supported by the National Natural Science Foundation of China under grant nos. 62372470, 72225011, 62402414, U23B2059, 62173034, 32222070, 62402017, 72421002, 62206303, 62476264, 62406312, 62102266, 52173241, and U23A20468; the National Key Research and Development Program of China (2023YFD1900604); the Strategic Priority Research Program of the Chinese Academy of Science (XDB0680301); the Youth Innovation Promotion Association CAS (2023112); the National High Level Hospital Clinical Research funding (2022-PUMCH-A-014); the Beijing Natural Science Foundation (4244098); the Science and Technology Innovation Program of Hunan Province (2023RC3009); the Key Research and Development Program of Yunnan Province (202202AE090034); the MNR Key Laboratory for Geo-Environmental Monitoring of Greater Bay Area (GEMLab-2023001); the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2024TIAD-STX0024); the China National Postdoctoral Program for Innovative Talents (BX20240385); and the River Talent Recruitment Program of Guangdong Province (2019ZT08X603).
Abstract: Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI) designed to automate or augment decision processes. Modern IDM paradigms integrate advanced frameworks, such as AI agents and high-level reinforcement learning, to enable intelligent agents to make effective and adaptive choices and decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities (such as vision, language, and sensory data) into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, life sciences, and education. This survey examines IDM’s evolution, advanced paradigms with FMs, and their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Abstract: 1 Introduction. With rapid development in computing power and breakthroughs in deep learning, the concept of “foundation models” has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to these powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
Funding: Supported by the National Natural Science Foundation of China under grant nos. 62206266 and 62372430 and the Youth Innovation Promotion Association CAS (no. 2023112).
Abstract: INTRODUCTION. In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, the high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 [1] has sparked extensive discussions. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
Funding: Supported by the National Natural Science Foundation of China under grants 42030102 and 42371321 and by the Ant Group.
Abstract: DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO. The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data. There is an urgent need for advanced processing techniques to unleash their application value [1]. Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources [2]. Through developing remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography) [3]. However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
Funding: Funded by the Helmholtz Association’s Initiative and Networking Fund through Helmholtz AI; the Helmholtz Association under the Program “Energy System Design”; and the German Research Foundation (DFG) as part of the Research Training Group 2153 “Energy Status Data: Informatics Methods for its Collection, Analysis and Exploitation”. Supported by the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT partition, with support by the KIT-Publication Fund of the Karlsruhe Institute of Technology.
Abstract: Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. Those foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the resulting value of the forecast in downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast values into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai as a state-of-the-art foundation model, which offers robust and generalized results with few-shot parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art classical prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in Average Daily Total Costs.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 32070656 to Dijun Chen) and the National Key R&D Program of China (Grant No. 2021YFE0112000 to He Zhang).
Abstract: Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into plant cellular diversity by enabling high-resolution analyses of gene expression at the single-cell level. However, the complexity of scRNA-seq data, including challenges in batch integration, cell type annotation, and gene regulatory network (GRN) inference, demands advanced computational approaches. To address these challenges, we developed scPlantLLM, a Transformer model trained on millions of plant single-cell data points. Using a sequential pretraining strategy incorporating masked language modeling and cell type annotation tasks, scPlantLLM generates robust and interpretable single-cell data embeddings. When applied to Arabidopsis thaliana datasets, scPlantLLM excels in clustering, cell type annotation, and batch integration, achieving an accuracy of up to 0.91 in zero-shot learning scenarios. Furthermore, the model demonstrates an ability to identify biologically meaningful GRNs and subtle cellular subtypes, showcasing its potential to advance plant biology research. Compared to traditional methods, scPlantLLM performs better on key metrics such as adjusted Rand index (ARI), normalized mutual information (NMI), and silhouette score (SIL), highlighting its superior clustering accuracy and biological relevance. scPlantLLM represents a foundation model for exploring plant single-cell expression atlases, offering unprecedented capabilities to resolve cellular heterogeneity and regulatory dynamics across diverse plant systems. The code used in this study is available at https://github.com/compbioNJU/scPlantLLM.
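For readers unfamiliar with the adjusted Rand index (ARI) named in this abstract, the following is a minimal sketch of the standard pair-counting definition. It is not taken from the scPlantLLM code; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings (illustrative helper, not the paper's code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs that fall in the same cluster under both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that fall in the same cluster under each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster IDs are arbitrary, so a pure relabeling still scores 1.0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The score is 1.0 for identical partitions, close to 0 for chance-level agreement, and can be negative when agreement is worse than chance.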
Funding: Supported by the Open Grants of the Key Laboratory of Lightning, China Meteorological Administration (Grant Nos. 2023KELL-B002 and 2024KELL-A001); the National Natural Science Foundation of China (Grant Nos. 62306028, 42075088 and U2342215); and the Science and Technology Program of Shenzhen, China (Grant No. KJZD20240903102742055).
Abstract: Lightning is a significant natural hazard that poses considerable risks to both human safety and industrial operations. Accurate, fine-scale lightning forecasting is crucial for effective disaster prevention. Traditional forecasting methods primarily rely on numerical weather prediction (NWP), which demands substantial computational resources to solve complex atmospheric evolution equations. Recently, deep learning-based weather prediction models, particularly weather foundation models (WFMs), have demonstrated promising results, achieving performance comparable to NWP while requiring substantially fewer computational resources. However, existing WFMs are unable to directly generate lightning forecasts and struggle to satisfy the high spatial resolution required for fine-scale prediction. To address these limitations, this paper investigates a fine-scale lightning forecasting approach based on WFMs and proposes a dual-source data-driven forecasting framework that integrates the strengths of both WFMs and recent lightning observations to enhance predictive performance. Furthermore, a gated spatiotemporal fusion network (gSTFNet) is designed to address the challenges of cross-temporal and cross-modal fusion inherent in dual-source data integration. gSTFNet employs a dual-encoding structure to separately encode features from WFMs and lightning observations, effectively narrowing the modal gap in the latent feature space. A gated spatiotemporal fusion module is then introduced to model the spatiotemporal correlations between the two types of features, facilitating seamless cross-temporal fusion. The fused features are subsequently processed by a deconvolutional network to generate accurate lightning forecasts. We evaluate the proposed gSTFNet using real-world lightning observation data collected in Guangdong from 2018 to 2022. Experimental results demonstrate that: (1) in terms of the ETS score, the dual-source framework achieves a 50% improvement over models trained solely on WFMs, and a 300% improvement over the HRES lightning forecasting product released by the European Centre for Medium-Range Weather Forecasts (ECMWF); (2) gSTFNet outperforms several state-of-the-art deep learning baselines that utilize dual-source inputs, clearly demonstrating superior forecasting accuracy.
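The ETS (equitable threat score) used in this abstract is a standard categorical verification metric for yes/no event forecasts. A minimal sketch of how it is usually computed from a 2x2 contingency table follows; the function names and the toy grids are illustrative assumptions, and the abstract does not state the paper's thresholds or evaluation grid.

```python
def contingency_counts(forecast, observed):
    """Count hits, misses, false alarms and correct negatives for binary sequences."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        if f and o:
            hits += 1
        elif o:
            misses += 1          # event observed but not forecast
        elif f:
            false_alarms += 1    # event forecast but not observed
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS = (hits - hits_random) / (hits + misses + false_alarms - hits_random)."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        raise ValueError("empty contingency table")
    # Number of hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return 0.0  # no skill can be measured (e.g. no events forecast or observed)
    return (hits - hits_random) / denominator


# Toy example on a flattened binary grid (1 = lightning, 0 = no lightning).
forecast_grid = [1, 1, 0, 0, 1, 0, 0, 0]
observed_grid = [1, 0, 0, 0, 1, 1, 0, 0]
ets = equitable_threat_score(*contingency_counts(forecast_grid, observed_grid))
```

Because ETS discounts hits expected by chance, a perfect forecast scores 1 and a forecast with no skill scores about 0, which makes relative improvements such as those quoted above comparable across event frequencies.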
Funding: Supported by the National Key Research and Development Program of China (2023YFF1000100).
Abstract: Although plant disease recognition is highly important in agricultural production, traditional methods face challenges due to the high costs associated with data collection and the scarcity of samples. Few-shot plant disease identification tasks, which are based on transfer learning, can learn feature representations from a small amount of data; however, most of these methods require pretraining within the relevant domain. Recently, foundation models have demonstrated excellent performance in zero-shot and few-shot learning scenarios. In this study, we explore the potential of foundation models in plant disease recognition by proposing an efficient few-shot plant disease recognition model (PlantCaFo) based on foundation models. This model operates on an end-to-end network structure, integrating prior knowledge from multiple pretraining models. Specifically, we design a lightweight dilated contextual adapter (DCon-Adapter) to learn new knowledge from training data and use a weight decomposition matrix (WDM) to update the text weights. We test the proposed model on a public dataset, PlantVillage, and show that the model achieves an accuracy of 93.53% in a "38-way 16-shot" setting. In addition, we conduct experiments on images collected from natural environments (Cassava dataset), achieving an accuracy improvement of 6.80% over the baseline. To validate the model's generalization performance, we prepare an out-of-distribution dataset with 21 categories, and our model notably increases the accuracy on this dataset. Extensive experiments demonstrate that our model exhibits superior performance over other models in few-shot plant disease identification.
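The "38-way 16-shot" setting means that the model sees 16 labelled training images for each of 38 classes, i.e. 38 × 16 = 608 support images in total. A minimal sketch of how such a support set can be drawn from a labelled dataset is shown below; `sample_support_set` and the toy label list are hypothetical and do not come from the PlantCaFo code.

```python
import random
from collections import defaultdict


def sample_support_set(labels, n_way, k_shot, seed=None):
    """Pick n_way classes and k_shot example indices per class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group example indices by class label.
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    # Only classes with at least k_shot examples can take part.
    eligible = sorted(c for c, idxs in indices_by_class.items() if len(idxs) >= k_shot)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot examples")

    chosen_classes = rng.sample(eligible, n_way)
    return {c: rng.sample(indices_by_class[c], k_shot) for c in chosen_classes}


# Toy dataset: 38 classes with 20 examples each, labelled 0..37.
toy_labels = [i % 38 for i in range(38 * 20)]
support = sample_support_set(toy_labels, n_way=38, k_shot=16, seed=0)
support_size = sum(len(idxs) for idxs in support.values())
```

Every example not selected for the support set remains available for evaluation, which is how few-shot accuracy figures such as the one above are typically measured.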
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U21A20469 and 61936002), the National Key R&D Program of China (Grant No. 2020YFB2104100), and grants from the Institute Guo Qiang, THUIBCS, and BLBCI.
Abstract: Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.
Funding: Supported by the National Science Foundation (PFI-008513 and FET-2309403).
Abstract: Optical multilayer thin film structures have been widely used in numerous photonic applications. However, existing inverse design methods have many drawbacks because they either fail to quickly adapt to different design targets or are difficult to adapt to different types of structures, e.g., designing for different materials at each layer. These methods also cannot accommodate versatile design situations under different angles and polarizations. In addition, how to benefit practical fabrication and manufacturing has not been extensively considered yet. In this work, we introduce OptoGPT (Opto Generative Pretrained Transformer), a decoder-only transformer, to solve all these drawbacks and issues simultaneously.
Abstract: Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, there has been a growth in the adoption of transfer learning approaches in biological fields, where models have been pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike in natural language, where humans are best suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of being in a setting where there are a plethora of unknowns while at the same time needing to abide by real-world physical constraints. This perspective provides a discussion of some key points we should consider as a field in designing benchmarks for foundation models in network biology.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2022ZD0115004.
Abstract: Transformer-based foundation models such as ChatGPT have revolutionized our daily life and affected many fields including bioinformatics. In this perspective, we first discuss the direct application of textual foundation models to bioinformatics tasks, focusing on how to make the most out of canonical large language models and mitigate their inherent flaws. Meanwhile, we go through the transformer-based, bioinformatics-tailored foundation models for both sequence and non-sequence data. In particular, we envision the further development directions as well as challenges for bioinformatics foundation models.
Funding: Support from the Defense Threat Reduction Agency (DTRA) under Grant No. HDTRA12110012, with Dr. Richard Fry as the Program Officer, and partial project support from the Air Force Office of Scientific Research (AFOSR) under Grant No. FA9550-24-1-0017, with Dr. Chiping Li as the Program Officer.
Abstract: This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundational models integrated with a Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external vector database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study investigates optimized prompt templates for efficient extraction of scientific literature. Furthermore, we present a targeted scaling study to quantify the algorithmic performance of the framework as the number of prompt tokens increases. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future improvements. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.
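Text segmentation, which this abstract examines, is the step in a RAG pipeline that splits source documents into passages before they are embedded and stored in the vector database. The sketch below shows one common strategy, fixed-size word windows with overlap. It is an assumption for illustration only: the abstract does not say which strategies the study compares, and `chunk_words` and its default sizes are hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping word windows (hypothetical helper, sizes are assumptions)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Toy example: ten "words", windows of four words that share two words with their neighbour.
toy_text = " ".join(str(i) for i in range(10))
toy_chunks = chunk_words(toy_text, chunk_size=4, overlap=2)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice; smaller chunks give more precise retrieval but less context per retrieved passage.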