Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
Predictive maintenance often involves imbalanced multivariate time series datasets with scarce failure events, posing challenges for model training due to the high dimensionality of the data and the need for domain-specific preprocessing, which frequently leads to the development of large and complex models. Inspired by the success of Large Language Models (LLMs), transformer-based foundation models have been developed for time series (TSFM). These models have been proven to reconstruct time series in a zero-shot manner, being able to capture different patterns that effectively characterize time series. This paper proposes the use of TSFMs to generate embeddings of the input data space, making them more interpretable for machine learning models. To evaluate the effectiveness of our approach, we trained three classical machine learning algorithms and one neural network using the embeddings generated by the TSFM called Moment for predicting the remaining useful life of aircraft engines. We test the models trained with both the full training dataset and only 10% of the training samples. Our results show that training simple models, such as support vector regressors or neural networks, with embeddings generated by Moment not only accelerates the training process but also enhances performance in few-shot learning scenarios, where data is scarce. This suggests a promising alternative to complex deep learning architectures, particularly in industrial contexts with limited labeled data.
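The pipeline described above, a frozen foundation-model encoder producing embeddings that feed a small downstream regressor, can be sketched in miniature. Everything below is illustrative: `tsfm_embed` is a stand-in for a real encoder such as Moment (whose actual API is not shown here), the linear head replaces the paper's SVR/neural-network heads, and the degradation data are synthetic.

```python
import math
import random

def tsfm_embed(window):
    """Stand-in for a frozen TSFM encoder: maps a raw time-series window
    to a small fixed-size embedding. Here we use simple summary statistics;
    a real TSFM would return learned features."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    slope = (window[-1] - window[0]) / (n - 1)
    return [mean, math.sqrt(var), slope]

def fit_linear(X, y, lr=0.01, epochs=500):
    """Train a tiny linear head on the frozen embeddings (the encoder itself
    is never updated)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = pred - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

# Toy RUL-style data: degradation ramps whose steepness encodes remaining life.
random.seed(0)
slopes = [0.1, 0.2, 0.3, 0.4]
windows = [[i * s + random.gauss(0, 0.01) for i in range(10)] for s in slopes]
rul = [10.0, 5.0, 3.3, 2.5]  # steeper degradation -> shorter remaining life

X = [tsfm_embed(w) for w in windows]
w, b = fit_linear(X, rul)
preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
```

The point of the sketch is the division of labor: all representation learning sits in the (frozen) embedding step, so the trainable model can stay very small, which is what makes the few-shot regime workable.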
Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we proposed a workflow based on foundation models and zero training to solve the tasks of photoacoustic (PA) image processing. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we have concluded that FMs can be directly applied in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating the mastery of the technique through the provision of code and sample datasets.
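The core trick in step (1), combining a segmentation model's output with prior knowledge of the imaged anatomy, can be illustrated with a toy 1-D depth profile. This is a conceptual sketch only: `model_mask` stands in for a SAM-produced mask, and the "prior knowledge" is simply that skin is the signal band nearest the surface.

```python
# Toy 1-D depth profile: combine a model-produced binary mask with the
# anatomical prior that skin is the shallowest connected signal band,
# so the skin contribution can be zeroed out before rendering.

def connected_runs(mask):
    """Return (start, end) index pairs of consecutive 1-runs in a binary mask."""
    runs, start = [], None
    for i, v in enumerate(mask + [0]):  # sentinel 0 closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    return runs

def remove_skin(signal, model_mask):
    """Zero out the shallowest segmented band (prior: skin is nearest the surface)."""
    runs = connected_runs(model_mask)
    if not runs:
        return signal[:]
    skin_start, skin_end = runs[0]  # first band along depth = skin
    return [0.0 if skin_start <= i < skin_end else x
            for i, x in enumerate(signal)]

signal     = [0.9, 0.8, 0.0, 0.0, 0.5, 0.6, 0.0]  # skin band, gap, vessel band
model_mask = [1,   1,   0,   0,   1,   1,   0]
cleaned = remove_skin(signal, model_mask)
```

In the actual workflow the mask comes from SAM prompts and the data are 3-D volumes, but the pattern is the same: the model supplies candidate regions, and domain priors decide what each region means.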
With the emergence of general foundational models, such as Chat Generative Pre-trained Transformer (ChatGPT), researchers have shown considerable interest in the potential applications of foundation models in the process industry. This paper provides a comprehensive overview of the challenges and opportunities presented by the use of foundation models in the process industry, including the frameworks, core applications, and future prospects. First, this paper proposes a framework for foundation models for the process industry. Second, it summarizes the key capabilities of industrial foundation models and their practical applications. Finally, it highlights future research directions and identifies unresolved open issues related to the use of foundation models in the process industry.
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities like in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content, paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
In the future development of sixth generation (6G) mobile communication, several communication models have been proposed to address the growing challenges of increasingly complex tasks. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm: a task-oriented semantic communication system with foundation models. First, we segment the image using task prompts based on the Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP). Meanwhile, we adopt Bezier curves to refine the mask and improve segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse the different semantic information based on a conditional diffusion model to generate high-quality images that satisfy users' specific task requirements. Finally, experimental results show that the proposed system compresses semantic information effectively and improves the robustness of semantic communication.
Do we need a foundation model (FM) for spatial transcriptomic analysis? To answer this question, we prepared this perspective as a primer. We first review the current progress of developing FMs for modeling spatial transcriptomic data and then discuss possible tasks that can be addressed by FMs. Finally, we explore future directions of developing such models for understanding spatial transcriptomics by describing both opportunities and challenges. In particular, we expect that a successful FM should boost research productivity, increase novel biological discoveries, and provide user-friendly access.
Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI) designed to automate or augment decision processes. Modern IDM paradigms, such as AI agents and high-level reinforcement learning, integrate advanced frameworks that enable intelligent agents to make effective and adaptive choices and to decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities, such as vision, language, and sensory data, into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, life sciences, and education. This survey examines IDM's evolution, advanced paradigms with FMs, and their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
1 Introduction With the rapid development of computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to these powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
INTRODUCTION In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 has sparked extensive discussions. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data. There is an urgent need for advanced processing techniques to unleash their application value. Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources. Through developing remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography). However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. These foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the value of the forecast in downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast value into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai as a state-of-the-art foundation model, which offers robust and generalized results with few-shot, parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art, classical prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in Average Daily Total Costs.
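The distinction between prediction-focused and decision-focused evaluation that motivates this work can be made concrete with a toy dispatch problem (hypothetical numbers, not the paper's dispatchable feeder setup): when shortfalls are penalized asymmetrically, the forecast with the lower error can still produce the higher downstream cost.

```python
# Toy illustration: the forecast with the *lower* MSE incurs the *higher*
# decision cost, because the downstream dispatch penalizes shortfalls heavily.
# Prices and demands are invented for the example.

def mse(forecast, actual):
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)

def dispatch_cost(forecast, actual, buy=1.0, penalty=10.0):
    """Buy `forecast` units ahead of time; cover any shortfall at `penalty`."""
    cost = 0.0
    for f, a in zip(forecast, actual):
        cost += buy * f + penalty * max(a - f, 0.0)
    return cost

actual = [5.0, 5.0, 5.0]
under  = [4.0, 4.0, 4.0]   # small MSE, but always short -> heavy penalties
over   = [6.5, 6.5, 6.5]   # larger MSE, but never short

assert mse(under, actual) < mse(over, actual)
assert dispatch_cost(over, actual) < dispatch_cost(under, actual)
```

Decision-focused fine-tuning trains the forecaster against a cost of this kind (here `dispatch_cost`) rather than against `mse`, which is exactly why it can improve the realized cost even if raw forecast quality does not improve.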
With the rapid development of artificial intelligence, computational pathology has been seamlessly integrated into the entire clinical workflow, which encompasses diagnosis, treatment, prognosis, and biomarker discovery. This integration has significantly enhanced clinical accuracy and efficiency while reducing the workload for clinicians. Traditionally, research in this field has depended on the collection and labeling of large datasets for specific tasks, followed by the development of task-specific computational pathology models. However, this approach is labor intensive and does not scale efficiently for open-set identification or rare diseases. Given the diversity of clinical tasks, training individual models from scratch to address the whole spectrum of clinical tasks in the pathology workflow is impractical, which highlights the urgent need to transition from task-specific models to foundation models (FMs). In recent years, pathological FMs have proliferated. These FMs can be classified into three categories, namely, pathology image FMs, pathology image-text FMs, and pathology image-gene FMs, each of which results in distinct functionalities and application scenarios. This review provides an overview of the latest research advancements in pathological FMs, with a particular emphasis on their applications in oncology. The key challenges and opportunities presented by pathological FMs in precision oncology are also explored.
Although plant disease recognition is highly important in agricultural production, traditional methods face challenges due to the high costs associated with data collection and the scarcity of samples. Few-shot plant disease identification tasks, which are based on transfer learning, can learn feature representations from a small amount of data; however, most of these methods require pretraining within the relevant domain. Recently, foundation models have demonstrated excellent performance in zero-shot and few-shot learning scenarios. In this study, we explore the potential of foundation models in plant disease recognition by proposing an efficient few-shot plant disease recognition model (PlantCaFo) based on foundation models. This model operates on an end-to-end network structure, integrating prior knowledge from multiple pretraining models. Specifically, we design a lightweight dilated contextual adapter (DCon-Adapter) to learn new knowledge from training data and use a weight decomposition matrix (WDM) to update the text weights. We test the proposed model on a public dataset, PlantVillage, and show that the model achieves an accuracy of 93.53% in a "38-way 16-shot" setting. In addition, we conduct experiments on images collected from natural environments (the Cassava dataset), achieving an accuracy improvement of 6.80% over the baseline. To validate the model's generalization performance, we prepare an out-of-distribution dataset with 21 categories, and our model notably increases the accuracy on this dataset. Extensive experiments demonstrate that our model exhibits superior performance over other models in few-shot plant disease identification.
Artificial intelligence (AI), particularly deep learning, has demonstrated remarkable performance in medical imaging across a variety of modalities, including X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), and pathological imaging. However, most existing state-of-the-art AI techniques are task-specific and focus on a limited range of imaging modalities. Compared to these task-specific models, emerging foundation models represent a significant milestone in AI development. These models can learn generalized representations of medical images and apply them to downstream tasks through zero-shot or few-shot fine-tuning. Foundation models have the potential to address the comprehensive and multifactorial challenges encountered in clinical practice. This article reviews the clinical applications of both task-specific and foundation models, highlighting their differences, complementarities, and clinical relevance. We also examine their future research directions and potential challenges. Unlike the replacement relationship seen between deep learning and traditional machine learning, task-specific and foundation models are complementary, despite inherent differences. While foundation models primarily focus on segmentation and classification, task-specific models are integrated into nearly all medical image analyses. However, with further advancements, foundation models could be applied to other clinical scenarios. In conclusion, all indications suggest that task-specific and foundation models, especially the latter, have the potential to drive breakthroughs in medical imaging, from image processing to clinical workflows.
Lightning is a significant natural hazard that poses considerable risks to both human safety and industrial operations. Accurate, fine-scale lightning forecasting is crucial for effective disaster prevention. Traditional forecasting methods primarily rely on numerical weather prediction (NWP), which demands substantial computational resources to solve complex atmospheric evolution equations. Recently, deep learning-based weather prediction models, particularly weather foundation models (WFMs), have demonstrated promising results, achieving performance comparable to NWP while requiring substantially fewer computational resources. However, existing WFMs are unable to directly generate lightning forecasts and struggle to satisfy the high spatial resolution required for fine-scale prediction. To address these limitations, this paper investigates a fine-scale lightning forecasting approach based on WFMs and proposes a dual-source data-driven forecasting framework that integrates the strengths of both WFMs and recent lightning observations to enhance predictive performance. Furthermore, a gated spatiotemporal fusion network (gSTFNet) is designed to address the challenges of cross-temporal and cross-modal fusion inherent in dual-source data integration. gSTFNet employs a dual-encoding structure to separately encode features from WFMs and lightning observations, effectively narrowing the modal gap in the latent feature space. A gated spatiotemporal fusion module is then introduced to model the spatiotemporal correlations between the two types of features, facilitating seamless cross-temporal fusion. The fused features are subsequently processed by a deconvolutional network to generate accurate lightning forecasts. We evaluate the proposed gSTFNet using real-world lightning observation data collected in Guangdong from 2018 to 2022. Experimental results demonstrate that: (1) in terms of the ETS score, the dual-source framework achieves a 50% improvement over models trained solely on WFMs, and a 300% improvement over the HRES lightning forecasting product released by the European Centre for Medium-Range Weather Forecasts (ECMWF); (2) gSTFNet outperforms several state-of-the-art deep learning baselines that utilize dual-source inputs, clearly demonstrating superior forecasting accuracy.
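The gating idea at the heart of such a fusion module can be sketched as follows. The actual gSTFNet layers are learned convolutional blocks; this toy uses fixed per-feature weights purely to show the mechanism by which a sigmoid gate arbitrates between the two feature sources.

```python
import math

# Minimal sketch of a gated fusion step (gSTFNet internals are assumptions):
# a sigmoid gate computed from both inputs decides, per feature, how much to
# trust the WFM features versus the recent lightning-observation features.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(h_wfm, h_obs, w_wfm, w_obs, bias):
    """fused[i] = g[i]*h_wfm[i] + (1-g[i])*h_obs[i], with g from both inputs."""
    fused = []
    for i in range(len(h_wfm)):
        g = sigmoid(w_wfm[i] * h_wfm[i] + w_obs[i] * h_obs[i] + bias[i])
        fused.append(g * h_wfm[i] + (1.0 - g) * h_obs[i])
    return fused

h_wfm = [0.2, 0.9]   # encoded WFM forecast features
h_obs = [0.8, 0.1]   # encoded lightning-observation features

# With a strongly negative bias the gate approaches 0, so the output
# follows the observation features almost entirely.
fused = gated_fuse(h_wfm, h_obs, [0.0, 0.0], [0.0, 0.0], [-10.0, -10.0])
```

In training, the gate parameters are learned, letting the network lean on observations at short lead times and on the WFM features further out.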
Machine learning has demonstrated remarkable breakthroughs in predicting the state of health (SOH) of lithium-ion batteries. However, conventional methods face critical challenges in cross-domain adaptation, dataset inter-generalization, and long-horizon forecasting due to variations in usage conditions and electrochemical characteristics. Inspired by the success of large language models (LLMs), time-series foundation models (TSFMs) offer an alternative solution to overcome these issues. Nevertheless, studies exploring the generalization enhancement capability of TSFMs for battery SOH forecasting under different cross-domain factors remain insufficient. Therefore, a novel TSFM-based framework named BatteryTSFM is proposed for SOH forecasting. First, we introduce backbone-aware temporal resampling that dynamically adapts preprocessing to the structural characteristics of diverse TSFMs, enabling optimal cross-domain generalization through feature scaling. Second, Monte Carlo dropout is integrated into autoregressive inference to quantify multi-step prediction errors. Across four public datasets, BatteryTSFM reduces RMSE by an average of 35% in cross-condition tasks and 88% in cross-chemistry tasks, indicating that foundation-model methods can deliver reliable long-horizon SOH forecasts for energy systems. We also conduct exploratory analyses that link generalization to fine-tuning dataset size and resampling granularity, yielding practical guidance for deployment.
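Monte Carlo dropout inside an autoregressive rollout, as used here to quantify multi-step prediction error, can be sketched with a toy degradation model (not BatteryTSFM itself): dropout stays active at inference, the rollout is repeated many times, and the spread across rollouts serves as the uncertainty estimate.

```python
import random
import statistics

def step(soh, dropout_p=0.2):
    """One autoregressive step of a toy SOH model: subtract the contributions
    of two internal 'features', each randomly dropped with prob dropout_p and
    rescaled by 1/(1-p) (inverted dropout) so the expectation is unchanged."""
    feats = [0.004, 0.001]
    active = [f / (1 - dropout_p) for f in feats if random.random() > dropout_p]
    return soh - sum(active)

def mc_rollout(soh0, horizon, n_samples=200, seed=0):
    """Repeat the stochastic rollout; mean is the forecast, stdev the uncertainty."""
    random.seed(seed)
    finals = []
    for _ in range(n_samples):
        soh = soh0
        for _ in range(horizon):
            soh = step(soh)
        finals.append(soh)
    return statistics.mean(finals), statistics.stdev(finals)

mean, std = mc_rollout(1.0, horizon=50)
```

Because dropout noise compounds step by step, the spread naturally widens with the horizon, which is what makes this a cheap proxy for multi-step predictive uncertainty.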
Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.
Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, there has been a growth in the adoption of transfer learning approaches in biological fields, where models have been pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike in natural language, where humans are best suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of being a setting where there are a plethora of unknowns while at the same time needing to abide by real-world physical constraints. This perspective provides a discussion of some key points we should consider as a field in designing benchmarks for foundation models in network biology.
Funding: Supported by the Open Project Program of Panxi Crops Research and Utilization Key Laboratory of Sichuan Province, No. SZKF202302, and the Fundamental Research Funds for the Central Universities, No. 2019CDYGYB024.
Funding: Funded by the Spanish Government and FEDER funds (AEI/FEDER, UE) under grant PID2021-124502OB-C42 (PRESECREL), and by the predoctoral program "Concepción Arenal del Programa de Personal Investigador en formación Predoctoral" funded by Universidad de Cantabria and Cantabria's Government (BOC 18-10-2021).
Abstract: Predictive maintenance often involves imbalanced multivariate time series datasets with scarce failure events, posing challenges for model training due to the high dimensionality of the data and the need for domain-specific preprocessing, which frequently leads to the development of large and complex models. Inspired by the success of Large Language Models (LLMs), transformer-based foundation models have been developed for time series (TSFMs). These models have been proven to reconstruct time series in a zero-shot manner, capturing the diverse patterns that effectively characterize time series. This paper proposes the use of TSFMs to generate embeddings of the input data space, making the data more interpretable for machine learning models. To evaluate the effectiveness of our approach, we trained three classical machine learning algorithms and one neural network using the embeddings generated by the TSFM called Moment to predict the remaining useful life of aircraft engines. We tested the models trained with both the full training dataset and with only 10% of the training samples. Our results show that training simple models, such as support vector regressors or neural networks, with embeddings generated by Moment not only accelerates the training process but also enhances performance in few-shot learning scenarios, where data is scarce. This suggests a promising alternative to complex deep learning architectures, particularly in industrial contexts with limited labeled data.
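The pipeline this abstract describes, embedding each sensor window with a frozen TSFM and fitting a lightweight model on top of the embeddings, can be sketched in miniature. The `embed` function below is a hypothetical stand-in for Moment's encoder (the real model produces learned patch embeddings), reduced to per-channel summary statistics so the sketch stays self-contained; the k-NN regressor stands in for the simple downstream models (SVR, small neural network) mentioned above.

```python
import math
import random

def embed(window):
    """Hypothetical stand-in for a TSFM encoder (e.g., Moment).
    Maps a multivariate window (list of channels, each a list of floats)
    to a fixed-size embedding: per-channel mean, std, and a trend proxy."""
    emb = []
    for ch in window:
        n = len(ch)
        mean = sum(ch) / n
        var = sum((x - mean) ** 2 for x in ch) / n
        slope = (ch[-1] - ch[0]) / (n - 1)  # crude trend proxy
        emb += [mean, math.sqrt(var), slope]
    return emb

def knn_predict(train_emb, train_rul, query, k=3):
    """Tiny k-NN regressor on the embedding space: the 'lightweight model'
    trained on TSFM features instead of raw high-dimensional windows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(e, query)), r)
        for e, r in zip(train_emb, train_rul)
    )
    return sum(r for _, r in dists[:k]) / k

# Toy degradation data: sensor drift grows as remaining useful life shrinks.
random.seed(0)
def window_for(rul):
    drift = (100 - rul) / 100
    return [[drift * t / 20 + random.gauss(0, 0.01) for t in range(20)]]

train_rul = list(range(5, 100, 5))
train_emb = [embed(window_for(r)) for r in train_rul]

pred = knn_predict(train_emb, train_rul, embed(window_for(50)))
print(round(pred, 1))
```

The design point is that the expensive, general-purpose representation is computed once by the frozen encoder; only the cheap model on top is (re)trained, which is what makes the few-shot regime tractable.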
Funding: Supported by the Strategic Project of Precision Surgery, Tsinghua University; the Initiative Scientific Research Program, Institute for Intelligent Healthcare, Tsinghua University; and additional grants: the Tsinghua-Foshan Institute of Advanced Manufacturing; the National Natural Science Foundation of China (61735016); the Beijing Nova Program (20230484308); the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); the Youth Elite Program of Beijing Friendship Hospital (YYQCJH2022-9); and the Science and Technology Program of Beijing Tongzhou District (KJ2023CX012).
Abstract: Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we propose a workflow based on foundation models and zero training to solve photoacoustic (PA) image processing tasks. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we conclude that FMs can be directly applied in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating mastery of the technique through the provision of code and sample datasets.
Funding: Supported by the National Natural Science Foundation of China (62225302, 623B2014, and 62173023).
Abstract: With the emergence of general foundation models, such as Chat Generative Pre-trained Transformer (ChatGPT), researchers have shown considerable interest in the potential applications of foundation models in the process industry. This paper provides a comprehensive overview of the challenges and opportunities presented by the use of foundation models in the process industry, including the frameworks, core applications, and future prospects. First, this paper proposes a framework for foundation models for the process industry. Second, it summarizes the key capabilities of industrial foundation models and their practical applications. Finally, it highlights future research directions and identifies unresolved open issues related to the use of foundation models in the process industry.
Funding: We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities, such as in-context learning, that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62001246, 62231017, 62201277, and 62071255; the Natural Science Foundation of Jiangsu Province under Grant BK20220390; and additional grants: the Key R&D Program of Jiangsu Province (key projects and topics) under Grants BE2021095 and BE2023035; the Natural Science Research Startup Foundation for Recruited Talents of Nanjing University of Posts and Telecommunications (Grant No. NY221011); the National Science Foundation of Xiamen, China (No. 3502Z202372013); and the Open Project of the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology (Xiamen University) of the Ministry of Education, China (No. UAC202304).
Abstract: In the future development of sixth generation (6G) mobile communication, several communication models have been proposed to face the growing challenges of the task. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm: a task-oriented semantic communication system with foundation models. First, we segment the image using task prompts based on the Segment Anything Model (SAM) and Contrastive Language-Image Pretraining (CLIP). Meanwhile, we adopt Bezier curves to enhance the mask and improve segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse different semantic information based on a conditional diffusion model to generate high-quality images that satisfy the users' specific task requirements. Finally, experimental results show that the proposed system compresses semantic information effectively and improves the robustness of semantic communication.
Funding: Foundation for the National Institutes of Health, Grant/Award Number: U01 HG013840.
Abstract: Do we need a foundation model (FM) for spatial transcriptomic analysis? To answer this question, we prepared this perspective as a primer. We first review the current progress in developing FMs for modeling spatial transcriptomic data and then discuss possible tasks that can be addressed by FMs. Finally, we explore future directions for developing such models for understanding spatial transcriptomics by describing both opportunities and challenges. In particular, we expect that a successful FM should boost research productivity, increase novel biological discoveries, and provide user-friendly access.
Funding: Supported by the National Natural Science Foundation of China under grant nos. 62372470, 72225011, 62402414, U23B2059, 62173034, 32222070, 62402017, 72421002, 62206303, 62476264, 62406312, 62102266, 52173241, and U23A20468; the National Key Research and Development Program of China (2023YFD1900604); and additional grants: the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0680301); the Youth Innovation Promotion Association CAS (2023112); the National High Level Hospital Clinical Research Funding (2022-PUMCH-A-014); the Beijing Natural Science Foundation (4244098); the Science and Technology Innovation Program of Hunan Province (2023RC3009); the Key Research and Development Program of Yunnan Province (202202AE090034); the MNR Key Laboratory for Geo-Environmental Monitoring of Greater Bay Area (GEMLab-2023001); the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2024TIAD-STX0024); the China National Postdoctoral Program for Innovative Talents (BX20240385); and the River Talent Recruitment Program of Guangdong Province (2019ZT08X603).
Abstract: Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI), designed to automate or augment decision processes. Modern IDM paradigms integrate advanced frameworks, such as AI agents and high-level reinforcement learning, that enable intelligent agents to make effective, adaptive choices and to decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities, such as vision, language, and sensory data, into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, the life sciences, and education. This survey examines IDM's evolution and its advanced paradigms with FMs, along with their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Funding: Supported by the Guangdong Provincial Science and Technology Program (Grant No. 2023A0505030003).
Abstract: Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in the open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Abstract: 1 Introduction. With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on given descriptions. Owing to these powerful capabilities, there is a growing trend toward building agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
Funding: Supported by the National Natural Science Foundation of China under grant nos. 62206266 and 62372430, and the Youth Innovation Promotion Association CAS, no. 2023112.
Abstract: INTRODUCTION. In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 has sparked extensive discussions. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
Funding: Supported by the National Natural Science Foundation of China under grants 42030102 and 42371321, and by the Ant Group.
Abstract: DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO. The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data. There is an urgent need for advanced processing techniques to unleash their application value.[1] Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources.[2] Through developing remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography).[3] However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
Funding: Funded by the Helmholtz Association's Initiative and Networking Fund through Helmholtz AI; the Helmholtz Association under the Program "Energy System Design"; and the German Research Foundation (DFG) as part of the Research Training Group 2153 "Energy Status Data: Informatics Methods for its Collection, Analysis and Exploitation". Additionally supported by the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT partition, and by the KIT Publication Fund of the Karlsruhe Institute of Technology.
Abstract: Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. These foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the resulting value of the forecast in downstream optimization rather than merely maximizing forecast quality. The practical integration of forecast values into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable-feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai, a state-of-the-art foundation model that offers robust and generalized results with few-shot, parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art, prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in average daily total costs.
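The gap between prediction-focused and decision-focused objectives can be illustrated with a toy dispatch problem (the cost structure and numbers below are illustrative, not taken from the paper): two forecasts with identical squared error can incur very different downstream costs once an optimizer must commit to a quantity before demand is revealed.

```python
def dispatch_cost(forecast, demand, price=1.0, penalty=5.0):
    """Cost of committing to `forecast` units before `demand` is revealed:
    every unit procured costs `price`; any shortfall must be covered at
    the higher `penalty` rate (surplus is simply wasted)."""
    shortfall = max(demand - forecast, 0.0)
    return price * forecast + penalty * shortfall

demand = 10.0
under, over = 9.0, 11.0  # both forecasts have squared error 1.0

mse_under = (under - demand) ** 2
mse_over = (over - demand) ** 2
assert mse_under == mse_over  # identical under the prediction-focused metric

cost_under = dispatch_cost(under, demand)  # shortfall is penalized heavily
cost_over = dispatch_cost(over, demand)    # surplus is comparatively cheap
print(cost_under, cost_over)
```

A prediction-focused objective cannot distinguish the two forecasts, while decision-focused fine-tuning would train the forecaster against `dispatch_cost` directly, learning the asymmetry that MSE is blind to.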
Funding: Funded by the Science and Technology Innovation Key R&D Program of Chongqing (No. CSTB-2022TIAD-STX0008); the Natural Science Foundation of China (Nos. 62402473 and 62271465); and the Suzhou Basic Research Program (No. SYG202338).
Abstract: With the rapid development of artificial intelligence, computational pathology has been seamlessly integrated into the entire clinical workflow, which encompasses diagnosis, treatment, prognosis, and biomarker discovery. This integration has significantly enhanced clinical accuracy and efficiency while reducing the workload for clinicians. Traditionally, research in this field has depended on the collection and labeling of large datasets for specific tasks, followed by the development of task-specific computational pathology models. However, this approach is labor intensive and does not scale efficiently for open-set identification or rare diseases. Given the diversity of clinical tasks, training individual models from scratch to address the whole spectrum of clinical tasks in the pathology workflow is impractical, which highlights the urgent need to transition from task-specific models to foundation models (FMs). In recent years, pathological FMs have proliferated. These FMs can be classified into three categories, namely, pathology image FMs, pathology image-text FMs, and pathology image-gene FMs, each of which offers distinct functionalities and application scenarios. This review provides an overview of the latest research advancements in pathological FMs, with a particular emphasis on their applications in oncology. The key challenges and opportunities presented by pathological FMs in precision oncology are also explored.
Funding: Supported by the National Key Research and Development Program of China (2023YFF1000100).
Abstract: Although plant disease recognition is highly important in agricultural production, traditional methods face challenges due to the high costs associated with data collection and the scarcity of samples. Few-shot plant disease identification methods based on transfer learning can learn feature representations from a small amount of data; however, most of these methods require pretraining within the relevant domain. Recently, foundation models have demonstrated excellent performance in zero-shot and few-shot learning scenarios. In this study, we explore the potential of foundation models in plant disease recognition by proposing an efficient few-shot plant disease recognition model (PlantCaFo) built on foundation models. The model operates as an end-to-end network that integrates prior knowledge from multiple pretrained models. Specifically, we design a lightweight dilated contextual adapter (DCon-Adapter) to learn new knowledge from the training data and use a weight decomposition matrix (WDM) to update the text weights. We test the proposed model on a public dataset, PlantVillage, and show that it achieves an accuracy of 93.53% in a "38-way 16-shot" setting. In addition, we conduct experiments on images collected from natural environments (the Cassava dataset), achieving an accuracy improvement of 6.80% over the baseline. To validate the model's generalization performance, we prepare an out-of-distribution dataset with 21 categories, on which our model notably increases accuracy. Extensive experiments demonstrate that our model outperforms other models in few-shot plant disease identification.
Funding: Supported by grants from the Beijing Hospitals Authority's Ascent Plan (No. DFL20220303) and the Beijing Municipal Science & Technology Commission (No. Z221100003522008).
Abstract: Artificial intelligence (AI), particularly deep learning, has demonstrated remarkable performance in medical imaging across a variety of modalities, including X-ray, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), and pathological imaging. However, most existing state-of-the-art AI techniques are task-specific and focus on a limited range of imaging modalities. Compared to these task-specific models, emerging foundation models represent a significant milestone in AI development. These models can learn generalized representations of medical images and apply them to downstream tasks through zero-shot or few-shot fine-tuning. Foundation models have the potential to address the comprehensive and multifactorial challenges encountered in clinical practice. This article reviews the clinical applications of both task-specific and foundation models, highlighting their differences, complementarities, and clinical relevance. We also examine their future research directions and potential challenges. Unlike the replacement relationship seen between deep learning and traditional machine learning, task-specific and foundation models are complementary, despite their inherent differences. While foundation models primarily focus on segmentation and classification, task-specific models are integrated into nearly all medical image analyses. However, with further advancements, foundation models could be applied to other clinical scenarios. In conclusion, all indications suggest that task-specific and foundation models, especially the latter, have the potential to drive breakthroughs in medical imaging, from image processing to clinical workflows.
Funding: Supported by the Open Grants of the Key Laboratory of Lightning, China Meteorological Administration (Grant Nos. 2023KELL-B002 and 2024KELL-A001); the National Natural Science Foundation of China (Grant Nos. 62306028, 42075088, and U2342215); and the Science and Technology Program of Shenzhen, China (Grant No. KJZD20240903102742055).
Abstract: Lightning is a significant natural hazard that poses considerable risks to both human safety and industrial operations. Accurate, fine-scale lightning forecasting is crucial for effective disaster prevention. Traditional forecasting methods primarily rely on numerical weather prediction (NWP), which demands substantial computational resources to solve complex atmospheric evolution equations. Recently, deep learning-based weather prediction models, particularly weather foundation models (WFMs), have demonstrated promising results, achieving performance comparable to NWP while requiring substantially fewer computational resources. However, existing WFMs are unable to directly generate lightning forecasts and struggle to satisfy the high spatial resolution required for fine-scale prediction. To address these limitations, this paper investigates a fine-scale lightning forecasting approach based on WFMs and proposes a dual-source data-driven forecasting framework that integrates the strengths of both WFMs and recent lightning observations to enhance predictive performance. Furthermore, a gated spatiotemporal fusion network (gSTFNet) is designed to address the challenges of cross-temporal and cross-modal fusion inherent in dual-source data integration. gSTFNet employs a dual-encoding structure to separately encode features from WFMs and lightning observations, effectively narrowing the modal gap in the latent feature space. A gated spatiotemporal fusion module is then introduced to model the spatiotemporal correlations between the two types of features, facilitating seamless cross-temporal fusion. The fused features are subsequently processed by a deconvolutional network to generate accurate lightning forecasts. We evaluate the proposed gSTFNet using real-world lightning observation data collected in Guangdong from 2018 to 2022. Experimental results demonstrate that: (1) in terms of the ETS score, the dual-source framework achieves a 50% improvement over models trained solely on WFMs and a 300% improvement over the HRES lightning forecasting product released by the European Centre for Medium-Range Weather Forecasts (ECMWF); and (2) gSTFNet outperforms several state-of-the-art deep learning baselines that utilize dual-source inputs, clearly demonstrating superior forecasting accuracy.
Funding: Supported by the National Major Science and Technology Projects of China [NMSTP] (T3142811SN) and the Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences [CSU-CAS] (CSU-CXXD-ZD-2025-001).
Abstract: Machine learning has demonstrated remarkable breakthroughs in predicting the state of health (SOH) of lithium-ion batteries. However, conventional methods face critical challenges in cross-domain adaptation, inter-dataset generalization, and long-horizon forecasting due to variations in usage conditions and electrochemical characteristics. Inspired by the success of large language models (LLMs), time-series foundation models (TSFMs) offer an alternative solution to these issues. Nevertheless, studies exploring the generalization-enhancement capability of TSFMs for battery SOH forecasting under different cross-domain factors remain insufficient. Therefore, a novel TSFM-based framework named BatteryTSFM is proposed for SOH forecasting. First, we introduce backbone-aware temporal resampling, which dynamically adapts preprocessing to the structural characteristics of diverse TSFMs, enabling optimal cross-domain generalization through feature scaling. Second, Monte Carlo dropout is integrated into autoregressive inference to quantify multi-step prediction errors. Across four public datasets, BatteryTSFM reduces RMSE by an average of 35% in cross-condition tasks and 88% in cross-chemistry tasks, indicating that foundation-model methods can deliver reliable long-horizon SOH forecasts for energy systems. We also conduct exploratory analyses that link generalization to fine-tuning dataset size and resampling granularity, yielding practical guidance for deployment.
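The Monte Carlo dropout pattern this abstract mentions can be sketched in miniature. The one-step degradation "model" below is a deliberately trivial stand-in for a TSFM backbone; the point is the inference pattern: keep stochasticity switched on at prediction time, roll out many autoregressive trajectories, and read per-step uncertainty from their spread.

```python
import random
import statistics

def step(soh, rng, drop_p=0.2):
    """One autoregressive step of a toy degradation model. Randomly
    zeroing the capacity-fade feature mimics Monte Carlo dropout:
    the 'network' stays stochastic at inference time, so repeated
    rollouts disagree."""
    fade = 0.005 if rng.random() > drop_p else 0.0  # dropped path
    return soh - fade - 0.002                        # deterministic drift

def mc_rollout(soh0, horizon, n_samples=200, seed=1):
    rng = random.Random(seed)
    paths = []
    for _ in range(n_samples):
        soh, path = soh0, []
        for _ in range(horizon):
            soh = step(soh, rng)  # feed prediction back in, autoregressively
            path.append(soh)
        paths.append(path)
    # Per-step mean and std across stochastic rollouts give a point
    # forecast plus an uncertainty band that widens with the horizon.
    mean = [statistics.mean(p[t] for p in paths) for t in range(horizon)]
    std = [statistics.stdev(p[t] for p in paths) for t in range(horizon)]
    return mean, std

mean, std = mc_rollout(1.0, horizon=10)
print(round(mean[-1], 3), round(std[-1], 4))
```

With a real TSFM, `step` would be a forward pass with dropout layers left active; the widening standard deviation is exactly the multi-step prediction-error estimate the abstract refers to.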
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U21A20469 and 61936002); the National Key R&D Program of China (Grant No. 2020YFB2104100); and grants from the Institute Guo Qiang, THUIBCS, and BLBCI.
Abstract: Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation models (VIS4FM) and foundation models for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models; VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.
Abstract: Transfer learning has revolutionized fields including natural language understanding and computer vision by leveraging large-scale general datasets to pretrain models with foundational knowledge that can then be transferred to improve predictions in a vast range of downstream tasks. More recently, there has been a growth in the adoption of transfer learning approaches in biological fields, where models have been pretrained on massive amounts of biological data and employed to make predictions in a broad range of biological applications. However, unlike natural language, where humans are best suited to evaluate models given a clear understanding of the ground truth, biology presents the unique challenge of a setting with a plethora of unknowns that must, at the same time, abide by real-world physical constraints. This perspective discusses some key points we should consider as a field in designing benchmarks for foundation models in network biology.