Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
Predictive maintenance often involves imbalanced multivariate time series datasets with scarce failure events, posing challenges for model training due to the high dimensionality of the data and the need for domain-specific preprocessing, which frequently leads to the development of large and complex models. Inspired by the success of Large Language Models (LLMs), transformer-based foundation models have been developed for time series (TSFMs). These models have been proven to reconstruct time series in a zero-shot manner, being able to capture different patterns that effectively characterize time series. This paper proposes the use of TSFMs to generate embeddings of the input data space, making them more interpretable for machine learning models. To evaluate the effectiveness of our approach, we trained three classical machine learning algorithms and one neural network using the embeddings generated by the TSFM called Moment for predicting the remaining useful life of aircraft engines. We tested the models trained with both the full training dataset and only 10% of the training samples. Our results show that training simple models, such as support vector regressors or neural networks, with embeddings generated by Moment not only accelerates the training process but also enhances performance in few-shot learning scenarios, where data is scarce. This suggests a promising alternative to complex deep learning architectures, particularly in industrial contexts with limited labeled data.
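The embed-then-regress recipe described in this abstract can be sketched in a few lines. In the sketch below, a fixed random projection stands in for the frozen Moment encoder (the real model produces far richer embeddings), and a deliberately tiny k-nearest-neighbour regressor stands in for the simple downstream models; all names and data are illustrative, not the authors' code.

```python
import math
import random

random.seed(0)
WINDOW, EMB_DIM = 16, 4
# Frozen "encoder" weights: a fixed random projection, never updated.
PROJ = [[random.gauss(0.0, 1.0) for _ in range(WINDOW)] for _ in range(EMB_DIM)]

def embed(window):
    """Map a raw sensor window to a small fixed embedding (frozen weights)."""
    return [sum(w * x for w, x in zip(row, window)) for row in PROJ]

def knn_rul(train, query_emb, k=2):
    """Predict remaining useful life as the mean RUL of the k nearest embeddings."""
    nearest = sorted((math.dist(emb, query_emb), rul) for emb, rul in train)[:k]
    return sum(rul for _, rul in nearest) / k

# Toy degradation data: a sensor channel drifts upward as RUL shrinks.
def window_for_rul(rul):
    return [(100.0 - rul) / 100.0 + 0.01 * t for t in range(WINDOW)]

train = [(embed(window_for_rul(r)), r) for r in range(10, 100, 10)]
print(knn_rul(train, embed(window_for_rul(45))))   # → 45.0
```

Only the small downstream model is ever trained (here, trivially, by storing neighbours), which is what makes the few-shot regime workable: the frozen encoder does the representational heavy lifting.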
Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we proposed a workflow based on foundation models and zero training to solve the tasks of photoacoustic (PA) image processing. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we have concluded that FMs can be directly applied in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating the mastery of the technique through the provision of code and sample datasets.
With the emergence of general foundation models, such as Chat Generative Pre-trained Transformer (ChatGPT), researchers have shown considerable interest in the potential applications of foundation models in the process industry. This paper provides a comprehensive overview of the challenges and opportunities presented by the use of foundation models in the process industry, including the frameworks, core applications, and future prospects. First, this paper proposes a framework for foundation models for the process industry. Second, it summarizes the key capabilities of industrial foundation models and their practical applications. Finally, it highlights future research directions and identifies unresolved open issues related to the use of foundation models in the process industry.
Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities such as in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. Then, it turns to the discussion of LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
In the future development of sixth generation (6G) mobile communication, several communication models have been proposed to face the growing challenges of increasingly complex tasks. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm: a task-oriented semantic communication system with foundation models. First, we segment the image by using task prompts based on the Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP). Meanwhile, we adopt Bézier curves to enhance the mask and improve the segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse the different semantic information based on a conditional diffusion model to generate high-quality images that satisfy the users' specific task requirements. Finally, the experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.
Background: Vision and vision-language foundation models, a subset of advanced artificial intelligence (AI) frameworks, have shown transformative potential in various medical fields. In ophthalmology, these models, particularly large language models and vision-based models, have demonstrated great potential to improve diagnostic accuracy, enhance treatment planning, and streamline clinical workflows. However, their deployment in ophthalmology has faced several challenges, particularly regarding generalizability and integration into clinical practice. This systematic review aims to summarize the current evidence on the use of vision and vision-language foundation models in ophthalmology, identifying key applications, outcomes, and challenges. Main text: A comprehensive search on PubMed, Web of Science, Scopus, and Google Scholar was conducted to identify studies published between January 2020 and July 2025. Studies were included if they developed or applied foundation models, such as vision-based models and large language models, to clinically relevant ophthalmic applications. A total of 10 studies met the inclusion criteria, covering areas such as retinal diseases, glaucoma, and ocular surface tumors. The primary outcome measures were model performance metrics, integration into clinical workflows, and the clinical utility of the models. Additionally, the review explored the limitations of foundation models, such as the reliance on large datasets, computational resources, and interpretability challenges. The majority of studies demonstrated that foundation models could achieve high diagnostic accuracy, with several reports indicating excellent performance comparable to or exceeding that of experienced clinicians. Foundation models achieved accuracy rates of up to 95% for diagnosing retinal diseases, and similar performance for detecting glaucoma progression. Despite promising results, concerns about algorithmic bias, overfitting, and the need for diverse training data were common. High computational demands, electronic health record (EHR) compatibility, and the need for clinician validation also posed challenges. Additionally, model interpretability issues hindered clinician trust and adoption. Conclusions: Vision and vision-language foundation models in ophthalmology show significant potential for advancing diagnostic accuracy and treatment strategies, particularly in retinal diseases, glaucoma, and ocular oncology. However, challenges such as data quality, transparency, and ethical considerations must be addressed. Future research should focus on refining model performance, improving interpretability and generalizability, and exploring strategies for integrating these models into routine clinical practice to maximize their impact in clinical ophthalmology.
Deep learning's continuous evolution has driven the creation of increasingly large foundation models, such as GPT-3, which require optimized performance on large-scale computing platforms. The new Sunway Supercomputer, equipped with numerous SW26010pro processors, supports AI workloads in both all-shared and single-CG (core group) modes. However, existing optimizations primarily target AI operators such as Generalized Matrix Multiplication (GEMM) in the single-CG mode, leaving challenges in scaling performance across all six CGs in the all-shared mode. This paper introduces SwFormer, a framework designed to accelerate foundation models via intra-op tiling and inter-op scheduling. The intra-op tiling method breaks down operators into fine-grained tiled kernels and employs an offline profiling-based approach to determine the optimal tiling strategy. The inter-op scheduling method employs heuristic graph traversal algorithms to automatically reorder the computation of these tiled kernels, thereby maximizing hardware utilization. Compared with operator libraries for the all-shared mode such as SWDNNv2 and SWattention, SwFormer's intra-op tiling method accelerates end-to-end training of GPT-3 6.7B and 13B models by up to 1.27x. Evaluated with GPT-style models, the inter-op scheduling method further outperforms the intra-op tiling method by up to 1.32x.
Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI) designed to automate or augment decision processes. Modern IDM paradigms integrate advanced frameworks, such as AI agents and high-level reinforcement learning, to enable intelligent agents to make effective and adaptive choices and to decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities, such as vision, language, and sensory data, into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, life sciences, and education. This survey examines IDM's evolution, advanced paradigms with FMs, and their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Universal machine-learning interatomic potentials (uMLIPs) are emerging as foundation models for atomistic simulation, offering near-ab initio accuracy at far lower cost. Their safe, broad deployment is limited by the absence of reliable, general uncertainty estimates. We present a unified, scalable uncertainty metric, U, built from a heterogeneous ensemble that reuses existing pretrained MLIPs. Across diverse chemistries and structures, U strongly tracks true prediction errors and robustly ranks configuration-level risk. Using U, we perform uncertainty-aware distillation to train system-specific potentials with far fewer labels: for tungsten, we match full density-functional-theory (DFT) training using 4% of the DFT data; for MoNbTaW, a dataset distilled by U supports high-accuracy potential training. By filtering numerical label noise, the distilled models can in some cases exceed the accuracy of MLIPs trained on DFT data. This framework provides a practical reliability monitor and guides data selection and fine-tuning, enabling cost-efficient, accurate, and safer deployment of foundation models.
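The ensemble-disagreement idea behind a metric like U can be illustrated minimally. The paper's actual construction is not reproduced here; the sketch below simply uses the spread of energy predictions across several pretrained models as a proxy for per-configuration risk, then ranks configurations most-uncertain first (e.g., to prioritise DFT labelling during distillation). All names and numbers are illustrative.

```python
import statistics

def uncertainty(energies):
    """Disagreement (sample std-dev) among ensemble energy predictions."""
    return statistics.stdev(energies)

def rank_by_uncertainty(configs):
    """Order configurations most-uncertain first for targeted labelling."""
    return sorted(configs, key=lambda c: uncertainty(c["ensemble_preds"]), reverse=True)

configs = [
    {"id": "bulk-W",    "ensemble_preds": [-8.90, -8.91, -8.89]},  # members agree
    {"id": "W-vacancy", "ensemble_preds": [-8.10, -8.35, -7.90]},  # members disagree
    {"id": "W-surface", "ensemble_preds": [-7.50, -7.55, -7.48]},
]
print([c["id"] for c in rank_by_uncertainty(configs)])
# → ['W-vacancy', 'W-surface', 'bulk-W']
```

Only the configurations where the ensemble disagrees most would be sent for expensive DFT labelling, which is how far fewer labels can match full-dataset training.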
1 Introduction. With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate texts and images, or even animate scenarios based on the given descriptions. Due to these powerful capabilities, there is a growing trend to build agents based on foundation models. In this paper, we conduct an investigation into agents empowered by foundation models.
INTRODUCTION. In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, the high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 has sparked extensive discussions. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
Large language models, commonly known as LLMs, are showing promise in tackling some of the most complex tasks in AI. In this perspective, we review the wider field of foundation models, of which LLMs are a component, and their application to the field of materials discovery. In addition to the current state of the art, including applications to property prediction, synthesis planning, and molecular generation, we also look to the future and posit how new methods of data capture, and indeed new modalities of data, will influence the direction of this emerging field.
DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO. The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data. There is an urgent need for advanced processing techniques to unleash their application value [1]. Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources [2]. Through developing remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography) [3]. However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
For neural network potentials (NNPs) to gain widespread use, researchers must be able to trust model outputs. However, the black-box nature of neural networks and their inherent stochasticity are often deterrents, especially for foundation models trained over broad swaths of chemical space. Uncertainty information provided at the time of prediction can help reduce aversion to NNPs. In this work, we detail two uncertainty quantification (UQ) methods. Readout ensembling, by fine-tuning the readout layers of an ensemble of foundation models, provides information about model uncertainty, while quantile regression, by replacing point predictions with distributional predictions, provides information about uncertainty within the underlying training data. We demonstrate our approach with the MACE-MP-0 model, applying UQ to the foundation model and a series of fine-tuned models. The uncertainties produced by the readout ensemble and quantile methods are demonstrated to be distinct measures by which the quality of the NNP output can be judged.
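The quantile-regression side of this approach rests on the standard pinball (quantile) loss, whose asymmetry is what turns a point predictor into a quantile predictor; the width of a predicted quantile band then serves as a per-sample data-uncertainty estimate. The sketch below shows only this generic loss, not the MACE-MP-0 readout-ensemble machinery; the example values are illustrative.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: under-prediction of the q-th quantile is
    penalised with weight q, over-prediction with weight (1 - q)."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff

def interval_width(q_lo_pred, q_hi_pred):
    """Width of a predicted (lo, hi) quantile band: wider band = more
    spread in the training data near this input."""
    return q_hi_pred - q_lo_pred

# For q = 0.9, missing low costs 9x more than missing high, pushing the
# model's output up towards the 90th percentile of the target.
print(round(pinball_loss(1.0, 0.0, 0.9), 3))  # under-predict → 0.9
print(round(pinball_loss(0.0, 1.0, 0.9), 3))  # over-predict  → 0.1
print(interval_width(-3.0, -2.5))             # → 0.5
```

Training two heads under q = 0.05 and q = 0.95 and reporting their gap is one common way to surface the data-uncertainty signal the abstract describes.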
Machine-learned interatomic potentials are revolutionising atomistic materials simulations by providing accurate and scalable predictions within the scope covered by the training data. However, generation of an accurate and robust training data set remains a challenge, often requiring thousands of first-principles calculations to achieve high accuracy. Foundation models have started to emerge with the ambition to create universally applicable potentials across a wide range of materials. While foundation models can be robust and transferable, they do not yet achieve the accuracy required to predict reaction barriers, phase transitions, and material stability. This work demonstrates that foundation model potentials can reach chemical accuracy when fine-tuned using transfer learning with partially frozen weights and biases. For two challenging datasets, on reactive chemistry at surfaces and on the stability and elastic properties of ternary alloys, we show that frozen transfer learning with 10-20% of the data (hundreds of datapoints) achieves similar accuracies to models trained from scratch (on thousands of datapoints). Moreover, we show that an equally accurate, but significantly more efficient, surrogate model can be built using the transfer-learned potential as the ground truth. In combination, we present a simulation workflow for machine learning potentials that improves data efficiency and computational efficiency.
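The core of frozen transfer learning, reusing pretrained weights and updating only a subset, can be shown on a toy two-parameter model. The paper freezes layers of a message-passing potential, not a linear regressor; everything below is a stand-in that only illustrates the mechanism: pretrain on a source task, then fine-tune on a shifted target task with one parameter frozen.

```python
def fit(xs, ys, w, b, freeze_w=False, lr=0.05, steps=500):
    """Gradient descent on mean squared error for y = w*x + b.
    With freeze_w=True only the bias is updated (frozen transfer)."""
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            gw += 2 * err * x / len(xs)
            gb += 2 * err / len(xs)
        if not freeze_w:
            w -= lr * gw
        b -= lr * gb
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
src_ys = [2 * x + 1.0 for x in xs]           # "foundation" training data
tgt_ys = [2 * x + 3.0 for x in xs]           # target task: same slope, shifted offset

w, b = fit(xs, src_ys, w=0.0, b=0.0)         # pretraining from scratch
w, b = fit(xs, tgt_ys, w, b, freeze_w=True)  # frozen fine-tune: only b moves
print(round(w, 3), round(b, 3))              # → 2.0 3.0
```

Because the frozen parameter already encodes the shared structure (the slope), the fine-tune converges with only the cheap-to-learn offset left to fit, which is the intuition behind matching from-scratch accuracy with 10-20% of the labels.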
Time series foundation models provide a universal solution for generating forecasts to support optimization problems in energy systems. These foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the resulting value of the forecast in the downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast value into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai, a state-of-the-art foundation model, which offers robust and generalized results with few-shot, parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art, classical prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in Average Daily Total Costs.
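The prediction-focused versus decision-focused distinction can be made concrete with a toy dispatch problem in which under-forecasting demand is costlier than over-forecasting. Two forecasts with the same squared error then lead to very different downstream costs, and it is this cost, not the forecast error, that decision-focused fine-tuning optimises. The penalties and numbers below are illustrative, not taken from the paper.

```python
def dispatch_cost(forecast, demand, under_penalty=5.0, over_penalty=1.0):
    """Cost of committing energy based on a forecast, then settling the gap:
    a shortfall must be covered at a high penalty, surplus is merely wasted."""
    gap = demand - forecast
    return under_penalty * gap if gap > 0 else over_penalty * (-gap)

demand = 10.0
low, high = 8.0, 12.0   # both off by 2, so identical squared error...
print(dispatch_cost(low, demand), dispatch_cost(high, demand))
# → 10.0 2.0  ...but very different decision costs
```

A prediction-focused loss cannot distinguish the two forecasts, while training against `dispatch_cost` would push the model towards the cheaper, slightly high forecast.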
This study investigates the generalisation and explainability challenges of Robotic Foundation Models (RFMs) in industrial applications, using Octo as a representative case study. Motivated by the scarcity of domain-specific data and the need for safe evaluation environments, we adopt a simulation-first approach: instead of transitioning from simulation to real-world scenarios, we aim to adapt real-world-trained RFMs to synthetic, simulated environments, a critical step towards their safe and effective industrial deployment. While Octo promises zero-shot generalisation, our experiments reveal significant performance degradation when it is applied in simulation, despite minimal task and observation domain shifts. To explain this behaviour, we introduce a modified Grad-CAM technique that enables insight into Octo's internal reasoning and focus areas. Our results highlight key limitations in Octo's visual generalisation and language grounding capabilities under distribution shifts. We further identify architectural and benchmarking challenges across the broader RFM landscape. Based on our findings, we propose concrete guidelines for future RFM development, with an emphasis on explainability, modularity, and robust benchmarking: critical enablers for applying RFMs in safety-critical and data-scarce industrial environments.
With the rapid development of artificial intelligence, computational pathology has been seamlessly integrated into the entire clinical workflow, which encompasses diagnosis, treatment, prognosis, and biomarker discovery. This integration has significantly enhanced clinical accuracy and efficiency while reducing the workload for clinicians. Traditionally, research in this field has depended on the collection and labeling of large datasets for specific tasks, followed by the development of task-specific computational pathology models. However, this approach is labor intensive and does not scale efficiently for open-set identification or rare diseases. Given the diversity of clinical tasks, training individual models from scratch to address the whole spectrum of clinical tasks in the pathology workflow is impractical, which highlights the urgent need to transition from task-specific models to foundation models (FMs). In recent years, pathological FMs have proliferated. These FMs can be classified into three categories, namely, pathology image FMs, pathology image-text FMs, and pathology image-gene FMs, each of which results in distinct functionalities and application scenarios. This review provides an overview of the latest research advancements in pathological FMs, with a particular emphasis on their applications in oncology. The key challenges and opportunities presented by pathological FMs in precision oncology are also explored.
Funding: Supported by the Open Project Program of Panxi Crops Research and Utilization Key Laboratory of Sichuan Province, No. SZKF202302, and the Fundamental Research Funds for the Central Universities, No. 2019CDYGYB024.
Funding: Funded by the Spanish Government and FEDER funds (AEI/FEDER, UE) under grant PID2021-124502OB-C42 (PRESECREL), and by the predoctoral program "Concepción Arenal del Programa de Personal Investigador en formación Predoctoral" funded by Universidad de Cantabria and Cantabria's Government (BOC 18-10-2021).
Funding: Supported by the Strategic Project of Precision Surgery, Tsinghua University; the Initiative Scientific Research Program, Institute for Intelligent Healthcare, Tsinghua University; the Tsinghua-Foshan Institute of Advanced Manufacturing; the National Natural Science Foundation of China (61735016); the Beijing Nova Program (20230484308); the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001); the Youth Elite Program of Beijing Friendship Hospital (YYQCJH2022-9); and the Science and Technology Program of Beijing Tongzhou District (KJ2023CX012).
Abstract: Foundation models (FMs) have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. We therefore propose a workflow based on foundation models and zero training to solve photoacoustic (PA) image processing tasks. We employed the Segment Anything Model (SAM) by setting simple prompts and integrating the model's outputs with prior knowledge of the imaged objects to accomplish various tasks, including: (1) removing the skin signal in three-dimensional PA image rendering; (2) dual speed-of-sound reconstruction; and (3) segmentation of finger blood vessels. Through these demonstrations, we conclude that FMs can be applied directly in PA imaging without the requirement for network design and training. This potentially allows for a hands-on, convenient approach to achieving efficient and accurate segmentation of PA images. This paper serves as a comprehensive tutorial, facilitating mastery of the technique through the provision of code and sample datasets.
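The prompt-driven interface behind task (3) reduces to: place a point prompt on the structure of interest and receive a mask back. The toy sketch below mimics that interface with a simple region grower on a synthetic image; it is purely illustrative, as the paper uses SAM itself (in the official `segment-anything` package, a `SamPredictor.predict(point_coords=...)` call), which this placeholder does not reproduce.

```python
import numpy as np
from collections import deque

def segment_from_point(image, seed, tol=0.2):
    """Toy stand-in for SAM's point-prompt segmentation: grow a region
    from the prompt location over 4-connected pixels whose intensity is
    within `tol` of the seed pixel's intensity."""
    h, w = image.shape
    mask = np.zeros_like(image, dtype=bool)
    ref = image[seed]
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(image[ny, nx] - ref) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Synthetic PA-like image: a bright "vessel" stripe on a dark background.
img = np.zeros((32, 32))
img[10:14, :] = 1.0

# A single point prompt inside the vessel yields the full vessel mask.
mask = segment_from_point(img, (11, 5))
```

The workflow's appeal is exactly this interaction pattern: the user supplies a cheap prior (one click) and no task-specific training is needed.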
Funding: Supported by the National Natural Science Foundation of China (62225302, 623B2014, and 62173023).
Abstract: With the emergence of general foundation models such as the Chat Generative Pre-trained Transformer (ChatGPT), researchers have shown considerable interest in the potential applications of foundation models in the process industry. This paper provides a comprehensive overview of the challenges and opportunities presented by the use of foundation models in the process industry, including the frameworks, core applications, and future prospects. First, this paper proposes a framework for foundation models for the process industry. Second, it summarizes the key capabilities of industrial foundation models and their practical applications. Finally, it highlights future research directions and identifies unresolved open issues related to the use of foundation models in the process industry.
Funding: We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based pre-trained language models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities, such as in-context learning, that smaller models lack. The advancement in large language models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62001246, 62231017, 62201277, and 62071255; the Natural Science Foundation of Jiangsu Province under Grant BK20220390; the Key R&D Program of Jiangsu Province (key project and topics) under Grants BE2021095 and BE2023035; the Natural Science Research Startup Foundation for Recruiting Talents of Nanjing University of Posts and Telecommunications (Grant No. NY221011); the National Science Foundation of Xiamen, China (No. 3502Z202372013); and the Open Project of the Key Laboratory of Underwater Acoustic Communication and Marine Information Technology (Xiamen University) of the Ministry of Education, China (No. UAC202304).
Abstract: Among the future development directions of sixth generation (6G) mobile communication, several communication models have been proposed to face the growing challenges of the task. The rapid development of artificial intelligence (AI) foundation models provides significant support for efficient and intelligent communication interactions. In this paper, we propose an innovative semantic communication paradigm: a task-oriented semantic communication system with foundation models. First, we segment the image using task prompts based on the Segment Anything Model (SAM) and Contrastive Language-Image Pre-training (CLIP). Meanwhile, we adopt Bézier curves to enhance the mask and improve the segmentation accuracy. Second, we apply differentiated semantic compression and transmission approaches to the segmented content. Third, we fuse the different semantic information based on a conditional diffusion model to generate high-quality images that satisfy the users' specific task requirements. Finally, experimental results show that the proposed system compresses the semantic information effectively and improves the robustness of semantic communication.
Funding: Supported by the Natural Science Foundation of China (Grant No. 82201195).
Abstract: Background: Vision and vision-language foundation models, a subset of advanced artificial intelligence (AI) frameworks, have shown transformative potential in various medical fields. In ophthalmology, these models, particularly large language models and vision-based models, have demonstrated great potential to improve diagnostic accuracy, enhance treatment planning, and streamline clinical workflows. However, their deployment in ophthalmology has faced several challenges, particularly regarding generalizability and integration into clinical practice. This systematic review aims to summarize the current evidence on the use of vision and vision-language foundation models in ophthalmology, identifying key applications, outcomes, and challenges. Main text: A comprehensive search of PubMed, Web of Science, Scopus, and Google Scholar was conducted to identify studies published between January 2020 and July 2025. Studies were included if they developed or applied foundation models, such as vision-based models and large language models, to clinically relevant ophthalmic applications. A total of 10 studies met the inclusion criteria, covering areas such as retinal diseases, glaucoma, and ocular surface tumors. The primary outcome measures were model performance metrics, integration into clinical workflows, and the clinical utility of the models. Additionally, the review explored the limitations of foundation models, such as the reliance on large datasets, computational resources, and interpretability challenges. The majority of studies demonstrated that foundation models could achieve high diagnostic accuracy, with several reports indicating excellent performance comparable to or exceeding that of experienced clinicians. Foundation models achieved accuracy rates of up to 95% for diagnosing retinal diseases, with similar performance for detecting glaucoma progression. Despite promising results, concerns about algorithmic bias, overfitting, and the need for diverse training data were common. High computational demands, EHR compatibility, and the need for clinician validation also posed challenges. Additionally, model interpretability issues hindered clinician trust and adoption. Conclusions: Vision and vision-language foundation models in ophthalmology show significant potential for advancing diagnostic accuracy and treatment strategies, particularly in retinal diseases, glaucoma, and ocular oncology. However, challenges such as data quality, transparency, and ethical considerations must be addressed. Future research should focus on refining model performance, improving interpretability and generalizability, and exploring strategies for integrating these models into routine clinical practice to maximize their impact in clinical ophthalmology.
Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDB0500102, and by Laoshan Laboratory under Grant No. LSKJ202300305.
Abstract: Deep learning's continuous evolution has driven the creation of increasingly large foundation models, such as GPT-3, which require optimized performance on large-scale computing platforms. The new Sunway supercomputer, equipped with numerous SW26010pro processors, supports AI workloads in both all-shared and single-CG (core group) modes. However, existing optimizations primarily target AI operators such as general matrix multiplication (GEMM) in the single-CG mode, leaving challenges in scaling performance across all six CGs in the all-shared mode. This paper introduces SwFormer, a framework designed to accelerate foundation models via intra-op tiling and inter-op scheduling. The intra-op tiling method breaks operators down into fine-grained tiled kernels and employs an offline, profiling-based approach to determine the optimal tiling strategy. The inter-op scheduling method employs heuristic graph traversal algorithms to automatically reorder the computation of these tiled kernels, thereby maximizing hardware utilization. Compared with operator libraries for the all-shared mode such as SWDNNv2 and SWattention, SwFormer's intra-op tiling method accelerates end-to-end training of the GPT-3 6.7B and 13B models by up to 1.27x. Evaluated with GPT-style models, the inter-op scheduling method further outperforms the intra-op tiling method by up to 1.32x.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62372470, 72225011, 62402414, U23B2059, 62173034, 32222070, 62402017, 72421002, 62206303, 62476264, 62406312, 62102266, 52173241, and U23A20468; the National Key Research and Development Program of China (2023YFD1900604); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB0680301); the Youth Innovation Promotion Association CAS (2023112); the National High Level Hospital Clinical Research Funding (2022-PUMCH-A-014); the Beijing Natural Science Foundation (4244098); the Science and Technology Innovation Program of Hunan Province (2023RC3009); the Key Research and Development Program of Yunnan Province (202202AE090034); the MNR Key Laboratory for Geo-Environmental Monitoring of Greater Bay Area (GEMLab-2023001); the Science and Technology Innovation Key R&D Program of Chongqing (CSTB2024TIAD-STX0024); the China National Postdoctoral Program for Innovative Talents (BX20240385); and the River Talent Recruitment Program of Guangdong Province (2019ZT08X603).
Abstract: Intelligent decision-making (IDM) is a cornerstone of artificial intelligence (AI), designed to automate or augment decision processes. Modern IDM paradigms integrate advanced frameworks, such as AI agents and high-level reinforcement learning, to enable intelligent agents to make effective, adaptive choices and to decompose complex tasks into manageable steps. Recent advances in multimodal foundation-based approaches unify diverse input modalities, such as vision, language, and sensory data, into a cohesive decision-making process. Foundation models (FMs) have become pivotal in science and industry, transforming decision-making and research capabilities. Their large-scale, multimodal data-processing abilities foster adaptability and interdisciplinary breakthroughs across fields such as healthcare, the life sciences, and education. This survey examines IDM's evolution and advanced paradigms built on FMs, as well as their transformative impact on decision-making across diverse scientific and industrial domains, highlighting the challenges and opportunities in building efficient, adaptive, and ethical decision systems.
Funding: Supported by the Guangdong Provincial Science and Technology Program (Grant No. 2023A0505030003).
Abstract: Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in the open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve the target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Funding: Sponsored by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (The Netherlands Organization for Scientific Research, NWO), Domain Science, for the use of supercomputer facilities. The authors also acknowledge the use of the DelftBlue supercomputer, provided by the Delft High Performance Computing Center (https://www.tudelft.nl/dhpc).
Abstract: Universal machine-learning interatomic potentials (uMLIPs) are emerging as foundation models for atomistic simulation, offering near-ab initio accuracy at far lower cost. Their safe, broad deployment is limited by the absence of reliable, general uncertainty estimates. We present a unified, scalable uncertainty metric, U, built from a heterogeneous ensemble that reuses existing pretrained MLIPs. Across diverse chemistries and structures, U strongly tracks true prediction errors and robustly ranks configuration-level risk. Using U, we perform uncertainty-aware distillation to train system-specific potentials with far fewer labels: for tungsten, we match full density functional theory (DFT) training using 4% of the DFT data; for MoNbTaW, a dataset distilled by U supports high-accuracy potential training. By filtering numerical label noise, the distilled models can in some cases exceed the accuracy of MLIPs trained directly on DFT data. This framework provides a practical reliability monitor and guides data selection and fine-tuning, enabling cost-efficient, accurate, and safer deployment of foundation models.
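The ensemble-disagreement idea behind this abstract can be made concrete. The paper's exact construction of U is not given here, so the snippet below uses plain standard deviation across ensemble members as an assumed proxy, with made-up energy predictions from three hypothetical pretrained uMLIPs; the ranking step mirrors the uncertainty-aware data selection described above.

```python
import numpy as np

def ensemble_uncertainty(predictions):
    """Per-configuration uncertainty from a heterogeneous ensemble,
    taken here as the standard deviation of member predictions
    (an assumed stand-in for the paper's unified metric U)."""
    preds = np.asarray(predictions)  # shape: (n_models, n_configs)
    return preds.std(axis=0)

# Hypothetical energies (eV) from three pretrained uMLIPs on 4 structures.
preds = [
    [-8.01, -7.52, -6.90, -5.10],
    [-8.03, -7.49, -6.70, -4.60],
    [-7.99, -7.55, -6.95, -5.55],
]
U = ensemble_uncertainty(preds)

# Highest-disagreement configurations first: the candidates to label
# with DFT when distilling a system-specific potential.
ranking = np.argsort(-U)
```

Reusing already-trained models as the ensemble is what keeps the monitor cheap: no new training is needed to score configuration-level risk.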
Abstract: 1 Introduction. With rapid development in computing power and breakthroughs in deep learning, the concept of "foundation models" has been introduced into the AI community. Generally, foundation models are large models trained on massive data that can be easily adapted to different domains for various tasks. With specific prompts, foundation models can generate text and images, or even animate scenarios based on the given descriptions. Owing to these powerful capabilities, there is a growing trend of building agents based on foundation models. In this paper, we investigate agents empowered by foundation models.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62206266 and 62372430, and by the Youth Innovation Promotion Association CAS (No. 2023112).
Abstract: INTRODUCTION. In recent years, the development of large-scale foundation models (LFMs) has made great advances. However, high training costs and computational demands have long been a bottleneck for the widespread adoption of this technology. With technological advancements, this situation is undergoing a fundamental transformation. The recent release of DeepSeek-V3 has sparked extensive discussion. Through innovative architectural design and efficient training strategies, it has significantly reduced training costs while achieving performance comparable to top-tier closed-source models. The pre-training cost of DeepSeek-V3 is only $5.576 million, far lower than the hundreds of millions of dollars required for models like GPT-4. As shown in Figure 1, this breakthrough not only marks the democratization of LFM technology but also opens up opportunities for more small- and medium-sized enterprises and research institutions to participate in AI innovation. In the future, LFMs will no longer be a game for the few.
Abstract: Large language models, commonly known as LLMs, are showing promise in tackling some of the most complex tasks in AI. In this perspective, we review the wider field of foundation models, of which LLMs are a component, and their application to the field of materials discovery. In addition to the current state of the art, including applications to property prediction, synthesis planning, and molecular generation, we also look to the future and posit how new methods of data capture, and indeed new modalities of data, will influence the direction of this emerging field.
Funding: Supported by the National Natural Science Foundation of China under Grants 42030102 and 42371321, and by the Ant Group.
Abstract: DATA AND COMPUTILITY ISLANDS IN REMOTE SENSING FOR EO. The rapid advancement of Earth observation (EO) capabilities is driving an explosive increase in remote sensing data, and there is an urgent need for advanced processing techniques to unleash their application value. Generalist EO intelligence refers to the ability to provide unified support for qualitative interpretation, quantitative inversion, and interactive dialogue across diverse EO data and tasks. It has attracted significant attention recently, prompting academia, industry, and government to invest substantial resources. Through the development of remote sensing foundation models (RSFMs), generalist EO intelligence can ultimately offer humanity a shared spatial-temporal intelligence service in various fields (e.g., agriculture, forestry, and oceanography). However, a critical question remains: have we truly unleashed the potential of RSFMs for generalist EO intelligence? Despite the vast volume of remote sensing data, their distribution is often fragmented and decentralized due to privacy concerns, storage bottlenecks, industrial competition, and geo-information security. This fragmentation leads to data islands, which limit the full utilization of multi-source remote sensing data. Moreover, computility (i.e., computational resources) typically develops in isolation, inadequately supporting the large-scale training and application of RSFMs.
Funding: Supported by the "Transferring Exascale Computational Chemistry to Cloud Computing Environment and Emerging Hardware Technologies (TEC4)" project, funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences, and Biosciences (under FWP 82037), and by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences & Biosciences (under FWP 47319). Pacific Northwest National Laboratory (PNNL) is a multiprogram national laboratory operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under Contract No. DE-AC05-76RL0-1830.
Abstract: For neural network potentials (NNPs) to gain widespread use, researchers must be able to trust model outputs. However, the black-box nature of neural networks and their inherent stochasticity are often deterrents, especially for foundation models trained over broad swaths of chemical space. Uncertainty information provided at the time of prediction can help reduce aversion to NNPs. In this work, we detail two uncertainty quantification (UQ) methods. Readout ensembling, by fine-tuning the readout layers of an ensemble of foundation models, provides information about model uncertainty, while quantile regression, by replacing point predictions with distributional predictions, provides information about uncertainty within the underlying training data. We demonstrate our approach with the MACE-MP-0 model, applying UQ to the foundation model and a series of fine-tuned models. The uncertainties produced by the readout ensemble and quantile methods are shown to be distinct measures by which the quality of the NNP output can be judged.
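The second UQ method, quantile regression, replaces a point prediction with an estimate of a chosen quantile by minimizing the pinball loss. The self-contained sketch below fits constant quantiles of synthetic data by gradient descent; it illustrates the loss itself, not the MACE-MP-0 readout architecture the paper attaches it to.

```python
import numpy as np

def pinball_grad(q_pred, y, tau):
    """Gradient of the pinball (quantile) loss with respect to the
    prediction: -tau where y exceeds the prediction, (1 - tau) elsewhere.
    Its mean vanishes exactly when q_pred is the tau-quantile of y."""
    return np.where(y > q_pred, -tau, 1.0 - tau)

# Fit constant tau-quantiles of noisy synthetic data by gradient descent;
# this is the core of replacing a point prediction with a distributional one.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=5000)

estimates = {}
for tau in (0.1, 0.5, 0.9):
    q = 0.0
    for _ in range(2000):
        q -= 0.02 * pinball_grad(q, y, tau).mean()
    estimates[tau] = q
```

Training separate heads for several values of tau gives a prediction interval, and the interval width serves as the data-uncertainty signal described in the abstract.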
Funding: Supported by the UKRI Future Leaders Fellowship program (MR/S016023/1, MR/X023109/1); a UKRI frontier research grant (ERC StG, EP/X014088/1); the CASTEP-USER grant funded by UK Research and Innovation (EP/W030438/1); and the EPSRC-funded Centre for Doctoral Training in Modelling of Heterogeneous Systems (Het-Sys CDT, EP/S022848/1). High-performance computing resources were provided via the Scientific Computing Research Technology Platform of the University of Warwick.
Abstract: Machine-learned interatomic potentials are revolutionising atomistic materials simulations by providing accurate and scalable predictions within the scope covered by the training data. However, generation of an accurate and robust training dataset remains a challenge, often requiring thousands of first-principles calculations to achieve high accuracy. Foundation models have started to emerge with the ambition of creating universally applicable potentials across a wide range of materials. While foundation models can be robust and transferable, they do not yet achieve the accuracy required to predict reaction barriers, phase transitions, and material stability. This work demonstrates that foundation model potentials can reach chemical accuracy when fine-tuned using transfer learning with partially frozen weights and biases. For two challenging datasets, on reactive chemistry at surfaces and on the stability and elastic properties of ternary alloys, we show that frozen transfer learning with 10-20% of the data (hundreds of data points) achieves accuracy similar to that of models trained from scratch (on thousands of data points). Moreover, we show that an equally accurate but significantly more efficient surrogate model can be built using the transfer-learned potential as the ground truth. In combination, we present a simulation workflow for machine learning potentials that improves both data efficiency and computational efficiency.
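Frozen transfer learning in its simplest form keeps the pretrained body fixed and refits only the head on a small target dataset. The numpy sketch below is a minimal caricature under stated assumptions: a random frozen feature map stands in for a pretrained foundation potential, and a closed-form ridge head replaces gradient-based fine-tuning of the unfrozen layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Foundation" body: pretrained weights (random here, purely illustrative)
# that stay frozen during fine-tuning.
W_frozen = rng.normal(size=(10, 64))

def features(X):
    """Frozen feature map; never updated during transfer learning."""
    return np.tanh(X @ W_frozen)

# Small target dataset, mimicking the 10-20% data regime from the abstract.
X_small = rng.normal(size=(40, 10))
y_small = np.sin(X_small[:, 0]) + 0.1 * rng.normal(size=40)

# Frozen transfer learning in miniature: refit only the readout head,
# here in closed form with a small ridge penalty.
Phi = features(X_small)
head = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(64), Phi.T @ y_small)

pred = Phi @ head
train_rmse = float(np.sqrt(np.mean((pred - y_small) ** 2)))
```

Because only the head's parameters are fit, far fewer labels are needed than when training the whole network from scratch, which is the data-efficiency argument the abstract makes.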
Funding: Funded by the Helmholtz Association's Initiative and Networking Fund through Helmholtz AI; the Helmholtz Association under the Program "Energy System Design"; and the German Research Foundation (DFG) as part of the Research Training Group 2153, "Energy Status Data: Informatics Methods for its Collection, Analysis and Exploitation". Supported by the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT partition, and by the KIT Publication Fund of the Karlsruhe Institute of Technology.
Abstract: Time series foundation models provide a universal solution for generating the forecasts that support optimization problems in energy systems. These foundation models are typically trained in a prediction-focused manner to maximize forecast quality. In contrast, decision-focused learning directly improves the value of the forecast in downstream optimization rather than merely maximizing forecasting quality. The practical integration of forecast value into forecasting models is challenging, particularly when addressing complex applications with diverse instances, such as buildings. This becomes even more complicated when instances possess specific characteristics that require instance-specific, tailored predictions to increase the forecast value. To tackle this challenge, we use decision-focused fine-tuning within time series foundation models to offer a scalable and efficient solution for decision-focused learning applied to the dispatchable feeder optimization problem. To obtain more robust predictions for scarce building data, we use Moirai, a state-of-the-art foundation model, which offers robust, generalized results with few-shot parameter-efficient fine-tuning. Comparing the decision-focused fine-tuned Moirai with a state-of-the-art, classical prediction-focused fine-tuned Moirai, we observe an improvement of 9.45% in Average Daily Total Costs.
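The gap between prediction-focused and decision-focused objectives can be made concrete with a toy dispatch problem: when under-forecasting is costlier than over-forecasting, the MSE-optimal forecast (the mean) is not cost-optimal, and tuning even a single bias parameter against the downstream cost reduces it. All numbers below are synthetic assumptions, not the paper's dispatchable-feeder setup.

```python
import numpy as np

rng = np.random.default_rng(3)
demand = rng.normal(10.0, 2.0, size=4000)  # synthetic building load

def downstream_cost(forecast, actual, c_under=5.0, c_over=1.0):
    """Asymmetric dispatch cost: under-forecasting (shortfall) is pricier
    than over-forecasting (surplus). Prices are made up for illustration."""
    short = np.maximum(actual - forecast, 0.0)
    surplus = np.maximum(forecast - actual, 0.0)
    return c_under * short + c_over * surplus

# Prediction-focused forecast: the mean, which is optimal for MSE.
pf_forecast = demand.mean()

# Decision-focused "fine-tuning" of a single bias parameter: choose the
# shift that minimizes the downstream cost, not the squared error.
biases = np.linspace(-2.0, 6.0, 161)
costs = [downstream_cost(pf_forecast + b, demand).mean() for b in biases]
best_bias = float(biases[int(np.argmin(costs))])

pf_cost = float(downstream_cost(pf_forecast, demand).mean())
df_cost = float(downstream_cost(pf_forecast + best_bias, demand).mean())
```

With a 5:1 cost asymmetry the cost-optimal forecast sits near the 5/6 quantile of demand, so the learned bias is positive: the decision-focused objective deliberately over-forecasts to hedge the expensive shortfall.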
Abstract: This study investigates the generalisation and explainability challenges of Robotic Foundation Models (RFMs) in industrial applications, using Octo as a representative case study. Motivated by the scarcity of domain-specific data and the need for safe evaluation environments, we adopt a simulation-first approach: instead of transitioning from simulation to real-world scenarios, we aim to adapt real-world-trained RFMs to synthetic, simulated environments, a critical step towards their safe and effective industrial deployment. While Octo promises zero-shot generalisation, our experiments reveal significant performance degradation when it is applied in simulation, despite minimal task and observation domain shifts. To explain this behaviour, we introduce a modified Grad-CAM technique that provides insight into Octo's internal reasoning and focus areas. Our results highlight key limitations in Octo's visual generalisation and language-grounding capabilities under distribution shifts. We further identify architectural and benchmarking challenges across the broader RFM landscape. Based on our findings, we propose concrete guidelines for future RFM development, with an emphasis on explainability, modularity, and robust benchmarking, which are critical enablers for applying RFMs in safety-critical and data-scarce industrial environments.
Funding: Funded by the Science and Technology Innovation Key R&D Program of Chongqing (No. CSTB-2022TIAD-STX0008); the Natural Science Foundation of China (Nos. 62402473 and 62271465); and the Suzhou Basic Research Program (No. SYG202338).
Abstract: With the rapid development of artificial intelligence, computational pathology has been seamlessly integrated into the entire clinical workflow, which encompasses diagnosis, treatment, prognosis, and biomarker discovery. This integration has significantly enhanced clinical accuracy and efficiency while reducing the workload for clinicians. Traditionally, research in this field has depended on the collection and labeling of large datasets for specific tasks, followed by the development of task-specific computational pathology models. However, this approach is labor-intensive and does not scale efficiently to open-set identification or rare diseases. Given the diversity of clinical tasks, training individual models from scratch to address the whole spectrum of clinical tasks in the pathology workflow is impractical, which highlights the urgent need to transition from task-specific models to foundation models (FMs). In recent years, pathological FMs have proliferated. These FMs can be classified into three categories, namely pathology image FMs, pathology image-text FMs, and pathology image-gene FMs, each of which offers distinct functionalities and application scenarios. This review provides an overview of the latest research advancements in pathological FMs, with a particular emphasis on their applications in oncology. The key challenges and opportunities presented by pathological FMs in precision oncology are also explored.