Abstract: Large models, such as large language models (LLMs), vision-language models (VLMs), and multimodal agents, have become key elements of artificial intelligence (AI) systems. Their rapid development has greatly improved perception, generation, and decision-making in various fields. However, their vast scale and complexity bring new security challenges. Issues such as backdoor vulnerabilities during training, jailbreaking in multimodal reasoning, and data provenance and copyright auditing have made security a critical focus for both academia and industry.
Abstract: Over the past decade, large-scale pre-trained autoregressive and diffusion models have rejuvenated the field of text-guided image generation. However, these models require enormous datasets and parameter counts, and their multi-step generation processes are often inefficient and difficult to control. To address these challenges, we propose CAFE-GAN, a CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination, which incorporates a pre-trained CLIP model along with several key architectural innovations. First, we embed a coordinate attention mechanism into the generator to capture long-range dependencies and enhance feature representation. Second, we introduce a trainable linear projection layer after the CLIP text encoder, which aligns textual embeddings with the generator’s semantic space. Third, we design a multi-scale discriminator that leverages pre-trained visual features and integrates a feature regularization strategy, thereby improving training stability and discrimination performance. Experiments on the CUB and COCO datasets demonstrate that CAFE-GAN outperforms existing text-to-image generation methods, achieving Fréchet Inception Distance (FID) scores of 9.84 on CUB and 5.62 on COCO while generating images with superior visual quality and semantic fidelity. These findings offer valuable insights for future research on efficient, controllable text-to-image synthesis.
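For reference, the FID quoted in this abstract is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception features of real and generated images; lower is better. The symbols below are notation assumed here, not taken from the abstract: $\mu_r, \Sigma_r$ are the feature mean and covariance of real images, and $\mu_g, \Sigma_g$ those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```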
Funding: This work was supported by the National Key R&D Program of China (2023YFC2415200), the National Natural Science Foundation of China (82361168664, 82372053, 82441018, U24A20759, 62222609, 62076236, 32350010, 82302407, 82302296), the Beijing Natural Science Foundation (JQ24048, 7232346), the Beijing Nova Program (20240484528), the Science and Technology Development Fund of Macao Special Administrative Region (0006/2023/AFJ), and the China Postdoctoral Science Foundation (2022M720357).
Abstract: Recent advances in large models demonstrate significant prospects for transforming the field of medical imaging. These models, including large language models, large visual models, and multimodal large models, offer unprecedented capabilities in processing and interpreting complex medical data across various imaging modalities. By leveraging self-supervised pretraining on vast unlabeled datasets, cross-modal representation learning, and domain-specific medical knowledge adaptation through fine-tuning, large models can achieve higher diagnostic accuracy and more efficient workflows for key clinical tasks. This review summarizes the concepts, methods, and progress of large models in medical imaging, highlighting their potential in precision medicine. The article first outlines the integration of multimodal data under large model technologies, approaches for training large models with medical datasets, and the need for robust evaluation metrics. It then explores how large models can revolutionize applications in critical tasks such as image segmentation, disease diagnosis, personalized treatment strategies, and real-time interactive systems, thus pushing the boundaries of traditional imaging analysis. Despite their potential, the practical implementation of large models in medical imaging faces notable challenges, including the scarcity of high-quality medical data, the need for optimized perception of imaging phenotypes, safety considerations, and seamless integration with existing clinical workflows and equipment. As research progresses, the development of more efficient, interpretable, and generalizable models will be critical to ensuring their reliable deployment across diverse clinical environments. This review aims to offer insights into the current state of the field and to suggest directions for future research that facilitate the broader adoption of large models in clinical practice.
Funding: Financial support from the National Natural Science Foundation of China (21991104).
Abstract: Microdispersion technology is crucial for a variety of applications in both the chemical and biomedical fields. The precise and rapid characterization of microdroplets and microbubbles is essential for research as well as for optimizing and controlling industrial processes. Traditional methods often rely on time-consuming manual analysis. Although some deep learning-based computer vision methods have been proposed for automated identification and characterization, these approaches often rely on supervised learning, which requires labeled data for model training. Producing these labels can be time-consuming and expensive, especially when working with large and complex datasets. To address these challenges, we propose Micro Flow SAM, an innovative, motion-prompted, annotation-free, and training-free instance segmentation approach. By using the motion of microdroplets and microbubbles as prompts, our method directs large-scale vision models to perform accurate instance segmentation without the need for annotated data or model training. This approach eliminates the need for human intervention in data labeling and reduces computational costs, significantly streamlining the data analysis process. We demonstrate the effectiveness of Micro Flow SAM across 12 diverse datasets, achieving outstanding segmentation results that are competitive with traditional methods. This novel approach not only accelerates the analysis process but also establishes a foundation for efficient process control and optimization in microfluidic applications. Micro Flow SAM represents a breakthrough in reducing the complexities and resource demands of instance segmentation, enabling faster insights and advancements in the microdispersion field.
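The idea of turning motion into prompts can be illustrated with a small sketch. The abstract does not describe how the authors derive their prompts, so everything here is an assumption made for illustration: simple frame differencing, a hypothetical `motion_point_prompts` helper, and the `threshold` and `min_pixels` parameters. Each connected region of changed pixels yields one centroid, which could then be passed as a point prompt to a promptable segmentation model.

```python
from collections import deque


def motion_point_prompts(prev_frame, next_frame, threshold=30, min_pixels=2):
    """Return one (row, col) centroid per connected region of changed pixels.

    Hypothetical helper: frames are 2D lists of grayscale intensities.
    """
    rows, cols = len(prev_frame), len(prev_frame[0])
    # Mark pixels whose intensity changed by more than the threshold.
    moved = [
        [abs(next_frame[r][c] - prev_frame[r][c]) > threshold for c in range(cols)]
        for r in range(rows)
    ]
    seen = [[False] * cols for _ in range(rows)]
    prompts = []
    for r in range(rows):
        for c in range(cols):
            if not moved[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and moved[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Discard tiny regions, which are more likely noise than a droplet.
            if len(region) >= min_pixels:
                cy = sum(p[0] for p in region) / len(region)
                cx = sum(p[1] for p in region) / len(region)
                prompts.append((cy, cx))
    return prompts


# Toy frames: a 2x2 blob appears, plus one isolated noisy pixel.
prev = [[0] * 6 for _ in range(5)]
nxt = [row[:] for row in prev]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    nxt[y][x] = 200
nxt[4][5] = 200  # single-pixel change, filtered out by min_pixels
prompts = motion_point_prompts(prev, nxt)
```

In this toy case the blob yields a single prompt at its centroid and the isolated pixel is ignored. With real video, differencing also marks the position an object has just left, so a practical pipeline would likely need a more careful motion estimate than this.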
Funding: Support from the National Natural Science Foundation of China (Grant Nos. 12173027, 12303105, 12173062); the National Key R&D Program of China (Grant Nos. 2023YFF0725300, 2022YFF0503402); the Science Research Grants from the Square Kilometre Array (SKA) (2020SKA0110100); the Science Research Grants from the China Manned Space Project (Grant Nos. CMS-CSST-2021-A01, CMS-CSST-2021-A07, CMS-CSST-2021-B05); the CAS Project for Young Scientists in Basic Research, China (Grant No. YSBR-062); the Young Data Scientist Project of the National Astronomical Data Center; and the Program of Science and Education Integration at the School of Astronomy and Space Science, University of Chinese Academy of Sciences, China.
Abstract: The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. In response, astronomers are turning to deep learning techniques, but these methods are limited by their specific training sets, leading to considerable duplicated effort. To overcome this issue, we built a framework for the general analysis of galaxy images based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratios of galaxy images and the imbalanced distribution of galaxy categories, we designed our LVM to incorporate a human-in-the-loop (HITL) module, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability for all the above-mentioned tasks on galaxy images in the DESI Legacy Imaging Surveys. In particular, for the object detection task, which was trained using 1000 data points, our DST in the LVM achieved an accuracy of 96.7%, while ResNet50 plus Mask R-CNN reached an accuracy of 93.1%. For morphological classification, to obtain an area under the curve (AUC) of ~0.9, LVM plus DST and HITL required only 1/50 of the training data that ResNet18 required. In addition, multimodal data can be integrated, which creates possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
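As background for the AUC figure quoted above, the area under the ROC curve for a binary classifier equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name `roc_auc` and the labels and scores are hypothetical and are not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = target morphology, 0 = any other class.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based formulation would be preferable for large evaluation sets.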