5 articles found
Special Topic on Security of Large Models
1
Authors: SU Zhou, DU Linkang. ZTE Communications, 2025, No. 3, pp. 1-2 (2 pages)
Large models, such as large language models (LLMs), vision-language models (VLMs), and multimodal agents, have become key elements in artificial intelligence (AI) systems. Their rapid development has greatly improved perception, generation, and decision-making in various fields. However, their vast scale and complexity bring about new security challenges. Issues such as backdoor vulnerabilities during training, jailbreaking in multimodal reasoning, and data provenance and copyright auditing have made security a critical focus for both academia and industry.
Keywords: large models; security; multimodal agents; multimodal reasoning; large language models (LLMs); vision-language models; data provenance; copyright auditing; backdoor vulnerabilities
CAFE-GAN: CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination
2
Authors: Xuanhong Wang, Hongyu Guo, Jiazhen Li, Mingchen Wang, Xian Wang, Yijun Zhang. Computers, Materials & Continua, 2026, No. 1, pp. 1742-1760 (19 pages)
Over the past decade, large-scale pre-trained autoregressive and diffusion models have rejuvenated the field of text-guided image generation. However, these models require enormous datasets and parameters, and their multi-step generation processes are often inefficient and difficult to control. To address these challenges, we propose CAFE-GAN, a CLIP-Projected GAN with Attention-Aware Generation and Multi-Scale Discrimination, which incorporates a pre-trained CLIP model along with several key architectural innovations. First, we embed a coordinate attention mechanism into the generator to capture long-range dependencies and enhance feature representation. Second, we introduce a trainable linear projection layer after the CLIP text encoder, which aligns textual embeddings with the generator's semantic space. Third, we design a multi-scale discriminator that leverages pre-trained visual features and integrates a feature regularization strategy, thereby improving training stability and discrimination performance. Experiments on the CUB and COCO datasets demonstrate that CAFE-GAN outperforms existing text-to-image generation methods, achieving lower Fréchet Inception Distance (FID) scores (9.84 on CUB and 5.62 on COCO) and generating images with superior visual quality and semantic fidelity. These findings offer valuable insights for future research on efficient, controllable text-to-image synthesis.
Keywords: large vision language models; deep learning; computer vision; text-to-image generation
Large models in medical imaging: Advances and prospects (cited by: 4)
3
Authors: Mengjie Fang, Zipei Wang, Sitian Pan, Xin Feng, Yunpeng Zhao, Dongzhi Hou, Ling Wu, Xuebin Xie, Xu-Yao Zhang, Jie Tian, Di Dong. Chinese Medical Journal, 2025, No. 14, pp. 1647-1664 (18 pages)
Recent advances in large models demonstrate significant prospects for transforming the field of medical imaging. These models, including large language models, large visual models, and multimodal large models, offer unprecedented capabilities in processing and interpreting complex medical data across various imaging modalities. By leveraging self-supervised pretraining on vast unlabeled datasets, cross-modal representation learning, and domain-specific medical knowledge adaptation through fine-tuning, large models can achieve higher diagnostic accuracy and more efficient workflows for key clinical tasks. This review summarizes the concepts, methods, and progress of large models in medical imaging, highlighting their potential in precision medicine. The article first outlines the integration of multimodal data under large model technologies, approaches for training large models with medical datasets, and the need for robust evaluation metrics. It then explores how large models can revolutionize applications in critical tasks such as image segmentation, disease diagnosis, personalized treatment strategies, and real-time interactive systems, thus pushing the boundaries of traditional imaging analysis. Despite their potential, the practical implementation of large models in medical imaging faces notable challenges, including the scarcity of high-quality medical data, the need for optimized perception of imaging phenotypes, safety considerations, and seamless integration with existing clinical workflows and equipment. As research progresses, the development of more efficient, interpretable, and generalizable models will be critical to ensuring their reliable deployment across diverse clinical environments. This review aims to provide insights into the current state of the field and provide directions for future research to facilitate the broader adoption of large models in clinical practice.
Keywords: artificial intelligence; large language model; large vision model; multimodal data; segmentation; diagnosis; interactive system
MicroFlowSAM: A motion-prompted instance segmentation approach in microfluidics with zero annotation and training
4
Authors: Wenle Xu, Lin Sheng, Tong Qiu, Kai Wang, Guangsheng Luo. Chinese Journal of Chemical Engineering, 2025, No. 11, pp. 103-114 (12 pages)
Microdispersion technology is crucial for a variety of applications in both the chemical and biomedical fields. The precise and rapid characterization of microdroplets and microbubbles is essential for research as well as for optimizing and controlling industrial processes. Traditional methods often rely on time-consuming manual analysis. Although some deep learning-based computer vision methods have been proposed for automated identification and characterization, these approaches often rely on supervised learning, which requires labeled data for model training. This dependency on labeled data can be time-consuming and expensive, especially when working with large and complex datasets. To address these challenges, we propose MicroFlowSAM, an innovative, motion-prompted, annotation-free, and training-free instance segmentation approach. By utilizing the motion of microdroplets and microbubbles as prompts, our method directs large-scale vision models to perform accurate instance segmentation without the need for annotated data or model training. This approach eliminates the need for human intervention in data labeling and reduces computational costs, significantly streamlining the data analysis process. We demonstrate the effectiveness of MicroFlowSAM across 12 diverse datasets, achieving outstanding segmentation results that are competitive with traditional methods. This novel approach not only accelerates the analysis process but also establishes a foundation for efficient process control and optimization in microfluidic applications. MicroFlowSAM represents a breakthrough in reducing the complexities and resource demands of instance segmentation, enabling faster insights and advancements in the microdispersion field.
Keywords: microfluidics; microdispersion; instance segmentation; large vision model; prompt engineering
A versatile framework for analyzing galaxy image data by incorporating Human-in-the-loop in a large vision model
5
Authors: Ming-Xiang Fu, Yu Song, Jia-Meng Lv, Liang Cao, Peng Jia, Nan Li, Xiang-Ru Li, Ji-Feng Liu, A-Li Luo, Bo Qiu, Shi-Yin Shen, Liang-Ping Tu, Li-Li Wang, Shou-Lin Wei, Hai-Feng Yang, Zhen-Ping Yi, Zhi-Qiang Zou. Chinese Physics C (indexed in SCIE, CAS, CSCD), 2024, No. 9, pp. 176-187 (12 pages)
The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. In response, astronomers are turning to deep learning techniques, but these methods are limited by their specific training sets, leading to considerable duplicate workloads. To overcome this issue, we built a framework for the general analysis of galaxy images based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratios of galaxy images and the imbalanced distribution of galaxy categories, we designed our LVM to incorporate a Human-in-the-loop (HITL) module, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability for all the abovementioned tasks on galaxy images in the DESI Legacy Imaging Surveys. In particular, for the object detection task, which was trained using 1000 data points, our DST in the LVM achieved an accuracy of 96.7%, while ResNet50 plus Mask R-CNN reached an accuracy of 93.1%. For morphological classification, to obtain an area under the curve (AUC) of ~0.9, LVM plus DST and HITL required only 1/50 of the training data that ResNet18 required. In addition, multimodal data can be integrated, which creates possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-messenger astronomy.
Keywords: artificial intelligence; large vision model; human-in-the-loop; astronomy; galaxies