Funding: Supported by the Natural Science Foundation Programme of Gansu Province (No. 24JRRA231), the National Natural Science Foundation of China (No. 62061023), and the Gansu Provincial Science and Technology Plan Key Research and Development Program Project (No. 24YFFA024).
Abstract: Despite its remarkable performance on natural images, the segment anything model (SAM) lacks domain-specific information in medical imaging and faces the challenge of losing local multi-scale information in the encoding phase. This paper presents a medical image segmentation model based on SAM with a local multi-scale feature encoder (LMSFE-SAM) to address these issues. Firstly, building on SAM, a local multi-scale feature encoder is introduced to improve the representation of features within the local receptive field, thereby supplying the Vision Transformer (ViT) branch in SAM with enriched local multi-scale contextual information. At the same time, a multiaxial Hadamard product module (MHPM) is incorporated into the local multi-scale feature encoder in a lightweight manner to reduce quadratic complexity and noise interference. Subsequently, a cross-branch balancing adapter is designed to balance local and global information between the local multi-scale feature encoder and the ViT encoder in SAM. Finally, to allow a smaller input image size and mitigate overlap in the patch embeddings, the input image size is reduced from 1024×1024 to 256×256 pixels, and a multidimensional information adaptation component is developed, comprising feature adapters, position adapters, and channel-spatial adapters. This component effectively integrates the information from small medical images into SAM, enhancing its suitability for clinical deployment. The proposed model demonstrates an average improvement ranging from 0.0387 to 0.3191 across six objective evaluation metrics on the BUSI, DDTI, and TN3K datasets compared with eight other representative image segmentation models. This significantly enhances the performance of SAM on medical images, providing clinicians with a powerful tool for clinical diagnosis.
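The benefit of shrinking the input from 1024×1024 to 256×256 pixels can be made concrete with a little arithmetic. Assuming the standard SAM/ViT patch embedding of 16×16 pixels per token (an assumption; the paper does not state its patch size here), the token count, and hence the quadratic self-attention cost, drops sharply:

```python
# Illustrative arithmetic (not code from the paper): how shrinking the input
# image reduces the ViT token count and the quadratic self-attention cost,
# assuming a 16x16 patch embedding as in the standard SAM image encoder.
PATCH = 16

def vit_tokens(image_size: int, patch: int = PATCH) -> int:
    """Number of patch tokens for a square image of the given side length."""
    side = image_size // patch
    return side * side

tokens_1024 = vit_tokens(1024)  # 64 * 64 = 4096 tokens
tokens_256 = vit_tokens(256)    # 16 * 16 = 256 tokens

# Self-attention cost grows with the square of the token count.
attention_ratio = (tokens_1024 ** 2) / (tokens_256 ** 2)
print(tokens_1024, tokens_256, attention_ratio)  # 4096 256 256.0
```

Under this assumption, a 4× reduction in side length yields a 16× reduction in tokens and a 256× reduction in attention cost, which is why the smaller input also helps clinical deployment.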
Funding: Supported by the National Natural Science Foundation of China (42372175, 72088101), the PetroChina Science and Technology Project (2023DJ84), and the Basic Research Cooperation Project between China National Petroleum Corporation and Peking University.
Abstract: Existing sandstone rock structure evaluation methods rely on visual inspection, suffering from low efficiency, only semi-quantitative analysis of roundness, and an inability to perform classified statistics in particle size analysis. This study presents an intelligent evaluation method for sandstone rock structure based on the Segment Anything Model (SAM). By developing a lightweight SAM fine-tuning method with rank-decomposition matrix adapters, a multispectral rock particle segmentation model named CoreSAM is constructed, which achieves rock particle edge extraction and type identification. Building upon this, we propose a comprehensive quantitative evaluation system for rock structure, assessing parameters including particle size, sorting, roundness, particle contact, and cementation types. The experimental results demonstrate that CoreSAM outperforms existing methods in rock particle segmentation accuracy while showing excellent generalization across different image types such as CT scans and core photographs. The proposed method enables full-sample, classified particle size analysis and quantitative characterization of parameters such as roundness, advancing reservoir evaluation towards more precise, quantitative, intuitive, and comprehensive development.
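The "rank-decomposition matrix adapter" idea referenced above can be sketched as follows. This is a minimal illustration of the general low-rank adaptation pattern, not CoreSAM's published code: a frozen weight matrix W is augmented with a trainable update B·A of rank r, so only the small factors are fine-tuned.

```python
# Minimal sketch of a rank-decomposition adapter: the effective layer weight
# becomes W + B @ A, where B (d x r) and A (r x d) are the only trainable
# parameters. For d = 4, r = 1 this trains 8 values instead of 16.
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product (for illustration; real code uses a tensor library)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matadd(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def adapted_weight(w: Matrix, b: Matrix, a: Matrix) -> Matrix:
    """Effective weight W + B @ A of the adapted (frozen) layer."""
    return matadd(w, matmul(b, a))

w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d = 2)
b = [[1.0], [2.0]]            # d x r, trainable
a = [[3.0, 4.0]]              # r x d, trainable
print(adapted_weight(w, b, a))  # [[4.0, 4.0], [6.0, 9.0]]
```

Because the rank-r update touches only 2·d·r parameters per layer, fine-tuning stays lightweight even when the SAM backbone is large.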
Abstract: The use of AI technologies in remote sensing (RS) tasks has been the focus of many practitioners in both the professional and academic domains. This integration has the potential to provide more accessible interfaces and tools that allow people with little or no experience to interact intuitively with RS data of multiple formats. However, the use of AI and AI agents to help automate RS-related tasks is still in its infancy, with some frameworks and interfaces built on top of well-known vision language models (VLMs) such as GPT-4, the segment anything model (SAM), and Grounding DINO. These tools show promise and help draw guidelines on the potential and limitations of existing solutions concerning the use of such models. In this work, state-of-the-art AI foundation models (FMs) are reviewed and used in a multi-modal manner to ingest RS imagery and perform zero-shot object detection using natural language. The natural language input defines the classes or labels the model should look for; both inputs are then fed to the pipeline. The pipeline presented in this work makes up for the shortcomings of general-knowledge FMs by stacking pre-processing and post-processing applications around them; these include tiling to produce uniform patches of the original image for faster detection, and outlier rejection of redundant bounding boxes using statistical and machine learning methods. The pipeline was tested with UAV, aerial, and satellite images taken over multiple areas. The semantic segmentation accuracy improved from the original 64% to approximately 80%-99% by utilizing the pipeline and techniques proposed in this work. GitHub Repository: MohanadDiab/LangRS.
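The two wrapper stages named above, tiling and redundant-box rejection, can be sketched as follows. This is an illustrative reconstruction under assumed names, not the LangRS repository code; the box-rejection step here uses a simple greedy IoU filter as one concrete instance of the statistical methods mentioned.

```python
# Sketch of two pipeline stages: (1) tiling an image into uniform patches so
# the detector sees manageable inputs, and (2) rejecting redundant bounding
# boxes whose IoU with an already-kept box exceeds a threshold.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def tile(width: int, height: int, size: int) -> List[Box]:
    """Split an image into non-overlapping size x size tiles (edge tiles clipped)."""
    return [(x, y, min(x + size, width), min(y + size, height))
            for y in range(0, height, size) for x in range(0, width, size)]

def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def reject_redundant(boxes: List[Box], thr: float = 0.5) -> List[Box]:
    """Greedy filter: keep a box only if it overlaps no kept box above thr."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, k) < thr for k in kept):
            kept.append(box)
    return kept

print(tile(1024, 512, 512))  # [(0, 0, 512, 512), (512, 0, 1024, 512)]
```

Detections from overlapping tiles often duplicate one another at tile borders, which is why a post-hoc rejection step is needed before computing accuracy.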
Abstract: Large-scale unsupervised semantic segmentation (LUSS) is a sophisticated process that aims to segment similar areas within an image without relying on labeled training data. While existing methodologies have made substantial progress in this area, there is ample scope for enhancement. We thus introduce the PASS-SAM model, a comprehensive solution that amalgamates the benefits of various models to improve segmentation performance.
Funding: Natural Science Foundation of Zhejiang Province, Grant/Award Number: LY23F020025; Science and Technology Commissioner Program of Huzhou, Grant/Award Number: 2023GZ42; Sichuan Provincial Science and Technology Support Program, Grant/Award Numbers: 2023ZHCG0005, 2023ZHCG0008.
Abstract: Data augmentation plays an important role in training deep neural models by expanding the size and diversity of the dataset. Initially, data augmentation mainly involved simple transformations of images. Later, to increase the diversity and complexity of data, more advanced methods appeared and evolved into sophisticated generative models. However, these methods require massive computation for training or searching. In this paper, a novel training-free method that utilises the pre-trained Segment Anything Model (SAM) as a data augmentation tool (PTSAM-DA) is proposed to generate augmented annotations for images. Without the need for training, it obtains prompt boxes from the original annotations and then feeds the boxes to the pre-trained SAM to generate diverse and improved annotations. In this way, annotations are augmented more ingeniously than by simple manipulations, without incurring the huge computation of training a data augmentation model. Multiple comparative experiments are conducted on three datasets: an in-house dataset, ADE20K, and COCO2017. On the in-house dataset, namely the Agricultural Plot Segmentation Dataset, maximum improvements of 3.77% and 8.92% are gained in two mainstream metrics, mIoU and mAcc, respectively. Consequently, large vision models like SAM prove promising not only in image segmentation but also in data augmentation.
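The first step of the method, deriving a prompt box from an original annotation, can be sketched as follows. This is an illustrative reconstruction (the paper's code is not shown here): the tight bounding box of a binary annotation mask becomes the box prompt that would be fed to the pre-trained SAM.

```python
# Minimal sketch of the prompt-box step: compute the tight bounding box
# (x1, y1, x2, y2) of the foreground pixels in a binary annotation mask.
# In PTSAM-DA this box would then prompt SAM to regenerate the annotation.
from typing import List, Optional, Tuple

def mask_to_prompt_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return the box enclosing all foreground (1) pixels, or None if empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_to_prompt_box(mask))  # (1, 1, 3, 3)
```

Because the box, not the original mask boundary, drives SAM, the regenerated annotation can differ from (and improve on) the human-drawn one, which is the source of the augmentation diversity.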
Abstract: "Fairy circles" are a spatially self-organized structure in coastal salt-marsh vegetation ecosystems, with an important influence on the productivity, stability, and resilience of salt-marsh wetlands. UAV imagery is a key data source for high-precision identification of fairy-circle locations and for interpreting their spatiotemporal evolution trends and patterns. However, fairy-circle pixels differ only slightly from background pixels in color and shape, so intelligently and accurately identifying fairy-circle pixels in 2D imagery and grouping the identified pixels into individual fairy circles remains a technical challenge. This paper proposes a fairy-circle segmentation and classification method for UAV imagery that combines the Segment Anything Model (SAM) visual segmentation model with random forest machine learning, achieving identification and extraction of individual fairy circles. First, by constructing Sørensen-Dice coefficient (Dice) and Intersection over Union (IoU) evaluation metrics, a pre-trained model is selected from SAM and its parameters are optimized, achieving fully automatic image segmentation and yielding attribute-free segmentation masks/classes. Then, the red, green, and blue (RGB) channel information and 2D spatial coordinates are used to match the segmentation masks with the original image, feature indices of the masks are constructed, and the features are analyzed and selected according to the reduction of out-of-bag (OOB) error and the feature distributions. Finally, the selected features are used to train a random forest model, enabling automatic identification and classification of fairy-circle vegetation, ordinary vegetation, and bare mudflats. Experimental results show that the proposed method achieves an average correct extraction rate of 96.1% and an average erroneous extraction rate of 9.5% for fairy circles, providing methodological and technical support for accurately characterizing the spatiotemporal pattern of fairy circles and for coastal UAV remote-sensing image processing.
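The two model-selection metrics named above have simple set-based definitions. As a minimal sketch (an illustrative encoding, not the paper's implementation), binary masks can be represented as sets of foreground pixel coordinates:

```python
# Dice and IoU for binary masks encoded as sets of (x, y) foreground pixels.
from typing import Set, Tuple

Pixels = Set[Tuple[int, int]]

def dice(a: Pixels, b: Pixels) -> float:
    """Sørensen-Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

def iou(a: Pixels, b: Pixels) -> float:
    """Intersection over Union: |A∩B| / |A∪B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

pred = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 0), (0, 1), (1, 1)}
print(dice(pred, truth), iou(pred, truth))  # 0.6666666666666666 0.5
```

Dice weights the overlap against the average mask size while IoU weights it against the union, so Dice is always at least as large as IoU; using both gives a more rounded picture when screening pre-trained SAM variants.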
Funding: This work was supported by the National Natural Science Foundation of China (Grant Number 52073030) and the National Natural Science Foundation of China-Guangxi Joint Fund (U20A20276).
Abstract: X-ray Computed Tomography (XCT) enables non-destructive acquisition of the internal structure of materials, and image segmentation plays a crucial role in analyzing material XCT images. This paper proposes an image segmentation method based on the Segment Anything Model (SAM). We constructed a dataset of XCT images of carbides in nickel-based single-crystal superalloys and preprocessed the images using median filtering, histogram equalization, and gamma correction. Subsequently, SAM was fine-tuned to adapt to the task of material XCT image segmentation, resulting in Material-SAM. We compared the performance of threshold segmentation, SAM, the U-Net model, and Material-SAM. Our method achieved 88.45% Class Pixel Accuracy (CPA) and 88.77% Dice Similarity Coefficient (DSC) on the test set, outperforming SAM by 5.25% and 8.81%, respectively, the highest of all evaluated methods. Material-SAM also demonstrated lower input requirements than SAM: it required only three reference points to complete the segmentation task, one-fifth of SAM's requirement. Material-SAM exhibited promising results, highlighting its potential as a novel method for material XCT image segmentation.
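Two of the preprocessing steps listed above, gamma correction and histogram equalization, can be sketched for an 8-bit grayscale image stored as a flat pixel list. This is an illustrative reconstruction of the standard operations (the paper does not publish its preprocessing code):

```python
# Gamma correction and classic histogram equalization for 8-bit pixels.
from typing import List

def gamma_correct(pixels: List[int], gamma: float) -> List[int]:
    """Apply out = 255 * (in / 255) ** gamma to each pixel."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]

def equalize(pixels: List[int]) -> List[int]:
    """Spread intensities via the cumulative distribution of the histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c)  # first non-zero CDF value
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to spread
        return pixels[:]
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * 255) for p in pixels]

print(equalize([0, 0, 1, 1]))  # [0, 0, 255, 255]
```

Gamma < 1 brightens dark carbide interiors while equalization stretches low-contrast XCT slices to the full intensity range, both of which make the subsequent segmentation easier.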
Funding: Funded by the National Natural Science Foundation of China (Nos. 62171114, 52222810) and the Fundamental Research Funds for the Central Universities (No. DUT22RC(3)099).
Abstract: Water leakage inspection in tunnels is a critical engineering job that has attracted increasing attention. Leakage area detection via manual inspection is time-consuming and may produce unreliable findings, so automated techniques should be created to increase reliability and efficiency. Pre-trained foundational segmentation models built on large datasets have attracted great interest recently. This paper proposes a novel SAM-based network for accurate automated water leakage inspection. The contributions of this paper include the efficient adaptation of the SAM (Segment Anything Model) for shield tunnel water leakage segmentation and the demonstration of its application effect through data experiments. The Tunnel SAM Adapter achieves satisfactory performance, reaching 76.2% mIoU and 77.5% Dice. Experimental results demonstrate that our approach has advantages over peer studies and helps guarantee the integrity and safety of these vital assets while streamlining tunnel maintenance.
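Adapter-based tuning of a frozen SAM backbone, as in the Tunnel SAM Adapter, typically inserts a small residual bottleneck module between backbone layers. The following is an assumed-structure sketch of that general pattern, not the paper's published architecture:

```python
# A residual bottleneck adapter: x + up(relu(down(x))). Only the small down-
# and up-projections train; the surrounding backbone stays frozen.
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

def adapter(x: Vector, down: Matrix, up: Matrix) -> Vector:
    """Residual bottleneck: x + up @ relu(down @ x)."""
    h = relu(matvec(down, x))
    return [xi + ui for xi, ui in zip(x, matvec(up, h))]

x = [1.0, -2.0, 3.0]
down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # 3 -> 2 bottleneck
up_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 2 -> 3, zero-initialized
print(adapter(x, down, up_zero))  # [1.0, -2.0, 3.0]
```

With a zero-initialized up-projection (a common adapter initialization), the module starts as an identity mapping, so training begins from the frozen backbone's behavior and gradually specializes it to leakage imagery.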
Funding: Supported by the Mitacs, CFI-JELF, and NSERC Discovery grants.
Abstract: Recently, Meta AI Research released a general, promptable segment anything model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B). Without a doubt, the emergence of SAM will yield significant benefits for a wide array of practical image segmentation applications. In this study, we conduct a series of intriguing investigations into the performance of SAM across various applications, particularly in the fields of natural images, agriculture, manufacturing, remote sensing, and healthcare. We analyze and discuss the benefits and limitations of SAM, while also presenting an outlook on its future development in segmentation tasks. By doing so, we aim to give a comprehensive understanding of SAM's practical applications. This work is expected to provide insights that facilitate future research activities toward generic segmentation. Source code is publicly available at https://github.com/LiuTingWed/SAM-Not-Perfect.
Funding: Supported by the National Science and Technology Major Project (No. 2021YFF1201200), the Chinese National Science Foundation (No. 62372316), the Sichuan Science and Technology Program (Nos. 2022YFS0048, 2023YFG0126, and 2024YFHZ0091), the 1·3·5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (No. ZYYC21004), and the Chongqing Technology Innovation and Application Development Project (No. CSTB2022TIAD-KPX0067).
Abstract: This study presents a novel multimodal medical image zero-shot segmentation algorithm named the text-visual-prompt segment anything model (TV-SAM), which requires no manual annotations. TV-SAM incorporates and integrates the large language model GPT-4, the vision language model GLIP, and SAM to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, thereby enhancing SAM's capability for zero-shot segmentation. Comprehensive evaluations on seven public datasets encompassing eight imaging modalities demonstrate that TV-SAM can effectively segment unseen targets across various modalities without additional training. TV-SAM significantly outperforms SAM AUTO (p<0.01) and GSAM (p<0.05), closely matches the performance of SAM BBOX with gold-standard bounding box prompts (p=0.07), and surpasses state-of-the-art methods on specific datasets such as ISIC (0.853 versus 0.802) and WBC (0.968 versus 0.883). The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm, highlighting the significant contribution of GPT-4 to zero-shot segmentation. By integrating foundation models such as GPT-4, GLIP, and SAM, the ability to address complex problems in specialized domains can be enhanced.
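The three-stage prompting flow described above can be sketched schematically. The stage functions below are stand-in stubs (the real system calls GPT-4, GLIP, and SAM through their own APIs); only the orchestration pattern, image → text prompt → box prompts → masks with no manual input, is illustrated, and all names are hypothetical:

```python
# Schematic TV-SAM-style orchestration with injected stage functions, so the
# data flow can be shown (and tested) without the real GPT-4/GLIP/SAM calls.
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]

def tv_sam_pipeline(
    image: object,
    describe: Callable[[object], str],                   # stands in for GPT-4
    ground: Callable[[object, str], List[Box]],          # stands in for GLIP
    segment: Callable[[object, List[Box]], List[str]],   # stands in for SAM
) -> List[str]:
    """Text prompt -> bounding-box prompts -> masks, fully automatic."""
    text_prompt = describe(image)
    box_prompts = ground(image, text_prompt)
    return segment(image, box_prompts)

# Toy stubs showing the data flow only.
masks = tv_sam_pipeline(
    "ultrasound.png",
    describe=lambda img: "a hypoechoic thyroid nodule",
    ground=lambda img, text: [(10, 10, 50, 50)],
    segment=lambda img, boxes: [f"mask for box {b}" for b in boxes],
)
print(masks)  # ['mask for box (10, 10, 50, 50)']
```

Structuring the pipeline around injected callables keeps each foundation model swappable, which is consistent with the paper's point that combining such models addresses specialized domains.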