Funding: Supported by the National Natural Science Foundation of China (U2441254, 62571179).
Abstract: Due to the high cost of data collection and limited experimental conditions, sonar images are often scarce and of poor quality, which hinders effective feature learning and limits the performance of existing detection methods. To address this, we propose an improved YOLO model, Swin transformer-cascaded group attention YOLO (STC-YOLO), for sonar image target detection. It integrates diffusion-based sample generation with a Swin transformer and a cascaded group attention (CGA) mechanism. First, we fine-tune Stable Diffusion via LoRA and incorporate semantic features from the bootstrapping language-image pre-training (BLIP) text model to generate high-quality and diverse sonar images for dataset expansion. Then, we introduce the Swin transformer into the YOLOv8 backbone to enhance multi-scale feature extraction for small targets, and integrate the CGA mechanism into the C2f module to improve small-object perception. Additionally, the skewed intersection-over-union (SIoU) loss function is used to better adapt to the complexities of underwater environments. Experimental results indicate that the trained generative model can produce diverse and realistic samples even in data-scarce scenarios. Compared to the original YOLOv8 model, the enhanced STC-YOLO model achieves a 5% increase in detection accuracy and a 12.6% improvement in mean average precision, enabling high-precision detection of small underwater targets.
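
Illustrative note (not from the paper): the abstract does not give the formula of its SIoU loss, so the sketch below shows only the plain intersection-over-union quantity that IoU-based bounding-box losses start from. The function names `iou` and `iou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example; the additional penalty terms that distinguish SIoU are deliberately omitted.

```python
def iou(box_a, box_b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed layout for this sketch: (x1, y1) is the top-left corner and
    (x2, y2) the bottom-right corner, with x2 >= x1 and y2 >= y1.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) inputs.
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, target_box):
    """Baseline IoU loss: 0 for a perfect match, 1 for no overlap.

    This is NOT the SIoU loss named in the abstract, only the base term
    that such losses extend with extra penalties.
    """
    return 1.0 - iou(pred_box, target_box)
```

For small targets, a shift of a few pixels changes this ratio sharply, which is one reason detection work on small objects tends to pay attention to the choice of box regression loss.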