Funding: Sponsored by the National Natural Science Foundation of China (No. 32371993), the Natural Science Research Key Project of Anhui Provincial University (Nos. 2022AH040125, 2023AH040135, 2024AH050471, and 2024AH050462), and the Key Research and Development Plan of Anhui Province (Nos. 202204c06020022 and 2023n06020057).
Abstract: In practical orchards, fruit overlap and occlusion by branches and leaves significantly impede automated picking, particularly for bagging pears. To address this issue, this paper introduces the multi-scale cross-modal feature fusion and cost-sensitive classification loss function network (MCCNet), designed to accurately detect bagging pears across various occlusion categories. The network uses a dual-stream convolutional neural network as its backbone, enabling parallel extraction of multi-modal features. We also propose a novel lightweight cross-modal feature fusion method, motivated by the idea of enhancing the features shared between modalities while extracting the features specific to the RGB and depth modalities. This cross-modal method strengthens the model's perceptual capability by fusing complementary information from multimodal bagging pear image pairs. Furthermore, we transform the classification loss into a cost-sensitive loss function, aiming to improve detection classification efficiency and reduce missed and false detections during picking. Experimental results on a bagging pear dataset show that MCCNet achieves mAP0.5 and mAP0.5:0.95 values of 97.3% and 80.3%, respectively, improvements of 3.6% and 6.3% over the classical YOLOv10m model. When benchmarked against several state-of-the-art detection models, MCCNet has only 19.5 million parameters while maintaining superior inference speed.
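The abstract does not give the form of the cost-sensitive classification loss, so the following is only a minimal sketch of the general idea: a cross-entropy term scaled by a per-class cost, so that errors on costlier categories are penalized more heavily. The function name, the occlusion category names, and the cost values are all hypothetical and are not taken from the paper.

```python
import math


def cost_sensitive_cross_entropy(probs, target, class_costs):
    """Cross-entropy for one sample, scaled by the cost of its true class.

    probs: predicted class probabilities for the sample
    target: index of the true class
    class_costs: per-class misclassification costs (assumed values)
    """
    eps = 1e-12  # guards against log(0)
    return -class_costs[target] * math.log(max(probs[target], eps))


# Hypothetical occlusion categories and costs, for illustration only.
CLASSES = ["unoccluded", "leaf_occluded", "fruit_overlapped"]
COSTS = [1.0, 2.0, 3.0]

# A prediction that assigns low probability to the true class (index 2).
probs = [0.7, 0.2, 0.1]
plain = cost_sensitive_cross_entropy(probs, 2, [1.0, 1.0, 1.0])
weighted = cost_sensitive_cross_entropy(probs, 2, COSTS)
print(f"plain loss: {plain:.4f}, cost-sensitive loss: {weighted:.4f}")
```

With uniform costs the function reduces to ordinary cross-entropy; raising the cost of a category scales its loss by the same factor, which is one common way to push a detector toward fewer misses on that category.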