Abstract: In automated fruit harvesting, precise and efficient recognition and localization of fruit targets are pivotal to the efficiency of harvesting robots. The domain faces two core challenges. First, because picking is a dynamic process, fruit detection algorithms must adapt to multi-view conditions and recognize the same fruit from different perspectives. Second, fruits in natural environments are often subject to overlap, occlusion, and illumination fluctuations, which make image capture and recognition harder. To address these challenges, this study analyzes the key features in fruit recognition and finds that the stem, body, and base serve as constant, core information for fruit identification and exhibit long-term dependent semantic relationships during recognition. Because the morphology and position of the stem, body, and base are relatively fixed, these invariant features provide a stable foundation for dynamic fruit recognition, and extracting them effectively improves recognition accuracy and robustness. This paper proposes a novel model, TransSSA, with two new modules for extracting fruit image features. The Self-Attention Core Feature Extraction (SAF) module integrates YOLOv8 and Swin Transformer as backbone networks and introduces the Shuffle Attention mechanism, which strengthens the extraction of core features. This module focuses on constant features such as the stem, body, and base to support accurate fruit recognition across environments. The Squeeze and Excitation Aggregation (SAE) module combines the network's ability to capture channel patterns with global knowledge, further improving the extraction of effective features. In addition, to improve detection accuracy, the regression loss function is changed to EIoU. To validate TransSSA, the study presents extensive visualization analysis supporting the interpretability of the SAF and SAE modules. Experimental results show that TransSSA achieves a performance of 91.3% on a tomato dataset. This research provides a more effective solution for deploying fruit harvesting robots in complex environments.
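The abstract names EIoU as the regression loss but gives no formula. As a rough illustration only, the sketch below computes the commonly cited form of the EIoU loss for a single pair of axis-aligned boxes: one minus IoU, plus penalties for the center distance, the width difference, and the height difference, each normalized by the smallest enclosing box. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions made for this example and are not taken from the TransSSA paper.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for one predicted box and one ground-truth box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1 (assumed format).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of both boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the
    # squared diagonal of the enclosing box.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Squared width and height differences, normalized by the
    # enclosing box's squared width and height respectively.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    # Two same-sized boxes offset diagonally by one unit.
    print(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU loss, this form still yields a useful signal when boxes do not overlap, because the center, width, and height terms remain nonzero. A training implementation would operate on batched tensors rather than single tuples.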
Funding: Supported in part by the Basic Research Project of the Science and Technology Department of Jilin Province, China (Grant No. 202002044JC).